linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-06-17	hwmon: (pmbus) Add support for reading direct mode coefficients	Erik Rosen
	Add support for reading and decoding direct format coefficients to the PMBus core driver. If the new flag PMBUS_USE_COEFFICIENTS_CMD is set, the driver will use the COEFFICIENTS register together with the information in the pmbus_sensor_attr structs to initialize relevant coefficients for the direct mode format. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> [groeck: Initialize ret with -EINVAL in pmbus_init_coefficients()] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17	hwmon: (pmbus) Add new pmbus flag NO_WRITE_PROTECT	Erik Rosen
	Some PMBus chips respond with invalid data when reading the WRITE_PROTECT register. For such chips, this flag should be set so that the PMBus core driver doesn't use the WRITE_PROTECT command to determine its behavior. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17	hwmon: (pmbus) Add new flag PMBUS_READ_STATUS_AFTER_FAILED_CHECK	Erik Rosen
	Some PMBus chips end up in an undefined state when trying to read an unsupported register. For such chips, it is necessary to reset the chip pmbus controller to a known state after a failed register check. This can be done by reading a known register. By setting this flag the driver will try to read the STATUS register after each failed register check. This read may fail, but it will put the chip into a known state. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Link: https://lore.kernel.org/r/20210507194023.61138-2-erik.rosen@metormote.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17	devm-helpers: Add resource managed version of work init	Matti Vaittinen
	A few drivers which need a work-queue must cancel work at driver detach. Some of those implement remove() solely for this purpose. Help drivers to avoid unnecessary remove and error-branch implementation by adding managed verision of work initialization. This will also help drivers to avoid mixing manual and devm based unwinding when other resources are handled by devm. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/94ff4175e7f2ff134ed2fa7d6e7641005cc9784b.1623146580.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-17	media: subdev: disallow ioctl for saa6588/davinci	Arnd Bergmann
	The saa6588_ioctl() function expects to get called from other kernel functions with a 'saa6588_command' pointer, but I found nothing stops it from getting called from user space instead, which seems rather dangerous. The same thing happens in the davinci vpbe driver with its VENC_GET_FLD command. As a quick fix, add a separate .command() callback pointer for this driver and change the two callers over to that. This change can easily get backported to stable kernels if necessary, but since there are only two drivers, we may want to eventually replace this with a set of more specialized callbacks in the long run. Fixes: c3fda7f835b0 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-17	media: v4l2-subdev: add subdev-wide state struct	Tomi Valkeinen
	We have 'struct v4l2_subdev_pad_config' which contains configuration for a single pad used for the TRY functionality, and an array of those structs is passed to various v4l2_subdev_pad_ops. I was working on subdev internal routing between pads, and realized that there's no way to add TRY functionality for routes, which is not pad specific configuration. Adding a separate struct for try-route config wouldn't work either, as e.g. set-fmt needs to know the try-route configuration to propagate the settings. This patch adds a new struct, 'struct v4l2_subdev_state' (which at the moment only contains the v4l2_subdev_pad_config array) and the new struct is used in most of the places where v4l2_subdev_pad_config was used. All v4l2_subdev_pad_ops functions taking v4l2_subdev_pad_config are changed to instead take v4l2_subdev_state. The changes to drivers/media/v4l2-core/v4l2-subdev.c and include/media/v4l2-subdev.h were written by hand, and all the driver changes were done with the semantic patch below. The spatch needs to be applied to a select list of directories. I used the following shell commands to apply the spatch: dirs="drivers/media/i2c drivers/media/platform drivers/media/usb drivers/media/test-drivers/vimc drivers/media/pci drivers/staging/media" for dir in $dirs; do spatch -j8 --dir --include-headers --no-show-diff --in-place --sp-file v4l2-subdev-state.cocci $dir; done Note that Coccinelle chokes on a few drivers (gcc extensions?). With minor changes we can make Coccinelle run fine, and these changes can be reverted after spatch. The diff for these changes is: For drivers/media/i2c/s5k5baf.c: @@ -1481,7 +1481,7 @@ static int s5k5baf_set_selection(struct v4l2_subdev sd, &s5k5baf_cis_rect, v4l2_subdev_get_try_crop(sd, cfg, PAD_CIS), v4l2_subdev_get_try_compose(sd, cfg, PAD_CIS), - v4l2_subdev_get_try_crop(sd, cfg, PAD_OUT) + v4l2_subdev_get_try_crop(sd, cfg, PAD_OUT), }; s5k5baf_set_rect_and_adjust(rects, rtype, &sel->r); return 0; For drivers/media/platform/s3c-camif/camif-capture.c: @@ -1230,7 +1230,7 @@ static int s3c_camif_subdev_get_fmt(struct v4l2_subdev sd, mf = camif->mbus_fmt; break; - case CAMIF_SD_PAD_SOURCE_C...CAMIF_SD_PAD_SOURCE_P: + case CAMIF_SD_PAD_SOURCE_C: / crop rectangle at camera interface input / mf->width = camif->camif_crop.width; mf->height = camif->camif_crop.height; @@ -1332,7 +1332,7 @@ static int s3c_camif_subdev_set_fmt(struct v4l2_subdev sd, } break; - case CAMIF_SD_PAD_SOURCE_C...CAMIF_SD_PAD_SOURCE_P: + case CAMIF_SD_PAD_SOURCE_C: /* Pixel format can be only changed on the sink pad. / mf->code = camif->mbus_fmt.code; mf->width = crop->width; The semantic patch is: // <smpl> // Change function parameter @@ identifier func; identifier cfg; @@ func(..., - struct v4l2_subdev_pad_config cfg + struct v4l2_subdev_state sd_state , ...) { <... - cfg + sd_state ...> } // Change function declaration parameter @@ identifier func; identifier cfg; type T; @@ T func(..., - struct v4l2_subdev_pad_config cfg + struct v4l2_subdev_state sd_state , ...); // Change function return value @@ identifier func; @@ - struct v4l2_subdev_pad_config + struct v4l2_subdev_state func(...) { ... } // Change function declaration return value @@ identifier func; @@ - struct v4l2_subdev_pad_config + struct v4l2_subdev_state func(...); // Some drivers pass a local pad_cfg for a single pad to a called function. Wrap it // inside a pad_state. @@ identifier func; identifier pad_cfg; @@ func(...) { ... struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state pad_state = { .pads = &pad_cfg }; <+... ( v4l2_subdev_call \| sensor_call \| isi_try_fse \| isc_try_fse \| saa_call_all ) (..., - &pad_cfg + &pad_state ,...) ...+> } // If the function uses fields from pad_config, access via state->pads @@ identifier func; identifier state; @@ func(..., struct v4l2_subdev_state state , ...) { <... ( - state->try_fmt + state->pads->try_fmt \| - state->try_crop + state->pads->try_crop \| - state->try_compose + state->pads->try_compose ) ...> } // If the function accesses the filehandle, use fh->state instead @@ struct v4l2_subdev_fh fh; @@ - fh->pad + fh->state @@ struct v4l2_subdev_fh fh; @@ - fh.pad + fh.state // Start of vsp1 specific @@ @@ struct vsp1_entity { ... - struct v4l2_subdev_pad_config config; + struct v4l2_subdev_state config; ... }; @@ symbol entity; @@ vsp1_entity_init(...) { ... entity->config = - v4l2_subdev_alloc_pad_config + v4l2_subdev_alloc_state (&entity->subdev); ... } @@ symbol entity; @@ vsp1_entity_destroy(...) { ... - v4l2_subdev_free_pad_config + v4l2_subdev_free_state (entity->config); ... } @exists@ identifier func =~ "(^vsp1.)\|(hsit_set_format)\|(sru_enum_frame_size)\|(sru_set_format)\|(uif_get_selection)\|(uif_set_selection)\|(uds_enum_frame_size)\|(uds_set_format)\|(brx_set_format)\|(brx_get_selection)\|(histo_get_selection)\|(histo_set_selection)\|(brx_set_selection)"; symbol config; @@ func(...) { ... - struct v4l2_subdev_pad_config config; + struct v4l2_subdev_state config; ... } // End of vsp1 specific // Start of rcar specific @@ identifier sd; identifier pad_cfg; @@ rvin_try_format(...) { ... - struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state sd_state; ... - pad_cfg = v4l2_subdev_alloc_pad_config(sd); + sd_state = v4l2_subdev_alloc_state(sd); <... - pad_cfg + sd_state ...> - v4l2_subdev_free_pad_config(pad_cfg); + v4l2_subdev_free_state(sd_state); ... } // End of rcar specific // Start of rockchip specific @@ identifier func =~ "(rkisp1_rsz_get_pad_fmt)\|(rkisp1_rsz_get_pad_crop)\|(rkisp1_rsz_register)"; symbol rsz; symbol pad_cfg; @@ func(...) { + struct v4l2_subdev_state state = { .pads = rsz->pad_cfg }; ... - rsz->pad_cfg + &state ... } @@ identifier func =~ "(rkisp1_isp_get_pad_fmt)\|(rkisp1_isp_get_pad_crop)"; symbol isp; symbol pad_cfg; @@ func(...) { + struct v4l2_subdev_state state = { .pads = isp->pad_cfg }; ... - isp->pad_cfg + &state ... } @@ symbol rkisp1; symbol isp; symbol pad_cfg; @@ rkisp1_isp_register(...) { + struct v4l2_subdev_state state = { .pads = rkisp1->isp.pad_cfg }; ... - rkisp1->isp.pad_cfg + &state ... } // End of rockchip specific // Start of tegra-video specific @@ identifier sd; identifier pad_cfg; @@ __tegra_channel_try_format(...) { ... - struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state sd_state; ... - pad_cfg = v4l2_subdev_alloc_pad_config(sd); + sd_state = v4l2_subdev_alloc_state(sd); <... - pad_cfg + sd_state ...> - v4l2_subdev_free_pad_config(pad_cfg); + v4l2_subdev_free_state(sd_state); ... } @@ identifier sd_state; @@ __tegra_channel_try_format(...) { ... struct v4l2_subdev_state *sd_state; <... - sd_state->try_crop + sd_state->pads->try_crop ...> } // End of tegra-video specific // </smpl> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-17	crypto: api - remove CRYPTOA_U32 and related functions	Liu Shixin
	According to the advice of Eric and Herbert, type CRYPTOA_U32 has been unused for over a decade, so remove the code related to CRYPTOA_U32. After removing CRYPTOA_U32, the type of the variable attrs can be changed from union to struct. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-06-17	crypto: shash - avoid comparing pointers to exported functions under CFI	Ard Biesheuvel
	crypto_shash_alg_has_setkey() is implemented by testing whether the .setkey() member of a struct shash_alg points to the default version, called shash_no_setkey(). As crypto_shash_alg_has_setkey() is a static inline, this requires shash_no_setkey() to be exported to modules. Unfortunately, when building with CFI, function pointers are routed via CFI stubs which are private to each module (or to the kernel proper) and so this function pointer comparison may fail spuriously. Let's fix this by turning crypto_shash_alg_has_setkey() into an out of line function. Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-06-16	pstore/blk: Include zone in pstore_device_info	Kees Cook
	Information was redundant between struct pstore_zone_info and struct pstore_device_info. Use struct pstore_zone_info, with member name "zone". Additionally untangle the logic for the "best effort" block device instance. Signed-off-by: Kees Cook <keescook@chromium.org> Fixed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/lkml/20210617005424.182305-1-pulehui@huawei.com
2021-06-17	netfilter: nf_tables: add last expression	Pablo Neira Ayuso
	Add a new optional expression that tells you when last matching on a given rule / set element element has happened. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16	net/mlx5e: Don't create devices during unload flow	Dmytro Linkin
	Running devlink reload command for port in switchdev mode cause resources to corrupt: driver can't release allocated EQ and reclaim memory pages, because "rdma" auxiliary device had add CQs which blocks EQ from deletion. Erroneous sequence happens during reload-down phase, and is following: 1. detach device - suspends auxiliary devices which support it, destroys others. During this step "eth-rep" and "rdma-rep" are destroyed, "eth" - suspended. 2. disable SRIOV - moves device to legacy mode; as part of disablement - rescans drivers. This step adds "rdma" auxiliary device. 3. destroy EQ table - <failure>. Driver shouldn't create any device during unload flows. To handle that implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset on device attach. If flag is set do no-op on drivers rescan. Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16	net/smc: Make SMC statistics network namespace aware	Guvenc Gulce
	Make the gathered SMC statistics network namespace aware, for each namespace collect an own set of statistic information. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16	net/smc: Add netlink support for SMC fallback statistics	Guvenc Gulce
	Add support to collect more detailed SMC fallback reason statistics and provide these statistics to user space on the netlink interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16	net/smc: Add netlink support for SMC statistics	Guvenc Gulce
	Add the netlink function which collects the statistics information and delivers it to the userspace. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16	mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()	Hugh Dickins
	There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped. The first attempt used try_to_unmap(page, TTU_SYNC\|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside. The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside. Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly. This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe. Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: fc127da085c2 ("truncate: handle file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/thp: try_to_unmap() use TTU_SYNC for safe splitting	Hugh Dickins
	Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0. And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently. I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount. Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page(). When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated. Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/thp: make is_huge_zero_pmd() safe and quicker	Hugh Dickins
	Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry. Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time. __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present. Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/hugetlb: expand restore_reserve_on_error functionality	Mike Kravetz
	The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/ Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare	Peter Xu
	I found it by pure code review, that pte_same_as_swp() of unuse_vma() didn't take uffd-wp bit into account when comparing ptes. pte_same_as_swp() returning false negative could cause failure to swapoff swap ptes that was wr-protected by userfaultfd. Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [5.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm,hwpoison: fix race with hugetlb page allocation	Naoya Horiguchi
	When hugetlb page fault (under overcommitting situation) and memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race: CPU0: CPU1: gather_surplus_pages() page = alloc_surplus_huge_page() memory_failure_hugetlb() get_hwpoison_page(page) __get_hwpoison_page(page) get_page_unless_zero(page) zero = put_page_testzero(page) VM_BUG_ON_PAGE(!zero, page) enqueue_huge_page(h, page) put_page(page) __get_hwpoison_page() only checks the page refcount before taking an additional one for memory error handling, which is not enough because there's a time window where compound pages have non-zero refcount during hugetlb page initialization. So make __get_hwpoison_page() check page status a bit more for hugetlb pages with get_hwpoison_huge_page(). Checking hugetlb-specific flags under hugetlb_lock makes sure that the hugetlb page is not transitive. It's notable that another new function, HWPoisonHandlable(), is helpful to prevent a race against other transitive page states (like a generic compound page just before PageHuge becomes true). Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans	Hans de Goede

2021-06-16	Merge tag 'intel-gpio-v5.14-1' into review-hans	Hans de Goede
	intel-gpio for v5.14-1 * Export two functions from GPIO ACPI for wider use * Clean up Whiskey Cove and Crystal Cove GPIO drivers The following is an automated git shortlog grouped by driver: crystalcove: - remove platform_set_drvdata() + cleanup probe gpiolib: - acpi: Add acpi_gpio_get_io_resource() - acpi: Introduce acpi_get_and_request_gpiod() helper wcove: - Split error handling for CTRL and IRQ registers - Unify style of to_reg() with to_ireg() - Use IRQ hardware number getter instead of direct access
2021-06-16	platform/surface: aggregator_cdev: Allow enabling of events from user-space	Maximilian Luz
	While events can already be enabled and disabled via the generic request IOCTL, this bypasses the internal reference counting mechanism of the controller. Due to that, disabling an event will turn it off regardless of any other client having requested said event, which may break functionality of that client. To solve this, add IOCTLs wrapping the ssam_controller_event_enable() and ssam_controller_event_disable() functions, which have been previously introduced for this specific purpose. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-6-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator_cdev: Add support for forwarding events to ↵	Maximilian Luz
	user-space Currently, debugging unknown events requires writing a custom driver. This is somewhat difficult, slow to adapt, and not entirely user-friendly for quickly trying to figure out things on devices of some third-party user. We can do better. We already have a user-space interface intended for debugging SAM EC requests, so let's add support for receiving events to that. This commit provides support for receiving events by reading from the controller file. It additionally introduces two new IOCTLs to control which event categories will be forwarded. Specifically, a user-space client can specify which target categories it wants to receive events from by registering the corresponding notifier(s) via the IOCTLs and after that, read the received events by reading from the controller device. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-5-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator: Update copyright	Maximilian Luz
	It's 2021, update the copyright accordingly. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-4-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator: Allow enabling of events without notifiers	Maximilian Luz
	We can already enable and disable SAM events via one of two ways: either via a (non-observer) notifier tied to a specific event group, or a generic event enable/disable request. In some instances, however, neither method may be desirable. The first method will tie the event enable request to a specific notifier, however, when we want to receive notifications for multiple event groups of the same target category and forward this to the same notifier callback, we may receive duplicate events, i.e. one event per registered notifier. The second method will bypass the internal reference counting mechanism, meaning that a disable request will disable the event regardless of any other client driver using it, which may break the functionality of that driver. To address this problem, add new functions that allow enabling and disabling of events via the event reference counting mechanism built into the controller, without needing to register a notifier. This can then be used in combination with observer notifiers to process multiple events of the same target category without duplication in the same callback function. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20210604134755.535590-3-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator: Allow registering notifiers without enabling ↵	Maximilian Luz
	events Currently, each SSAM event notifier is directly tied to one group of events. This makes sense as registering a notifier will automatically take care of enabling the corresponding event group and normally drivers only need notifications for a very limited number of events, associated with different callbacks for each group. However, there are rare cases, especially for debugging, when we want to get notifications for a whole event target category instead of just a single group of events in that category. Registering multiple notifiers, i.e. one per group, may be infeasible due to two issues: a) we might not know every event enable/disable specification as some events are auto-enabled by the EC and b) forwarding this to the same callback will lead to duplicate events as we might not know the full event specification to perform the appropriate filtering. This commit introduces observer-notifiers, which are notifiers that are not tied to a specific event group and do not attempt to manage any events. In other words, they can be registered without enabling any event group or incrementing the corresponding reference count and just act as silent observers, listening to all currently/previously enabled events based on their match-specification. Essentially, this allows us to register one single notifier for a full event target category, meaning that we can process all events of that target category in a single callback without duplication. Specifically, this will be used in the cdev debug interface to forward events to user-space via a device file from which the events can be read. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-2-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	ide: remove the legacy ide driver	Christoph Hellwig
	The legay ide driver has been replace with libata starting in 2003 and has been scheduled for removal for a while. Finally kill it off so that we can start cleaning up various bits of cruft it forced on the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: include: libata: Move fields commonly over-written to separate MACRO	Lee Jones
	This is a pre-cursor to some upcoming W=1 fix-ups. Fixes the following W=1 kernel build warning(s): Cc: Jens Axboe <axboe@kernel.dk> Cc: Mark Lord <mlord@pobox.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-2-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	xfrm: delete xfrm4_output_finish xfrm6_output_finish declarations	Antony Antony
	These function declarations are not needed any more. The definitions were deleted. Fixes: 2ab6096db2f1 ("xfrm: remove output_finish indirection from xfrm_state_afinfo") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-16	ACPI: Check StorageD3Enable _DSD property in ACPI code	Mario Limonciello
	Although first implemented for NVME, this check may be usable by other drivers as well. Microsoft's specification explicitly mentions that is may be usable by SATA and AHCI devices. Google also indicates that they have used this with SDHCI in a downstream kernel tree that a user can plug a storage device into. Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Suggested-by: Keith Busch <kbusch@kernel.org> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Rafael J. Wysocki <rjw@rjwysocki.net> CC: Prike Liang <prike.liang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-15	ahci: Add support for Dell S140 and later controllers	Charles Rose
	This patch enables support for Dell S140 and later controllers that use Intel's PCHs configured as PCI_CLASS_STORAGE_RAID. Reviewed-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Charles Rose <charles.rose@dell.com> Link: https://lore.kernel.org/r/20210615190801.1744466-1-charles.rose@dell.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15	net: bonding: Use per-cpu rr_tx_counter	Jussi Maki
	The round-robin rr_tx_counter was shared across CPUs leading to significant cache thrashing at high packet rates. This patch switches the round-robin packet counter to use a per-cpu variable to decide the destination slave. On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after this patch. "perf top -e cache_misses" before: 12.31% [bonding] [k] bond_xmit_roundrobin_slave_get 10.59% [sch_fq_codel] [k] fq_codel_dequeue 9.34% [kernel] [k] skb_release_data after: 15.42% [sch_fq_codel] [k] fq_codel_dequeue 10.06% [kernel] [k] __memset 9.12% [kernel] [k] skb_release_data Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15	net: inline function get_net_ns_by_fd if NET_NS is disabled	Changbin Du
	The function get_net_ns_by_fd() could be inlined when NET_NS is not enabled. Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15	ptp: improve max_adj check against unreasonable values	Jakub Kicinski
	Scaled PPM conversion to PPB may (on 64bit systems) result in a value larger than s32 can hold (freq/scaled_ppm is a long). This means the kernel will not correctly reject unreasonably high ->freq values (e.g. > 4294967295ppb, 281474976645 scaled PPM). The conversion is equivalent to a division by ~66 (65.536), so the value of ppb is always smaller than ppm, but not small enough to assume narrowing the type from long -> s32 is okay. Note that reasonable user space (e.g. ptp4l) will not use such high values, anyway, 4289046510ppb ~= 4.3x, so the fix is somewhat pedantic. Fixes: d39a743511cd ("ptp: validate the requested frequency adjustment.") Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15	bpf: Support socket migration by eBPF.	Kuniyuki Iwashima
	This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, we run it for socket migration if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or net.ipv4.tcp_migrate_req is enabled. Currently, the expected_attach_type is not enforced for the BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the earlier idea in the commit aac3fc320d94 ("bpf: Post-hooks for sys_bind") to fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type(). Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to select a new listener based on the child socket. migrating_sk varies depending on if it is migrating a request in the accept queue or during 3WHS. - accept_queue : sock (ESTABLISHED/SYN_RECV) - 3WHS : request_sock (NEW_SYN_RECV) In the eBPF program, we can select a new listener by BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning SK_DROP. This feature is useful when listeners have different settings at the socket API level or when we want to free resources as soon as possible. - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fallbacks to the random selection - SK_DROP, cancel the migration. There is a noteworthy point. We select a listening socket in three places, but we do not have struct skb at closing a listener or retransmitting a SYN+ACK. On the other hand, some helper functions do not expect skb is NULL (e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer() in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb temporarily before running the eBPF program. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-15	bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT.	Kuniyuki Iwashima
	We will call sock_reuseport.prog for socket migration in the next commit, so the eBPF program has to know which listener is closing to select a new listener. We can currently get a unique ID of each listener in the userspace by calling bpf_map_lookup_elem() for BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map. This patch makes the pointer of sk available in sk_reuseport_md so that we can get the ID by BPF_FUNC_get_socket_cookie() in the eBPF program. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201119001154.kapwihc2plp4f7zc@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-9-kuniyu@amazon.co.jp
2021-06-15	tcp: Add reuseport_migrate_sock() to select a new listener.	Kuniyuki Iwashima
	reuseport_migrate_sock() does the same check done in reuseport_listen_stop_sock(). If the reuseport group is capable of migration, reuseport_migrate_sock() selects a new listener by the child socket hash and increments the listener's sk_refcnt beforehand. Thus, if we fail in the migration, we have to decrement it later. We will support migration by eBPF in the later commits. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-5-kuniyu@amazon.co.jp
2021-06-15	tcp: Keep TCP_CLOSE sockets in the reuseport group.	Kuniyuki Iwashima
	When we close a listening socket, to migrate its connections to another listener in the same reuseport group, we have to handle two kinds of child sockets. One is that a listening socket has a reference to, and the other is not. The former is the TCP_ESTABLISHED/TCP_SYN_RECV sockets, and they are in the accept queue of their listening socket. So we can pop them out and push them into another listener's queue at close() or shutdown() syscalls. On the other hand, the latter, the TCP_NEW_SYN_RECV socket is during the three-way handshake and not in the accept queue. Thus, we cannot access such sockets at close() or shutdown() syscalls. Accordingly, we have to migrate immature sockets after their listening socket has been closed. Currently, if their listening socket has been closed, TCP_NEW_SYN_RECV sockets are freed at receiving the final ACK or retransmitting SYN+ACKs. At that time, if we could select a new listener from the same reuseport group, no connection would be aborted. However, we cannot do that because reuseport_detach_sock() sets NULL to sk_reuseport_cb and forbids access to the reuseport group from closed sockets. This patch allows TCP_CLOSE sockets to remain in the reuseport group and access it while any child socket references them. The point is that reuseport_detach_sock() was called twice from inet_unhash() and sk_destruct(). This patch replaces the first reuseport_detach_sock() with reuseport_stop_listen_sock(), which checks if the reuseport group is capable of migration. If capable, it decrements num_socks, moves the socket backwards in socks[] and increments num_closed_socks. When all connections are migrated, sk_destruct() calls reuseport_detach_sock() to remove the socket from socks[], decrement num_closed_socks, and set NULL to sk_reuseport_cb. By this change, closed or shutdowned sockets can keep sk_reuseport_cb. Consequently, calling listen() after shutdown() can cause EADDRINUSE or EBUSY in inet_csk_bind_conflict() or reuseport_add_sock() which expects such sockets not to have the reuseport group. Therefore, this patch also loosens such validation rules so that a socket can listen again if it has a reuseport group with num_closed_socks more than 0. When such sockets listen again, we handle them in reuseport_resurrect(). If there is an existing reuseport group (reuseport_add_sock() path), we move the socket from the old group to the new one and free the old one if necessary. If there is no existing group (reuseport_alloc() path), we allocate a new reuseport group, detach sk from the old one, and free it if necessary, not to break the current shutdown behaviour: - we cannot carry over the eBPF prog of shutdowned sockets - we cannot attach/detach an eBPF prog to/from listening sockets via shutdowned sockets Note that when the number of sockets gets over U16_MAX, we try to detach a closed socket randomly to make room for the new listening socket in reuseport_grow(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-4-kuniyu@amazon.co.jp
2021-06-15	tcp: Add num_closed_socks to struct sock_reuseport.	Kuniyuki Iwashima
	As noted in the following commit, a closed listener has to hold the reference to the reuseport group for socket migration. This patch adds a field (num_closed_socks) to struct sock_reuseport to manage closed sockets within the same reuseport group. Moreover, this and the following commits introduce some helper functions to split socks[] into two sections and keep TCP_LISTEN and TCP_CLOSE sockets in each section. Like a double-ended queue, we will place TCP_LISTEN sockets from the front and TCP_CLOSE sockets from the end. TCP_LISTEN----------> <-------TCP_CLOSE +---+---+ --- +---+ --- +---+ --- +---+ \| 0 \| 1 \| ... \| i \| ... \| j \| ... \| k \| +---+---+ --- +---+ --- +---+ --- +---+ i = num_socks - 1 j = max_socks - num_closed_socks k = max_socks - 1 This patch also extends reuseport_add_sock() and reuseport_grow() to support num_closed_socks. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-3-kuniyu@amazon.co.jp
2021-06-15	net: Introduce net.ipv4.tcp_migrate_req.	Kuniyuki Iwashima
	This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this option is enabled or eBPF program is attached, we will be able to migrate child sockets from a listener to another in the same reuseport group after close() or shutdown() syscalls. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-2-kuniyu@amazon.co.jp
2021-06-15	clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG	Tony Lindgren
	As we are using cpu_pm to save and restore context, we must also save and restore the timer sysconfig register TIOCP_CFG. This is needed because we are not calling PM runtime functions at all with cpu_pm. Fixes: b34677b0999a ("clocksource/drivers/timer-ti-dm: Implement cpu_pm notifier for context save and restore") Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Adam Ford <aford173@gmail.com> Cc: Andreas Kemnade <andreas@kemnade.info> Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210415085506.56828-1-tony@atomide.com
2021-06-15	quota: finish disable quotactl_path syscall	Marcin Juszkiewicz
	In commit 5b9fedb31e47 ("quota: Disable quotactl_path syscall") Jan Kara disabled quotactl_path syscall on several architectures. This commit disables it on all architectures using unified list of system calls: - arm64 - arc - csky - h8300 - hexagon - nds32 - nios2 - openrisc - riscv (32/64) CC: Jan Kara <jack@suse.cz> CC: Christian Brauner <christian.brauner@ubuntu.com> CC: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein Link: https://lore.kernel.org/r/20210614153712.313707-1-marcin@juszkiewicz.com.pl Fixes: 5b9fedb31e47 ("quota: Disable quotactl_path syscall") Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl> Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-14	net/mlx5: Enlarge interrupt field in CREATE_EQ	Shay Drory
	FW is now supporting more than 256 MSI-X per PF (up to 2K). Hence, enlarge interrupt field in CREATE_EQ to make use of the new MSI-X's. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14	net/mlx5: Provide cpumask at EQ creation phase	Leon Romanovsky
	The users of EQ are running their code on different CPUs and with various affinity patterns. Move the cpumask setting close to their actual usage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14	net: core: devlink: add dropped stats traps field	Oleksandr Mazur
	Whenever query statistics is issued for trap, devlink subsystem would also fill-in statistics 'dropped' field. This field indicates the number of packets HW dropped and failed to report to the device driver, and thus - to the devlink subsystem itself. In case if device driver didn't register callback for hard drop statistics querying, 'dropped' field will be omitted and not filled. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy: micrel: ksz886x/ksz8081: add cabletest support	Oleksij Rempel
	This patch support for cable test for the ksz886x switches and the ksz8081 PHY. The patch was tested on a KSZ8873RLL switch with following results: - port 1: - provides invalid values, thus return -ENOTSUPP (Errata: DS80000830A: "LinkMD does not work on Port 1", http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8873-Errata-DS80000830A.pdf) - port 2: - can detect distance - can detect open on each wire of pair A (wire 1 and 2) - can detect open only on one wire of pair B (only wire 3) - can detect short between wires of a pair (wires 1 + 2 or 3 + 6) - short between pairs is detected as open. For example short between wires 2 + 3 is detected as open. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy/dsa micrel/ksz886x add MDI-X support	Oleksij Rempel
	Add support for MDI-X status and configuration Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy: micrel: move phy reg offsets to common header	Michael Grzeschik
	Some micrel devices share the same PHY register defines. This patch moves them to one common header so other drivers can reuse them. And reuse generic MII_* defines where possible. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	Merge remote-tracking branch 'regmap/for-5.14' into regmap-next	Mark Brown