linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-06-17	hwmon: (pmbus) Add new flag PMBUS_READ_STATUS_AFTER_FAILED_CHECK	Erik Rosen
	Some PMBus chips end up in an undefined state when trying to read an unsupported register. For such chips, it is necessary to reset the chip pmbus controller to a known state after a failed register check. This can be done by reading a known register. By setting this flag the driver will try to read the STATUS register after each failed register check. This read may fail, but it will put the chip into a known state. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Link: https://lore.kernel.org/r/20210507194023.61138-2-erik.rosen@metormote.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17	devm-helpers: Add resource managed version of work init	Matti Vaittinen
	A few drivers which need a work-queue must cancel work at driver detach. Some of those implement remove() solely for this purpose. Help drivers to avoid unnecessary remove and error-branch implementation by adding managed verision of work initialization. This will also help drivers to avoid mixing manual and devm based unwinding when other resources are handled by devm. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/94ff4175e7f2ff134ed2fa7d6e7641005cc9784b.1623146580.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-17	crypto: api - remove CRYPTOA_U32 and related functions	Liu Shixin
	According to the advice of Eric and Herbert, type CRYPTOA_U32 has been unused for over a decade, so remove the code related to CRYPTOA_U32. After removing CRYPTOA_U32, the type of the variable attrs can be changed from union to struct. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-06-16	pstore/blk: Include zone in pstore_device_info	Kees Cook
	Information was redundant between struct pstore_zone_info and struct pstore_device_info. Use struct pstore_zone_info, with member name "zone". Additionally untangle the logic for the "best effort" block device instance. Signed-off-by: Kees Cook <keescook@chromium.org> Fixed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/lkml/20210617005424.182305-1-pulehui@huawei.com
2021-06-16	net/mlx5e: Don't create devices during unload flow	Dmytro Linkin
	Running devlink reload command for port in switchdev mode cause resources to corrupt: driver can't release allocated EQ and reclaim memory pages, because "rdma" auxiliary device had add CQs which blocks EQ from deletion. Erroneous sequence happens during reload-down phase, and is following: 1. detach device - suspends auxiliary devices which support it, destroys others. During this step "eth-rep" and "rdma-rep" are destroyed, "eth" - suspended. 2. disable SRIOV - moves device to legacy mode; as part of disablement - rescans drivers. This step adds "rdma" auxiliary device. 3. destroy EQ table - <failure>. Driver shouldn't create any device during unload flows. To handle that implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset on device attach. If flag is set do no-op on drivers rescan. Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16	mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()	Hugh Dickins
	There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped. The first attempt used try_to_unmap(page, TTU_SYNC\|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside. The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside. Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly. This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe. Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: fc127da085c2 ("truncate: handle file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/thp: try_to_unmap() use TTU_SYNC for safe splitting	Hugh Dickins
	Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0. And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently. I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount. Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page(). When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated. Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/thp: make is_huge_zero_pmd() safe and quicker	Hugh Dickins
	Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry. Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time. __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present. Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/hugetlb: expand restore_reserve_on_error functionality	Mike Kravetz
	The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/ Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare	Peter Xu
	I found it by pure code review, that pte_same_as_swp() of unuse_vma() didn't take uffd-wp bit into account when comparing ptes. pte_same_as_swp() returning false negative could cause failure to swapoff swap ptes that was wr-protected by userfaultfd. Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [5.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	mm,hwpoison: fix race with hugetlb page allocation	Naoya Horiguchi
	When hugetlb page fault (under overcommitting situation) and memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race: CPU0: CPU1: gather_surplus_pages() page = alloc_surplus_huge_page() memory_failure_hugetlb() get_hwpoison_page(page) __get_hwpoison_page(page) get_page_unless_zero(page) zero = put_page_testzero(page) VM_BUG_ON_PAGE(!zero, page) enqueue_huge_page(h, page) put_page(page) __get_hwpoison_page() only checks the page refcount before taking an additional one for memory error handling, which is not enough because there's a time window where compound pages have non-zero refcount during hugetlb page initialization. So make __get_hwpoison_page() check page status a bit more for hugetlb pages with get_hwpoison_huge_page(). Checking hugetlb-specific flags under hugetlb_lock makes sure that the hugetlb page is not transitive. It's notable that another new function, HWPoisonHandlable(), is helpful to prevent a race against other transitive page states (like a generic compound page just before PageHuge becomes true). Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16	Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans	Hans de Goede

2021-06-16	Merge tag 'intel-gpio-v5.14-1' into review-hans	Hans de Goede
	intel-gpio for v5.14-1 * Export two functions from GPIO ACPI for wider use * Clean up Whiskey Cove and Crystal Cove GPIO drivers The following is an automated git shortlog grouped by driver: crystalcove: - remove platform_set_drvdata() + cleanup probe gpiolib: - acpi: Add acpi_gpio_get_io_resource() - acpi: Introduce acpi_get_and_request_gpiod() helper wcove: - Split error handling for CTRL and IRQ registers - Unify style of to_reg() with to_ireg() - Use IRQ hardware number getter instead of direct access
2021-06-16	platform/surface: aggregator: Update copyright	Maximilian Luz
	It's 2021, update the copyright accordingly. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-4-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator: Allow enabling of events without notifiers	Maximilian Luz
	We can already enable and disable SAM events via one of two ways: either via a (non-observer) notifier tied to a specific event group, or a generic event enable/disable request. In some instances, however, neither method may be desirable. The first method will tie the event enable request to a specific notifier, however, when we want to receive notifications for multiple event groups of the same target category and forward this to the same notifier callback, we may receive duplicate events, i.e. one event per registered notifier. The second method will bypass the internal reference counting mechanism, meaning that a disable request will disable the event regardless of any other client driver using it, which may break the functionality of that driver. To address this problem, add new functions that allow enabling and disabling of events via the event reference counting mechanism built into the controller, without needing to register a notifier. This can then be used in combination with observer notifiers to process multiple events of the same target category without duplication in the same callback function. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20210604134755.535590-3-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	platform/surface: aggregator: Allow registering notifiers without enabling ↵	Maximilian Luz
	events Currently, each SSAM event notifier is directly tied to one group of events. This makes sense as registering a notifier will automatically take care of enabling the corresponding event group and normally drivers only need notifications for a very limited number of events, associated with different callbacks for each group. However, there are rare cases, especially for debugging, when we want to get notifications for a whole event target category instead of just a single group of events in that category. Registering multiple notifiers, i.e. one per group, may be infeasible due to two issues: a) we might not know every event enable/disable specification as some events are auto-enabled by the EC and b) forwarding this to the same callback will lead to duplicate events as we might not know the full event specification to perform the appropriate filtering. This commit introduces observer-notifiers, which are notifiers that are not tied to a specific event group and do not attempt to manage any events. In other words, they can be registered without enabling any event group or incrementing the corresponding reference count and just act as silent observers, listening to all currently/previously enabled events based on their match-specification. Essentially, this allows us to register one single notifier for a full event target category, meaning that we can process all events of that target category in a single callback without duplication. Specifically, this will be used in the cdev debug interface to forward events to user-space via a device file from which the events can be read. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-2-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-06-16	ide: remove the legacy ide driver	Christoph Hellwig
	The legay ide driver has been replace with libata starting in 2003 and has been scheduled for removal for a while. Finally kill it off so that we can start cleaning up various bits of cruft it forced on the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ata: include: libata: Move fields commonly over-written to separate MACRO	Lee Jones
	This is a pre-cursor to some upcoming W=1 fix-ups. Fixes the following W=1 kernel build warning(s): Cc: Jens Axboe <axboe@kernel.dk> Cc: Mark Lord <mlord@pobox.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: linux-ide@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210528090502.1799866-2-lee.jones@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16	ACPI: Check StorageD3Enable _DSD property in ACPI code	Mario Limonciello
	Although first implemented for NVME, this check may be usable by other drivers as well. Microsoft's specification explicitly mentions that is may be usable by SATA and AHCI devices. Google also indicates that they have used this with SDHCI in a downstream kernel tree that a user can plug a storage device into. Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Suggested-by: Keith Busch <kbusch@kernel.org> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Rafael J. Wysocki <rjw@rjwysocki.net> CC: Prike Liang <prike.liang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-15	ahci: Add support for Dell S140 and later controllers	Charles Rose
	This patch enables support for Dell S140 and later controllers that use Intel's PCHs configured as PCI_CLASS_STORAGE_RAID. Reviewed-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Charles Rose <charles.rose@dell.com> Link: https://lore.kernel.org/r/20210615190801.1744466-1-charles.rose@dell.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15	ptp: improve max_adj check against unreasonable values	Jakub Kicinski
	Scaled PPM conversion to PPB may (on 64bit systems) result in a value larger than s32 can hold (freq/scaled_ppm is a long). This means the kernel will not correctly reject unreasonably high ->freq values (e.g. > 4294967295ppb, 281474976645 scaled PPM). The conversion is equivalent to a division by ~66 (65.536), so the value of ppb is always smaller than ppm, but not small enough to assume narrowing the type from long -> s32 is okay. Note that reasonable user space (e.g. ptp4l) will not use such high values, anyway, 4289046510ppb ~= 4.3x, so the fix is somewhat pedantic. Fixes: d39a743511cd ("ptp: validate the requested frequency adjustment.") Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15	bpf: Support socket migration by eBPF.	Kuniyuki Iwashima
	This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to check if the attached eBPF program is capable of migrating sockets. When the eBPF program is attached, we run it for socket migration if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or net.ipv4.tcp_migrate_req is enabled. Currently, the expected_attach_type is not enforced for the BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the earlier idea in the commit aac3fc320d94 ("bpf: Post-hooks for sys_bind") to fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type(). Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to select a new listener based on the child socket. migrating_sk varies depending on if it is migrating a request in the accept queue or during 3WHS. - accept_queue : sock (ESTABLISHED/SYN_RECV) - 3WHS : request_sock (NEW_SYN_RECV) In the eBPF program, we can select a new listener by BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning SK_DROP. This feature is useful when listeners have different settings at the socket API level or when we want to free resources as soon as possible. - SK_PASS with selected_sk, select it as a new listener - SK_PASS with selected_sk NULL, fallbacks to the random selection - SK_DROP, cancel the migration. There is a noteworthy point. We select a listening socket in three places, but we do not have struct skb at closing a listener or retransmitting a SYN+ACK. On the other hand, some helper functions do not expect skb is NULL (e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer() in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb temporarily before running the eBPF program. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp
2021-06-14	net/mlx5: Enlarge interrupt field in CREATE_EQ	Shay Drory
	FW is now supporting more than 256 MSI-X per PF (up to 2K). Hence, enlarge interrupt field in CREATE_EQ to make use of the new MSI-X's. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14	net/mlx5: Provide cpumask at EQ creation phase	Leon Romanovsky
	The users of EQ are running their code on different CPUs and with various affinity patterns. Move the cpumask setting close to their actual usage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14	net: phy: micrel: ksz886x/ksz8081: add cabletest support	Oleksij Rempel
	This patch support for cable test for the ksz886x switches and the ksz8081 PHY. The patch was tested on a KSZ8873RLL switch with following results: - port 1: - provides invalid values, thus return -ENOTSUPP (Errata: DS80000830A: "LinkMD does not work on Port 1", http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8873-Errata-DS80000830A.pdf) - port 2: - can detect distance - can detect open on each wire of pair A (wire 1 and 2) - can detect open only on one wire of pair B (only wire 3) - can detect short between wires of a pair (wires 1 + 2 or 3 + 6) - short between pairs is detected as open. For example short between wires 2 + 3 is detected as open. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy/dsa micrel/ksz886x add MDI-X support	Oleksij Rempel
	Add support for MDI-X status and configuration Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	net: phy: micrel: move phy reg offsets to common header	Michael Grzeschik
	Some micrel devices share the same PHY register defines. This patch moves them to one common header so other drivers can reuse them. And reuse generic MII_* defines where possible. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14	Merge remote-tracking branch 'regmap/for-5.14' into regmap-next	Mark Brown

2021-06-14	CFI: Move function_nocfi() into compiler.h	Mark Rutland
	Currently the common definition of function_nocfi() is provided by <linux/mm.h>, and architectures are expected to provide a definition in <asm/memory.h>. Due to header dependencies, this can make it hard to use function_nocfi() in low-level headers. As function_nocfi() has no dependency on any mm code, nor on any memory definitions, it doesn't need to live in <linux/mm.h> or <asm/memory.h>. Generally, it would make more sense for it to live in <linux/compiler.h>, where an architecture can override it in <asm/compiler.h>. Move the definitions accordingly. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210602153701.35957-1-mark.rutland@arm.com
2021-06-14	Revert "cpufreq: CPPC: Add support for frequency invariance"	Viresh Kumar
	This reverts commit 4c38f2df71c8e33c0b64865992d693f5022eeaad. There are few races in the frequency invariance support for CPPC driver, namely the driver doesn't stop the kthread_work and irq_work on policy exit during suspend/resume or CPU hotplug. A proper fix won't be possible for the 5.13-rc, as it requires a lot of changes. Lets revert the patch instead for now. Fixes: 4c38f2df71c8 ("cpufreq: CPPC: Add support for frequency invariance") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-14	mmc: Improve function name when aborting a tuning cmd	Wolfram Sang
	'mmc_abort_tuning()' made me think tuning gets completely aborted. However, it sends only a STOP cmd to cancel the current tuning cmd. Tuning process may still continue after that. So, rename the function to 'mmc_send_abort_tuning()' to better reflect all this. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20210608180620.40059-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-06-14	mmc: core: Add support for cache ctrl for SD cards	Ulf Hansson
	In SD spec v6.x the SD function extension registers for performance enhancements were introduced. As a part of this an optional internal cache on the SD card, can be used to improve performance. The let the SD card use the cache, the host needs to enable it and manage flushing of the cache, so let's add support for this. Note that for an SD card supporting the cache it's mandatory for it, to also support the poweroff notification feature. According to the SD spec, if the cache has been enabled and a poweroff notification is sent to the card, that implicitly also means that the card should flush its internal cache. Therefore, dealing with cache flushing for REQ_OP_FLUSH block requests is sufficient. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210511101359.83521-1-ulf.hansson@linaro.org
2021-06-14	mmc: core: Add support for Power Off Notification for SD cards	Ulf Hansson
	Rather than only deselecting the SD card via a CMD7, before we cut power to it at system suspend, at runtime suspend or at shutdown, let's add support for a graceful power off sequence via enabling the SD Power Off Notification feature. Note that, the Power Off Notification feature was added in the SD spec v4.x, which is several years ago. However, it's still a bit unclear how often the SD card vendors decides to implement support for it. To validate these changes a Sandisk Extreme PRO A2 64GB has been used, which seems to work nicely. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210504161222.101536-12-ulf.hansson@linaro.org
2021-06-14	mmc: core: Read performance enhancements registers for SD cards	Ulf Hansson
	In SD spec v6.x the SD function extension registers for performance enhancements were introduced. These registers let the SD card announce supports for various performance related features, like "self-maintenance", "cache" and "command queuing". Let's extend the parsing of SD function extension registers and store the information in the struct mmc_card. This prepares for subsequent changes to implement the complete support for new the performance enhancement features. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210504161222.101536-11-ulf.hansson@linaro.org
2021-06-14	mmc: core: Read the SD function extension registers for power management	Ulf Hansson
	In the SD spec v4.0 the CMD48/49 and CMD58/59 were introduced as optional commands. In the SD spec v4.1 the SD function extension registers were introduced, which requires support for CMD48/49/58/59 to be read/written from/to. Moreover, a specific function extension register were added to let the card announce support for optional features in regards to power management. The features that were added are "Power Off Notification", "Power Down Mode" and "Power Sustenance". As a first step to support this, let's read and parse the register for power management during the SD card initialization and store the information about the supported features in the struct mmc_card. In this way, we prepare for subsequent changes to implement the complete support for the new features. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210504161222.101536-10-ulf.hansson@linaro.org
2021-06-14	mmc: core: Parse the SD SCR register for support of CMD48/49 and CMD58/59	Ulf Hansson
	In SD spec v4.x the support for CMD48/49 and CMD58/59 were introduced as optional features. To let the card announce whether it supports the commands, the SCR register has been extended with corresponding support bits. Let's parse and store this information for later use. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20210504161222.101536-9-ulf.hansson@linaro.org
2021-06-14	Merge tag 'for-5.14-regulator' of ↵	Mark Brown
	git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into regulator-5.14 regulator: Changes for v5.14-rc1 This adds regulator_sync_voltage_rdev(), which is used as a dependency for new Tegra power domain code.
2021-06-12	mm: relocate 'write_protect_seq' in struct mm_struct	Feng Tang
	0day robot reported a 9.2% regression for will-it-scale mmap1 test case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast from racing with COW during fork"). Further debug shows the regression is due to that commit changes the offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some cache alignment changes. From the perf data, the contention for 'mmap_lock' is very severe and takes around 95% cpu cycles, and it is a rw_semaphore struct rw_semaphore { atomic_long_t count; /* 8 bytes / atomic_long_t owner; / 8 bytes / struct optimistic_spin_queue osq; / spinner MCS lock / ... Before commit 57efa1fe5957 adds the 'write_protect_seq', it happens to have a very optimal cache alignment layout, as Linus explained: "and before the addition of the 'write_protect_seq' field, the mmap_sem was at offset 120 in 'struct mm_struct'. Which meant that count and owner were in two different cachelines, and then when you have contention and spend time in rwsem_down_write_slowpath(), this is probably exactly* the kind of layout you want. Because first the rwsem_write_trylock() will do a cmpxchg on the first cacheline (for the optimistic fast-path), and then in the case of contention, rwsem_down_write_slowpath() will just access the second cacheline. Which is probably just optimal for a load that spends a lot of time contended - new waiters touch that first cacheline, and then they queue themselves up on the second cacheline." After the commit, the rw_semaphore is at offset 128, which means the 'count' and 'owner' fields are now in the same cacheline, and causes more cache bouncing. Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will affect its offset: CONFIG_MMU CONFIG_MEMBARRIER CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES The layout above is on 64 bits system with 0day's default kernel config (similar to RHEL-8.3's config), in which all these 3 options are 'y'. And the layout can vary with different kernel configs. Relayouting a structure is usually a double-edged sword, as sometimes it can helps one case, but hurt other cases. For this case, one solution is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t (when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes hole in 'mm_struct' will not change other fields' alignment, while restoring the regression. Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1] Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-12	net: qualcomm: rmnet: trailer value is a checksum	Alex Elder
	The csum_value field in the rmnet_map_dl_csum_trailer structure is a "real" Internet checksum. It is a 16 bit value, in big endian format, which represents an inverted ones' complement sum over pairs of bytes. Make that clear by changing its type to __sum16. This makes a typecast in rmnet_map_ipv4_dl_csum_trailer() and another in rmnet_map_ipv6_dl_csum_trailer() unnecessary. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12	wwan: add interface creation support	Johannes Berg
	Add support to create (and destroy) interfaces via a new rtnetlink kind "wwan". The responsible driver has to use the new wwan_register_ops() to make this possible. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12	net: make get_net_ns return error if NET_NS is disabled	Changbin Du
	There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12	net: phy: Add 25G BASE-R interface mode	Steen Hegelund
	Add 25gbase-r phy interface mode Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-12	Merge tag 'usb-5.13-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of tiny USB fixes for 5.13-rc6. There are more than I would normally like, but there's been a bunch of people banging on the gadget and dwc3 and typec code recently for I think an Android release, which has resulted in a number of small fixes. It's nice to see companies send fixes upstream for this type of work, a notable change from years ago. Anyway, fixes in here are: - usb-serial device id updates - usb-serial cp210x driver fixes for broken firmware versions - typec fixes for crazy charging devices and other reported problems - dwc3 fixes for reported problems found - gadget fixes for reported problems - tiny xhci fixes - other small fixes for reported issues. - revert of a problem fix found by linux-next testing All of these have passed 0-day and linux-next testing with no reported problems (the revert for the found linux-next build problem included)" * tag 'usb-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (44 commits) Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs" usb: typec: mux: Fix copy-paste mistake in typec_mux_match usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path usb: gadget: fsl: Re-enable driver for ARM SoCs usb: typec: wcove: Use LE to CPU conversion when accessing msg->header USB: serial: cp210x: fix CP2102N-A01 modem control USB: serial: cp210x: fix alternate function for CP2102N QFN20 usb: misc: brcmstb-usb-pinmap: check return value after calling platform_get_resource() usb: dwc3: ep0: fix NULL pointer exception usb: gadget: eem: fix wrong eem header operation usb: typec: intel_pmc_mux: Put ACPI device using acpi_dev_put() usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource() usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe() usb: typec: tcpm: Do not finish VDM AMS for retrying Responses usb: fix various gadget panics on 10gbps cabling usb: fix various gadgets null ptr deref on 10gbps cabling. usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD Renoir usb: f_ncm: only first packet of aggregate needs to start timer USB: f_ncm: ncm_bitrate (speed) is unsigned MAINTAINERS: usb: add entry for isp1760 ...
2021-06-12	Merge tag 'char-misc-5.13-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small misc driver fixes for 5.13-rc6 that fix some reported problems: - Tiny phy driver fixes for reported issues - rtsx regression for when the device suspended - mhi driver fix for a use-after-free All of these have been in linux-next for a few days with no reported issues" * tag 'char-misc-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG bus: mhi: pci-generic: Fix hibernation bus: mhi: pci_generic: Fix possible use-after-free in mhi_pci_remove() bus: mhi: pci_generic: T99W175: update channel name from AT to DUN phy: Sparx5 Eth SerDes: check return value after calling platform_get_resource() phy: ralink: phy-mt7621-pci: drop 'of_match_ptr' to fix -Wunused-const-variable phy: ti: Fix an error code in wiz_probe() phy: phy-mtk-tphy: Fix some resource leaks in mtk_phy_init() phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe() phy: usb: Fix misuse of IS_ENABLED
2021-06-12	Merge tag 'sched-urgent-2021-06-12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: - Fix performance regression caused by lack of intended batching of RCU callbacks by over-eager NOHZ-full code. - Fix cgroups related corruption of load_avg and load_sum metrics. - Three fixes to fix blocked load, util_sum/runnable_sum and util_est tracking bugs" * tag 'sched-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling sched/pelt: Ensure that _sum is always synced with _avg tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed sched/fair: Make sure to update tg contrib for blocked load sched/fair: Keep load_avg and load_sum synced
2021-06-11	net: pcs: xpcs: export xpcs_do_config and xpcs_link_up	Vladimir Oltean
	The sja1105 hardware has a quirk in that some changes require a switch reset, which loses all configuration. When the reset is initiated, everything needs to be reprogrammed, including the MACs and the PCS. This is currently done in sja1105_static_config_reload() - we manually call sja1105_adjust_port_config(), sja1105_sgmii_pcs_config() and sja1105_sgmii_pcs_force_speed() which are all internal functions. There is a desire for sja1105 to use the common xpcs driver, and that means that the equivalents of those functions, xpcs_do_config() and xpcs_link_up() respectively, will no longer be local functions. Forcing phylink to retrigger a resolve somehow, say by doing dev_close() followed by dev_open() is not really an option, because the CPU port might have a PCS as well, and there is no net device which we can close and reopen for that. Additionally, the dev_close/dev_open sequence might force a renegotiation of the copper-side link for SGMII ports connected to a PHY, and this is undesirable as well, because the switch reset is much quicker than a PHY autoneg, so we would have a lot more downtime. The only solution I see is for the sja1105 driver to keep doing what it's doing, and that means we need to export the equivalents from xpcs for sja1105_sgmii_pcs_config and sja1105_sgmii_pcs_force_speed, and call them directly in sja1105_static_config_reload(). This will be done during the conversion patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: pcs: xpcs: add support for NXP SJA1110	Vladimir Oltean
	The NXP SJA1110 switch integrates its own, non-Synopsys PMA, but it manages it through the register space of the XPCS itself, in a small register window inside MDIO_MMD_VEND2 from address 0x8030 to 0x806e. This coincides with where the registers for the default Synopsys PMA are, but the register definitions are of course not the same. This situation is an odd hardware quirk, but the simplest way to manage it is to drive the SJA1110's PMA from within the XPCS driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: pcs: xpcs: add support for NXP SJA1105	Vladimir Oltean
	The NXP SJA1105 DSA switch integrates a Synopsys SGMII XPCS on port 4. The generic code works fine, except there is an integration issue which needs to be dealt with: in this switch, the XPCS is integrated with a PMA that has the TX lane polarity inverted by default (PLUS is MINUS, MINUS is PLUS). To obtain normal non-inverted behavior, the TX lane polarity must be inverted in the PCS, via the DIGITAL_CONTROL_2 register. We introduce a pma_config() method in xpcs_compat which is called by the phylink_pcs_config() implementation. Also, the NXP SJA1105 returns all zeroes in the PHY ID registers 2 and 3. We need to hack up an ad-hoc PHY ID (OUI is zero, device ID is 1) in order for the XPCS driver to recognize it. This PHY ID is added to the public include/linux/pcs/pcs-xpcs.h for that reason (for the sja1105 driver to be able to use it in a later patch). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	net: pcs: xpcs: rename mdio_xpcs_args to dw_xpcs	Vladimir Oltean
	The struct mdio_xpcs_args is reminiscent of when a similarly named struct mdio_xpcs_ops existed. Now that that is removed, we can shorten the name to dw_xpcs (dw for DesignWare). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Jake Keller says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-06-11 Extend the ice driver to support basic PTP clock functionality for E810 devices. This includes some tangential work required to setup the sideband queue and driver shared parameters as well. This series only supports E810-based devices. This is because other devices based on the E822 MAC use a different and more complex PHY. The low level device functionality is kept within ice_ptp_hw.c and is designed to be extensible for supporting E822 devices in a future series. This series also only supports very basic functionality including the ptp_clock device and timestamping. Support for configuring periodic outputs and external input timestamps will be implemented in a future series. There are a couple of potential "what? why?" bits in this series I want to point out: 1) the PTP hardware functionality is shared between multiple functions. This means that the same clock registers are shared across multiple PFs. In order to avoid contention or clashing between PFs, firmware assigns "ownership" to one PF, while other PFs are merely "associated" with the timer. Because we share the hardware resource, only the clock owner will allocate and register a PTP clock device. Other PFs determine the appropriate PTP clock index to report by using a firmware interface to read a shared parameter that is set by the owning PF. 2) the ice driver uses its own kthread instead of using do_aux_work. This is because the periodic and asynchronous tasks are necessary for all PFs, but only one PF will allocate the clock. The series is broken up into functional pieces to allow easy review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>