summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-22fs/buffer: introduce sleeping flavors for pagecache lookupsDavidlohr Bueso
Add __find_get_block_nonatomic() and sb_find_get_block_nonatomic() calls for which users will be converted where safe. These versions will take the folio lock instead of the mapping's private_lock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-3-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22MAINTAINERS: add HFS/HFS+ maintainersViacheslav Dubeyko
Both the hfs and hfsplus filesystem have been orphaned since at least 2014, i.e., over 10 years. However, HFS/HFS+ driver needs to stay for Debian Ports as otherwise we won't be able to boot PowerMacs using GRUB because GRUB won't be usable anymore on PowerMacs with HFS/HFS+ being removed from the kernel. This patch proposes to add Viacheslav Dubeyko and John Paul Adrian Glaubitz as maintainers of HFS/HFS+ driver. Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/20250417223507.1097186-1-slava@dubeyko.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/buffer: split locking for pagecache lookupsDavidlohr Bueso
Callers of __find_get_block() may or may not allow for blocking semantics, and is currently assumed that it will not. Layout two paths based on this. The the private_lock scheme will continued to be used for atomic contexts. Otherwise take the folio lock instead, which protects the buffers, such as vs migration and try_to_free_buffers(). Per the "hack idea", the latter can alleviate contention on the private_lock for bdev mappings. For reasons of determinism and avoid making bugs hard to reproduce, the trylocking is not attempted. No change in semantics. All lookup users still take the spinlock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-2-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22lib: Ensure prime numbers tests are included in KUnit test runsMark Brown
When the select of PRIME_MUMBERS was removed from it's KUnit test Kconfig nothing was added to the KUnit configs, meaning that when run via the KUnit runner the tests are neither built nor run. Add PRIME_NUMBERS to all_tests.config so they are enabled when the KUnit runner builds the kernel. Fixes: 3f2925174f8b ("lib/prime_numbers: KUnit test should not select PRIME_NUMBERS") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250422-lib-fix-prime-numbers-kunit-v1-1-4278c1d4a4ae@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-04-22dma-coherent: Warn if OF reserved memory is beyond current coherent DMA maskChen-Yu Tsai
When a reserved memory region described in the device tree is attached to a device, it is expected that the device's limitations are correctly included in that description. However, if the device driver failed to implement DMA address masking or addressing beyond the default 32 bits (on arm64), then bad things could happen because the DMA address was truncated, such as playing back audio with no actual audio coming out, or DMA overwriting random blocks of kernel memory. Check against the coherent DMA mask when the memory regions are attached to the device. Give a warning when the memory region can not be covered by the mask. A warning instead of a hard error was chosen, because it is possible that existing drivers could be working fine even if they forgot to extend the coherent DMA mask. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250421083930.374173-1-wenst@chromium.org
2025-04-22ima: process_measurement() needlessly takes inode_lock() on MAY_READFrederick Lawler
On IMA policy update, if a measure rule exists in the policy, IMA_MEASURE is set for ima_policy_flags which makes the violation_check variable always true. Coupled with a no-action on MAY_READ for a FILE_CHECK call, we're always taking the inode_lock(). This becomes a performance problem for extremely heavy read-only workloads. Therefore, prevent this only in the case there's no action to be taken. Signed-off-by: Frederick Lawler <fred@cloudflare.com> Acked-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-04-22xfs: remove duplicate Zoned Filesystems sections in admin-guideHans Holmberg
Remove the duplicated section and while at it, turn spaces into tabs. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Fixes: c7b67ddc3c99 ("xfs: document zoned rt specifics in admin-guide") Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-22net: 802: Remove unused p8022 codeDr. David Alan Gilbert
p8022.c defines two external functions, register_8022_client() and unregister_8022_client(), the last use of which was removed in 2018 by commit 7a2e838d28cf ("staging: ipx: delete it from the tree") Remove the p8022.c file, it's corresponding header, and glue surrounding it. There was one place the header was included in vlan.c but it didn't use the functions it declared. There was a comment in net/802/Makefile about checking against net/core/Makefile, but that's at least 20 years old and there's no sign of net/core/Makefile mentioning it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250418011519.145320-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22XFS: fix zoned gc threshold math for 32-bit archesCarlos Maiolino
xfs_zoned_need_gc makes use of mult_frac() to calculate the threshold for triggering the zoned garbage collector, but, turns out mult_frac() doesn't properly work with 64-bit data types and this caused build failures on some 32-bit architectures. Fix this by essentially open coding mult_frac() in a 64-bit friendly way. Notice we don't need to bother with counters underflow here because xfs_estimate_freecounter() will always return a positive value, as it leverages percpu_counter_read_positive to read such counters. Fixes: 845abeb1f06a ("xfs: add tunable threshold parameter for triggering zone GC") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504181233.F7D9Atra-lkp@intel.com/ Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-22spi: stm32-ospi: Fix an error handling path in stm32_ospi_probe()Christophe JAILLET
If an error occurs after a successful stm32_ospi_dma_setup() call, some dma_release_channel() calls are needed to release some resources, as already done in the remove function. Fixes: 79b8a705e26c ("spi: stm32: Add OSPI driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://patch.msgid.link/2674c8df1d05ab312826b69bfe9559f81d125a0b.1744975624.git.christophe.jaillet@wanadoo.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-22net: lwtunnel: disable BHs when requiredJustin Iurman
In lwtunnel_{output|xmit}(), dev_xmit_recursion() may be called in preemptible scope for PREEMPT kernels. This patch disables BHs before calling dev_xmit_recursion(). BHs are re-enabled only at the end, since we must ensure the same CPU is used for both dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other recursion levels in some cases) in order to maintain valid per-cpu counters. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/ Reported-by: Eduard Zingerman <eddyz87@gmail.com> Closes: https://lore.kernel.org/netdev/m2h62qwf34.fsf@gmail.com/ Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160716.8823-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22net: selftests: initialize TCP header and skb payload with zeroOleksij Rempel
Zero-initialize TCP header via memset() to avoid garbage values that may affect checksum or behavior during test transmission. Also zero-fill allocated payload and padding regions using memset() after skb_put(), ensuring deterministic content for all outgoing test packets. Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160125.2914724-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22dma-mapping: Fix warning reported for missing prototypeBalbir Singh
lkp reported a warning about missing prototype for a recent patch. The kernel-doc style comments are out of sync, move them to the right function. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504190615.g9fANxHw-lkp@intel.com/ Signed-off-by: Balbir Singh <balbirs@nvidia.com> [mszyprow: reformatted subject] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250422114034.3535515-1-balbirs@nvidia.com
2025-04-22rtase: Add ndo_setup_tc support for CBS offload in traffic control setupJustin Lai
Add support for ndo_setup_tc to enable CBS offload functionality as part of traffic control configuration for network devices, where CBS is applied from the CPU to the switch. More specifically, CBS is applied at the GMAC in the topmost architecture diagram. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416115757.28156-1-justinlai0215@realtek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22rxrpc: rxgk: Set error code in rxgk_yfs_decode_ticket()Dan Carpenter
Propagate the error code if key_alloc() fails. Don't return success. Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/Z_-P_1iLDWksH1ik@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22net: phy: microchip: force IRQ polling mode for lan88xxFiona Klute
With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] https://github.com/raspberrypi/linux/issues/2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <fiona.klute@gmx.de> Cc: kernel-list@raspberrypi.com Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250416102413.30654-1-fiona.klute@gmx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22Merge branch 'ionic-support-qsfp-cmis'Paolo Abeni
Shannon Nelson says: ==================== ionic: support QSFP CMIS This patchset sets up support for additional pages and better handling of the QSFP CMIS data. v1: https://lore.kernel.org/netdev/20250411182140.63158-1-shannon.nelson@amd.com/ ==================== Link: https://patch.msgid.link/20250415231317.40616-1-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: add module eeprom channel data to ionic_if and ethtoolShannon Nelson
Make the CMIS module type's page 17 channel data available for ethtool to request. As done previously, carve space for this data from the port_info reserved space. In the future, if additional pages are needed, a new firmware AdminQ command will be added for accessing random pages. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-4-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: support ethtool get_module_eeprom_by_pageShannon Nelson
Add support for the newer get_module_eeprom_by_page interface. Only the upper half of the 256 byte page is available for reading, and the firmware puts the two sections into the extended sprom buffer, so a union is used over the extended sprom buffer to make clear which page is to be accessed. With get_module_eeprom_by_page implemented there is no need for the older get_module_info or git_module_eeprom interfaces, so remove them. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-3-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: extend the QSFP module sprom for more pagesShannon Nelson
Some QSFP modules have more eeprom to be read by ethtool than the initial high and low page 0 that is currently available in the DSC's ionic sprom[] buffer. Since the current sprom[] is baked into the middle of an existing API struct, to make the high end of page 1 and page 2 available a block is carved from a reserved space of the existing port_info struct and the ionic_get_module_eeprom() service is taught how to get there. Newer firmware writes the additional QSFP page info here, yet this remains backward compatible because older firmware sets this space to all 0 and older ionic drivers do not use the reserved space. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-2-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22Merge branch 'vxlan-convert-fdb-table-to-rhashtable'Paolo Abeni
Ido Schimmel says: ==================== vxlan: Convert FDB table to rhashtable The VXLAN driver currently stores FDB entries in a hash table with a fixed number of buckets (256), resulting in reduced performance as the number of entries grows. This patchset solves the issue by converting the driver to use rhashtable which maintains a more or less constant performance regardless of the number of entries. Measured transmitted packets per second using a single pktgen thread with varying number of entries when the transmitted packet always hits the default entry (worst case): Number of entries | Improvement ------------------|------------ 1k | +1.12% 4k | +9.22% 16k | +55% 64k | +585% 256k | +2460% The first patches are preparations for the conversion in the last patch. Specifically, the series is structured as follows: Patch #1 adds RCU read-side critical sections in the Tx path when accessing FDB entries. Targeting at net-next as I am not aware of any issues due to this omission despite the code being structured that way for a long time. Without it, traces will be generated when converting FDB lookup to rhashtable_lookup(). Patch #2-#5 simplify the creation of the default FDB entry (all-zeroes). Current code assumes that insertion into the hash table cannot fail, which will no longer be true with rhashtable. Patches #6-#10 add FDB entries to a linked list for entry traversal instead of traversing over them using the fixed size hash table which is removed in the last patch. Patches #11-#12 add wrappers for FDB lookup that make it clear when each should be used along with lockdep annotations. Needed as a preparation for rhashtable_lookup() that must be called from an RCU read-side critical section. Patch #13 treats dst cache initialization errors as non-fatal. See more info in the commit message. The current code happens to work because insertion into the fixed size hash table is slow enough for the per-CPU allocator to be able to create new chunks of per-CPU memory. Patch #14 adds an FDB key structure that includes the MAC address and source VNI. To be used as rhashtable key. Patch #15 does the conversion to rhashtable. ==================== Link: https://patch.msgid.link/20250415121143.345227-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Convert FDB table to rhashtableIdo Schimmel
FDB entries are currently stored in a hash table with a fixed number of buckets (256), resulting in performance degradation as the number of entries grows. Solve this by converting the driver to use rhashtable which maintains more or less constant performance regardless of the number of entries. Measured transmitted packets per second using a single pktgen thread with varying number of entries when the transmitted packet always hits the default entry (worst case): Number of entries | Improvement ------------------|------------ 1k | +1.12% 4k | +9.22% 16k | +55% 64k | +585% 256k | +2460% In addition, the change reduces the size of the VXLAN device structure from 2584 bytes to 672 bytes. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-16-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Introduce FDB key structureIdo Schimmel
In preparation for converting the FDB table to rhashtable, introduce a key structure that includes the MAC address and source VNI. No functional changes intended. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-15-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Do not treat dst cache initialization errors as fatalIdo Schimmel
FDB entries are allocated in an atomic context as they can be added from the data path when learning is enabled. After converting the FDB hash table to rhashtable, the insertion rate will be much higher (*) which will entail a much higher rate of per-CPU allocations via dst_cache_init(). When adding a large number of entries (e.g., 256k) in a batch, a small percentage (< 0.02%) of these per-CPU allocations will fail [1]. This does not happen with the current code since the insertion rate is low enough to give the per-CPU allocator a chance to asynchronously create new chunks of per-CPU memory. Given that: a. Only a small percentage of these per-CPU allocations fail. b. The scenario where this happens might not be the most realistic one. c. The driver can work correctly without dst caches. The dst_cache_*() APIs first check that the dst cache was properly initialized. d. The dst caches are not always used (e.g., 'tos inherit'). It seems reasonable to not treat these allocation failures as fatal. Therefore, do not bail when dst_cache_init() fails and suppress warnings by specifying '__GFP_NOWARN'. [1] percpu: allocation failed, size=40 align=8 atomic=1, atomic alloc failed, no space left (*) 97% reduction in average latency of vxlan_fdb_update() when adding 256k entries in a batch. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-14-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Create wrappers for FDB lookupIdo Schimmel
__vxlan_find_mac() is called from both the data path (e.g., during learning) and the control path (e.g., when replacing an entry). The function is missing lockdep annotations to make sure that the FDB hash lock is held during FDB updates. Rename __vxlan_find_mac() to vxlan_find_mac_rcu() to reflect the fact that it should be called from an RCU read-side critical section and call it from vxlan_find_mac() which checks that the FDB hash lock is held. Change callers to invoke the appropriate function. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-13-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Rename FDB Tx lookup functionIdo Schimmel
vxlan_find_mac() is only expected to be called from the Tx path as it updates the 'used' timestamp. Rename it to vxlan_find_mac_tx() to reflect that and to avoid incorrect updates of this timestamp like those addressed by commit 9722f834fe9a ("vxlan: Avoid unnecessary updates to FDB 'used' time"). No functional changes intended. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-12-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Convert FDB flushing to RCUIdo Schimmel
Instead of holding the FDB hash lock when traversing the FDB linked list during flushing, use RCU and only acquire the lock for entries that need to be flushed. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-11-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Convert FDB garbage collection to RCUIdo Schimmel
Instead of holding the FDB hash lock when traversing the FDB linked list during garbage collection, use RCU and only acquire the lock for entries that need to be removed (aged out). Avoid races by using hlist_unhashed() to check that the entry has not been removed from the list by another thread. Note that vxlan_fdb_destroy() uses hlist_del_init_rcu() to remove an entry from the list which should cause list_unhashed() to return true. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-10-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Use linked list to traverse FDB entriesIdo Schimmel
In preparation for removing the fixed size hash table, convert FDB entry traversal to use the newly added FDB linked list. No functional changes intended. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-9-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Add a linked list of FDB entriesIdo Schimmel
Currently, FDB entries are stored in a hash table with a fixed number of buckets. The table is used for both lookups and entry traversal. Subsequent patches will convert the table to rhashtable which is not suitable for entry traversal. In preparation for this conversion, add FDB entries to a linked list. Subsequent patches will convert the driver to use this list when traversing entries during dump, flush, etc. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-8-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Use a single lock to protect the FDB tableIdo Schimmel
Currently, the VXLAN driver stores FDB entries in a hash table with a fixed number of buckets (256). Subsequent patches are going to convert this table to rhashtable with a linked list for entry traversal, as rhashtable is more scalable. In preparation for this conversion, move from a per-bucket spin lock to a single spin lock that protects the entire FDB table. The per-bucket spin locks were introduced by commit fe1e0713bbe8 ("vxlan: Use FDB_HASH_SIZE hash_locks to reduce contention") citing "huge contention when inserting/deleting vxlan_fdbs into the fdb_head". It is not clear from the commit message which code path was holding the spin lock for long periods of time, but the obvious suspect is the FDB cleanup routine (vxlan_cleanup()) that periodically traverses the entire table in order to delete aged-out entries. This will be solved by subsequent patches that will convert the FDB cleanup routine to traverse the linked list of FDB entries using RCU, only acquiring the spin lock when deleting an aged-out entry. The change reduces the size of the VXLAN device structure from 3600 bytes to 2576 bytes. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-7-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Relocate assignment of default remote deviceIdo Schimmel
The default FDB entry can be associated with a net device if a physical device (i.e., 'dev PHYS_DEV') was specified during the creation of the VXLAN device. The assignment of the net device pointer to 'dst->remote_dev' logically belongs in the if block that resolves the pointer from the specified ifindex, so move it there. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-6-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Unsplit default FDB entry creation and notificationIdo Schimmel
Commit 0241b836732f ("vxlan: fix default fdb entry netlink notify ordering during netdev create") split the creation of the default FDB entry from its notification to avoid sending a RTM_NEWNEIGH notification before RTM_NEWLINK. Previous patches restructured the code so that the default FDB entry is created after registering the VXLAN device and the notification about the new entry immediately follows its creation. Therefore, simplify the code and revert back to vxlan_fdb_update() which takes care of both creating the FDB entry and notifying user space about it. Hold the FDB hash lock when calling vxlan_fdb_update() like it expects. A subsequent patch will add a lockdep assertion to make sure this is indeed the case. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-5-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Insert FDB into hash table in vxlan_fdb_create()Ido Schimmel
Commit 7c31e54aeee5 ("vxlan: do not destroy fdb if register_netdevice() is failed") split the insertion of FDB entries into the FDB hash table from the function where they are created. This was done in order to work around a problem that is no longer possible after the previous patch. Simplify the code and move the body of vxlan_fdb_insert() back into vxlan_fdb_create(). Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-4-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Simplify creation of default FDB entryIdo Schimmel
There is asymmetry in how the default FDB entry (all-zeroes) is created and destroyed in the VXLAN driver. It is created as part of the driver's newlink() routine, but destroyed as part of its ndo_uninit() routine. This caused multiple problems in the past. First, commit 0241b836732f ("vxlan: fix default fdb entry netlink notify ordering during netdev create") split the notification about the entry from its creation so that it will not be notified to user space before the VXLAN device is registered. Then, commit 6db924687139 ("vxlan: Fix error path in __vxlan_dev_create()") made the error path in __vxlan_dev_create() asymmetric by destroying the FDB entry before unregistering the net device. Otherwise, the FDB entry would have been freed twice: By ndo_uninit() as part of unregister_netdevice() and by vxlan_fdb_destroy() in the error path. Finally, commit 7c31e54aeee5 ("vxlan: do not destroy fdb if register_netdevice() is failed") split the insertion of the FDB entry into the hash table from its creation, moving the insertion after the registration of the net device. Otherwise, like before, the FDB entry would have been freed twice: By ndo_uninit() as part of register_netdevice()'s error path and by vxlan_fdb_destroy() in the error path of __vxlan_dev_create(). The end result is that the code is unnecessarily complex. In addition, the fixed size hash table cannot be converted to rhashtable as vxlan_fdb_insert() cannot fail, which will no longer be true with rhashtable. Solve this by making the addition and deletion of the default FDB entry completely symmetric. Namely, as part of newlink() routine, create the entry, insert it into to the hash table and send a notification to user space after the net device was registered. Note that at this stage the net device is still administratively down and cannot transmit / receive packets. Move the deletion from ndo_uninit() to the dellink routine(): Flush the default entry together with all the other entries, before unregistering the net device. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-3-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22vxlan: Add RCU read-side critical sections in the Tx pathIdo Schimmel
The Tx path does not run from an RCU read-side critical section which makes the current lockless accesses to FDB entries invalid. As far as I am aware, this has not been a problem in practice, but traces will be generated once we transition the FDB lookup to rhashtable_lookup(). Add rcu_read_{lock,unlock}() around the handling of FDB entries in the Tx path. Remove the RCU read-side critical section from vxlan_xmit_nh() as now the function is always called from an RCU read-side critical section. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-2-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22drm/i915/pxp: fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions'Chen Linxuan
On x86_64 with gcc version 13.3.0, I compile kernel with: make defconfig ./scripts/kconfig/merge_config.sh .config <( echo CONFIG_COMPILE_TEST=y ) make KCFLAGS="-fno-inline-functions -fno-inline-small-functions -fno-inline-functions-called-once" Then I get a linker error: ld: vmlinux.o: in function `pxp_fw_dependencies_completed': kintel_pxp.c:(.text+0x95728f): undefined reference to `intel_pxp_gsccs_is_ready_for_sessions' This is caused by not having a intel_pxp_gsccs_is_ready_for_sessions() header stub for CONFIG_DRM_I915_PXP=n. Add it. Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Fixes: 99afb7cc8c44 ("drm/i915/pxp: Add ARB session creation and cleanup") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250415090616.2649889-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit b484c1e225a6a582fc78c4d7af7b286408bb7d41) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-04-22nvmet: fix out-of-bounds access in nvmet_enable_portRichard Weinberger
When trying to enable a port that has no transport configured yet, nvmet_enable_port() uses NVMF_TRTYPE_MAX (255) to query the transports array, causing an out-of-bounds access: [ 106.058694] BUG: KASAN: global-out-of-bounds in nvmet_enable_port+0x42/0x1da [ 106.058719] Read of size 8 at addr ffffffff89dafa58 by task ln/632 [...] [ 106.076026] nvmet: transport type 255 not supported Since commit 200adac75888, NVMF_TRTYPE_MAX is the default state as configured by nvmet_ports_make(). Avoid this by checking for NVMF_TRTYPE_MAX before proceeding. Fixes: 200adac75888 ("nvme: Add PCI transport type") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2025-04-22drm: panel: jd9365da: fix reset signal polarity in unprepareHugo Villeneuve
commit a8972d5a49b4 ("drm: panel: jd9365da-h3: fix reset signal polarity") fixed reset signal polarity in jadard_dsi_probe() and jadard_prepare(). It was not done in jadard_unprepare() because of an incorrect assumption about reset line handling in power off mode. After looking into the datasheet, it now appears that before disabling regulators, the reset line is deasserted first, and if reset_before_power_off_vcioo is true, then the reset line is asserted. Fix reset polarity by inverting gpiod_set_value() second argument in in jadard_unprepare(). Fixes: 6b818c533dd8 ("drm: panel: Add Jadard JD9365DA-H3 DSI panel") Fixes: 2b976ad760dc ("drm/panel: jd9365da: Support for kd101ne3-40ti MIPI-DSI panel") Fixes: a8972d5a49b4 ("drm: panel: jd9365da-h3: fix reset signal polarity") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250417195507.778731-1-hugo@hugovil.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250417195507.778731-1-hugo@hugovil.com
2025-04-22drm/meson: use unsigned long long / Hz for frequency typesMartin Blumenstingl
Christian reports that 4K output using YUV420 encoding fails with the following error: Fatal Error, invalid HDMI vclk freq 593406 Modetest shows the following: 3840x2160 59.94 3840 4016 4104 4400 2160 2168 2178 2250 593407 flags: xxxx, xxxx, drm calculated value -------------------------------------^ This indicates that there's a (1kHz) mismatch between the clock calculated by the drm framework and the meson driver. Relevant function call stack: (drm framework) -> meson_encoder_hdmi_atomic_enable() -> meson_encoder_hdmi_set_vclk() -> meson_vclk_setup() The video clock requested by the drm framework is 593407kHz. This is passed by meson_encoder_hdmi_atomic_enable() to meson_encoder_hdmi_set_vclk() and the following formula is applied: - the frequency is halved (which would be 296703.5kHz) and rounded down to the next full integer, which is 296703kHz - TMDS clock is calculated (296703kHz * 10) - video encoder clock is calculated - this needs to match a table from meson_vclk.c and so it doubles the previously halved value again (resulting in 593406kHz) - meson_vclk_setup() can't find (either directly, or by deriving it from 594000kHz * 1000 / 1001 and rounding to the closest integer value - which is 593407kHz as originally requested by the drm framework) a matching clock in it's internal table and errors out with "invalid HDMI vclk freq" Fix the division precision by switching the whole meson driver to use unsigned long long (64-bit) Hz values for clock frequencies instead of unsigned int (32-bit) kHz to fix the rouding error. Fixes: e5fab2ec9ca4 ("drm/meson: vclk: add support for YUV420 setup") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-3-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-3-martin.blumenstingl@googlemail.com
2025-04-22Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"Christian Hewitt
This reverts commit bfbc68e. The patch does permit the offending YUV420 @ 59.94 phy_freq and vclk_freq mode to match in calculations. It also results in all fractional rates being unavailable for use. This was unintended and requires the patch to be reverted. Fixes: bfbc68e4d869 ("drm/meson: vclk: fix calculation of 59.94 fractional rates") Cc: stable@vger.kernel.org Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-2-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250421201300.778955-2-martin.blumenstingl@googlemail.com
2025-04-22scsi: Improve CDL controlDamien Le Moal
With ATA devices supporting the CDL feature, using CDL requires that the feature be enabled with a SET FEATURES command. This command is issued as the translated command for the MODE SELECT command issued by scsi_cdl_enable() when the user enables CDL through the device cdl_enable sysfs attribute. However, the implementation of scsi_cdl_enable() always issues a MODE SELECT command for ATA devices when the enable argument is true, even if CDL is already enabled on the device. While this does not cause any issue with using CDL descriptors with read/write commands (the CDL feature will be enabled on the drive), issuing the MODE SELECT command even when the device CDL feature is already enabled will cause a reset of the ATA device CDL statistics log page (as defined in ACS, any CDL enable action must reset the device statistics). Avoid this needless actions (and the implied statistics log page reset) by modifying scsi_cdl_enable() to issue the MODE SELECT command to enable CDL if and only if CDL is not reported as already enabled on the device. And while at it, simplify the initialization of the is_ata boolean variable and move the declaration of the scsi mode data and sense header variables to within the scope of ATA device handling. Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-22ata: libata-scsi: Improve CDL controlDamien Le Moal
With ATA devices supporting the CDL feature, using CDL requires that the feature be enabled with a SET FEATURES command. This command is issued as the translated command for the MODE SELECT command issued by scsi_cdl_enable() when the user enables CDL through the device cdl_enable sysfs attribute. Currently, ata_mselect_control_ata_feature() always translates a MODE SELECT command for the ATA features subpage of the control mode page to a SET FEATURES command to enable or disable CDL based on the cdl_ctrl field. However, there is no need to issue the SET FEATURES command if: 1) The MODE SELECT command requests disabling CDL and CDL is already disabled. 2) The MODE SELECT command requests enabling CDL and CDL is already enabled. Fix ata_mselect_control_ata_feature() to issue the SET FEATURES command only when necessary. Since enabling CDL also implies a reset of the CDL statistics log page, avoiding useless CDL enable operations also avoids clearing the CDL statistics log. Also add debug messages to clearly signal when CDL is being enabled or disabled using a SET FEATURES command. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-22ata: libata-scsi: Fix ata_msense_control_ata_feature()Damien Le Moal
For the ATA features subpage of the control mode page, the T10 SAT-6 specifications state that: For a MODE SENSE command, the SATL shall return the CDL_CTRL field value that was last set by an application client. However, the function ata_msense_control_ata_feature() always sets the CDL_CTRL field to the 0x02 value to indicate support for the CDL T2A and T2B pages. This is thus incorrect and the value 0x02 must be reported only after the user enables the CDL feature, which is indicated with the ATA_DFLAG_CDL_ENABLED device flag. When this flag is not set, the CDL_CTRL field of the ATA feature subpage of the control mode page must report a value of 0x00. Fix ata_msense_control_ata_feature() to report the correct values for the CDL_CTRL field, according to the enable/disable state of the device CDL feature. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-22ata: libata-scsi: Fix ata_mselect_control_ata_feature() return typeDamien Le Moal
The function ata_mselect_control_ata_feature() has a return type defined as unsigned int but this function may return negative error codes, which are correctly propagated up the call chain as integers. Fix ata_mselect_control_ata_feature() to have the correct int return type. While at it, also fix a typo in this function description comment. Fixes: df60f9c64576 ("scsi: ata: libata: Add ATA feature control sub-page translation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
2025-04-22arm64: dts: amlogic: g12: fix reference to unknown/untested PWM clockMartin Blumenstingl
Device-tree expects absent clocks to be specified as <0> (instead of using <>). This fixes using the FCLK4/FCLK3 clocks as they are now seen at their correct index (while before they were recognized, but at the correct index - resulting in the hardware using a different clock than what the kernel sees). Fixes: e6884f2e4129 ("arm64: dts: amlogic: g12: switch to the new PWM controller binding") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250420164801.330505-5-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-04-22arm64: dts: amlogic: gx: fix reference to unknown/untested PWM clockMartin Blumenstingl
Device-tree expects absent clocks to be specified as <0> (instead of using <>). This fixes using the FCLK4/FCLK3 clocks as they are now seen at their correct index (while before they were recognized, but at the correct index - resulting in the hardware using a different clock than what the kernel sees). Fixes: a526eeef9a81 ("arm64: dts: amlogic: gx: switch to the new PWM controller binding") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250420164801.330505-4-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-04-22ARM: dts: amlogic: meson8b: fix reference to unknown/untested PWM clockMartin Blumenstingl
Device-tree expects absent clocks to be specified as <0> (instead of using <>). This fixes using the FCLK4/FCLK3 clocks as they are now seen at their correct index (while before they were recognized, but at the correct index - resulting in the hardware using a different clock than what the kernel sees). Fixes: dbf921861985 ("ARM: dts: amlogic: meson8b: switch to the new PWM controller binding") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250420164801.330505-3-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-04-22ARM: dts: amlogic: meson8: fix reference to unknown/untested PWM clockMartin Blumenstingl
Device-tree expects absent clocks to be specified as <0> (instead of using <>). This fixes using the FCLK4/FCLK3 clocks as they are now seen at their correct index (while before they were recognized, but at the correct index - resulting in the hardware using a different clock than what the kernel sees). Fixes: 802cff460aab ("ARM: dts: amlogic: meson8: switch to the new PWM controller binding") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250420164801.330505-2-martin.blumenstingl@googlemail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-04-22ARM: dts: opos6ul: add ksz8081 phy propertiesSébastien Szymanski
Commit c7e73b5051d6 ("ARM: imx: mach-imx6ul: remove 14x14 EVK specific PHY fixup") removed a PHY fixup that setted the clock mode and the LED mode. Make the Ethernet interface work again by doing as advised in the commit's log, set clock mode and the LED mode in the device tree. Fixes: c7e73b5051d6 ("ARM: imx: mach-imx6ul: remove 14x14 EVK specific PHY fixup") Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>