linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-04-01	netlink: create a new header for internal genetlink symbols	Jakub Kicinski
	There are things in linux/genetlink.h which are only used under net/netlink/. Move them to a new local header. A new header with just 2 externs isn't great, but alternative would be to include af_netlink.h in genetlink.c which feels even worse. Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	tcp/dccp: do not care about families in inet_twsk_purge()	Eric Dumazet
	We lost ability to unload ipv6 module a long time ago. Instead of calling expensive inet_twsk_purge() twice, we can handle all families in one round. Also remove an extra line added in my prior patch, per Kuniyuki Iwashima feedback. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20240327192934.6843-1-kuniyu@amazon.com/ Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240329153203.345203-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	inet: preserve const qualifier in inet_csk()	Eric Dumazet
	We can change inet_csk() to propagate its argument const qualifier, thanks to container_of_const(). We have to fix few places that had mistakes, like tcp_bound_rto(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240329144931.295800-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02	crypto: remove CONFIG_CRYPTO_STATS	Eric Biggers
	Remove support for the "Crypto usage statistics" feature (CONFIG_CRYPTO_STATS). This feature does not appear to have ever been used, and it is harmful because it significantly reduces performance and is a large maintenance burden. Covering each of these points in detail: 1. Feature is not being used Since these generic crypto statistics are only readable using netlink, it's fairly straightforward to look for programs that use them. I'm unable to find any evidence that any such programs exist. For example, Debian Code Search returns no hits except the kernel header and kernel code itself and translations of the kernel header: https://codesearch.debian.net/search?q=CRYPTOCFGA_STAT&literal=1&perpkg=1 The patch series that added this feature in 2018 (https://lore.kernel.org/linux-crypto/1537351855-16618-1-git-send-email-clabbe@baylibre.com/) said "The goal is to have an ifconfig for crypto device." This doesn't appear to have happened. It's not clear that there is real demand for crypto statistics. Just because the kernel provides other types of statistics such as I/O and networking statistics and some people find those useful does not mean that crypto statistics are useful too. Further evidence that programs are not using CONFIG_CRYPTO_STATS is that it was able to be disabled in RHEL and Fedora as a bug fix (https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2947). Even further evidence comes from the fact that there are and have been bugs in how the stats work, but they were never reported. For example, before Linux v6.7 hash stats were double-counted in most cases. There has also never been any documentation for this feature, so it might be hard to use even if someone wanted to. 2. CONFIG_CRYPTO_STATS significantly reduces performance Enabling CONFIG_CRYPTO_STATS significantly reduces the performance of the crypto API, even if no program ever retrieves the statistics. This primarily affects systems with a large number of CPUs. For example, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039576 reported that Lustre client encryption performance improved from 21.7GB/s to 48.2GB/s by disabling CONFIG_CRYPTO_STATS. It can be argued that this means that CONFIG_CRYPTO_STATS should be optimized with per-cpu counters similar to many of the networking counters. But no one has done this in 5+ years. This is consistent with the fact that the feature appears to be unused, so there seems to be little interest in improving it as opposed to just disabling it. It can be argued that because CONFIG_CRYPTO_STATS is off by default, performance doesn't matter. But Linux distros tend to error on the side of enabling options. The option is enabled in Ubuntu and Arch Linux, and until recently was enabled in RHEL and Fedora (see above). So, even just having the option available is harmful to users. 3. CONFIG_CRYPTO_STATS is a large maintenance burden There are over 1000 lines of code associated with CONFIG_CRYPTO_STATS, spread among 32 files. It significantly complicates much of the implementation of the crypto API. After the initial submission, many fixes and refactorings have consumed effort of multiple people to keep this feature "working". We should be spending this effort elsewhere. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02	crypto: qat - add interface for live migration	Xin Zeng
	Extend the driver with a new interface to be used for VF live migration. This allows to create and destroy a qat_mig_dev object that contains a set of methods to allow to save and restore the state of QAT VF. This interface will be used by the qat-vfio-pci module. Signed-off-by: Xin Zeng <xin.zeng@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-01	dlm: remove lkb from callback tracepoints	Alexander Aring
	Stop using lkb structs in the callback tracepoints so that lkb references are not needed. This prepares for separating lkb structs from callbacks. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01	block: add a bio_list_merge_init helper	Christoph Hellwig
	This is a simple combination of bio_list_merge + bio_list_init similar to list_splice_init. While it only saves a single line in a callers, it makes the move all bios from one list to another and reinitialize the original pattern a lot more obvious in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Sakai <msakai@redhat.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240328084147.2954434-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-01	Merge 6.9-rc2 into usb-next	Greg Kroah-Hartman
	We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-01	bus: mhi: host: Add mhi_power_down_keep_dev() API to support system ↵	Baochen Qiang
	suspend/hibernation Currently, ath11k fails to resume from system suspend/hibernation on some the x86 host machines with below error message: ``` ath11k_pci 0000:06:00.0: timeout while waiting for restart complete ``` This happens because, ath11k powers down the MHI stack during suspend and that leads to destruction of the struct device associated with the MHI channels. And during resume, ath11k calls calling mhi_sync_power_up() to power up the MHI subsystem and that eventually calls the driver framework's device_add() API from mhi_create_devices(). But the PM framework blocks the struct device creation during device_add() and this leads to probe deferral as below: ``` mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral ``` The reason for deferring device creation during resume is explained in dpm_prepare(): /* * It is unsafe if probing of devices will happen during suspend or * hibernation and system behavior will be unpredictable in this * case. So, let's prohibit device's probing here and defer their * probes instead. The normal behavior will be restored in * dpm_complete(). */ Due to the device probe deferral, qcom_mhi_qrtr_probe() API is not getting called during resume and thus MHI channels are not prepared. So this blocks the QMI messages from being transferred between ath11k and firmware, resulting in a firmware initialization failure. After consulting with Rafael, it was decided to not destroy the struct device for the MHI channels during system suspend/hibernation because the device is bound to appear again during resume. So to achieve this, a new API called mhi_power_down_keep_dev() is introduced for MHI controllers to keep the struct device when required. This API is similar to the existing mhi_power_down() API, except that it keeps the struct device associated with MHI channels instead of destroying them. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240305021320.3367-2-quic_bqiang@quicinc.com [mani: reworded the commit message and subject] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-04-01	power: supply: bq27xxx: Move health reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update health reading to match. As health is not checked for changes as part of the update loop, remove the read of this from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-6-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	power: supply: bq27xxx: Move cycle count reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update cycle count reading to match. As cycle count is not checked for changes as part of the update loop, remove the read of this from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-5-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	power: supply: bq27xxx: Move energy reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update energy reading to match. As energy is not checked for changes as part of the update loop, remove the read of this from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-4-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	power: supply: bq27xxx: Move charge reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update charge reading to match. As charge state is not checked for changes as part of the update loop, remove the read of this from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-3-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	power: supply: bq27xxx: Move time reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update time reading to match. As time is not checked for changes as part of the update loop, remove the read of the this from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-2-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	power: supply: bq27xxx: Move temperature reading out of update loop	Andrew Davis
	Most of the functions that read values return a status and put the value itself in an a function parameter. Update temperature reading to match. As temp is not checked for changes as part of the update loop, remove the read of the temperature from the periodic update loop. This saves I2C/1W bandwidth. It also means we do not have to cache it, fresh values are read when requested. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20240325203129.150030-1-afd@ti.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-04-01	net: rps: move received_rps field to a better location	Eric Dumazet
	Commit 14d898f3c1b3 ("dev: Move received_rps counter next to RPS members in softnet data") was unfortunate: received_rps is dirtied by a cpu and never read by other cpus in fast path. Its presence in the hot RPS cache line (shared by many cpus) is hurting RPS/RFS performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	net: rps: add rps_input_queue_head_add() helper	Eric Dumazet
	process_backlog() can batch increments of sd->input_queue_head, saving some memory bandwidth. Also add READ_ONCE()/WRITE_ONCE() annotations around sd->input_queue_head accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	net: rps: change input_queue_tail_incr_save()	Eric Dumazet
	input_queue_tail_incr_save() is incrementing the sd queue_tail and save it in the flow last_qtail. Two issues here : - no lock protects the write on last_qtail, we should use appropriate annotations. - We can perform this write after releasing the per-cpu backlog lock, to decrease this lock hold duration (move away the cache line miss) Also move input_queue_head_incr() and rps helpers to include/net/rps.h, while adding rps_ prefix to better reflect their role. v2: Fixed a build issue (Jakub and kernel build bots) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	net: make softnet_data.dropped an atomic_t	Eric Dumazet
	If under extreme cpu backlog pressure enqueue_to_backlog() has to drop a packet, it could do this without dirtying a cache line and potentially slowing down the target cpu. Move sd->dropped into a separate cache line, and make it atomic. In non pressure mode, this field is not touched, no need to consume valuable space in a hot cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	net: move dev_xmit_recursion() helpers to net/core/dev.h	Eric Dumazet
	Move dev_xmit_recursion() and friends to net/core/dev.h They are only used from net/core/dev.c and net/core/filter.c. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	net: move kick_defer_list_purge() to net/core/dev.h	Eric Dumazet
	kick_defer_list_purge() is defined in net/core/dev.c and used from net/core/skubff.c Because we need softnet_data, include <linux/netdevice.h> from net/core/dev.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	pfcp: always set pfcp metadata	Michal Swiatkowski
	In PFCP receive path set metadata needed by flower code to do correct classification based on this metadata. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	pfcp: add PFCP module	Wojciech Drewek
	Packet Forwarding Control Protocol (PFCP) is a 3GPP Protocol used between the control plane and the user plane function. It is specified in TS 29.244[1]. Note that this module is not designed to support this Protocol in the kernel space. There is no support for parsing any PFCP messages. There is no API that could be used by any userspace daemon. Basically it does not support PFCP. This protocol is sophisticated and there is no need for implementing it in the kernel. The purpose of this module is to allow users to setup software and hardware offload of PFCP packets using tc tool. When user requests to create a PFCP device, a new socket is created. The socket is set up with port number 8805 which is specific for PFCP [29.244 4.2.2]. This allow to receive PFCP request messages, response messages use other ports. Note that only one PFCP netdev can be created. Only IPv4 is supported at this time. [1] https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3111 Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	ip_tunnel: convert __be16 tunnel flags to bitmaps	Alexander Lobakin
	Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL__BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_ occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [] ip_tunnel.ko: 390/-900 (-510) [] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [] ip6_tunnel.ko: 118/-10 (108) [] gre_flags_to_tnl_flags() grew, but still is inlined [] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	ip_tunnel: use a separate struct to store tunnel params in the kernel	Alexander Lobakin
	Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure to store params inside the kernel, IPv4 tunnel code uses the same ip_tunnel_parm which is being used to talk with the userspace. This makes it difficult to alter or add any fields or use a different format for whatever data. Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for now, and use it throughout the code. Define the pieces, where the copy user <-> kernel happens, as standalone functions, and copy the data there field-by-field, so that the kernel-side structure could be easily modified later on and the users wouldn't have to care about this. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()	Alexander Lobakin
	Now that we have generic bitmap_read() and bitmap_write(), which are inline and try to take care of non-bound-crossing and aligned cases to keep them optimized, collapse bitmap_{get,set}_value8() into simple wrappers around the former ones. bloat-o-meter shows no difference in vmlinux and -2 bytes for gpio-pca953x.ko, which says the optimization didn't suffer due to that change. The converted helpers have the value width embedded and always compile-time constant and that helps a lot. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitmap: introduce generic optimized bitmap_size()	Alexander Lobakin
	The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros	Alexander Lobakin
	Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops on compile-time constants"), the non-atomic bitops are macros which can be expanded by the compilers into compile-time expressions, which will result in better optimized object code. Unfortunately, turned out that passing `volatile` to those macros discards any possibility of optimization, as the compilers then don't even try to look whether the passed bitmap is known at compilation time. In addition to that, the mentioned linkmode helpers are marked with `inline`, not `__always_inline`, meaning that it's not guaranteed some compiler won't uninline them for no reason, which will also effectively prevent them from being optimized (it's a well-known thing the compilers sometimes uninline `2 + 2`). Convert linkmode_*_bit() from inlines to macros. Their calling convention are 1:1 with the corresponding bitops, so that it's not even needed to enumerate and map the arguments, only the names. No changes in vmlinux' object code (compiled by LLVM for x86_64) whatsoever, but that doesn't necessarily means the change is meaningless. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitops: let the compiler optimize {__,}assign_bit()	Alexander Lobakin
	Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops on compile-time constants"), the compilers are able to expand inline bitmap operations to compile-time initializers when possible. However, during the round of replacement if-__set-else-__clear with __assign_bit() as per Andy's advice, bloat-o-meter showed +1024 bytes difference in object code size for one module (even one function), where the pattern: DECLARE_BITMAP(foo) = { }; // on the stack, zeroed if (a) __set_bit(const_bit_num, foo); if (b) __set_bit(another_const_bit_num, foo); ... is heavily used, although there should be no difference: the bitmap is zeroed, so the second half of __assign_bit() should be compiled-out as a no-op. I either missed the fact that __assign_bit() has bitmap pointer marked as `volatile` (as we usually do for bitops) or was hoping that the compilers would at least try to look past the `volatile` for __always_inline functions. Anyhow, due to that attribute, the compilers were always compiling the whole expression and no mentioned compile-time optimizations were working. Convert __assign_bit() to a macro since it's a very simple if-else and all of the checks are performed inside __set_bit() and __clear_bit(), thus that wrapper has to be as transparent as possible. After that change, despite it showing only -20 bytes change for vmlinux (due to that it's still relatively unpopular), no drastic code size changes happen when replacing if-set-else-clear for onstack bitmaps with __assign_bit(), meaning the compiler now expands them to the actual operations will all the expected optimizations. Atomic assign_bit() is less affected due to its nature, but let's convert it to a macro as well to keep the code consistent and not leave a place for possible suboptimal codegen. Moreover, with certain kernel configuration it actually gives some saves (x86): do_ip_setsockopt 4154 4099 -55 Suggested-by: Yury Norov <yury.norov@gmail.com> # assign_bit(), too Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitops: make BYTES_TO_BITS() treewide-available	Alexander Lobakin
	Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitops: add missing prototype check	Alexander Lobakin
	Commit 8238b4579866 ("wait_on_bit: add an acquire memory barrier") added a new bitop, test_bit_acquire(), with proper wrapping in order to try to optimize it at compile-time, but missed the list of bitops used for checking their prototypes a bit below. The functions added have consistent prototypes, so that no more changes are required and no functional changes take place. Fixes: 8238b4579866 ("wait_on_bit: add an acquire memory barrier") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	lib/bitmap: add bitmap_{read,write}()	Syed Nayyar Waris
	The two new functions allow reading/writing values of length up to BITS_PER_LONG bits at arbitrary position in the bitmap. The code was taken from "bitops: Introduce the for_each_set_clump macro" by Syed Nayyar Waris with a number of changes and simplifications: - instead of using roundup(), which adds an unnecessary dependency on <linux/math.h>, we calculate space as BITS_PER_LONG-offset; - indentation is reduced by not using else-clauses (suggested by checkpatch for bitmap_get_value()); - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read() and bitmap_write(); - some redundant computations are omitted. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com> Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/lkml/fe12eedf3666f4af5138de0e70b67a07c7f40338.1592224129.git.syednwaris@gmail.com/ Suggested-by: Yury Norov <yury.norov@gmail.com> Co-developed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	timers: Fix kernel-doc format and add Return values	Randy Dunlap
	Fix kernel-doc format and warnings: timer.h:26: warning: Cannot understand * @TIMER_DEFERRABLE: A deferrable timer will work normally when the on line 26 - I thought it was a doc line timer.h:146: warning: No description found for return value of 'timer_pending' timer.h:180: warning: No description found for return value of 'del_timer_sync' timer.h:193: warning: No description found for return value of 'del_timer' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240331172652.14086-4-rdunlap@infradead.org
2024-04-01	time/timekeeping: Fix kernel-doc warnings and typos	Randy Dunlap
	Fix punctuation, spellos, and kernel-doc warnings: timekeeping.h:79: warning: No description found for return value of 'ktime_get_real' timekeeping.h:95: warning: No description found for return value of 'ktime_get_boottime' timekeeping.h:108: warning: No description found for return value of 'ktime_get_clocktai' timekeeping.h:149: warning: Function parameter or struct member 'mono' not described in 'ktime_mono_to_real' timekeeping.h:149: warning: No description found for return value of 'ktime_mono_to_real' timekeeping.h:255: warning: Function parameter or struct member 'cs_id' not described in 'system_time_snapshot' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240331172652.14086-3-rdunlap@infradead.org
2024-04-01	time/timecounter: Fix inline documentation	Randy Dunlap
	Fix kernel-doc warnings, text punctuation, and a kernel-doc marker (change '%' to '&' to indicate a struct): timecounter.h:72: warning: No description found for return value of 'cyclecounter_cyc2ns' timecounter.h:85: warning: Function parameter or member 'tc' not described in 'timecounter_adjtime' timecounter.h:111: warning: No description found for return value of 'timecounter_read' timecounter.h:128: warning: No description found for return value of 'timecounter_cyc2time' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240331172652.14086-2-rdunlap@infradead.org
2024-04-01	Merge tag 'v6.9-rc2' into media_stage	Hans Verkuil
	Linux 6.9-rc2 This is needed to pull in commit 11763a8598f88 ("fs/9p: fix uaf in in v9fs_stat2inode_dotl"), which fixes the broken virtme. With this fix the media regression tests can be run again without crashing.
2024-03-31	Merge tag 'kbuild-fixes-v6.9' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Deduplicate Kconfig entries for CONFIG_CXL_PMU - Fix unselectable choice entry in MIPS Kconfig, and forbid this structure - Remove unused include/asm-generic/export.h - Fix a NULL pointer dereference bug in modpost - Enable -Woverride-init warning consistently with W=1 - Drop KCSAN flags from .mod.c files tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: Fix typo HEIGTH to HEIGHT Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries kbuild: make -Woverride-init warnings more consistent modpost: do not make find_tosym() return NULL export.h: remove include/asm-generic/export.h kconfig: do not reparent the menu inside a choice block MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
2024-03-31	Merge tag 'irq_urgent_for_v6.9_rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix an unused function warning on irqchip/irq-armada-370-xp - Fix the IRQ sharing with pinctrl-amd and ACPI OSL * tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-370-xp: Suppress unused-function warning genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
2024-03-31	fpga: bridge: add owner module and take its refcount	Marco Pagani
	The current implementation of the fpga bridge assumes that the low-level module registers a driver for the parent device and uses its owner pointer to take the module's refcount. This approach is problematic since it can lead to a null pointer dereference while attempting to get the bridge if the parent device does not have a driver. To address this problem, add a module owner pointer to the fpga_bridge struct and use it to take the module's refcount. Modify the function for registering a bridge to take an additional owner module parameter and rename it to avoid conflicts. Use the old function name for a helper macro that automatically sets the module that registers the bridge as the owner. This ensures compatibility with existing low-level control modules and reduces the chances of registering a bridge without setting the owner. Also, update the documentation to keep it consistent with the new interface for registering an fpga bridge. Other changes: opportunistically move put_device() from __fpga_bridge_get() to fpga_bridge_get() and of_fpga_bridge_get() to improve code clarity since the bridge device is taken in these functions. Fixes: 21aeda950c5f ("fpga: add fpga bridge framework") Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Xu Yilun <yilun.xu@intel.com> Reviewed-by: Russ Weight <russ.weight@linux.dev> Signed-off-by: Marco Pagani <marpagan@redhat.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20240322171839.233864-1-marpagan@redhat.com Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
2024-03-31	fpga: manager: add owner module and take its refcount	Marco Pagani
	The current implementation of the fpga manager assumes that the low-level module registers a driver for the parent device and uses its owner pointer to take the module's refcount. This approach is problematic since it can lead to a null pointer dereference while attempting to get the manager if the parent device does not have a driver. To address this problem, add a module owner pointer to the fpga_manager struct and use it to take the module's refcount. Modify the functions for registering the manager to take an additional owner module parameter and rename them to avoid conflicts. Use the old function names for helper macros that automatically set the module that registers the manager as the owner. This ensures compatibility with existing low-level control modules and reduces the chances of registering a manager without setting the owner. Also, update the documentation to keep it consistent with the new interface for registering an fpga manager. Other changes: opportunistically move put_device() from __fpga_mgr_get() to fpga_mgr_get() and of_fpga_mgr_get() to improve code clarity since the manager device is taken in these functions. Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get") Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Marco Pagani <marpagan@redhat.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20240305192926.84886-1-marpagan@redhat.com Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
2024-03-30	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes and updates from James Bottomley: "Fully half this pull is updates to lpfc and qla2xxx which got committed just as the merge window opened. A sizeable fraction of the driver updates are simple bug fixes (and lock reworks for bug fixes in the case of lpfc), so rather than splitting the few actual enhancements out, we're just adding the drivers to the -rc1 pull. The enhancements for lpfc are log message removals, copyright updates and three patches redefining types. For qla2xxx it's just removing a debug message on module removal and the manufacturer detail update. The two major fixes are the sg teardown race and a core error leg problem with the procfs directory not being removed if we destroy a created host that never got to the running state. The rest are minor fixes and constifications" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits) scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload scsi: core: Fix unremoved procfs host directory regression scsi: mpi3mr: Avoid memcpy field-spanning write WARNING scsi: sd: Fix TCG OPAL unlock on system resume scsi: sg: Avoid sg device teardown race scsi: lpfc: Copyright updates for 14.4.0.1 patches scsi: lpfc: Update lpfc version to 14.4.0.1 scsi: lpfc: Define types in a union for generic void *context3 ptr scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr scsi: lpfc: Use a dedicated lock for ras_fwlog state scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up() scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port() scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling scsi: lpfc: Move NPIV's transport unregistration to after resource clean up scsi: lpfc: Remove unnecessary log message in queuecommand path scsi: qla2xxx: Update version to 10.02.09.200-k scsi: qla2xxx: Delay I/O Abort on PCI error scsi: qla2xxx: Change debug message during driver unload ...
2024-03-30	x86/mce: Clean up TP_printk() output line of the 'mce_record' tracepoint	Ingo Molnar
	- Only capitalize entries where that makes sense - Print separate values separately - Rename 'PROCESSOR' to vendor & CPUID Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Avadhut Naik <avadhut.naik@amd.com> Cc: "Tony Luck" <tony.luck@intel.com> Link: https://lore.kernel.org/r/ZgZpn/zbCJWYdL5y@gmail.com
2024-03-29	Merge tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Regular fixes for rc2, quite a few i915/amdgpu as usual, some xe, and then mostly scattered around. rc3 might be quieter with the holidays but we shall see. bridge: - select DRM_KMS_HELPER dma-buf: - fix NULL-pointer deref dp: - fix div-by-zero in DP MST unplug code fbdev: - select FB_IOMEM_FOPS for SBus sched: - fix NULL-pointer deref xe: - Fix build on mips - Fix wrong bound checks - Fix use of msec rather than jiffies - Remove dead code amdgpu: - SMU 14.0.1 updates - DCN 3.5.x updates - VPE fix - eDP panel flickering fix - Suspend fix - PSR fix - DCN 3.0+ fix - VCN 4.0.6 updates - debugfs fix amdkfd: - DMA-Buf fix - GFX 9.4.2 TLB flush fix - CP interrupt fix i915: - Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c - Update a MTL workaround - Fix locking inversion in hwmon's sysfs - Remove a bogus error message around PXP - Fix UAF on VMA - Reset queue_priority_hint on parking - Display Fixes: - Remove duplicated audio enable/disable on SDVO and DP - Disable AuxCCS for Xe driver - Revert init order of MIPI DSI - DRRS debugfs fix with an extra refactor patch - VRR related fixes - Fix a JSL eDP corruption - Fix the cursor physical dma address - BIOS VBT related fix nouveau: - dmem: handle kcalloc() allocation failures qxl: - remove unused variables rockchip: - vop2: remove support for AR30 and AB30 formats vmwgfx: - debugfs: create ttm_resource_manager entry only if needed" * tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel: (55 commits) drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode() drm/i915: Pre-populate the cursor physical dma address drm/i915/gt: Reset queue_priority_hint on parking drm/i915/vma: Fix UAF on destroy against retire race drm/i915: Do not print 'pxp init failed with 0' when it succeed drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed() drm/i915/hwmon: Fix locking inversion in sysfs getter drm/i915/dsb: Fix DSB vblank waits when using VRR drm/i915/vrr: Generate VRR "safe window" for DSB drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status drm/i915/drrs: Refactor CPU transcoder DRRS check drm/i915/mtl: Update workaround 14018575942 drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly drm/i915/display: Disable AuxCCS framebuffers if built for Xe drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c drm/qxl: remove unused variable from `qxl_process_single_command()` drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()` drm/i915: add bug.h include to i915_memcpy.c drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed ...
2024-03-29	netlink: introduce type-checking attribute iteration	Johannes Berg
	There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	net: add sk_wake_async_rcu() helper	Eric Dumazet
	While looking at UDP receive performance, I saw sk_wake_async() was no longer inlined. This matters at least on AMD Zen1-4 platforms (see SRSO) This might be because rcu_read_lock() and rcu_read_unlock() are no longer nops in recent kernels ? Add sk_wake_async_rcu() variant, which must be called from contexts already holding rcu lock. As SOCK_FASYNC is deprecated in modern days, use unlikely() to give a hint to the compiler. sk_wake_async_rcu() is properly inlined from __udp_enqueue_schedule_skb() and sock_def_readable(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240328144032.1864988-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	net: udp: add IP/port data to the tracepoint udp/udp_fail_queue_rcv_skb	Balazs Scheidler
	The udp_fail_queue_rcv_skb() tracepoint lacks any details on the source and destination IP/port whereas this information can be critical in case of UDP/syslog. Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/0c8b3e33dbf679e190be6f4c6736603a76988a20.1711475011.git.balazs.scheidler@axoflow.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	net: port TP_STORE_ADDR_PORTS_SKB macro to be tcp/udp independent	Balazs Scheidler
	This patch moves TP_STORE_ADDR_PORTS_SKB() to a common header and removes the TCP specific implementation details. Previously the macro assumed the skb passed as an argument is a TCP packet, the implementation now uses an argument to the L4 header and uses that to extract the source/destination ports, which happen to be named the same in "struct tcphdr" and "struct udphdr" Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com> Link: https://lore.kernel.org/r/9e306f78260dfbbdc7353ba5f864cc027a409540.1711475011.git.balazs.scheidler@axoflow.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	Merge tag 'gpio-fixes-for-v6.9-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a procfs failure when requesting an interrupt with a label containing the '/' character - add missing stubs for GPIO lookup functions for !GPIOLIB - fix debug messages that would print "(null)" for NULL strings * tag 'gpio-fixes-for-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix debug messaging in gpiod_find_and_request() gpiolib: Add stubs for GPIO lookup functions gpio: cdev: sanitize the label before requesting the interrupt
2024-03-29	af_unix: Replace garbage collection algorithm.	Kuniyuki Iwashima
	If we find a dead SCC during iteration, we call unix_collect_skb() to splice all skb in the SCC to the global sk_buff_head, hitlist. After iterating all SCC, we unlock unix_gc_lock and purge the queue. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	af_unix: Assign a unique index to SCC.	Kuniyuki Iwashima
	The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful for a cross-edge. If we start traversing from A in the following graph, the final lowlink of D is 3. The cross-edge here is one between D and C. A -> B -> D D = (4, 3) (index, lowlink) ^ \| \| C = (3, 1) \| V \| B = (2, 1) `--- C <--' A = (1, 1) This is because the lowlink of D is updated with the index of C. In the following patch, we detect a dead SCC by checking two conditions for each vertex. 1) vertex has no edge directed to another SCC (no bridge) 2) vertex's out_degree is the same as the refcount of its file If 1) is false, there is a receiver of all fds of the SCC and its ancestor SCC. To evaluate 1), we need to assign a unique index to each SCC and assign it to all vertices in the SCC. This patch changes the lowlink update logic for cross-edge so that in the example above, the lowlink of D is updated with the lowlink of C. A -> B -> D D = (4, 1) (index, lowlink) ^ \| \| C = (3, 1) \| V \| B = (2, 1) `--- C <--' A = (1, 1) Then, all vertices in the same SCC have the same lowlink, and we can quickly find the bridge connecting to different SCC if exists. However, it is no longer called lowlink, so we rename it to scc_index. (It's sometimes called lowpoint.) Also, we add a global variable to hold the last index used in DFS so that we do not reset the initial index in each DFS. This patch can be squashed to the SCC detection patch but is split deliberately for anyone wondering why lowlink is not used as used in the original Tarjan's algorithm and many reference implementations. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>