summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-15Merge tag 'net-6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth and wireless. A few more fixes for the locking changes trickling in. Nothing too alarming, I suspect those will continue for another release. Other than that things are slowing down nicely. Current release - fix to a fix: - Bluetooth: hci_event: use key encryption size when its known - tools: ynl-gen: allow multi-attr without nested-attributes again Current release - regressions: - locking fixes: - lock lower level devices when updating features - eth: bnxt_en: bring back rtnl_lock() in the bnxt_open() path - devmem: fix panic when Netlink socket closes after module unload Current release - new code bugs: - eth: txgbe: fixes for FW communication on new AML devices Previous releases - always broken: - sched: flush gso_skb list too during ->change(), avoid potential null-deref on reconfig - wifi: mt76: disable NAPI on driver removal - hv_netvsc: fix error 'nvsp_rndis_pkt_complete error status: 2'" * tag 'net-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits) net: devmem: fix kernel panic when netlink socket close after module unload tsnep: fix timestamping with a stacked DSA driver net/tls: fix kernel panic when alloc_page failed bnxt_en: bring back rtnl_lock() in the bnxt_open() path mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request octeontx2-pf: Do not reallocate all ntuple filters wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl wifi: mt76: disable napi on driver removal Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer() hv_netvsc: Remove rmsg_pgcnt hv_netvsc: Preserve contiguous PFN grouping in the page buffer array hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges octeontx2-af: Fix CGX Receive counters net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability net: libwx: Fix FW mailbox unknown command net: libwx: Fix FW mailbox reply timeout net: txgbe: Fix to calculate EEPROM checksum for AML devices octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy ...
2025-05-15ext4: clairfy the rules for modifying extentsZhang Yi
Add a comment at the beginning of extents_status.c to clarify the rules for loading, mapping, modifying, and removing extents and blocks. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250423085257.122685-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-15ext4: check env when mapping and modifying extentsZhang Yi
Add ext4_check_map_extents_env() to the places where loading extents, mapping blocks, removing blocks, and modifying extents, excluding the I/O writeback context. This function will verify whether the locking mechanisms in place are adequate. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250423085257.122685-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-15Bluetooth: btusb: use skb_pull to avoid unsafe access in QCA dump handlingEn-Wei Wu
Use skb_pull() and skb_pull_data() to safely parse QCA dump packets. This avoids direct pointer math on skb->data, which could lead to invalid access if the packet is shorter than expected. Fixes: 20981ce2d5a5 ("Bluetooth: btusb: Add WCN6855 devcoredump support") Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-15Bluetooth: L2CAP: Fix not checking l2cap_chan security levelLuiz Augusto von Dentz
l2cap_check_enc_key_size shall check the security level of the l2cap_chan rather than the hci_conn since for incoming connection request that may be different as hci_conn may already been encrypted using a different security level. Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-15ublk: fix dead loop when canceling io commandMing Lei
Commit: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") adds a request state check in ublk_cancel_cmd(), and if the request is started, skips canceling this uring_cmd. However, the current uring_cmd may be in ACTIVE state, without block request coming to the uring command. Meantime, if the cached request in tag_set.tags[tag] has been delivered to ublk server and reycycled, then this uring_cmd can't be canceled. ublk requests are aborted in ublk char device release handler, which depends on canceling all ACTIVE uring_cmd. So it causes a dead loop. Fix this issue by not taking a stale request into account when canceling uring_cmd in ublk_cancel_cmd(). Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn/ Fixes: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250515162601.77346-1-ming.lei@redhat.com [axboe: rewording of commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-15btrfs: use a single variable to track return value at btrfs_page_mkwrite()Filipe Manana
We have two variables to track return values, ret and ret2, with types vm_fault_t (an unsigned int type) and int, which makes it a bit confusing and harder to keep track. So use a single variable, of type int, and under the 'out' label return vmf_error(ret) in case ret contains an error, otherwise return VM_FAULT_NOPAGE. This is equivalent to what we had before and it's simpler. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: don't return VM_FAULT_SIGBUS on failure to set delalloc for mmap writeFilipe Manana
If the call to btrfs_set_extent_delalloc() fails we are always returning VM_FAULT_SIGBUS, which is odd since the error means "bad access" and the most likely cause for btrfs_set_extent_delalloc() is -ENOMEM, which should be translated to VM_FAULT_OOM. Instead of returning VM_FAULT_SIGBUS return vmf_error(ret2), which gives us a more appropriate return value, and we use that everywhere else too. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: simplify early error checking in btrfs_page_mkwrite()Filipe Manana
We have this entangled error checks early at btrfs_page_mkwrite(): 1) Try to reserve delalloc space by calling btrfs_delalloc_reserve_space() and storing the return value in the ret2 variable; 2) If the reservation succeed, call file_update_time() and store the return value in ret2 and also set the local variable 'reserved' to true (1); 3) Then do an error check on ret2 to see if any of the previous calls failed and if so, jump either to the 'out' label or to the 'out_noreserve' label, depending on whether 'reserved' is true or not. This is unnecessarily complex. Instead change this to a simpler and more straightforward approach: 1) Call btrfs_delalloc_reserve_space(), if that returns an error jump to the 'out_noreserve' label; 2) The call file_update_time() and if that returns an error jump to the 'out' label. Like this there's less nested if statements, no need to use a local variable to track if space was reserved and if statements are used only to check errors. Also move the call to extent_changeset_free() out of the 'out_noreserve' label and under the 'out' label since the changeset is allocated only if the call to reserve delalloc space succeeded. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: pass true to btrfs_delalloc_release_space() at btrfs_page_mkwrite()Filipe Manana
In the last call to btrfs_delalloc_release_space() where the value of the variable 'ret' is never zero, we pass the expression 'ret != 0' as the value for the argument 'qgroup_free', which always evaluates to true. Make this less confusing and more clear by explicitly passing true instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: fix wrong start offset for delalloc space release during mmap writeFilipe Manana
If we're doing a mmap write against a folio that has i_size somewhere in the middle and we have multiple sectors in the folio, we may have to release excess space previously reserved, for the range going from the rounded up (to sector size) i_size to the folio's end offset. We are calculating the right amount to release and passing it to btrfs_delalloc_release_space(), but we are passing the wrong start offset of that range - we're passing the folio's start offset instead of the end offset, plus 1, of the range for which we keep the reservation. This may result in releasing more space then we should and eventually trigger an underflow of the data space_info's bytes_may_use counter. So fix this by passing the start offset as 'end + 1' instead of 'page_start' to btrfs_delalloc_release_space(). Fixes: d0b7da88f640 ("Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15x86/cpuid: Rename have_cpuid_p() to cpuid_feature()Ahmed S. Darwish
In order to let all the APIs under <cpuid/api.h> have a shared "cpuid_" namespace, rename have_cpuid_p() to cpuid_feature(). Adjust all call-sites accordingly. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-4-darwi@linutronix.de
2025-05-15x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID headerAhmed S. Darwish
The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
2025-05-15x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>Ahmed S. Darwish
Move all of the CPUID(0x2) APIs at <cpuid/leaf_0x2_api.h> into <cpuid/api.h>, in order centralize all CPUID APIs into the latter. While at it, separate the different CPUID leaf parsing APIs using header comments like "CPUID(0xN) parsing: ". Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-2-darwi@linutronix.de
2025-05-15perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freqAdrian Hunter
Currently, using PEBS-via-PT with a sample frequency instead of a sample period, causes a segfault. For example: BUG: kernel NULL pointer dereference, address: 0000000000000195 <NMI> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0xca/0x290 ? exc_page_fault+0x7e/0x1b0 ? asm_exc_page_fault+0x26/0x30 ? intel_pmu_pebs_event_update_no_drain+0x40/0x60 ? intel_pmu_pebs_event_update_no_drain+0x32/0x60 intel_pmu_drain_pebs_icl+0x333/0x350 handle_pmi_common+0x272/0x3c0 intel_pmu_handle_irq+0x10a/0x2e0 perf_event_nmi_handler+0x2a/0x50 That happens because intel_pmu_pebs_event_update_no_drain() assumes all the pebs_enabled bits represent counter indexes, which is not always the case. In this particular case, bits 60 and 61 are set for PEBS-via-PT purposes. The behaviour of PEBS-via-PT with sample frequency is questionable because although a PMI is generated (PEBS_PMI_AFTER_EACH_RECORD), the period is not adjusted anyway. Putting that aside, fix intel_pmu_pebs_event_update_no_drain() by passing the mask of counter bits instead of 'size'. Note, prior to the Fixes commit, 'size' would be limited to the maximum counter index, so the issue was not hit. Fixes: 722e42e45c2f1 ("perf/x86: Support counter mask") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20250508134452.73960-1-adrian.hunter@intel.com
2025-05-15perf/aux: Allocate non-contiguous AUX pages by defaultYabin Cui
perf always allocates contiguous AUX pages based on aux_watermark. However, this contiguous allocation doesn't benefit all PMUs. For instance, ARM SPE and TRBE operate with virtual pages, and Coresight ETR allocates a separate buffer. For these PMUs, allocating contiguous AUX pages unnecessarily exacerbates memory fragmentation. This fragmentation can prevent their use on long-running devices. This patch modifies the perf driver to be memory-friendly by default, by allocating non-contiguous AUX pages. For PMUs requiring contiguous pages (Intel BTS and some Intel PT), the existing PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't require but can benefit from contiguous pages (some Intel PT), a new capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their existing behavior. Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250508232642.148767-1-yabinc@google.com
2025-05-15x86/msr: Add rdmsrl_on_cpu() compatibility wrapperIngo Molnar
Add a simple rdmsrl_on_cpu() compatibility wrapper for rdmsrq_on_cpu(), to make life in -next easier, where the PM tree recently grew more uses of the old API. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mario Limonciello <mario.limonciello@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Link: https://lore.kernel.org/r/20250512145517.6e0666e3@canb.auug.org.au
2025-05-15irqchip/irq-pruss-intc: Simplify chained interrupt handler setupChen Ni
The chained interrupt handler setup installs the handler and handler data with two function call.s irq_set_chained_handler_and_data() can set both in one operation. Replace the two calls with one. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250515083450.3811411-1-nichen@iscas.ac.cn
2025-05-15net: devmem: fix kernel panic when netlink socket close after module unloadTaehee Yoo
Kernel panic occurs when a devmem TCP socket is closed after NIC module is unloaded. This is Devmem TCP unregistration scenarios. number is an order. (a)netlink socket close (b)pp destroy (c)uninstall result 1 2 3 OK 1 3 2 (d)Impossible 2 1 3 OK 3 1 2 (e)Kernel panic 2 3 1 (d)Impossible 3 2 1 (d)Impossible (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is closed. (b) page_pool_destroy() is called when the interface is down. (c) mp_ops->uninstall() is called when an interface is unregistered. (d) There is no scenario in mp_ops->uninstall() is called before page_pool_destroy(). Because unregister_netdevice_many_notify() closes interfaces first and then calls mp_ops->uninstall(). (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire netdev_lock(). But if the interface module has already been removed, net_device pointer is invalid, so it causes kernel panic. In summary, there are only 3 possible scenarios. A. sk close -> pp destroy -> uninstall. B. pp destroy -> sk close -> uninstall. C. pp destroy -> uninstall -> sk close. Case C is a kernel panic scenario. In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set binding->dev to NULL. It indicates an bound net_device was unregistered. It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock() if binding->dev is NULL. A new binding->lock is added to protect a dev of a binding. So, lock ordering is like below. priv->lock netdev_lock(dev) binding->lock Tests: Scenario A: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 kill $pid ip link set $interface down modprobe -rv $module Scenario B: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 ip link set $interface down kill $pid modprobe -rv $module Scenario C: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 modprobe -rv $module sleep 5 kill $pid Splat looks like: Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf] CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f Tainted: [B]=BAD_PAGE, [W]=WARN RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746) Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811 RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4 R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88 FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ... netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3)) genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705) ... netlink_release (net/netlink/af_netlink.c:737) ... __sock_release (net/socket.c:647) sock_close (net/socket.c:1393) Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15x86/mm: Fix kernel-doc descriptions of various pgtable methodsShivank Garg
So 'make W=1' complains about a couple of kernel-doc descriptions in our MM primitives in pgtable.c: arch/x86/mm/pgtable.c:623: warning: Function parameter or struct member 'reserve' not described in 'reserve_top_address' arch/x86/mm/pgtable.c:672: warning: Function parameter or struct member 'p4d' not described in 'p4d_set_huge' arch/x86/mm/pgtable.c:672: warning: Function parameter or struct member 'addr' not described in 'p4d_set_huge' ... so on Fix them all up, add missing parameter documentation, and fix various spelling inconsistencies while at it. [ mingo: Harmonize kernel-doc annotations some more. ] Signed-off-by: Shivank Garg <shivankg@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20250514062637.3287779-1-shivankg@amd.com
2025-05-15gpio: pxa: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify pxa_muxed_gpio_chip, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-9-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: timberdale: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify timbgpio_irqchip, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-8-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: xgene-sb: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify xgene_gpio_sb_irq_chip, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-7-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: davinci: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify gpio_irqchip, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-6-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: davinci: Update irq chip dataPeng Fan
Use "struct davinci_gpio_controller *chips" as irq chip data to prepare for immutable irq chip, then it will be easy to get gpio_chip pointer in irq mask/unmask. No functional change. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-5-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: mpc8xxx: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-4-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: lpc18xx: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify lpc18xx_gpio_pin_ic, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-3-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: grgpio: Make irq_chip immutablePeng Fan
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Constify grgpio_irq_chip, flag the irq_chip as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-2-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: bcm-kona: make irq_chip immutablePeng Fan
The gpiolib is moving to make irq_chip immutable, otherwise there is warning: "not an immutable chip, please consider fixing it!" The bcm_gpio_irq_chip already has irq hooks configured correctly, bcm_kona_gpio_irq_mask/bcm_kona_gpio_irq_unmask calls gpiochip_disable_irq/ gpiochip_enable_irq, and bcm_kona_gpio_irq_reqres/irq_release_resources calls gpiochip_reqres_irq/gpiochip_relres_irq. So just need to flag it as IRQCHIP_IMMUTABLE. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250509-gpio-v1-1-639377c98288@nxp.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15gpio: TODO: add item about GPIO drivers reading struct gpio_chip::baseAhmad Fatoum
drivers/pinctrl/pinctrl-at91.c uses struct gpio_chip::base to find out which bit to set in a register: dev_dbg(npct->dev, "enable pin %u as GPIO\n", offset); mask = 1 << (offset - chip->base); This adds a non-obvious dependency on the global numberspace from the GPIO driver itself, even if all consumers use the descriptor API. More such instances may exist and will need to be fixed in the quest for removal of the global numberspace, so add a reminder about it to the TODO list. Link: https://lore.kernel.org/all/1d00c056-3d61-4c22-bedd-3bae0bf1ddc4@pengutronix.de/ Suggested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Link: https://lore.kernel.org/r/20250507-gpio-chip-base-readback-v1-1-ade56e38480b@pengutronix.de Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15tsnep: fix timestamping with a stacked DSA driverGerhard Engleder
This driver is susceptible to a form of the bug explained in commit c26a2c2ddc01 ("gianfar: Fix TX timestamping with a stacked DSA driver") and in Documentation/networking/timestamping.rst section "Other caveats for MAC drivers", specifically it timestamps any skb which has SKBTX_HW_TSTAMP, and does not consider if timestamping has been enabled in adapter->hwtstamp_config.tx_type. Evaluate the proper TX timestamping condition only once on the TX path (in tsnep_xmit_frame_ring()) and store the result in an additional TX entry flag. Evaluate the new TX entry flag in the TX confirmation path (in tsnep_tx_poll()). This way SKBTX_IN_PROGRESS is set by the driver as required, but never evaluated. SKBTX_IN_PROGRESS shall not be evaluated as it can be set by a stacked DSA driver and evaluating it would lead to unwanted timestamps. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250514195657.25874-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15gpio: mxc: configure dynamic GPIO base for CONFIG_GPIO_SYSFS=nAhmad Fatoum
i.MX GPIO numbering has been deterministic since commit 7e6086d9e54a ("gpio/mxc: specify gpio base for device tree probe"), a year after device tree support was first added back in 2011. Reverting to dynamically allocated GPIO base now would break most systems making use of the sysfs API. These systems will be eventually broken by the removal of the sysfs API, but that would result in GPIO scripts not working instead of essentially toggling at random according to probe order, which would happen if we unconditionally set base to -1. Yet, the warning is annoying and has resulted in many rejected attempts to remove it over the years[1][2][3]. As the i.MX GPIO driver is device tree only, GPIO_SYSFS is the only consumer of the deterministic GPIO numbering. Let's therefore restrict the static base and the warning that comes with it to configurations with CONFIG_GPIO_SYSFS enabled. [1]: https://lore.kernel.org/all/20230226205319.1013332-1-dario.binacchi@amarulasolutions.com/ [2]: https://lore.kernel.org/all/20230506085928.933737-2-haibo.chen@nxp.com/ [3]: https://lore.kernel.org/all/20241121145515.3087855-1-catalin.popescu@leica-geosystems.com/ Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250507-b4-imx-gpio-base-warning-v2-1-d43d09e2c6bf@pengutronix.de Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-15genirq: Retain disable depth for managed interrupts across CPU hotplugBrian Norris
Affinity-managed interrupts can be shut down and restarted during CPU hotunplug/plug. Thereby the interrupt may be left in an unexpected state. Specifically: 1. Interrupt is affine to CPU N 2. disable_irq() -> depth is 1 3. CPU N goes offline 4. irq_shutdown() -> depth is set to 1 (again) 5. CPU N goes online 6. irq_startup() -> depth is set to 0 (BUG! driver expects that the interrupt still disabled) 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning This is only a problem for managed interrupts and CPU hotplug, all other cases like request()/free()/request() truly needs to reset a possibly stale disable depth value. Provide a startup function, which takes the disable depth into account, and invoked it for the managed interrupts in the CPU hotplug path. This requires to change irq_shutdown() to do a depth increment instead of setting it to 1, which allows to retain the disable depth, but is harmless for the other code paths using irq_startup(), which will still reset the disable depth unconditionally to keep the original correct behaviour. A kunit tests will be added separately to cover some of these aspects. [ tglx: Massaged changelog ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250514201353.3481400-2-briannorris@chromium.org
2025-05-15net: prestera: Use to_delayed_work()Chen Ni
Use to_delayed_work() instead of open-coding it. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://patch.msgid.link/20250514064053.2513921-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net/mlx5: Use to_delayed_work()Chen Ni
Use to_delayed_work() instead of open-coding it. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Acked-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250514072419.2707578-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net/tls: fix kernel panic when alloc_page failedPengtao He
We cannot set frag_list to NULL pointer when alloc_page failed. It will be used in tls_strp_check_queue_ok when the next time tls_strp_read_sock is called. This is because we don't reset full_len in tls_strp_flush_anchor_copy() so the recv path will try to continue handling the partial record on the next call but we dettached the rcvq from the frag list. Alternative fix would be to reset full_len. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 Call trace: tls_strp_check_rcv+0x128/0x27c tls_strp_data_ready+0x34/0x44 tls_data_ready+0x3c/0x1f0 tcp_data_ready+0x9c/0xe4 tcp_data_queue+0xf6c/0x12d0 tcp_rcv_established+0x52c/0x798 Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Pengtao He <hept.hept.hept@gmail.com> Link: https://patch.msgid.link/20250514132013.17274-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15genirq: Bump the size of the local variable for sprintf()Andy Shevchenko
GCC is not happy about a sprintf() call on a buffer that might be too small for the given formatting string. kernel/irq/debugfs.c:233:26: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] Fix this by bumping the size of the local variable for sprintf(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250515085516.2913290-1-andriy.shevchenko@linux.intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202505151057.xbyXAbEn-lkp@intel.com/
2025-05-15Merge tag 'wireless-2025-05-15' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of stragglers: - mac80211: fix syzbot/ubsan in scan counted-by - mt76: fix NAPI handling on driver remove - mt67: fix multicast/ipv6 receive * tag 'wireless-2025-05-15' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl wifi: mt76: disable napi on driver removal ==================== Link: https://patch.msgid.link/20250515121749.61912-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGINianyao Tang
On multi-node GICv4.1 system, VSGI senders always use one certain 4_1 ITS, because find_4_1_its() returns the first its_node in the list, regardless of which node the VSGI sender is on. This brings guest VSGI performance drop when VM is not running on the same node as this returned ITS. On a 2-socket environment, each with one ITS and 32 cpu, GICv4.1 enabled, 4U8G guest, 4 vcpu is running on same socket. When the VM is on socket0, kvm-unit-tests ipi_hw result is 850ns. When the VM is on socket1, it is 750ns. The reason is that the VSGI sender always uses the last reported ITS (that on socket1) to inject VSGI. The access from a CPU to a other-socket ITS will cost 100ns more compared to an access to the local ITS. Using the local ITS results in a 12% reduction in IPI latency. Modify find_4_1_its() to return the first per-CPU local_4_1_its, which is initialized when the VPE table is inherited from the ITS or from another CPU. If it fails to find a local 4_1 ITS, it returns any 4_1 ITS like before. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Nianyao Tang <tangnianyao@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20250515145359.2795959-1-tangnianyao@huawei.com
2025-05-15bnxt_en: bring back rtnl_lock() in the bnxt_open() pathMichael Chan
Error recovery, PCIe AER, resume, and TX timeout will invoke bnxt_open() with netdev_lock only. This will cause RTNL assert failure in netif_set_real_num_tx_queues(), netif_set_real_num_tx_queues(), and netif_set_real_num_tx_queues(). Example error recovery assert: RTNL: assertion failed at net/core/dev.c (3178) WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210 Call Trace: <TASK> ? __pfx_bnxt_msix+0x10/0x10 [bnxt_en] __bnxt_open_nic+0x1ef/0xb20 [bnxt_en] bnxt_open+0xda/0x130 [bnxt_en] bnxt_fw_reset_task+0x21f/0x780 [bnxt_en] process_scheduled_works+0x9d/0x400 For now, bring back rtnl_lock() in all these code paths that can invoke bnxt_open(). In the bnxt_queue_start() error path, we don't have rtnl_lock held so we just change it to call netif_close() instead of bnxt_reset_task() for simplicity. This error path is unlikely so it should be fine. Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250514062908.2766677-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devicesIdo Schimmel
The driver only offloads neighbors that are constructed on top of net devices registered by it or their uppers (which are all Ethernet). The device supports GRE encapsulation and decapsulation of forwarded traffic, but the driver will not offload dummy neighbors constructed on top of GRE net devices as they are not uppers of its net devices: # ip link add name gre1 up type gre tos inherit local 192.0.2.1 remote 198.51.100.1 # ip neigh add 0.0.0.0 lladdr 0.0.0.0 nud noarp dev gre1 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 NOARP (Note that the neighbor is not marked with 'offload') When the driver is reloaded and the existing configuration is replayed, the driver does not perform the same check regarding existing neighbors and offloads the previously added one: # devlink dev reload pci/0000:01:00.0 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 offload NOARP If the neighbor is later deleted, the driver will ignore the notification (given the GRE net device is not its upper) and will therefore keep referencing freed memory, resulting in a use-after-free [1] when the net device is deleted: # ip neigh del 0.0.0.0 lladdr 0.0.0.0 dev gre1 # ip link del dev gre1 Fix by skipping neighbor replay if the net device for which the replay is performed is not our upper. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x1ea/0x200 Read of size 8 at addr ffff888155b0e420 by task ip/2282 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 mlxsw_sp_neigh_entry_update+0x1ea/0x200 mlxsw_sp_router_rif_gone_sync+0x2a8/0x440 mlxsw_sp_rif_destroy+0x1e9/0x750 mlxsw_sp_netdevice_ipip_ol_event+0x3c9/0xdc0 mlxsw_sp_router_netdevice_event+0x3ac/0x15e0 notifier_call_chain+0xca/0x150 call_netdevice_notifiers_info+0x7f/0x100 unregister_netdevice_many_notify+0xc8c/0x1d90 rtnl_dellink+0x34e/0xa50 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x131/0x360 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x121/0x1b0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 8fdb09a7674c ("mlxsw: spectrum_router: Replay neighbours when RIF is made") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/c53c02c904fde32dad484657be3b1477884e9ad6.1747225701.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15irqchip/riscv-imsic: Start local sync timer on correct CPUAndrew Bresticker
When starting the local sync timer to synchronize the state of a remote CPU it should be added on the CPU to be synchronized, not the initiating CPU. This results in interrupt delivery being delayed until the timer eventually runs (due to another mask/unmask/migrate operation) on the target CPU. Fixes: 0f67911e821c ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector") Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/all/20250514171320.3494917-1-abrestic@rivosinc.com
2025-05-15block/blk-throttle: silence !BLK_DEV_IO_TRACE variable warningsJens Axboe
If blk-throttle is enabled but blktrace is not, then the compiler will notice that the following two variables are unused: ../block/blk-throttle.c: In function 'throtl_pending_timer_fn': ../block/blk-throttle.c:1153:30: warning: unused variable 'bio_cnt_w' [-Wunused-variable] 1153 | unsigned int bio_cnt_w = sq_queued(sq, WRITE); | ^~~~~~~~~ ../block/blk-throttle.c:1152:30: warning: unused variable 'bio_cnt_r' [-Wunused-variable] 1152 | unsigned int bio_cnt_r = sq_queued(sq, READ); | ^~~~~~~~~ Silence that my annotating them with __maybe_unused. Fixes: 28ad83b774a6 ("blk-throttle: Split the service queue") Link: https://lore.kernel.org/all/20250515130830.9671-1-aishwarya.tcv@arm.com/ Reported-by: Aishwarya <aishwarya.tcv@arm.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-15Merge branch 'net-cover-more-per-cpu-storage-with-local-nested-bh-locking'Paolo Abeni
Sebastian Andrzej Siewior says: ==================== net: Cover more per-CPU storage with local nested BH locking I was looking at the build-time defined per-CPU variables in net/ and added the needed local-BH-locks in order to be able to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. The work is not yet complete, I just wanted to post what I have so far instead of sitting on it. v3: https://lore.kernel.org/all/20250430124758.1159480-1-bigeasy@linutronix.de/ v2: https://lore.kernel.org/all/20250414160754.503321-1-bigeasy@linutronix.de v1: https://lore.kernel.org/all/20250309144653.825351-1-bigeasy@linutronix.de ==================== Link: https://patch.msgid.link/20250512092736.229935-1-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15rds: Use nested-BH locking for rds_page_remainderSebastian Andrzej Siewior
rds_page_remainder is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Allison Henderson <allison.henderson@oracle.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-16-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15rds: Acquire per-CPU pointer within BH disabled sectionSebastian Andrzej Siewior
rds_page_remainder_alloc() obtains the current CPU with get_cpu() while disabling preemption. Then the CPU number is used to access the per-CPU data structure via per_cpu(). This can be optimized by relying on local_bh_disable() to provide a stable CPU number/ avoid migration and then using this_cpu_ptr() to retrieve the data structure. Cc: Allison Henderson <allison.henderson@oracle.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-15-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15rds: Disable only bottom halves in rds_page_remainder_alloc()Sebastian Andrzej Siewior
rds_page_remainder_alloc() is invoked from a preemptible context or a tasklet. There is no need to disable interrupts for locking. Use local_bh_disable() instead of local_irq_save() for locking. Cc: Allison Henderson <allison.henderson@oracle.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-14-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15mptcp: Use nested-BH locking for hmac_storageSebastian Andrzej Siewior
mptcp_delegated_actions is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Mat Martineau <martineau@kernel.org> Cc: Geliang Tang <geliang@kernel.org> Cc: mptcp@lists.linux.dev Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-13-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net/sched: Use nested-BH locking for sch_frag_data_storageSebastian Andrzej Siewior
sch_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add local_lock_t to the struct and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250512092736.229935-12-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15net/sched: act_mirred: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior
mirred_nest_level is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-11-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>