summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-11x86/xen: Fix __xen_hypercall_setfunc()Jason Andryuk
Hypercall detection is failing with xen_hypercall_intel() chosen even on an AMD processor. Looking at the disassembly, the call to xen_get_vendor() was removed. The check for boot_cpu_has(X86_FEATURE_CPUID) was used as a proxy for the x86_vendor having been set. When CONFIG_X86_REQUIRED_FEATURE_CPUID=y (the default value), DCE eliminates the call to xen_get_vendor(). An uninitialized value 0 means X86_VENDOR_INTEL, so the Intel function is always returned. Remove the if and always call xen_get_vendor() to avoid this issue. Fixes: 3d37d9396eb3 ("x86/cpufeatures: Add {REQUIRED,DISABLED} feature configs") Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Xin Li (Intel)" <xin@zytor.com> Link: https://lore.kernel.org/r/20250410193106.16353-1-jason.andryuk@amd.com
2025-04-10Merge tag 'irq-urgent-2025-04-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc irqchip fixes from Ingo Molnar: - Fix NULL pointer dereference crashes due to missing .chip_flags setup in the sg2042-msi and irq-bcm2712-mip irqchip drivers - Remove the davinci aintc irqchip driver's leftover header too * tag 'irq-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-bcm2712-mip: Set EOI/ACK flags in msi_parent_ops irqchip/sg2042-msi: Add missing chip flags irqchip/davinci: Remove leftover header
2025-04-10Merge tag 'timers-urgent-2025-04-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timer fixes from Ingo Molnar: - Fix missing ACCESS_PRIVATE() that triggered a Sparse warning - Fix lockdep false positive in tick_freeze() on CONFIG_PREEMPT_RT=y - Avoid <vdso/unaligned.h> macro's variable shadowing to address build warning that triggers under W=2 builds * tag 'timers-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vdso: Address variable shadowing in macros timekeeping: Add a lockdep override in tick_freeze() hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::function
2025-04-10Merge tag 'x86-urgent-2025-04-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix CPU topology related regression that limited Xen PV guests to a single CPU - Fix ancient e820__register_nosave_regions() bugs that were causing problems with kexec's artificial memory maps - Fix an S4 hibernation crash caused by two missing ENDBR's that were mistakenly removed in a recent commit - Fix a resctrl serialization bug - Fix early_printk documentation and comments - Fix RSB bugs, combined with preparatory updates to better match the code to vendor recommendations. - Add RSB mitigation document - Fix/update documentation - Fix the erratum_1386_microcode[] table to be NULL terminated * tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ibt: Fix hibernate x86/cpu: Avoid running off the end of an AMD erratum table Documentation/x86: Zap the subsection letters Documentation/x86: Update the naming of CPU features for /proc/cpuinfo x86/bugs: Add RSB mitigation document x86/bugs: Don't fill RSB on context switch with eIBRS x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier() x86/bugs: Use SBPB in write_ibpb() if applicable x86/bugs: Rename entry_ibpb() to write_ibpb() x86/early_printk: Use 'mmio32' for consistency, fix comments x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions() x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
2025-04-10Merge tag 'perf-urgent-2025-04-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf events fixes from Ingo Molnar: - Fix __free_event() corner case splat - Fix false-positive uprobes related lockdep splat on CONFIG_PREEMPT_RT=y kernels - Fix a complicated perf sigtrap race that may result in hangs * tag 'perf-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix hang while freeing sigtrap event uprobes: Avoid false-positive lockdep splat on CONFIG_PREEMPT_RT=y in the ri_timer() uprobe timer callback, use raw_write_seqcount_*() perf/core: Fix WARN_ON(!ctx) in __free_event() for partial init
2025-04-10Merge tag 'objtool-urgent-2025-04-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc objtool fixes from Ingo Molnar: - Remove the recently introduced ANNOTATE_IGNORE_ALTERNATIVE noise from clac()/stac() code to make .s files more readable - Fix INSN_SYSCALL / INSN_SYSRET semantics - Fix various false-positive warnings * tag 'objtool-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix false-positive "ignoring unreachables" warning objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC objtool, xen: Fix INSN_SYSCALL / INSN_SYSRET semantics objtool: Stop UNRET validation on UD2 objtool: Split INSN_CONTEXT_SWITCH into INSN_SYSCALL and INSN_SYSRET objtool: Fix INSN_CONTEXT_SWITCH handling in validate_unret()
2025-04-10objtool: Fix false-positive "ignoring unreachables" warningJosh Poimboeuf
There's no need to try to automatically disable unreachable warnings if they've already been manually disabled due to CONFIG_KCOV quirks. This avoids a spurious warning with a KCOV kernel: fs/smb/client/cifs_unicode.o: warning: objtool: cifsConvertToUTF16.part.0+0xce5: ignoring unreachables due to jump table quirk Fixes: eeff7ac61526 ("objtool: Warn when disabling unreachable warnings") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/5eb28eeb6a724b7d945a961cfdcf8d41e6edf3dc.1744238814.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/r/202504090910.QkvTAR36-lkp@intel.com/
2025-04-10Merge tag 'net-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - core: hold instance lock during NETDEV_CHANGE - rtnetlink: fix bad unlock balance in do_setlink() - ipv6: - fix null-ptr-deref in addrconf_add_ifaddr() - align behavior across nexthops during path selection Previous releases - regressions: - sctp: prevent transport UaF in sendmsg - mptcp: only inc MPJoinAckHMacFailure for HMAC failures Previous releases - always broken: - sched: - make ->qlen_notify() idempotent - ensure sufficient space when sending filter netlink notifications - sch_sfq: really don't allow 1 packet limit - netfilter: fix incorrect avx2 match of 5th field octet - tls: explicitly disallow disconnect - eth: octeontx2-pf: fix VF root node parent queue priority" * tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) ethtool: cmis_cdb: Fix incorrect read / write length extension selftests: netfilter: add test case for recent mismatch bug nft_set_pipapo: fix incorrect avx2 match of 5th field octet net: ppp: Add bound checking for skb data on ppp_sync_txmung net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod. ipv6: Align behavior across nexthops during path selection net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend() selftests/tc-testing: sfq: check that a derived limit of 1 is rejected net_sched: sch_sfq: move the limit validation net_sched: sch_sfq: use a temporary work area for validating configuration net: libwx: handle page_pool_dev_alloc_pages error selftests: mptcp: validate MPJoin HMacFailure counters mptcp: only inc MPJoinAckHMacFailure for HMAC failures rtnetlink: Fix bad unlock balance in do_setlink(). net: ethtool: Don't call .cleanup_data when prepare_data fails tc: Ensure we have enough buffer space when sending filter netlink notifications net: libwx: Fix the wrong Rx descriptor field octeontx2-pf: qos: fix VF root node parent queue index selftests: tls: check that disconnect does nothing ...
2025-04-10Merge tag 'for-linus-6.15a-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A simple fix adding the module description of the Xenbus frontend module - A fix correcting the xen-acpi-processor Kconfig dependency for PVH Dom0 support - A fix for the Xen balloon driver when running as Xen Dom0 in PVH mode - A fix for PVH Dom0 in order to avoid problems with CPU idle and frequency drivers conflicting with Xen * tag 'for-linus-6.15a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: disable CPU idle and frequency drivers for PVH dom0 x86/xen: fix balloon target initialization for PVH dom0 xen: Change xen-acpi-processor dom0 dependency xenbus: add module description
2025-04-10Merge tag 'block-6.15-20250410' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Add a missing ublk selftest script, from test additions added last week - Two fixes for ublk error recovery and reissue - Cleanup of ublk argument passing * tag 'block-6.15-20250410' of git://git.kernel.dk/linux: ublk: pass ublksrv_ctrl_cmd * instead of io_uring_cmd * ublk: don't fail request for recovery & reissue in case of ubq->canceling ublk: fix handling recovery & reissue in ublk_abort_queue() selftests: ublk: fix test_stripe_04
2025-04-10Merge tag 'io_uring-6.15-20250410' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Reject zero sized legacy provided buffers upfront. No ill side effects from this one, only really done to shut up a silly syzbot test case. - Fix for a regression in tag posting for registered files or buffers, where the tag would be posted even when the registration failed. - two minor zcrx cleanups for code added this merge window. * tag 'io_uring-6.15-20250410' of git://git.kernel.dk/linux: io_uring/kbuf: reject zero sized provided buffers io_uring/zcrx: separate niov number from pages io_uring/zcrx: put refill data into separate cache line io_uring: don't post tag CQEs on file/buffer registration failure
2025-04-10Merge tag 'gpio-fixes-for-v6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix resource handling in gpio-tegra186 - fix wakeup source leaks in gpio-mpc8xxx and gpio-zynq - fix minor issues with some GPIO OF quirks - deprecate GPIOD_FLAGS_BIT_NONEXCLUSIVE and devm_gpiod_unhinge() symbols and add a TODO task to track replacing them with a better solution * tag 'gpio-fixes-for-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: of: Move Atmel HSMCI quirk up out of the regulator comment gpiolib: of: Fix the choice for Ingenic NAND quirk gpio: zynq: Fix wakeup source leaks on device unbind gpio: mpc8xxx: Fix wakeup source leaks on device unbind gpio: TODO: track the removal of regulator-related workarounds MAINTAINERS: add more keywords for the GPIO subsystem entry gpio: deprecate devm_gpiod_unhinge() gpio: deprecate the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag gpio: tegra186: fix resource handling in ACPI probe path
2025-04-10Merge tag 'mtd/fixes-for-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Two important fixes: the build of the SPI NAND layer with old GCC versions as well as the fix of the Qpic Makefile which was wrong in the first place. There are also two smaller fixes about a missing error and status check" * tag 'mtd/fixes-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: spinand: Fix build with gcc < 7.5 mtd: rawnand: Add status chack in r852_ready() mtd: inftlcore: Add error check for inftl_read_oob() mtd: nand: Drop explicit test for built-in CONFIG_SPI_QPIC_SNAND
2025-04-10ethtool: cmis_cdb: Fix incorrect read / write length extensionIdo Schimmel
The 'read_write_len_ext' field in 'struct ethtool_cmis_cdb_cmd_args' stores the maximum number of bytes that can be read from or written to the Local Payload (LPL) page in a single multi-byte access. Cited commit started overwriting this field with the maximum number of bytes that can be read from or written to the Extended Payload (LPL) pages in a single multi-byte access. Transceiver modules that support auto paging can advertise a number larger than 255 which is problematic as 'read_write_len_ext' is a 'u8', resulting in the number getting truncated and firmware flashing failing [1]. Fix by ignoring the maximum EPL access size as the kernel does not currently support auto paging (even if the transceiver module does) and will not try to read / write more than 128 bytes at once. [1] Transceiver module firmware flashing started for device enp177s0np0 Transceiver module firmware flashing in progress for device enp177s0np0 Progress: 0% Transceiver module firmware flashing encountered an error for device enp177s0np0 Status message: Write FW block EPL command failed, LPL length is longer than CDB read write length extension allows. Fixes: 9a3b0d078bd8 ("net: ethtool: Add support for writing firmware blocks using EPL payload") Reported-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Closes: https://lore.kernel.org/netdev/20250402183123.321036-3-michael.chan@broadcom.com/ Tested-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250409112440.365672-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-10Merge tag 'nf-25-04-10' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains a Netfilter fix and improved test coverage: 1) Fix AVX2 matching in nft_pipapo, from Florian Westphal. 2) Extend existing test to improve coverage for the aforementioned bug, also from Florian. netfilter pull request 25-04-10 * tag 'nf-25-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add test case for recent mismatch bug nft_set_pipapo: fix incorrect avx2 match of 5th field octet ==================== Link: https://patch.msgid.link/20250410103647.1030244-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-10selftests: netfilter: add test case for recent mismatch bugFlorian Westphal
Without 'nft_set_pipapo: fix incorrect avx2 match of 5th field octet" this fails: TEST: reported issues Add two elements, flush, re-add 1s [ OK ] net,mac with reload 0s [ OK ] net,port,proto 3s [ OK ] avx2 false match 0s [FAIL] False match for fe80:dead:01fe:0a02:0b03:6007:8009:a001 Other tests do not detect the kernel bug as they only alter parts in the /64 netmask. Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-10nft_set_pipapo: fix incorrect avx2 match of 5th field octetFlorian Westphal
Given a set element like: icmpv6 . dead:beef:00ff::1 The value of 'ff' is irrelevant, any address will be matched as long as the other octets are the same. This is because of too-early register clobbering: ymm7 is reloaded with new packet data (pkt[9]) but it still holds data of an earlier load that wasn't processed yet. The existing tests in nft_concat_range.sh selftests do exercise this code path, but do not trigger incorrect matching due to the network prefix limitation. Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Reported-by: sontu mazumdar <sontu21@gmail.com> Closes: https://lore.kernel.org/netfilter/CANgxkqwnMH7fXra+VUfODT-8+qFLgskq3set1cAzqqJaV4iEZg@mail.gmail.com/T/#t Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-10net: ppp: Add bound checking for skb data on ppp_sync_txmungArnaud Lecomte
Ensure we have enough data in linear buffer from skb before accessing initial bytes. This prevents potential out-of-bounds accesses when processing short packets. When ppp_sync_txmung receives an incoming package with an empty payload: (remote) gef➤ p *(struct pppoe_hdr *) (skb->head + skb->network_header) $18 = { type = 0x1, ver = 0x1, code = 0x0, sid = 0x2, length = 0x0, tag = 0xffff8880371cdb96 } from the skb struct (trimmed) tail = 0x16, end = 0x140, head = 0xffff88803346f400 "4", data = 0xffff88803346f416 ":\377", truesize = 0x380, len = 0x0, data_len = 0x0, mac_len = 0xe, hdr_len = 0x0, it is not safe to access data[2]. Reported-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=29fc8991b0ecb186cf40 Tested-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Link: https://patch.msgid.link/20250408-bound-checking-ppp_txmung-v2-1-94bb6e1b92d0@arnaud-lcm.com [pabeni@redhat.com: fixed subj typo] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-10vdso: Address variable shadowing in macrosPeng Jiang
Compiling the kernel with gcc12.3 W=2 results in shadowing warnings: warning: declaration of '__pptr' shadows a previous local [-Wshadow] const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); note: in definition of macro '__put_unaligned_t' __pptr->x = (val); note: in expansion of macro '__get_unaligned_t' __put_unaligned_t(type, __get_unaligned_t(type, src), dst); __get_unaligned_t() and __put_unaligned_t() use a local variable named '__pptr', which can lead to variable shadowing when these macros are used in the same scope. This results in a -Wshadow warning during compilation. To address this issue, rename the local variables within the macros to ensure uniqueness. Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250324191230477zpGtgIRSH4mEHdtxGtgx9@zte.com.cn
2025-04-09net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.Kuniyuki Iwashima
When I ran the repro [0] and waited a few seconds, I observed two LOCKDEP splats: a warning immediately followed by a null-ptr-deref. [1] Reproduction Steps: 1) Mount CIFS 2) Add an iptables rule to drop incoming FIN packets for CIFS 3) Unmount CIFS 4) Unload the CIFS module 5) Remove the iptables rule At step 3), the CIFS module calls sock_release() for the underlying TCP socket, and it returns quickly. However, the socket remains in FIN_WAIT_1 because incoming FIN packets are dropped. At this point, the module's refcnt is 0 while the socket is still alive, so the following rmmod command succeeds. # ss -tan State Recv-Q Send-Q Local Address:Port Peer Address:Port FIN-WAIT-1 0 477 10.0.2.15:51062 10.0.0.137:445 # lsmod | grep cifs cifs 1159168 0 This highlights a discrepancy between the lifetime of the CIFS module and the underlying TCP socket. Even after CIFS calls sock_release() and it returns, the TCP socket does not die immediately in order to close the connection gracefully. While this is generally fine, it causes an issue with LOCKDEP because CIFS assigns a different lock class to the TCP socket's sk->sk_lock using sock_lock_init_class_and_name(). Once an incoming packet is processed for the socket or a timer fires, sk->sk_lock is acquired. Then, LOCKDEP checks the lock context in check_wait_context(), where hlock_class() is called to retrieve the lock class. However, since the module has already been unloaded, hlock_class() logs a warning and returns NULL, triggering the null-ptr-deref. If LOCKDEP is enabled, we must ensure that a module calling sock_lock_init_class_and_name() (CIFS, NFS, etc) cannot be unloaded while such a socket is still alive to prevent this issue. Let's hold the module reference in sock_lock_init_class_and_name() and release it when the socket is freed in sk_prot_free(). Note that sock_lock_init() clears sk->sk_owner for svc_create_socket() that calls sock_lock_init_class_and_name() for a listening socket, which clones a socket by sk_clone_lock() without GFP_ZERO. [0]: CIFS_SERVER="10.0.0.137" CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST" DEV="enp0s3" CRED="/root/WindowsCredential.txt" MNT=$(mktemp -d /tmp/XXXXXX) mount -t cifs ${CIFS_PATH} ${MNT} -o vers=3.0,credentials=${CRED},cache=none,echo_interval=1 iptables -A INPUT -s ${CIFS_SERVER} -j DROP for i in $(seq 10); do umount ${MNT} rmmod cifs sleep 1 done rm -r ${MNT} iptables -D INPUT -s ${CIFS_SERVER} -j DROP [1]: DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) ... Call Trace: <IRQ> __lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178) lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ... BUG: kernel NULL pointer dereference, address: 00000000000000c4 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G W 6.14.0 #36 Tainted: [W]=WARN Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178) Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44 RSP: 0018:ffa0000000468a10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027 RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88 RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88 R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1 FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1)) ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234) ip_sublist_rcv_finish (net/ipv4/ip_input.c:576) ip_list_rcv_finish (net/ipv4/ip_input.c:628) ip_list_rcv (net/ipv4/ip_input.c:670) __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986) netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129) napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496) e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815) __napi_poll.constprop.0 (net/core/dev.c:7191) net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382) handle_softirqs (kernel/softirq.c:561) __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662) irq_exit_rcu (kernel/softirq.c:680) common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14)) </IRQ> <TASK> asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693) RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744) Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202 RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5 RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118) do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:315) common_startup_64 (arch/x86/kernel/head_64.S:421) </TASK> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CR2: 00000000000000c4 Fixes: ed07536ed673 ("[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250407163313.22682-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09ipv6: Align behavior across nexthops during path selectionIdo Schimmel
A nexthop is only chosen when the calculated multipath hash falls in the nexthop's hash region (i.e., the hash is smaller than the nexthop's hash threshold) and when the nexthop is assigned a non-negative score by rt6_score_route(). Commit 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop") introduced an unintentional difference between the first nexthop and the rest when the score is negative. When the first nexthop matches, but has a negative score, the code will currently evaluate subsequent nexthops until one is found with a non-negative score. On the other hand, when a different nexthop matches, but has a negative score, the code will fallback to the nexthop with which the selection started ('match'). Align the behavior across all nexthops and fallback to 'match' when the first nexthop matches, but has a negative score. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Fixes: 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop") Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Closes: https://lore.kernel.org/netdev/67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250408084316.243559-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: phy: allow MDIO bus PM ops to start/stop state machine for ↵Vladimir Oltean
phylink-controlled PHY DSA has 2 kinds of drivers: 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz 2. Those who don't: all others. The above methods should be optional. For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(), and dsa_switch_resume() calls dsa_user_resume() -> phylink_start(). These seem good candidates for setting mac_managed_pm = true because that is essentially its definition [1], but that does not seem to be the biggest problem for now, and is not what this change focuses on. Talking strictly about the 2nd category of DSA drivers here (which do not have MAC managed PM, meaning that for their attached PHYs, mdio_bus_phy_suspend() and mdio_bus_phy_resume() should run in full), I have noticed that the following warning from mdio_bus_phy_resume() is triggered: WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY && phydev->state != PHY_UP); because the PHY state machine is running. It's running as a result of a previous dsa_user_open() -> ... -> phylink_start() -> phy_start() having been initiated by the user. The previous mdio_bus_phy_suspend() was supposed to have called phy_stop_machine(), but it didn't. So this is why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume() runs. mdio_bus_phy_suspend() did not call phy_stop_machine() because for phylink, the phydev->adjust_link function pointer is NULL. This seems a technicality introduced by commit fddd91016d16 ("phylib: fix PAL state machine restart on resume"). That commit was written before phylink existed, and was intended to avoid crashing with consumer drivers which don't use the PHY state machine - phylink always does, when using a PHY. But phylink itself has historically not been developed with suspend/resume in mind, and apparently not tested too much in that scenario, allowing this bug to exist unnoticed for so long. Plus, prior to the WARN_ON(), it would have likely been invisible. This issue is not in fact restricted to type 2 DSA drivers (according to the above ad-hoc classification), but can be extrapolated to any MAC driver with phylink and MDIO-bus-managed PHY PM ops. DSA is just where the issue was reported. Assuming mac_managed_pm is set correctly, a quick search indicates the following other drivers might be affected: $ grep -Zlr PHYLINK_NETDEV drivers/ | xargs -0 grep -L mac_managed_pm drivers/net/ethernet/atheros/ag71xx.c drivers/net/ethernet/microchip/sparx5/sparx5_main.c drivers/net/ethernet/microchip/lan966x/lan966x_main.c drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c drivers/net/ethernet/freescale/dpaa/dpaa_eth.c drivers/net/ethernet/freescale/ucc_geth.c drivers/net/ethernet/freescale/enetc/enetc_pf_common.c drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c drivers/net/ethernet/marvell/mvneta.c drivers/net/ethernet/marvell/prestera/prestera_main.c drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/altera/altera_tse_main.c drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c drivers/net/ethernet/meta/fbnic/fbnic_phylink.c drivers/net/ethernet/tehuti/tn40_phy.c drivers/net/ethernet/mscc/ocelot_net.c Make the existing conditions dependent on the PHY device having a phydev->phy_link_change() implementation equal to the default phy_link_change() provided by phylib. Otherwise, we implicitly know that the phydev has the phylink-provided phylink_phy_change() callback, and when phylink is used, the PHY state machine always needs to be stopped/ started on the suspend/resume path. The code is structured as such that if phydev->phy_link_change() is absent, it is a matter of time until the kernel will crash - no need to further complicate the test. Thus, for the situation where the PM is not managed by the MAC, we will make the MDIO bus PM ops treat identically the phylink-controlled PHYs with the phylib-controlled PHYs where an adjust_link() callback is supplied. In both cases, the MDIO bus PM ops should stop and restart the PHY state machine. [1] https://lore.kernel.org/netdev/Z-1tiW9zjcoFkhwc@shell.armlinux.org.uk/ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Reported-by: Wei Fang <wei.fang@nxp.com> Tested-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250407094042.2155633-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()Vladimir Oltean
In an upcoming change, mdio_bus_phy_may_suspend() will need to distinguish a phylib-based PHY client from a phylink PHY client. For that, it will need to compare the phydev->phy_link_change() function pointer with the eponymous phy_link_change() provided by phylib. To avoid forward function declarations, the default PHY link state change method should be moved upwards. There is no functional change associated with this patch, it is only to reduce the noise from a real bug fix. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250407093900.2155112-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09Merge tag 'linux_kselftest-fixes-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - Fixes tpm2, futex, and mincore tests - Create a dedicated .gitignore for tpm2 tests * tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/mincore: Allow read-ahead pages to reach the end of the file selftests/futex: futex_waitv wouldblock test should fail selftests: tpm2: test_smoke: use POSIX-conformant expression operator selftests: tpm2: create a dedicated .gitignore
2025-04-09timekeeping: Add a lockdep override in tick_freeze()Sebastian Andrzej Siewior
tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver acquires a spinlock which becomes a sleeping lock on PREEMPT_RT. Lockdep complains about this lock nesting. Add a lockdep override for this special case and a comment explaining why it is okay. Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20250404133429.pnAzf-eF@linutronix.de Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/ Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
2025-04-09x86/ibt: Fix hibernatePeter Zijlstra
Todd reported, and Len confirmed, that commit 582077c94052 ("x86/cfi: Clean up linkage") broke S4 hiberate on a fair number of machines. Turns out these machines trip #CP when trying to restore the image. As it happens, the commit in question removes two ENDBR instructions in the hibernate code, and clearly got it wrong. Notably restore_image() does an indirect jump to relocated_restore_code(), which is a relocated copy of core_restore_code(). In turn, core_restore_code(), will at the end do an indirect jump to restore_jump_address (r8), which is pointing at a relocated restore_registers(). So both sites do indeed need to be ENDBR. Fixes: 582077c94052 ("x86/cfi: Clean up linkage") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Todd Brandt <todd.e.brandt@intel.com> Tested-by: Len Brown <len.brown@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219998 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219998
2025-04-09hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::functionNam Cao
The "function" field of struct hrtimer has been changed to private, but two instances have not been converted to use ACCESS_PRIVATE(). Convert them to use ACCESS_PRIVATE(). Fixes: 04257da0c99c ("hrtimers: Make callback function pointer private") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250408103854.1851093-1-namcao@linutronix.de Closes: https://lore.kernel.org/oe-kbuild-all/202504071931.vOVl13tt-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202504072155.5UAZjYGU-lkp@intel.com/
2025-04-09x86/cpu: Avoid running off the end of an AMD erratum tableDave Hansen
The NULL array terminator at the end of erratum_1386_microcode was removed during the switch from x86_cpu_desc to x86_cpu_id. This causes readers to run off the end of the array. Replace the NULL. Fixes: f3f325152673 ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2025-04-09ublk: pass ublksrv_ctrl_cmd * instead of io_uring_cmd *Caleb Sander Mateos
The ublk_ctrl_*() handlers all take struct io_uring_cmd *cmd but only use it to get struct ublksrv_ctrl_cmd *header from the io_uring SQE. Since the caller ublk_ctrl_uring_cmd() has already computed header, pass it instead of cmd. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250409012928.3527198-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09ublk: don't fail request for recovery & reissue in case of ubq->cancelingMing Lei
ubq->canceling is set with request queue quiesced when io_uring context is exiting. USER_RECOVERY or !RECOVERY_FAIL_IO requires request to be re-queued and re-dispatch after device is recovered. However commit d796cea7b9f3 ("ublk: implement ->queue_rqs()") still may fail any request in case of ubq->canceling, this way breaks USER_RECOVERY or !RECOVERY_FAIL_IO. Fix it by calling __ublk_abort_rq() in case of ubq->canceling. Reviewed-by: Uday Shankar <ushankar@purestorage.com> Reported-by: Uday Shankar <ushankar@purestorage.com> Closes: https://lore.kernel.org/linux-block/Z%2FQkkTRHfRxtN%2FmB@dev-ushankar.dev.purestorage.com/ Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250409011444.2142010-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09ublk: fix handling recovery & reissue in ublk_abort_queue()Ming Lei
Commit 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") doesn't grab request reference in case of recovery reissue. Then the request can be requeued & re-dispatch & failed when canceling uring command. If it is one zc request, the request can be freed before io_uring returns the zc buffer back, then cause kernel panic: [ 126.773061] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 126.773657] #PF: supervisor read access in kernel mode [ 126.774052] #PF: error_code(0x0000) - not-present page [ 126.774455] PGD 0 P4D 0 [ 126.774698] Oops: Oops: 0000 [#1] SMP NOPTI [ 126.775034] CPU: 13 UID: 0 PID: 1612 Comm: kworker/u64:55 Not tainted 6.14.0_blk+ #182 PREEMPT(full) [ 126.775676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 [ 126.776275] Workqueue: iou_exit io_ring_exit_work [ 126.776651] RIP: 0010:ublk_io_release+0x14/0x130 [ublk_drv] Fixes it by always grabbing request reference for aborting the request. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Closes: https://lore.kernel.org/linux-block/CADUfDZodKfOGUeWrnAxcZiLT+puaZX8jDHoj_sfHZCOZwhzz6A@mail.gmail.com/ Fixes: 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250409011444.2142010-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09Documentation/x86: Zap the subsection lettersBorislav Petkov (AMD)
The subsections already have numbering - no need for the letters too. Zap the latter. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250409111435.GEZ_ZWmz3_lkP8S9Lb@fat_crate.local
2025-04-09Documentation/x86: Update the naming of CPU features for /proc/cpuinfoNaveen N Rao (AMD)
Commit: 78ce84b9e0a5 ("x86/cpufeatures: Flip the /proc/cpuinfo appearance logic") changed how CPU feature names should be specified. Update document to reflect the same. Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250409111341.GDZ_ZWZS4LckBcirLE@fat_crate.local
2025-04-09Merge branch 'sch_sfq-derived-limit'David S. Miller
Octavian Purdila says: ==================== net_sched: sch_sfq: reject a derived limit of 1 Because sfq parameters can influence each other there can be situations where although the user sets a limit of 2 it can be lowered to 1: $ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1 $ tc qdisc show dev dummy0 qdisc sfq 1: dev dummy0 root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1024 $ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 10 depth 1 divisor 1 $ tc qdisc show dev dummy0 qdisc sfq 2: root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1 As a limit of 1 is invalid, this patch series moves the limit validation to after all configuration changes have been done. To do so, the configuration is done in a temporary work area then applied to the internal state. The patch series also adds new test cases. v3: - remove a couple of unnecessary comments - rearrange local variables to use reverse Christmas tree style declaration order v2: https://lore.kernel.org/all/20250402162750.1671155-1-tavip@google.com/ - remove tmp struct and directly use local variables v1: https://lore.kernel.org/all/20250328201634.3876474-1-tavip@google.com/ =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09selftests/tc-testing: sfq: check that a derived limit of 1 is rejectedOctavian Purdila
Because the limit is updated indirectly when other parameters are updated, there are cases where even though the user requests a limit of 2 it can actually be set to 1. Add the following test cases to check that the kernel rejects them: - limit 2 depth 1 flows 1 - limit 2 depth 1 divisor 1 Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09net_sched: sch_sfq: move the limit validationOctavian Purdila
It is not sufficient to directly validate the limit on the data that the user passes as it can be updated based on how the other parameters are changed. Move the check at the end of the configuration update process to also catch scenarios where the limit is indirectly updated, for example with the following configurations: tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1 tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 divisor 1 This fixes the following syzkaller reported crash: ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:6 index 65535 is out of range for type 'struct sfq_head[128]' CPU: 1 UID: 0 PID: 3037 Comm: syz.2.16 Not tainted 6.14.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x201/0x300 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_out_of_bounds+0xf5/0x120 lib/ubsan.c:429 sfq_link net/sched/sch_sfq.c:203 [inline] sfq_dec+0x53c/0x610 net/sched/sch_sfq.c:231 sfq_dequeue+0x34e/0x8c0 net/sched/sch_sfq.c:493 sfq_reset+0x17/0x60 net/sched/sch_sfq.c:518 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035 tbf_reset+0x41/0x110 net/sched/sch_tbf.c:339 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035 dev_reset_queue+0x100/0x1b0 net/sched/sch_generic.c:1311 netdev_for_each_tx_queue include/linux/netdevice.h:2590 [inline] dev_deactivate_many+0x7e5/0xe70 net/sched/sch_generic.c:1375 Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit") Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09net_sched: sch_sfq: use a temporary work area for validating configurationOctavian Purdila
Many configuration parameters have influence on others (e.g. divisor -> flows -> limit, depth -> limit) and so it is difficult to correctly do all of the validation before applying the configuration. And if a validation error is detected late it is difficult to roll back a partially applied configuration. To avoid these issues use a temporary work area to update and validate the configuration and only then apply the configuration to the internal state. Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09x86/bugs: Add RSB mitigation documentJosh Poimboeuf
Create a document to summarize hard-earned knowledge about RSB-related mitigations, with references, and replace the overly verbose yet incomplete comments with a reference to the document. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ab73f4659ba697a974759f07befd41ae605e33dd.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Don't fill RSB on context switch with eIBRSJosh Poimboeuf
User->user Spectre v2 attacks (including RSB) across context switches are already mitigated by IBPB in cond_mitigation(), if enabled globally or if either the prev or the next task has opted in to protection. RSB filling without IBPB serves no purpose for protecting user space, as indirect branches are still vulnerable. User->kernel RSB attacks are mitigated by eIBRS. In which case the RSB filling on context switch isn't needed, so remove it. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/98cdefe42180358efebf78e3b80752850c7a3e1b.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpolineJosh Poimboeuf
eIBRS protects against guest->host RSB underflow/poisoning attacks. Adding retpoline to the mix doesn't change that. Retpoline has a balanced CALL/RET anyway. So the current full RSB filling on VMEXIT with eIBRS+retpoline is overkill. Disable it or do the VMEXIT_LITE mitigation if needed. Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Link: https://lore.kernel.org/r/84a1226e5c9e2698eae1b5ade861f1b8bf3677dc.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()Josh Poimboeuf
IBPB is expected to clear the RSB. However, if X86_BUG_IBPB_NO_RET is set, that doesn't happen. Make indirect_branch_prediction_barrier() take that into account by calling write_ibpb() which clears RSB on X86_BUG_IBPB_NO_RET: /* Make sure IBPB clears return stack preductions too. */ FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET Note that, as of the previous patch, write_ibpb() also reads 'x86_pred_cmd' in order to use SBPB when applicable: movl _ASM_RIP(x86_pred_cmd), %eax Therefore that existing behavior in indirect_branch_prediction_barrier() is not lost. Fixes: 50e4b3b94090 ("x86/entry: Have entry_ibpb() invalidate return predictions") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/bba68888c511743d4cd65564d1fc41438907523f.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Use SBPB in write_ibpb() if applicableJosh Poimboeuf
write_ibpb() does IBPB, which (among other things) flushes branch type predictions on AMD. If the CPU has SRSO_NO, or if the SRSO mitigation has been disabled, branch type flushing isn't needed, in which case the lighter-weight SBPB can be used. The 'x86_pred_cmd' variable already keeps track of whether IBPB or SBPB should be used. Use that instead of hardcoding IBPB. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/17c5dcd14b29199b75199d67ff7758de9d9a4928.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Rename entry_ibpb() to write_ibpb()Josh Poimboeuf
There's nothing entry-specific about entry_ibpb(). In preparation for calling it from elsewhere, rename it to write_ibpb(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/1e54ace131e79b760de3fe828264e26d0896e3ac.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/early_printk: Use 'mmio32' for consistency, fix commentsAndy Shevchenko
First of all, using 'mmio' prevents proper implementation of 8-bit accessors. Second, it's simply inconsistent with uart8250 set of options. Rename it to 'mmio32'. While at it, remove rather misleading comment in the documentation. From now on mmio32 is self-explanatory and pciserial supports not only 32-bit MMIO accessors. Also, while at it, fix the comment for the "pciserial" case. The comment seems to be a copy'n'paste error when mentioning "serial" instead of "pciserial" (with double quotes). Fix this. With that, move it upper, so we don't calculate 'buf' twice. Fixes: 3181424aeac2 ("x86/early_printk: Add support for MMIO-based UARTs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Denis Mukhin <dmukhin@ford.com> Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
2025-04-09x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::nameJames Morse
Since 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.") a helper rdt_kn_name() that checks that rdtgroup_mutex is held has been used for all accesses to the kernfs node name. rdtgroup_mkdir() uses the name to determine if a valid monitor group is being created by checking the parent name is "mon_groups". This is done without holding rdtgroup_mutex, and now triggers the following warning: | WARNING: suspicious RCU usage | 6.15.0-rc1 #4465 Tainted: G E | ----------------------------- | arch/x86/kernel/cpu/resctrl/internal.h:408 suspicious rcu_dereference_check() usage! [...] | Call Trace: | <TASK> | dump_stack_lvl | lockdep_rcu_suspicious.cold | is_mon_groups | rdtgroup_mkdir | kernfs_iop_mkdir | vfs_mkdir | do_mkdirat | __x64_sys_mkdir | do_syscall_64 | entry_SYSCALL_64_after_hwframe Creating a control or monitor group calls mkdir_rdt_prepare(), which uses rdtgroup_kn_lock_live() to take the rdtgroup_mutex. To avoid taking and dropping the lock, move the check for the monitor group name and position into mkdir_rdt_prepare() so that it occurs under rdtgroup_mutex. Hoist is_mon_groups() earlier in the file. [ bp: Massage. ] Fixes: 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250407124637.2433230-1-james.morse@arm.com
2025-04-08Merge tag 'linux_kselftest-kunit-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah Khan: - Fix the tool to report test count in case of a late test plan when tests are specified before the test plan - Fix spelling error * tag 'linux_kselftest-kunit-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Spelling s/slowm/slow/ kunit: tool: fix count of tests if late test plan
2025-04-08net: libwx: handle page_pool_dev_alloc_pages errorChenyuan Yang
page_pool_dev_alloc_pages could return NULL. There was a WARN_ON(!page) but it would still proceed to use the NULL pointer and then crash. This is similar to commit 001ba0902046 ("net: fec: handle page_pool_dev_alloc_pages error"). This is found by our static analysis tool KNighter. Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250407184952.2111299-1-chenyuan0y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08Merge branch 'mptcp-only-inc-mpjoinackhmacfailure-for-hmac-failures'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: only inc MPJoinAckHMacFailure for HMAC failures Recently, during a debugging session using local MPTCP connections, I noticed MPJoinAckHMacFailure was strangely not zero on the server side. The first patch fixes this issue -- present since v5.9 -- and the second one validates it in the selftests. ==================== Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-0-3c9ecd0a3a50@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08selftests: mptcp: validate MPJoin HMacFailure countersMatthieu Baerts (NGI0)
The parent commit fixes an issue around these counters where one of them -- MPJoinAckHMacFailure -- was wrongly incremented in some cases. This makes sure the counter is always 0. It should be incremented only in case of corruption, or a wrong implementation, which should not be the case in these selftests. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-2-3c9ecd0a3a50@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08mptcp: only inc MPJoinAckHMacFailure for HMAC failuresMatthieu Baerts (NGI0)
Recently, during a debugging session using local MPTCP connections, I noticed MPJoinAckHMacFailure was not zero on the server side. The counter was in fact incremented when the PM rejected new subflows, because the 'subflow' limit was reached. The fix is easy, simply dissociating the two cases: only the HMAC validation check should increase MPTCP_MIB_JOINACKMAC counter. Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-1-3c9ecd0a3a50@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>