summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-10spi: fsl-spi: Remove redundant probe error messageKevin Hao
An error message is already emitted by the driver core function call_driver_probe() when the driver probe fails. Therefore, this redundant probe error message is removed. Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20250410-spi-v1-2-56e867cc19cf@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-10spi: fsl-qspi: Fix double cleanup in probe error pathKevin Hao
Commit 40369bfe717e ("spi: fsl-qspi: use devm function instead of driver remove") introduced managed cleanup via fsl_qspi_cleanup(), but incorrectly retain manual cleanup in two scenarios: - On devm_add_action_or_reset() failure, the function automatically call fsl_qspi_cleanup(). However, the current code still jumps to err_destroy_mutex, repeating cleanup. - After the fsl_qspi_cleanup() action is added successfully, there is no need to manually perform the cleanup in the subsequent error path. However, the current code still jumps to err_destroy_mutex on spi controller failure, repeating cleanup. Skip redundant manual cleanup calls to fix these issues. Cc: stable@vger.kernel.org Fixes: 40369bfe717e ("spi: fsl-qspi: use devm function instead of driver remove") Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20250410-spi-v1-1-56e867cc19cf@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-10Merge tag 'nf-25-04-10' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains a Netfilter fix and improved test coverage: 1) Fix AVX2 matching in nft_pipapo, from Florian Westphal. 2) Extend existing test to improve coverage for the aforementioned bug, also from Florian. netfilter pull request 25-04-10 * tag 'nf-25-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add test case for recent mismatch bug nft_set_pipapo: fix incorrect avx2 match of 5th field octet ==================== Link: https://patch.msgid.link/20250410103647.1030244-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-10selftests: netfilter: add test case for recent mismatch bugFlorian Westphal
Without 'nft_set_pipapo: fix incorrect avx2 match of 5th field octet" this fails: TEST: reported issues Add two elements, flush, re-add 1s [ OK ] net,mac with reload 0s [ OK ] net,port,proto 3s [ OK ] avx2 false match 0s [FAIL] False match for fe80:dead:01fe:0a02:0b03:6007:8009:a001 Other tests do not detect the kernel bug as they only alter parts in the /64 netmask. Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-10nft_set_pipapo: fix incorrect avx2 match of 5th field octetFlorian Westphal
Given a set element like: icmpv6 . dead:beef:00ff::1 The value of 'ff' is irrelevant, any address will be matched as long as the other octets are the same. This is because of too-early register clobbering: ymm7 is reloaded with new packet data (pkt[9]) but it still holds data of an earlier load that wasn't processed yet. The existing tests in nft_concat_range.sh selftests do exercise this code path, but do not trigger incorrect matching due to the network prefix limitation. Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Reported-by: sontu mazumdar <sontu21@gmail.com> Closes: https://lore.kernel.org/netfilter/CANgxkqwnMH7fXra+VUfODT-8+qFLgskq3set1cAzqqJaV4iEZg@mail.gmail.com/T/#t Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-10net: ppp: Add bound checking for skb data on ppp_sync_txmungArnaud Lecomte
Ensure we have enough data in linear buffer from skb before accessing initial bytes. This prevents potential out-of-bounds accesses when processing short packets. When ppp_sync_txmung receives an incoming package with an empty payload: (remote) gef➤ p *(struct pppoe_hdr *) (skb->head + skb->network_header) $18 = { type = 0x1, ver = 0x1, code = 0x0, sid = 0x2, length = 0x0, tag = 0xffff8880371cdb96 } from the skb struct (trimmed) tail = 0x16, end = 0x140, head = 0xffff88803346f400 "4", data = 0xffff88803346f416 ":\377", truesize = 0x380, len = 0x0, data_len = 0x0, mac_len = 0xe, hdr_len = 0x0, it is not safe to access data[2]. Reported-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=29fc8991b0ecb186cf40 Tested-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Link: https://patch.msgid.link/20250408-bound-checking-ppp_txmung-v2-1-94bb6e1b92d0@arnaud-lcm.com [pabeni@redhat.com: fixed subj typo] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-10vdso: Address variable shadowing in macrosPeng Jiang
Compiling the kernel with gcc12.3 W=2 results in shadowing warnings: warning: declaration of '__pptr' shadows a previous local [-Wshadow] const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); note: in definition of macro '__put_unaligned_t' __pptr->x = (val); note: in expansion of macro '__get_unaligned_t' __put_unaligned_t(type, __get_unaligned_t(type, src), dst); __get_unaligned_t() and __put_unaligned_t() use a local variable named '__pptr', which can lead to variable shadowing when these macros are used in the same scope. This results in a -Wshadow warning during compilation. To address this issue, rename the local variables within the macros to ensure uniqueness. Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250324191230477zpGtgIRSH4mEHdtxGtgx9@zte.com.cn
2025-04-10drm/rockchip: dw_hdmi_qp: Fix io init for dw_hdmi_qp_rockchip_resumeAndy Yan
Use cfg->ctrl_ops->io_init callback make it work for all platform. And it's also gets rid of code duplication Fixes: 3f60dbd40d3f ("drm/rockchip: dw_hdmi_qp: Add platform ctrl callback") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250317102757.565679-1-andyshrk@163.com
2025-04-10drm/rockchip: vop2: Fix interface enable/mux setting of DP1 on rk3588Andy Yan
This is a copy-paste error, which affects DP1 usage. Fixes: 328e6885996c ("drm/rockchip: vop2: Add platform specific callback") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250312064218.524143-1-andyshrk@163.com
2025-04-10Merge tag 'amd-drm-fixes-6.15-2025-04-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.15-2025-04-09: amdgpu: - MES FW version caching fixes - Only use GTT as a fallback if we already have a backing store - dma_buf fix - IP discovery fix - Replay and PSR with VRR fix - DC FP fixes - eDP fixes - KIQ TLB invalidate fix - Enable dmem groups support - Allow pinning VRAM dma bufs if imports can do P2P - Workload profile fixes - Prevent possible division by 0 in fan handling amdkfd: - Queue reset fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250409165238.1180153-1-alexander.deucher@amd.com
2025-04-10erofs: remove duplicate codeBo Liu
Remove duplicate code in function z_erofs_register_pcluster() Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250410042048.3044-2-liubo03@inspur.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-04-09smb3: Add defines for two new FileSystemAttributesSteve French
Two new file system attributes were recently added. See MS-FSCC 2.5.1: FILE_SUPPORTS_POSIX_UNLINK_RENAME and FILE_RETURNS_CLEANUP_RESULT_INFO Update the missing defines for ksmbd and cifs.ko Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-10Merge tag 'drm-intel-fixes-2025-04-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes drm/i915 fixes for v6.15-rc2: - Fix scanline offset for LNL+ and BMG+ - Fix GVT unterminated-string-initialization build warning - Fix DP rate limit when sink doesn't support TPS4 - Handle GDDR + ECC memory type detection - Fix VRR parameter change check - Fix fence not released on early probe errors - Disable render power gating during live selftests Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/87lds9wlpq.fsf@intel.com
2025-04-09Merge branch 'support-skf_net_off-and-skf_ll_off-on-skb-frags'Alexei Starovoitov
Willem de Bruijn says: ==================== support SKF_NET_OFF and SKF_LL_OFF on skb frags From: Willem de Bruijn <willemb@google.com> Address a longstanding issue that may lead to missed packets depending on system configuration. Ensure that reading from packet contents works regardless of skb geometry, also when using the special SKF_.. negative offsets to offset from L2 or L3 header. Patch 2 is the selftest for the fix. v2->v3 - do not remove bpf_internal_load_pointer_neg_helper, because it is still used in the sparc32 JIT v1->v2 - introduce bfp_skb_load_helper_convert_offset to avoid open coding - selftest: add comment why early demux must be disabled v2: https://lore.kernel.org/netdev/20250404142633.1955847-1-willemdebruijn.kernel@gmail.com/ v1: https://lore.kernel.org/netdev/20250403140846.1268564-1-willemdebruijn.kernel@gmail.com/ ==================== Link: https://patch.msgid.link/20250408132833.195491-1-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09selftests/net: test sk_filter support for SKF_NET_OFF on fragsWillem de Bruijn
Verify that a classic BPF linux socket filter correctly matches packet contents. Including when accessing contents in an skb_frag. 1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000. 2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags. 3. Send a packet for which the UDP header is in frag[0]. 4. Receive this packet to demonstrate that the socket accepted it. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20250408132833.195491-3-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09bpf: support SKF_NET_OFF and SKF_LL_OFF on skb fragsWillem de Bruijn
Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to read when these offsets extend into frags. This has been observed with iwlwifi and reproduced with tun with IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, applied to a RAW socket, will silently miss matching packets. const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); struct sock_filter filter_code[] = { BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), This is unexpected behavior. Socket filter programs should be consistent regardless of environment. Silent misses are particularly concerning as hard to detect. Use skb_copy_bits for offsets outside linear, same as done for non-SKF_(LL|NET) offsets. Offset is always positive after subtracting the reference threshold SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of the two is an offset against skb->data, and may be negative, but it cannot point before skb->head, as skb_(mac|network)_offset would too. This appears to go back to when frag support was introduced to sk_run_filter in linux-2.4.4, before the introduction of git. The amount of code change and 8/16/32 bit duplication are unfortunate. But any attempt I made to be smarter saved very few LoC while complicating the code. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 Reported-by: Matt Moeller <moeller.matt@gmail.com> Co-developed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/r/20250408132833.195491-2-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.Kuniyuki Iwashima
When I ran the repro [0] and waited a few seconds, I observed two LOCKDEP splats: a warning immediately followed by a null-ptr-deref. [1] Reproduction Steps: 1) Mount CIFS 2) Add an iptables rule to drop incoming FIN packets for CIFS 3) Unmount CIFS 4) Unload the CIFS module 5) Remove the iptables rule At step 3), the CIFS module calls sock_release() for the underlying TCP socket, and it returns quickly. However, the socket remains in FIN_WAIT_1 because incoming FIN packets are dropped. At this point, the module's refcnt is 0 while the socket is still alive, so the following rmmod command succeeds. # ss -tan State Recv-Q Send-Q Local Address:Port Peer Address:Port FIN-WAIT-1 0 477 10.0.2.15:51062 10.0.0.137:445 # lsmod | grep cifs cifs 1159168 0 This highlights a discrepancy between the lifetime of the CIFS module and the underlying TCP socket. Even after CIFS calls sock_release() and it returns, the TCP socket does not die immediately in order to close the connection gracefully. While this is generally fine, it causes an issue with LOCKDEP because CIFS assigns a different lock class to the TCP socket's sk->sk_lock using sock_lock_init_class_and_name(). Once an incoming packet is processed for the socket or a timer fires, sk->sk_lock is acquired. Then, LOCKDEP checks the lock context in check_wait_context(), where hlock_class() is called to retrieve the lock class. However, since the module has already been unloaded, hlock_class() logs a warning and returns NULL, triggering the null-ptr-deref. If LOCKDEP is enabled, we must ensure that a module calling sock_lock_init_class_and_name() (CIFS, NFS, etc) cannot be unloaded while such a socket is still alive to prevent this issue. Let's hold the module reference in sock_lock_init_class_and_name() and release it when the socket is freed in sk_prot_free(). Note that sock_lock_init() clears sk->sk_owner for svc_create_socket() that calls sock_lock_init_class_and_name() for a listening socket, which clones a socket by sk_clone_lock() without GFP_ZERO. [0]: CIFS_SERVER="10.0.0.137" CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST" DEV="enp0s3" CRED="/root/WindowsCredential.txt" MNT=$(mktemp -d /tmp/XXXXXX) mount -t cifs ${CIFS_PATH} ${MNT} -o vers=3.0,credentials=${CRED},cache=none,echo_interval=1 iptables -A INPUT -s ${CIFS_SERVER} -j DROP for i in $(seq 10); do umount ${MNT} rmmod cifs sleep 1 done rm -r ${MNT} iptables -D INPUT -s ${CIFS_SERVER} -j DROP [1]: DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) ... Call Trace: <IRQ> __lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178) lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ... BUG: kernel NULL pointer dereference, address: 00000000000000c4 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G W 6.14.0 #36 Tainted: [W]=WARN Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178) Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44 RSP: 0018:ffa0000000468a10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027 RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88 RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88 R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1 FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1)) ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234) ip_sublist_rcv_finish (net/ipv4/ip_input.c:576) ip_list_rcv_finish (net/ipv4/ip_input.c:628) ip_list_rcv (net/ipv4/ip_input.c:670) __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986) netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129) napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496) e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815) __napi_poll.constprop.0 (net/core/dev.c:7191) net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382) handle_softirqs (kernel/softirq.c:561) __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662) irq_exit_rcu (kernel/softirq.c:680) common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14)) </IRQ> <TASK> asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693) RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744) Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202 RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5 RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118) do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:315) common_startup_64 (arch/x86/kernel/head_64.S:421) </TASK> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CR2: 00000000000000c4 Fixes: ed07536ed673 ("[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250407163313.22682-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09ipv6: Align behavior across nexthops during path selectionIdo Schimmel
A nexthop is only chosen when the calculated multipath hash falls in the nexthop's hash region (i.e., the hash is smaller than the nexthop's hash threshold) and when the nexthop is assigned a non-negative score by rt6_score_route(). Commit 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop") introduced an unintentional difference between the first nexthop and the rest when the score is negative. When the first nexthop matches, but has a negative score, the code will currently evaluate subsequent nexthops until one is found with a non-negative score. On the other hand, when a different nexthop matches, but has a negative score, the code will fallback to the nexthop with which the selection started ('match'). Align the behavior across all nexthops and fallback to 'match' when the first nexthop matches, but has a negative score. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Fixes: 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop") Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Closes: https://lore.kernel.org/netdev/67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250408084316.243559-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: phy: allow MDIO bus PM ops to start/stop state machine for ↵Vladimir Oltean
phylink-controlled PHY DSA has 2 kinds of drivers: 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz 2. Those who don't: all others. The above methods should be optional. For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(), and dsa_switch_resume() calls dsa_user_resume() -> phylink_start(). These seem good candidates for setting mac_managed_pm = true because that is essentially its definition [1], but that does not seem to be the biggest problem for now, and is not what this change focuses on. Talking strictly about the 2nd category of DSA drivers here (which do not have MAC managed PM, meaning that for their attached PHYs, mdio_bus_phy_suspend() and mdio_bus_phy_resume() should run in full), I have noticed that the following warning from mdio_bus_phy_resume() is triggered: WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY && phydev->state != PHY_UP); because the PHY state machine is running. It's running as a result of a previous dsa_user_open() -> ... -> phylink_start() -> phy_start() having been initiated by the user. The previous mdio_bus_phy_suspend() was supposed to have called phy_stop_machine(), but it didn't. So this is why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume() runs. mdio_bus_phy_suspend() did not call phy_stop_machine() because for phylink, the phydev->adjust_link function pointer is NULL. This seems a technicality introduced by commit fddd91016d16 ("phylib: fix PAL state machine restart on resume"). That commit was written before phylink existed, and was intended to avoid crashing with consumer drivers which don't use the PHY state machine - phylink always does, when using a PHY. But phylink itself has historically not been developed with suspend/resume in mind, and apparently not tested too much in that scenario, allowing this bug to exist unnoticed for so long. Plus, prior to the WARN_ON(), it would have likely been invisible. This issue is not in fact restricted to type 2 DSA drivers (according to the above ad-hoc classification), but can be extrapolated to any MAC driver with phylink and MDIO-bus-managed PHY PM ops. DSA is just where the issue was reported. Assuming mac_managed_pm is set correctly, a quick search indicates the following other drivers might be affected: $ grep -Zlr PHYLINK_NETDEV drivers/ | xargs -0 grep -L mac_managed_pm drivers/net/ethernet/atheros/ag71xx.c drivers/net/ethernet/microchip/sparx5/sparx5_main.c drivers/net/ethernet/microchip/lan966x/lan966x_main.c drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c drivers/net/ethernet/freescale/dpaa/dpaa_eth.c drivers/net/ethernet/freescale/ucc_geth.c drivers/net/ethernet/freescale/enetc/enetc_pf_common.c drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c drivers/net/ethernet/marvell/mvneta.c drivers/net/ethernet/marvell/prestera/prestera_main.c drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/altera/altera_tse_main.c drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c drivers/net/ethernet/meta/fbnic/fbnic_phylink.c drivers/net/ethernet/tehuti/tn40_phy.c drivers/net/ethernet/mscc/ocelot_net.c Make the existing conditions dependent on the PHY device having a phydev->phy_link_change() implementation equal to the default phy_link_change() provided by phylib. Otherwise, we implicitly know that the phydev has the phylink-provided phylink_phy_change() callback, and when phylink is used, the PHY state machine always needs to be stopped/ started on the suspend/resume path. The code is structured as such that if phydev->phy_link_change() is absent, it is a matter of time until the kernel will crash - no need to further complicate the test. Thus, for the situation where the PM is not managed by the MAC, we will make the MDIO bus PM ops treat identically the phylink-controlled PHYs with the phylib-controlled PHYs where an adjust_link() callback is supplied. In both cases, the MDIO bus PM ops should stop and restart the PHY state machine. [1] https://lore.kernel.org/netdev/Z-1tiW9zjcoFkhwc@shell.armlinux.org.uk/ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Reported-by: Wei Fang <wei.fang@nxp.com> Tested-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250407094042.2155633-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()Vladimir Oltean
In an upcoming change, mdio_bus_phy_may_suspend() will need to distinguish a phylib-based PHY client from a phylink PHY client. For that, it will need to compare the phydev->phy_link_change() function pointer with the eponymous phy_link_change() provided by phylib. To avoid forward function declarations, the default PHY link state change method should be moved upwards. There is no functional change associated with this patch, it is only to reduce the noise from a real bug fix. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250407093900.2155112-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09Merge tag 'linux_kselftest-fixes-6.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - Fixes tpm2, futex, and mincore tests - Create a dedicated .gitignore for tpm2 tests * tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/mincore: Allow read-ahead pages to reach the end of the file selftests/futex: futex_waitv wouldblock test should fail selftests: tpm2: test_smoke: use POSIX-conformant expression operator selftests: tpm2: create a dedicated .gitignore
2025-04-09cifs: Fix querying of WSL CHR and BLK reparse points over SMB1Pali Rohár
When reparse point in SMB1 query_path_info() callback was detected then query also for EA $LXDEV. In this EA are stored device major and minor numbers used by WSL CHR and BLK reparse points. Without major and minor numbers, stat() syscall does not work for char and block devices. Similar code is already in SMB2+ query_path_info() callback function. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09cifs: Split parse_reparse_point callback to functions: get buffer and parse ↵Pali Rohár
buffer Parsing reparse point buffer is generic for all SMB versions and is already implemented by global function parse_reparse_point(). Getting reparse point buffer from the SMB response is SMB version specific, so introduce for it a new callback get_reparse_point_buffer. This functionality split is needed for followup change - getting reparse point buffer without parsing it. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09cifs: Improve handling of name surrogate reparse points in reparse.cPali Rohár
Like previous changes for file inode.c, handle directory name surrogate reparse points generally also in reparse.c. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09cifs: Remove explicit handling of IO_REPARSE_TAG_MOUNT_POINT in inode.cPali Rohár
IO_REPARSE_TAG_MOUNT_POINT is just a specific case of directory Name Surrogate reparse point. As reparse_info_to_fattr() already handles all directory Name Surrogate reparse point (done by the previous change), there is no need to have explicit case for IO_REPARSE_TAG_MOUNT_POINT. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09timekeeping: Add a lockdep override in tick_freeze()Sebastian Andrzej Siewior
tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver acquires a spinlock which becomes a sleeping lock on PREEMPT_RT. Lockdep complains about this lock nesting. Add a lockdep override for this special case and a comment explaining why it is okay. Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20250404133429.pnAzf-eF@linutronix.de Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/ Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
2025-04-09x86/ibt: Fix hibernatePeter Zijlstra
Todd reported, and Len confirmed, that commit 582077c94052 ("x86/cfi: Clean up linkage") broke S4 hiberate on a fair number of machines. Turns out these machines trip #CP when trying to restore the image. As it happens, the commit in question removes two ENDBR instructions in the hibernate code, and clearly got it wrong. Notably restore_image() does an indirect jump to relocated_restore_code(), which is a relocated copy of core_restore_code(). In turn, core_restore_code(), will at the end do an indirect jump to restore_jump_address (r8), which is pointing at a relocated restore_registers(). So both sites do indeed need to be ENDBR. Fixes: 582077c94052 ("x86/cfi: Clean up linkage") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Todd Brandt <todd.e.brandt@intel.com> Tested-by: Len Brown <len.brown@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219998 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219998
2025-04-09hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::functionNam Cao
The "function" field of struct hrtimer has been changed to private, but two instances have not been converted to use ACCESS_PRIVATE(). Convert them to use ACCESS_PRIVATE(). Fixes: 04257da0c99c ("hrtimers: Make callback function pointer private") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250408103854.1851093-1-namcao@linutronix.de Closes: https://lore.kernel.org/oe-kbuild-all/202504071931.vOVl13tt-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202504072155.5UAZjYGU-lkp@intel.com/
2025-04-09cifs: Fix encoding of SMB1 Session Setup Kerberos Request in non-UNICODE modePali Rohár
Like in UNICODE mode, SMB1 Session Setup Kerberos Request contains oslm and domain strings. Extract common code into ascii_oslm_strings() and ascii_domain_string() functions (similar to unicode variants) and use these functions in non-UNICODE code path in sess_auth_kerberos(). Decision if non-UNICODE or UNICODE mode is used is based on the SMBFLG2_UNICODE flag in Flags2 packed field, and not based on the capabilities of server. Fix this check too. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09tracing: Do not add length to print format in synthetic eventsSteven Rostedt
The following causes a vsnprintf fault: # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger Because the synthetic event's "wakee" field is created as a dynamic string (even though the string copied is not). The print format to print the dynamic string changed from "%*s" to "%s" because another location (__set_synth_event_print_fmt()) exported this to user space, and user space did not need that. But it is still used in print_synth_event(), and the output looks like: <idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155 sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58 <idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91 bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21 <idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129 sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50 The length isn't needed as the string is always nul terminated. Just print the string and not add the length (which was hard coded to the max string length anyway). Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Douglas Raillard <douglas.raillard@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields"); Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-09smb: client: fix UAF in decryption with multichannelPaulo Alcantara
After commit f7025d861694 ("smb: client: allocate crypto only for primary server") and commit b0abcd65ec54 ("smb: client: fix UAF in async decryption"), the channels started reusing AEAD TFM from primary channel to perform synchronous decryption, but that can't done as there could be multiple cifsd threads (one per channel) simultaneously accessing it to perform decryption. This fixes the following KASAN splat when running fstest generic/249 with 'vers=3.1.1,multichannel,max_channels=4,seal' against Windows Server 2022: BUG: KASAN: slab-use-after-free in gf128mul_4k_lle+0xba/0x110 Read of size 8 at addr ffff8881046c18a0 by task cifsd/986 CPU: 3 UID: 0 PID: 986 Comm: cifsd Not tainted 6.15.0-rc1 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 print_report+0x156/0x528 ? gf128mul_4k_lle+0xba/0x110 ? __virt_addr_valid+0x145/0x300 ? __phys_addr+0x46/0x90 ? gf128mul_4k_lle+0xba/0x110 kasan_report+0xdf/0x1a0 ? gf128mul_4k_lle+0xba/0x110 gf128mul_4k_lle+0xba/0x110 ghash_update+0x189/0x210 shash_ahash_update+0x295/0x370 ? __pfx_shash_ahash_update+0x10/0x10 ? __pfx_shash_ahash_update+0x10/0x10 ? __pfx_extract_iter_to_sg+0x10/0x10 ? ___kmalloc_large_node+0x10e/0x180 ? __asan_memset+0x23/0x50 crypto_ahash_update+0x3c/0xc0 gcm_hash_assoc_remain_continue+0x93/0xc0 crypt_message+0xe09/0xec0 [cifs] ? __pfx_crypt_message+0x10/0x10 [cifs] ? _raw_spin_unlock+0x23/0x40 ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs] decrypt_raw_data+0x229/0x380 [cifs] ? __pfx_decrypt_raw_data+0x10/0x10 [cifs] ? __pfx_cifs_read_iter_from_socket+0x10/0x10 [cifs] smb3_receive_transform+0x837/0xc80 [cifs] ? __pfx_smb3_receive_transform+0x10/0x10 [cifs] ? __pfx___might_resched+0x10/0x10 ? __pfx_smb3_is_transform_hdr+0x10/0x10 [cifs] cifs_demultiplex_thread+0x692/0x1570 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? rcu_is_watching+0x20/0x50 ? rcu_lockdep_current_cpu_online+0x62/0xb0 ? find_held_lock+0x32/0x90 ? kvm_sched_clock_read+0x11/0x20 ? local_clock_noinstr+0xd/0xd0 ? trace_irq_enable.constprop.0+0xa8/0xe0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x1fe/0x380 ? kthread+0x10f/0x380 ? __pfx_kthread+0x10/0x10 ? local_clock_noinstr+0xd/0xd0 ? ret_from_fork+0x1b/0x60 ? local_clock+0x15/0x30 ? lock_release+0x29b/0x390 ? rcu_is_watching+0x20/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Tested-by: David Howells <dhowells@redhat.com> Reported-by: Steve French <stfrench@microsoft.com> Closes: https://lore.kernel.org/r/CAH2r5mu6Yc0-RJXM3kFyBYUB09XmXBrNodOiCVR4EDrmxq5Szg@mail.gmail.com Fixes: f7025d861694 ("smb: client: allocate crypto only for primary server") Fixes: b0abcd65ec54 ("smb: client: fix UAF in async decryption") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-09x86/cpu: Avoid running off the end of an AMD erratum tableDave Hansen
The NULL array terminator at the end of erratum_1386_microcode was removed during the switch from x86_cpu_desc to x86_cpu_id. This causes readers to run off the end of the array. Replace the NULL. Fixes: f3f325152673 ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2025-04-09erofs: fix encoded extents handlingGao Xiang
- The MSB 32 bits of `z_fragmentoff` are available only in extent records of size >= 8B. - Use round_down() to calculate `lstart` as well as increase `pos` correspondingly for extent records of size == 8B. Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250408114448.4040220-2-hsiangkao@linux.alibaba.com
2025-04-09erofs: add __packed annotation to union(__le16..)Gao Xiang
I'm unsure why they aren't 2 bytes in size only in arm-linux-gnueabi. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202504051202.DS7QIknJ-lkp@intel.com Fixes: 61ba89b57905 ("erofs: add 48-bit block addressing on-disk support") Fixes: efb2aef569b3 ("erofs: add encoded extent on-disk definition") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250408114448.4040220-1-hsiangkao@linux.alibaba.com
2025-04-09erofs: set error to bio if file-backed IO failsSheng Yong
If a file-backed IO fails before submitting the bio to the lower filesystem, an error is returned, but the bio->bi_status is not marked as an error. However, the error information should be passed to the end_io handler. Otherwise, the IO request will be treated as successful. Fixes: 283213718f5d ("erofs: support compressed inodes for fileio") Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250408122351.2104507-1-shengyong1@xiaomi.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-04-09drm/amdgpu/mes12: optimize MES pipe FW version fetchingAlex Deucher
Don't fetch it again if we already have it. It seems the registers don't reliably have the value at resume in some cases. Fixes: 785f0f9fe742 ("drm/amdgpu: Add mes v12_0 ip block support (v4)") Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9e7b08d239c2f21e8f417854f81e5ff40edbebff) Cc: stable@vger.kernel.org # 6.12.x
2025-04-09drm/amd/pm/smu11: Prevent division by zeroDenis Arefev
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1e866f1fe528 ("drm/amd/pm: Prevent divide by zero") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit da7dc714a8f8e1c9fc33c57cd63583779a3bef71) Cc: stable@vger.kernel.org
2025-04-09drm/amdgpu: cancel gfx idle work in device suspend for s0ixAlex Deucher
This is normally handled in the gfx IP suspend callbacks, but for S0ix, those are skipped because we don't want to touch gfx. So handle it in device suspend. Fixes: b9467983b774 ("drm/amdgpu: add dynamic workload profile switching for gfx10") Fixes: 963537ca2325 ("drm/amdgpu: add dynamic workload profile switching for gfx11") Fixes: 5f95a1549555 ("drm/amdgpu: add dynamic workload profile switching for gfx12") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 906ad451675155380c1dc1881a244ebde8e8df0a) Cc: stable@vger.kernel.org
2025-04-09drm/amd/display: pause the workload setting in dmKenneth Feng
Pause the workload setting in dm when doing idle optimization Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b23f81c442ac33af0c808b4bb26333b881669bb7)
2025-04-09drm/amdgpu/pm/swsmu: implement pause workload profileAlex Deucher
Add the callback for implementation for swsmu. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92e511d1cecc6a8fa7bdfc8657f16ece9ab4d456)
2025-04-09drm/amdgpu/pm: add workload profile pause helperAlex Deucher
To be used for display idle optimizations when we want to pause non-default profiles. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6dafb5d4c7cdfc8f994e789d050e29e0d5ca6efd)
2025-04-09ublk: pass ublksrv_ctrl_cmd * instead of io_uring_cmd *Caleb Sander Mateos
The ublk_ctrl_*() handlers all take struct io_uring_cmd *cmd but only use it to get struct ublksrv_ctrl_cmd *header from the io_uring SQE. Since the caller ublk_ctrl_uring_cmd() has already computed header, pass it instead of cmd. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250409012928.3527198-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09ublk: don't fail request for recovery & reissue in case of ubq->cancelingMing Lei
ubq->canceling is set with request queue quiesced when io_uring context is exiting. USER_RECOVERY or !RECOVERY_FAIL_IO requires request to be re-queued and re-dispatch after device is recovered. However commit d796cea7b9f3 ("ublk: implement ->queue_rqs()") still may fail any request in case of ubq->canceling, this way breaks USER_RECOVERY or !RECOVERY_FAIL_IO. Fix it by calling __ublk_abort_rq() in case of ubq->canceling. Reviewed-by: Uday Shankar <ushankar@purestorage.com> Reported-by: Uday Shankar <ushankar@purestorage.com> Closes: https://lore.kernel.org/linux-block/Z%2FQkkTRHfRxtN%2FmB@dev-ushankar.dev.purestorage.com/ Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250409011444.2142010-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09ublk: fix handling recovery & reissue in ublk_abort_queue()Ming Lei
Commit 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") doesn't grab request reference in case of recovery reissue. Then the request can be requeued & re-dispatch & failed when canceling uring command. If it is one zc request, the request can be freed before io_uring returns the zc buffer back, then cause kernel panic: [ 126.773061] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 126.773657] #PF: supervisor read access in kernel mode [ 126.774052] #PF: error_code(0x0000) - not-present page [ 126.774455] PGD 0 P4D 0 [ 126.774698] Oops: Oops: 0000 [#1] SMP NOPTI [ 126.775034] CPU: 13 UID: 0 PID: 1612 Comm: kworker/u64:55 Not tainted 6.14.0_blk+ #182 PREEMPT(full) [ 126.775676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 [ 126.776275] Workqueue: iou_exit io_ring_exit_work [ 126.776651] RIP: 0010:ublk_io_release+0x14/0x130 [ublk_drv] Fixes it by always grabbing request reference for aborting the request. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Closes: https://lore.kernel.org/linux-block/CADUfDZodKfOGUeWrnAxcZiLT+puaZX8jDHoj_sfHZCOZwhzz6A@mail.gmail.com/ Fixes: 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250409011444.2142010-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-09Documentation/x86: Zap the subsection lettersBorislav Petkov (AMD)
The subsections already have numbering - no need for the letters too. Zap the latter. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250409111435.GEZ_ZWmz3_lkP8S9Lb@fat_crate.local
2025-04-09Documentation/x86: Update the naming of CPU features for /proc/cpuinfoNaveen N Rao (AMD)
Commit: 78ce84b9e0a5 ("x86/cpufeatures: Flip the /proc/cpuinfo appearance logic") changed how CPU feature names should be specified. Update document to reflect the same. Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250409111341.GDZ_ZWZS4LckBcirLE@fat_crate.local
2025-04-09Merge branch 'sch_sfq-derived-limit'David S. Miller
Octavian Purdila says: ==================== net_sched: sch_sfq: reject a derived limit of 1 Because sfq parameters can influence each other there can be situations where although the user sets a limit of 2 it can be lowered to 1: $ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1 $ tc qdisc show dev dummy0 qdisc sfq 1: dev dummy0 root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1024 $ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 10 depth 1 divisor 1 $ tc qdisc show dev dummy0 qdisc sfq 2: root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1 As a limit of 1 is invalid, this patch series moves the limit validation to after all configuration changes have been done. To do so, the configuration is done in a temporary work area then applied to the internal state. The patch series also adds new test cases. v3: - remove a couple of unnecessary comments - rearrange local variables to use reverse Christmas tree style declaration order v2: https://lore.kernel.org/all/20250402162750.1671155-1-tavip@google.com/ - remove tmp struct and directly use local variables v1: https://lore.kernel.org/all/20250328201634.3876474-1-tavip@google.com/ =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09selftests/tc-testing: sfq: check that a derived limit of 1 is rejectedOctavian Purdila
Because the limit is updated indirectly when other parameters are updated, there are cases where even though the user requests a limit of 2 it can actually be set to 1. Add the following test cases to check that the kernel rejects them: - limit 2 depth 1 flows 1 - limit 2 depth 1 divisor 1 Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09net_sched: sch_sfq: move the limit validationOctavian Purdila
It is not sufficient to directly validate the limit on the data that the user passes as it can be updated based on how the other parameters are changed. Move the check at the end of the configuration update process to also catch scenarios where the limit is indirectly updated, for example with the following configurations: tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1 tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 divisor 1 This fixes the following syzkaller reported crash: ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:6 index 65535 is out of range for type 'struct sfq_head[128]' CPU: 1 UID: 0 PID: 3037 Comm: syz.2.16 Not tainted 6.14.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x201/0x300 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_out_of_bounds+0xf5/0x120 lib/ubsan.c:429 sfq_link net/sched/sch_sfq.c:203 [inline] sfq_dec+0x53c/0x610 net/sched/sch_sfq.c:231 sfq_dequeue+0x34e/0x8c0 net/sched/sch_sfq.c:493 sfq_reset+0x17/0x60 net/sched/sch_sfq.c:518 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035 tbf_reset+0x41/0x110 net/sched/sch_tbf.c:339 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035 dev_reset_queue+0x100/0x1b0 net/sched/sch_generic.c:1311 netdev_for_each_tx_queue include/linux/netdevice.h:2590 [inline] dev_deactivate_many+0x7e5/0xe70 net/sched/sch_generic.c:1375 Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit") Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09net_sched: sch_sfq: use a temporary work area for validating configurationOctavian Purdila
Many configuration parameters have influence on others (e.g. divisor -> flows -> limit, depth -> limit) and so it is difficult to correctly do all of the validation before applying the configuration. And if a validation error is detected late it is difficult to roll back a partially applied configuration. To avoid these issues use a temporary work area to update and validate the configuration and only then apply the configuration to the internal state. Signed-off-by: Octavian Purdila <tavip@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>