summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-06swapfile: disable swapon for bs > ps devicesLuis Chamberlain
Devices which have a requirement for bs > ps cannot be supported for swap as swap still needs work. Now that the block device cache sets the min order for block devices we need this stop gap otherwise all swap operations are rejected. Without this you'll end up with errors on these devices as the swap code still needs much love to support min order. With this we at least now put a stop gap of its use, until the swap subsystem completes its major overhaul: mkswap: /dev/nvme3n1: warning: wiping old swap signature. Setting up swapspace version 1, size = 100 GiB (107374178304 bytes) no label, UUID=6af76b5c-7e7b-4902-b7f7-4c24dde6fa36 swapon: /dev/nvme3n1: swapon failed: Invalid argument Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/aBkS926thy9zvdZb@bombadil.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-06x86/cpu: Sanitize CPUID(0x80000000) outputAhmed S. Darwish
CPUID(0x80000000).EAX returns the max extended CPUID leaf available. On x86-32 machines without an extended CPUID range, a CPUID(0x80000000) query will just repeat the output of the last valid standard CPUID leaf on the CPU; i.e., a garbage values. Current tip:x86/cpu code protects against this by doing: eax = cpuid_eax(0x80000000); c->extended_cpuid_level = eax; if ((eax & 0xffff0000) == 0x80000000) { // CPU has an extended CPUID range. Check for 0x80000001 if (eax >= 0x80000001) { cpuid(0x80000001, ...); } } This is correct so far. Afterwards though, the same possibly broken EAX value is used to check the availability of other extended CPUID leaves: if (c->extended_cpuid_level >= 0x80000007) ... if (c->extended_cpuid_level >= 0x80000008) ... if (c->extended_cpuid_level >= 0x8000000a) ... if (c->extended_cpuid_level >= 0x8000001f) ... which is invalid. Fix this by immediately setting the CPU's max extended CPUID leaf to zero if CPUID(0x80000000).EAX doesn't indicate a valid CPUID extended range. While at it, add a comment, similar to kernel/head_32.S, clarifying the CPUID(0x80000000) sanity check. References: 8a50e5135af0 ("x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX") Fixes: 3da99c977637 ("x86: make (early)_identify_cpu more the same between 32bit and 64 bit") Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250506050437.10264-3-darwi@linutronix.de
2025-05-06tools/x86/kcpuid: Update bitfields to x86-cpuid-db v2.4Ahmed S. Darwish
Update kcpuid's CSV file to version 2.4, as generated by x86-cpuid-db. Summary of the v2.4 changes: * Mark CPUID(0x80000001) EDX:23 bit, 'e_mmx', as not exclusive to Transmeta since it is supported by AMD as well. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v2.4/CHANGELOG.rst Link: https://lore.kernel.org/r/20250506050437.10264-2-darwi@linutronix.de
2025-05-06Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflictsIngo Molnar
Conflicts: tools/arch/x86/include/asm/cpufeatures.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06powerpc/boot: Fix build with gcc 15Michal Suchanek
Similar to x86 the ppc boot code does not build with GCC 15. Copy the fix from commit ee2ab467bddf ("x86/boot: Use '-std=gnu11' to fix build with GCC 15") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Tested-by: Amit Machhiwal <amachhiw@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250331105722.19709-1-msuchanek@suse.de
2025-05-05tools: ynl-gen: validate 0 len strings from kernelDavid Wei
Strings from the kernel are guaranteed to be null terminated and ynl_attr_validate() checks for this. But it doesn't check if the string has a len of 0, which would cause problems when trying to access data[len - 1]. Fix this by checking that len is positive. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503043050.861238-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05Merge branch 'selftests-drv-net-fix-ping-py-test-failure'Jakub Kicinski
Mohsin Bashir says: ==================== selftests: drv: net: fix `ping.py` test failure Fix `ping.py` test failure on an ipv6 system, and appropriately handle the cases where either one of the two address families (ipv4, ipv6) is not present. ==================== Link: https://patch.msgid.link/20250503013518.1722913-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests: drv: net: add version indicatorMohsin Bashir
Currently, the test result does not differentiate between the cases when either one of the address families are configured or if both the address families are configured. Ideally, the result should report if a particular case was skipped. ./drivers/net/ping.py TAP version 13 1..7 ok 1 ping.test_default_v4 # SKIP Test requires IPv4 connectivity ok 2 ping.test_default_v6 ok 3 ping.test_xdp_generic_sb ok 4 ping.test_xdp_generic_mb ok 5 ping.test_xdp_native_sb ok 6 ping.test_xdp_native_mb ok 7 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:5 fail:0 xfail:0 xpass:0 skip:2 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503013518.1722913-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests: drv: net: avoid skipping testsMohsin Bashir
On a system with either of the ipv4 or ipv6 information missing, tests are currently skipped. Ideally, the test should run as long as at least one address family is present. This patch make test run whenever possible. Before: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default # SKIP Test requires IPv4 connectivity ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0 After: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default ok 2 ping.test_xdp_generic_sb ok 3 ping.test_xdp_generic_mb ok 4 ping.test_xdp_native_sb ok 5 ping.test_xdp_native_mb ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:5 fail:0 xfail:0 xpass:0 skip:1 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20250503013518.1722913-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests: drv: net: fix test failure on ipv6 sysMohsin Bashir
The `get_interface_info` call has ip version hard-coded which leads to failures on an IPV6 system. The NetDrvEnv class already gathers information about remote interface, so instead of fixing the local implementation switch to using cfg.remote_ifname. Before: ./drivers/net/ping.py Traceback (most recent call last): File "/new_tests/./drivers/net/ping.py", line 217, in <module> main() File "/new_tests/./drivers/net/ping.py", line 204, in main get_interface_info(cfg) File "/new_tests/./drivers/net/ping.py", line 128, in get_interface_info raise KsftFailEx('Can not get remote interface') net.lib.py.ksft.KsftFailEx: Can not get remote interface After: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default # SKIP Test requires IPv4 connectivity ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503013518.1722913-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05blk-throttle: Add an additional overflow check to the call ↵Zizhi Wo
calculate_bytes/io_allowed Now the tg->[bytes/io]_disp type is signed, and calculate_bytes/io_allowed return type is unsigned. Even if the bps/iops limit is not set to max, the return value of the function may still exceed INT_MAX or LLONG_MAX, which can cause overflow in outer variables. In such cases, we can add additional checks accordingly. And in throtl_trim_slice(), if the BPS/IOPS limit is set to max, there's no need to call calculate_bytes/io_allowed(). Introduces the helper functions throtl_trim_bps/iops to simplifies the process. For cases when the calculated trim value exceeds INT_MAX (causing an overflow), we reset tg->[bytes/io]_disp to zero, so return original tg->[bytes/io]_disp because it is the size that is actually trimmed. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250417132054.2866409-4-wozizhi@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05blk-throttle: Delete unnecessary carryover-related fields from throtl_grpZizhi Wo
We no longer need carryover_[bytes/ios] in tg, so it is removed. The related comments about carryover in tg are also merged into [bytes/io]_disp, and modify other related comments. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250417132054.2866409-3-wozizhi@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05blk-throttle: Fix wrong tg->[bytes/io]_disp update in __tg_update_carryover()Zizhi Wo
In commit 6cc477c36875 ("blk-throttle: carry over directly"), the carryover bytes/ios was be carried to [bytes/io]_disp. However, its update mechanism has some issues. In __tg_update_carryover(), we calculate "bytes" and "ios" to represent the carryover, but the computation when updating [bytes/io]_disp is incorrect. And if the sq->nr_queued is empty, we may not update tg->[bytes/io]_disp to 0 in tg_update_carryover(). We should set it to 0 in non carryover case. This patch fixes the issue. Fixes: 6cc477c36875 ("blk-throttle: carry over directly") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250417132054.2866409-2-wozizhi@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05Merge branch 'gre-reapply-ipv6-link-local-address-generation-fix'Jakub Kicinski
Guillaume Nault says: ==================== gre: Reapply IPv6 link-local address generation fix. Reintroduce the IPv6 link-local address generation fix for GRE and its kernel selftest. These patches were introduced by merge commit b3fc5927de4b ("Merge branch 'gre-fix-regressions-in-ipv6-link-local-address-generation'") but have been reverted by commit 8417db0be5bb ("Merge branch 'gre-revert-ipv6-link-local-address-fix'"), because it uncovered another bug in multipath routing. Now that this bug has been investigated and fixed, we can apply the GRE link-local address fix and its kernel selftest again. For convenience, here's the original cover letter: IPv6 link-local address generation has some special cases for GRE devices. This has led to several regressions in the past, and some of them are still not fixed. This series fixes the remaining problems, like the ipv6.conf.<dev>.addr_gen_mode sysctl being ignored and the router discovery process not being started (see details in patch 1). To avoid any further regressions, patch 2 adds selftests covering IPv4 and IPv6 gre/gretap devices with all combinations of currently supported addr_gen_mode values. ==================== Link: https://patch.msgid.link/cover.1746225213.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests: Add IPv6 link-local address generation tests for GRE devices.Guillaume Nault
GRE devices have their special code for IPv6 link-local address generation that has been the source of several regressions in the past. Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an IPv6 link-link local address in accordance with the net.ipv6.conf.<dev>.addr_gen_mode sysctl. Note: This patch was originally applied as commit 6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices."). However, it was then reverted by commit 355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."") because the commit it depended on was going to be reverted. Now that the situation is resolved, we can add this selftest again (no changes since original patch, appart from context update in tools/testing/selftests/net/Makefile). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/2c3a5733cb3a6e3119504361a9b9f89fda570a2d.1746225214.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05gre: Fix again IPv6 link-local address generation.Guillaume Nault
Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE devices in most cases and fall back to using add_v4_addrs() only in case the GRE configuration is incompatible with addrconf_addr_gen(). GRE used to use addrconf_addr_gen() until commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") restricted this use to gretap and ip6gretap devices, and created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones. The original problem came when commit 9af28511be10 ("addrconf: refuse isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its addr parameter was 0. The commit says that this would create an invalid address, however, I couldn't find any RFC saying that the generated interface identifier would be wrong. Anyway, since gre over IPv4 devices pass their local tunnel address to __ipv6_isatap_ifid(), that commit broke their IPv6 link-local address generation when the local address was unspecified. Then commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") tried to fix that case by defining add_v4_addrs() and calling it to generate the IPv6 link-local address instead of using addrconf_addr_gen() (apart for gretap and ip6gretap devices, which would still use the regular addrconf_addr_gen(), since they have a MAC address). That broke several use cases because add_v4_addrs() isn't properly integrated into the rest of IPv6 Neighbor Discovery code. Several of these shortcomings have been fixed over time, but add_v4_addrs() remains broken on several aspects. In particular, it doesn't send any Router Sollicitations, so the SLAAC process doesn't start until the interface receives a Router Advertisement. Also, add_v4_addrs() mostly ignores the address generation mode of the interface (/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases. Fix the situation by using add_v4_addrs() only in the specific scenario where the normal method would fail. That is, for interfaces that have all of the following characteristics: * run over IPv4, * transport IP packets directly, not Ethernet (that is, not gretap interfaces), * tunnel endpoint is INADDR_ANY (that is, 0), * device address generation mode is EUI64. In all other cases, revert back to the regular addrconf_addr_gen(). Also, remove the special case for ip6gre interfaces in add_v4_addrs(), since ip6gre devices now always use addrconf_addr_gen() instead. Note: This patch was originally applied as commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation."). However, it was then reverted by commit fc486c2d060f ("Revert "gre: Fix IPv6 link-local address generation."") because it uncovered another bug that ended up breaking net/forwarding/ip6gre_custom_multipath_hash.sh. That other bug has now been fixed by commit 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop"). Therefore we can now revive this GRE patch (no changes since original commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation."). Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/a88cc5c4811af36007645d610c95102dccb360a6.1746225214.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05dt-bindings: net: ethernet-controller: Add informative text about RGMII delaysAndrew Lunn
Device Tree and Ethernet MAC driver writers often misunderstand RGMII delays. Rewrite the Normative section in terms of the PCB, is the PCB adding the 2ns delay. This meaning was previous implied by the definition, but often wrongly interpreted due to the ambiguous wording and looking at the definition from the wrong perspective. The new definition concentrates clearly on the hardware, and should be less ambiguous. Add an Informative section to the end of the binding describing in detail what the four RGMII delays mean. This expands on just the PCB meaning, adding in the implications for the MAC and PHY. Additionally, when the MAC or PHY needs to add a delay, which is software configuration, describe how Linux does this, in the hope of reducing errors. Make it clear other users of device tree binding may implement the software configuration in other ways while still conforming to the binding. Fixes: 9d3de3c58347 ("dt-bindings: net: Add YAML schemas for the generic Ethernet options") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250430-v6-15-rc3-net-rgmii-delays-v2-1-099ae651d5e5@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests: ublk: kublk: fix include pathUday Shankar
Building kublk currently fails (with a "could not find linux/ublk_cmd.h" error message) if kernel headers are not installed in a system-global location (i.e. somewhere in the compiler's default include search path). This failure is unnecessary, as make kselftest installs kernel headers in the build tree - kublk's build just isn't looking for them properly. There is an include path in kublk's CFLAGS which is probably intended to find the kernel headers installed in the build tree; fix it so that it can actually find them. This introduces some macro redefinition issues between glibc-provided headers and kernel headers; fix those by eliminating one include in kublk. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-3-e970b6d9e4f4@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05selftests: ublk: make test_generic_06 silent on successUday Shankar
Convention dictates that tests should not log anything on success. Make test_generic_06 follow this convention. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-2-e970b6d9e4f4@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05selftests: ublk: kublk: build with -Werror iff WERROR!=0Uday Shankar
Compiler warnings can catch bugs at compile time; thus, heeding them is usually a good idea. Turn warnings into errors by default for the kublk build so that anyone making changes is forced to heed them. Compiler warnings can also sometimes produce annoying false positives, so provide a flag WERROR that the developer can use as follows to have the build and selftests run go through even if there are warnings: make WERROR=0 TARGETS=ublk kselftest Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-1-e970b6d9e4f4@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06i2c: omap: fix deprecated of_property_read_bool() useJohan Hovold
Using of_property_read_bool() for non-boolean properties is deprecated and results in a warning during runtime since commit c141ecc3cecd ("of: Warn when of_property_read_bool() is used on non-boolean properties"). Fixes: b6ef830c60b6 ("i2c: omap: Add support for setting mux") Cc: Jayesh Choudhary <j-choudhary@ti.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Link: https://lore.kernel.org/r/20250415075230.16235-1-johan+linaro@kernel.org Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-05virtio-net: free xsk_buffs on error in virtnet_xsk_pool_enable()Jakub Kicinski
The selftests added to our CI by Bui Quang Minh recently reveals that there is a mem leak on the error path of virtnet_xsk_pool_enable(): unreferenced object 0xffff88800a68a000 (size 2048): comm "xdp_helper", pid 318, jiffies 4294692778 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): __kvmalloc_node_noprof+0x402/0x570 virtnet_xsk_pool_enable+0x293/0x6a0 (drivers/net/virtio_net.c:5882) xp_assign_dev+0x369/0x670 (net/xdp/xsk_buff_pool.c:226) xsk_bind+0x6a5/0x1ae0 __sys_bind+0x15e/0x230 __x64_sys_bind+0x72/0xb0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Acked-by: Jason Wang <jasowang@redhat.com> Fixes: e9f3962441c0 ("virtio_net: xsk: rx: support fill with xsk buffer") Link: https://patch.msgid.link/20250430163836.3029761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05virtio-net: don't re-enable refill work too early when NAPI is disabledJakub Kicinski
Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") fixed a deadlock between reconfig paths and refill work trying to disable the same NAPI instance. The refill work can't run in parallel with reconfig because trying to double-disable a NAPI instance causes a stall under the instance lock, which the reconfig path needs to re-enable the NAPI and therefore unblock the stalled thread. There are two cases where we re-enable refill too early. One is in the virtnet_set_queues() handler. We call it when installing XDP: virtnet_rx_pause_all(vi); ... virtnet_napi_tx_disable(..); ... virtnet_set_queues(..); ... virtnet_rx_resume_all(..); We want the work to be disabled until we call virtnet_rx_resume_all(), but virtnet_set_queues() kicks it before NAPIs were re-enabled. The other case is a more trivial case of mis-ordering in __virtnet_rx_resume() found by code inspection. Taking the spin lock in virtnet_set_queues() (requested during review) may be unnecessary as we are under rtnl_lock and so are all paths writing to ->refill_enabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Link: https://patch.msgid.link/20250430163758.3029367-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05Merge branch 'net_sched-fix-a-regression-in-sch_htb'Jakub Kicinski
Cong Wang says: ==================== net_sched: fix a regression in sch_htb This patchset contains a fix for the regression reported by Alan and a selftest to cover that case. Please see each patch description for more details. ==================== Link: https://patch.msgid.link/20250428232955.1740419-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05selftests/tc-testing: Add a test case to cover basic HTB+FQ_CODEL caseCong Wang
Integrate the reproducer from Alan into TC selftests and use scapy to generate TCP traffic instead of relying on ping command. Cc: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05sch_htb: make htb_deactivate() idempotentCong Wang
Alan reported a NULL pointer dereference in htb_next_rb_node() after we made htb_qlen_notify() idempotent. It turns out in the following case it introduced some regression: htb_dequeue_tree(): |-> fq_codel_dequeue() |-> qdisc_tree_reduce_backlog() |-> htb_qlen_notify() |-> htb_deactivate() |-> htb_next_rb_node() |-> htb_deactivate() For htb_next_rb_node(), after calling the 1st htb_deactivate(), the clprio[prio]->ptr could be already set to NULL, which means htb_next_rb_node() is vulnerable here. For htb_deactivate(), although we checked qlen before calling it, in case of qlen==0 after qdisc_tree_reduce_backlog(), we may call it again which triggers the warning inside. To fix the issues here, we need to: 1) Make htb_deactivate() idempotent, that is, simply return if we already call it before. 2) Make htb_next_rb_node() safe against ptr==NULL. Many thanks to Alan for testing and for the reproducer. Fixes: 5ba8b837b522 ("sch_htb: make htb_qlen_notify() idempotent") Reported-by: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05bcachefs: Call bch2_fs_start before getting vfs superblockKent Overstreet
This reverts 1fdbe0b184c8 bcachefs: Make sure c->vfs_sb is set before starting fs switched up bch2_fs_get_tree() so that we got a superblock before calling bch2_fs_start, so that c->vfs_sb would always be initialized while the filesystem was active. This turned out not to be necessary, because blk_holder_ops were implemented using our own locking, not vfs locking. And this had the side effect of creating a super_block and doing our full recovery (including potentially fsck) before setting SB_BORN, which causes things like sync calls to hang until our recovery is finished. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05mm: remove NR_BOUNCE zone statChristoph Hellwig
The stat is always 0 now, so remove it and hardwire the user visible output to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05block: remove bounce buffering supportChristoph Hellwig
The block layer bounce buffering support is unused now, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05scsi: remove the no_highmem flag in the hostChristoph Hellwig
All users are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05usb-storage: reject probe of device one non-DMA HCDs when using highmemChristoph Hellwig
usb-storage is the last user of the block layer bounce buffering now, and only uses it for HCDs that do not support DMA on highmem configs. Remove this support and fail the probe so that the block layer bounce buffering can go away. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05scsi: make ppa depend on !HIGHMEMChristoph Hellwig
This is one of the last drivers depending on the block layer bounce buffering code. Restrict it to run on non-highmem configs so that the bounce buffering code can be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05scsi: make imm depend on !HIGHMEMChristoph Hellwig
This is one of the last drivers depending on the block layer bounce buffering code. Restrict it to run on non-highmem configs so that the bounce buffering code can be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05scsi: make aha152x depend on !HIGHMEMChristoph Hellwig
This is one of the last drivers depending on the block layer bounce buffering code. Restrict it to run on non-highmem configs so that the bounce buffering code can be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250505081138.3435992-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05KVM: arm64: selftest: Don't try to disable AArch64 supportMarc Zyngier
Trying to cut the branch you are sat on is pretty dumb. And so is trying to disable the instruction set you are executing on. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05KVM: arm64: Prevent userspace from disabling AArch64 support at any ↵Marc Zyngier
virtualisable EL A sorry excuse for a selftest is trying to disable AArch64 support. And yes, this goes as well as you can imagine. Let's forbid this sort of things. Normal userspace shouldn't get caught doing that. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE modeMarc Zyngier
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This has also two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as is runs extremely early, with interrupts disabled, and soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()Sebastian Ott
Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the initialization of the local memcache variable in user_mem_abort() conditional, leaving a codepath where it is used uninitialized via kvm_pgtable_stage2_map(). This can fail on any path that requires a stage-2 allocation without transition via a permission fault or dirty logging. Fix this by making sure that memcache is always valid. Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvmarm/3f5db4c7-ccce-fb95-595c-692fa7aad227@redhat.com/ Link: https://lore.kernel.org/r/20250505173148.33900-1-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05bcachefs: fix hung task timeout in journal readKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05bcachefs: Add missing barriers before wake_up_bit()Kent Overstreet
wake_up() doesn't require a barrier - but wake_up_bit() does. This only affected non x86, and primarily lead to lost wakeups after btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05bcachefs: Ensure proper write alignmentKent Overstreet
There was a buggy version of bcachefs-tools which picked misaligned bucket sizes when formatting, and we're also about to do dynamic block sizes - which will allow picking logical block size or physical block size of the device per-write, allowing for better compression ratios at the cost of slightly worse write performance (i.e. forcing the device to do RMW or extra buffering). To account for this, tweak bch2_alloc_sectors_start() to properly align open_buckets to the blocksize of the write we're about to do. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05bcachefs: Improve want_cached_ptr()Kent Overstreet
If promote target isn't set, rebalance should still leave a cached copy on the faster device. Fall back to foreground_target if it's set, or allow a cached copy on any device if neither are set. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05x86/alternative: Remove unused header #definesJuergen Gross
Remove some unfortunately-named unused macros which could potentially result in weird build failures. Fortunately, they are under an #ifdef __ASSEMBLER__ which has kept them from causing problems so far. [ dhansen: subject and changelog tweaks ] Fixes: 1a6ade825079 ("x86/alternative: Convert the asm ALTERNATIVE_3() macro") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250505131646.29288-1-jgross%40suse.com
2025-05-05Merge tag 'uml-for-linux-6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull uml fix from Johannes Berg: "There's just a single fix here for the _nofault changes that were causing issues with clang, and then when we looked at it some other issues seemed to exist" * tag 'uml-for-linux-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: fix _nofault accesses
2025-05-05Merge tag 'soc-fixes-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main changes are once more for the NXP i.MX platform, addressing multiple regressions in recent devicetree updates for the i.MX8MM and i.MX6ULL SoCs, a PCIe fix for i.MX9 and a MAINTAINERS file update to disambiguate NXP i.MX SoCs from Sony IMX image sensors. The stm32 platform devicetree files get some compatibility fixes for the interrupt controller node. Another compatibility fix is done for the Arm Morello platform's cache controller node. The code changes are all for firmware drivers, fixing kernel-side bugs on the Arm FF-A and SCMI drivers" * tag 'soc-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp23 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp23 SoCs arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp21 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp21 SoCs arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp25 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp25 SoCs arm64: dts: imx8mm-verdin: Link reg_usdhc2_vqmmc to usdhc2 MAINTAINERS: add exclude for dt-bindings to imx entry ARM: dts: opos6ul: add ksz8081 phy properties arm64: dts: imx95: Correct the range of PCIe app-reg region arm64: dts: imx8mp: configure GPU and NPU clocks in nominal DTSI arm64: dts: morello: Fix-up cache nodes firmware: arm_ffa: Skip Rx buffer ownership release if not acquired firmware: arm_scmi: Fix timeout checks on polling path firmware: arm_scmi: Balance device refcount when destroying devices
2025-05-05xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.Mathias Nyman
Event polling delay is set to 0 if there are any pending requests in either rx or tx requests lists. Checking for pending requests does not work well for "IN" transfers as the tty driver always queues requests to the list and TRBs to the ring, preparing to receive data from the host. This causes unnecessary busylooping and cpu hogging. Only set the event polling delay to 0 if there are pending tx "write" transfers, or if it was less than 10ms since last active data transfer in any direction. Cc: Łukasz Bartosik <ukaszb@chromium.org> Fixes: fb18e5bb9660 ("xhci: dbc: poll at different rate depending on data transfer activity") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250505125630.561699-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-05usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeueMichal Pecio
VIA VL805 doesn't bother updating the EP Context cycle bit when the endpoint halts. This is seen by patching xhci_move_dequeue_past_td() to print the cycle bits of the EP Context and the TRB at hw_dequeue and then disconnecting a flash drive while reading it. Actual cycle state is random as expected, but the EP Context bit is always 1. This means that the cycle state produced by this function is wrong half the time, and then the endpoint stops working. Work around it by looking at the cycle bit of TD's end_trb instead of believing the Endpoint or Stream Context. Specifically: - rename cycle_found to hw_dequeue_found to avoid confusion - initialize new_cycle from td->end_trb instead of hw_dequeue - switch new_cycle toggling to happen after end_trb is found Now a workload which regularly stalls the device works normally for a few hours and clearly demonstrates the HW bug - the EP Context bit is not updated in a new cycle until Set TR Dequeue overwrites it: [ +0,000298] sd 10:0:0:0: [sdc] Attached SCSI disk [ +0,011758] cycle bits: TRB 1 EP Ctx 1 [ +5,947138] cycle bits: TRB 1 EP Ctx 1 [ +0,065731] cycle bits: TRB 0 EP Ctx 1 [ +0,064022] cycle bits: TRB 0 EP Ctx 0 [ +0,063297] cycle bits: TRB 0 EP Ctx 0 [ +0,069823] cycle bits: TRB 0 EP Ctx 0 [ +0,063390] cycle bits: TRB 1 EP Ctx 0 [ +0,063064] cycle bits: TRB 1 EP Ctx 1 [ +0,062293] cycle bits: TRB 1 EP Ctx 1 [ +0,066087] cycle bits: TRB 0 EP Ctx 1 [ +0,063636] cycle bits: TRB 0 EP Ctx 0 [ +0,066360] cycle bits: TRB 0 EP Ctx 0 Also tested on the buggy ASM1042 which moves EP Context dequeue to the next TRB after errors, one problem case addressed by the rework that implemented this loop. In this case hw_dequeue can be enqueue, so simply picking the cycle bit of TRB at hw_dequeue wouldn't work. Commit 5255660b208a ("xhci: add quirk for host controllers that don't update endpoint DCS") tried to solve the stale cycle problem, but it was more complex and got reverted due to a reported issue. Cc: Jonathan Bell <jonathan@raspberrypi.org> Cc: Oliver Neukum <oneukum@suse.com> Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250505125630.561699-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-05x86/CPU/AMD: Print the reason for the last resetYazen Ghannam
The following register contains bits that indicate the cause for the previous reset. PMx000000C0 (FCH::PM::S5_RESET_STATUS) This is useful for debug. The reasons for reset are broken into 6 high level categories. Decode it by category and print during boot. Specifics within a category are split off into debugging documentation. The register is accessed indirectly through a "PM" port in the FCH. Use MMIO access in order to avoid restrictions with legacy port access. Use a late_initcall() to ensure that MMIO has been set up before trying to access the register. This register was introduced with AMD Family 17h, so avoid access on older families. There is no CPUID feature bit for this register. [ bp: Simplify the reason dumping loop. - merge a fix to not access an array element after the last one: https://lore.kernel.org/r/20250505133609.83933-1-superm1@kernel.org Reported-by: James Dutton <james.dutton@gmail.com> ] [ mingo: - Use consistent .rst formatting - Fix 'Sleep' class field to 'ACPI-State' - Standardize pin messages around the 'tripped' verbiage - Remove reference to ring-buffer printing & simplify the wording - Use curly braces for multi-line conditional statements ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org
2025-05-05s390/mm: Fix potential use-after-free in __crst_table_upgrade()Heiko Carstens
The pointer to the mm_struct which is passed to __crst_table_upgrade() may only be dereferenced if it is identical to current->active_mm. Otherwise the current task has no reference to the mm_struct and it may already be freed. In such a case this would result in a use-after-free bug. Make sure this use-after-free scenario does not happen by moving the code, which dereferences the mm_struct pointer, after the check which verifies that the pointer is identical to current->active_mm, like it was before lazy ASCE handling was reimplemented. Fixes: 8b72f5a97b82 ("s390/mm: Reimplement lazy ASCE handling") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05s390/mm: Add mmap_assert_write_locked() check to crst_table_upgrade()Heiko Carstens
Add mmap_assert_write_locked() check to crst_table_upgrade() in order to verify that no concurrent page table upgrades of an mm can happen. This allows to remove the VM_BUG_ON() check which checks for the potential inconsistent result of concurrent updates. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>