linux.git - Linus' kernel tree

Age	Commit message	Author
2025-05-05	dt-bindings: net: ethernet-controller: Add informative text about RGMII delays	Andrew Lunn
	Device Tree and Ethernet MAC driver writers often misunderstand RGMII delays. Rewrite the Normative section in terms of the PCB: is the PCB adding the 2ns delay? This meaning was previously implied by the definition, but often wrongly interpreted due to the ambiguous wording and looking at the definition from the wrong perspective. The new definition concentrates clearly on the hardware, and should be less ambiguous. Add an Informative section to the end of the binding describing in detail what the four RGMII delays mean. This expands on just the PCB meaning, adding in the implications for the MAC and PHY. Additionally, when the MAC or PHY needs to add a delay, which is software configuration, describe how Linux does this, in the hope of reducing errors. Make it clear other users of device tree binding may implement the software configuration in other ways while still conforming to the binding. Fixes: 9d3de3c58347 ("dt-bindings: net: Add YAML schemas for the generic Ethernet options") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250430-v6-15-rc3-net-rgmii-delays-v2-1-099ae651d5e5@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	strparser: Remove unused __strp_unpause	Dr. David Alan Gilbert
	The last use of __strp_unpause() was removed in 2022 by commit 84c61fe1a75b ("tls: rx: do not use the standard strparser") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250501002402.308843-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06	i2c: omap: fix deprecated of_property_read_bool() use	Johan Hovold
	Using of_property_read_bool() for non-boolean properties is deprecated and results in a warning during runtime since commit c141ecc3cecd ("of: Warn when of_property_read_bool() is used on non-boolean properties"). Fixes: b6ef830c60b6 ("i2c: omap: Add support for setting mux") Cc: Jayesh Choudhary <j-choudhary@ti.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Link: https://lore.kernel.org/r/20250415075230.16235-1-johan+linaro@kernel.org Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-05	virtio-net: free xsk_buffs on error in virtnet_xsk_pool_enable()	Jakub Kicinski
	The selftests recently added to our CI by Bui Quang Minh revealed a memory leak on the error path of virtnet_xsk_pool_enable(): unreferenced object 0xffff88800a68a000 (size 2048): comm "xdp_helper", pid 318, jiffies 4294692778 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): __kvmalloc_node_noprof+0x402/0x570 virtnet_xsk_pool_enable+0x293/0x6a0 (drivers/net/virtio_net.c:5882) xp_assign_dev+0x369/0x670 (net/xdp/xsk_buff_pool.c:226) xsk_bind+0x6a5/0x1ae0 __sys_bind+0x15e/0x230 __x64_sys_bind+0x72/0xb0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Acked-by: Jason Wang <jasowang@redhat.com> Fixes: e9f3962441c0 ("virtio_net: xsk: rx: support fill with xsk buffer") Link: https://patch.msgid.link/20250430163836.3029761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	virtio-net: don't re-enable refill work too early when NAPI is disabled	Jakub Kicinski
	Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") fixed a deadlock between reconfig paths and refill work trying to disable the same NAPI instance. The refill work can't run in parallel with reconfig because trying to double-disable a NAPI instance causes a stall under the instance lock, which the reconfig path needs to re-enable the NAPI and therefore unblock the stalled thread. There are two cases where we re-enable refill too early. One is in the virtnet_set_queues() handler. We call it when installing XDP: virtnet_rx_pause_all(vi); ... virtnet_napi_tx_disable(..); ... virtnet_set_queues(..); ... virtnet_rx_resume_all(..); We want the work to be disabled until we call virtnet_rx_resume_all(), but virtnet_set_queues() kicks it before NAPIs were re-enabled. The other case is a more trivial case of mis-ordering in __virtnet_rx_resume() found by code inspection. Taking the spin lock in virtnet_set_queues() (requested during review) may be unnecessary as we are under rtnl_lock and so are all paths writing to ->refill_enabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Link: https://patch.msgid.link/20250430163758.3029367-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	Merge branch 'net_sched-fix-a-regression-in-sch_htb'	Jakub Kicinski
	Cong Wang says: ==================== net_sched: fix a regression in sch_htb This patchset contains a fix for the regression reported by Alan and a selftest to cover that case. Please see each patch description for more details. ==================== Link: https://patch.msgid.link/20250428232955.1740419-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	selftests/tc-testing: Add a test case to cover basic HTB+FQ_CODEL case	Cong Wang
	Integrate the reproducer from Alan into TC selftests and use scapy to generate TCP traffic instead of relying on ping command. Cc: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	sch_htb: make htb_deactivate() idempotent	Cong Wang
	Alan reported a NULL pointer dereference in htb_next_rb_node() after we made htb_qlen_notify() idempotent. It turns out that it introduced a regression in the following case: htb_dequeue_tree(): |-> fq_codel_dequeue() |-> qdisc_tree_reduce_backlog() |-> htb_qlen_notify() |-> htb_deactivate() |-> htb_next_rb_node() |-> htb_deactivate() For htb_next_rb_node(), after calling the 1st htb_deactivate(), the clprio[prio]->ptr could be already set to NULL, which means htb_next_rb_node() is vulnerable here. For htb_deactivate(), although we checked qlen before calling it, in case of qlen==0 after qdisc_tree_reduce_backlog(), we may call it again which triggers the warning inside. To fix the issues here, we need to: 1) Make htb_deactivate() idempotent, that is, simply return if we already called it before. 2) Make htb_next_rb_node() safe against ptr==NULL. Many thanks to Alan for testing and for the reproducer. Fixes: 5ba8b837b522 ("sch_htb: make htb_qlen_notify() idempotent") Reported-by: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
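The two fixes listed above can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: `ToyClass`, `activate`, `deactivate` and `next_node` are hypothetical names, and the "active" flag stands in for whatever state the real htb_deactivate() checks.

```python
class ToyClass:
    """Hypothetical stand-in for an HTB class (not struct htb_class)."""

    def __init__(self):
        self.active = False


def activate(cl):
    """Mark the toy class as active."""
    cl.active = True


def deactivate(cl):
    """Idempotent deactivate: a repeated call is a harmless no-op.

    Returns True if this call actually deactivated the class.
    """
    if not cl.active:
        return False  # already deactivated earlier, nothing to undo
    cl.active = False
    return True


def next_node(ptr, successor):
    """Advance an iterator pointer while tolerating ptr being None."""
    if ptr is None:
        return None
    return successor.get(ptr)


cl = ToyClass()
activate(cl)
first = deactivate(cl)   # does the real work
second = deactivate(cl)  # second call on the same path is ignored
```

The point of the sketch is that the caller no longer has to know whether a nested callback already deactivated the class, which is the situation shown in the call chain above.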
2025-05-05	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-05-02 We've added 14 non-merge commits during the last 10 day(s) which contain a total of 13 files changed, 740 insertions(+), 121 deletions(-). The main changes are: 1) Avoid skipping or repeating a sk when using a UDP bpf_iter, from Jordan Rife. 2) Fixed a crash when a bpf qdisc is set in the net.core.default_qdisc, from Amery Hung. 3) A few other fixes in the bpf qdisc, from Amery Hung. - Always call qdisc_watchdog_init() in the .init prologue such that the .reset/.destroy epilogue can always call qdisc_watchdog_cancel() without issue. - bpf_qdisc_init_prologue() was incorrectly returning an error when the bpf qdisc is set as the default_qdisc and the mq is creating the default_qdisc. It is now fixed. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Cleanup bpf qdisc selftests selftests/bpf: Test attaching a bpf qdisc with incomplete operators bpf: net_sched: Make some Qdisc_ops ops mandatory selftests/bpf: Test setting and creating bpf qdisc as default qdisc bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc selftests/bpf: Add tests for bucket resume logic in UDP socket iterators selftests/bpf: Return socket cookies from sock_iter_batch progs bpf: udp: Avoid socket skips and repeats during iteration bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items bpf: udp: Get rid of st_bucket_done bpf: udp: Make sure iter->batch always contains a full bucket snapshot bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch bpf: net_sched: Fix using bpf qdisc as default qdisc selftests/bpf: Fix compilation errors ==================== Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05	bcachefs: Call bch2_fs_start before getting vfs superblock	Kent Overstreet
	This reverts 1fdbe0b184c8 bcachefs: Make sure c->vfs_sb is set before starting fs switched up bch2_fs_get_tree() so that we got a superblock before calling bch2_fs_start, so that c->vfs_sb would always be initialized while the filesystem was active. This turned out not to be necessary, because blk_holder_ops were implemented using our own locking, not vfs locking. And this had the side effect of creating a super_block and doing our full recovery (including potentially fsck) before setting SB_BORN, which causes things like sync calls to hang until our recovery is finished. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05	KVM: arm64: selftest: Don't try to disable AArch64 support	Marc Zyngier
	Trying to cut the branch you are sat on is pretty dumb. And so is trying to disable the instruction set you are executing on. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05	KVM: arm64: Prevent userspace from disabling AArch64 support at any ↵	Marc Zyngier
	virtualisable EL A sorry excuse for a selftest is trying to disable AArch64 support. And yes, this goes as well as you can imagine. Let's forbid this sort of things. Normal userspace shouldn't get caught doing that. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05	KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode	Marc Zyngier
	We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This also has two problems: - it prevents IRQs from being taken when these bits are cleared if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0} - it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36). Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as it runs extremely early, with interrupts disabled, and is soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05	KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()	Sebastian Ott
	Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the initialization of the local memcache variable in user_mem_abort() conditional, leaving a codepath where it is used uninitialized via kvm_pgtable_stage2_map(). This can fail on any path that requires a stage-2 allocation without transition via a permission fault or dirty logging. Fix this by making sure that memcache is always valid. Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvmarm/3f5db4c7-ccce-fb95-595c-692fa7aad227@redhat.com/ Link: https://lore.kernel.org/r/20250505173148.33900-1-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05	bcachefs: fix hung task timeout in journal read	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05	bcachefs: Add missing barriers before wake_up_bit()	Kent Overstreet
	wake_up() doesn't require a barrier - but wake_up_bit() does. This only affected non-x86, and primarily led to lost wakeups after btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05	bcachefs: Ensure proper write alignment	Kent Overstreet
	There was a buggy version of bcachefs-tools which picked misaligned bucket sizes when formatting, and we're also about to do dynamic block sizes - which will allow picking logical block size or physical block size of the device per-write, allowing for better compression ratios at the cost of slightly worse write performance (i.e. forcing the device to do RMW or extra buffering). To account for this, tweak bch2_alloc_sectors_start() to properly align open_buckets to the blocksize of the write we're about to do. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-05	bcachefs: Improve want_cached_ptr()	Kent Overstreet
	If promote target isn't set, rebalance should still leave a cached copy on the faster device. Fall back to foreground_target if it's set, or allow a cached copy on any device if neither are set. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
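The fallback order described above can be written down as a small decision function. This is a sketch only: `cached_copy_target` and `ANY_DEVICE` are hypothetical names, and targets are modelled as plain strings or None rather than bcachefs target identifiers.

```python
ANY_DEVICE = "any"  # hypothetical marker meaning "no restriction on device"


def cached_copy_target(promote_target=None, foreground_target=None):
    """Pick where a cached copy may live, following the described fallback.

    1. promote_target if it is set
    2. otherwise foreground_target if it is set
    3. otherwise any device
    """
    if promote_target is not None:
        return promote_target
    if foreground_target is not None:
        return foreground_target
    return ANY_DEVICE


choice = cached_copy_target(foreground_target="ssd")
```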
2025-05-05	Merge tag 'uml-for-linux-6.15-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull uml fix from Johannes Berg: "There's just a single fix here for the _nofault changes that were causing issues with clang, and then when we looked at it some other issues seemed to exist" * tag 'uml-for-linux-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: fix _nofault accesses
2025-05-05	Merge tag 'soc-fixes-6.15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main changes are once more for the NXP i.MX platform, addressing multiple regressions in recent devicetree updates for the i.MX8MM and i.MX6ULL SoCs, a PCIe fix for i.MX9 and a MAINTAINERS file update to disambiguate NXP i.MX SoCs from Sony IMX image sensors. The stm32 platform devicetree files get some compatibility fixes for the interrupt controller node. Another compatibility fix is done for the Arm Morello platform's cache controller node. The code changes are all for firmware drivers, fixing kernel-side bugs on the Arm FF-A and SCMI drivers" * tag 'soc-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp23 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp23 SoCs arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp21 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp21 SoCs arm64: dts: st: Use 128kB size for aliased GIC400 register access on stm32mp25 SoCs arm64: dts: st: Adjust interrupt-controller for stm32mp25 SoCs arm64: dts: imx8mm-verdin: Link reg_usdhc2_vqmmc to usdhc2 MAINTAINERS: add exclude for dt-bindings to imx entry ARM: dts: opos6ul: add ksz8081 phy properties arm64: dts: imx95: Correct the range of PCIe app-reg region arm64: dts: imx8mp: configure GPU and NPU clocks in nominal DTSI arm64: dts: morello: Fix-up cache nodes firmware: arm_ffa: Skip Rx buffer ownership release if not acquired firmware: arm_scmi: Fix timeout checks on polling path firmware: arm_scmi: Balance device refcount when destroying devices
2025-05-05	xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.	Mathias Nyman
	Event polling delay is set to 0 if there are any pending requests in either rx or tx requests lists. Checking for pending requests does not work well for "IN" transfers as the tty driver always queues requests to the list and TRBs to the ring, preparing to receive data from the host. This causes unnecessary busylooping and cpu hogging. Only set the event polling delay to 0 if there are pending tx "write" transfers, or if it was less than 10ms since last active data transfer in any direction. Cc: Łukasz Bartosik <ukaszb@chromium.org> Fixes: fb18e5bb9660 ("xhci: dbc: poll at different rate depending on data transfer activity") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250505125630.561699-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
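The new polling rule can be sketched as a small function. This is an illustration, not the driver code: the 10 ms window comes from the commit text, while `IDLE_POLL_DELAY_MS` and the function name are assumptions made for the example.

```python
ACTIVE_WINDOW_MS = 10    # from the commit text: "less than 10ms since last active data transfer"
IDLE_POLL_DELAY_MS = 64  # hypothetical idle delay, not taken from the driver


def poll_delay_ms(pending_tx, ms_since_last_transfer):
    """Return the event polling delay under the rule described above.

    Pending rx ("IN") requests are deliberately not an input: the tty
    driver always keeps them queued, so they say nothing about activity.
    """
    if pending_tx or ms_since_last_transfer < ACTIVE_WINDOW_MS:
        return 0
    return IDLE_POLL_DELAY_MS


busy = poll_delay_ms(pending_tx=True, ms_since_last_transfer=500)
idle = poll_delay_ms(pending_tx=False, ms_since_last_transfer=500)
```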
2025-05-05	usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue	Michal Pecio
	VIA VL805 doesn't bother updating the EP Context cycle bit when the endpoint halts. This is seen by patching xhci_move_dequeue_past_td() to print the cycle bits of the EP Context and the TRB at hw_dequeue and then disconnecting a flash drive while reading it. Actual cycle state is random as expected, but the EP Context bit is always 1. This means that the cycle state produced by this function is wrong half the time, and then the endpoint stops working. Work around it by looking at the cycle bit of TD's end_trb instead of believing the Endpoint or Stream Context. Specifically: - rename cycle_found to hw_dequeue_found to avoid confusion - initialize new_cycle from td->end_trb instead of hw_dequeue - switch new_cycle toggling to happen after end_trb is found Now a workload which regularly stalls the device works normally for a few hours and clearly demonstrates the HW bug - the EP Context bit is not updated in a new cycle until Set TR Dequeue overwrites it: [ +0,000298] sd 10:0:0:0: [sdc] Attached SCSI disk [ +0,011758] cycle bits: TRB 1 EP Ctx 1 [ +5,947138] cycle bits: TRB 1 EP Ctx 1 [ +0,065731] cycle bits: TRB 0 EP Ctx 1 [ +0,064022] cycle bits: TRB 0 EP Ctx 0 [ +0,063297] cycle bits: TRB 0 EP Ctx 0 [ +0,069823] cycle bits: TRB 0 EP Ctx 0 [ +0,063390] cycle bits: TRB 1 EP Ctx 0 [ +0,063064] cycle bits: TRB 1 EP Ctx 1 [ +0,062293] cycle bits: TRB 1 EP Ctx 1 [ +0,066087] cycle bits: TRB 0 EP Ctx 1 [ +0,063636] cycle bits: TRB 0 EP Ctx 0 [ +0,066360] cycle bits: TRB 0 EP Ctx 0 Also tested on the buggy ASM1042 which moves EP Context dequeue to the next TRB after errors, one problem case addressed by the rework that implemented this loop. In this case hw_dequeue can be enqueue, so simply picking the cycle bit of TRB at hw_dequeue wouldn't work. Commit 5255660b208a ("xhci: add quirk for host controllers that don't update endpoint DCS") tried to solve the stale cycle problem, but it was more complex and got reverted due to a reported issue. 
Cc: Jonathan Bell <jonathan@raspberrypi.org> Cc: Oliver Neukum <oneukum@suse.com> Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250505125630.561699-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-05	s390: Update defconfigs	Heiko Carstens
	Just the regular update of all defconfigs. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05	s390/dcssblk: Fix build error with CONFIG_DAX=m and CONFIG_DCSSBLK=y	Gerald Schaefer
	After commit 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED support") moved the "select DAX" from config DCSSBLK to the new config DCSSBLK_DAX, randconfig tests could result in build errors like this: s390-linux-ld: drivers/s390/block/dcssblk.o: in function `dcssblk_shared_store': drivers/s390/block/dcssblk.c:417: undefined reference to `kill_dax' s390-linux-ld: drivers/s390/block/dcssblk.c:418: undefined reference to `put_dax' This is because it's now possible to have CONFIG_DCSSBLK=y, but CONFIG_DAX=m. Fix this by adding "depends on DAX || DAX=n" to config DCSSBLK, to make it explicit that we want either no DAX, or the same "y/m" for both config DAX and DCSSBLK, similar to config BLK_DEV_DM. This also requires removing the "select DAX" from config DCSSBLK_DAX, or else there would be a recursive dependency detected. DCSSBLK_DAX is marked as BROKEN at the moment, and won't work well with DAX anyway, so it doesn't really matter if it is selected. Fixes: 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504291604.pvjonhWX-lkp@intel.com/ Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
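The effect of the added dependency can be checked with a small tristate model. The sketch below assumes the usual Kconfig expression rules (values ordered n < m < y, "||" evaluating to the maximum of its operands, "=" evaluating to y or n) and that a "depends on" expression caps the value the symbol can take; the function names are made up for the example.

```python
N, M, Y = 0, 1, 2  # Kconfig tristate values, ordered n < m < y


def dax_or_dax_is_n(dax):
    """Evaluate the expression 'DAX || DAX=n' for a given DAX value."""
    dax_eq_n = Y if dax == N else N  # '=' yields y or n
    return max(dax, dax_eq_n)        # '||' is the maximum


def max_dcssblk(dax):
    """Highest value DCSSBLK may take under the new 'depends on' line."""
    return dax_or_dax_is_n(dax)


limits = {name: max_dcssblk(v) for name, v in (("n", N), ("m", M), ("y", Y))}
```

With DAX=m the cap is m, so the failing combination CONFIG_DCSSBLK=y with CONFIG_DAX=m can no longer be selected, while DAX=n and DAX=y still allow DCSSBLK=y.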
2025-05-05	s390/entry: Fix last breaking event handling in case of stack corruption	Heiko Carstens
	In case of stack corruption stack_invalid() is called and the expectation is that register r10 contains the last breaking event address. This dependency is quite subtle and broke a couple of years ago without anybody noticing. Fix this by getting rid of the dependency and reading the last breaking event address from lowcore. Fixes: 56e62a737028 ("s390: convert to generic entry") Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05	s390/configs: Enable options required for TC flow offload	Konstantin Shkolnyy
	While testing Open vSwitch with Nvidia ConnectX-6 NIC, it was noticed that it didn't offload TC flows into the NIC, and its log contained many messages such as: "failed to offload flow: No such file or directory: <network device name>" and, upon enabling more verbose logging, additionally: "received NAK error=2 - TC classifier not found" The options enabled here are listed as requirements in Nvidia online documentation, among other options that were already enabled. Now all options listed by Nvidia are enabled. This option is also added because Fedora has it: CONFIG_NET_EMATCH Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05	s390/configs: Enable VDPA on Nvidia ConnectX-6 network card	Konstantin Shkolnyy
	ConnectX-6 is the first VDPA-capable NIC. For earlier NICs, Nvidia implements a VDPA emulation in s/w, which hasn't been validated on s390. Add options necessary for VDPA to work. These options are also added because Fedora has them: CONFIG_VDPA_SIM CONFIG_VDPA_SIM_NET CONFIG_VDPA_SIM_BLOCK CONFIG_VDPA_USER CONFIG_VP_VDPA Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-05	clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()	Sebastian Andrzej Siewior
	On x86 during boot, clockevent_i8253_disable() can be invoked via x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens with enabled interrupts. If some of the old i8253 hardware is actually used then lockdep will notice that i8253_lock is used in hard interrupt context. This causes lockdep to complain because it observed the lock being acquired with interrupts enabled and in hard interrupt context. Make clockevent_i8253_disable() acquire the lock with raw_spinlock_irqsave() to cure this. [ tglx: Massage change log and use guard() ] Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
2025-05-05	loop: Add sanity check for read/write_iter	Lizhi Xu
	Some file systems do not support read_iter/write_iter, such as selinuxfs in this issue. So before calling them, first confirm that the interface is supported. This is relevant because vfs_iter_read/write have the check, and removing their use allowed syzbot to hit this issue. Fixes: f2fed441c69b ("loop: stop using vfs_iter__{read,write} for buffered I/O") Reported-by: syzbot+6af973a3b8dfd2faefdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6af973a3b8dfd2faefdc Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250428143626.3318717-1-lizhi.xu@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05	riscv: misaligned: Add handling for ZCB instructions	Nylon Chen
	Add support for the Zcb extension's compressed half-word instructions (C.LHU, C.LH, and C.SH) in the RISC-V misaligned access trap handler. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Nylon Chen <nylon.chen@sifive.com> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250411073850.3699180-2-nylon.chen@sifive.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-05	arm64: dts: amlogic: dreambox: fix missing clkc_audio node	Christian Hewitt
	Add the clkc_audio node to fix audio support on Dreambox One/Two. Fixes: 83a6f4c62cb1 ("arm64: dts: meson: add initial support for Dreambox One/Two") CC: stable@vger.kernel.org Suggested-by: Emanuel Strobel <emanuel.strobel@yahoo.com> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20250503084443.3704866-1-christianshewitt@gmail.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-05-05	selftests: netfilter: nft_fib.sh: check lo packets bypass fib lookup	Florian Westphal
	With reverted fix: PASS: fib expression did not cause unwanted packet drops [ 37.285169] ns1-KK76Kt nft_rpfilter: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=32287 DF PROTO=ICMP TYPE=8 CODE=0 ID=1818 SEQ=1 FAIL: rpfilter did drop packets FAIL: ns1-KK76Kt cannot reach 127.0.0.1, ret 0 Check for this. Link: https://lore.kernel.org/netfilter/20250422114352.GA2092@breakpoint.cc/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-05	netfilter: nft_set_pipapo: clamp maximum map bucket size to INT_MAX	Pablo Neira Ayuso
	Otherwise, it is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when resizing hashtable because __GFP_NOWARN is unset. Similar to: b541ba7d1f5a ("netfilter: conntrack: clamp maximum hashtable size to INT_MAX") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-05	netfilter: nft_set_pipapo: prevent overflow in lookup table allocation	Pablo Neira Ayuso
	When calculating the lookup table size, ensure the following multiplication does not overflow: - desc->field_len[] maximum value is U8_MAX multiplied by NFT_PIPAPO_GROUPS_PER_BYTE(f) that can be 2, worst case. - NFT_PIPAPO_BUCKETS(f->bb) is 2^8, worst case. - sizeof(unsigned long), from sizeof(*f->lt), lt in struct nft_pipapo_field. Then, use check_mul_overflow() to multiply by bucket size and then use check_add_overflow() to add the alignment for avx2 (if needed). Finally, add lt_size_check_overflow() helper and use it to consolidate this. While at it, switch the leftover GFP_KERNEL allocation in pipapo_resize() to GFP_KERNEL_ACCOUNT for consistency. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
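The size calculation described above can be modelled in a few lines. This is a hedged sketch, not the kernel helper: `check_mul`, `check_add` and `lt_size` are illustrative names that only loosely mirror check_mul_overflow(), check_add_overflow() and lt_size_check_overflow(), and the 64-bit result width is an assumption (Python integers never overflow, so the width is emulated with a mask).

```python
def check_mul(a, b, bits=64):
    """Return (overflowed, wrapped_product) for an unsigned width of `bits`."""
    mask = (1 << bits) - 1
    product = a * b
    return product > mask, product & mask


def check_add(a, b, bits=64):
    """Return (overflowed, wrapped_sum) for an unsigned width of `bits`."""
    mask = (1 << bits) - 1
    total = a + b
    return total > mask, total & mask


def lt_size(groups, buckets, bucket_size, word_size=8, align_extra=0, bits=64):
    """Lookup table size in bytes, or None if the result would overflow.

    groups * buckets * word_size is bounded by the worst cases named in the
    commit text (U8_MAX * 2 groups, 2^8 buckets, sizeof(unsigned long)), so
    only the last multiplication and the alignment add are checked.
    """
    base = groups * buckets * word_size
    overflow, size = check_mul(base, bucket_size, bits)
    if overflow:
        return None
    overflow, size = check_add(size, align_extra, bits)
    if overflow:
        return None
    return size


worst_case_base = lt_size(groups=255 * 2, buckets=256, bucket_size=1)
too_big = lt_size(groups=255 * 2, buckets=256, bucket_size=2**62)
```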
2025-05-05	netfilter: nf_conntrack: speed up reads from nf_conntrack proc file	Florian Westphal
	Dumping all conntrack entries via proc interface can take hours due to linear search to skip entries dumped so far in each cycle. Apply same strategy used to speed up ipvs proc reading done in commit 178883fd039d ("ipvs: speed up reads from ip_vs_conn proc file") to nf_conntrack. Note that the ctnetlink interface doesn't suffer from this problem, but many scripts depend on the nf_conntrack proc interface. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
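A toy cost model shows why the linear skip hurts. The sketch counts list steps only; it assumes one entry is emitted per read cycle and that the faster variant resumes from a remembered position, which is a simplification made for illustration rather than a description of the actual patch.

```python
def steps_with_linear_skip(n_entries):
    """Each cycle restarts at the head and walks past entries already dumped."""
    steps = 0
    for already_dumped in range(n_entries):
        steps += already_dumped + 1  # skip the dumped entries, then emit one
    return steps


def steps_with_saved_position(n_entries):
    """Each cycle resumes where the previous one stopped."""
    return n_entries


slow = steps_with_linear_skip(1000)
fast = steps_with_saved_position(1000)
```

In this model the skip-based dump grows quadratically with the table size, while the resumed dump grows linearly.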
2025-05-05	netfilter: nft_quota: match correctly when the quota just depleted	Zhongqiu Duan
	The xt_quota compares skb length with remaining quota, but the nft_quota compares it with consumed bytes. The xt_quota can match consumed bytes up to quota at maximum. But the nft_quota stops matching when consumed bytes equal the quota, i.e., nft_quota matches consumed bytes in [0, quota - 1], not [0, quota]. Fixes: 795595f68d6c ("netfilter: nft_quota: dump consumed quota") Signed-off-by: Zhongqiu Duan <dzq.aishenghu0@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
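The off-by-one at the boundary can be shown with two predicates. This is a toy model of the ranges stated above, not the nft_quota source; both function names are made up, and `consumed` is assumed to already include the current packet.

```python
def matches_before_fix(consumed, quota):
    """Matches only while consumed is in [0, quota - 1]."""
    return consumed < quota


def matches_after_fix(consumed, quota):
    """Matches while consumed is in [0, quota], like xt_quota."""
    return consumed <= quota


quota = 100
boundary = (matches_before_fix(quota, quota), matches_after_fix(quota, quota))
```

The two predicates differ only when the consumed byte count lands exactly on the quota, which is the "just depleted" case named in the subject line.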
2025-05-05	selftests: netfilter: add conntrack stress test	Florian Westphal
	Add a new test case to check: - conntrack_max limit is effective - conntrack_max limit cannot be exceeded from within a netns - resizing the hash table while packets are inflight works - removal of all conntrack rules disables conntrack in netns - conntrack tool dump (conntrack -L) returns expected number of (unique) entries - procfs interface - if available - has same number of entries as conntrack -L dump Expected output with selftest framework: selftests: net/netfilter: conntrack_resize.sh PASS: got 1 connections: netns conntrack_max is pernet bound PASS: got 100 connections: netns conntrack_max is init_net bound PASS: dump in netns had same entry count (-C 1778, -L 1778, -p 1778, /proc 0) PASS: dump in netns had same entry count (-C 2000, -L 2000, -p 2000, /proc 0) PASS: test parallel conntrack dumps PASS: resize+flood PASS: got 0 connections: conntrack disabled PASS: got 1 connections: conntrack enabled ok 1 selftests: net/netfilter: conntrack_resize.sh Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-05	netfilter: bridge: Move specific fragmented packet to slow_path instead of ↵	Huajian Yang
	dropping it The config NF_CONNTRACK_BRIDGE will change the bridge forwarding for fragmented packets. The original bridge does not know that it is a fragmented packet and forwards it directly. After NF_CONNTRACK_BRIDGE is enabled, the functions nf_br_ip_fragment and br_ip6_fragment check the headroom. In original br_forward, insufficient headroom of skb may indeed exist, but there's still a way to save the skb in the device driver after dev_queue_xmit. So dropping the skb will change the original bridge forwarding in some cases. Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system") Signed-off-by: Huajian Yang <huajianyang@asrmicro.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-05	drm/i915/slpc: Balance the inc/dec for num_waiters	Vinay Belgaumkar
	As seen in some recent failures, SLPC num_waiters value is < 0. This happens because the inc/dec are not balanced. We should skip decrement for the same conditions as the increment. Currently, we do that for power saving profile mode. This patch also ensures that num_waiters is incremented in the case min_softlimit is at boost freq. It ensures that we don't reduce the frequency while this request is in flight. v2: Add Fixes tags Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13598 Fixes: f864a29afc32 ("drm/i915/slpc: Optmize waitboost for SLPC") Fixes: 4a82ceb04ad4 ("drm/i915/slpc: Add sysfs for SLPC power profiles") Cc: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Sk Anirban <sk.anirban@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250428183555.3250021-1-vinay.belgaumkar@intel.com (cherry picked from commit d26e55085f4b7a63677670db827541209257b313) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-05	x86/microcode: Consolidate the loader enablement checking	Borislav Petkov (AMD)
	Consolidate the whole logic which determines whether the microcode loader should be enabled or not into a single function and call it everywhere. Well, almost everywhere - not in mk_early_pgtbl_32() because there the kernel is running without paging enabled and checking dis_ucode_ldr et al would require physical addresses and uglification of the code. But since this is 32-bit, the easier thing to do is to simply map the initrd unconditionally especially since that mapping is getting removed later anyway by zap_early_initrd_mapping() and avoid the uglification. In doing so, address the issue of old 486er machines without CPUID support, not booting current kernels. [ mingo: Fix no previous prototype for ‘microcode_loader_disabled’ [-Wmissing-prototypes] ] Fixes: 4c585af7180c1 ("x86/boot/32: Temporarily map initrd for microcode loading") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/CANpbe9Wm3z8fy9HbgS8cuhoj0TREYEEkBipDuhgkWFvqX0UoVQ@mail.gmail.com
2025-05-05	um: fix _nofault accesses	Johannes Berg
	Nathan reported [1] that when built with clang, the um kernel crashes pretty much immediately. This turned out to be an issue with the inline assembly I had added, when clang used %rax/%eax for both operands. Reorder it so current->thread.segv_continue is written first, and then the lifetime of _faulted won't have overlap with the lifetime of segv_continue. In the email thread Benjamin also pointed out that current->mm is only NULL for true kernel tasks, but we could do this for a userspace task, so the current->thread.segv_continue logic must be lifted out of the mm==NULL check. Finally, while looking at this, put a barrier() so the NULL assignment to thread.segv_continue cannot be reordered before the possibly faulting operation. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/r/20250402221254.GA384@ax162 [1] Fixes: d1d7f01f7cd3 ("um: mark rodata read-only and implement _nofault accesses") Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-04	Linux 6.15-rc5 (tag: v6.15-rc5)	Linus Torvalds

2025-05-04	Merge tag 'perf-tools-fixes-for-v6.15-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools	Linus Torvalds
	Pull perf tools fixes from Namhyung Kim: "Just a couple of build fixes on arm64" * tag 'perf-tools-fixes-for-v6.15-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf tools: Fix in-source libperf build perf tools: Fix arm64 build by generating unistd_64.h
2025-05-04	bcachefs: thread_with_stdio: fix spinning instead of exiting	Kent Overstreet
	bch2_stdio_redirect_vprintf() was missing a check for stdio->done, i.e. exiting. This caused the thread attempting to print to spin, and since it was being called from the kthread run by thread_with_stdio, the userspace side hung as well. Change it to return -EPIPE - i.e. writing to a pipe that's been closed. Reported-by: Jan Solanti <jhs@psonet.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
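The pattern behind this fix, checking a "done" flag before trying to queue output and reporting -EPIPE instead of retrying forever, can be sketched in userspace C. The `struct redirect` type and `redirect_write()` function below are hypothetical and are not the bcachefs implementation; they only show the shape of the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical output buffer shared between a writer and a reader. */
struct redirect {
	bool done;	/* set once the reading side has exited */
	size_t used;	/* bytes currently queued */
	size_t cap;	/* buffer capacity in bytes */
};

/*
 * Queue up to len bytes. Returns the number of bytes accepted
 * (0 when the buffer is full, in which case a caller would wait),
 * or -EPIPE when the reader is gone and waiting can never succeed.
 */
static long redirect_write(struct redirect *r, size_t len)
{
	assert(r->used <= r->cap);

	/* Without this check, a writer facing a full buffer would retry forever. */
	if (r->done)
		return -EPIPE;

	size_t room = r->cap - r->used;
	size_t n = len < room ? len : room;

	r->used += n;
	return (long)n;
}
```

Returning an error once the flag is set gives the caller a way out of its retry loop, which is the behaviour the commit message describes.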
2025-05-04	Merge tag 'trace-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace	Linus Torvalds
	Pull tracing fixes from Steven Rostedt:
	- Fix read out of bounds bug in tracing_splice_read_pipe() The size of the sub page being read can now be greater than a page. But the buffer used in tracing_splice_read_pipe() only allocates a page size. The data copied to the buffer is the amount in sub buffer which can overflow the buffer. Use min((size_t)trace_seq_used(&iter->seq), PAGE_SIZE) to limit the amount copied to the buffer to a max of PAGE_SIZE.
	- Fix the test for NULL from "!filter_hash" to "!*filter_hash" The add_next_hash() function checked for NULL at the wrong pointer level.
	- Do not use the array in trace_adjust_address() if there are no elements The trace_adjust_address() finds the offset of a module that was stored in the persistent buffer when reading the previous boot buffer to see if the address belongs to a module that was loaded in the previous boot. An array is created that matches currently loaded modules with previously loaded modules. The trace_adjust_address() uses that array to find the new offset of the address that's in the previous buffer. But if no module was loaded, it ends up reading the last element in an array that was never allocated. Check if nr_entries is zero and exit out early if it is.
	- Remove nested lock of trace_event_sem in print_event_fields() The print_event_fields() function iterates over the ftrace_events list and requires the trace_event_sem semaphore held for read. But this function is always called with that semaphore held for read. Remove the taking of the semaphore and replace it with lockdep_assert_held_read(&trace_event_sem)
	* tag 'trace-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not take trace_event_sem in print_event_fields() tracing: Fix trace_adjust_address() when there is no modules in scratch area ftrace: Fix NULL memory allocation check tracing: Fix oob write in trace_seq_to_buffer()
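The first fix in this pull clamps the copy length with min((size_t)trace_seq_used(&iter->seq), PAGE_SIZE). A minimal userspace sketch of the same clamping idea follows; `copy_clamped()` and `DEMO_PAGE_SIZE` are hypothetical names chosen for illustration, and the real kernel code uses its own helpers and types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for the demo; the kernel uses PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096u

/*
 * Copy at most dst_cap bytes from src into dst, even when the source
 * holds more data than the destination can take. Returns the number
 * of bytes actually copied.
 */
static size_t copy_clamped(char *dst, size_t dst_cap,
			   const char *src, size_t src_used)
{
	assert(dst != NULL && src != NULL);

	/* Equivalent of min(src_used, dst_cap): never write past dst. */
	size_t n = src_used < dst_cap ? src_used : dst_cap;

	memcpy(dst, src, n);
	return n;
}
```

Clamping at the copy site truncates oversized sub-buffer contents to one page rather than overflowing the destination, matching the limit named in the message.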
2025-05-04	Merge tag 'parisc-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux	Linus Torvalds
	Pull parisc fix from Helge Deller: "Fix a double SIGFPE crash" * tag 'parisc-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix double SIGFPE crash
2025-05-04	parisc: Fix double SIGFPE crash	Helge Deller
	Camm noticed that on parisc a SIGFPE exception will crash an application with a second SIGFPE in the signal handler. Dave analyzed it, and it happens because glibc uses a double-word floating-point store to atomically update function descriptors. As a result of lazy binding, we hit a floating-point store in fpe_func almost immediately.
	When the T bit is set, an assist exception trap occurs when the co-processor encounters any floating-point instruction except for a double store of register %fr0. The latter cancels all pending traps. Let's fix this by clearing the Trap (T) bit in the FP status register before returning to the signal handler in userspace.
	The issue can be reproduced with this test program:
	root@parisc:~# cat fpe.c
	static void fpe_func(int sig, siginfo_t *i, void *v) {
		sigset_t set;
		sigemptyset(&set);
		sigaddset(&set, SIGFPE);
		sigprocmask(SIG_UNBLOCK, &set, NULL);
		printf("GOT signal %d with si_code %ld\n", sig, i->si_code);
	}
	int main() {
		struct sigaction action = {
			.sa_sigaction = fpe_func,
			.sa_flags = SA_RESTART|SA_SIGINFO };
		sigaction(SIGFPE, &action, 0);
		feenableexcept(FE_OVERFLOW);
		return printf("%lf\n",1.7976931348623158E308*1.7976931348623158E308);
	}
	root@parisc:~# gcc fpe.c -lm
	root@parisc:~# ./a.out
	Floating point exception
	root@parisc:~# strace -f ./a.out
	execve("./a.out", ["./a.out"], 0xf9ac7034 /* 20 vars */) = 0
	getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
	...
	rt_sigaction(SIGFPE, {sa_handler=0x1110a, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
	--- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0x1078f} ---
	--- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0xf8f21237} ---
	+++ killed by SIGFPE +++
	Floating point exception
	Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: John David Anglin <dave.anglin@bell.net> Reported-by: Camm Maguire <camm@maguirefamily.org> Cc: stable@vger.kernel.org
2025-05-04	Merge tag 'edac_urgent_for_v6.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras	Linus Torvalds
	Pull EDAC fixes from Borislav Petkov: - Test the correct structure member when handling correctable errors and avoid spurious interrupts, in altera_edac * tag 'edac_urgent_for_v6.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/altera: Set DDR and SDMMC interrupt mask before registration EDAC/altera: Test the correct error reg offset
2025-05-04	io_uring: always arm linked timeouts prior to issue	Jens Axboe
	There are a few spots where linked timeouts are armed, and not all of them adhere to the pre-arm, attempt issue, post-arm pattern. This can be problematic if the linked request returns that it will trigger a callback later, and does so before the linked timeout is fully armed. Consolidate all the linked timeout handling into __io_issue_sqe(), rather than have it spread throughout the various issue entry points. Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/1390 Reported-by: Chase Hiltz <chase@path.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-04	Merge tag 'x86-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds
	Pull x86 fix from Ingo Molnar: "Fix SEV-SNP memory acceptance from the EFI stub for guests running at VMPL >0" * tag 'x86-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Support memory acceptance in the EFI stub under SVSM