summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-12netlink: fix policy dump for int with validation callbackJakub Kicinski
Recent devlink change added validation of an integer value via NLA_POLICY_VALIDATE_FN, for sparse enums. Handle this in policy dump. We can't extract any info out of the callback, so report only the type. Fixes: 429ac6211494 ("devlink: define enum for attr types of dynamic attributes") Reported-by: syzbot+01eb26848144516e7f0a@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250509212751.1905149-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12Merge branch 'for-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Prepare for Intel IPU E2000 (GEN3) This is the first part in introducing RDMA support for idpf. ---------------------------------------------------------------- Tatyana Nikolova says: To align with review comments, the patch series introducing RDMA RoCEv2 support for the Intel Infrastructure Processing Unit (IPU) E2000 line of products is going to be submitted in three parts: 1. Modify ice to use specific and common IIDC definitions and pass a core device info to irdma. 2. Add RDMA support to idpf and modify idpf to use specific and common IIDC definitions and pass a core device info to irdma. 3. Add RDMA RoCEv2 support for the E2000 products, referred to as GEN3 to irdma. This first part is a 5 patch series based on the original "iidc/ice/irdma: Update IDC to support multiple consumers" patch to allow for multiple CORE PCI drivers, using the auxbus. Patches: 1) Move header file to new name for clarity and replace ice specific DSCP define with a kernel equivalent one in irdma 2) Unify naming convention 3) Separate header file into common and driver specific info 4) Replace ice specific DSCP define with a kernel equivalent one in ice 5) Implement core device info struct and update drivers to use it ---------------------------------------------------------------- v1: https://lore.kernel.org/20250505212037.2092288-1-anthony.l.nguyen@intel.com IWL reviews: [v5] https://lore.kernel.org/20250416021549.606-1-tatyana.e.nikolova@intel.com [v4] https://lore.kernel.org/20250225050428.2166-1-tatyana.e.nikolova@intel.com [v3] https://lore.kernel.org/20250207194931.1569-1-tatyana.e.nikolova@intel.com [v2] https://lore.kernel.org/20240824031924.421-1-tatyana.e.nikolova@intel.com [v1] https://lore.kernel.org/20240724233917.704-1-tatyana.e.nikolova@intel.com * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: iidc/ice/irdma: Update IDC to support multiple consumers ice: Replace ice specific DSCP mapping num with a kernel define iidc/ice/irdma: Break iidc.h into two headers iidc/ice/irdma: Rename to iidc_* convention iidc/ice/irdma: Rename IDC header file ==================== Link: https://patch.msgid.link/20250509200712.2911060-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12Merge branch 'net-vertexcom-mse102x-improve-rx-handling'Jakub Kicinski
Stefan Wahren says: ==================== net: vertexcom: mse102x: Improve RX handling This series is the second part of two series for the Vertexcom driver. It contains some improvements for the RX handling of the Vertexcom MSE102x. ==================== Link: https://patch.msgid.link/20250509120435.43646-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: vertexcom: mse102x: Simplify mse102x_rx_pkt_spiStefan Wahren
The function mse102x_rx_pkt_spi is used in two cases: * initial polling to re-arm RX interrupt * level based RX interrupt handler Both of them doesn't need an open-coded retry mechanism. In the first case the function can be called again, if the return code is IRQ_NONE. This keeps the error behavior during netdev open. In the second case the proper retry would be handled implicit by the SPI interrupt. So drop the retry code and simplify the receive path. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20250509120435.43646-7-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: vertexcom: mse102x: Return code for mse102x_rx_pkt_spiStefan Wahren
The MSE102x doesn't provide any interrupt register, so the only way to handle the level interrupt is to fetch the whole packet from the MSE102x internal buffer via SPI. So in cases the interrupt handler fails to do this, it should return IRQ_NONE. This allows the core to disable the interrupt in case the issue persists and prevent an interrupt storm. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20250509120435.43646-6-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: vertexcom: mse102x: Implement flag for valid CMDStefan Wahren
After removal of the invalid command counter only a relevant debug message is left, which can be cumbersome. So add a new flag to debugfs, which indicates whether the driver has ever received a valid CMD. This helps to differentiate between general and temporary receive issues. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-5-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: vertexcom: mse102x: Drop invalid cmd statsStefan Wahren
There are several reasons for an invalid command response by the MSE102x: * SPI line interferences * MSE102x is in reset or has no firmware * MSE102x is busy * no packet in MSE102x receive buffer So the counter for invalid command isn't very helpful without further context. So drop the confusing statistics counter, but keep the debug messages about "unexpected response" in order to debug possible hardware issues. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: vertexcom: mse102x: Add warning about IRQ trigger typeStefan Wahren
The example of the initial DT binding of the Vertexcom MSE 102x suggested a IRQ_TYPE_EDGE_RISING, which is wrong. So warn everyone to fix their device tree to level based IRQ. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250509120435.43646-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12dt-bindings: vertexcom-mse102x: Fix IRQ type in exampleStefan Wahren
According to the MSE102x documentation the trigger type is a high level. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250509120435.43646-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENINGVladimir Oltean
It has been reported that when under a bridge with stp_state=1, the logs get spammed with this message: [ 251.734607] fsl_dpaa2_eth dpni.5 eth0: Couldn't decode source port Further debugging shows the following info associated with packets: source_port=-1, switch_id=-1, vid=-1, vbid=1 In other words, they are data plane packets which are supposed to be decoded by dsa_tag_8021q_find_port_by_vbid(), but the latter (correctly) refuses to do so, because no switch port is currently in BR_STATE_LEARNING or BR_STATE_FORWARDING - so the packet is effectively unexpected. The error goes away after the port progresses to BR_STATE_LEARNING in 15 seconds (the default forward_time of the bridge), because then, dsa_tag_8021q_find_port_by_vbid() can correctly associate the data plane packets with a plausible bridge port in a plausible STP state. Re-reading IEEE 802.1D-1990, I see the following: "4.4.2 Learning: (...) The Forwarding Process shall discard received frames." IEEE 802.1D-2004 further clarifies: "DISABLED, BLOCKING, LISTENING, and BROKEN all correspond to the DISCARDING port state. While those dot1dStpPortStates serve to distinguish reasons for discarding frames, the operation of the Forwarding and Learning processes is the same for all of them. (...) LISTENING represents a port that the spanning tree algorithm has selected to be part of the active topology (computing a Root Port or Designated Port role) but is temporarily discarding frames to guard against loops or incorrect learning." Well, this is not what the driver does - instead it sets mac[port].ingress = true. To get rid of the log spam, prevent unexpected data plane packets to be received by software by discarding them on ingress in the LISTENING state. In terms of blame attribution: the prints only date back to commit d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID"). However, the settings would permit a LISTENING port to forward to a FORWARDING port, and the standard suggests that's not OK. Fixes: 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250509113816.2221992-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: Lock lower level devices when updating featuresCosmin Ratiu
__netdev_update_features() expects the netdevice to be ops-locked, but it gets called recursively on the lower level netdevices to sync their features, and nothing locks those. This commit fixes that, with the assumption that it shouldn't be possible for both higher-level and lover-level netdevices to require the instance lock, because that would lead to lock dependency warnings. Without this, playing with higher level (e.g. vxlan) netdevices on top of netdevices with instance locking enabled can run into issues: WARNING: CPU: 59 PID: 206496 at ./include/net/netdev_lock.h:17 netif_napi_add_weight_locked+0x753/0xa60 [...] Call Trace: <TASK> mlx5e_open_channel+0xc09/0x3740 [mlx5_core] mlx5e_open_channels+0x1f0/0x770 [mlx5_core] mlx5e_safe_switch_params+0x1b5/0x2e0 [mlx5_core] set_feature_lro+0x1c2/0x330 [mlx5_core] mlx5e_handle_feature+0xc8/0x140 [mlx5_core] mlx5e_set_features+0x233/0x2e0 [mlx5_core] __netdev_update_features+0x5be/0x1670 __netdev_update_features+0x71f/0x1670 dev_ethtool+0x21c5/0x4aa0 dev_ioctl+0x438/0xae0 sock_ioctl+0x2ba/0x690 __x64_sys_ioctl+0xa78/0x1700 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250509072850.2002821-1-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: phy: dp83867: use 2ns delay if not specified in DTBMatthias Schiffer
Most PHY drivers default to a 2ns delay if internal delay is requested and no value is specified. Having a default value makes sense, as it allows a Device Tree to only care about board design (whether there are delays on the PCB or not), and not whether the delay is added on the MAC or the PHY side when needed. Whether the delays are actually applied is controlled by the DP83867_RGMII_*_CLK_DELAY_EN flags, so the behavior is only changed in configurations that would previously be rejected with -EINVAL. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://patch.msgid.link/e2509b248a11ee29ea408a50c231da4c1fa0863b.1746612711.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: phy: dp83867: remove check of delay strap configurationMatthias Schiffer
The check that intended to handle "rgmii" PHY mode differently to the RGMII modes with internal delay never worked as intended: - added in commit 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy"): logic error caused the condition to always evaluate to true - changed in commit a46fa260f6f5 ("net: phy: dp83867: Fix warning check for setting the internal delay"): now the condition incorrectly evaluates to false for rgmii-txid - removed in commit 2b892649254f ("net: phy: dp83867: Set up RGMII TX delay") Around the time of the removal, commit c11669a2757e ("net: phy: dp83867: Rework delay rgmii delay handling") started clearing the delay enable flags in RGMIICTL. The change attempted to preserve the historical behavior of not disabling internal delays with "rgmii" PHY mode and also documented this in a comment, but due to a conflict between "Set up RGMII TX delay" and "Rework delay rgmii delay handling", the behavior dp83867_verify_rgmii_cfg() warned about (and that was also described in a comment in dp83867_config_init()) disappeared in the following merge of net into net-next in commit b4b12b0d2f02 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net"). While is doesn't appear that this breaking change was intentional, it has been like this since 2019, and the new behavior to disable the delays with "rgmii" PHY mode is generally desirable - in particular with MAC drivers that have to fix up the delay mode, resulting in the PHY driver not even seeing the same mode that was specified in the Device Tree. Remove the obsolete check and comment. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://patch.msgid.link/8a286207cd11b460bb0dbd27931de3626b9d7575.1746612711.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: cadence: macb: Fix a possible deadlock in macb_halt_tx.Mathieu Othacehe
There is a situation where after THALT is set high, TGO stays high as well. Because jiffies are never updated, as we are in a context with interrupts disabled, we never exit that loop and have a deadlock. That deadlock was noticed on a sama5d4 device that stayed locked for days. Use retries instead of jiffies so that the timeout really works and we do not have a deadlock anymore. Fixes: e86cd53afc590 ("net/macb: better manage tx errors") Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250509121935.16282-1-othacehe@gnu.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12dt-bindings: net: renesas-gbeth: Add support for RZ/V2N (R9A09G056) SoCLad Prabhakar
Document support for the GBETH IP found on the Renesas RZ/V2N (R9A09G056) SoC. The GBETH controller on the RZ/V2N SoC is functionally identical to the one found on the RZ/V2H(P) (R9A09G057) SoC, so `renesas,rzv2h-gbeth` will be used as a fallback compatible. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250507173551.100280-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12Merge branch 'selftests-net-configure-rp_filter-in-setup_ns'Jakub Kicinski
Hangbin Liu says: ==================== selftests: net: configure rp_filter in setup_ns Some distributions enable rp_filter globally by default, which can interfere with various test cases. To address this, many tests explicitly disable rp_filter within their scripts. To avoid duplication and ensure consistent behavior across tests, this patch moves the rp_filter configuration into setup_ns, applied immediately after a new namespace is created. This change ensures that all namespace-based tests inherit the appropriate rp_filter settings, simplifying individual test scripts and improving maintainability. ==================== Link: https://patch.msgid.link/20250508081910.84216-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: mptcp: remove rp_filter configurationHangbin Liu
Remove the rp_filter configuration from MPTCP tests, as it is now handled by setup_ns. Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250508081910.84216-7-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: netfilter: remove rp_filter configurationHangbin Liu
Remove the rp_filter configuration in netfilter lib, as setup_ns already sets it appropriately by default Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250508081910.84216-6-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: net: use setup_ns for SRv6 tests and remove rp_filter configurationHangbin Liu
Some SRv6 tests manually set up network namespaces and disable rp_filter. Since the setup_ns library function already handles rp_filter configuration, convert these SRv6 tests to use setup_ns and remove the redundant rp_filter settings. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250508081910.84216-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: net: use setup_ns for bareudp testingHangbin Liu
Switch bareudp testing to use setup_ns, which sets up rp_filter by default. This allows us to remove the manual rp_filter configuration from the script. Additionally, since setup_ns handles namespace naming and cleanup, we no longer need a separate cleanup function. We also move the trap setup earlier in the script, before the test setup begins. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: net: remove redundant rp_filter configurationHangbin Liu
The following tests use setup_ns to create a network namespace, which will disables rp_filter immediately after namespace creation. Therefore, it is no longer necessary to disable rp_filter again within these individual tests. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: net: disable rp_filter after namespace initializationHangbin Liu
Some distributions enable rp_filter globally by default. To ensure consistent behavior across environments, we explicitly disable it in several test cases. This patch moves the rp_filter disabling logic to immediately after the network namespace is initialized. With this change, individual test cases with creating namespace via setup_ns no longer need to disable rp_filter again. This helps avoid redundancy and ensures test consistency. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508081910.84216-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net: ixp4xx_eth: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the intel ixp4xx ethernet driver to the new API, so that the ndo_eth_ioctl() path can be removed completely. hwtstamp_get() and hwtstamp_set() are only called if netif_running() when the code path is engaged through the legacy ioctl. As I don't want to make an unnecessary functional change which I can't test, preserve that restriction when going through the new operations. When cpu_is_ixp46x() is false, the execution of SIOCGHWTSTAMP and SIOCSHWTSTAMP falls through to phy_mii_ioctl(), which may process it in case of a timestamping PHY, or may return -EOPNOTSUPP. In the new API, the core handles timestamping PHYs directly and does not call the netdev driver, so just return -EOPNOTSUPP directly for equivalent logic. A gratuitous change I chose to do anyway is prefixing hwtstamp_get() and hwtstamp_set() with the driver name, ipx4xx. This reflects modern coding sensibilities, and we are touching the involved lines anyway. The remainder of eth_ioctl() is exactly equivalent to phy_do_ioctl_running(), so use that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20250508211043.3388702-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12selftests: drv-net: ping: make sure the ping test restores checksum offloadJakub Kicinski
The ping test flips checksum offload on and off. Make sure the original value is restored if test fails. Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250508214005.1518013-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12net/mlx5: support software TX timestampStanislav Fomichev
Having a software timestamp (along with existing hardware one) is useful to trace how the packets flow through the stack. mlx5e_tx_skb_update_hwts_flags is called from tx paths to setup HW timestamp; extend it to add software one as well. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250508235109.585096-1-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-12Merge tag 'sched_ext-for-6.15-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: "A little bit invasive for rc6 but they're important fixes, pass tests fine and won't break anything outside sched_ext: - scx_bpf_cpuperf_set() calls internal functions that require the rq to be locked. It assumed that the BPF caller has rq locked but that's not always true. Fix it by tracking whether rq is currently held by the CPU and grabbing it if necessary - bpf_iter_scx_dsq_new() was leaving the DSQ iterator in an uninitialized state after an error. However, next() and destroy() can be called on an iterator which failed initialization and thus they always need to be initialized even after an init error. Fix by always initializing the iterator - Remove duplicate BTF_ID_FLAGS() entries" * tag 'sched_ext-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: bpf_iter_scx_dsq_new() should always initialize iterator sched_ext: Fix rq lock state in hotplug ops sched_ext: Remove duplicate BTF_ID_FLAGS definitions sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set() sched_ext: Track currently locked rq
2025-05-12Merge tag 'cgroup-for-6.15-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "One low-risk patch to fix a cpuset bug where it over-eagerly tries to modify CPU affinity of kernel threads" * tag 'cgroup-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
2025-05-12btrfs: add back warning for mount option commit values exceeding 300Kyoji Ogasawara
The Btrfs documentation states that if the commit value is greater than 300 a warning should be issued. The warning was accidentally lost in the new mount API update. Fixes: 6941823cc878 ("btrfs: remove old mount API code") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12btrfs: fix folio leak in submit_one_async_extent()Boris Burkov
If btrfs_reserve_extent() fails while submitting an async_extent for a compressed write, then we fail to call free_async_extent_pages() on the async_extent and leak its folios. A likely cause for such a failure would be btrfs_reserve_extent() failing to find a large enough contiguous free extent for the compressed extent. I was able to reproduce this by: 1. mount with compress-force=zstd:3 2. fallocating most of a filesystem to a big file 3. fragmenting the remaining free space 4. trying to copy in a file which zstd would generate large compressed extents for (vmlinux worked well for this) Step 4. hits the memory leak and can be repeated ad nauseam to eventually exhaust the system memory. Fix this by detecting the case where we fallback to uncompressed submission for a compressed async_extent and ensuring that we call free_async_extent_pages(). Fixes: 131a821a243f ("btrfs: fallback if compressed IO fails for ENOSPC") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Co-developed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12btrfs: fix discard worker infinite loop after disabling discardFilipe Manana
If the discard worker is running and there's currently only one block group, that block group is a data block group, it's in the unused block groups discard list and is being used (it got an extent allocated from it after becoming unused), the worker can end up in an infinite loop if a transaction abort happens or the async discard is disabled (during remount or unmount for example). This happens like this: 1) Task A, the discard worker, is at peek_discard_list() and find_next_block_group() returns block group X; 2) Block group X is in the unused block groups discard list (its discard index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past it become an unused block group and was added to that list, but then later it got an extent allocated from it, so its ->used counter is not zero anymore; 3) The current transaction is aborted by task B and we end up at __btrfs_handle_fs_error() in the transaction abort path, where we call btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode (setting SB_RDONLY in the super block's s_flags field); 4) Task A calls __add_to_discard_list() with the goal of moving the block group from the unused block groups discard list into another discard list, but at __add_to_discard_list() we end up doing nothing because btrfs_run_discard_work() returns false, since the super block has SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set anymore in fs_info->flags. So block group X remains in the unused block groups discard list; 5) Task A then does a goto into the 'again' label, calls find_next_block_group() again we gets block group X again. Then it repeats the previous steps over and over since there are not other block groups in the discard lists and block group X is never moved out of the unused block groups discard list since btrfs_run_discard_work() keeps returning false and therefore __add_to_discard_list() doesn't move block group X out of that discard list. When this happens we can get a soft lockup report like this: [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] [71.957] Modules linked in: xfs af_packet rfkill (...) [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.957] Tainted: [W]=WARN [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [71.957] Code: c1 01 48 83 (...) [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 [71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 [71.957] PKRU: 55555554 [71.957] Call Trace: [71.957] <TASK> [71.957] process_one_work+0x17e/0x330 [71.957] worker_thread+0x2ce/0x3f0 [71.957] ? __pfx_worker_thread+0x10/0x10 [71.957] kthread+0xef/0x220 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork+0x34/0x50 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork_asm+0x1a/0x30 [71.957] </TASK> [71.957] Kernel panic - not syncing: softlockup: hung tasks [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.992] Call Trace: [71.993] <IRQ> [71.994] dump_stack_lvl+0x5a/0x80 [71.994] panic+0x10b/0x2da [71.995] watchdog_timer_fn.cold+0x9a/0xa1 [71.996] ? __pfx_watchdog_timer_fn+0x10/0x10 [71.997] __hrtimer_run_queues+0x132/0x2a0 [71.997] hrtimer_interrupt+0xff/0x230 [71.998] __sysvec_apic_timer_interrupt+0x55/0x100 [71.999] sysvec_apic_timer_interrupt+0x6c/0x90 [72.000] </IRQ> [72.000] <TASK> [72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [72.002] Code: c1 01 48 83 (...) [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] [72.011] process_one_work+0x17e/0x330 [72.012] worker_thread+0x2ce/0x3f0 [72.013] ? __pfx_worker_thread+0x10/0x10 [72.014] kthread+0xef/0x220 [72.014] ? __pfx_kthread+0x10/0x10 [72.015] ret_from_fork+0x34/0x50 [72.015] ? __pfx_kthread+0x10/0x10 [72.016] ret_from_fork_asm+0x1a/0x30 [72.017] </TASK> [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [72.019] Rebooting in 90 seconds.. So fix this by making sure we move a block group out of the unused block groups discard list when calling __add_to_discard_list(). Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard") Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-12Merge tag 'platform-drivers-x86-v6.15-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - amd/pmc: Use spurious 8042 quirk with MECHREVO Wujie 14XA - amd/pmf: - Ensure Smart PC policies are valid - Fix memory leak when the engine fails to start - amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive drivers - asus-wmi: Fix wlan_ctrl_by_user detection - thinkpad_acpi: Add support for NEC Lavie X1475JAS * tag 'platform-drivers-x86-v6.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection platform/x86/amd/pmc: Declare quirk_spurious_8042 for MECHREVO Wujie 14XA (GX4HRXL) platform/x86: thinkpad_acpi: Support also NEC Lavie X1475JAS platform/x86/amd/hsmp: Make amd_hsmp and hsmp_acpi as mutually exclusive drivers drivers/platform/x86/amd: pmf: Check for invalid Smart PC Policies drivers/platform/x86/amd: pmf: Check for invalid sideloaded Smart PC Policies
2025-05-12Merge tag 'udf_for_v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fix from Jan Kara: "Fix a bug in UDF inode eviction leading to spewing pointless error messages" * tag 'udf_for_v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Make sure i_lenExtents is uptodate on inode eviction
2025-05-12tracing: samples: Initialize trace_array_printk() with the correct functionSteven Rostedt
When using trace_array_printk() on a created instance, the correct function to use to initialize it is: trace_array_init_printk() Not trace_printk_init_buffer() The former is a proper function to use, the latter is for initializing trace_printk() and causes the NOTICE banner to be displayed. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Divya Indi <divya.indi@oracle.com> Link: https://lore.kernel.org/20250509152657.0f6744d9@gandalf.local.home Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Fixes: 38ce2a9e33db6 ("tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-12Merge tag 'vfs-6.15-rc7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Ensure that simple_xattr_list() always includes security.* xattrs - Fix eventpoll busy loop optimization when combined with timeouts - Disable swapon() for devices with block sizes greater than page sizes - Don't call errseq_set() twice during mark_buffer_write_io_error(). Just use mapping_set_error() which takes care to not deference unconditionally * tag 'vfs-6.15-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Remove redundant errseq_set call in mark_buffer_write_io_error. swapfile: disable swapon for bs > ps devices fs/eventpoll: fix endless busy loop after timeout has expired fs/xattr.c: fix simple_xattr_list to always include security.* xattrs
2025-05-12kbuild: fix typos "module.builtin" to "modules.builtin"Masahiro Yamada
The filenames in the comments do not match the actual generated files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12Revert "kbuild, rust: use -fremap-path-prefix to make paths relative"Thomas Weißschuh
This reverts commit dbdffaf50ff9cee3259a7cef8a7bd9e0f0ba9f13. --remap-path-prefix breaks the ability of debuggers to find the source file corresponding to object files. As there is no simple or uniform way to specify the source directory explicitly, this breaks developers workflows. Revert the unconditional usage of --remap-path-prefix, equivalent to the same change for -ffile-prefix-map in KBUILD_CPPFLAGS. Fixes: dbdffaf50ff9 ("kbuild, rust: use -fremap-path-prefix to make paths relative") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12Revert "kbuild: make all file references relative to source root"Thomas Weißschuh
This reverts commit cacd22ce69585a91c386243cd662ada962431e63. -ffile-prefix-map breaks the ability of debuggers to find the source file corresponding to object files. As there is no simple or uniform way to specify the source directory explicitly, this breaks developers workflows. Revert the unconditional usage of -ffile-prefix-map. Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://lore.kernel.org/lkml/edc50aa7-0740-4942-8c15-96f12f2acc7e@kernel.org/ Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://lore.kernel.org/lkml/aBEttQH4kimHFScx@intel.com/ Fixes: cacd22ce6958 ("kbuild: make all file references relative to source root") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12kbuild: fix dependency on sorttableMasahiro Yamada
Commit ac4f06789b4f ("kbuild: Create intermediate vmlinux build with relocations preserved") missed replacing one occurrence of "vmlinux" that was added during the same development cycle. Fixes: ac4f06789b4f ("kbuild: Create intermediate vmlinux build with relocations preserved") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org>
2025-05-12init: remove unused CONFIG_CC_CAN_LINK_STATICMasahiro Yamada
This is a leftover from commit 98e20e5e13d2 ("bpfilter: remove bpfilter"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12um: let 'make clean' properly clean underlying SUBARCH as wellMasahiro Yamada
Building the kernel with O= is affected by stale in-tree build artifacts. So, if the source tree is not clean, Kbuild displays the following: $ make ARCH=um O=build defconfig make[1]: Entering directory '/.../linux/build' *** *** The source tree is not clean, please run 'make ARCH=um mrproper' *** in /.../linux *** make[2]: *** [/.../linux/Makefile:673: outputmakefile] Error 1 make[1]: *** [/.../linux/Makefile:248: __sub-make] Error 2 make[1]: Leaving directory '/.../linux/build' make: *** [Makefile:248: __sub-make] Error 2 Usually, running 'make mrproper' is sufficient for cleaning the source tree for out-of-tree builds. However, building UML generates build artifacts not only in arch/um/, but also in the SUBARCH directory (i.e., arch/x86/). If in-tree stale files remain under arch/x86/, Kbuild will reuse them instead of creating new ones under the specified build directory. This commit makes 'make ARCH=um clean' recurse into the SUBARCH directory. Reported-by: Shuah Khan <skhan@linuxfoundation.org> Closes: https://lore.kernel.org/lkml/20250502172459.14175-1-skhan@linuxfoundation.org/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
2025-05-12kbuild: Disable -Wdefault-const-init-unsafeNathan Chancellor
A new on by default warning in clang [1] aims to flags instances where const variables without static or thread local storage or const members in aggregate types are not initialized because it can lead to an indeterminate value. This is quite noisy for the kernel due to instances originating from header files such as: drivers/gpu/drm/i915/gt/intel_ring.h:62:2: error: default initialization of an object of type 'typeof (ring->size)' (aka 'const unsigned int') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe] 62 | typecheck(typeof(ring->size), next); | ^ include/linux/typecheck.h:10:9: note: expanded from macro 'typecheck' 10 | ({ type __dummy; \ | ^ include/net/ip.h:478:14: error: default initialization of an object of type 'typeof (rt->dst.expires)' (aka 'const unsigned long') leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe] 478 | if (mtu && time_before(jiffies, rt->dst.expires)) | ^ include/linux/jiffies.h:138:26: note: expanded from macro 'time_before' 138 | #define time_before(a,b) time_after(b,a) | ^ include/linux/jiffies.h:128:3: note: expanded from macro 'time_after' 128 | (typecheck(unsigned long, a) && \ | ^ include/linux/typecheck.h:11:12: note: expanded from macro 'typecheck' 11 | typeof(x) __dummy2; \ | ^ include/linux/list.h:409:27: warning: default initialization of an object of type 'union (unnamed union at include/linux/list.h:409:27)' with const member leaves the object uninitialized [-Wdefault-const-init-field-unsafe] 409 | struct list_head *next = smp_load_acquire(&head->next); | ^ include/asm-generic/barrier.h:176:29: note: expanded from macro 'smp_load_acquire' 176 | #define smp_load_acquire(p) __smp_load_acquire(p) | ^ arch/arm64/include/asm/barrier.h:164:59: note: expanded from macro '__smp_load_acquire' 164 | union { __unqual_scalar_typeof(*p) __val; char __c[1]; } __u; \ | ^ include/linux/list.h:409:27: note: member '__val' declared 'const' here crypto/scatterwalk.c:66:22: error: default initialization of an object of type 'struct scatter_walk' with const member leaves the object uninitialized [-Werror,-Wdefault-const-init-field-unsafe] 66 | struct scatter_walk walk; | ^ include/crypto/algapi.h:112:15: note: member 'addr' declared 'const' here 112 | void *const addr; | ^ fs/hugetlbfs/inode.c:733:24: error: default initialization of an object of type 'struct vm_area_struct' with const member leaves the object uninitialized [-Werror,-Wdefault-const-init-field-unsafe] 733 | struct vm_area_struct pseudo_vma; | ^ include/linux/mm_types.h:803:20: note: member 'vm_flags' declared 'const' here 803 | const vm_flags_t vm_flags; | ^ Silencing the instances from typecheck.h is difficult because '= {}' is not available in older but supported compilers and '= {0}' would cause warnings about a literal 0 being treated as NULL. While it might be possible to come up with a local hack to silence the warning for clang-21+, it may not be worth it since -Wuninitialized will still trigger if an uninitialized const variable is actually used. In all audited cases of the "field" variant of the warning, the members are either not used in the particular call path, modified through other means such as memset() / memcpy() because the containing object is not const, or are within a union with other non-const members. Since this warning does not appear to have a high signal to noise ratio, just disable it. Cc: stable@vger.kernel.org Link: https://github.com/llvm/llvm-project/commit/576161cb6069e2c7656a8ef530727a0f4aefff30 [1] Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/CA+G9fYuNjKcxFKS_MKPRuga32XbndkLGcY-PVuoSwzv6VWbY=w@mail.gmail.com/ Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://github.com/ClangBuiltLinux/linux/issues/2088 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12kbuild: rpm-pkg: Add (elfutils-devel or libdw-devel) to BuildRequiresWangYuli
The dwarf.h header, which is included by scripts/gendwarfksyms/gendwarfksyms.h, resides within elfutils-devel or libdw-devel package. This portion of the code is compiled under the condition that CONFIG_GENDWARFKSYMS is enabled. Consequently, add (elfutils-devel or libdw-devel) to BuildRequires to prevent unforeseen compilation failures. Fix follow possible error: In file included from scripts/gendwarfksyms/cache.c:6: scripts/gendwarfksyms/gendwarfksyms.h:6:10: fatal error: 'dwarf.h' file not found 6 | #include <dwarf.h> | ^~~~~~~~~ Link: https://lore.kernel.org/all/3e52d80d-0c60-4df5-8cb5-21d4b1fce7b7@suse.com/ Fixes: f28568841ae0 ("tools: Add gendwarfksyms") Suggested-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12kbuild: deb-pkg: Add libdw-dev:native to Build-Depends-ArchWangYuli
The dwarf.h header, which is included by scripts/gendwarfksyms/gendwarfksyms.h, resides within the libdw-dev package. This portion of the code is compiled under the condition that CONFIG_GENDWARFKSYMS is enabled. Consequently, add libdw-dev to Build-Depends-Arch to prevent unforeseen compilation failures. Fix follow possible error: In file included from scripts/gendwarfksyms/symbols.c:6: scripts/gendwarfksyms/gendwarfksyms.h:6:10: fatal error: 'dwarf.h' file not found 6 | #include <dwarf.h> | ^~~~~~~~~ Fixes: f28568841ae0 ("tools: Add gendwarfksyms") Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Tested-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12usr/include: openrisc: don't HDRTEST bpf_perf_event.hRandy Dunlap
Since openrisc does not support PERF_EVENTS, omit the HDRTEST of bpf_perf_event.h for arch/openrisc/. Fixes a build error: usr/include/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-12kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86Sami Tolvanen
With CONFIG_GENDWARFKSYMS, __gendwarfksyms_ptr variables are added to the kernel in EXPORT_SYMBOL() to ensure DWARF type information is available for exported symbols in the TUs where they're actually exported. These symbols are dropped when linking vmlinux, but dangling references to them remain in DWARF. With CONFIG_DEBUG_INFO_BTF enabled on X86, pahole versions after commit 47dcb534e253 ("btf_encoder: Stop indexing symbols for VARs") and before commit 9810758003ce ("btf_encoder: Verify 0 address DWARF variables are in ELF section") place these symbols in the .data..percpu section, which results in an "Invalid offset" error in btf_datasec_check_meta() during boot, as all the variables are at zero offset and have non-zero size. If CONFIG_DEBUG_INFO_BTF_MODULES is enabled, this also results in a failure to load modules with: failed to validate module [$module] BTF: -22 As the issue occurs in pahole v1.28 and the fix was merged after v1.29 was released, require pahole <v1.28 or >v1.29 when GENDWARFKSYMS is enabled with DEBUG_INFO_BTF on X86. Reported-by: Paolo Pisati <paolo.pisati@canonical.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-05-11Merge tag 'arm64_cbpf_mitigation_2025_05_08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 cBPF BHB mitigation from James Morse: "This adds the BHB mitigation into the code JITted for cBPF programs as these can be loaded by unprivileged users via features like seccomp. The existing mechanisms to disable the BHB mitigation will also prevent the mitigation being JITted. In addition, cBPF programs loaded by processes with the SYS_ADMIN capability are not mitigated as these could equally load an eBPF program that does the same thing. For good measure, the list of 'k' values for CPU's local mitigations is updated from the version on arm's website" * tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Add new CPUs 'k' values for branch mitigation arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs arm64: proton-pack: Expose whether the branchy loop k value arm64: proton-pack: Expose whether the platform is mitigated by firmware arm64: insn: Add support for encoding DSB
2025-05-11Merge tag 'its-for-linus-20250509' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ITS mitigation from Dave Hansen: "Mitigate Indirect Target Selection (ITS) issue. I'd describe this one as a good old CPU bug where the behavior is _obviously_ wrong, but since it just results in bad predictions it wasn't wrong enough to notice. Well, the researchers noticed and also realized that thus bug undermined a bunch of existing indirect branch mitigations. Thus the unusually wide impact on this one. Details: ITS is a bug in some Intel CPUs that affects indirect branches including RETs in the first half of a cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the second half of a cacheline. Researchers at VUSec found this behavior and reported to Intel. Affected processors: - Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet Lake, Ice Lake, Tiger Lake and Rocket Lake. Scope of impact: - Guest/host isolation: When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to direct branches in the guest. - Intra-mode using cBPF: cBPF can be used to poison the branch history to exploit ITS. Realigning the indirect branches and RETs mitigates this attack vector. - User/kernel: With eIBRS enabled user/kernel isolation is *not* impacted by ITS. - Indirect Branch Prediction Barrier (IBPB): Due to this bug indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This will be fixed in the microcode. Mitigation: As indirect branches in the first half of cacheline are affected, the mitigation is to replace those indirect branches with a call to thunk that is aligned to the second half of the cacheline. RETs that take prediction from RSB are not affected, but they may be affected by RSB-underflow condition. So, RETs in the first half of cacheline are also patched to a return thunk that executes the RET aligned to second half of cacheline" * tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftest/x86/bugs: Add selftests for ITS x86/its: FineIBT-paranoid vs ITS x86/its: Use dynamic thunks for indirect branches x86/ibt: Keep IBT disabled during alternative patching mm/execmem: Unify early execmem_cache behaviour x86/its: Align RETs in BHB clear sequence to avoid thunking x86/its: Add support for RSB stuffing mitigation x86/its: Add "vmexit" option to skip mitigation on some CPUs x86/its: Enable Indirect Target Selection mitigation x86/its: Add support for ITS-safe return thunk x86/its: Add support for ITS-safe indirect thunk x86/its: Enumerate Indirect Target Selection (ITS) bug Documentation: x86/bugs/its: Add ITS documentation
2025-05-11Merge tag 'ibti-hisory-for-linus-2025-05-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 IBTI mitigation from Dave Hansen: "Mitigate Intra-mode Branch History Injection via classic BFP programs This adds the branch history clearing mitigation to cBPF programs for x86. Intra-mode BHI attacks via cBPF a.k.a IBTI-History was reported by researchers at VUSec. For hardware that doesn't support BHI_DIS_S, the recommended mitigation is to run the short software sequence followed by the IBHF instruction after cBPF execution. On hardware that does support BHI_DIS_S, enable BHI_DIS_S and execute the IBHF after cBPF execution. The Indirect Branch History Fence (IBHF) is a new instruction that prevents indirect branch target predictions after the barrier from using branch history from before the barrier while BHI_DIS_S is enabled. On older systems this will map to a NOP. It is recommended to add this fence at the end of the cBPF program to support VM migration. This instruction is required on newer parts with BHI_NO to fully mitigate against these attacks. The current code disables the mitigation for anything running with the SYS_ADMIN capability bit set. The intention was not to waste time mitigating a process that has access to anything it wants anyway" * tag 'ibti-hisory-for-linus-2025-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bhi: Do not set BHI_DIS_S in 32-bit mode x86/bpf: Add IBHF call at end of classic BPF x86/bpf: Call branch history clearing sequence on exit
2025-05-11Linux 6.15-rc6v6.15-rc6Linus Torvalds
2025-05-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM: - Avoid use of uninitialized memcache pointer in user_mem_abort() - Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) - Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM - Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value - Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page RISC-V: - Add missing reset of smstateen CSRs x86: - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the VMCB as its state is undefined after SHUTDOWN, emulating INIT is the least awful choice). - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM KVM doesn't goof a sanity check in the future. - Free obsolete roots when (re)loading the MMU to fix a bug where pre-faulting memory can get stuck due to always encountering a stale root. - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page to print state, so that KVM doesn't print stale/wrong information. - When changing memory attributes (e.g. shared <=> private), add potential hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM doesn't create a shared/private hugepage when the the corresponding attributes will become mixed (the attributes are commited *after* KVM finishes the invalidation). - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led to very measurable performance regressions for non-KVM workloads" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions KVM: arm64: Fix memory check in host_stage2_set_owner_locked() KVM: arm64: Kill HCRX_HOST_FLAGS KVM: arm64: Properly save/restore HCRX_EL2 KVM: arm64: selftest: Don't try to disable AArch64 support KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort() KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields KVM: RISC-V: reset smstateen CSRs KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload() KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run() KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception