summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-27x86/fpu: Refine and simplify the magic number check during signal returnChang S. Bae
Before restoring xstate from the user space buffer, the kernel performs sanity checks on these magic numbers: magic1 in the software reserved area, and magic2 at the end of XSAVE region. The position of magic2 is calculated based on the xstate size derived from the user space buffer. But, the in-kernel record is directly available and reliable for this purpose. This reliance on user space data is also inconsistent with the recent fix in: d877550eaf2d ("x86/fpu: Stop relying on userspace for info to fault in xsave buffer") Simply use fpstate->user_size, and then get rid of unnecessary size-evaluation code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211014500.3738-1-chang.seok.bae@intel.com
2025-02-27x86/percpu: Disable named address spaces for UBSAN_BOOL with KASAN for GCC < ↵Uros Bizjak
14.2 GCC < 14.2 does not correctly propagate address space qualifiers with -fsanitize=bool,enum. Together with address sanitizer then causes that load to be sanitized. Disable named address spaces for GCC < 14.2 when both, UBSAN_BOOL and KASAN are enabled. Reported-by: Matt Fleming <matt@readmodwrite.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250227140715.2276353-1-ubizjak@gmail.com Closes: https://lore.kernel.org/lkml/20241213190119.3449103-1-matt@readmodwrite.com/
2025-02-27arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation levelRyan Roberts
commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in flush_hugetlb_tlb_range() was not updated. So as it stands, when trying to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will spuriously try to invalidate at level 0 on LPA2-enabled systems. Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are at it, explicitly use the correct stride and level for CONT_PMD_SIZE and CONT_PTE_SIZE, which should provide a minor optimization. Cc: stable@vger.kernel.org Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250226120656.2400136-4-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptesRyan Roberts
arm64 supports multiple huge_pte sizes. Some of the sizes are covered by a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some are covered by multiple ptes at a particular level (CONT_PTE_SIZE, CONT_PMD_SIZE). So the function has to figure out the size from the huge_pte pointer. This was previously done by walking the pgtable to determine the level and by using the PTE_CONT bit to determine the number of ptes at the level. But the PTE_CONT bit is only valid when the pte is present. For non-present pte values (e.g. markers, migration entries), the previous implementation was therefore erroneously determining the size. There is at least one known caller in core-mm, move_huge_pte(), which may call huge_ptep_get_and_clear() for a non-present pte. So we must be robust to this case. Additionally the "regular" ptep_get_and_clear() is robust to being called for non-present ptes so it makes sense to follow the behavior. Fix this by using the new sz parameter which is now provided to the function. Additionally when clearing each pte in a contig range, don't gather the access and dirty bits if the pte is not present. An alternative approach that would not require API changes would be to store the PTE_CONT bit in a spare bit in the swap entry pte for the non-present case. But it felt cleaner to follow other APIs' lead and just pass in the size. As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap entry offset field (layout of non-present pte). Since hugetlb is never swapped to disk, this field will only be populated for markers, which always set this bit to 0 and hwpoison swap entries, which set the offset field to a PFN; So it would only ever be 1 for a 52-bit PVA system where memory in that high half was poisoned (I think!). So in practice, this bit would almost always be zero for non-present ptes and we would only clear the first entry if it was actually a contiguous block. That's probably a less severe symptom than if it was always interpreted as 1 and cleared out potentially-present neighboring PTEs. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250226120656.2400136-3-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts
In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-27efi/mokvar-table: Avoid repeated map/unmap of the same pageArd Biesheuvel
Tweak the logic that traverses the MOKVAR UEFI configuration table to only unmap the entry header and map the next one if they don't live in the same physical page. Link: https://lore.kernel.org/all/8f085931-3e9d-4386-9209-1d6c95616327@uncooperative.org/ Tested-By: Peter Jones <pjones@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-27efi: Don't map the entire mokvar table to determine its sizePeter Jones
Currently, when validating the mokvar table, we (re)map the entire table on each iteration of the loop, adding space as we discover new entries. If the table grows over a certain size, this fails due to limitations of early_memmap(), and we get a failure and traceback: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220 ... Call Trace: <TASK> ? __early_ioremap+0xef/0x220 ? __warn.cold+0x93/0xfa ? __early_ioremap+0xef/0x220 ? report_bug+0xff/0x140 ? early_fixup_exception+0x5d/0xb0 ? early_idt_handler_common+0x2f/0x3a ? __early_ioremap+0xef/0x220 ? efi_mokvar_table_init+0xce/0x1d0 ? setup_arch+0x864/0xc10 ? start_kernel+0x6b/0xa10 ? x86_64_start_reservations+0x24/0x30 ? x86_64_start_kernel+0xed/0xf0 ? common_startup_64+0x13e/0x141 </TASK> ---[ end trace 0000000000000000 ]--- mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187. Mapping the entire structure isn't actually necessary, as we don't ever need more than one entry header mapped at once. Changes efi_mokvar_table_init() to only map each entry header, not the entire table, when determining the table size. Since we're not mapping any data past the variable name, it also changes the code to enforce that each variable name is NUL terminated, rather than attempting to verify it in place. Cc: <stable@vger.kernel.org> Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-27cgroup/rstat: Fix forceidle time in cpu.statAbel Wu
The commit b824766504e4 ("cgroup/rstat: add force idle show helper") retrieves forceidle_time outside cgroup_rstat_lock for non-root cgroups which can be potentially inconsistent with other stats. Rather than reverting that commit, fix it in a way that retains the effort of cleaning up the ifdef-messes. Fixes: b824766504e4 ("cgroup/rstat: add force idle show helper") Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-27Merge tag 'sound-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of fixes. The only slightly large change is for ASoC Cirrus codec, but that's still in a normal range. All the rest are small device-specific fixes and should be fairly safe to take" * tag 'sound-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix microphone regression on ASUS N705UD ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15 ASoC: cs35l56: Prevent races when soft-resetting using SPI control firmware: cs_dsp: Remove async regmap writes ASoC: Intel: sof_sdw: warn both sdw and pch dmic are used ASoC: SOF: Intel: don't check number of sdw links when set dmic_fixup ASoC: dapm-graph: set fill colour of turned on nodes ASoC: fsl: Rename stream name of SAI DAI driver ASoC: es8328: fix route from DAC to output ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2 ASoC: tas2764: Set the SDOUT polarity correctly ASoC: tas2764: Fix power control mask ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports ASoC: tas2770: Fix volume scale
2025-02-27sched_ext: Documentation: add task lifecycle summaryAndrea Righi
Understanding the lifecycle of a task in sched_ext can be not trivial, therefore add a section to the main documentation that summarizes the entire workflow of a task using pseudo-code. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-27tools/sched_ext: Provide a compatible helper for scx_bpf_events()Andrea Righi
Introduce __COMPAT_scx_bpf_events() to use scx_bpf_events() in a compatible way also with kernels that don't provide this kfunc. This also fixes the following error with scx_qmap when running on a kernel that does not provide scx_bpf_events(): ; scx_bpf_events(&events, sizeof(events)); @ scx_qmap.bpf.c:777 318: (b7) r2 = 72 ; R2_w=72 async_cb 319: <invalid kfunc call> kfunc 'scx_bpf_events' is referenced but wasn't resolved Fixes: 9865f31d852a4 ("sched_ext: Add scx_bpf_events() and scx_read_event() for BPF schedulers") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-27net: ti: icss-iep: Reject perout generation requestMeghana Malladi
IEP driver supports both perout and pps signal generation but perout feature is faulty with half-cooked support due to some missing configuration. Remove perout support from the driver and reject perout requests with "not supported" error code. Fixes: c1e0230eeaab2 ("net: ti: icss-iep: Add IEP driver") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250227092441.1848419-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27idpf: fix checksums set in idpf_rx_rsc()Eric Dumazet
idpf_rx_rsc() uses skb_transport_offset(skb) while the transport header is not set yet. This triggers the following warning for CONFIG_DEBUG_NET=y builds. DEBUG_NET_WARN_ON_ONCE(!skb_transport_header_was_set(skb)) [ 69.261620] WARNING: CPU: 7 PID: 0 at ./include/linux/skbuff.h:3020 idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261629] Modules linked in: vfat fat dummy bridge intel_uncore_frequency_tpmi intel_uncore_frequency_common intel_vsec_tpmi idpf intel_vsec cdc_ncm cdc_eem cdc_ether usbnet mii xhci_pci xhci_hcd ehci_pci ehci_hcd libeth [ 69.261644] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G S W 6.14.0-smp-DEV #1697 [ 69.261648] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN [ 69.261650] RIP: 0010:idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261677] ? __warn (kernel/panic.c:242 kernel/panic.c:748) [ 69.261682] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261687] ? report_bug (lib/bug.c:?) [ 69.261690] ? handle_bug (arch/x86/kernel/traps.c:285) [ 69.261694] ? exc_invalid_op (arch/x86/kernel/traps.c:309) [ 69.261697] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 69.261700] ? __pfx_idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:4011) idpf [ 69.261704] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261708] ? idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:3072) idpf [ 69.261712] __napi_poll (net/core/dev.c:7194) [ 69.261716] net_rx_action (net/core/dev.c:7265) [ 69.261718] ? __qdisc_run (net/sched/sch_generic.c:293) [ 69.261721] ? sched_clock (arch/x86/include/asm/preempt.h:84 arch/x86/kernel/tsc.c:288) [ 69.261726] handle_softirqs (kernel/softirq.c:561) Fixes: 3a8845af66edb ("idpf: add RX splitq napi poll support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alan Brady <alan.brady@intel.com> Cc: Joshua Hay <joshua.a.hay@intel.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250226221253.1927782-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27selftests: drv-net: Check if combined-count existsJoe Damato
Some drivers, like tg3, do not set combined-count: $ ethtool -l enp4s0f1 Channel parameters for enp4s0f1: Pre-set maximums: RX: 4 TX: 4 Other: n/a Combined: n/a Current hardware settings: RX: 4 TX: 1 Other: n/a Combined: n/a In the case where combined-count is not set, the ethtool netlink code in the kernel elides the value and the code in the test: netnl.channels_get(...) With a tg3 device, the returned dictionary looks like: {'header': {'dev-index': 3, 'dev-name': 'enp4s0f1'}, 'rx-max': 4, 'rx-count': 4, 'tx-max': 4, 'tx-count': 1} Note that the key 'combined-count' is missing. As a result of this missing key the test raises an exception: # Exception| if channels['combined-count'] == 0: # Exception| ~~~~~~~~^^^^^^^^^^^^^^^^^^ # Exception| KeyError: 'combined-count' Change the test to check if 'combined-count' is a key in the dictionary first and if not assume that this means the driver has separate RX and TX queues. With this change, the test now passes successfully on tg3 and mlx5 (which does have a 'combined-count'). Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API") Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250226181957.212189-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27arm64: dts: rockchip: fix pinmux of UART5 for PX30 Ringneck on HaikouQuentin Schulz
UART5 uses GPIO0_B5 as UART RTS but muxed in its GPIO function, therefore UART5 must request this pin to be muxed in that function, so let's do that. Fixes: 5963d97aa780 ("arm64: dts: rockchip: add rs485 support on uart5 of px30-ringneck-haikou") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Link: https://lore.kernel.org/r/20250225-ringneck-dtbos-v3-2-853a9a6dd597@cherry.de Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-27arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on HaikouQuentin Schulz
UART0 pinmux by default configures GPIO0_B5 in its UART RTS function for UART0. However, by default on Haikou, it is used as GPIO as UART RTS for UART5. Therefore, let's update UART0 pinmux to not configure the pin in that mode, a later commit will make UART5 request the GPIO pinmux. Fixes: c484cf93f61b ("arm64: dts: rockchip: add PX30-µQ7 (Ringneck) SoM with Haikou baseboard") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Link: https://lore.kernel.org/r/20250225-ringneck-dtbos-v3-1-853a9a6dd597@cherry.de Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-27Merge branch 'fixes-for-seg6-and-rpl-lwtunnels-on-input'Paolo Abeni
Justin Iurman says: ==================== fixes for seg6 and rpl lwtunnels on input As a follow up to commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels"), we also need a conditional dst cache on input for seg6_iptunnel and rpl_iptunnel to prevent dst ref loops (i.e., if the packet destination did not change, we may end up recording a reference to the lwtunnel in its own cache, and the lwtunnel state will never be freed). This series provides a fix to respectively prevent a dst ref loop on input in seg6_iptunnel and rpl_iptunnel. v2: - https://lore.kernel.org/netdev/20250211221624.18435-1-justin.iurman@uliege.be/ v1: - https://lore.kernel.org/netdev/20250209193840.20509-1-justin.iurman@uliege.be/ ==================== Link: https://patch.msgid.link/20250225175139.25239-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: ipv6: fix dst ref loop on input in rpl lwtJustin Iurman
Prevent a dst ref loop on input in rpl_iptunnel. Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") Cc: Alexander Aring <alex.aring@gmail.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: ipv6: fix dst ref loop on input in seg6 lwtJustin Iurman
Prevent a dst ref loop on input in seg6_iptunnel. Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input") Cc: David Lebrun <dlebrun@google.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4SJustin Klaassen
The u2phy1_host should always have the same status as usb_host1_ehci and usb_host1_ohci, otherwise the EHCI and OHCI drivers may be initialized for a disabled usb port. Per the NanoPi R4S schematic, the phy-supply for u2phy1_host is set to the vdd_5v regulator. Fixes: db792e9adbf8 ("rockchip: rk3399: Add support for FriendlyARM NanoPi R4S") Cc: stable@vger.kernel.org Signed-off-by: Justin Klaassen <justin@tidylabs.net> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Link: https://lore.kernel.org/r/20250225170420.3898-1-justin@tidylabs.net Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-27x86/bootflag: Replace open-coded parity calculation with parity8()Kuan-Wei Chiu
Refactor parity calculations to use the standard parity8() helper. This change eliminates redundant implementations and improves code efficiency. [ ubizjak: Updated the patch to apply to the latest x86 tree. ] Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250227125616.2253774-1-ubizjak@gmail.com
2025-02-27x86/cpu: Remove get_this_hybrid_cpu_*()Pawan Gupta
Because calls to get_this_hybrid_cpu_type() and get_this_hybrid_cpu_native_id() are not required now. cpu-type and native-model-id are cached at boot in per-cpu struct cpuinfo_topology. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-4-2ae010f50370@linux.intel.com
2025-02-27perf/x86/intel: Use cache cpu-type for hybrid PMU selectionPawan Gupta
get_this_hybrid_cpu_type() misses a case when cpu-type is populated regardless of X86_FEATURE_HYBRID_CPU. This is particularly true for hybrid variants that have P or E cores fused off. Instead use the cpu-type cached in struct x86_topology, as it does not rely on hybrid feature to enumerate cpu-type. This can also help avoid the model-specific fixup get_hybrid_cpu_type(). Also replace the get_this_hybrid_cpu_native_id() with its cached value in struct x86_topology. While at it, remove enum hybrid_cpu_type as it serves no purpose when we have the exact cpu-types defined in enum intel_cpu_type. Also rename atom_native_id to intel_native_id and move it to intel-family.h where intel_cpu_type lives. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-3-2ae010f50370@linux.intel.com
2025-02-27cpufreq: intel_pstate: Avoid SMP calls to get cpu-typePawan Gupta
Intel pstate driver relies on SMP calls to get the cpu-type of a given CPU. Remove the SMP calls and instead use the cached value of cpu-type which is more efficient. [ mingo: Forward ported it. ] Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-2-2ae010f50370@linux.intel.com
2025-02-27x86/cpu: Prefix hexadecimal values with 0x in cpu_debug_show()Pawan Gupta
The hex values in CPU debug interface are not prefixed with 0x. This may cause misinterpretation of values. Fix it. [ mingo: Restore previous vertical alignment of the output. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-1-2ae010f50370@linux.intel.com
2025-02-27usbnet: gl620a: fix endpoint checking in genelink_bind()Nikita Zhandarovich
Syzbot reports [1] a warning in usb_submit_urb() triggered by inconsistencies between expected and actually present endpoints in gl620a driver. Since genelink_bind() does not properly verify whether specified eps are in fact provided by the device, in this case, an artificially manufactured one, one may get a mismatch. Fix the issue by resorting to a usbnet utility function usbnet_get_endpoints(), usually reserved for this very problem. Check for endpoints and return early before proceeding further if any are missing. [1] Syzbot report: usb 5-1: Manufacturer: syz usb 5-1: SerialNumber: syz usb 5-1: config 0 descriptor?? gl620a 5-1:0.23 usb0: register 'gl620a' at usb-dummy_hcd.0-1, ... ------------[ cut here ]------------ usb 5-1: BOGUS urb xfer, pipe 3 != type 1 WARNING: CPU: 2 PID: 1841 at drivers/usb/core/urb.c:503 usb_submit_urb+0xe4b/0x1730 drivers/usb/core/urb.c:503 Modules linked in: CPU: 2 UID: 0 PID: 1841 Comm: kworker/2:2 Not tainted 6.12.0-syzkaller-07834-g06afb0f36106 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Workqueue: mld mld_ifc_work RIP: 0010:usb_submit_urb+0xe4b/0x1730 drivers/usb/core/urb.c:503 ... Call Trace: <TASK> usbnet_start_xmit+0x6be/0x2780 drivers/net/usb/usbnet.c:1467 __netdev_start_xmit include/linux/netdevice.h:5002 [inline] netdev_start_xmit include/linux/netdevice.h:5011 [inline] xmit_one net/core/dev.c:3590 [inline] dev_hard_start_xmit+0x9a/0x7b0 net/core/dev.c:3606 sch_direct_xmit+0x1ae/0xc30 net/sched/sch_generic.c:343 __dev_xmit_skb net/core/dev.c:3827 [inline] __dev_queue_xmit+0x13d4/0x43e0 net/core/dev.c:4400 dev_queue_xmit include/linux/netdevice.h:3168 [inline] neigh_resolve_output net/core/neighbour.c:1514 [inline] neigh_resolve_output+0x5bc/0x950 net/core/neighbour.c:1494 neigh_output include/net/neighbour.h:539 [inline] ip6_finish_output2+0xb1b/0x2070 net/ipv6/ip6_output.c:141 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline] ip6_finish_output+0x3f9/0x1360 net/ipv6/ip6_output.c:226 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x1f8/0x540 net/ipv6/ip6_output.c:247 dst_output include/net/dst.h:450 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] mld_sendpack+0x9f0/0x11d0 net/ipv6/mcast.c:1819 mld_send_cr net/ipv6/mcast.c:2120 [inline] mld_ifc_work+0x740/0xca0 net/ipv6/mcast.c:2651 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Reported-by: syzbot+d693c07c6f647e0388d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d693c07c6f647e0388d3 Fixes: 47ee3051c856 ("[PATCH] USB: usbnet (5/9) module for genesys gl620a cables") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250224172919.1220522-1-n.zhandarovich@fintech.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27Merge patch series "iomap: make buffered writes work with RWF_DONTCACHE"Christian Brauner
Support buffered writes with RWF_DONTCACHE. * patches from https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk: xfs: flag as supporting FOP_DONTCACHE iomap: make buffered writes work with RWF_DONTCACHE Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27xfs: flag as supporting FOP_DONTCACHEJens Axboe
Read side was already fully supported, and with the write side appropriately punted to the worker queue, all that's needed now is setting FOP_DONTCACHE in the file_operations structure to enable full support for read and write uncached IO. This provides similar benefits to using RWF_DONTCACHE with reads. Testing buffered writes on 32 files: writing bs 65536, uncached 0 1s: 196035MB/sec 2s: 132308MB/sec 3s: 132438MB/sec 4s: 116528MB/sec 5s: 103898MB/sec 6s: 108893MB/sec 7s: 99678MB/sec 8s: 106545MB/sec 9s: 106826MB/sec 10s: 101544MB/sec 11s: 111044MB/sec 12s: 124257MB/sec 13s: 116031MB/sec 14s: 114540MB/sec 15s: 115011MB/sec 16s: 115260MB/sec 17s: 116068MB/sec 18s: 116096MB/sec where it's quite obvious where the page cache filled, and performance dropped from to about half of where it started, settling in at around 115GB/sec. Meanwhile, 32 kswapds were running full steam trying to reclaim pages. Running the same test with uncached buffered writes: writing bs 65536, uncached 1 1s: 198974MB/sec 2s: 189618MB/sec 3s: 193601MB/sec 4s: 188582MB/sec 5s: 193487MB/sec 6s: 188341MB/sec 7s: 194325MB/sec 8s: 188114MB/sec 9s: 192740MB/sec 10s: 189206MB/sec 11s: 193442MB/sec 12s: 189659MB/sec 13s: 191732MB/sec 14s: 190701MB/sec 15s: 191789MB/sec 16s: 191259MB/sec 17s: 190613MB/sec 18s: 191951MB/sec and the behavior is fully predictable, performing the same throughout even after the page cache would otherwise have fully filled with dirty data. It's also about 65% faster, and using half the CPU of the system compared to the normal buffered write. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250204184047.356762-3-axboe@kernel.dk Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27iomap: make buffered writes work with RWF_DONTCACHEJens Axboe
Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27efivarfs: allow creation of zero length filesJames Bottomley
Temporarily allow the creation of zero length files in efivarfs so the 'fwupd' user space firmware update tool can continue to operate. This hack should be reverted as soon as the fwupd mechanisms for updating firmware have been fixed. fwupd has been coded to open a firmware file, close it, remove the immutable bit and write to it. Since commit 908af31f4896 ("efivarfs: fix error on write to new variable leaving remnants") this behaviour results in the first close removing the file which causes the second write to fail. To allow fwupd to keep working code up an indicator of size 1 if a write fails and only remove the file on that condition (so create at zero size is allowed). Tested-by: Richard Hughes <richard@hughsie.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> [ardb: replace LVFS with fwupd, as suggested by Richard] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-27x86/platform: Only allow CONFIG_EISA for 32-bitArnd Bergmann
The CONFIG_EISA menu was cleaned up in 2018, but this inadvertently brought the option back on 64-bit machines: ISA remains guarded by a CONFIG_X86_32 check, but EISA no longer depends on ISA. The last Intel machines ith EISA support used a 82375EB PCI/EISA bridge from 1993 that could be paired with the 440FX chipset on early Pentium-II CPUs, long before the first x86-64 products. Fixes: 6630a8e50105 ("eisa: consolidate EISA Kconfig entry in drivers/eisa") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-11-arnd@kernel.org
2025-02-27x86/pci: Remove old STA2x11 supportArnd Bergmann
ST ConneXt STA2x11 was an interface chip for Atom E6xx processors, using a number of components usually found on Arm SoCs. Most of this was merged upstream, but it was never complete enough to actually work and has been abandoned for many years. We already had an agreement on removing it in 2022, but nobody ever submitted the patch to do it. Without STA2x11, CONFIG_X86_32_NON_STANDARD no longer has any use - remove it. Suggested-by: Davide Ciminaghi <ciminaghi@gnudd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-10-arnd@kernel.org
2025-02-27x86/cpu: Document CONFIG_X86_INTEL_MID as 64-bit-onlyArnd Bergmann
The X86_INTEL_MID code was originally introduced for the 32-bit Moorestown/Medfield/Clovertrail platform, later the 64-bit Merrifield/Moorefield variants were added, but the final Morganfield 14nm platform was canceled before it hit the market. To help users understand what the option actually refers to, update the help text, and add a dependency on 64-bit kernels. Ferry confirmed that all the hardware can run 64-bit kernels these days, but is still testing 32-bit kernels on the Intel Edison board, so this remains possible, but is guarded by a CONFIG_EXPERT dependency now, to gently push remaining users towards using CONFIG_64BIT. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Shevchenko <andy@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-9-arnd@kernel.org
2025-02-27x86/mm: Drop support for CONFIG_HIGHPTEArnd Bergmann
With the maximum amount of RAM now 4GB, there is very little point to still have PTE pages in highmem. Drop this for simplification. The only other architecture supporting HIGHPTE is 32-bit arm, and once that feature is removed as well, the highpte logic can be dropped from common code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-8-arnd@kernel.org
2025-02-27x86/mm: Drop CONFIG_SWIOTLB for PAEArnd Bergmann
Since kernels with and without CONFIG_X86_PAE are now limited to the low 4GB of physical address space, there is no need to use swiotlb any more, so stop selecting this. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-7-arnd@kernel.org
2025-02-27x86/mm: Remove CONFIG_HIGHMEM64G supportArnd Bergmann
HIGHMEM64G support was added in linux-2.3.25 to support (then) high-end Pentium Pro and Pentium III Xeon servers with more than 4GB of addressing, NUMA and PCI-X slots started appearing. I have found no evidence of this ever being used in regular dual-socket servers or consumer devices, all the users seem obsolete these days, even by i386 standards: - Support for NUMA servers (NUMA-Q, IBM x440, unisys) was already removed ten years ago. - 4+ socket non-NUMA servers based on Intel 450GX/450NX, HP F8 and ServerWorks ServerSet/GrandChampion could theoretically still work with 8GB, but these were exceptionally rare even 20 years ago and would have usually been equipped with than the maximum amount of RAM. - Some SKUs of the Celeron D from 2004 had 64-bit mode fused off but could still work in a Socket 775 mainboard designed for the later Core 2 Duo and 8GB. Apparently most BIOSes at the time only allowed 64-bit CPUs. - The rare Xeon LV "Sossaman" came on a few motherboards with registered DDR2 memory support up to 16GB. - In the early days of x86-64 hardware, there was sometimes the need to run a 32-bit kernel to work around bugs in the hardware drivers, or in the syscall emulation for 32-bit userspace. This likely still works but there should never be a need for this any more. PAE mode is still required to get access to the 'NX' bit on Atom 'Pentium M' and 'Core Duo' CPUs. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-6-arnd@kernel.org
2025-02-27x86/cpu: Drop configuration options for early 64-bit CPUsArnd Bergmann
The x86 CPU selection menu is confusing for a number of reasons: When configuring 32-bit kernels, it shows a small number of early 64-bit microarchitectures (K8, Core 2) but not the regular generic 64-bit target that is the normal default. There is no longer a reason to run 32-bit kernels on production 64-bit systems, so only actual 32-bit CPUs need to be shown here. When configuring 64-bit kernels, the options also pointless as there is no way to pick any CPU from the past 15 years, leaving GENERIC_CPU as the only sensible choice. Address both of the above by removing the obsolete options and making all 64-bit kernels run on both Intel and AMD CPUs from any generation. Testing generic 32-bit kernels on 64-bit hardware remains possible, just not building a 32-bit kernel that requires a 64-bit CPU. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-5-arnd@kernel.org
2025-02-27x86/build: Rework CONFIG_GENERIC_CPU compiler flagsArnd Bergmann
Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to run on all CPUs, but the Makefile does not actually pass an -march= argument, instead relying on the default that was used to configure the toolchain. In many cases, gcc will be configured to -march=x86-64 or -march=k8 for maximum compatibility, but in other cases a distribution default may be either raised to a more recent ISA, or set to -march=native to build for the CPU used for compilation. This still works in the case of building a custom kernel for the local machine. The point where it breaks down is building a kernel for another machine that is older the the default target. Changing the default to -march=x86-64 would make it work reliable, but possibly produce worse code on distros that intentionally default to a newer ISA. To allow reliably building a kernel for either the oldest x86-64 CPUs, pass the -march=x86-64 flag to the compiler. This was not possible in early versions of x86-64 gcc, but works on all currently supported versions down to at least gcc-5. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-4-arnd@kernel.org
2025-02-27x86/smp: Drop 32-bit "bigsmp" machine supportArnd Bergmann
The x86-32 kernel used to support multiple platforms with more than eight logical CPUs, from the 1999-2003 timeframe: Sequent NUMA-Q, IBM Summit, Unisys ES7000 and HP F8. Support for all except the latter was dropped back in 2014, leaving only the F8 based DL740 and DL760 G2 machines in this catery, with up to eight single-core Socket-603 Xeon-MP processors with hyperthreading. Like the already removed machines, the HP F8 servers at the time cost upwards of $100k in typical configurations, but were quickly obsoleted by their 64-bit Socket-604 cousins and the AMD Opteron. Earlier servers with up to 8 Pentium Pro or Xeon processors remain fully supported as they had no hyperthreading. Similarly, the more common 4-socket Xeon-MP machines with hyperthreading using Intel or ServerWorks chipsets continue to work without this, and all the multi-core Xeon processors also run 64-bit kernels. While the "bigsmp" support can also be used to run on later 64-bit machines (including VM guests), it seems best to discourage that and get any remaining users to update their kernels to 64-bit builds on these. As a side-effect of this, there is also no more need to support NUMA configurations on 32-bit x86, as all true 32-bit NUMA platforms are already gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250226213714.4040853-3-arnd@kernel.org
2025-02-27x86/Kconfig: Add cmpxchg8b support back to Geode CPUsArnd Bergmann
An older cleanup of mine inadvertently removed geode-gx1 and geode-lx from the list of CPUs that are known to support a working cmpxchg8b. Fixes: 88a2b4edda3d ("x86/Kconfig: Rework CONFIG_X86_PAE dependency") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250226213714.4040853-2-arnd@kernel.org
2025-02-27Merge branch 'x86/mm' into x86/cpu, to avoid conflictsIngo Molnar
We are going to apply a new series that conflicts with pending work in x86/mm, so merge in x86/mm to avoid it, and also to refresh the x86/cpu branch with fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-27MIPS: Ignore relocs against __ex_table for relocatable kernelXi Ruoyao
Since commit 6f2c2f93a190 ("scripts/sorttable: Remove unneeded Elf_Rel"), sorttable no longer clears relocs against __ex_table, claiming "it was never used." But in fact MIPS relocatable kernel had been implicitly depending on this behavior, so after this commit the MIPS relocatable kernel has started to spit oops like: CPU 1 Unable to handle kernel paging request at virtual address 000000fffbbdbff8, epc == ffffffff818f9a6c, ra == ffffffff813ad7d0 ... ... Call Trace: [<ffffffff818f9a6c>] __raw_copy_from_user+0x48/0x2fc [<ffffffff813ad7d0>] cp_statx+0x1a0/0x1e0 [<ffffffff813ae528>] do_statx_fd+0xa8/0x118 [<ffffffff813ae670>] sys_statx+0xd8/0xf8 [<ffffffff81156cc8>] syscall_common+0x34/0x58 So ignore those relocs on our own to fix the issue. Fixes: 6f2c2f93a190 ("scripts/sorttable: Remove unneeded Elf_Rel") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-02-27x86/bugs: Remove X86_FEATURE_USE_IBPBYosry Ahmed
X86_FEATURE_USE_IBPB was introduced in: 2961298efe1e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") to have separate flags for when the CPU supports IBPB (i.e. X86_FEATURE_IBPB) and when an IBPB is actually used to mitigate Spectre v2. Ever since then, the uses of IBPB expanded. The name became confusing because it does not control all IBPB executions in the kernel. Furthermore, because its name is generic and it's buried within indirect_branch_prediction_barrier(), it's easy to use it not knowing that it is specific to Spectre v2. X86_FEATURE_USE_IBPB is no longer needed because all the IBPB executions it used to control are now controlled through other means (e.g. switch_mm_*_ibpb static branches). Remove the unused feature bit. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-7-yosry.ahmed@linux.dev
2025-02-27KVM: nVMX: Always use IBPB to properly virtualize IBRSYosry Ahmed
On synthesized nested VM-exits in VMX, an IBPB is performed if IBRS is advertised to the guest to properly provide separate prediction domains for L1 and L2. However, this is currently conditional on X86_FEATURE_USE_IBPB, which depends on the host spectre_v2_user mitigation. In short, if spectre_v2_user=no, IBRS is not virtualized correctly and L1 becomes susceptible to attacks from L2. Fix this by performing the IBPB regardless of X86_FEATURE_USE_IBPB. Fixes: 2e7eab81425a ("KVM: VMX: Execute IBPB on emulated VM-exit when guest has IBRS") Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Jim Mattson <jmattson@google.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-6-yosry.ahmed@linux.dev
2025-02-27x86/bugs: Use a static branch to guard IBPB on vCPU switchYosry Ahmed
Instead of using X86_FEATURE_USE_IBPB to guard the IBPB execution in KVM when a new vCPU is loaded, introduce a static branch, similar to switch_mm_*_ibpb. This makes it obvious in spectre_v2_user_select_mitigation() what exactly is being toggled, instead of the unclear X86_FEATURE_USE_IBPB (which will be shortly removed). It also provides more fine-grained control, making it simpler to change/add paths that control the IBPB in the vCPU switch path without affecting other IBPBs. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-5-yosry.ahmed@linux.dev
2025-02-27x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()Yosry Ahmed
If X86_FEATURE_USE_IBPB is not set, then both spectre_v2_user_ibpb and spectre_v2_user_stibp are set to SPECTRE_V2_USER_NONE in spectre_v2_user_select_mitigation(). Since ib_prctl_set() already checks for this before performing the IBPB, the X86_FEATURE_USE_IBPB check is redundant. Remove it. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-4-yosry.ahmed@linux.dev
2025-02-27x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()Yosry Ahmed
The check is performed when either switch_mm_cond_ibpb or switch_mm_always_ibpb is set. In both cases, X86_FEATURE_USE_IBPB is always set. Remove the redundant check. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20250227012712.3193063-3-yosry.ahmed@linux.dev
2025-02-27x86/bugs: Move the X86_FEATURE_USE_IBPB check into callersYosry Ahmed
indirect_branch_prediction_barrier() only performs the MSR write if X86_FEATURE_USE_IBPB is set, using alternative_msr_write(). In preparation for removing X86_FEATURE_USE_IBPB, move the feature check into the callers so that they can be addressed one-by-one, and use X86_FEATURE_IBPB instead to guard the MSR write. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250227012712.3193063-2-yosry.ahmed@linux.dev
2025-02-27selftests/x86/xstate: Fix spelling mistake "hader" -> "header"Colin Ian King
There is a spelling mistake in a sig_print() message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250227091533.599213-1-colin.i.king@gmail.com