summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-17wifi: ath12k: parse and save hardware mode info from ↵Baochen Qiang
WMI_SERVICE_READY_EXT_EVENTID event for later use WLAN hardware might support various hardware modes such as DBS (dual band simultaneously), SBS (single band simultaneously) and DBS_OR_SBS etc, see enum wmi_host_hw_mode_config_type. Firmware advertises actual supported modes in WMI_SERVICE_READY_EXT_EVENTID event. For each mode, firmware advertises frequency range each hardware MAC can operate on. In MLO case such information is necessary during vdev activation and link selection (which is done in following patches), so add a new structure ath12k_svc_ext_info to ath12k_wmi_base, then parse and save those information to it for later use. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00284-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ath12k-sbs-dbs-v1-1-54a29e7a3a88@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STATBjorn Andersson
When the ath12k driver is built without CONFIG_ATH12K_DEBUG, the recently refactored stats code can cause any user space application (such at NetworkManager) to consume 100% CPU for 3 seconds, every time stats are read. Commit 'b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of debugfs")' moved ath12k_debugfs_fw_stats_request() out of debugfs, by merging the additional logic into ath12k_mac_get_fw_stats(). Among the added responsibility of ath12k_mac_get_fw_stats() was the busy-wait for `fw_stats_done`. Signalling of `fw_stats_done` happens when one of the WMI_REQUEST_PDEV_STAT, WMI_REQUEST_VDEV_STAT, and WMI_REQUEST_BCN_STAT messages are received, but the handling of the latter two commands remained in the debugfs code. As `fw_stats_done` isn't signalled, the calling processes will spin until the timeout (3 seconds) is reached. Moving the handling of these two additional responses out of debugfs resolves the issue. Fixes: b8a0d83fe4c7 ("wifi: ath12k: move firmware stats out of debugfs") Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Tested-by: Abel Vesa <abel.vesa@linaro.org> Link: https://patch.msgid.link/20250609-ath12k-fw-stats-done-v1-1-2b3624656697@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-06-17net/sched: fix use-after-free in taprio_dev_notifierHyunwoo Kim
Since taprio’s taprio_dev_notifier() isn’t protected by an RCU read-side critical section, a race with advance_sched() can lead to a use-after-free. Adding rcu_read_lock() inside taprio_dev_notifier() prevents this. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Cc: stable@vger.kernel.org Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/aEzIYYxt0is9upYG@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17Merge branch 'ptp_vclock-fixes'Jakub Kicinski
Vladimir Oltean says: ==================== ptp_vclock fixes While I was intending to test something else related to PTP in net-next I noticed any time I would run ptp4l on an interface, the kernel would print "ptp: physical clock is free running" and ptp4l would exit with an error code. I then found Jeongjun Park's patch and subsequent explanation provided to Jakub's question, specifically related to the code which introduced the breakage I am seeing. https://lore.kernel.org/netdev/CAO9qdTEjQ5414un7Yw604paECF=6etVKSDSnYmZzZ6Pg3LurXw@mail.gmail.com/ I had to look at the original issue that prompted Jeongjun Park's patch, and provide an alternative fix for it. Patch 1/2 in this set contains a logical revert plus the alternative fix, squashed into one. Patch 2/2 fixes another issue which was confusing during debugging/ characterization, namely: "ok, the kernel clearly thinks that any physical clock is free-running after this change (despite there being no vclocks), but why would ptp4l fail to create the clock altogether? Why not just fail to adjust it?" By reverting (locally) Jeongjun Park's commit, I could reproduce the reported lockdep splat using the commands from patch 1/2's commit message, and this goes away with the reworked implementation. ==================== Link: https://patch.msgid.link/20250613174749.406826-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17ptp: allow reading of currently dialed frequency to succeed on free-running ↵Vladimir Oltean
clocks There is a bug in ptp_clock_adjtime() which makes it refuse the operation even if we just want to read the current clock dialed frequency, not modify anything (tx->modes == 0). That should be possible even if the clock is free-running. For context, the kernel UAPI is the same for getting and setting the frequency of a POSIX clock. For example, ptp4l errors out at clock_create() -> clockadj_get_freq() -> clock_adjtime() time, when it should logically only have failed on actual adjustments to the clock, aka if the clock was configured as slave. But in master mode it should work. This was discovered when examining the issue described in the previous commit, where ptp_clock_freerun() returned true despite n_vclocks being zero. Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250613174749.406826-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17ptp: fix breakage after ptp_vclock_in_use() reworkVladimir Oltean
What is broken -------------- ptp4l, and any other application which calls clock_adjtime() on a physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()"). Explanation for the breakage ---------------------------- The blamed commit was based on the false assumption that ptp_vclock_in_use() callers already test for n_vclocks prior to calling this function. This is notably incorrect for the code path below, in which there is, in fact, no n_vclocks test: ptp_clock_adjtime() -> ptp_clock_freerun() -> ptp_vclock_in_use() The result is that any clock adjustment on any physical clock is now impossible. This is _despite_ there not being any vclock over this physical clock. $ ptp4l -i eno0 -2 -P -m ptp4l[58.425]: selected /dev/ptp0 as PTP clock [ 58.429749] ptp: physical clock is free running ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy failed to create a clock $ cat /sys/class/ptp/ptp0/n_vclocks 0 The patch makes the ptp_vclock_in_use() function say "if it's not a virtual clock, then this physical clock does have virtual clocks on top". Then ptp_clock_freerun() uses this information to say "this physical clock has virtual clocks on top, so it must stay free-running". Then ptp_clock_adjtime() uses this information to say "well, if this physical clock has to be free-running, I can't do it, return -EBUSY". Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the test whether vclocks are in use. What did the blamed commit intend to fix ---------------------------------------- The blamed commit presents a lockdep warning stating "possible recursive locking detected", with the n_vclocks_store() and ptp_clock_unregister() functions involved. The recursive locking seems this: n_vclocks_store() -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1 -> device_for_each_child_reverse(..., unregister_vclock) -> unregister_vclock() -> ptp_vclock_unregister() -> ptp_clock_unregister() -> ptp_vclock_in_use() -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2 The issue can be triggered by creating and then deleting vclocks: $ echo 2 > /sys/class/ptp/ptp0/n_vclocks $ echo 0 > /sys/class/ptp/ptp0/n_vclocks But note that in the original stack trace, the address of the first lock is different from the address of the second lock. This is because at step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent (physical) PTP clock, and at step 2, the lock is of the child (virtual) PTP clock. They are different locks of different devices. In this situation there is no real deadlock, the lockdep warning is caused by the fact that the mutexes have the same lock class on both the parent and the child. Functionally it is fine. Proposed alternative solution ----------------------------- We must reintroduce the body of ptp_vclock_in_use() mostly as it was structured prior to the blamed commit, but avoid the lockdep warning. Based on the fact that vclocks cannot be nested on top of one another (ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we already know that ptp->n_vclocks is zero for a virtual clock. And ptp->is_virtual_clock is a runtime invariant, established at ptp_clock_register() time and never changed. There is no need to serialize on any mutex in order to read ptp->is_virtual_clock, and we take advantage of that by moving it outside the lock. Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at all, and step 2 in the code walkthrough above can simply go away. We can simply return false to the question "ptp_vclock_in_use(a virtual clock)". Other notes ----------- Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns execution seems racy, because the returned value can become stale as soon as the function returns and before the return value is used (i.e. n_vclocks_store() can run any time). The locking requirement should somehow be transferred to the caller, to ensure a longer life time for the returned value, but this seems out of scope for this severe bug fix. Because we are also fixing up the logic from the original commit, there is another Fixes: tag for that. Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()") Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250613174749.406826-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17x86/its: Fix an ifdef typo in its_alloc()Lukas Bulwahn
Commit a82b26451de1 ("x86/its: explicitly manage permissions for ITS pages") reworks its_alloc() and introduces a typo in an ifdef conditional, referring to CONFIG_MODULE instead of CONFIG_MODULES. Fix this typo in its_alloc(). Fixes: a82b26451de1 ("x86/its: explicitly manage permissions for ITS pages") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250616100432.22941-1-lukas.bulwahn%40redhat.com
2025-06-17Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Bug fixes Thie first patch fixes a crash during PCIe AER when the bnxt_re RoCE driver is loaded. The second patch is a refactor patch needed by patch 3. Patch 3 fixes a packet drop issue if queue restart is done on a ring belonging to a non-default RSS context. Patch 2 and 3 are version 2 that has addressed the v1 issue by reducing the scope of the traffic disruptions: https://lore.kernel.org/netdev/CACKFLi=P9xYHVF4h2Ovjd-8DaoyzFAHnY6Y6H+1b7eGq+BQZzA@mail.gmail.com/ ==================== Link: https://patch.msgid.link/20250613231841.377988-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Update MRU and RSS table of RSS contexts on queue resetPavan Chebbi
The commit under the Fixes tag below which updates the VNICs' RSS and MRU during .ndo_queue_start(), needs to be extended to cover any non-default RSS contexts which have their own VNICs. Without this step, packets that are destined to a non-default RSS context may be dropped after .ndo_queue_start(). We further optimize this scheme by updating the VNIC only if the RX ring being restarted is in the RSS table of the VNIC. Updating the VNIC (in particular setting the MRU to 0) will momentarily stop all traffic to all rings in the RSS table. Any VNIC that has the RX ring excluded from the RSS table can skip this step and avoid the traffic disruption. Note that this scheme is just an improvement. A VNIC with multiple rings in the RSS table will still see traffic disruptions to all rings in the RSS table when one of the rings is being restarted. We are working on a FW scheme that will improve upon this further. Fixes: 5ac066b7b062 ("bnxt_en: Fix queue start to update vnic RSS table") Reported-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Add a helper function to configure MRU and RSSPavan Chebbi
Add a new helper function that will configure MRU and RSS table of a VNIC. This will be useful when we configure both on a VNIC when resetting an RX ring. This function will be used again in the next bug fix patch where we have to reconfigure VNICs for RSS contexts. Suggested-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17bnxt_en: Fix double invocation of bnxt_ulp_stop()/bnxt_ulp_start()Kalesh AP
Before the commit under the Fixes tag below, bnxt_ulp_stop() and bnxt_ulp_start() were always invoked in pairs. After that commit, the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop() has been called. This may result in the RoCE driver's aux driver .suspend() method being invoked twice. The 2nd bnxt_re_suspend() call will crash when it dereferences a NULL pointer: (NULL ib_device): Handle device suspend call BUG: kernel NULL pointer dereference, address: 0000000000000b78 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary) Tainted: [S]=CPU_OUT_OF_SPEC Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en] RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re] Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00 RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070 R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> bnxt_ulp_stop+0x69/0x90 [bnxt_en] bnxt_sp_task+0x678/0x920 [bnxt_en] ? __schedule+0x514/0xf50 process_scheduled_works+0x9d/0x400 worker_thread+0x11c/0x260 ? __pfx_worker_thread+0x10/0x10 kthread+0xfe/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag is already set. This will preserve the original symmetrical bnxt_ulp_stop() and bnxt_ulp_start(). Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED flag after taking the mutex to avoid any race condition. And for symmetry, only proceed in bnxt_ulp_start() if the BNXT_EN_FLAG_ULP_STOPPED is set. Fixes: 3c163f35bd50 ("bnxt_en: Optimize recovery path ULP locking in the driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17Merge tag 'linux-can-fixes-for-6.16-20250617' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-06-17 The patch is by Brett Werling, and fixes the power regulator retrieval during probe of the tcan4x5x glue code for the m_can driver. * tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: fix power regulator retrieval during probe openvswitch: Allocate struct ovs_pcpu_storage dynamically ==================== Link: https://patch.msgid.link/20250617155123.2141584-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17net: netmem: fix skb_ensure_writable with unreadable skbsMina Almasry
skb_ensure_writable should succeed when it's trying to write to the header of the unreadable skbs, so it doesn't need an unconditional skb_frags_readable check. The preceding pskb_may_pull() call will succeed if write_len is within the head and fail if we're trying to write to the unreadable payload, so we don't need an additional check. Removing this check restores DSCP functionality with unreadable skbs as it's called from dscp_tg. Cc: willemb@google.com Cc: asml.silence@gmail.com Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags") Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250615200733.520113-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17x86/mm: Disable INVLPGB when PTI is enabledDave Hansen
PTI uses separate ASIDs (aka. PCIDs) for kernel and user address spaces. When the kernel needs to flush the user address space, it just sets a bit in a bitmap and then flushes the entire PCID on the next switch to userspace. This bitmap is a single 'unsigned long' which is plenty for all 6 dynamic ASIDs. But, unfortunately, the INVLPGB support brings along a bunch more user ASIDs, as many as ~2k more. The bitmap can't address that many. Fortunately, the bitmap is only needed for PTI and all the CPUs with INVLPGB are AMD CPUs that aren't vulnerable to Meltdown and don't need PTI. The only way someone can run into an issue in practice is by booting with pti=on on a newer AMD CPU. Disable INVLPGB if PTI is enabled. Avoid overrunning the small bitmap. Note: this will be fixed up properly by making the bitmap bigger. For now, just avoid the mostly theoretical bug. Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250610222420.E8CBF472%40davehans-spike.ostc.intel.com
2025-06-17net: ti: icssg-prueth: Fix packet handling for XDP_TXMeghana Malladi
While transmitting XDP frames for XDP_TX, page_pool is used to get the DMA buffers (already mapped to the pages) and need to be freed/reycled once the transmission is complete. This need not be explicitly done by the driver as this is handled more gracefully by the xdp driver while returning the xdp frame. __xdp_return() frees the XDP memory based on its memory type, under which page_pool memory is also handled. This change fixes the transmit queue timeout while running XDP_TX. logs: [ 309.069682] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 45860 ms [ 313.933780] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 50724 ms [ 319.053656] icssg-prueth icssg1-eth eth2: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 55844 ms ... Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250616063319.3347541-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17Input: apple_z2 - drop default ARCH_APPLE in KconfigSven Peter
When the first driver for Apple Silicon was upstreamed we accidentally included `default ARCH_APPLE` in its Kconfig which then spread to almost every subsequent driver. As soon as ARCH_APPLE is set to y this will pull in many drivers as built-ins which is not what we want. Thus, drop `default ARCH_APPLE` from Kconfig. Signed-off-by: Sven Peter <sven@kernel.org> Link: https://lore.kernel.org/r/20250612-apple-kconfig-defconfig-v1-8-0e6f9cb512c1@kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-06-17Merge tag 'libnvdimm-fixes-6.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fix from Ira Weiny: "This converts the pmem-region device tree bindings to YAML to fix errors and bring it up to date" * tag 'libnvdimm-fixes-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dt-bindings: pmem: Convert binding to YAML
2025-06-17tools headers x86 cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: faad6645e1128ec2 ("x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold") 159013a7ca18c271 ("x86/its: Enumerate Indirect Target Selection (ITS) bug") f9f27c4a377a8b45 ("x86/cpufeatures: Add "Allowed SEV Features" Feature") b02dc185ee86836c ("x86/cpufeatures: Add X86_FEATURE_APX") d88bb2ded2efdc38 ("KVM: x86: Advertise support for AMD's PREFETCHI") This causes these perf files to be rebuilt and brings some X86_FEATURE that may be used by: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Babu Moger <babu.moger@amd.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kishon Vijay Abraham I <kvijayab@amd.com> Cc: Manali Shukla <manali.shukla@amd.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/aFBWAI3kHYX5aL9G@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17perf bench futex: Fix prctl include in musl libcArnaldo Carvalho de Melo
Namhyung Kim reported: I've updated the perf-tools-next to v6.16-rc1 and found a build error like below on alpine linux 3.18. In file included from bench/futex.c:6: /usr/include/sys/prctl.h:88:8: error: redefinition of 'struct prctl_mm_map' 88 | struct prctl_mm_map { | ^~~~~~~~~~~~ In file included from bench/futex.c:5: /linux/tools/include/uapi/linux/prctl.h:134:8: note: originally defined here 134 | struct prctl_mm_map { | ^~~~~~~~~~~~ make[4]: *** [/linux/tools/build/Makefile.build:86: /build/bench/futex.o] Error 1 git bisect says it's the first commit introduced the failure. So both /usr/include/sys/prctl.h and /linux/tools/include/uapi/linux/prctl.h provide struct prctl_mm_map but their include guard must be different. /usr/include/sys/prctl.h provided by glibc contains the prctl() declaration. It includes also linux/prctl.h. The /usr/include/sys/prctl.h on alpine linux is different. This is probably coming from musl. It contains the PR_* definition and the prctl() declaration. So it clashes here because now the one struct is available twice. The man page for prctl(2) says: | #include <linux/prctl.h> /* Definition of PR_* constants */ | #include <sys/prctl.h> so musl doesn't follow this. So don't include linux/prctl.h explicitely and add some new defines needed if they aren't available. Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/r/20250611092542.F4ooE2FL@linutronix.de Link: https://www.openwall.com/lists/musl/2025/06/12/11 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17perf test: Directory file descriptor leakIan Rogers
Add missed close when iterating over the script directories. Fixes: f3295f5b067d3c26 ("perf tests: Use scandirat for shell script finding") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250614004108.1650988-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17ksmbd: handle set/get info file for streamed fileNamjae Jeon
The bug only appears when: - windows 11 copies a file that has an alternate data stream - streams_xattr is enabled on the share configuration. Microsoft Edge adds a ZoneIdentifier data stream containing the URL for files it downloads. Another way to create a test file: - open cmd.exe - echo "hello from default data stream" > hello.txt - echo "hello again from ads" > hello.txt:ads.txt If you open the file using notepad, we'll see the first message. If you run "notepad hello.txt:ads.txt" in cmd.exe, we should see the second message. dir /s /r should least all streams for the file. The truncation happens because the windows 11 client sends a SetInfo/EndOfFile message on the ADS, but it is instead applied on the main file, because we don't check fp->stream. When receiving set/get info file for a stream file, Change to process requests using stream position and size. Truncate is unnecessary for stream files, so we skip set_file_allocation_info and set_end_of_file_info operations. Reported-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-17ksmbd: fix null pointer dereference in destroy_previous_sessionNamjae Jeon
If client set ->PreviousSessionId on kerberos session setup stage, NULL pointer dereference error will happen. Since sess->user is not set yet, It can pass the user argument as NULL to destroy_previous_session. sess->user will be set in ksmbd_krb5_authenticate(). So this patch move calling destroy_previous_session() after ksmbd_krb5_authenticate(). Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27391 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-17ksmbd: add free_transport ops in ksmbd connectionNamjae Jeon
free_transport function for tcp connection can be called from smbdirect. It will cause kernel oops. This patch add free_transport ops in ksmbd connection, and add each free_transports for tcp and smbdirect. Fixes: 21a4e47578d4 ("ksmbd: fix use-after-free in __smb2_lease_break_noti()") Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-17Input: Fully open-code compatible for greppingKrzysztof Kozlowski
It is very useful to find driver implementing compatibles with `git grep compatible`, so driver should not use defines for that string, even if this means string will be effectively duplicated. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250613071653.46809-2-krzysztof.kozlowski@linaro.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-06-17dt-bindings: HID: i2c-hid: elan: Introduce Elan eKTH8D18Chen-Yu Tsai
The Elan eKTH8D18 touchscreen controller is an I2C HID device with a longer boot-up time. Power sequence timing wise it is compatible with the eKTH6A12NAY, with a power-on delay of at least 5ms, 20ms out-of-reset for I2C ack response, and 150ms out-of-reset for I2C HID enumeration, both shorter than what the eKTH6A12NAY requires. Enumeration and subsequent operation follows the I2C HID standard. Add a compatible string for it with the ekth6a12nay one as a fallback. No enum was used as it is rare to actually add new entries. These chips are commonly completely backward compatible, and unless the power sequencing delays change, there is no real effort being made to keep track of new parts, which come out constantly. Also drop the constraints on the I2C address since it's not really part of the binding. Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20250617082004.1653492-2-wenst@chromium.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-06-17perf evsel: Missed close() when probing hybrid core PMUsIan Rogers
Add missing close() to avoid leaking perf events. In past perfs this mattered little as the function was just used by 'perf list'. As the function is now used to detect hybrid PMUs leaking the perf event is somewhat more painful. Fixes: b41f1cec91c37eee ("perf list: Skip unsupported events") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250614004108.1650988-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17tools headers: Synchronize linux/bits.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in this cset: 1e7933a575ed8af4 ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"") 5b572e8a9f3dcd6e ("bits: introduce fixed-type BIT_U*()") 19408200c094858d ("bits: introduce fixed-type GENMASK_U*()") 31299a5e02112411 ("bits: add comments and newlines to #if, #else and #endif directives") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/linux/bits.h include/linux/bits.h Please see tools/include/uapi/README for further details. Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: I Hsin Cheng <richard120310@gmail.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/aEr0ZJ60EbshEy6p@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17tools arch amd ibs: Sync ibs.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes from: 861c6b1185fbb2e3 ("x86/platform/amd: Add standard header guards to <asm/amd/ibs.h>") A small change to tools/perf/check-headers.sh was made to cope with the move of this header done in: 3846389c03a85188 ("x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>") That don't result in any changes in the tools, just address this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/amd/ibs.h arch/x86/include/asm/amd/ibs.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/aEtCi0pup5FEwnzn@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-06-17workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()Chuyi Zhou
Now when isolcpus is enabled via the cmdline, wq_isolated_cpumask does not include these isolated CPUs, even wq_unbound_cpumask has already excluded them. It is only when we successfully configure an isolate cpuset partition that wq_isolated_cpumask gets overwritten by workqueue_unbound_exclude_cpumask(), including both the cmdline-specified isolated CPUs and the isolated CPUs within the cpuset partitions. Fix this issue by initializing wq_isolated_cpumask properly in workqueue_init_early(). Fixes: fe28f631fa94 ("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask") Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-17Merge tag 'platform-drivers-x86-v6.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - amd/hsmp: Timeout handling fixes - amd/pmc: - Clear metrics table at start of cycle - Add PCSpecialist Lafite Pro V 14M to 8042 quirks list - amd/pmf: Fix error handling corner cases (nth attempt) - alienware-wmi-wmax: Revert G-Mode support as it lowers performance - dell_rbu: - Fix sparse lock context warning - Fix list head usage - Don't overwrite data buffer past the size of the last packet - ideapad-laptop: Ensure EC is not polled too frequently - intel-uncore-freq: - Fail module load when plat_info is NULL - Avoid a non-literal format string as it triggers a compiler warning - intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry - intel/power-domains: Fix error code in tpmi_init() - samsung-galaxybook: Add support for Notebook 9 Pro and others (SAM0426) * tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1" platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list platform/x86/intel-uncore-freq: avoid non-literal format string platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry MAINTAINERS: .mailmap: Update Hans de Goede's email address platform/x86: dell_rbu: Bump version platform/x86: dell_rbu: Stop overwriting data buffer platform/x86: dell_rbu: Fix list usage platform/x86: dell_rbu: Fix lock context warning platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc() platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice platform/x86/amd: pmf: Use device managed allocations x86/platform/amd: replace down_timeout() with down_interruptible() x86/platform/amd: move final timeout check to after final sleep platform/x86/amd: pmc: Clear metrics table at start of cycle platform/x86/intel: power-domains: Fix error code in tpmi_init() platform/x86: samsung-galaxybook: Add SAM0426 platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL platform/x86: ideapad-laptop: use usleep_range() for EC polling
2025-06-17sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from ↵Tejun Heo
sched_create_group() During task_group creation, sched_create_group() calls scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext portion. This is premature and ends up calling ops.cgroup_set_weight() with an incorrect @cgrp before ops.cgroup_init() is called. sched_create_group() should just initialize SCX related fields in the new task_group. Fix it by factoring out scx_tg_init() from sched_init() and making sched_create_group() call that function instead of scx_group_set_weight(). v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads to build failures on !CONFIG_GROUP_SCHED configs. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 819513666966 ("sched_ext: Add cgroup support") Cc: stable@vger.kernel.org # v6.12+
2025-06-17sched_ext: Make scx_group_set_weight() always update tg->scx.weightTejun Heo
Otherwise, tg->scx.weight can go out of sync while scx_cgroup is not enabled and ops.cgroup_init() may be called with a stale weight value. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 819513666966 ("sched_ext: Add cgroup support") Cc: stable@vger.kernel.org # v6.12+
2025-06-17bcachefs: fsck: Fix check_directory_structure when no check_direntsKent Overstreet
check_directory_structure runs after check_dirents, so it expects that it won't see any inodes with missing backpointers - normally. But online fsck can't run check_dirents yet, or the user might only be running a specific pass, so we need to be careful that this isn't an error. If an inode is unreachable, that's handled by a separate pass. Also, add a new 'bch2_inode_has_backpointer()' helper, since we were doing this inconsistently. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17RDMA/mlx5: Initialize obj_event->obj_sub_list before xa_insertMark Zhang
The obj_event may be loaded immediately after inserted, then if the list_head is not initialized then we may get a poisonous pointer. This fixes the crash below: mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) mlx5_core.sf mlx5_core.sf.4: firmware version: 32.38.3056 mlx5_core 0000:03:00.0 en3f0pf0sf2002: renamed from eth0 mlx5_core.sf mlx5_core.sf.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps IPv6: ADDRCONF(NETDEV_CHANGE): en3f0pf0sf2002: link becomes ready Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000007760fb000 [0000000000000060] pgd=000000076f6d7003, p4d=000000076f6d7003, pud=0000000777841003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] SMP Modules linked in: ipmb_host(OE) act_mirred(E) cls_flower(E) sch_ingress(E) mptcp_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) tcp_diag(E) inet_diag(E) binfmt_misc(E) bonding(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) isofs(E) cdrom(E) mst_pciconf(OE) ib_umad(OE) mlx5_ib(OE) ipmb_dev_int(OE) mlx5_core(OE) kpatch_15237886(OEK) mlxdevm(OE) auxiliary(OE) ib_uverbs(OE) ib_core(OE) psample(E) mlxfw(OE) tls(E) sunrpc(E) vfat(E) fat(E) crct10dif_ce(E) ghash_ce(E) sha1_ce(E) sbsa_gwdt(E) virtio_console(E) ext4(E) mbcache(E) jbd2(E) xfs(E) libcrc32c(E) mmc_block(E) virtio_net(E) net_failover(E) failover(E) sha2_ce(E) sha256_arm64(E) nvme(OE) nvme_core(OE) gpio_mlxbf3(OE) mlx_compat(OE) mlxbf_pmc(OE) i2c_mlxbf(OE) sdhci_of_dwcmshc(OE) pinctrl_mlxbf3(OE) mlxbf_pka(OE) gpio_generic(E) i2c_core(E) mmc_core(E) mlxbf_gige(OE) vitesse(E) pwr_mlxbf(OE) mlxbf_tmfifo(OE) micrel(E) mlxbf_bootctl(OE) virtio_ring(E) virtio(E) ipmi_devintf(E) ipmi_msghandler(E) [last unloaded: mst_pci] CPU: 11 PID: 20913 Comm: rte-worker-11 Kdump: loaded Tainted: G OE K 5.10.134-13.1.an8.aarch64 #1 Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.2.2.12968 Oct 26 2023 pstate: a0400089 (NzCv daIf +PAN -UAO -TCO BTYPE=--) pc : dispatch_event_fd+0x68/0x300 [mlx5_ib] lr : devx_event_notifier+0xcc/0x228 [mlx5_ib] sp : ffff80001005bcf0 x29: ffff80001005bcf0 x28: 0000000000000001 x27: ffff244e0740a1d8 x26: ffff244e0740a1d0 x25: ffffda56beff5ae0 x24: ffffda56bf911618 x23: ffff244e0596a480 x22: ffff244e0596a480 x21: ffff244d8312ad90 x20: ffff244e0596a480 x19: fffffffffffffff0 x18: 0000000000000000 x17: 0000000000000000 x16: ffffda56be66d620 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000040 x10: ffffda56bfcafb50 x9 : ffffda5655c25f2c x8 : 0000000000000010 x7 : 0000000000000000 x6 : ffff24545a2e24b8 x5 : 0000000000000003 x4 : ffff80001005bd28 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff244e0596a480 x0 : ffff244d8312ad90 Call trace: dispatch_event_fd+0x68/0x300 [mlx5_ib] devx_event_notifier+0xcc/0x228 [mlx5_ib] atomic_notifier_call_chain+0x58/0x80 mlx5_eq_async_int+0x148/0x2b0 [mlx5_core] atomic_notifier_call_chain+0x58/0x80 irq_int_handler+0x20/0x30 [mlx5_core] __handle_irq_event_percpu+0x60/0x220 handle_irq_event_percpu+0x3c/0x90 handle_irq_event+0x58/0x158 handle_fasteoi_irq+0xfc/0x188 generic_handle_irq+0x34/0x48 ... Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://patch.msgid.link/r/3ce7f20e0d1a03dc7de6e57494ec4b8eaf1f05c2.1750147949.git.leon@kernel.org Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-17RDMA/core: Rate limit GID cache warning messagesMaor Gottlieb
The GID cache warning messages can flood the kernel log when there are multiple failed attempts to add GIDs. This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table. Change pr_warn to pr_warn_ratelimited to prevent log flooding while still maintaining visibility of the issue. Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.1750062357.git.leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-17RDMA/mlx5: Fix unsafe xarray access in implicit ODP handlingOr Har-Toov
__xa_store() and __xa_erase() were used without holding the proper lock, which led to a lockdep warning due to unsafe RCU usage. This patch replaces them with xa_store() and xa_erase(), which perform the necessary locking internally. ============================= WARNING: suspicious RCPU usage 6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Not tainted ----------------------------- ./include/linux/xarray.h:1211 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/u136:0/219: at: process_one_work+0xbe4/0x15f0 process_one_work+0x75c/0x15f0 pagefault_mr+0x9a5/0x1390 [mlx5_ib] stack backtrace: CPU: 14 UID: 0 PID: 219 Comm: kworker/u136:0 Not tainted 6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] Call Trace: dump_stack_lvl+0xa8/0xc0 lockdep_rcu_suspicious+0x1e6/0x260 xas_create+0xb8a/0xee0 xas_store+0x73/0x14c0 __xa_store+0x13c/0x220 ? xa_store_range+0x390/0x390 ? spin_bug+0x1d0/0x1d0 pagefault_mr+0xcb5/0x1390 [mlx5_ib] ? _raw_spin_unlock+0x1f/0x30 mlx5_ib_eqe_pf_action+0x3be/0x2620 [mlx5_ib] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5_ib_invalidate_range+0xcb0/0xcb0 [mlx5_ib] process_one_work+0x7db/0x15f0 ? pwq_dec_nr_in_flight+0xda0/0xda0 ? assign_work+0x168/0x240 worker_thread+0x57d/0xcd0 ? rescuer_thread+0xc40/0xc40 kthread+0x3b3/0x800 ? kthread_is_per_cpu+0xb0/0xb0 ? lock_downgrade+0x680/0x680 ? do_raw_spin_lock+0x12d/0x270 ? spin_bug+0x1d0/0x1d0 ? finish_task_switch.isra.0+0x284/0x9e0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? kthread_is_per_cpu+0xb0/0xb0 ret_from_fork+0x2d/0x70 ? kthread_is_per_cpu+0xb0/0xb0 ret_from_fork_asm+0x11/0x20 Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Link: https://patch.msgid.link/r/a85ddd16f45c8cb2bc0a188c2b0fcedfce975eb8.1750061791.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-17e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13Vitaly Lifshits
On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in the software STRAP is incorrect. This causes the PTP timer to run at the wrong rate and can lead to synchronization issues. The STRAP value is configured by the system firmware, and a firmware update is not always possible. Since the XTAL clock on these systems always runs at 38.4MHz, the driver may ignore the STRAP and just set the correct value. Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Reviewed-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17ice: fix eswitch code memory leak in reset scenarioGrzegorz Nitka
Add simple eswitch mode checker in attaching VF procedure and allocate required port representor memory structures only in switchdev mode. The reset flows triggers VF (if present) detach/attach procedure. It might involve VF port representor(s) re-creation if the device is configured is switchdev mode (not legacy one). The memory was blindly allocated in current implementation, regardless of the mode and not freed if in legacy mode. Kmemeleak trace: unreferenced object (percpu) 0x7e3bce5b888458 (size 40): comm "bash", pid 1784, jiffies 4295743894 hex dump (first 32 bytes on cpu 45): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): pcpu_alloc_noprof+0x4c4/0x7c0 ice_repr_create+0x66/0x130 [ice] ice_repr_create_vf+0x22/0x70 [ice] ice_eswitch_attach_vf+0x1b/0xa0 [ice] ice_reset_all_vfs+0x1dd/0x2f0 [ice] ice_pci_err_resume+0x3b/0xb0 [ice] pci_reset_function+0x8f/0x120 reset_store+0x56/0xa0 kernfs_fop_write_iter+0x120/0x1b0 vfs_write+0x31c/0x430 ksys_write+0x61/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Testing hints (ethX is PF netdev): - create at least one VF echo 1 > /sys/class/net/ethX/device/sriov_numvfs - trigger the reset echo 1 > /sys/class/net/ethX/device/reset Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17net: ice: Perform accurate aRFS flow matchKrishna Kumar
This patch fixes an issue seen in a large-scale deployment under heavy incoming pkts where the aRFS flow wrongly matches a flow and reprograms the NIC with wrong settings. That mis-steering causes RX-path latency spikes and noisy neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections). set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by hashing the skb sized by the per rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver steer function modifies the wrong flow to use this rx-queue, e.g.: Flow#1 is first added: Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10 Later when a new flow needs to be added: Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20 The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This results in both flows getting un-optimized - packets for Flow#1 goes to q#20, and then reprogrammed back to q#10 later and so on; and Flow #2 programming is never done as Flow#1 is matched first for all misses. Many flows may wrongly share the same hash and reprogram rules of the original flow each with their own q#. Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS, enable aRFS (global value is 144 * rps_flow_cnt). Test notes about results from ice_rx_flow_steer(): --------------------------------------------------- 1. "Skip:" counter increments here: if (fltr_info->q_index == rxq_idx || arfs_entry->fltr_state != ICE_ARFS_ACTIVE) goto out; 2. "Add:" counter increments here: ret = arfs_entry->fltr_info.fltr_id; INIT_HLIST_NODE(&arfs_entry->list_entry); 3. "Update:" counter increments here: /* update the queue to forward to on an already existing flow */ Runtime comparison: original code vs with the patch for different rps_flow_cnt values. +-------------------------------+--------------+--------------+ | rps_flow_cnt | 512 | 2048 | +-------------------------------+--------------+--------------+ | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K | | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K | | CPU User | 216 vs 183 | 216 vs 206 | | CPU System | 1441 vs 1171 | 1447 vs 1320 | | CPU Softirq | 1245 vs 920 | 1238 vs 961 | | CPU Total | 29 vs 22.7 | 29 vs 24.9 | | aRFS Update | 533K vs 59 | 521K vs 32 | | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M | +-------------------------------+--------------+--------------+ A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections showed no performance degradation. Some points on the patch/aRFS behavior: 1. Enabling full tuple matching ensures flows are always correctly matched, even with smaller hash sizes. 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs and fewer calls to driver for programming on misses. 3. Larger hash tables reduces mis-steering due to more unique flow hashes, but still has clashes. However, with larger per-device rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot be added if h/w limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt pkts have been processed by this cpu that are not part of the flow). Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <krikku@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17s390/ptrace: Fix pointer dereferencing in regs_get_kernel_stack_nth()Heiko Carstens
The recent change which added READ_ONCE_NOCHECK() to read the nth entry from the kernel stack incorrectly dropped dereferencing of the stack pointer in order to read the requested entry. In result the address of the entry is returned instead of its content. Dereference the pointer again to fix this. Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/r/20250612163331.GA13384@willie-the-truck Fixes: d93a855c31b7 ("s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-06-17can: tcan4x5x: fix power regulator retrieval during probeBrett Werling
Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring the regulator pointer is not set to NULL in the successful return from devm_regulator_get_optional(). Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config") Signed-off-by: Brett Werling <brett.werling@garmin.com> Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-06-17bcachefs: Fix restart handling in btree_node_scrub_work()Kent Overstreet
btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-17bpf: Mark dentry->d_inode as trusted_or_nullSong Liu
LSM hooks such as security_path_mknod() and security_inode_rename() have access to newly allocated negative dentry, which has NULL d_inode. Therefore, it is necessary to do the NULL pointer check for d_inode. Also add selftests that checks the verifier enforces the NULL pointer check. Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20250613052857.1992233-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17x86/process: Move the buffer clearing before MONITORBorislav Petkov (AMD)
Move the VERW clearing before the MONITOR so that VERW doesn't disarm it and the machine never enters C1. Original idea by Kim Phillips <kim.phillips@amd.com>. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-06-17x86/microcode/AMD: Add TSA microcode SHAsBorislav Petkov (AMD)
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-06-17KVM: SVM: Advertise TSA CPUID bits to guestsBorislav Petkov (AMD)
Synthesize the TSA CPUID feature bits for guests. Set TSA_{SQ,L1}_NO on unaffected machines. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-06-17x86/bugs: Add a Transient Scheduler Attacks mitigationBorislav Petkov (AMD)
Add the required features detection glue to bugs.c et all in order to support the TSA mitigation. Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-06-17openvswitch: Allocate struct ovs_pcpu_storage dynamicallySebastian Andrzej Siewior
PERCPU_MODULE_RESERVE defines the maximum size that can by used for the per-CPU data size used by modules. This is 8KiB. Commit 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one") restructured the per-CPU memory allocation for the module and moved the separate alloc_percpu() invocations at module init time to a static per-CPU variable which is allocated by the module loader. The size of the per-CPU data section for openvswitch is 6488 bytes which is ~80% of the available per-CPU memory. Together with a few other modules it is easy to exhaust the available 8KiB of memory. Allocate ovs_pcpu_storage dynamically at module init time. Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com Fixes: 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-17io_uring/sqpoll: don't put task_struct on tctx setup failureJens Axboe
A recent commit moved the error handling of sqpoll thread and tctx failures into the thread itself, as part of fixing an issue. However, it missed that tctx allocation may also fail, and that io_sq_offload_create() does its own error handling for the task_struct in that case. Remove the manual task putting in io_sq_offload_create(), as io_sq_thread() will notice that the tctx did not get setup and hence it should put itself and exit. Reported-by: syzbot+763e12bbf004fb1062e4@syzkaller.appspotmail.com Fixes: ac0b8b327a56 ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-17io_uring: remove duplicate io_uring_alloc_task_context() definitionJens Axboe
This function exists in both tctx.h (where it belongs) and in io_uring.h as a remnant of before the tctx handling code got split out. Remove the io_uring.h definition and ensure that sqpoll.c includes the tctx.h header to get the definition. Signed-off-by: Jens Axboe <axboe@kernel.dk>