Age | Commit message (Collapse) | Author |
|
Spin off the release implementation from vgic_put_irq() to prepare for a
more involved fix for lock ordering such that it may be unnested from
raw spinlocks. This has the minor functional change of doing call_rcu()
behind the xa_lock although it shouldn't be consequential.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250905100531.282980-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM's use of krefs to manage LPIs isn't adding much, especially
considering vgic_irq_release() is a noop due to the lack of sufficient
context.
Switch to using a regular refcount in anticipation of adding a
meaningful release concept for LPIs.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250905100531.282980-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
While LPIs lack an active state, KVM unconditionally folds the active
state from the LR into the vgic_irq struct meaning this field cannot be
'creatively' reused for something else.
Drop the misleading comment to reflect this.
Link: https://lore.kernel.org/r/20250905100531.282980-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Prior to commit 75a5fbaf6623 ("KVM: arm64: Compute MDCR_EL2 at
vcpu_load()"), host MDCR_EL2 was saved correctly:
kvm_arch_vcpu_load()
kvm_vcpu_load_debug() /* Doesn't touch hardware MDCR_EL2. */
kvm_vcpu_load_vhe()
__activate_traps_common()
/* Saves host MDCR_EL2. */
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2)
/* Writes VCPU MDCR_EL2. */
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2)
The MDCR_EL2 value saved previously was restored in
kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe().
After the aforementioned commit, host MDCR_EL2 is never saved:
kvm_arch_vcpu_load()
kvm_vcpu_load_debug() /* Writes VCPU MDCR_EL2 */
kvm_vcpu_load_vhe()
__activate_traps_common()
/* Saves **VCPU** MDCR_EL2. */
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2)
/* Writes VCPU MDCR_EL2 a second time. */
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2)
kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe() then restores the VCPU MDCR_EL2
value. Also VCPU's MDCR_EL2 value gets written to hardware twice now.
Fix this by saving the host MDCR_EL2 in kvm_arch_vcpu_load() before it gets
overwritten by the VCPU's MDCR_EL2 value, and restore it on VCPU put.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250902130833.338216-3-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
According to the pseudocode for StatisticalProfilingEnabled() from Arm
DDI0487L.b, PMSCR_EL1 controls profiling at EL1 and EL0:
- PMSCR_EL1.E1SPE controls profiling at EL1.
- PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0.
These two fields reset to UNKNOWN values.
When KVM runs in VHE mode and profiling is enabled in the host, before
entering a guest, KVM does not touch any of the SPE registers, leaving the
buffer enabled, and it clears HCR_EL2.TGE. As a result, depending on the
reset value for the E1SPE and E0SPE fields, KVM might unintentionally
profile a guest. Make the behaviour consistent and predictable by clearing
PMSCR_EL1 when KVM initialises the host debug configuration.
Note that this is not a problem for nVHE, because KVM clears
PMSCR_EL1.{E1SPE,E0SPE} before entering the guest.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20250902130833.338216-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR
TLB entry is valid for the current context. For non-Global entries, that
means the entry’s ASID must match the current ASID.
The current code returns true when the ASIDs do *not* match, which
inverts the logic. This is a potential vulnerability:
- Valid entries are ignored and we fall back to kvm_translate_vncr(),
hurting performance.
- Mismatched entries are treated as permission faults (-EPERM) instead
of triggering a fresh translation.
- This can also cause stale translations to be (wrongly) considered
valid across address spaces.
Flip the predicate so non-Global entries only hit when ASIDs match.
Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin,
Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang.
Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
nl80211_get_station() for MLO
Currently, nl80211_get_station() allocates a fixed buffer size using
NLMSG_DEFAULT_SIZE. In multi-link scenarios - particularly when the
number of links exceeds two - this buffer size is often insufficient
to accommodate complete station statistics, resulting in "no buffer
space available" errors.
To address this, modify nl80211_get_station() to return only
accumulated station statistics and exclude per link stats.
Pass a new flag (link_stats) to nl80211_send_station() to control
the inclusion of per link statistics. This allows retaining
detailed output with per link data in dump commands, while
excluding it from other commands where it is not needed.
This change modifies the handling of per link stats introduced in
commit 82d7f841d9bd ("wifi: cfg80211: extend to embed link level
statistics in NL message") to enable them only for
nl80211_dump_station().
Apply the same fix to cfg80211_del_sta_sinfo() by skipping per link
stats to avoid buffer issues. cfg80211_new_sta() doesn't include
stats and is therefore not impacted.
Fixes: 82d7f841d9bd ("wifi: cfg80211: extend to embed link level statistics in NL message")
Signed-off-by: Nithyanantham Paramasivam <nithyanantham.paramasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20250905124800.1448493-1-nithyanantham.paramasivam@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi fix
====================
Which is a fix for (old) 130/1030 devices to work again.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
Jeff Johnson says:
==================
ath.git update for v6.17-rc6
==================
There's a firmware API alignment fix, and a fix for powersave,
both for ath12k.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Commit 0e2f80afcfa6("fs/dax: ensure all pages are idle prior to
filesystem unmount") introduced the WARN_ON_ONCE to capture whether
the filesystem has removed all DAX entries or not and applied the
fix to xfs and ext4.
Apply the missed fix on erofs to fix the runtime warning:
[ 5.266254] ------------[ cut here ]------------
[ 5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260
[ 5.266294] Modules linked in:
[ 5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S 6.16.0+ #6 PREEMPT(voluntary)
[ 5.267012] Tainted: [S]=CPU_OUT_OF_SPEC
[ 5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022
[ 5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260
[ 5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90
[ 5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202
[ 5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898
[ 5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000
[ 5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001
[ 5.267125] FS: 00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000
[ 5.267132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0
[ 5.267144] PKRU: 55555554
[ 5.267150] Call Trace:
[ 5.267154] <TASK>
[ 5.267181] truncate_inode_pages_range+0x118/0x5e0
[ 5.267193] ? save_trace+0x54/0x390
[ 5.267296] truncate_inode_pages_final+0x43/0x60
[ 5.267309] evict+0x2a4/0x2c0
[ 5.267339] dispose_list+0x39/0x80
[ 5.267352] evict_inodes+0x150/0x1b0
[ 5.267376] generic_shutdown_super+0x41/0x180
[ 5.267390] kill_block_super+0x1b/0x50
[ 5.267402] erofs_kill_sb+0x81/0x90 [erofs]
[ 5.267436] deactivate_locked_super+0x32/0xb0
[ 5.267450] deactivate_super+0x46/0x60
[ 5.267460] cleanup_mnt+0xc3/0x170
[ 5.267475] __cleanup_mnt+0x12/0x20
[ 5.267485] task_work_run+0x5d/0xb0
[ 5.267499] exit_to_user_mode_loop+0x144/0x170
[ 5.267512] do_syscall_64+0x2b9/0x7c0
[ 5.267523] ? __lock_acquire+0x665/0x2ce0
[ 5.267535] ? __lock_acquire+0x665/0x2ce0
[ 5.267560] ? lock_acquire+0xcd/0x300
[ 5.267573] ? find_held_lock+0x31/0x90
[ 5.267582] ? mntput_no_expire+0x97/0x4e0
[ 5.267606] ? mntput_no_expire+0xa1/0x4e0
[ 5.267625] ? mntput+0x24/0x50
[ 5.267634] ? path_put+0x1e/0x30
[ 5.267647] ? do_faccessat+0x120/0x2f0
[ 5.267677] ? do_syscall_64+0x1a2/0x7c0
[ 5.267686] ? from_kgid_munged+0x17/0x30
[ 5.267703] ? from_kuid_munged+0x13/0x30
[ 5.267711] ? __do_sys_getuid+0x3d/0x50
[ 5.267724] ? do_syscall_64+0x1a2/0x7c0
[ 5.267732] ? irqentry_exit+0x77/0xb0
[ 5.267743] ? clear_bhb_loop+0x30/0x80
[ 5.267752] ? clear_bhb_loop+0x30/0x80
[ 5.267765] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.267772] RIP: 0033:0x7aaa8b32a9fb
[ 5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[ 5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb
[ 5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080
[ 5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020
[ 5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00
[ 5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10
[ 5.267849] </TASK>
[ 5.267854] irq event stamp: 4721
[ 5.267859] hardirqs last enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0
[ 5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0
[ 5.267884] softirqs last enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70
[ 5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120
[ 5.267905] ---[ end trace 0000000000000000 ]---
Fixes: bde708f1a65d ("fs/dax: always remove DAX page-cache entries when breaking layouts")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Matthieu Baerts says:
====================
mptcp: misc fixes for v6.17-rc6
Here are various unrelated fixes:
- Patch 1: Fix a wrong attribute type in the MPTCP Netlink specs. A fix
for v6.7.
- Patch 2: Avoid mentioning a deprecated MPTCP sysctl knob in the doc. A
fix for v6.15.
- Patch 3: Handle new warnings from ShellCheck v0.11.0. This prevents
some warnings reported by some CIs. If it is not a good material for
'net', please drop.
====================
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-0-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This v0.11.0 version introduces SC2329:
Warn when (non-escaping) functions are never invoked.
Except that, similar to SC2317, ShellCheck is currently unable to figure
out functions that are invoked via trap, or indirectly, when calling
functions via variables. It is then needed to disable this new SC2329.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-3-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The net.mptcp.pm_type sysctl knob has been deprecated in v6.15,
net.mptcp.path_manager should be used instead.
Adapt the section about path managers to suggest using the new sysctl
knob instead of the deprecated one.
Fixes: 595c26d122d1 ("mptcp: sysctl: set path manager by name")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-2-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This attribute is used as a signed number in the code in pm_netlink.c:
nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if))
The specs should then reflect that. Note that other 'if-idx' attributes
from the same .yaml file use a signed number as well.
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-1-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Users reported a scenario where MPTCP connections that were configured
with SO_KEEPALIVE prior to connect would fail to enable their keepalives
if MTPCP fell back to TCP mode.
After investigating, this affects keepalives for any connection where
sync_socket_options is called on a socket that is in the closed or
listening state. Joins are handled properly. For connects,
sync_socket_options is called when the socket is still in the closed
state. The tcp_set_keepalive() function does not act on sockets that
are closed or listening, hence keepalive is not immediately enabled.
Since the SO_KEEPOPEN flag is absent, it is not enabled later in the
connect sequence via tcp_finish_connect. Setting the keepalive via
sockopt after connect does work, but would not address any subsequently
created flows.
Fortunately, the fix here is straight-forward: set SOCK_KEEPOPEN on the
subflow when calling sync_socket_options.
The fix was valdidated both by using tcpdump to observe keepalive
packets not being sent before the fix, and being sent after the fix. It
was also possible to observe via ss that the keepalive timer was not
enabled on these sockets before the fix, but was enabled afterwards.
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Cc: stable@vger.kernel.org
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller managed to lock the lower device via ETHTOOL_SFEATURES:
netdev_lock include/linux/netdevice.h:2761 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
netdev_sync_lower_features net/core/dev.c:10649 [inline]
__netdev_update_features+0xcb1/0x1be0 net/core/dev.c:10819
netdev_update_features+0x6d/0xe0 net/core/dev.c:10876
macsec_notify+0x2f5/0x660 drivers/net/macsec.c:4533
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
netdev_features_change+0x85/0xc0 net/core/dev.c:1570
__dev_ethtool net/ethtool/ioctl.c:3469 [inline]
dev_ethtool+0x1536/0x19b0 net/ethtool/ioctl.c:3502
dev_ioctl+0x392/0x1150 net/core/dev_ioctl.c:759
It happens because lower features are out of sync with the upper:
__dev_ethtool (real_dev)
netdev_lock_ops(real_dev)
ETHTOOL_SFEATURES
__netdev_features_change
netdev_sync_upper_features
disable LRO on the lower
if (old_features != dev->features)
netdev_features_change
fires NETDEV_FEAT_CHANGE
macsec_notify
NETDEV_FEAT_CHANGE
netdev_update_features (for each macsec dev)
netdev_sync_lower_features
if (upper_features != lower_features)
netdev_lock_ops(lower) # lower == real_dev
stuck
...
netdev_unlock_ops(real_dev)
Per commit af5f54b0ef9e ("net: Lock lower level devices when updating
features"), we elide the lock/unlock when the upper and lower features
are synced. Makes sure the lower (real_dev) has proper features after
the macsec link has been created. This makes sure we never hit the
situation where we need to sync upper flags to the lower.
Reported-by: syzbot+7e0f89fb6cae5d002de0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7e0f89fb6cae5d002de0
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250908173614.3358264-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ndo hwtstamp callbacks are expected to run under the per-device ops
lock. Make the lower get/set paths consistent with the rest of ndo
invocations.
Kernel log:
WARNING: CPU: 13 PID: 51364 at ./include/net/netdev_lock.h:70 __netdev_update_features+0x4bd/0xe60
...
RIP: 0010:__netdev_update_features+0x4bd/0xe60
...
Call Trace:
<TASK>
netdev_update_features+0x1f/0x60
mlx5_hwtstamp_set+0x181/0x290 [mlx5_core]
mlx5e_hwtstamp_set+0x19/0x30 [mlx5_core]
dev_set_hwtstamp_phylib+0x9f/0x220
dev_set_hwtstamp_phylib+0x9f/0x220
dev_set_hwtstamp+0x13d/0x240
dev_ioctl+0x12f/0x4b0
sock_ioctl+0x171/0x370
__x64_sys_ioctl+0x3f7/0x900
? __sys_setsockopt+0x69/0xb0
do_syscall_64+0x6f/0x2e0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
</TASK>
....
---[ end trace 0000000000000000 ]---
Note that the mlx5_hwtstamp_set and mlx5e_hwtstamp_set functions shown
in the trace come from an in progress patch converting the legacy ioctl
to ndo_hwtstamp_get/set and are not present in mainline.
Fixes: ffb7ed19ac0a ("net: hold netdev instance lock during ioctl operations")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20250907080821.2353388-1-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename of open files in SMB2+ has been broken for a very long time,
resulting in data loss as the CIFS client would fail the rename(2)
call with -ENOENT and then removing the target file.
Fix this by implementing ->rename_pending_delete() for SMB2+, which
will rename busy files to random filenames (e.g. silly rename) during
unlink(2) or rename(2), and then marking them to delete-on-close.
Besides, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2)
from reusing open handles that had been marked as delete pending.
Handle it in cifs_get_readable_path() as well.
Reported-by: Jean-Baptiste Denis <jbdenis@pasteur.fr>
Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: Olga Kornievskaia <okorniev@redhat.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Scott Mayhew <smayhew@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The blamed commit changed the conditions which phylib uses to stop
and start the state machine in the suspend and resume paths, and
while improving it, has caused two issues.
The original code used this test:
phydev->attached_dev && phydev->adjust_link
and if true, the paths would handle the PHY state machine. This test
evaluates true for normal drivers that are using phylib directly
while the PHY is attached to the network device, but false in all
other cases, which include the following cases:
- when the PHY has never been attached to a network device.
- when the PHY has been detached from a network device (as phy_detach()
sets phydev->attached_dev to NULL, phy_disconnect() calls
phy_detach() and additionally sets phydev->adjust_link NULL.)
- when phylink is using the driver (as phydev->adjust_link is NULL.)
Only the third case was incorrect, and the blamed commit attempted to
fix this by changing this test to (simplified for brevity, see
phy_uses_state_machine()):
phydev->phy_link_change == phy_link_change ?
phydev->attached_dev && phydev->adjust_link : true
However, this also incorrectly evaluates true in the first two cases.
Fix the first case by ensuring that phy_uses_state_machine() returns
false when phydev->phy_link_change is NULL.
Fix the second case by ensuring that phydev->phy_link_change is set to
NULL when phy_detach() is called.
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/E1uvMEz-00000003Aoe-3qWe@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The encryption layer can't handle the padding iovs, so flatten the
compound request into a single buffer with required padding to prevent
the server from dropping the connection when finding unaligned
compound requests.
Fixes: bc925c1216f0 ("smb: client: improve compound padding in encryption")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
OpenWRT users reported regression on ARMv6 devices after updating to latest
HEAD, where tcpdump filter:
tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
and not ether host 5c497967208b and not ether host 706655784d5b"
fails with warning: "Kernel filter failed: No error information"
when using config:
# CONFIG_BPF_JIT_ALWAYS_ON is not set
CONFIG_BPF_JIT_DEFAULT_ON=y
The issue arises because commits:
1. "bpf: Fix array bounds error with may_goto" changed default runtime to
__bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when
jit_requested = 1 but jit fails
This change restores interpreter fallback capability for BPF programs with
stack size <= 512 bytes when jit fails.
Reported-by: Felix Fietkau <nbd@nbd.name>
Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/
Fixes: 6ebc5030e0c5 ("bpf: Fix array bounds error with may_goto")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, out of all 3 types of waiters in the rqspinlock slow path
(i.e., pending bit waiter, wait queue head waiter, and wait queue
non-head waiter), only the pending bit waiter and wait queue head
waiters apply deadlock checks and a timeout on their waiting loop. The
assumption here was that the wait queue head's forward progress would be
sufficient to identify cases where the lock owner or pending bit waiter
is stuck, and non-head waiters relying on the head waiter would prove to
be sufficient for their own forward progress.
However, the head waiter itself can be preempted by a non-head waiter
for the same lock (AA) or a different lock (ABBA) in a manner that
impedes its forward progress. In such a case, non-head waiters not
performing deadlock and timeout checks becomes insufficient, and the
system can enter a state of lockup.
This is typically not a concern with non-NMI lock acquisitions, as lock
holders which in run in different contexts (IRQ, non-IRQ) use "irqsave"
variants of the lock APIs, which naturally excludes such lock holders
from preempting one another on the same CPU.
It might seem likely that a similar case may occur for rqspinlock when
programs are attached to contention tracepoints (begin, end), however,
these tracepoints either precede the enqueue into the wait queue, or
succeed it, therefore cannot be used to preempt a head waiter's waiting
loop.
We must still be careful against nested kprobe and fentry programs that
may attach to the middle of the head's waiting loop to stall forward
progress and invoke another rqspinlock acquisition that proceeds as a
non-head waiter. To this end, drop CC_FLAGS_FTRACE from the rqspinlock.o
object file.
For now, this issue is resolved by falling back to a repeated trylock on
the lock word from NMI context, while performing the deadlock checks to
break out early in case forward progress is impossible, and use the
timeout as a final fallback.
A more involved fix to terminate the queue when such a condition occurs
will be made as a follow up. A selftest to stress this aspect of nested
NMI/non-NMI locking attempts will be added in a subsequent patch to the
bpf-next tree when this fix lands and trees are synchronized.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: 164c246571e9 ("rqspinlock: Protect waiters in queue from stalls")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250909184959.3509085-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Eryk reported an issue that I have put under Closes: tag, related to
umem addrs being prematurely produced onto pool's completion queue.
Let us make the skb's destructor responsible for producing all addrs
that given skb used.
Commit from fixes tag introduced the buggy behavior, it was not broken
from day 1, but rather when xsk multi-buffer got introduced.
In order to mitigate performance impact as much as possible, mimic the
linear and frag parts within skb by storing the first address from XSK
descriptor at sk_buff::destructor_arg. For fragments, store them at ::cb
via list. The nodes that will go onto list will be allocated via
kmem_cache. xsk_destruct_skb() will consume address stored at
::destructor_arg and optionally go through list from ::cb, if count of
descriptors associated with this particular skb is bigger than 1.
Previous approach where whole array for storing UMEM addresses from XSK
descriptors was pre-allocated during first fragment processing yielded
too big performance regression for 64b traffic. In current approach
impact is much reduced on my tests and for jumbo frames I observed
traffic being slower by at most 9%.
Magnus suggested to have this way of processing special cased for
XDP_SHARED_UMEM, so we would identify this during bind and set different
hooks for 'backpressure mechanism' on CQ and for skb destructor, but
given that results looked promising on my side I decided to have a
single data path for XSK generic Tx. I suppose other auxiliary stuff
would have to land as well in order to make it work.
Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20250904194907.2342177-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Unfortunately Mykola won't participate in BPF selftests maintenance
anymore. Remove the entry on his behalf.
Acked-by: Mykola Lysenko <nickolay.lysenko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250909171638.2417272-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Rong Tao says:
====================
Fix bpf_strnstr() wrong 'len' parameter, bpf_strnstr("open", "open", 4)
should return 0 instead of -ENOENT. And fix a more general case when s2
is a suffix of the first len characters of s1.
====================
Link: https://patch.msgid.link/tencent_E72A37AF03A3B18853066E421B5969976208@qq.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ilya Leoshkevich says:
====================
selftests/bpf: Fix "expression result unused" warnings with icecc
v3: https://lore.kernel.org/bpf/20250827194929.416969-1-iii@linux.ibm.com/
v3 -> v4: Go back to the original solution (Yonghong, Alexei).
v2: https://lore.kernel.org/bpf/20250827130519.411700-1-iii@linux.ibm.com/
v2 -> v3: Do not touch libbpf, explain how having two function
declarations works (Andrii).
Fix bpf-gcc build (CI).
v1: https://lore.kernel.org/bpf/20250508113804.304665-1-iii@linux.ibm.com/
v1 -> v2: Annotate bpf_obj_new_impl() with __must_check (Alexei).
Add an explanation about icecc.
I took another look at the "expression result unused" warnings I've
been seeing, and it turned out that the root cause was the icecc
compiler wrapper and what I consider a clang bug. Back then I've
reported that the problem was reproducible with plain clang, but now
I see that it was clearly a mixup, sorry about that.
The solution is to add a few awkward (void) casts. I've added a
detailed explanation of why they are helpful to the commit message.
====================
Link: https://patch.msgid.link/20250829030017.102615-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests for bpf_strnstr():
bpf_strnstr("", "", 0) = 0
bpf_strnstr("hello world", "hello", 5) = 0
bpf_strnstr(str, "hello", 4) = -ENOENT
bpf_strnstr("", "a", 0) = -ENOENT
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_2ED218F8082565C95D86A804BDDA8DBA200A@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
icecc is a compiler wrapper that distributes compile jobs over a build
farm [1]. It works by sending toolchain binaries and preprocessed
source code to remote machines.
Unfortunately using it with BPF selftests causes build failures due to
a clang bug [2]. The problem is that clang suppresses the
-Wunused-value warning if the unused expression comes from a macro
expansion. Since icecc compiles preprocessed source code, this
information is not available. This leads to -Wunused-value false
positives.
obj_new_no_struct() and obj_new_acq() use the bpf_obj_new() macro and
discard the result. arena_spin_lock_slowpath() uses two macros that
produce values and ignores the results. Add (void) casts to explicitly
indicate that this is intentional and suppress the warning.
An alternative solution is to change the macros to not produce values.
This would work today for the arena_spin_lock_slowpath() issue, but in
the future there may appear users who need them. Another potential
solution is to replace these macros with functions. Unfortunately this
would not work, because these macros work with unknown types and
control flow.
[1] https://github.com/icecc/icecream
[2] https://github.com/llvm/llvm-project/issues/142614
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250829030017.102615-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_strnstr() should not treat the ending '\0' of s2 as a matching character
if the parameter 'len' equal to s2 string length, for example:
1. bpf_strnstr("openat", "open", 4) = -ENOENT
2. bpf_strnstr("openat", "open", 5) = 0
This patch makes (1) return 0, fix just the `len == strlen(s2)` case.
And fix a more general case when s2 is a suffix of the first len
characters of s1.
Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations")
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_17DC57B9D16BC443837021BEACE84B7C1507@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Small cleanup and test extension to probe the bpf_crypto_{encrypt,decrypt}()
kfunc when a bad dst buffer is passed in to assert that an error is returned.
Also, encrypt_sanity() and skb_crypto_setup() were explicit to set the global
status variable to zero before any test, so do the same for decrypt_sanity().
Do not explicitly zero the on-stack err before bpf_crypto_ctx_create() given
the kfunc is expected to do it internally for the success case.
Before kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.531200] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.533388] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
test_crypto_sanity:PASS:skel open 0 nsec
test_crypto_sanity:PASS:ip netns add crypto_sanity_ns 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns -6 addr add face::1/128 dev lo nodad 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns link set dev lo up 0 nsec
test_crypto_sanity:PASS:open_netns 0 nsec
test_crypto_sanity:PASS:AF_ALG init fail 0 nsec
test_crypto_sanity:PASS:if_nametoindex lo 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup fd 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup retval 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup status 0 nsec
test_crypto_sanity:PASS:create qdisc hook 0 nsec
test_crypto_sanity:PASS:make_sockaddr 0 nsec
test_crypto_sanity:PASS:attach encrypt filter 0 nsec
test_crypto_sanity:PASS:encrypt socket 0 nsec
test_crypto_sanity:PASS:encrypt send 0 nsec
test_crypto_sanity:FAIL:encrypt status unexpected error: -5 (errno 95)
#88 crypto_sanity:FAIL
Summary: 1/2 PASSED, 0 SKIPPED, 1 FAILED
After kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.540963] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.542404] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
#88 crypto_sanity:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's
size is not validated to be at least as large as the source dynptr's
size before calling into the crypto backend with 'len = src_len'. This
can result in an OOB write when the destination is smaller than the
source.
Concretely, in mentioned function, psrc and pdst are both linear
buffers fetched from each dynptr:
psrc = __bpf_dynptr_data(src, src_len);
[...]
pdst = __bpf_dynptr_data_rw(dst, dst_len);
[...]
err = decrypt ?
ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv) :
ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
The crypto backend expects pdst to be large enough with a src_len length
that can be written. Add an additional src_len > dst_len check and bail
out if it's the case. Note that these kfuncs are accessible under root
privileges only.
Fixes: 3e1c6f35409f ("bpf: make common crypto API for TC/XDP programs")
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There is no reason to require this to happen on first submitted IB only.
We need to wait for the queue to be idle, but it can be done at any
time (including when there are multiple video sessions active).
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5)
Cc: stable@vger.kernel.org
|
|
There can be multiple engine info packages in one IB and the first one
may be common engine, not decode/encode.
We need to parse the entire IB instead of stopping after finding first
engine info.
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc)
Cc: stable@vger.kernel.org
|
|
Declare isp firmware file isp_4_1_1.bin required by isp4.1.1 device.
Suggested-by: Alexey Zagorodnikov <xglooom@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d97b74a833eba1f4f69f67198fd98ef036c0e5f9)
Cc: stable@vger.kernel.org
|
|
This function can be called from an atomic context so we can't use
fsleep().
Fixes: 01f60348d8fb ("drm/amd/display: Fix 'failed to blank crtc!'")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549
Cc: Wen Chen <Wen.Chen3@amd.com>
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27e4dc2c0543fd1808cc52bd888ee1e0533c4a2e)
|
|
Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free")
reordered when amdgpu_fence_driver_sw_fini() was called after
that patch, amdgpu_fence_driver_sw_fini() effectively became
a no-op as the sched entities we never freed because the
ring pointers were already set to NULL. Remove the NULL
setting.
Reported-by: Lin.Cao <lincao12@amd.com>
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5)
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fix from Marek Szyprowski:
- one more fix for DMA API debugging infrastructure (Baochen Qiang)
* tag 'dma-mapping-6.17-2025-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-debug: don't enforce dma mapping check on noncoherent allocations
|
|
The i40e hardware has multiple hardware settings which define the Maximum
Frame Size (MFS) of the physical port. The firmware has an AdminQ command
(0x0603) to configure the MFS, but the i40e Linux driver never issues this
command.
In most cases this is no problem, as the NVM default value has the device
configured for its maximum value of 9728. Unfortunately, recent versions of
the iPXE intelxl driver now issue the 0x0603 Set Mac Config command,
modifying the MFS and reducing it from its default value of 9728.
This occurred as part of iPXE commit 6871a7de705b ("[intelxl] Use admin
queue to set port MAC address and maximum frame size"), a prerequisite
change for supporting the E800 series hardware in iPXE. Both the E700 and
E800 firmware support the AdminQ command, and the iPXE code shares much of
the logic between the two device drivers.
The ice E800 Linux driver already issues the 0x0603 Set Mac Config command
early during probe, and is thus unaffected by the iPXE change.
Since commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set"), the
i40e driver does check the I40E_PRTGL_SAH register, but it only logs a
warning message if its value is below the 9728 default. This register also
only covers received packets and not transmitted packets. A warning can
inform system administrators, but does not correct the issue. No
interactions from userspace cause the driver to write to PRTGL_SAH or issue
the 0x0603 AdminQ command. Only a GLOBR reset will restore the value to its
default value. There is no obvious method to trigger a GLOBR reset from
user space.
To fix this, introduce the i40e_aq_set_mac_config() function, similar to
the one from the ice driver. Call this during early probe to ensure that
the device configuration matches driver expectation. Unlike E800, the E700
firmware also has a bit to control whether the MAC should append CRC data.
It is on by default, but setting a 0 to this bit would disable CRC. The
i40e implementation must set this bit to ensure CRC will be appended by the
MAC.
In addition to the AQ command, instead of just checking the I40E_PRTGL_SAH
register, update its value to the 9728 default and write it back. This
ensures that the hardware is in the expected state, regardless of whether
the iPXE (or any other early boot driver) has modified this state.
This is a better user experience, as we now fix the issues with larger MTU
instead of merely warning. It also aligns with the way the ice E800 series
driver works.
A final note: The Fixes tag provided here is not strictly accurate. The
issue occurs as a result of an external entity (the iPXE intelxl driver),
and this is not a regression specifically caused by the mentioned change.
However, I believe the original change to just warn about PRTGL_SAH being
too low was an insufficient fix.
Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Link: https://github.com/ipxe/ipxe/commit/6871a7de705b6f6a4046f0d19da9bcd689c3bc8e
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Expand workaround to additional graphics architectures.
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.17+
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6fc957185e1691bb6dfa4193698a229db537c2a2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or get stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.
Prevent that by having exec and the rebind worker waiting for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.
It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.
v3:
- Avoid wait_for_completion() in the kernel worker since it could
potentially cause work item flushes from freezable processes to
wait forever. Instead terminate the rebind workers if needed and
re-launch at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com
(cherry picked from commit 599334572a5a99111015fbbd5152ce4dedc2f8b7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Its actions are opportunistic anyway and will be completed
on device suspend.
Marking as a fix to simplify backporting of the fix
that follows in the series.
v2:
- Keep the runtime pm reference over suspend / hibernate and
document why. (Matt Auld, Rodrigo Vivi):
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com
(cherry picked from commit ebd546fdffddfcaeab08afdd68ec93052c8fa740)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.
This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.
If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.
This flaw has probably been present since the xe module was
upstreamed but use a Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-
code the fallback algorithm in the driver.
v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit cb3d7b3b46b799c96b54f8e8fe36794a55a77f0b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This is a user controlled configfs attribute, we should not
modify that outside the configfs attr.store() implementation.
Fixes: bc417e54e24b ("drm/xe: Enable configfs support for survivability mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com
(cherry picked from commit 079a5c83dbd23db7a6eed8f558cf75e264d8a17b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration
later than the first, the error path wants to free the IRQs requested
so far. However, it uses the wrong dev_id argument for free_irq(), so
it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
<TASK>
free_irq+0x32/0x70
i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
i40e_vsi_request_irq+0x79/0x80 [i40e]
i40e_vsi_open+0x21f/0x2f0 [i40e]
i40e_open+0x63/0x130 [i40e]
__dev_open+0xfc/0x210
__dev_change_flags+0x1fc/0x240
netif_change_flags+0x27/0x70
do_setlink.isra.0+0x341/0xc70
rtnl_newlink+0x468/0x860
rtnetlink_rcv_msg+0x375/0x450
netlink_rcv_skb+0x5c/0x110
netlink_unicast+0x288/0x3c0
netlink_sendmsg+0x20d/0x430
____sys_sendmsg+0x3a2/0x3d0
___sys_sendmsg+0x99/0xe0
__sys_sendmsg+0x8a/0xf0
do_syscall_64+0x82/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
</TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this with inserting code to fail intentionally.
Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The igb driver incorrectly skips the link test when the network
interface is admin down (if_running == false), causing the test to
always report PASS regardless of the actual physical link state.
This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe,
etc.) which correctly test the physical link state regardless of admin
state.
Remove the if_running check to ensure link test always reflects the
physical link state.
Fixes: 8d420a1b3ea6 ("igb: correct link test not being run when link is down")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The igb driver currently causes a NULL pointer dereference when executing
the ethtool loopback test. This occurs because there is no associated
q_vector for the test ring when it is set up, as interrupts are typically
not added to the test rings.
Since commit 5ef44b3cb43b removed the napi_id assignment in
__xdp_rxq_info_reg(), there is no longer a need to pass a napi_id to it.
Therefore, simply use 0 as the last parameter.
Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Tianyu Xu <tianyxu@cisco.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Current mem limit check leaks some GTT memory (reserved_for_pt
reserved_for_ras + adev->vram_pin_size) for small APUs.
Since carveout VRAM is tunable on APUs, there are three case
regarding the carveout VRAM size relative to GTT:
1. 0 < carveout < gtt
apu_prefer_gtt = true, is_app_apu = false
2. carveout > gtt / 2
apu_prefer_gtt = false, is_app_apu = false
3. 0 = carveout
apu_prefer_gtt = true, is_app_apu = true
It doesn't make sense to check below limitation in case 1
(default case, small carveout) because the values in the below
expression are mixed with carveout and gtt.
adev->kfd.vram_used[xcp_id] + vram_needed >
vram_size - reserved_for_pt - reserved_for_ras -
atomic64_read(&adev->vram_pin_size)
gtt: kfd.vram_used, vram_needed, vram_size
carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size
In case 1, vram allocation will go to gtt domain, skip vram check
since ttm_mem_limit check already cover this allocation.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fa7c99f04f6dd299388e9282812b14e95558ac8e)
|
|
When creating p2p links, KFD needs to check XGMI link
with two conditions, hive_id and is_sharing_enabled,
but it is missing to check is_sharing_enabled, so add
it to fix the error.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 36cc7d13178d901982da7a122c883861d98da624)
|
|
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter
registered resulting in a null pointer dereference when applications try
to access the invalid device.
Fixes: 3d5470c97314 ("drm/amd/display/dm: add support for OEM i2c bus")
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 89923fb7ead4fdd37b78dd49962d9bb5892403e6)
Cc: stable@vger.kernel.org
|