Age | Commit message (Collapse) | Author |
|
Pull block fixes from Jens Axboe:
- Fix plugging for native zone writes
- Fix segment limit settings for != 4K page size archs
- Fix for slab names overflowing
* tag 'block-6.14-20250228' of git://git.kernel.dk/linux:
block: fix 'kmem_cache of name 'bio-108' already exists'
block: Remove zone write plugs when handling native zone append writes
block: make segment size limit workable for > 4K PAGE_SIZE
|
|
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Manually load the guest's DEBUGCTL prior to VMRUN (and restore the host's
value on #VMEXIT) if it diverges from the host's value and LBR
virtualization is disabled, as hardware only context switches DEBUGCTL if
LBR virtualization is fully enabled. Running the guest with the host's
value has likely been mildly problematic for quite some time, e.g. it will
result in undesirable behavior if BTF diverges (with the caveat that KVM
now suppresses guest BTF due to lack of support).
But the bug became fatal with the introduction of Bus Lock Trap ("Detect"
in kernel paralance) support for AMD (commit 408eb7417a92
("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will
trigger an unexpected #DB.
Note, suppressing the bus lock #DB, i.e. simply resuming the guest without
injecting a #DB, is not an option. It wouldn't address the general issue
with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible
side effects if BusLockTrap is left enabled.
If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to
clear it by software are ignored. But if BusLockTrap is enabled, software
can clear DR6.BLD:
Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2)
to 1. When bus lock trap is enabled, ... The processor indicates that
this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11]
previously had been defined to be always 1.
and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by
other #DBs:
All other #DB exceptions leave DR6[BLD] unmodified
E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0'
to reset DR6.
Reported-by: rangemachine@gmail.com
Reported-by: whanos@sergal.fun
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219787
Closes: https://lore.kernel.org/all/bug-219787-28872@https.bugzilla.kernel.org%2F
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Mark BTF as reserved in DEBUGCTL on AMD, as KVM doesn't actually support
BTF, and fully enabling BTF virtualization is non-trivial due to
interactions with the emulator, guest_debug, #DB interception, nested SVM,
etc.
Don't inject #GP if the guest attempts to set BTF, as there's no way to
communicate lack of support to the guest, and instead suppress the flag
and treat the WRMSR as (partially) unsupported.
In short, make KVM behave the same on AMD and Intel (VMX already squashes
BTF).
Note, due to other bugs in KVM's handling of DEBUGCTL, the only way BTF
has "worked" in any capacity is if the guest simultaneously enables LBRs.
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop bits 5:2 from the guest's effective DEBUGCTL value, as AMD changed
the architectural behavior of the bits and broke backwards compatibility.
On CPUs without BusLockTrap (or at least, in APMs from before ~2023),
bits 5:2 controlled the behavior of external pins:
Performance-Monitoring/Breakpoint Pin-Control (PBi)—Bits 5:2, read/write.
Software uses thesebits to control the type of information reported by
the four external performance-monitoring/breakpoint pins on the
processor. When a PBi bit is cleared to 0, the corresponding external pin
(BPi) reports performance-monitor information. When a PBi bit is set to
1, the corresponding external pin (BPi) reports breakpoint information.
With the introduction of BusLockTrap, presumably to be compatible with
Intel CPUs, AMD redefined bit 2 to be BLCKDB:
Bus Lock #DB Trap (BLCKDB)—Bit 2, read/write. Software sets this bit to
enable generation of a #DB trap following successful execution of a bus
lock when CPL is > 0.
and redefined bits 5:3 (and bit 6) as "6:3 Reserved MBZ".
Ideally, KVM would treat bits 5:2 as reserved. Defer that change to a
feature cleanup to avoid breaking existing guest in LTS kernels. For now,
drop the bits to retain backwards compatibility (of a sort).
Note, dropping bits 5:2 is still a guest-visible change, e.g. if the guest
is enabling LBRs *and* the legacy PBi bits, then the state of the PBi bits
is visible to the guest, whereas now the guest will always see '0'.
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add an L1 (guest) assert to the nested exceptions test to verify that KVM
doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the
guest's int_state if a #VMEXIT occurs before VMRUN fully completes).
Add a similar assert to the VMX side as well, because why not.
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exception in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Oppurtunstically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli@broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Pull io_uring fix from Jens Axboe:
"Just a single fix headed for stable, ensuring that msg_control is
properly saved in compat mode as well"
* tag 'io_uring-6.14-20250228' of git://git.kernel.dk/linux:
io_uring/net: save msg_control for compat
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"Another couple of EFI fixes for v6.14.
Only James's patch stands out, as it implements a workaround for odd
behavior in fwupd in user space, which creates EFI variables by
touching a file in efivarfs, clearing the immutable bit (which gets
set automatically for $reasons) and then opening it again for writing,
none of which is really necessary.
The fwupd author and LVFS maintainer is already rolling out a fix for
this on the fwupd side, and suggested that the workaround in this PR
could be backed out again during the next cycle.
(There is a semantic mismatch in efivarfs where some essential
variable attributes are stored in the first 4 bytes of the file, and
so zero length files cannot exist, as they cannot be written back to
the underlying variable store. So now, they are dropped once the last
reference is released.)
Summary:
- Fix CPER error record parsing bugs
- Fix a couple of efivarfs issues that were introduced in the merge
window
- Fix an issue in the early remapping code of the MOKvar table"
* tag 'efi-fixes-for-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/mokvar-table: Avoid repeated map/unmap of the same page
efi: Don't map the entire mokvar table to determine its size
efivarfs: allow creation of zero length files
efivarfs: Defer PM notifier registration until .fill_super
efi/cper: Fix cper_arm_ctx_info alignment
efi/cper: Fix cper_ia_proc_ctx alignment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.14-rc5
- npcm fixes interrupt initialization sequence.
- ls2x fixes frequency setting.
- amd-asf re-enables interrupts properly at irq handler's exit.
|
|
The gpiod_direction_input_nonotify() function is supposed to return zero
if the direction for the pin is input. But instead it accidentally
returns GPIO_LINE_DIRECTION_IN (1) which will be cast into an ERR_PTR()
in gpiochip_request_own_desc(). The callers dereference it and it leads
to a crash.
I changed gpiod_direction_output_raw_commit() just for consistency but
returning GPIO_LINE_DIRECTION_OUT (0) is fine.
Cc: stable@vger.kernel.org
Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/254f3925-3015-4c9d-aac5-bb9b4b2cd2c5@stanley.mountain
[Bartosz: moved the variable declarations to the top of the functions]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Device mapper bioset often has big bio_slab size, which can be more than
1000, then 8byte can't hold the slab name any more, cause the kmem_cache
allocation warning of 'kmem_cache of name 'bio-108' already exists'.
Fix the warning by extending bio_slab->name to 12 bytes, but fix output
of /proc/slabinfo
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250228132656.2838008-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce the sta_set_mesh_sinfo() to populate mesh related
fields in sinfo structure for station statistics.
This will allow for the simplified population of other fields in the
sinfo structure for link level in a subsequent patch to add support
for MLO station statistics.
No functionality changes added.
Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com>
Link: https://patch.msgid.link/20250213171632.1646538-3-quic_sarishar@quicinc.com
[reword since it's just an internal thing]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently, as multi-link operation(MLO) is not supported for mesh,
reorganize the sinfo structure for mesh-specific fields and embed
mesh related NL attributes together in organized view.
This will allow for the simplified reorganization of sinfo structure
for link level in a subsequent patch to add support for MLO station
statistics.
No functionality changes added.
Pahole summary before the reorg of sinfo structure:
- size: 256, cachelines: 4, members: 50
- sum members: 239, holes: 4, sum holes: 17
- paddings: 2, sum paddings: 2
- forced alignments: 1, forced holes: 1, sum forced holes: 1
Pahole summary after the reorg of sinfo structure:
- size: 248, cachelines: 4, members: 50
- sum members: 239, holes: 4, sum holes: 9
- paddings: 2, sum paddings: 2
- forced alignments: 1, last cacheline: 56 bytes
Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com>
Link: https://patch.msgid.link/20250213171632.1646538-2-quic_sarishar@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is a spelling mistake in a IWL_DEBUG_RATE message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250227221917.658401-1-colin.i.king@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts
locally") moved the call to enable_drhd_fault_handling() to a code
path that does not hold any lock while traversing the drhd list. Fix
it by ensuring the dmar_global_lock lock is held when traversing the
drhd list.
Without this fix, the following warning is triggered:
=============================
WARNING: suspicious RCU usage
6.14.0-rc3 #55 Not tainted
-----------------------------
drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by cpuhp/1/23:
#0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
#1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
stack backtrace:
CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55
Call Trace:
<TASK>
dump_stack_lvl+0xb7/0xd0
lockdep_rcu_suspicious+0x159/0x1f0
? __pfx_enable_drhd_fault_handling+0x10/0x10
enable_drhd_fault_handling+0x151/0x180
cpuhp_invoke_callback+0x1df/0x990
cpuhp_thread_fun+0x1ea/0x2c0
smpboot_thread_fn+0x1f5/0x2e0
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0x12a/0x2d0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x4a/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat
about a possible deadlock between dmar_global_lock and cpu_hotplug_lock.
This is avoided by not holding dmar_global_lock when calling
iommu_device_register(), which initiates the device probe process.
Fixes: d74169ceb0d2 ("iommu/vt-d: Allocate DMAR fault interrupts locally")
Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com>
Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/
Tested-by: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Remove the device comparison check in context_setup_pass_through_cb.
pci_for_each_dma_alias already makes a decision on whether the
callback function should be called for a device. With the check
in place it will fail to create context entries for aliases as
it walks up to the root bus.
Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When updating the page table root field on the DTE, avoid overwriting any
bits that are already set. The earlier call to make_clear_dte() writes
default values that all DTEs must have set (currently DTE[V]), and those
must be preserved.
Currently this doesn't cause problems since the page table root update is
the first field that is set after make_clear_dte() is called, and
DTE_FLAG_V is set again later along with the permission bits (IR/IW).
Remove this redundant assignment too.
Fixes: fd5dff9de4be ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250106191413.3107140-1-alejandro.j.jimenez@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After some digging around I have found that this laptop has Cirrus's smart
aplifiers connected to SPI bus (spi1-CSC3551:00-cs35l41-hda).
To get them correctly detected and working I had to modify patch_realtek.c
with ASUS EXPERTBOOK P5405CSA 1.0 SystemID (0x1043, 0x1f63) and add
corresponding hda_quirk (ALC245_FIXUP_CS35L41_SPI_2).
Signed-off-by: Daniel Bárta <daniel.barta@trustlab.cz>
Link: https://patch.msgid.link/20250227161256.18061-2-daniel.barta@trustlab.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Use the basic quirk for this type of amplifier. Sound works in speakers,
headphones, and microphone. Whereas none worked before.
Tested-by: Kyle Gospodnetich <me@kylegospodneti.ch>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20250227175107.33432-3-lkml@antheas.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
In commit 1e9c708dc3ae ("ALSA: hda/tas2781: Add new quirk for Lenovo,
ASUS, Dell projects") Baojun adds a bunch of projects to the file,
including for the Ally X. Turns out the initial Ally X was not sorted
properly, so the kernel had 2 quirks for it.
The previous quirk overrode the new one due to being earlier and they
are different. When AB testing, the normal pin fixup seems to work ok
but causes a bit of a minor popping. Given the other config is more
complicated and may cause undefined behavior, revert it.
Fixes: 1e9c708dc3ae ("ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20250227175107.33432-2-lkml@antheas.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Tariq Toukan says:
====================
mlx5: Trust lockdown health syndrome
This series introduces a new error type in the health syndrome,
specifically for trust lock-down. Additionally, it exposes the CRR bit
in the health buffer, which, when set, indicates that the error cannot
be recovered without a process involving a cold reset. We add The CRR
bit value to the health buffer info log and update it to be logged on
any syndrome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the new health syndrome value to hsynd_str() function
to indicate that the device got a trust lockdown fault.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose crr bit in struct health buffer. When set, it indicates that
the error cannot be recovered without flow involving a cold reset.
Add its value to the health buffer info log.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently health buffer data is logged either when FW fatal error
detected or miss counter reached max misses threshold.
Log health buffer whenever new health syndrome is detected.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case health counter has not increased for few polling intervals, miss
counter will reach max misses threshold and health report will be
triggered for FW health reporter. In case syndrome found on same health
poll another health report will be triggered.
Avoid two health reports on same syndrome by marking this syndrome as
already known.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull drm fixes from Dave Airlie:
"This week's fixes pull, amdgpu mostly, with some xe and a few misc
others, the fb defio fix is bit of a change, but it avoids some nasty
NULL pointer crashes due to defio assuming page backing in places it
didn't have pages.
amdgpu:
- Legacy dpm suspend/resume fix
- Runtime PM fix for DELL G5 SE
- MAINTAINERS updates
- Enforce Isolation fixes
- mailmap update
- EDID reading i2c fix
- PSR fix
- eDP fix
- HPD interrupt handling fix
- Clear memory fix
amdkfd:
- MQD handling fix
vkms:
- fix rounding error
imagination:
- header fix
nouveau:
- connector status fix
fb/defio:
- NULL ptr fix for defio drivers
i915:
- Fix encoder HW state readout for DP UHBR MST
xe:
- OA uapi fix (Umesh)
- Userptr related fixes
- Remove a duplicated register entry
- Scheduler related fix to prevent exec races when freeing it"
* tag 'drm-fixes-2025-02-28' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/fbdev-dma: Add shadow buffering for deferred I/O
drm/nouveau: Do not override forced connector status
drm/i915/dp_mst: Fix encoder HW state readout for UHBR MST
drm/xe: cancel pending job timer before freeing scheduler
drm/xe/regs: remove a duplicate definition for RING_CTL_SIZE(size)
drm/imagination: remove unnecessary header include path
drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
drm/amd/display: Fix HPD after gpu reset
drm/amd/display: add a quirk to enable eDP0 on DP1
drm/amd/display: Disable PSR-SU on eDP panels
MAINTAINERS: Update AMDGPU DML maintainers info
drm/amd/display: restore edid reading from a given i2c adapter
mailmap: Add entry for Rodrigo Siqueira
MAINTAINERS: Change my role from Maintainer to Reviewer
drm/amdgpu/mes: keep enforce isolation up to date
drm/amdgpu/gfx: only call mes for enforce isolation if supported
MAINTAINERS: update amdgpu maintainers list
drm/amdgpu: disable BAR resize on Dell G5 SE
drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd
amdgpu/pm/legacy: fix suspend/resume issues
...
|
|
Kevin Krakauer says:
====================
selftests/net: deflake GRO tests and fix return value and output
The GRO selftests can flake and have some confusing behavior. These
changes make the output and return value of GRO behave as expected, then
deflake the tests.
v1: https://lore.kernel.org/20250218164555.1955400-1-krakauer@google.com
====================
Link: https://patch.msgid.link/20250226192725.621969-1-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GRO tests are timing dependent and can easily flake. This is partially
mitigated in gro.sh by giving each subtest 3 chances to pass. However,
this still flakes on some machines. Reduce the flakiness by:
- Bumping retries to 6.
- Setting napi_defer_hard_irqs to 1 to reduce the chance that GRO is
flushed prematurely. This also lets us reduce the gro_flush_timeout
from 1ms to 100us.
Tested: Ran `gro.sh -t large` 1000 times. There were no failures with
this change. Ran inside strace to increase flakiness.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-4-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
gro.c:main no longer erroneously claims a test passes when running as a
sender.
Tested: Ran `gro.sh -t large` to verify the sender no longer prints a
status.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-3-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Modify gro.sh to return a useful exit code when the -t flag is used. It
formerly returned 0 no matter what.
Tested: Ran `gro.sh -t large` and verified that test failures return 1.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
pcim_iomap_regions() should receive the driver's name as its third
parameter, not the PCI device's name.
Define the driver name with a macro and use it at the appropriate
places, including pcim_iomap_regions().
Cc: stable@vger.kernel.org # v5.14+
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Tested-by: Henry Chen <chenx97@aosc.io>
Link: https://patch.msgid.link/20250226085208.97891-2-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the
the correct reordering, set rcv_state using smp_store_release().
Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
nvme_tcp_recv_pdu() doesn't check the validity of the header length.
When header digests are enabled, a target might send a packet with an
invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst()
to access memory outside the allocated area and cause memory corruptions
by overwriting it with the calculated digest.
Fix this by rejecting packets with an unexpected header length.
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Gal Pressman says:
====================
Add missing netlink error message macros to coccinelle test
The newline_in_nl_msg.cocci test is missing some variants in the list of
checked macros, add them and fix all reported issues.
====================
Link: https://patch.msgid.link/20250226093904.6632-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netlink error messages should not have a newline at the end of the
string.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250226093904.6632-6-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netlink error messages should not have a newline at the end of the
string.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250226093904.6632-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netlink error messages should not have a newline at the end of the
string.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250226093904.6632-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netlink error messages should not have a newline at the end of the
string.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250226093904.6632-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing (GE)NL_SET_ERR_MSG_*() variants to the list of macros
checked for strings ending with a newline.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250226093904.6632-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Check whether denominator expression x * (x - 1) * 1000 mod {2^32, 2^64}
produce zero and skip stddev computation in that case.
For now don't care about rec->counter * rec->counter overflow because
rec->time * rec->time overflow will likely happen earlier.
Cc: stable@vger.kernel.org
Cc: Wen Yang <wenyang@linux.alibaba.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250206090156.1561783-1-kniv@yandex-team.ru
Fixes: e31f7939c1c27 ("ftrace: Avoid potential division by zero in function profiler")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The fprobe test fails on Fedora 41 since the fprobe test assumption that
the number of enabled_functions is zero before the test starts is not
necessarily true. Some user space tools, like systemd, add BPF programs
that attach to functions. Those will show up in the enabled_functions table
and must be taken into account by the fprobe test.
Therefore count the number of lines of enabled_functions before tests
start, and use that as base when comparing expected results.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250226142703.910860-1-hca@linux.ibm.com
Fixes: e85c5e9792b9 ("selftests/ftrace: Update fprobe test to check enabled_functions file")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The following commands causes a crash:
~# cd /sys/kernel/tracing/events/rcu/rcu_callback
~# echo 'hist:name=bad:keys=common_pid:onmax(bogus).save(common_pid)' > trigger
bash: echo: write error: Invalid argument
~# echo 'hist:name=bad:keys=common_pid' > trigger
Because the following occurs:
event_trigger_write() {
trigger_process_regex() {
event_hist_trigger_parse() {
data = event_trigger_alloc(..);
event_trigger_register(.., data) {
cmd_ops->reg(.., data, ..) [hist_register_trigger()] {
data->ops->init() [event_hist_trigger_init()] {
save_named_trigger(name, data) {
list_add(&data->named_list, &named_triggers);
}
}
}
}
ret = create_actions(); (return -EINVAL)
if (ret)
goto out_unreg;
[..]
ret = hist_trigger_enable(data, ...) {
list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!)
[..]
out_unreg:
event_hist_unregister(.., data) {
cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] {
list_for_each_entry(iter, &file->triggers, list) {
if (!hist_trigger_match(data, iter, named_data, false)) <- never matches
continue;
[..]
test = iter;
}
if (test && test->ops->free) <<<-- test is NULL
test->ops->free(test) [event_hist_trigger_free()] {
[..]
if (data->name)
del_named_trigger(data) {
list_del(&data->named_list); <<<<-- NEVER gets removed!
}
}
}
}
[..]
kfree(data); <<<-- frees item but it is still on list
The next time a hist with name is registered, it causes an u-a-f bug and
the kernel can crash.
Move the code around such that if event_trigger_register() succeeds, the
next thing called is hist_trigger_enable() which adds it to the list.
A bunch of actions is called if get_named_trigger_data() returns false.
But that doesn't need to be called after event_trigger_register(), so it
can be moved up, allowing event_trigger_register() to be called just
before hist_trigger_enable() keeping them together and allowing the
file->triggers to be properly populated.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250227163944.1c37f85f@gandalf.local.home
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reported-by: Tomas Glozar <tglozar@redhat.com>
Tested-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In some net-sysfs functions the ret value is initialized but never used
as it is always overridden. Remove those.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250226174644.311136-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
antonio@openvpn.net is still used for sending
patches under the OpenVPN Inc. umbrella, therefore this
address should not be re-mapped.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250227-b4-ovpn-v20-1-93f363310834@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the port range to rt_link, example:
# tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
--do getlink --json '{"ifname": "geneve1"}' --output-json | jq
{
"ifname": "geneve1",
[...]
"linkinfo": {
"kind": "geneve",
"data": {
"id": 1000,
"remote": "147.28.227.100",
"udp-csum": 0,
"ttl": 0,
"tos": 0,
"label": 0,
"df": 0,
"port": 49431,
"udp-zero-csum6-rx": 1,
"ttl-inherit": 0,
"port-range": {
"low": 4000,
"high": 5000
}
}
},
[...]
}
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250226182030.89440-2-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recently, in case of Cilium, we run into users on Azure who require to use
tunneling for east/west traffic due to hitting IPAM API limits for Kubernetes
Pods if they would have gone with publicly routable IPs for Pods. In case
of tunneling, Cilium supports the option of vxlan or geneve. In order to
RSS spread flows among remote CPUs both derive a source port hash via
udp_flow_src_port() which takes the inner packet's skb->hash into account.
For clusters with many nodes, this can then hit a new limitation [0]: Today,
the Azure networking stack supports 1M total flows (500k inbound and 500k
outbound) for a VM. [...] Once this limit is hit, other connections are
dropped. [...] Each flow is distinguished by a 5-tuple (protocol, local IP
address, remote IP address, local port, and remote port) information. [...]
For vxlan and geneve, this can create a massive amount of UDP flows which
then run into the limits if stale flows are not evicted fast enough. One
option to mitigate this for vxlan is to narrow the source port range via
IFLA_VXLAN_PORT_RANGE while still being able to benefit from RSS. However,
geneve currently does not have this option and it spreads traffic across
the full source port range of [1, USHRT_MAX]. To overcome this limitation
also for geneve, add an equivalent IFLA_GENEVE_PORT_RANGE setting for users.
Note that struct geneve_config before/after still remains at 2 cachelines
on x86-64. The low/high members of struct ifla_geneve_port_range (which is
uapi exposed) are of type __be16. While they would be perfectly fine to be
of __u16 type, the consensus was that it would be good to be consistent
with the existing struct ifla_vxlan_port_range from a uapi consumer PoV.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput [0]
Link: https://patch.msgid.link/20250226182030.89440-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
So, in this particular case, we create a new `struct mlx5e_umr_wqe_hdr`
to enclose the header part of flexible structure `struct mlx5e_umr_wqe`.
This is, all the members except the flexible arrays `inline_mtts`,
`inline_klms` and `inline_ksms` in the anonymous union. We then replace
the header part with `struct mlx5e_umr_wqe_hdr hdr;` in `struct
mlx5e_umr_wqe`, and change the type of the object currently causing
trouble `umr_wqe` from `struct mlx5e_umr_wqe` to `struct
mlx5e_umr_wqe_hdr` --this last bit gets rid of the flex-array-in-the-middle
part and avoid the warnings.
Also, no new members should be added to `struct mlx5e_umr_wqe`, instead
any new members must be included in the header structure `struct
mlx5e_umr_wqe_hdr`. To enforce this, we use `static_assert()`, ensuring
that the memory layout of both the flexible structure and the newly
created header struct remain consistent.
The next step is to refactor the rest of the related code accordingly,
which means adding a bunch of `hdr.` wherever needed.
Lastly, we use `container_of()` whenever we need to retrieve a pointer
to the flexible structure `struct mlx5e_umr_wqe`.
So, with these changes, fix 125 of the following warnings:
drivers/net/ethernet/mellanox/mlx5/core/en.h:664:48: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/Z76HzPW1dFTLOSSy@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With ipvs_reset() now done unconditionally in skb_scrub_packet()
we would then call the former twice netkit_prep_forward(). Thus
remove the now unnecessary explicit call.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250225212927.69271-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|