Age | Commit message (Collapse) | Author |
|
Add lockdep assertions in get_vmcs12() and get_shadow_vmcs12() to verify
the vCPU's mutex is held, as the returned VMCS objects are dynamically
allocated/freed when nested VMX is turned on/off, i.e. accessing vmcs12
structures without holding vcpu->mutex is susceptible to use-after-free.
Waive the assertion if the VM is being destroyed, as KVM currently forces
a nested VM-Exit when freeing the vCPU. If/when that wart is fixed, the
assertion can/should be converted to an unqualified lockdep assertion.
See also https://lore.kernel.org/all/Zsd0TqCeY3B5Sb5b@google.com.
Link: https://lore.kernel.org/r/20240906043413.1049633-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Explicitly invalidate posted_intr_nv when emulating nested VM-Enter and
posted interrupts are disabled to make it clear that posted_intr_nv is
valid if and only if nested posted interrupts are enabled, and as a cheap
way to harden against KVM bugs.
KVM initializes posted_intr_nv to -1 at vCPU creation and resets it to -1
when unloading vmcs12 and/or leaving nested mode, i.e. this is not a bug
fix (or at least, it's not intended to be a bug fix).
Note, tracking nested.posted_intr_nv as a u16 subtly adds a measure of
safety, as it prevents unintentionally matching KVM's informal "no IRQ"
vector of -1, stored as a signed int. Because a u16 can be always be
represented as a signed int, the effective "invalid" value of
posted_intr_nv, 65535, will be preserved as-is when comparing against an
int, i.e. will be zero-extended, not sign-extended, and thus won't get a
false positive if KVM is buggy and compares posted_intr_nv against -1.
Opportunistically add a comment in vmx_deliver_nested_posted_interrupt()
to call out that it must check vmx->nested.posted_intr_nv, not the vector
in vmcs12, which is presumably the _entire_ reason nested.posted_intr_nv
exists. E.g. vmcs12 is a KVM-controlled snapshot, so there are no TOCTOU
races to worry about, the only potential badness is if the vCPU leaves
nested and frees vmcs12 between the sender checking is_guest_mode() and
dereferencing the vmcs12 pointer.
Link: https://lore.kernel.org/r/20240906043413.1049633-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() now that nVMX
essentially open codes kvm_get_apic_interrupt() in order to correctly
emulate nested posted interrupts.
Opportunistically stop exporting kvm_cpu_get_interrupt(), as the
aforementioned nVMX flow was the only user in vendor code.
Link: https://lore.kernel.org/r/20240906043413.1049633-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When synthensizing a nested VM-Exit due to an external interrupt, pend a
nested posted interrupt if the external interrupt vector matches L2's PI
notification vector, i.e. if the interrupt is a PI notification for L2.
This fixes a bug where KVM will incorrectly inject VM-Exit instead of
processing nested posted interrupt when IPI virtualization is enabled.
Per the SDM, detection of the notification vector doesn't occur until the
interrupt is acknowledge and deliver to the CPU core.
If the external-interrupt exiting VM-execution control is 1, any unmasked
external interrupt causes a VM exit (see Section 26.2). If the "process
posted interrupts" VM-execution control is also 1, this behavior is
changed and the processor handles an external interrupt as follows:
1. The local APIC is acknowledged; this provides the processor core
with an interrupt vector, called here the physical vector.
2. If the physical vector equals the posted-interrupt notification
vector, the logical processor continues to the next step. Otherwise,
a VM exit occurs as it would normally due to an external interrupt;
the vector is saved in the VM-exit interruption-information field.
For the most part, KVM has avoided problems because a PI NV for L2 that
arrives will L2 is active will be processed by hardware, and KVM checks
for a pending notification vector during nested VM-Enter. Thus, to hit
the bug, the PI NV interrupt needs to sneak its way into L1's vIRR while
L2 is active.
Without IPI virtualization, the scenario is practically impossible to hit,
modulo L1 doing weird things (see below), as the ordering between
vmx_deliver_posted_interrupt() and nested VM-Enter effectively guarantees
that either the sender will see the vCPU as being in_guest_mode(), or the
receiver will see the interrupt in its vIRR.
With IPI virtualization, introduced by commit d588bb9be1da ("KVM: VMX:
enable IPI virtualization"), the sending CPU effectively implements a rough
equivalent of vmx_deliver_posted_interrupt(), sans the nested PI NV check.
If the target vCPU has a valid PID, the CPU will send a PI NV interrupt
based on _L1's_ PID, as the sender's because IPIv table points at L1 PIDs.
PIR := 32 bytes at PID_ADDR;
// under lock
PIR[V] := 1;
store PIR at PID_ADDR;
// release lock
NotifyInfo := 8 bytes at PID_ADDR + 32;
// under lock
IF NotifyInfo.ON = 0 AND NotifyInfo.SN = 0; THEN
NotifyInfo.ON := 1;
SendNotify := 1;
ELSE
SendNotify := 0;
FI;
store NotifyInfo at PID_ADDR + 32;
// release lock
IF SendNotify = 1; THEN
send an IPI specified by NotifyInfo.NDST and NotifyInfo.NV;
FI;
As a result, the target vCPU ends up receiving an interrupt on KVM's
POSTED_INTR_VECTOR while L2 is running, with an interrupt in L1's PIR for
L2's nested PI NV. The POSTED_INTR_VECTOR interrupt triggers a VM-Exit
from L2 to L0, KVM moves the interrupt from L1's PIR to vIRR, triggers a
KVM_REQ_EVENT prior to re-entry to L2, and calls vmx_check_nested_events(),
effectively bypassing all of KVM's "early" checks on nested PI NV.
Without IPI virtualization, the bug can likely be hit only if L1 programs
an assigned device to _post_ an interrupt to L2's notification vector, by
way of L1's PID.PIR. Doing so would allow the interrupt to get into L1's
vIRR without KVM checking vmcs12's NV. Which is architecturally allowed,
but unlikely behavior for a hypervisor.
Cc: Zeng Guang <guang.zeng@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20240906043413.1049633-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In the should-be-impossible scenario that kvm_cpu_get_interrupt() doesn't
return a valid vector after checking kvm_cpu_has_interrupt(), skip VM-Exit
injection to reduce the probability of crashing/confusing L1. Now that
KVM gets the IRQ _before_ calling nested_vmx_vmexit(), squashing the
VM-Exit injection is trivial since there are no actions that need to be
undone.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20240906043413.1049633-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move the logic to get the to-be-acknowledge IRQ for a nested VM-Exit from
nested_vmx_vmexit() to vmx_check_nested_events(), which is subtly the one
and only path where KVM invokes nested_vmx_vmexit() with
EXIT_REASON_EXTERNAL_INTERRUPT. A future fix will perform a last-minute
check on L2's nested posted interrupt notification vector, just before
injecting a nested VM-Exit. To handle that scenario correctly, KVM needs
to get the interrupt _before_ injecting VM-Exit, as simply querying the
highest priority interrupt, via kvm_cpu_has_interrupt(), would result in
TOCTOU bug, as a new, higher priority interrupt could arrive between
kvm_cpu_has_interrupt() and kvm_cpu_get_interrupt().
Unfortunately, simply moving the call to kvm_cpu_get_interrupt() doesn't
suffice, as a VMWRITE to GUEST_INTERRUPT_STATUS.SVI is hiding in
kvm_get_apic_interrupt(), and acknowledging the interrupt before nested
VM-Exit would cause the VMWRITE to hit vmcs02 instead of vmcs01.
Open code a rough equivalent to kvm_cpu_get_interrupt() so that the IRQ
is acknowledged after emulating VM-Exit, taking care to avoid the TOCTOU
issue described above.
Opportunistically convert the WARN_ON() to a WARN_ON_ONCE(). If KVM has
a bug that results in a false positive from kvm_cpu_has_interrupt(),
spamming dmesg won't help the situation.
Note, nested_vmx_reflect_vmexit() can never reflect external interrupts as
they are always "wanted" by L0.
Link: https://lore.kernel.org/r/20240906043413.1049633-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Split the "ack" phase, i.e. the movement of an interrupt from IRR=>ISR,
out of kvm_get_apic_interrupt() and into a separate API so that nested
VMX can acknowledge a specific interrupt _after_ emulating a VM-Exit from
L2 to L1.
To correctly emulate nested posted interrupts while APICv is active, KVM
must:
1. find the highest pending interrupt.
2. check if that IRQ is L2's notification vector
3. emulate VM-Exit if the IRQ is NOT the notification vector
4. ACK the IRQ in L1 _after_ VM-Exit
When APICv is active, the process of moving the IRQ from the IRR to the
ISR also requires a VMWRITE to update vmcs01.GUEST_INTERRUPT_STATUS.SVI,
and so acknowledging the interrupt before switching to vmcs01 would result
in marking the IRQ as in-service in the wrong VMCS.
KVM currently fudges around this issue by doing kvm_get_apic_interrupt()
smack dab in the middle of emulating VM-Exit, but that hack doesn't play
nice with nested posted interrupts, as notification vector IRQs don't
trigger a VM-Exit in the first place.
Cc: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240906043413.1049633-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When SGX EDECCSSA support was added to KVM in commit 16a7fe3728a8
("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest"), it
forgot to clear the X86_FEATURE_SGX_EDECCSSA bit in KVM CPU caps when
KVM SGX is disabled. Fix it.
Fixes: 16a7fe3728a8 ("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest")
Signed-off-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20240905120837.579102-1-kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
GCC 12.3.0 complains about a potential NULL pointer dereference in
evmcs_load() as hv_get_vp_assist_page() can return NULL. In fact, this
cannot happen because KVM verifies (hv_init_evmcs()) that every CPU has a
valid VP assist page and aborts enabling the feature otherwise. CPU
onlining path is also checked in vmx_hardware_enable().
To make the compiler happy and to future proof the code, add a KVM_BUG_ON()
sentinel. It doesn't seem to be possible (and logical) to observe
evmcs_load() happening without an active vCPU so it is presumed that
kvm_get_running_vcpu() can't return NULL.
No functional change intended.
Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240816130124.286226-1-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In prepare_vmcs02_rare(), call vmx_segment_cache_clear() instead of
setting segment_cache.bitmask directly. Using the helper minimizes the
chances of prepare_vmcs02_rare() doing the wrong thing in the future, e.g.
if KVM ends up doing more than just zero the bitmask when purging the
cache.
No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20240725175232.337266-2-mlevitsk@redhat.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if
L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has
disallowed access to via an MSR filter. Intel already disallows including
a handful of "special" MSRs in the VMCS lists, so denying access isn't
completely without precedent.
More importantly, the behavior is well-defined _and_ can be communicated
the end user, e.g. to the customer that owns a VM running as L1 on top of
KVM. On the other hand, ignoring userspace MSR filters is all but
guaranteed to result in unexpected behavior as the access will hit KVM's
internal state, which is likely not up-to-date.
Unlike KVM-internal accesses, instruction emulation, and dedicated VMCS
fields, the MSRs in the VMCS load/store lists are 100% guest controlled,
thus making it all but impossible to reason about the correctness of
ignoring the MSR filter. And if userspace *really* wants to deny access
to MSRs via the aforementioned scenarios, userspace can hide the
associated feature from the guest, e.g. by disabling the PMU to prevent
accessing PERF_GLOBAL_CTRL via its VMCS field. But for the MSR lists, KVM
is blindly processing MSRs; the MSR filters are the _only_ way for
userspace to deny access.
This partially reverts commit ac8d6cad3c7b ("KVM: x86: Only do MSR
filtering when access MSR by rdmsr/wrmsr").
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Cc: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20240722235922.3351122-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In handle_encls_ecreate(), a page is allocated to store a copy of SECS
structure used by the ENCLS[ECREATE] leaf from the guest. This page is
only used temporarily and is freed after use in handle_encls_ecreate().
Don't account for the memory allocation of this page per [1].
Link: https://lore.kernel.org/kvm/b999afeb588eb75d990891855bc6d58861968f23.camel@intel.com/T/#mb81987afc3ab308bbb5861681aa9a20f2aece7fd [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20240715101224.90958-1-kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
vmcs_check16 function
According to the SDM, the meaning of field bit 0 is:
Access type (0 = full; 1 = high); must be full for 16-bit, 32-bit,
and natural-width fields. So there is no 32-bit high field here,
it should be a 32-bit field instead.
Signed-off-by: Qiang Liu <liuq131@chinatelecom.cn>
Link: https://lore.kernel.org/r/20240702064609.52487-1-liuq131@chinatelecom.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two driver fixes for regressions from 6.11-rc1 due to the
driver core change making a structure in a driver core callback const.
These were missed by all testing EXCEPT for what Bart happened to be
running, so I appreciate the fixes provided here for some
odd/not-often-used driver subsystems that nothing else happened to
catch.
Both of these fixes have been in linux-next all week with no reported
issues"
* tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
mips: sgi-ip22: Fix the build
ARM: riscpc: ecard: Fix the build
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc fixes from Greg KH:
"Here are some small char/misc fixes for 6.11-rc4 to resolve reported
problems. Included in here are:
- fastrpc revert of a change that broke userspace
- xillybus fixes for reported issues
Half of these have been in linux-next this week with no reported
problems, I don't know if the last bit of xillybus driver changes made
it in, but they are 'obviously correct' so will be safe :)"
* tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
char: xillybus: Check USB endpoints when probing device
char: xillybus: Refine workqueue handling
Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
char: xillybus: Don't destroy workqueue from work item running on it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.11-rc4 to
resolve some reported problems. Included in here are:
- conmakehash.c userspace build issues
- fsl_lpuart driver fix
- 8250_omap revert for reported regression
- atmel_serial rts flag fix
All of these have been in linux-next this week with no reported
issues"
* tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
tty: atmel_serial: use the correct RTS flag.
tty: vt: conmakehash: remove non-portable code printing comment header
tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.11-rc4 to
resolve some reported issues. Included in here are:
- thunderbolt driver fixes for reported problems
- typec driver fixes
- xhci fixes
- new device id for ljca usb driver
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
usb: typec: ucsi: Fix the return value of ucsi_run_command()
usb: xhci: fix duplicate stall handling in handle_tx_event()
usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
thunderbolt: Mark XDomain as unplugged when router is removed
thunderbolt: Fix memory leaks in {port|retimer}_sb_regs_write()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs fixes from David Sterba:
"A more fixes. We got reports that shrinker added in 6.10 still causes
latency spikes and the fixes don't handle all corner cases. Due to
summer holidays we're taking a shortcut to disable it for release
builds and will fix it in the near future.
- only enable extent map shrinker for DEBUG builds, temporary quick
fix to avoid latency spikes for regular builds
- update target inode's ctime on unlink, mandated by POSIX
- properly take lock to read/update block group's zoned variables
- add counted_by() annotations"
* tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: only enable extent map shrinker for DEBUG builds
btrfs: zoned: properly take lock to read/update block group's zoned variables
btrfs: tree-checker: add dev extent item checks
btrfs: update target inode's ctime on unlink
btrfs: send: annotate struct name_cache_entry with __counted_by()
|
|
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page
zeroing (because it can be used to change partial page contents).
So fuse_notify_store() must be more careful to fully initialize page
contents (including parts of the page that are beyond end-of-file)
before marking the page uptodate.
The current code can leave beyond-EOF page contents uninitialized, which
makes these uninitialized page contents visible to userspace via mmap().
This is an information leak, but only affects systems which do not
enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the
corresponding kernel command line parameter).
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574
Cc: stable@kernel.org
Fixes: a1d75f258230 ("fuse: add store request")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes. All except one are for MM. 10 of these are cc:stable and
the others pertain to post-6.10 issues.
As usual with these merges, singletons and doubletons all over the
place, no identifiable-by-me theme. Please see the lovingly curated
changelogs to get the skinny"
* tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/migrate: fix deadlock in migrate_pages_batch() on large folios
alloc_tag: mark pages reserved during CMA activation as not tagged
alloc_tag: introduce clear_page_tag_ref() helper function
crash: fix riscv64 crash memory reserve dead loop
selftests: memfd_secret: don't build memfd_secret test on unsupported arches
mm: fix endless reclaim on machines with unaccepted memory
selftests/mm: compaction_test: fix off by one in check_compaction()
mm/numa: no task_numa_fault() call if PMD is changed
mm/numa: no task_numa_fault() call if PTE is changed
mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
mm: don't account memmap per-node
mm: add system wide stats items category
mm: don't account memmap on failure
mm/hugetlb: fix hugetlb vs. core-mm PT locking
mseal: fix is_madv_discard()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix crashes on 85xx with some configs since the recent hugepd rework.
- Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some
platforms.
- Don't enable offline cores when changing SMT modes, to match existing
userspace behaviour.
Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal
Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler.
* tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/topology: Check if a core is online
cpu/SMT: Enable SMT only if a core is online
powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL
powerpc/mm: Fix size of allocated PGDIR
soc: fsl: qbman: remove unused struct 'cgr_comp'
|
|
Pull smb client fixes from Steve French:
- fix for clang warning - additional null check
- fix for cached write with posix locks
- flexible structure fix
* tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: smb2pdu.h: Use static_assert() to check struct sizes
smb3: fix lock breakage for cached writes
smb/client: avoid possible NULL dereference in cifs_free_subrequest()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C core fix replacing IS_ENABLED() with IS_REACHABLE()
For host drivers, there are two fixes:
- Tegra I2C Controller: Addresses a potential double-locking issue
during probe. ACPI devices are not IRQ-safe when invoking runtime
suspend and resume functions, so the irq_safe flag should not be
set.
- Qualcomm GENI I2C Controller: Fixes an oversight in the exit path
of the runtime_resume() function, which was missed in the previous
release"
* tag 'i2c-for-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: tegra: Do not mark ACPI devices as irq safe
i2c: Use IS_REACHABLE() for substituting empty ACPI functions
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes to the mpi3mr driver. One to avoid oversize
allocations in tracing and the other to fix an uninitialized spinlock
in the user to driver feature request code (used to trigger dumps and
the like)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocations
scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lock
|
|
Pull xfs fixes from Chandan Babu:
- Check for presence of only 'attr' feature before scrubbing an inode's
attribute fork.
- Restore the behaviour of setting AIL thread to TASK_INTERRUPTIBLE for
long (i.e. 50ms) sleep durations to prevent high load averages.
- Do not allow users to change the realtime flag of a file unless the
datadev and rtdev both support fsdax access modes.
* tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
xfs: revert AIL TASK_KILLABLE threshold
xfs: attr forks require attr, not attr2
|
|
Pull bcachefs fixes from Kent OverstreetL
- New on disk format version, bcachefs_metadata_version_disk_accounting_inum
This adds one more disk accounting counter, which counts disk usage
and number of extents per inode number. This lets us track
fragmentation, for implementing defragmentation later, and it also
counts disk usage per inode in all snapshots, which will be a useful
thing to expose to users.
- One performance issue we've observed is threads spinning when they
should be waiting for dirty keys in the key cache to be flushed by
journal reclaim, so we now have hysteresis for the waiting thread, as
well as improving the tracepoint and a new time_stat, for tracking
time blocked waiting on key cache flushing.
... and various assorted smaller fixes.
* tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix locking in __bch2_trans_mark_dev_sb()
bcachefs: fix incorrect i_state usage
bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru
bcachefs: Fix forgetting to pass trans to fsck_err()
bcachefs: Increase size of cuckoo hash table on too many rehashes
bcachefs: bcachefs_metadata_version_disk_accounting_inum
bcachefs: Kill __bch2_accounting_mem_mod()
bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in
bcachefs: Add a time_stat for blocked on key cache flush
bcachefs: Improve trans_blocked_journal_reclaim tracepoint
bcachefs: Add hysteresis to waiting on btree key cache flush
lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
bcachefs: Convert for_each_btree_node() to lockrestart_do()
bcachefs: Add missing downgrade table entry
bcachefs: disk accounting: ignore unknown types
bcachefs: bch2_accounting_invalid() fixup
bcachefs: Fix bch2_trigger_alloc when upgrading from old versions
bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
|
|
We run this in full RW mode now, so we have to guard against the
superblock buffer being reallocated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull memcg-v1 fix from Al Viro:
"memcg_write_event_control() oops fix"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
memcg_write_event_control(): fix a user-triggerable oops
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS()
macro instead of the *_ERR() one in order to avoid writing -EFAULT to
the value register in case of a fault
- Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE.
Prior to this fix, only the first element was initialised
- Move the KASAN random tag seed initialisation after the per-CPU areas
have been initialised (prng_state is __percpu)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix KASAN random tag seed initialization
arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
arm64: uaccess: correct thinko in __get_mem_asm()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One fix for the new T-Head TH1520 clk driver that marks a bus clk
critical so that it isn't turned off during late init which breaks
emmc-sdio"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: thead: fix dependency on clk_ignore_unused
|
|
Pull block fixes from Jens Axboe:
- Fix corruption issues with s390/dasd (Eric, Stefan)
- Fix a misuse of non irq locking grab of a lock (Li)
- MD pull request with a single data corruption fix for raid1 (Yu)
* tag 'block-6.11-20240824' of git://git.kernel.dk/linux:
block: Fix lockdep warning in blk_mq_mark_tag_wait
md/raid1: Fix data corruption for degraded array with slow disk
s390/dasd: fix error recovery leading to data corruption on ESE devices
s390/dasd: Remove DMA alignment
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a comment in the uapi header using the wrong member name (Caleb)
- Fix KCSAN warning for a debug check in sqpoll (me)
- Two more NAPI tweaks (Olivier)
* tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux:
io_uring: fix user_data field name in comment
io_uring/sqpoll: annotate debug task == current with data_race()
io_uring/napi: remove duplicate io_napi_entry timeout assignation
io_uring/napi: check napi_enabled in io_napi_add() before proceeding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix a possible (but unlikely) out-of-bounds read in interrupts
parsing code
- Add AT25 EEPROM "fujitsu,mb85rs256" compatible
- Update Konrad Dybcio's email
* tag 'devicetree-fixes-for-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of/irq: Prevent device address out-of-bounds read in interrupt map walk
dt-bindings: eeprom: at25: add fujitsu,mb85rs256 compatible
dt-bindings: Batch-update Konrad Dybcio's email
|
|
Although there are several patches improving the extent map shrinker,
there are still reports of too frequent shrinker behavior, taking too
much CPU for the kswapd process.
So let's only enable extent shrinker for now, until we got more
comprehensive understanding and a better solution.
Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ffc.camel@intelfx.name/
Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gmail.com/
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
CC: stable@vger.kernel.org # 6.10+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a Bang-bang thermal governor issue causing it to fail to reset the
state of cooling devices if they are 'on' to start with, but the
thermal zone temperature is always below the corresponding trip point
(Rafael Wysocki)"
* tag 'thermal-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: gov_bang_bang: Use governor_data to reduce overhead
thermal: gov_bang_bang: Add .manage() callback
thermal: gov_bang_bang: Split bang_bang_control()
thermal: gov_bang_bang: Call __thermal_cdev_update() directly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an issue related to the ACPI EC device handling that causes the
_REG control method to be evaluated for EC operation regions that are
not expected to be used.
This confuses the platform firmware and provokes various types of
misbehavior on some systems (Rafael Wysocki)"
* tag 'acpi-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Evaluate _REG outside the EC scope more carefully
ACPICA: Add a depth argument to acpi_execute_reg_methods()
Revert "ACPI: EC: Evaluate orphan _REG under EC device"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:
"Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
the DAX tests by skipping over the legacy pmem mapping pages case.
Set the DAX flag in this case as well"
* tag 'libnvdimm-fixes-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases
|
|
io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe
does not have a data field. Fix the comment to say `sqe->user_data`.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://github.com/axboe/liburing/pull/1206
Link: https://lore.kernel.org/r/20240816181526.3642732-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull rust fixes from Miguel Ojeda:
- Fix '-Os' Rust 1.80.0+ builds adding more intrinsics (also tweaked in
upstream Rust for the upcoming 1.82.0).
- Fix support for the latest version of rust-analyzer due to a change
on rust-analyzer config file semantics (considered a fix since most
developers use the latest version of the tool, which is the only one
actually supported by upstream). I am discussing stability of the
config file with upstream -- they may be able to start versioning it.
- Fix GCC 14 builds due to '-fmin-function-alignment' not skipped for
libclang (bindgen).
- A couple Kconfig fixes around '{RUSTC,BINDGEN}_VERSION_TEXT' to
suppress error messages in a foreign architecture chroot and to use a
proper default format.
- Clean 'rust-analyzer' target warning due to missing recursive make
invocation mark.
- Clean Clippy warning due to missing indentation in docs.
- Clean LLVM 19 build warning due to removed 3dnow feature upstream.
* tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux:
rust: x86: remove `-3dnow{,a}` from target features
kbuild: rust-analyzer: mark `rust_is_available.sh` invocation as recursive
rust: add intrinsics to fix `-Os` builds
kbuild: rust: skip -fmin-function-alignment in bindgen flags
rust: Support latest version of `rust-analyzer`
rust: macros: indent list item in `module!`'s docs
rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- reintroduce the text patching global icache flush
- fix syscall entry code to correctly initialize a0, which manifested
as a strace bug
- XIP kernels now map the entire kernel, which fixes boot under at
least DEBUG_VIRTUAL=y
- initialize all nodes in the acpi_early_node_map initializer
- fix OOB access in the Andes vendor extension probing code
- A new key for scalar misaligned access performance in hwprobe, which
correctly treat the values as an enum (as opposed to a bitmap)
* tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array
RISC-V: hwprobe: Add SCALAR to misaligned perf defines
RISC-V: hwprobe: Add MISALIGNED_PERF key
RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: change XIP's kernel_map.size to be size of the entire kernel
riscv: entry: always initialize regs->a0 to -ENOSYS
riscv: Re-introduce global icache flush in patch_text_XXX()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"A couple of fixes for tracing:
- Prevent a NULL pointer dereference in the error path of RTLA tool
- Fix an infinite loop bug when reading from the ring buffer when
closed. If there's a thread trying to read the ring buffer and it
gets closed by another thread, the one reading will go into an
infinite loop when the buffer is empty instead of exiting back to
user space"
* tag 'trace-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rtla/osnoise: Prevent NULL dereference in error handling
tracing: Return from tracing_buffers_read() if the file has been closed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull key fixes from Jarkko Sakkinen:
"Two bug fixes for a memory corruption bug and a memory leak bug in the
DCP trusted keys type.
Just as a reminder DCP was a crypto coprocessor in i.MX SoCs"
* tag 'keys-trusted-next-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: dcp: fix leak of blob encryption key
KEYS: trusted: fix DCP blob payload length assignment
|
|
Reported-by: syzbot+95e40eae71609e40d851@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+510b0b28f8e6de64d307@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+e3938cd6d761b78750e6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, improve the calculation of the new table size, so that it can
shrink when needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- fix misbehavior if suspend or resume is interrupted by a signal
- fix wrong indentation in dm-crypt.rst
- fix memory allocation failure in dm-persistent-data
* tag 'for-6.11/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm persistent data: fix memory allocation failure
Documentation: dm-crypt.rst warning + error fix
dm resume: don't return EINVAL when signalled
dm suspend: return -ERESTARTSYS instead of -EINTR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Bring back a lost return statement in io-page-fault code
- Remove an unused function declaration
* tag 'iommu-fixes-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu: Remove unused declaration iommu_sva_unbind_gpasid()
iommu: Restore lost return in iommu_report_device_fault()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- add the shutdown() callback to gpio-mlxbf3 in order to disable
interrupts during graceful reboot
* tag 'gpio-fixes-for-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mlxbf3: Support shutdown() function
|