Age | Commit message (Collapse) | Author |
|
All the big Linux distros enable CONFIG_SCHED_DEBUG, because
the various features it provides help not just with kernel
development, but with system administration and user-space
software development as well.
Reflect this reality and enable this functionality
unconditionally.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
|
|
With CONFIG_SCHED_DEBUG becoming unconditional, remove the
extra 'const_debug' indirection towards __read_mostly.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-3-mingo@kernel.org
|
|
The scheduler has this special SCHED_WARN() facility that
depends on CONFIG_SCHED_DEBUG.
Since CONFIG_SCHED_DEBUG is getting removed, convert
SCHED_WARN() to WARN_ON_ONCE().
Note that the warning output isn't 100% equivalent:
#define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
Because SCHED_WARN_ON() would output the 'x' condition
as well, while WARN_ONCE() will only show a backtrace.
Hopefully these are rare enough to not really matter.
If it does, we should probably introduce a new WARN_ON()
variant that outputs the condition in stringified form,
or improve WARN_ON() itself.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-2-mingo@kernel.org
|
|
Track whether pages were unmapped from any MM (even ones with a currently
empty mm_cpumask) by the reclaim code, to figure out whether or not
broadcast TLB flush should be done when reclaim finishes.
The reason any MM must be tracked, and not only ones contributing to the
tlbbatch cpumask, is that broadcast ASIDs are expected to be kept up to
date even on CPUs where the MM is not currently active.
This change allows reclaim to avoid doing TLB flushes when only clean page
cache pages and/or slab memory were reclaimed, which is fairly common.
( This is a simpler alternative to the code that was in my INVLPGB series
before, and it seems to capture most of the benefit due to how common
it is to reclaim only page cache. )
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250319132520.6b10ad90@fangorn
|
|
Introduce a name for an old Pentium 4 model and replace the x86_model
checks with VFM ones. This gets rid of one of the last remaining
Intel-specific x86_model checks.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-3-sohil.mehta@intel.com
|
|
Architectural Perfmon was introduced on the Family 6 "Core" processors
starting with Yonah. Processors before Yonah need their own customized
PMU initialization.
p6_pmu_init() is expected to provide that initialization for early
Family 6 processors. But, currently, it could get called for any Family
6 processor if the architectural perfmon feature is disabled on that
processor. To simplify, restrict the P6 PMU initialization to early
Family 6 processors that do not have architectural perfmon support and
truly need the special handling.
As a result, the "unsupported" console print becomes practically
unreachable because all the released P6 processors are covered by the
switch cases. Move the console print to a common location where it can
cover all modern processors (including Family >15) that may not have
architectural perfmon support enumerated.
Also, use this opportunity to get rid of the unnecessary switch cases in
P6 initialization. Only the Pentium Pro processor needs a quirk, and the
rest of the processors do not need any special handling. The gaps in the
case numbers are only due to no processor with those model numbers being
released.
Use decimal numbers to represent Intel Family numbers. Also, convert one
of the last few Intel x86_model comparisons to a VFM-based one.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-2-sohil.mehta@intel.com
|
|
When the rseq UAPI header is included, 'union rseq' clashes with 'struct
rseq'. It's not the case in the rseq selftests but it does break the KVM
selftests that also include this file.
Rename 'union rseq' to 'union rseq_tls' to fix this.
Fixes: e6644c967d3c ("rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20250319202144.1141542-1-mjeanson@efficios.com
|
|
Add David as a secondary maintainer.
[ bp: Rewrite commit message. ]
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250319181630.2673-1-davthompson@nvidia.com
|
|
Intel made a late change to the AVX10 specification that removes support
for a 256-bit maximum vector length and enumeration of the maximum
vector length. AVX10 will imply a maximum vector length of 512 bits.
I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will
just be AVX10, and it will essentially just consolidate AVX512 features.
As a result of this new development, my strategy of providing both
*_avx10_256 and *_avx10_512 functions didn't turn out to be that useful.
The only remaining motivation for the 256-bit AVX512 / AVX10 functions
is to avoid downclocking on older Intel CPUs. But I already wrote
*_avx2 code too (primarily to support CPUs without AVX512), which
performs almost as well as *_avx10_256. So we should just use that.
Therefore, remove the *_avx10_256 CRC functions, and rename the
*_avx10_512 CRC functions to *_avx512. Make Ice Lake and Tiger Lake use
the *_avx2 functions instead of *_avx10_256 which they previously used.
Link: https://lore.kernel.org/r/20250319181316.91271-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux into docs-mw
Chinese translation docs for 6.15-rc1
This is the Chinese translation subtree for 6.15-rc1. It just
includes few changes:
- Chinese disclaimer change
- add a new translation doc: snp-tdx-threat-model
- fix a typo
Above patches are tested by 'make htmldocs/pdfdocs'
Signed-off-by: Alex Shi <alexs@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_event: Fix connection regression between LE and non-LE adapters
- Fix error code in chan_alloc_skb_cb()
* tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters
Bluetooth: Fix error code in chan_alloc_skb_cb()
====================
Link: https://patch.msgid.link/20250314163847.110069-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix an entry in MAINTAINERS to avoid sending hwmon review requests to
the i2c mailing list
- Fix an out-of-bounds access in nct6775 driver
* tag 'hwmon-fixes-for-v6.14-rc8/6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775-core) Fix out of bounds access for NCT679{8,9}
MAINTAINERS: correct list and scope of LTC4286 HARDWARE MONITOR
|
|
Everywhere else in the driver uses devm_kzalloc() when allocating the
AXI data, so there is no kfree() of this structure. However,
dwc-qos-eth uses kzalloc(), which leads to this memory being leaked.
Switch to use devm_kzalloc().
Fixes: d8256121a91a ("stmmac: adding new glue driver dwmac-dwc-qos-eth")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tsRyv-0064nU-O9@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.
This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.
Now that the domain is properly labled the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.
This is the same issue and basic fix as seen in
commit fae6e669cdc5 ("drm/tegra: Do not assume that a NULL domain means no
DMA IOMMU").
Fixes: c8cc2655cc6c ("iommu/tegra-smmu: Implement an IDENTITY domain")
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Closes: https://lore.kernel.org/all/c6a6f114-3acd-4d56-a13b-b88978e927dc@tecnico.ulisboa.pt/
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0-v1-10dcc8ce3869+3a7-host1x_identity_jgg@nvidia.com
|
|
It doesn't really make sense for the type of a control to change based
on the platform_max field. platform_max allows a specific system to
limit values of a control for safety but it seems reasonable the
control type should remain the same between different systems, even
if it is restricted down to just two values. Move the application of
platform_max to after control type determination in soc_info_volsw().
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319175123.3835849-4-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Remove some local variables that aren't adding much in terms of clarity
or space saving.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319175123.3835849-3-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
There are only two differences between snd_soc_get_volsw() and
snd_soc_get_volsw_sx(). The maximum field is handled differently, and
snd_soc_get_volsw() supports double controls with both values in the
same register.
Factor out the common code into a new helper and pass in the
appropriate max value such that it is handled correctly for each
control.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250319175123.3835849-2-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
iterating all possible cpus. It only drops the lock if there is
scheduler or spin lock contention. If neither, then interrupts can be
disabled for a long time. On large machines this can disable interrupts
for a long enough time to drop network packets. On 400+ CPU machines
I've seen interrupt disabled for over 40 msec.
Prevent rstat from disabling interrupts while processing all possible
cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though this was in the context of an non-irq rstat spin lock.
Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
memcg's memory.stat repeatedly for 10 seconds.
2) 400 memory hog processes running in the test memcg and repeatedly
charging memory until oom killed. Then they repeat charging and oom
killing.
v6.14-rc6 with CONFIG_IRQSOFF_TRACER with stat_reader and hogs, finds
interrupts are disabled by rstat for 45341 usec:
# => started at: _raw_spin_lock_irq
# => ended at: cgroup_rstat_flush
#
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
stat_rea-96532 52d.... 0us*: _raw_spin_lock_irq
stat_rea-96532 52d.... 45342us : cgroup_rstat_flush
stat_rea-96532 52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
stat_rea-96532 52d.... 45343us : <stack trace>
=> memcg1_stat_format
=> memory_stat_format
=> memory_stat_show
=> seq_read_iter
=> vfs_read
=> ksys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
With this patch the CONFIG_IRQSOFF_TRACER doesn't find rstat to be the
longest holder. The longest irqs-off holder has irqs disabled for
4142 usec, a huge reduction from previous 45341 usec rstat finding.
Running stat_reader memory.stat reader for 10 seconds:
- without memory hogs: 9.84M accesses => 12.7M accesses
- with memory hogs: 9.46M accesses => 11.1M accesses
The throughput of memory.stat access improves.
The mode of memory.stat access latency after grouping by of 2 buckets:
- without memory hogs: 64 usec => 16 usec
- with memory hogs: 64 usec => 8 usec
The memory.stat latency improves.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Tested-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Make sure the test cleans up after itself. The XDP off statements
at the end of the test may not be reached.
Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312131040.660386-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During a module removal, kvm_exit invokes arch specific disable
call which disables AIA. However, we invoke aia_exit before kvm_exit
resulting in the following warning. KVM kernel module can't be inserted
afterwards due to inconsistent state of IRQ.
[25469.031389] percpu IRQ 31 still enabled on CPU0!
[25469.031732] WARNING: CPU: 3 PID: 943 at kernel/irq/manage.c:2476 __free_percpu_irq+0xa2/0x150
[25469.031804] Modules linked in: kvm(-)
[25469.031848] CPU: 3 UID: 0 PID: 943 Comm: rmmod Not tainted 6.14.0-rc5-06947-g91c763118f47-dirty #2
[25469.031905] Hardware name: riscv-virtio,qemu (DT)
[25469.031928] epc : __free_percpu_irq+0xa2/0x150
[25469.031976] ra : __free_percpu_irq+0xa2/0x150
[25469.032197] epc : ffffffff8007db1e ra : ffffffff8007db1e sp : ff2000000088bd50
[25469.032241] gp : ffffffff8131cef8 tp : ff60000080b96400 t0 : ff2000000088baf8
[25469.032285] t1 : fffffffffffffffc t2 : 5249207570637265 s0 : ff2000000088bd90
[25469.032329] s1 : ff60000098b21080 a0 : 037d527a15eb4f00 a1 : 037d527a15eb4f00
[25469.032372] a2 : 0000000000000023 a3 : 0000000000000001 a4 : ffffffff8122dbf8
[25469.032410] a5 : 0000000000000fff a6 : 0000000000000000 a7 : ffffffff8122dc10
[25469.032448] s2 : ff60000080c22eb0 s3 : 0000000200000022 s4 : 000000000000001f
[25469.032488] s5 : ff60000080c22e00 s6 : ffffffff80c351c0 s7 : 0000000000000000
[25469.032582] s8 : 0000000000000003 s9 : 000055556b7fb490 s10: 00007ffff0e12fa0
[25469.032621] s11: 00007ffff0e13e9a t3 : ffffffff81354ac7 t4 : ffffffff81354ac7
[25469.032664] t5 : ffffffff81354ac8 t6 : ffffffff81354ac7
[25469.032698] status: 0000000200000100 badaddr: ffffffff8007db1e cause: 0000000000000003
[25469.032738] [<ffffffff8007db1e>] __free_percpu_irq+0xa2/0x150
[25469.032797] [<ffffffff8007dbfc>] free_percpu_irq+0x30/0x5e
[25469.032856] [<ffffffff013a57dc>] kvm_riscv_aia_exit+0x40/0x42 [kvm]
[25469.033947] [<ffffffff013b4e82>] cleanup_module+0x10/0x32 [kvm]
[25469.035300] [<ffffffff8009b150>] __riscv_sys_delete_module+0x18e/0x1fc
[25469.035374] [<ffffffff8000c1ca>] syscall_handler+0x3a/0x46
[25469.035456] [<ffffffff809ec9a4>] do_trap_ecall_u+0x72/0x134
[25469.035536] [<ffffffff809f5e18>] handle_exception+0x148/0x156
Invoke aia_exit and other arch specific cleanup functions after kvm_exit
so that disable gets a chance to be called first before exit.
Fixes: 54e43320c2ba ("RISC-V: KVM: Initial skeletal support for AIA")
Fixes: eded6754f398 ("riscv: KVM: add basic support for host vs guest profiling")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250317-kvm_exit_fix-v1-1-aa5240c5dbd2@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Fix a regression on ATI AHCI controllers, where certain Samsung
drives fails to be detected on a warm boot when LPM is enabled.
LPM on ATI AHCI works fine with other drives. Likewise, the
Samsung drives works fine with LPM with other AHI controllers.
Thus, just like the weirdo ATA_QUIRK_NO_NCQ_ON_ATI quirk, add a
new ATA_QUIRK_NO_LPM_ON_ATI quirk to disable LPM only on ATI
AHCI controllers.
* tag 'ata-6.14-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Add ATA_QUIRK_NO_LPM_ON_ATI for certain Samsung SSDs
|
|
When we currently create a pidfd we check that the task hasn't been
reaped right before we create the pidfd. But it is of course possible
that by the time we return the pidfd to userspace the task has already
been reaped since we don't check again after having created a dentry for
it.
This was fine until now because that race was meaningless. But now that
we provide PIDFD_INFO_EXIT it is a problem because it is possible that
the kernel returns a reaped pidfd and it depends on the race whether
PIDFD_INFO_EXIT information is available. This depends on if the task
gets reaped before or after a dentry has been attached to struct pid.
Make this consistent and only returned pidfds for reaped tasks if
PIDFD_INFO_EXIT information is available. This is done by performing
another check whether the task has been reaped right after we attached a
dentry to struct pid.
Since pidfs_exit() is called before struct pid's task linkage is removed
the case where the task got reaped but a dentry was already attached to
struct pid and exit information was recorded and published can be
handled correctly. In that case we do return a pidfd for a reaped task
like we would've before.
Link: https://lore.kernel.org/r/20250316-kabel-fehden-66bdb6a83436@brauner
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
KVM Xen changes for 6.15
- Don't write to the Xen hypercall page on MSR writes that are initiated by
the host (userspace or KVM) to fix a class of bugs where KVM can write to
guest memory at unexpected times, e.g. during vCPU creation if userspace has
set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
- Restrict the Xen hypercall MSR indx to the unofficial synthetic range to
reduce the set of possible collisions with MSRs that are emulated by KVM
(collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
in the synthetic range).
- Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
- Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
entries when updating PV clocks, as there is no guarantee PV clocks will be
updated between TSC frequency changes and CPUID emulation, and guest reads
of Xen TSC should be rare, i.e. are not a hot path.
|
|
KVM PV clock changes for 6.15:
- Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
fix a largely theoretical deadlock.
- Use the vCPU's actual Xen PV clock information when starting the Xen timer,
as the cached state in arch.hv_clock can be stale/bogus.
- Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
PV clocks.
- Restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only
accounts for kvmclock, and there's no evidence that the flag is actually
supported by Xen guests.
- Clean up the per-vCPU "cache" of its reference pvclock, and instead only
track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
expensive to compute, and rarely changes for modern setups).
|
|
KVM SVM changes for 6.15
- Ensure the PSP driver is initialized when both the PSP and KVM modules are
built-in (the initcall framework doesn't handle dependencies).
- Use long-term pins when registering encrypted memory regions, so that the
pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
excessive fragmentation.
- Add macros and helpers for setting GHCB return/error codes.
- Add support for Idle HLT interception, which elides interception if the vCPU
has a pending, unmasked virtual IRQ when HLT is executed.
- Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
address.
- Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
because the vCPU was "destroyed" via SNP's AP Creation hypercall.
- Reject SNP AP Creation if the requested SEV features for the vCPU don't
match the VM's configured set of features.
- Misc cleanups
|
|
Eliminates a jump over a call to capable() in the common case.
By default the limit is not even set, in which case the check can't even
fail to begin with. It remains unset at least on Debian and Ubuntu.
For this cases this can probably become a static branch instead.
In the meantime tidy it up.
I note the check separate from the bump makes the entire thing racy.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250319124923.1838719-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Previously, iomap_readpage_iter() returning 0 would break out of the
loops of iomap_readahead_iter(), which is what iomap_read_inline_data()
relies on.
However, commit d9dc477ff6a2 ("iomap: advance the iter directly on
buffered read") changes this behavior without calling
iomap_iter_advance(), which causes EROFS to get stuck in
iomap_readpage_iter().
It seems iomap_iter_advance() cannot be called in
iomap_read_inline_data() because of the iomap_write_begin() path, so
handle this in iomap_readpage_iter() instead.
Reported-and-tested-by: Bo Liu <liubo03@inspur.com>
Fixes: d9dc477ff6a2 ("iomap: advance the iter directly on buffered read")
Cc: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250319085125.4039368-1-hsiangkao@linux.alibaba.com
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
KVM VMX changes for 6.15
- Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
- Pass XFD_ERR as a psueo-payload when injecting #NM as a preparatory step
for upcoming FRED virtualization support.
- Decouple the EPT entry RWX protection bit macros from the EPT Violation bits
as a general cleanup, and in anticipation of adding support for emulating
Mode-Based Execution (MBEC).
- Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
state while KVM is in the middle of emulating nested VM-Enter.
- Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
in anticipation of adding sanity checks for secondary exit controls (the
primary field is out of bits).
|
|
KVM selftests changes for 6.15, part 2
- Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
improve its coverage by collecting all dirty entries on each iteration.
- Fix a few minor bugs related to handling of stats FDs.
- Add infrastructure to make vCPU and VM stats FDs available to tests by
default (open the FDs during VM/vCPU creation).
- Relax an assertion on the number of HLT exits in the xAPIC IPI test when
running on a CPU that supports AMD's Idle HLT (which elides interception of
HLT if a virtual IRQ is pending and unmasked).
- Misc cleanups and fixes.
|
|
into HEAD
KVM selftests changes for 6.15, part 1
- Misc cleanups and prep work.
- Annotate _no_printf() with "printf" so that pr_debug() statements are
checked by the compiler for default builds (and pr_info() when QUIET).
- Attempt to whack the last LLC references/misses mole in the Intel PMU
counters test by adding a data load and doing CLFLUSH{OPT} on the data
instead of the code being executed. The theory is that modern Intel CPUs
have learned new code prefetching tricks that bypass the PMU counters.
- Fix a flaw in the Intel PMU counters test where it asserts that an event is
counting correctly without actually knowing what the event counts on the
underlying hardware.
|
|
KVM x86 misc changes for 6.15:
- Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.
- Add a helper to consolidate handling of mp_state transitions, and use it to
clear pv_unhalted whenever a vCPU is made RUNNABLE.
- Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
coalesce updates when multiple pieces of vCPU state are changing, e.g. as
part of a nested transition.
- Fix a variety of nested emulation bugs, and add VMX support for synthesizing
nested VM-Exit on interception (instead of injecting #UD into L2).
- Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS,
as KVM can't get the current CPL.
- Misc cleanups
|
|
KVM x86/mmu changes for 6.15
Add support for "fast" aging of SPTEs in both the TDP MMU and Shadow MMU, where
"fast" means "without holding mmu_lock". Not taking mmu_lock allows multiple
aging actions to run in parallel, and more importantly avoids stalling vCPUs,
e.g. due to holding mmu_lock for an extended duration while a vCPU is faulting
in memory.
For the TDP MMU, protect aging via RCU; the page tables are RCU-protected and
KVM doesn't need to access any metadata to age SPTEs.
For the Shadow MMU, use bit 1 of rmap pointers (bit 0 is used to terminate a
list of rmaps) to implement a per-rmap single-bit spinlock. When aging a gfn,
acquire the rmap's spinlock with read-only permissions, which allows hardening
and optimizing the locking and aging, e.g. locking an rmap for write requires
mmu_lock to also be held. The lock is NOT a true R/W spinlock, i.e. multiple
concurrent readers aren't supported.
To avoid forcing all SPTE updates to use atomic operations (clearing the
Accessed bit out of mmu_lock makes it inherently volatile), rework and rename
spte_has_volatile_bits() to spte_needs_atomic_update() and deliberately exclude
the Accessed bit. KVM (and mm/) already tolerates false positives/negatives
for Accessed information, and all testing has shown that reducing the latency
of aging is far more beneficial to overall system performance than providing
"perfect" young/old information.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.15
1. Remove unnecessary header include path.
2. Remove PGD saving during VM context switch.
3. Add perf events support for guest VM.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Holding the pte lock for the page that is being converted to secure is
needed to avoid races. A previous commit removed the locking, which
caused issues. Fix by locking the pte again.
|
|
When mounting a user-space filesystem using io_uring, the initialization
of the rings is done separately in the server side. If for some reason
(e.g. a server bug) this step is not performed it will be impossible to
unmount the filesystem if there are already requests waiting.
This issue is easily reproduced with the libfuse passthrough_ll example,
if the queue depth is set to '0' and a request is queued before trying to
unmount the filesystem. When trying to force the unmount, fuse_abort_conn()
will try to wake up all tasks waiting in fc->blocked_waitq, but because the
rings were never initialized, fuse_uring_ready() will never return 'true'.
Fixes: 3393ff964e0f ("fuse: block request allocation until io-uring init is complete")
Signed-off-by: Luis Henriques <luis@igalia.com>
Link: https://lore.kernel.org/r/20250306111218.13734-1-luis@igalia.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
irq_domain_add_linear() is going away as being obsolete now. Switch to
the preferred irq_domain_create_linear(). That differs in the first
parameter: It takes more generic struct fwnode_handle instead of struct
device_node. Therefore, of_fwnode_handle() is added around the
parameter.
Note some of the users can likely use dev->fwnode directly instead of
indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not
guaranteed to be set for all, so this has to be investigated on case to
case basis (by people who can actually test with the HW).
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Prasad Kumpatla <quic_pkumpatl@quicinc.com>
Cc: linux-sound@vger.kernel.org
Link: https://patch.msgid.link/20250319092951.37667-36-jirislaby@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
There are only two differences between snd_soc_put_volsw() and
snd_soc_put_volsw_sx(). The maximum field is handled differently, and
snd_soc_put_volsw() supports double controls with both values in the
same register.
Factor out the common code into a new helper and pass in the
appropriate max value such that it is handled correctly for each
control.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-13-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
snd_soc_info_volsw() and snd_soc_info_volsw_sx() do very similar
things, and have a lot of code in common. Already this is causing
some issues as the detection of volume controls has been fixed
in the normal callback but not the sx callback. Factor out a new
helper containing the common code and leave the function specific
bits behind in each callback.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-12-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
With the addition of the soc_mixer_ctl_to_reg() helper it is now very
clear that the only difference between snd_soc_put_volsw() and
snd_soc_put_volsw_range() is that the former supports double controls
with both values in the same register. As such we can combine both
functions.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-11-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
With the addition of the soc_mixer_reg_to_ctl() helper it is now very
clear that the only difference between snd_soc_get_volsw() and
snd_soc_get_volsw_range() is that the former supports double controls
with both values in the same register. As such we can combine both
functions.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-10-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The only difference between snd_soc_info_volsw() and
snd_soc_info_volsw_range() is that the later will not force a 2
value control to be of type integer if the name ends in "Volume".
The kernel currently contains no users of snd_soc_info_volsw_range()
that would return a boolean control with this code, so the risk is
quite low and it seems appropriate that it should contain volume
control detection. So remove snd_soc_info_volsw_range() and point its
users at snd_soc_info_volsw().
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-9-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add a helper function to convert from control values to register values
that can replace a lot of the duplicated code in the various put
handlers.
This also fixes a small issue in snd_soc_put_volsw where the value is
converted to a control value before doing the invert, but the invert
is done against the register max which will result in incorrect values
for inverted controls with a non-zero minimum.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-8-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The current snd_soc_read_signed() helper is only used from
snd_soc_get_volsw() and can be implemented more simply with
sign_extend. Remove snd_soc_read_signed() and add a new
soc_mixer_reg_to_ctl() helper. This new helper does not
include the reading of the register, but does include min and
max handling. This allows the helper to replace more of the
duplicated code and makes it easier to process the differences
between single, double register and double shift style controls.
It is worth noting this adds support for sign_bit into the _range
and sx callbacks and the invert option to sx callbacks.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-7-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Most of the put handlers have identical code to verify the value passed
in from user-space. Factor this out into a single helper function.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-6-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use GENMASK to make the masks for the various control helper functions.
Also factor out a shared helper function for the volsw and volsw_range
controls since the same code is appropriate for each. Note this does add
support for sign_bit into the volsw_range callbacks.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-5-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Update the comments for the xr_sx control helper functions to better
clarify the difference between these and the normal sx helpers.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-4-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
No functional changes just tidying up some tabbing etc.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-3-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add some basic kunit tests for some of the ASoC control put and get
helpers. This will assist in doing some refactoring. Note that presently
some tests fail, but the rest of the series will fix these up.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250318171459.3203730-2-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fix a reference count leak in slave_show() by properly putting the device
reference obtained from device_find_any_child().
Fixes: 6c364062bfed ("spi: core: Add support for registering SPI slave controllers")
Fixes: c21b0837983d ("spi: Use device_find_any_child() instead of custom approach")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250319032305.70340-1-linmq006@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
According to AXP717 user manual, DCDC4 doesn't have a ramp delay like
DCDC1/2/3 do.
Remove it from the description and cleanup the macros.
Signed-off-by: Philippe Simons <simons.philippe@gmail.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250318205147.42850-1-simons.philippe@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|