Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve the default select_cpu() implementation making it topology
aware and handle WAKE_SYNC better.
- set_arg_maybe_null() was used to inform the verifier which ops args
could be NULL in a rather hackish way. Use the new __nullable CFI
stub tags instead.
- On Sapphire Rapids multi-socket systems, a BPF scheduler, by
hammering on the same queue across sockets, could live-lock the
system to the point where the system couldn't make reasonable forward
progress.
This could lead to soft-lockup triggered resets or stalling out
bypass mode switch and thus BPF scheduler ejection for tens of
minutes if not hours. After trying a number of mitigations, the
following set worked reliably:
- Injecting artificial cpu_relax() loops in two places while
sched_ext is trying to turn on the bypass mode.
- Triggering scheduler ejection when soft-lockup detection is
imminent (a quarter of threshold left).
While not the prettiest, the impact both in terms of code complexity
and overhead is minimal.
- A common complaint on the API is the overuse of the word "dispatch"
and the confusion around "consume". This is due to how the dispatch
queues became more generic over time. Rename the affected kfuncs for
clarity. Thanks to BPF's compatibility features, this change can be
made in a way that's both forward and backward compatible. The
compatibility code will be dropped in a few releases.
- Other misc changes
* tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (21 commits)
sched_ext: Replace scx_next_task_picked() with switch_class() in comment
sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*()
sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()
sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()
sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context
sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu_dfl()
sched_ext: Clarify sched_ext_ops table for userland scheduler
sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
sched_ext: Avoid live-locking bypass mode switching
sched_ext: Fix incorrect use of bitwise AND
sched_ext: Do not enable LLC/NUMA optimizations when domains overlap
sched_ext: Introduce NUMA awareness to the default idle selection policy
sched_ext: Replace set_arg_maybe_null() with __nullable CFI stub tags
sched_ext: Rename CFI stubs to names that are recognized by BPF
sched_ext: Introduce LLC awareness to the default idle selection policy
sched_ext: Clarify ops.select_cpu() for single-CPU tasks
sched_ext: improve WAKE_SYNC behavior for default idle CPU selection
sched_ext: Use btf_ids to resolve task_struct
sched/ext: Use tg_cgroup() to elieminate duplicate code
sched/ext: Fix unmatch trailing comment of CONFIG_EXT_GROUP_SCHED
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpu.stat now also shows niced CPU time
- Freezer and cpuset optimizations
- Other misc changes
* tag 'cgroup-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set
cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation
cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()"
MAINTAINERS: remove Zefan Li
cgroup/freezer: Add cgroup CGRP_FROZEN flag update helper
cgroup/freezer: Reduce redundant traversal for cgroup_freeze
cgroup/bpf: only cgroup v2 can be attached by bpf programs
Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline"
selftests/cgroup: Fix compile error in test_cpu.c
cgroup/rstat: Selftests for niced CPU statistics
cgroup/rstat: Tracking cgroup-level niced CPU time
cgroup/cpuset: Fix spelling errors in file kernel/cgroup/cpuset.c
|
|
Pull workqueue updates from Tejun Heo:
- The maximum concurrency limit of 512 which was set a long time ago is
too low now.
A legitimate use (BPF cgroup release) of system_wq could saturate it
under stress test conditions leading to false dependencies and
deadlocks.
While the offending use was switched to a dedicated workqueue, use
the opportunity to bump WQ_MAX_ACTIVE four fold and document that
system workqueue shouldn't be saturated. Workqueue should add at
least a warning mechanism for cases where system workqueues are
saturated.
- Recent workqueue updates to support more flexible execution topology
made unbound workqueues use per-cpu worker pool frontends which
pushed up workqueue flush overhead.
As consecutive CPUs are likely to be pointing to the same worker
pool, reduce overhead by switching locks only when necessary.
* tag 'wq-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Reduce expensive locks for unbound workqueue
workqueue: Adjust WQ_MAX_ACTIVE from 512 to 2048
workqueue: doc: Add a note saturating the system_wq is not permitted
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Print more precise information about the printk log buffer memory
usage.
- Make sure that the sysrq title is shown on the console even when
deferred.
- Do not enable earlycon by `console=` which is meant to disable the
default console.
* tag 'printk-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: add dummy printk_force_console_enter/exit helpers
tty: sysrq: Use printk_force_console context on __handle_sysrq
printk: Introduce FORCE_CON flag
printk: Improve memory usage logging during boot
init: Don't proxy `console=` to earlycon
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Tree wide:
- Make nr_irqs static to the core code and provide accessor functions
to remove existing and prevent future aliasing problems with local
variables or function arguments of the same name.
Core code:
- Prevent freeing an interrupt in the devres code which is not
managed by devres in the first place.
- Use seq_put_decimal_ull_width() for decimal values output in
/proc/interrupts which increases performance significantly as it
avoids parsing the format strings over and over.
- Optimize raising the timer and hrtimer soft interrupts by using the
'set bit only' variants instead of the combined version which
checks whether ksoftirqd should be woken up. The latter is a
pointless exercise as both soft interrupts are raised in the
context of the timer interrupt and therefore never wake up
ksoftirqd.
- Delegate timer/hrtimer soft interrupt processing to a dedicated
thread on RT.
Timer and hrtimer soft interrupts are always processed in ksoftirqd
on RT enabled kernels. This can lead to high latencies when other
soft interrupts are delegated to ksoftirqd as well.
The separate thread allows to run them seperately under a RT
scheduling policy to reduce the latency overhead.
Drivers:
- New drivers or extensions of existing drivers to support Renesas
RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
chips
- Support for multi-cluster GICs on MIPS.
MIPS CPUs can come with multiple CPU clusters, where each CPU
cluster has its own GIC (Generic Interrupt Controller). This
requires to access the GIC of a remote cluster through a redirect
register block.
This is encapsulated into a set of helper functions to keep the
complexity out of the actual code paths which handle the GIC
details.
- Support for encrypted guests in the ARM GICV3 ITS driver
The ITS page needs to be shared with the hypervisor and therefore
must be decrypted.
- Small cleanups and fixes all over the place"
* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
irqchip/riscv-aplic: Prevent crash when MSI domain is missing
genirq/proc: Use seq_put_decimal_ull_width() for decimal values
softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
timers: Use __raise_softirq_irqoff() to raise the softirq.
hrtimer: Use __raise_softirq_irqoff() to raise the softirq
riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
irqchip: Add T-HEAD C900 ACLINT SSWI driver
dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
irqchip/mips-gic: Multi-cluster support
irqchip/mips-gic: Setup defaults in each cluster
irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
irqchip/mips-gic: Replace open coded online CPU iterations
genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
genirq/devres: Don't free interrupt which is not managed by devres
irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
irqchip/aspeed-intc: Add AST27XX INTC support
dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects updates from Thomas Gleixner:
- Prevent destroying the kmem_cache on early failure.
Destroying a kmem_cache requires work queues to be set up, but in the
early failure case they are not yet initializated. So rather leak the
cache instead of triggering a BUG.
- Reduce parallel pool fill attempts.
Refilling the object pool requires to take the global pool lock,
which causes a massive performance issue when a large number of CPUs
attempt to refill concurrently. It turns out that it's sufficient to
let one CPU handle the refill from the to free list and in case there
are not enough objects on it to allocate new objects from the kmem
cache.
This also splits the free list handling from the actual allocation
path as that yields better results on RT where allocation is
restricted to preemptible code paths. The refill from free list has
no such restrictions.
- Consolidate the global and the per CPU pools to use the same data
structure, so all helper functions can be shared.
- Simplify the object allocation/free logic.
The allocation/free logic is an incomprehensible maze, which tries to
utilize the to free list and the global pool in the best way. This
all can be simplified into a straight forward comprehensible code
flow.
- Convert the allocation/free mechanism to batch mode.
Transferring objects from the global pool to the per CPU pools or
vice versa is done by walking the hlist and moving object by object.
That not only increases the pool lock held time, it also dirties up
to 17 cache lines.
This can be avoided by storing the pointer to the first object in a
batch of 16 objects in the objects themself and propagate it through
the batch when an object is enqueued into a pool or to a temporary
hlist head on allocation.
This allows to move batches of objects with at max four cache lines
dirtied and reduces the pool lock held time and therefore contention
significantly.
- Improve the object reusage
The current implementation is too agressively freeing unused objects,
which is counterproductive on bursty workloads like a kernel compile.
Address this by:
* increasing the per CPU pool size
* refilling the per CPU pool from the to be freed pool when the
per CPU pool emptied a batch
* keeping track of object usage with a exponentially wheighted
moving average which prevents the work queue callback to free
objects prematuraly.
This combined reduces the allocation/free rate for a full kernel
compile significantly:
kmem_cache_alloc() kmem_cache_free()
Baseline: 380k 330k
Improved: 170k 117k
- A few cleanups and a more cache line friendly layout of debug
information on top.
* tag 'core-debugobjects-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
debugobjects: Track object usage to avoid premature freeing of objects
debugobjects: Refill per CPU pool more agressively
debugobjects: Double the per CPU slots
debugobjects: Move pool statistics into global_pool struct
debugobjects: Implement batch processing
debugobjects: Prepare kmem_cache allocations for batching
debugobjects: Prepare for batching
debugobjects: Use static key for boot pool selection
debugobjects: Rework free_object_work()
debugobjects: Rework object freeing
debugobjects: Rework object allocation
debugobjects: Move min/max count into pool struct
debugobjects: Rename and tidy up per CPU pools
debugobjects: Use separate list head for boot pool
debugobjects: Move pools into a datastructure
debugobjects: Reduce parallel pool fill attempts
debugobjects: Make debug_objects_enabled bool
debugobjects: Provide and use free_object_list()
debugobjects: Remove pointless debug printk
debugobjects: Reuse put_objects() on OOM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Ingo Molnar:
- Move Split and Bus lock code to a dedicated file (Ravi Bangoria)
- Add split/bus lock support for AMD (Ravi Bangoria)
* tag 'x86-splitlock-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bus_lock: Add support for AMD
x86/split_lock: Move Split and Bus lock code to a dedicated file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core facilities:
- Add the "Lazy preemption" model (CONFIG_PREEMPT_LAZY=y), which
optimizes fair-class preemption by delaying preemption requests to
the tick boundary, while working as full preemption for
RR/FIFO/DEADLINE classes. (Peter Zijlstra)
- x86: Enable Lazy preemption (Peter Zijlstra)
- riscv: Enable Lazy preemption (Jisheng Zhang)
- Initialize idle tasks only once (Thomas Gleixner)
- sched/ext: Remove sched_fork() hack (Thomas Gleixner)
Fair scheduler:
- Optimize the PLACE_LAG when se->vlag is zero (Huang Shijie)
Idle loop:
- Optimize the generic idle loop by removing unnecessary memory
barrier (Zhongqiu Han)
RSEQ:
- Improve cache locality of RSEQ concurrency IDs for intermittent
workloads (Mathieu Desnoyers)
Waitqueues:
- Make wake_up_{bit,var} less fragile (Neil Brown)
PSI:
- Pass enqueue/dequeue flags to psi callbacks directly (Johannes
Weiner)
Preparatory patches for proxy execution:
- Add move_queued_task_locked helper (Connor O'Brien)
- Consolidate pick_*_task to task_is_pushable helper (Connor O'Brien)
- Split out __schedule() deactivate task logic into a helper (John
Stultz)
- Split scheduler and execution contexts (Peter Zijlstra)
- Make mutex::wait_lock irq safe (Juri Lelli)
- Expose __mutex_owner() (Juri Lelli)
- Remove wakeups from under mutex::wait_lock (Peter Zijlstra)
Misc fixes and cleanups:
- Remove unused __HAVE_THREAD_FUNCTIONS hook support (David
Disseldorp)
- Update the comment for TIF_NEED_RESCHED_LAZY (Sebastian Andrzej
Siewior)
- Remove unused bit_wait_io_timeout (Dr. David Alan Gilbert)
- remove the DOUBLE_TICK feature (Huang Shijie)
- fix the comment for PREEMPT_SHORT (Huang Shijie)
- Fix unnused variable warning (Christian Loehle)
- No PREEMPT_RT=y for all{yes,mod}config"
* tag 'sched-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
sched, x86: Update the comment for TIF_NEED_RESCHED_LAZY.
sched: No PREEMPT_RT=y for all{yes,mod}config
riscv: add PREEMPT_LAZY support
sched, x86: Enable Lazy preemption
sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT
sched: Add Lazy preemption model
sched: Add TIF_NEED_RESCHED_LAZY infrastructure
sched/ext: Remove sched_fork() hack
sched: Initialize idle tasks only once
sched: psi: pass enqueue/dequeue flags to psi callbacks directly
sched/uclamp: Fix unnused variable warning
sched: Split scheduler and execution contexts
sched: Split out __schedule() deactivate task logic into a helper
sched: Consolidate pick_*_task to task_is_pushable helper
sched: Add move_queued_task_locked helper
locking/mutex: Expose __mutex_owner()
locking/mutex: Make mutex::wait_lock irq safe
locking/mutex: Remove wakeups from under mutex::wait_lock
sched: Improve cache locality of RSEQ concurrency IDs for intermittent workloads
sched: idle: Optimize the generic idle loop by removing needless memory barrier
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Uprobes:
- Add BPF session support (Jiri Olsa)
- Switch to RCU Tasks Trace flavor for better performance (Andrii
Nakryiko)
- Massively increase uretprobe SMP scalability by SRCU-protecting
the uretprobe lifetime (Andrii Nakryiko)
- Kill xol_area->slot_count (Oleg Nesterov)
Core facilities:
- Implement targeted high-frequency profiling by adding the ability
for an event to "pause" or "resume" AUX area tracing (Adrian
Hunter)
VM profiling/sampling:
- Correct perf sampling with guest VMs (Colton Lewis)
New hardware support:
- x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi)
Misc fixes and enhancements:
- x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter)
- x86/amd: Warn only on new bits set (Breno Leitao)
- x86/amd/uncore: Avoid a false positive warning about snprintf
truncation in amd_uncore_umc_ctx_init (Jean Delvare)
- uprobes: Re-order struct uprobe_task to save some space
(Christophe JAILLET)
- x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang)
- x86/rapl: Clean up cpumask and hotplug (Kan Liang)
- uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg
Nesterov)"
* tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/core: Correct perf sampling with guest VMs
perf/x86: Refactor misc flag assignments
perf/powerpc: Use perf_arch_instruction_pointer()
perf/core: Hoist perf_instruction_pointer() and perf_misc_flags()
perf/arm: Drop unused functions
uprobes: Re-order struct uprobe_task to save some space
perf/x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
perf/x86/intel/pt: Add support for pause / resume
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Fix buffer full but size is 0 case
uprobes: SRCU-protect uretprobe lifetime (with timeout)
uprobes: allow put_uprobe() from non-sleepable softirq context
perf/x86/rapl: Clean up cpumask and hotplug
perf/x86/rapl: Move the pmu allocation out of CPU hotplug
uprobe: Add support for session consumer
uprobe: Add data pointer to consumer handlers
perf/x86/amd: Warn only on new bits set
uprobes: fold xol_take_insn_slot() into xol_get_insn_slot()
uprobes: kill xol_area->slot_count
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Lockdep:
- Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING (Sebastian Andrzej
Siewior)
- Add lockdep_cleanup_dead_cpu() (David Woodhouse)
futexes:
- Use atomic64_inc_return() in get_inode_sequence_number() (Uros
Bizjak)
- Use atomic64_try_cmpxchg_relaxed() in get_inode_sequence_number()
(Uros Bizjak)
RT locking:
- Add sparse annotation PREEMPT_RT's locking (Sebastian Andrzej
Siewior)
spinlocks:
- Use atomic_try_cmpxchg_release() in osq_unlock() (Uros Bizjak)
atomics:
- x86: Use ALT_OUTPUT_SP() for __alternative_atomic64() (Uros Bizjak)
- x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu() (Uros
Bizjak)
KCSAN, seqlocks:
- Support seqcount_latch_t (Marco Elver)
<linux/cleanup.h>:
- Add if_not_guard() conditional guard helper (David Lechner)
- Adjust scoped_guard() macros to avoid potential warning (Przemek
Kitszel)
- Remove address space of returned pointer (Uros Bizjak)
WW mutexes:
- locking/ww_mutex: Adjust to lockdep nest_lock requirements (Thomas
Hellström)
Rust integration:
- Fix raw_spin_lock initialization on PREEMPT_RT (Eder Zulian)
Misc cleanups & fixes:
- lockdep: Fix wait-type check related warnings (Ahmed Ehab)
- lockdep: Use info level for initial info messages (Jiri Slaby)
- spinlocks: Make __raw_* lock ops static (Geert Uytterhoeven)
- pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase
(Qiuxu Zhuo)
- iio: magnetometer: Fix if () scoped_guard() formatting (Stephen
Rothwell)
- rtmutex: Fix misleading comment (Peter Zijlstra)
- percpu-rw-semaphores: Fix grammar in percpu-rw-semaphore.rst (Xiu
Jianfeng)"
* tag 'locking-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
locking/Documentation: Fix grammar in percpu-rw-semaphore.rst
iio: magnetometer: fix if () scoped_guard() formatting
rust: helpers: Avoid raw_spin_lock initialization for PREEMPT_RT
kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
seqlock, treewide: Switch to non-raw seqcount_latch interface
kcsan, seqlock: Support seqcount_latch_t
time/sched_clock: Broaden sched_clock()'s instrumentation coverage
time/sched_clock: Swap update_clock_read_data() latch writes
locking/atomic/x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu()
locking/atomic/x86: Use ALT_OUTPUT_SP() for __alternative_atomic64()
cleanup: Add conditional guard helper
cleanup: Adjust scoped_guard() macros to avoid potential warning
locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()
cleanup: Remove address space of returned pointer
locking/rtmutex: Fix misleading comment
locking/rt: Annotate unlock followed by lock for sparse.
locking/rt: Add sparse annotation for RCU.
locking/rt: Remove one __cond_lock() in RT's spin_trylock_irqsave()
locking/rt: Add sparse annotation PREEMPT_RT's sleeping locks.
locking/pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Introduction of the new SRCU-lite flavour with a new pair of
srcu_read_[un]lock_lite() APIs. In practice the read side using
this flavour becomes lighter by removing a full memory barrier on
LOCK and a full memory barrier on UNLOCK. This comes at the expense
of a higher latency write side with two (in the best case of a
snaphot of unused read-sides) or more RCU grace periods on the
update side which now assumes by itself the whole full ordering
guarantee against the LOCK/UNLOCK counters on both indexes, along
with the accesses performed inside.
Uretprobes is a known potential user.
Note this doesn't replace the default normal flavour of SRCU which
still behaves the same as usual.
- Add testing of SRCU-lite through rcutorture and rcuscale
- Various cleanups on the way.
Fixes:
- Allow short-circuiting RCU-TASKS-RUDE grace periods on
architectures that have sane noinstr boundaries forbidding tracing
on low-level idle and kernel entry code. RCU-TASKS is enough on
such configurations because it involves an RCU grace period that
waits for all idle tasks to either schedule out voluntarily or
enter into RCU unwatched noinstr code.
- Allow and test start_poll_synchronize_rcu() with IRQs disabled.
- Mention rcuog kthreads in relevant documentation and Kconfig help
- Various fixes and consolidations
rcutorture:
- Add --no-affinity on tools to leave the affinity setting of guests
up to the user.
- Add guest_os_delay parameter to rcuscale for better warm-up
control.
- Fix and improve some rcuscale error handling.
- Various cleanups and fixes
stall:
- Remove dead code
- Stop dumping tasks if a stalled grace period eventually ended
midway as that only produces confusing output.
- Optimize detection of stalling CPUs and avoid useless node locking
otherwise.
NOCB:
- Fix rcu_barrier() hang due to a race against callbacks
deoffloading. This is not yet used, except by rcutorture, and waits
for its promised cpusets interface.
- Remove leftover function declaration"
* tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (42 commits)
rcuscale: Remove redundant WARN_ON_ONCE() splat
rcuscale: Do a proper cleanup if kfree_scale_init() fails
srcu: Unconditionally record srcu_read_lock_lite() in ->srcu_reader_flavor
srcu: Check for srcu_read_lock_lite() across all CPUs
srcu: Remove smp_mb() from srcu_read_unlock_lite()
rcutorture: Avoid printing cpu=-1 for no-fault RCU boost failure
rcuscale: Add guest_os_delay module parameter
refscale: Correct affinity check
torture: Add --no-affinity parameter to kvm.sh
rcu/nocb: Fix missed RCU barrier on deoffloading
rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu
rcu/srcutiny: don't return before reenabling preemption
rcu-tasks: Remove open-coded one-byte cmpxchg() emulation
doc: Remove kernel-parameters.txt entry for rcutorture.read_exit
rcutorture: Test start-poll primitives with interrupts disabled
rcu: Permit start_poll_synchronize_rcu*() with interrupts disabled
rcu: Allow short-circuiting of synchronize_rcu_tasks_rude()
doc: Add rcuog kthreads to kernel-per-CPU-kthreads.rst
rcu: Add rcuog kthreads to RCU_NOCB_CPU help text
rcu: Use the BITS_PER_LONG macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hmon updates from Guenter Roeck:
"New drivers:
- ISL28022 power monitor
- Nuvoton NCT7363Y
Added support for new chips to existing drivers:
- The tmp180 driver now supports NXP p3t1085, including its I3C mode
- The ina2xx driver now supports SY24655 and INA260
- The amc6821 driver now supports tsd,mule
Other notable improvements:
- The sht4x driver now supports the chip heater
- The pmbus/isl68137 driver now supports a voltage divider on Vout
- The cros_ec driver registers with the thermal framework
- The pmbus/ltc2978 driver now supports LTC7841
- The PMBus core now allow drivers to override WRITE_PROTECT
Various other minor improvements and cleanups"
* tag 'hwmon-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (51 commits)
hwmon: (pmbus/isl68137) add support for voltage divider on Vout
dt-bindings: hwmon: isl68137: add bindings to support voltage dividers
hwmon: tmp108: fix I3C dependency
hwmon: (cros_ec) register thermal sensors to thermal framework
hwmon: (tmp108) Add support for I3C device
hwmon: (tmp108) Add helper function tmp108_common_probe() to prepare I3C support
hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method
hwmon: (nct6775-core) Fix overflows seen when writing limit attributes
hwmon: (pwm-fan) Introduce start from stopped state handling
dt-bindings: hwmon: pwm-fan: Document start from stopped state properties
hwmon: (tmp108) Add NXP p3t1085 support
dt-bindings: hwmon: ti,tmp108: Add nxp,p3t1085 compatible string
hwmon: (sch5627, max31827) Fix typos in driver documentation
hwmon: (jc42) Drop of_match_ptr() protection
hwmon: (f71882fg) Fix grammar in fan speed trip points explanation
dt-bindings: hwmon: pmbus: add ti tps25990 support
hwmon: (pmbus/core) clear faults after setting smbalert mask
hwmon: (pmbus/core) allow drivers to override WRITE_PROTECT
hwmon: (pmbus) add documentation for existing flags
hwmon: (ina226) Add support for SY24655
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These include a couple of fixes, a new ACPI backlight quirk for Apple
MacbookPro11,2 and Air7,2 and a bunch of cleanups:
- Fix _CPC register setting issue for registers located in memory in
the ACPI CPPC library code (Lifeng Zheng)
- Use DEFINE_SIMPLE_DEV_PM_OPS in the ACPI battery driver, make it
use devm_ for initializing mutexes and allocating driver data, and
make it check the register_pm_notifier() return value (Thomas
Weißschuh, Andy Shevchenko)
- Make the ACPI EC driver support compile-time conditional and allow
ACPI to be built without CONFIG_HAS_IOPORT (Arnd Bergmann)
- Remove a redundant error check from the pfr_telemetry driver (Colin
Ian King)
- Rearrange the processor_perflib code in the ACPI processor driver
to avoid compiling x86-specific code on other architectures (Arnd
Bergmann)
- Add adev NULL check to acpi_quirk_skip_serdev_enumeration() and
make UART skip quirks work on PCI UARTs without an UID (Hans de
Goede)
- Force native backlight handling Apple MacbookPro11,2 and Air7,2 in
the ACPI video driver (Jonathan Denose)
- Switch several ACPI platform drivers back to using struct
platform_driver::remove() (Uwe Kleine-König)
- Replace strcpy() with strscpy() in multiple places in the ACPI
subsystem (Muhammad Qasim Abdul Majeed, Abdul Rahim)"
* tag 'acpi-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
ACPI: video: force native for Apple MacbookPro11,2 and Air7,2
ACPI: CPPC: Fix _CPC register setting issue
ACPI: Switch back to struct platform_driver::remove()
ACPI: x86: Add adev NULL check to acpi_quirk_skip_serdev_enumeration()
ACPI: x86: Make UART skip quirks work on PCI UARTs without an UID
ACPI: allow building without CONFIG_HAS_IOPORT
ACPI: processor_perflib: extend X86 dependency
ACPI: scan: Use strscpy() instead of strcpy()
ACPI: SBSHC: Use strscpy() instead of strcpy()
ACPI: SBS: Use strscpy() instead of strcpy()
ACPI: power: Use strscpy() instead of strcpy()
ACPI: pci_root: Use strscpy() instead of strcpy()
ACPI: pci_link: Use strscpy() instead of strcpy()
ACPI: event: Use strscpy() instead of strcpy()
ACPI: EC: Use strscpy() instead of strcpy()
ACPI: APD: Use strscpy() instead of strcpy()
ACPI: thermal: Use strscpy() instead of strcpy()
ACPI: battery: Check for error code from devm_mutex_init() call
ACPI: EC: make EC support compile-time conditional
ACPI: pfr_telemetry: remove redundant error check on ret
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These are thermal core changes, including the addition of support for
temperature thresholds that can be set from user space, fixes related
to thermal zone initialization, suspend/resume and exit, locking
rework and rearrangement of the code handling thermal zone temperature
updates.
Specifics:
- Add support for thermal thresholds that can be added and removed
from user space via netlink along with a related library update
(Daniel Lezcano)
- Fix thermal zone initialization, suspend/resume and exit
synchronization issues (Rafael Wysocki)
- Rearrange locking in the thermal core to use guards (Rafael
Wysocki)
- Make the code handling thermal zone temperature updates use sorted
lists of trip points to reduce the number of trip points table
walks in the thermal core (Rafael Wysocki)
- Fix and clean up the thermal testing facility code (Rafael Wysocki)
- Fix a Power Allocator thermal governor issue (ZhengShaobo)"
* tag 'thermal-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (45 commits)
thermal: testing: Initialize some variables annoteded with _free()
thermal: testing: Use DEFINE_FREE() and __free() to simplify code
thermal: testing: Simplify tt_get_tt_zone()
thermal: gov_power_allocator: Granted power set to max when nobody request power
thermal: core: Relocate thermal zone initialization routine
thermal: core: Use trip lists for trip crossing detection
thermal: core: Eliminate thermal_zone_trip_down()
thermal: core: Relocate functions that update trip points
thermal: core: Move some trip processing to thermal_trip_crossed()
thermal: core: Pass trip descriptor to thermal_trip_crossed()
thermal: core: Rearrange __thermal_zone_device_update()
thermal: core: Prepare for moving trips between sorted lists
thermal: core: Rename trip list node in struct thermal_trip_desc
thermal: core: Build sorted lists instead of sorting them later
thermal/lib: Fix memory leak on error in thermal_genl_auto()
thermal: thresholds: Fix thermal lock annotation issue
tools/thermal/thermal-engine: Take into account the thresholds API
tools/lib/thermal: Add the threshold netlink ABI
tools/lib/thermal: Make more generic the command encoding function
thermal: netlink: Add the commands and the events for the thresholds
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The amd-pstate cpufreq driver gets the majority of changes this time.
They are mostly fixes and cleanups, but one of them causes it to
become the default cpufreq driver on some AMD server platforms.
Apart from that, the menu cpuidle governor is modified to not use
iowait any more, the intel_idle gets a custom C-states table for
Granite Rapids Xeon D, and the intel_pstate driver will use a more
aggressive Balance- performance default EPP value on Granite Rapids
now.
There are also some fixes, cleanups and tooling updates.
Specifics:
- Update the amd-pstate driver to set the initial scaling frequency
policy lower bound to be the lowest non-linear frequency (Dhananjay
Ugwekar)
- Enable amd-pstate by default on servers starting with newer AMD
Epyc processors (Swapnil Sapkal)
- Align more codepaths between shared memory and MSR designs in
amd-pstate (Dhananjay Ugwekar)
- Clean up amd-pstate code to rename functions and remove redundant
calls (Dhananjay Ugwekar, Mario Limonciello)
- Do other assorted fixes and cleanups in amd-pstate (Dhananjay
Ugwekar and Mario Limonciello)
- Change the Balance-performance EPP value for Granite Rapids in the
intel_pstate driver to a more performance-biased one (Srinivas
Pandruvada)
- Simplify MSR read on the boot CPU in the ACPI cpufreq driver (Chang
S. Bae)
- Ensure sugov_eas_rebuild_sd() is always called when sugov_init()
succeeds to always enforce sched domains rebuild in case EAS needs
to be enabled (Christian Loehle)
- Switch cpufreq back to platform_driver::remove() (Uwe Kleine-König)
- Use proper frequency unit names in cpufreq (Marcin Juszkiewicz)
- Add a built-in idle states table for Granite Rapids Xeon D to the
intel_idle driver (Artem Bityutskiy)
- Fix some typos in comments in the cpuidle core and drivers (Shen
Lichuan)
- Remove iowait influence from the menu cpuidle governor (Christian
Loehle)
- Add min/max available performance state limits to the Energy Model
management code (Lukasz Luba)
- Update pm-graph to v5.13 (Todd Brandt)
- Add documentation for some recently introduced cpupower utility
options (Tor Vic)
- Make cpupower inform users where cpufreq-bench.conf should be
located when opening it fails (Peng Fan)
- Allow overriding cross-compiling env params in cpupower (Peng Fan)
- Add compile_commands.json to .gitignore in cpupower (John B. Wyatt
IV)
- Improve disable c_state block in cpupower bindings and add a test
to confirm that CPU state is disabled to it (John B. Wyatt IV)
- Add Chinese Simplified translation to cpupower (Kieran Moy)
- Add checks for xgettext and msgfmt to cpupower (Siddharth Menon)"
* tag 'pm-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (38 commits)
cpufreq: intel_pstate: Update Balance-performance EPP for Granite Rapids
cpufreq: ACPI: Simplify MSR read on the boot CPU
sched/cpufreq: Ensure sd is rebuilt for EAS check
intel_idle: add Granite Rapids Xeon D support
PM: EM: Add min/max available performance state limits
cpufreq/amd-pstate: Move registration after static function call update
cpufreq/amd-pstate: Push adjust_perf vfunc init into cpu_init
cpufreq/amd-pstate: Align offline flow of shared memory and MSR based systems
cpufreq/amd-pstate: Call cppc_set_epp_perf in the reenable function
cpufreq/amd-pstate: Do not attempt to clear MSR_AMD_CPPC_ENABLE
cpufreq/amd-pstate: Rename functions that enable CPPC
cpufreq/amd-pstate-ut: Add fix for min freq unit test
amd-pstate: Switch to amd-pstate by default on some Server platforms
amd-pstate: Set min_perf to nominal_perf for active mode performance gov
cpufreq/amd-pstate: Remove the redundant amd_pstate_set_driver() call
cpufreq/amd-pstate: Remove the switch case in amd_pstate_init()
cpufreq/amd-pstate: Call amd_pstate_set_driver() in amd_pstate_register_driver()
cpufreq/amd-pstate: Call amd_pstate_register() in amd_pstate_init()
cpufreq/amd-pstate: Set the initial min_freq to lowest_nonlinear_freq
cpufreq/amd-pstate: Remove the redundant verify() function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"This contains a single series from Uros to replace uses of
<linux/random.h> with prandom.h or other more specific headers
as needed, in order to avoid a circular header issue.
Uros' goal is to be able to use percpu.h from prandom.h, which
will then allow him to define __percpu in percpu.h rather than
in compiler_types.h"
* tag 'random-6.13-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: Include <linux/percpu.h> in <linux/prandom.h>
random: Do not include <linux/prandom.h> in <linux/random.h>
netem: Include <linux/prandom.h> in sch_netem.c
lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h>
lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h>
bpf/tests: Include <linux/prandom.h> instead of <linux/random.h>
lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h>
random32: Include <linux/prandom.h> instead of <linux/random.h>
kunit: string-stream-test: Include <linux/prandom.h>
lib/interval_tree_test.c: Include <linux/prandom.h> instead of <linux/random.h>
bpf: Include <linux/prandom.h> instead of <linux/random.h>
scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h>
fscrypt: Include <linux/once.h> in fs/crypto/keyring.c
mtd: tests: Include <linux/prandom.h> instead of <linux/random.h>
media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c
drm/lib: Include <linux/prandom.h> instead of <linux/random.h>
drm/i915/selftests: Include <linux/prandom.h> instead of <linux/random.h>
crypto: testmgr: Include <linux/prandom.h> instead of <linux/random.h>
x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add sig driver API
- Remove signing/verification from akcipher API
- Move crypto_simd_disabled_for_test to lib/crypto
- Add WARN_ON for return values from driver that indicates memory
corruption
Algorithms:
- Provide crc32-arch and crc32c-arch through Crypto API
- Optimise crc32c code size on x86
- Optimise crct10dif on arm/arm64
- Optimise p10-aes-gcm on powerpc
- Optimise aegis128 on x86
- Output full sample from test interface in jitter RNG
- Retry without padata when it fails in pcrypt
Drivers:
- Add support for Airoha EN7581 TRNG
- Add support for STM32MP25x platforms in stm32
- Enable iproc-r200 RNG driver on BCMBCA
- Add Broadcom BCM74110 RNG driver"
* tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctx
crypto: cavium - Fix an error handling path in cpt_ucode_load_fw()
crypto: aesni - Move back to module_init
crypto: lib/mpi - Export mpi_set_bit
crypto: aes-gcm-p10 - Use the correct bit to test for P10
hwrng: amd - remove reference to removed PPC_MAPLE config
crypto: arm/crct10dif - Implement plain NEON variant
crypto: arm/crct10dif - Macroify PMULL asm code
crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl
crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply
crypto: arm64/crct10dif - Remove obsolete chunking logic
crypto: bcm - add error check in the ahash_hmac_init function
crypto: caam - add error check to caam_rsa_set_priv_key_form
hwrng: bcm74110 - Add Broadcom BCM74110 RNG driver
dt-bindings: rng: add binding for BCM74110 RNG
padata: Clean up in padata_do_multithreaded()
crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init()
crypto: qat - Fix missing destroy_workqueue in adf_init_aer()
crypto: rsassa-pkcs1 - Reinstate support for legacy protocols
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform firmware updates from Tzung-Bi Shih:
- Do not double register "simple-framebuffer" platform device if
Generic System Framebuffers (sysfb) already did that
- Fix a missing of unregistering platform driver in error handling path
* tag 'chrome-platform-firmware-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
firmware: google: Unregister driver_info on failure
firmware: coreboot: Don't register a pdev if screen_info data is present
firmware: sysfb: Add a sysfb_handles_screen_info() helper function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for running Linux in a protected VM under the Arm
Confidential Compute Architecture (CCA)
- Guarded Control Stack user-space support. Current patches follow the
x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
patches (already on the list) will add support for clone3() allowing
finer-grained control of the shadow stack size and placement from
libc
- AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
getting close with the upcoming dpISA support)
- Other arch features:
- In-kernel use of the memcpy instructions, FEAT_MOPS (previously
only exposed to user; uaccess support not merged yet)
- MTE: hugetlbfs support and the corresponding kselftests
- Optimise CRC32 using the PMULL instructions
- Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG
- Optimise the kernel TLB flushing to use the range operations
- POE/pkey (permission overlays): further cleanups after bringing
the signal handler in line with the x86 behaviour for 6.12
- arm64 perf updates:
- Support for the NXP i.MX91 PMU in the existing IMX driver
- Support for Ampere SoCs in the Designware PCIe PMU driver
- Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC
- Support for Samsung's 'Mongoose' CPU PMU
- Support for PMUv3.9 finer-grained userspace counter access
control
- Switch back to platform_driver::remove() now that it returns
'void'
- Add some missing events for the CXL PMU driver
- Miscellaneous arm64 fixes/cleanups:
- Page table accessors cleanup: type updates, drop unused macros,
reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
check addresses before runtime P4D/PUD folding
- Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
FEAT_ECV for the generic timers) allowing Linux to boot with
firmware deployments that don't set SCTLR_EL3.ECVEn
- ACPI/arm64: tighten the check for the array of platform timer
structures and adjust the error handling procedure in
gtdt_parse_timer_block()
- Optimise the cache flush for the uprobes xol slot (skip if no
change) and other uprobes/kprobes cleanups
- Fix the context switching of tpidrro_el0 when kpti is enabled
- Dynamic shadow call stack fixes
- Sysreg updates
- Various arm64 kselftest improvements
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits)
arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
kselftest/arm64: Try harder to generate different keys during PAC tests
kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
arm64/ptrace: Clarify documentation of VL configuration via ptrace
kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
acpi/arm64: remove unnecessary cast
arm64/mm: Change protval as 'pteval_t' in map_range()
kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
kselftest/arm64: Add FPMR coverage to fp-ptrace
kselftest/arm64: Expand the set of ZA writes fp-ptrace does
kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
kselftest/arm64: Enable build of PAC tests with LLVM=1
kselftest/arm64: Check that SVCR is 0 in signal handlers
selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
kselftest/arm64: Fix build with stricter assemblers
arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
arm64/scs: Deal with 64-bit relative offsets in FDE frames
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
"Thirteen patches, all focused on moving away from the current 'secid'
LSM identifier to a richer 'lsm_prop' structure.
This move will help reduce the translation that is necessary in many
LSMs, offering better performance, and make it easier to support
different LSMs in the future"
* tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: remove lsm_prop scaffolding
netlabel,smack: use lsm_prop for audit data
audit: change context data from secid to lsm_prop
lsm: create new security_cred_getlsmprop LSM hook
audit: use an lsm_prop in audit_names
lsm: use lsm_prop in security_inode_getsecid
lsm: use lsm_prop in security_current_getsecid
audit: update shutdown LSM data
lsm: use lsm_prop in security_ipc_getsecid
audit: maintain an lsm_prop in audit_context
lsm: add lsmprop_to_secctx hook
lsm: use lsm_prop in security_audit_rule_match
lsm: add the lsm_prop data structure
|
|
Pull io_uring updates from Jens Axboe:
- Cleanups of the eventfd handling code, making it fully private.
- Support for sending a sync message to another ring, without having a
ring available to send a normal async message.
- Get rid of the separate unlocked hash table, unify everything around
the single locked one.
- Add support for ring resizing. It can be hard to appropriately size
the CQ ring upfront, if the application doesn't know how busy it will
be. This results in applications sizing rings for the most busy case,
which can be wasteful. With ring resizing, they can start small and
grow the ring, if needed.
- Add support for fixed wait regions, rather than needing to copy the
same wait data tons of times for each wait operation.
- Rewrite the resource node handling, which before was serialized per
ring. This caused issues with particularly fixed files, where one
file waiting on IO could hold up putting and freeing of other
unrelated files. Now each node is handled separately. New code is
much simpler too, and was a net 250 line reduction in code.
- Add support for just doing partial buffer clones, rather than always
cloning the entire buffer table.
- Series adding static NAPI support, where a specific NAPI instance is
used rather than having a list of them available that need lookup.
- Add support for mapped regions, and also convert the fixed wait
support mentioned above to that concept. This avoids doing special
mappings for various planned features, and folds the existing
registered wait into that too.
- Add support for hybrid IO polling, which is a variant of strict
IOPOLL but with an initial sleep delay to avoid spinning too early
and wasting resources on devices that aren't necessarily in the < 5
usec category wrt latencies.
- Various cleanups and little fixes.
* tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux: (79 commits)
io_uring/region: fix error codes after failed vmap
io_uring: restore back registered wait arguments
io_uring: add memory region registration
io_uring: introduce concept of memory regions
io_uring: temporarily disable registered waits
io_uring: disable ENTER_EXT_ARG_REG for IOPOLL
io_uring: fortify io_pin_pages with a warning
switch io_msg_ring() to CLASS(fd)
io_uring: fix invalid hybrid polling ctx leaks
io_uring/uring_cmd: fix buffer index retrieval
io_uring/rsrc: add & apply io_req_assign_buf_node()
io_uring/rsrc: remove '->ctx_ptr' of 'struct io_rsrc_node'
io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers
io_uring: avoid normal tw intermediate fallback
io_uring/napi: add static napi tracking strategy
io_uring/napi: clean up __io_napi_do_busy_loop
io_uring/napi: Use lock guards
io_uring/napi: improve __io_napi_add
io_uring/napi: fix io_napi_entry RCU accesses
io_uring/napi: protect concurrent io_napi_entry timeout accesses
...
|
|
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Use uring_cmd helper (Pavel)
- Host Memory Buffer allocation enhancements (Christoph)
- Target persistent reservation support (Guixin)
- Persistent reservation tracing (Guixen)
- NVMe 2.1 specification support (Keith)
- Rotational Meta Support (Matias, Wang, Keith)
- Volatile cache detection enhancment (Guixen)
- MD updates via Song:
- Maintainers update
- raid5 sync IO fix
- Enhance handling of faulty and blocked devices
- raid5-ppl atomic improvement
- md-bitmap fix
- Support for manually defining embedded partition tables
- Zone append fixes and cleanups
- Stop sending the queued requests in the plug list to the driver
->queue_rqs() handle in reverse order.
- Zoned write plug cleanups
- Cleanups disk stats tracking and add support for disk stats for
passthrough IO
- Add preparatory support for file system atomic writes
- Add lockdep support for queue freezing. Already found a bunch of
issues, and some fixes for that are in here. More will be coming.
- Fix race between queue stopping/quiescing and IO queueing
- ublk recovery improvements
- Fix ublk mmap for 64k pages
- Various fixes and cleanups
* tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits)
MAINTAINERS: Update git tree for mdraid subsystem
block: make struct rq_list available for !CONFIG_BLOCK
block/genhd: use seq_put_decimal_ull for diskstats decimal values
block: don't reorder requests in blk_mq_add_to_batch
block: don't reorder requests in blk_add_rq_to_plug
block: add a rq_list type
block: remove rq_list_move
virtio_blk: reverse request order in virtio_queue_rqs
nvme-pci: reverse request order in nvme_queue_rqs
btrfs: validate queue limits
block: export blk_validate_limits
nvmet: add tracing of reservation commands
nvme: parse reservation commands's action and rtype to string
nvmet: report ns's vwc not present
md/raid5: Increase r5conf.cache_name size
block: remove the ioprio field from struct request
block: remove the write_hint field from struct request
nvme: check ns's volatile write cache not present
nvme: add rotational support
nvme: use command set independent id ns if available
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata updates from Niklas Cassel:
- Fix typos in comments (Yan Zhen)
- Remove unused macro definitions (Damien Le Moal)
- Switch back to the .remove() callback (Uwe Kleine-König)
- Make use of the get_unaligned_be24() helper instead of open coding
(Andy Shevchenko)
- Refactor and cleanup ata_scsi_simulate() command emulation, such that
all commands use ata_scsi_rbuf_fill() with its own callback (Damien
Le Moal)
- Improve ata_scsi_simulate() command emulation by accurately setting
the SCSI command residual (number of bytes not filled) in the command
reply (Damien Le Moal)
- Add missing iommus property in ahci-platform device tree binding
(Frank Wunderlich)
* tag 'ata-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
dt-bindings: ata: ahci-platform: add missing iommus property
ata: libata-scsi: Return residual for emulated SCSI commands
ata: libata-scsi: Remove struct ata_scsi_args
ata: libata-scsi: Document all VPD page inquiry actors
ata: libata-scsi: Refactor ata_scsiop_maint_in()
ata: libata-scsi: Refactor ata_scsiop_read_cap()
ata: libata-scsi: Refactor ata_scsi_simulate()
ata: libata-scsi: Refactor scsi_6_lba_len() with use of get_unaligned_be24()
ata: Switch back to struct platform_driver::remove()
ata: libata: Remove unused macro definitions
ata: Fix typos in the comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Changes outside of btrfs: add io_uring command flag to track a dying
task (the rest will go via the block git tree).
User visible changes:
- wire encoded read (ioctl) to io_uring commands, this can be used on
itself, in the future this will allow 'send' to be asynchronous. As
a consequence, the encoded read ioctl can also work in non-blocking
mode
- new ioctl to wait for cleaned subvolumes, no need to use the
generic and root-only SEARCH_TREE ioctl, will be used by "btrfs
subvol sync"
- recognize different paths/symlinks for the same devices and don't
report them during rescanning, this can be observed with LVM or DM
- seeding device use case change, the sprout device (the one
capturing new writes) will not clear the read-only status of the
super block; this prevents accumulating space from deleted
snapshots
Performance improvements:
- reduce lock contention when traversing extent buffers
- reduce extent tree lock contention when searching for inline
backref
- switch from rb-trees to xarray for delayed ref tracking,
improvements due to better cache locality, branching factors and
more compact data structures
- enable extent map shrinker again (prevent memory exhaustion under
some types of IO load), reworked to run in a single worker thread
(there used to be problems causing long stalls under memory
pressure)
Core changes:
- raid-stripe-tree feature updates:
- make device replace and scrub work
- implement partial deletion of stripe extents
- new selftests
- split the config option BTRFS_DEBUG and add EXPERIMENTAL for
features that are experimental or with known problems so we don't
misuse debugging config for that
- subpage mode updates (sector < page):
- update compression implementations
- update writepage, writeback
- continued folio API conversions:
- buffered writes
- make buffered write copy one page at a time, preparatory work for
future integration with large folios, may cause performance drop
- proper locking of root item regarding starting send
- error handling improvements
- code cleanups and refactoring:
- dead code removal
- unused parameter reduction
- lockdep assertions"
* tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits)
btrfs: send: check for read-only send root under critical section
btrfs: send: check for dead send root under critical section
btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap()
btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl()
btrfs: fix a typo in btrfs_use_zone_append
btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read()
btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot()
btrfs: remove hole from struct btrfs_delayed_node
btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list
btrfs: add new ioctl to wait for cleaned subvolumes
btrfs: simplify range tracking in cow_file_range()
btrfs: remove conditional path allocation in btrfs_read_locked_inode()
btrfs: push cleanup into btrfs_read_locked_inode()
io_uring/cmd: let cmds to know about dying task
btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu()
btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)
btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()
btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set
btrfs: change btrfs_encoded_read() so that reading of extent is done by caller
btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most
notably in the journaling code, bufered I/O, and compiler warning
cleanups"
* tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits)
jbd2: Fix comment describing journal_init_common()
ext4: prevent an infinite loop in the lazyinit thread
ext4: use struct_size() to improve ext4_htree_store_dirent()
ext4: annotate struct fname with __counted_by()
jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings
ext4: use str_yes_no() helper function
ext4: prevent delalloc to nodelalloc on remount
jbd2: make b_frozen_data allocation always succeed
ext4: cleanup variable name in ext4_fc_del()
ext4: use string choices helpers
jbd2: remove the 'success' parameter from the jbd2_do_replay() function
jbd2: remove useless 'block_error' variable
jbd2: factor out jbd2_do_replay()
jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass()
jbd2: unified release of buffer_head in do_one_pass()
jbd2: remove redundant judgments for check v1 checksum
ext4: use ERR_CAST to return an error-valued pointer
mm: zero range of eof folio exposed by inode size extension
ext4: partial zero eof block on unaligned inode size extension
ext4: disambiguate the return value of ext4_dio_write_end_io()
...
|
|
Pull xattr updates from Al Viro:
"Sanitize xattr and io_uring interactions with it, add *xattrat()
syscalls, sanitize struct filename handling in there"
* tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr: remove redundant check on variable err
fs/xattr: add *at family syscalls
new helpers: file_removexattr(), filename_removexattr()
new helpers: file_listxattr(), filename_listxattr()
replace do_getxattr() with saner helpers.
replace do_setxattr() with saner helpers.
new helper: import_xattr_name()
fs: rename struct xattr_ctx to kernel_xattr_ctx
xattr: switch to CLASS(fd)
io_[gs]etxattr_prep(): just use getname()
io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE
getname_maybe_null() - the third variant of pathname copy-in
teach filename_lookup() to treat NULL filename as ""
|
|
Pull 'struct fd' class updates from Al Viro:
"The bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same scope
where they'd been created, getting rid of reassignments and passing
them by reference, converting to CLASS(fd{,_pos,_raw}).
We are getting very close to having the memory safety of that stuff
trivial to verify"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
deal with the last remaing boolean uses of fd_file()
css_set_fork(): switch to CLASS(fd_raw, ...)
memcg_write_event_control(): switch to CLASS(fd)
assorted variants of irqfd setup: convert to CLASS(fd)
do_pollfd(): convert to CLASS(fd)
convert do_select()
convert vfs_dedupe_file_range().
convert cifs_ioctl_copychunk()
convert media_request_get_by_fd()
convert spu_run(2)
switch spufs_calls_{get,put}() to CLASS() use
convert cachestat(2)
convert do_preadv()/do_pwritev()
fdget(), more trivial conversions
fdget(), trivial conversions
privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
o2hb_region_dev_store(): avoid goto around fdget()/fdput()
introduce "fd_pos" class, convert fdget_pos() users to it.
fdget_raw() users: switch to CLASS(fd_raw)
convert vmsplice() to CLASS(fd)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs untorn write support from Christian Brauner:
"An atomic write is a write issed with torn-write protection. This
means for a power failure or any hardware failure all or none of the
data from the write will be stored, never a mix of old and new data.
This work is already supported for block devices. If a block device is
opened with O_DIRECT and the block device supports atomic write, then
FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block
device.
This contains the work to expand atomic write support to filesystems,
specifically ext4 and XFS. Currently, only support for writing exactly
one filesystem block atomically is added.
Since it's now possible to have filesystem block size > page size for
XFS, it's possible to write 4K+ blocks atomically on x86"
* tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: drop an obsolete comment in iomap_dio_bio_iter
ext4: Do not fallback to buffered-io for DIO atomic write
ext4: Support setting FMODE_CAN_ATOMIC_WRITE
ext4: Check for atomic writes support in write iter
ext4: Add statx support for atomic writes
xfs: Support setting FMODE_CAN_ATOMIC_WRITE
xfs: Validate atomic writes
xfs: Support atomic write for statx
fs: iomap: Atomic write support
fs: Export generic_atomic_write_valid()
block: Add bdev atomic write limits helpers
fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
block/fs: Pass an iocb to generic_atomic_write_valid()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull tmpfs case folding updates from Christian Brauner:
"This adds case-insensitive support for tmpfs.
The work contained in here adds support for case-insensitive file
names lookups in tmpfs. The main difference from other casefold
filesystems is that tmpfs has no information on disk, just on RAM, so
we can't use mkfs to create a case-insensitive tmpfs. For this
implementation, there's a mount option for casefolding. The rest of
the patchset follows a similar approach as ext4 and f2fs.
The use case for this feature is similar to the use case for ext4, to
better support compatibility layers (like Wine), particularly in
combination with sandboxing/container tools (like Flatpak).
Those containerization tools can share a subset of the host filesystem
with an application. In the container, the root directory and any
parent directories required for a shared directory are on tmpfs, with
the shared directories bind-mounted into the container's view of the
filesystem.
If the host filesystem is using case-insensitive directories, then the
application can do lookups inside those directories in a
case-insensitive way, without this needing to be implemented in
user-space. However, if the host is only sharing a subset of a
case-insensitive directory with the application, then the parent
directories of the mount point will be part of the container's root
tmpfs. When the application tries to do case-insensitive lookups of
those parent directories on a case-sensitive tmpfs, the lookup will
fail"
* tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
tmpfs: Initialize sysfs during tmpfs init
tmpfs: Fix type for sysfs' casefold attribute
libfs: Fix kernel-doc warning in generic_ci_validate_strict_name
docs: tmpfs: Add casefold options
tmpfs: Expose filesystem features via sysfs
tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs
tmpfs: Add casefold lookup support
libfs: Export generic_ci_ dentry functions
unicode: Recreate utf8_parse_version()
unicode: Export latest available UTF-8 version number
ext4: Use generic_ci_validate_strict_name helper
libfs: Create the helper function generic_ci_validate_strict_name()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull copy_struct_to_user helper from Christian Brauner:
"This adds a copy_struct_to_user() helper which is a companion helper
to the already widely used copy_struct_from_user().
It copies a struct from kernel space to userspace, in a way that
guarantees backwards-compatibility for struct syscall arguments as
long as future struct extensions are made such that all new fields are
appended to the old struct, and zeroed-out new fields have the same
meaning as the old struct.
The first user is sched_getattr() system call but the new extensible
pidfs ioctl will be ported to it as well"
* tag 'vfs-6.13.usercopy' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
sched_getattr: port to copy_struct_to_user
uaccess: add copy_struct_to_user helper
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs updates from Christian Brauner:
"Make overlayfs support specifying layers through file descriptors.
Currently overlayfs only allows specifying layers through path names.
This is inconvenient for users that want to assemble an overlayfs
mount purely based on file descriptors:
This enables user to specify both:
fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);
in addition to:
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);
There's also a large set of new overlayfs selftests to test new
features and some older properties"
* tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add test for specifying 500 lower layers
selftests: add overlayfs fd mounting selftests
selftests: use shared header
Documentation,ovl: document new file descriptor based layers
ovl: specify layers via file descriptors
fs: add helper to use mount option as path or fd
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This contains changes the changes for files for this cycle:
- Introduce a new reference counting mechanism for files.
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
it has O(N^2) behaviour under contention with N concurrent
operations and it is in a hot path in __fget_files_rcu().
The rcuref infrastructures remedies this problem by using an
unconditional increment relying on safe- and dead zones to make
this work and requiring rcu protection for the data structure in
question. This not just scales better it also introduces overflow
protection.
However, in contrast to generic rcuref, files require a memory
barrier and thus cannot rely on *_relaxed() atomic operations and
also require to be built on atomic_long_t as having massive amounts
of reference isn't unheard of even if it is just an attack.
This adds a file specific variant instead of making this a generic
library.
This has been tested by various people and it gives consistent
improvement up to 3-5% on workloads with loads of threads.
- Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
via find_next_zero_bit() when there is a free slot in the word that
contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
and write by 4% on Intel ICX 160.
- Conditionally clear full_fds_bits since it's very likely that a bit
in full_fds_bits has been cleared during __clear_open_fds(). This
improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
Intel ICX 160.
- Get rid of all lookup_*_fdget_rcu() variants. They were used to
lookup files without taking a reference count. That became invalid
once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
always taking a reference count. Switch to an already existing
helper and remove the legacy variants.
- Remove pointless includes of <linux/fdtable.h>.
- Avoid cmpxchg() in close_files() as nobody else has a reference to
the files_struct at that point.
- Move close_range() into fs/file.c and fold __close_range() into it.
- Cleanup calling conventions of alloc_fdtable() and expand_files().
- Merge __{set,clear}_close_on_exec() into one.
- Make __set_open_fd() set cloexec as well instead of doing it in two
separate steps"
* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
fs: port files to file_ref
fs: add file_ref
expand_files(): simplify calling conventions
make __set_open_fd() set cloexec state as well
fs: protect backing files with rcu
file.c: merge __{set,clear}_close_on_exec()
alloc_fdtable(): change calling conventions.
fs/file.c: add fast path in find_next_fd()
fs/file.c: conditionally clear full_fds
fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
move close_range(2) into fs/file.c, fold __close_range() into it
close_files(): don't bother with xchg()
remove pointless includes of <linux/fdtable.h>
get rid of ...lookup...fdget_rcu() family
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pagecache updates from Christian Brauner:
"Cleanup filesystem page flag usage: This continues the work to make
the mappedtodisk/owner_2 flag available to filesystems which don't use
buffer heads. Further patches remove uses of Private2. This brings us
very close to being rid of it entirely"
* tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
migrate: Remove references to Private2
ceph: Remove call to PagePrivate2()
btrfs: Switch from using the private_2 flag to owner_2
mm: Remove PageMappedToDisk
nilfs2: Convert nilfs_copy_buffer() to use folios
fs: Move clearing of mappedtodisk to buffer.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Fixup and improve NLM and kNFSD file lock callbacks
Last year both GFS2 and OCFS2 had some work done to make their
locking more robust when exported over NFS. Unfortunately, part of
that work caused both NLM (for NFS v3 exports) and kNFSD (for
NFSv4.1+ exports) to no longer send lock notifications to clients
This in itself is not a huge problem because most NFS clients will
still poll the server in order to acquire a conflicted lock
It's important for NLM and kNFSD that they do not block their
kernel threads inside filesystem's file_lock implementations
because that can produce deadlocks. We used to make sure of this by
only trusting that posix_lock_file() can correctly handle blocking
lock calls asynchronously, so the lock managers would only setup
their file_lock requests for async callbacks if the filesystem did
not define its own lock() file operation
However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started
signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
for also trusting posix_lock_file() was inadvertently dropped, so
now most filesystems no longer produce lock notifications when
exported over NFS
Fix this by using an fop_flag which greatly simplifies the problem
and grooms the way for future uses by both filesystems and lock
managers alike
- Add a sysctl to delete the dentry when a file is removed instead of
making it a negative dentry
Commit 681ce8623567 ("vfs: Delete the associated dentry when
deleting a file") introduced an unconditional deletion of the
associated dentry when a file is removed. However, this led to
performance regressions in specific benchmarks, such as
ilebench.sum_operations/s, prompting a revert in commit
4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when
deleting a file""). This reintroduces the concept conditionally
through a sysctl
- Expand the statmount() system call:
* Report the filesystem subtype in a new fs_subtype field to
e.g., report fuse filesystem subtypes
* Report the superblock source in a new sb_source field
* Add a new way to return filesystem specific mount options in an
option array that returns filesystem specific mount options
separated by zero bytes and unescaped. This allows caller's to
retrieve filesystem specific mount options and immediately pass
them to e.g., fsconfig() without having to unescape or split
them
* Report security (LSM) specific mount options in a separate
security option array. We don't lump them together with
filesystem specific mount options as security mount options are
generic and most users aren't interested in them
The format is the same as for the filesystem specific mount
option array
- Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command
- Optimize acl_permission_check() to avoid costly {g,u}id ownership
checks if possible
- Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()
- Add synchronous wakeup support for ep_poll_callback.
Currently, epoll only uses wake_up() to wake up task. But sometimes
there are epoll users which want to use the synchronous wakeup flag
to give a hint to the scheduler, e.g., the Android binder driver.
So add a wake_up_sync() define, and use wake_up_sync() when sync is
true in ep_poll_callback()
Fixes:
- Fix kernel documentation for inode_insert5() and iget5_locked()
- Annotate racy epoll check on file->f_ep
- Make F_DUPFD_QUERY associative
- Avoid filename buffer overrun in initramfs
- Don't let statmount() return empty strings
- Add a cond_resched() to dump_user_range() to avoid hogging the CPU
- Don't query the device logical blocksize multiple times for hfsplus
- Make filemap_read() check that the offset is positive or zero
Cleanups:
- Various typo fixes
- Cleanup wbc_attach_fdatawrite_inode()
- Add __releases annotation to wbc_attach_and_unlock_inode()
- Add hugetlbfs tracepoints
- Fix various vfs kernel doc parameters
- Remove obsolete TODO comment from io_cancel()
- Convert wbc_account_cgroup_owner() to take a folio
- Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()
- Reorder struct posix_acl to save 8 bytes
- Annotate struct posix_acl with __counted_by()
- Replace one-element array with flexible array member in freevxfs
- Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"
* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
statmount: retrieve security mount options
vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
statmount: add flag to retrieve unescaped options
fs: add the ability for statmount() to report the sb_source
writeback: wbc_attach_fdatawrite_inode out of line
writeback: add a __releases annoation to wbc_attach_and_unlock_inode
fs: add the ability for statmount() to report the fs_subtype
fs: don't let statmount return empty strings
fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
hfsplus: don't query the device logical block size multiple times
freevxfs: Replace one-element array with flexible array member
fs: optimize acl_permission_check()
initramfs: avoid filename buffer overrun
fs/writeback: convert wbc_account_cgroup_owner to take a folio
acl: Annotate struct posix_acl with __counted_by()
acl: Realign struct posix_acl to save 8 bytes
epoll: Add synchronous wakeup support for ep_poll_callback
coredump: add cond_resched() to dump_user_range
mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs multigrain timestamps from Christian Brauner:
"This is another try at implementing multigrain timestamps. This time
with significant help from the timekeeping maintainers to reduce the
performance impact.
Thomas provided a base branch that contains the required timekeeping
interfaces for the VFS. It serves as the base for the multi-grain
timestamp work:
- Multigrain timestamps allow the kernel to use fine-grained
timestamps when an inode's attributes is being actively observed
via ->getattr(). With this support, it's possible for a file to get
a fine-grained timestamp, and another modified after it to get a
coarse-grained stamp that is earlier than the fine-grained time. If
this happens then the files can appear to have been modified in
reverse order, which breaks VFS ordering guarantees.
To prevent this, a floor value is maintained for multigrain
timestamps. Whenever a fine-grained timestamp is handed out, record
it, and when later coarse-grained stamps are handed out, ensure
they are not earlier than that value. If the coarse-grained
timestamp is earlier than the fine-grained floor, return the floor
value instead.
The timekeeper changes add a static singleton atomic64_t into
timekeeper.c that is used to keep track of the latest fine-grained
time ever handed out. This is tracked as a monotonic ktime_t value
to ensure that it isn't affected by clock jumps. Because it is
updated at different times than the rest of the timekeeper object,
the floor value is managed independently of the timekeeper via a
cmpxchg() operation, and sits on its own cacheline.
Two new public timekeeper interfaces are added:
(1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the
later of the coarse-grained clock and the floor time
(2) ktime_get_real_ts64_mg() gets the fine-grained clock value,
and tries to swap it into the floor. A timespec64 is filled
with the result.
- The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around
1 per jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting
via NFSv3, which relies on timestamps to validate caches. A lot of
changes can happen in a jiffy, so timestamps aren't sufficient to
help the client decide when to invalidate the cache. Even with
NFSv4, a lot of exported filesystems don't properly support a
change attribute and are subject to the same problems with
timestamp granularity. Other applications have similar issues with
timestamps (e.g backup applications).
If we were to always use fine-grained timestamps, that would
improve the situation, but that becomes rather expensive, as the
underlying filesystem would have to log a lot more metadata
updates.
This adds a way to only use fine-grained timestamps when they are
being actively queried. Use the (unused) top bit in
inode->i_ctime_nsec as a flag that indicates whether the current
timestamps have been queried via stat() or the like. When it's set,
we allow the kernel to use a fine-grained timestamp iff it's
necessary to make the ctime show a different value.
This solves the problem of being able to distinguish the timestamp
between updates, but introduces a new problem: it's now possible
for a file being changed to get a fine-grained timestamp. A file
that is altered just a bit later can then get a coarse-grained one
that appears older than the earlier fine-grained time. This
violates timestamp ordering guarantees.
This is where the earlier mentioned timkeeping interfaces help. A
global monotonic atomic64_t value is kept that acts as a timestamp
floor. When we go to stamp a file, we first get the latter of the
current floor value and the current coarse-grained time. If the
inode ctime hasn't been queried then we just attempt to stamp it
with that value.
If it has been queried, then first see whether the current coarse
time is later than the existing ctime. If it is, then we accept
that value. If it isn't, then we get a fine-grained time and try to
swap that into the global floor. Whether that succeeds or fails, we
take the resulting floor time, convert it to realtime and try to
swap that into the ctime.
We take the result of the ctime swap whether it succeeds or fails,
since either is just as valid.
Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same
floor value as multigrain filesystems)"
* tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reduce pointer chasing in is_mgtime() test
tmpfs: add support for multigrain timestamps
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
Documentation: add a new file documenting multigrain timestamps
fs: add percpu counters for significant multigrain timestamp events
fs: tracepoints around multigrain timestamp events
fs: handle delegated timestamps in setattr_copy_mgtime
timekeeping: Add percpu counter for tracking floor swap events
timekeeping: Add interfaces for handling timestamps with a floor value
fs: have setattr_copy handle multigrain timestamps appropriately
fs: add infrastructure for multigrain timestamps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. All singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
ocfs2: uncache inode which has failed entering the group
mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
mm, doc: update read_ahead_kb for MADV_HUGEPAGE
fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
mm/mremap: fix address wraparound in move_page_tables()
tools/mm: fix compile error
mm, swap: fix allocation and scanning race with swapoff
|
|
'rcu/srcu' into rcu/dev
|
|
Currently, srcu_read_lock_lite() uses the SRCU_READ_FLAVOR_LITE bit in
->srcu_reader_flavor to communicate to the grace-period processing in
srcu_readers_active_idx_check() that the smp_mb() must be replaced by a
synchronize_rcu(). Unfortunately, ->srcu_reader_flavor is not updated
unless the kernel is built with CONFIG_PROVE_RCU=y. Therefore in all
kernels built with CONFIG_PROVE_RCU=n, srcu_readers_active_idx_check()
incorrectly uses smp_mb() instead of synchronize_rcu() for srcu_struct
structures whose readers use srcu_read_lock_lite().
This commit therefore causes Tree SRCU srcu_read_lock_lite()
to unconditionally update ->srcu_reader_flavor so that
srcu_readers_active_idx_check() can make the correct choice.
Reported-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Closes: https://lore.kernel.org/all/d07e8f4a-d5ff-4c8e-8e61-50db285c57e9@amd.com/
Fixes: c0f08d6b5a61 ("srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
|
Merge updates of the ACPI battery and EC drivers, an ACPI Platform
Firmware Runtime (PFR) telemetry driver update and an ACPI OS support
layer change for 6.13-rc1:
- Use DEFINE_SIMPLE_DEV_PM_OPS in the ACPI battery driver, make it use
devm_ for initializing mutexes and allocating driver data, and make
it check the register_pm_notifier() return value (Thomas Weißschuh,
Andy Shevchenko).
- Make the ACPI EC driver support compile-time conditional and allow
ACPI to be built without CONFIG_HAS_IOPORT (Arnd Bergmann).
- Remove a redundant error check from the pfr_telemetry driver (Colin
Ian King).
* acpi-battery:
ACPI: battery: Check for error code from devm_mutex_init() call
ACPI: battery: use DEFINE_SIMPLE_DEV_PM_OPS
ACPI: battery: initialize mutexes through devm_ APIs
ACPI: battery: allocate driver data through devm_ APIs
ACPI: battery: check result of register_pm_notifier()
* acpi-ec:
ACPI: EC: make EC support compile-time conditional
* acpi-pfr:
ACPI: pfr_telemetry: remove redundant error check on ret
* acpi-osl:
ACPI: allow building without CONFIG_HAS_IOPORT
|
|
Now we've got a more generic region registration API, place
IORING_ENTER_EXT_ARG_REG and re-enable it.
First, the user has to register a region with the
IORING_MEM_REGION_REG_WAIT_ARG flag set. It can only be done for a
ring in a disabled state, aka IORING_SETUP_R_DISABLED, to avoid races
with already running waiters. With that we should have stable constant
values for ctx->cq_wait_{size,arg} in io_get_ext_arg_reg() and hence no
READ_ONCE required.
The other API difference is that we're now passing byte offsets instead
of indexes. The user _must_ align all offsets / pointers to the native
word size, failing to do so might but not necessarily has to lead to a
failure usually returned as -EFAULT. liburing will be hiding this
details from users.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/81822c1b4ffbe8ad391b4f9ad1564def0d26d990.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
- Add GENPD_FLAG_DEV_NAME_FW flag to generate unique names
pmdomain providers:
- arm: Use FLAG_DEV_NAME_FW to ensure unique names
- imx93-blk-ctrl: Fix the remove path
arm_scmi/qcom-cpucp:
- Report duplicate OPPs as firmware bugs for arm_scmi
- Skip OPP duplicates for arm_scmi
- Mark the qcom-cpucp mailbox irq with IRQF_NO_SUSPEND flag"
* tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
firmware: arm_scmi: Report duplicate opps as firmware bugs
firmware: arm_scmi: Skip opp duplicates
pmdomain: imx93-blk-ctrl: correct remove path
pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
|
|
Regions will serve multiple purposes. First, with it we can decouple
ring/etc. object creation from registration / mapping of the memory they
will be placed in. We already have hacks that allow to put both SQ and
CQ into the same huge page, in the future we should be able to:
region = create_region(io_ring);
create_pbuf_ring(io_uring, region, offset=0);
create_pbuf_ring(io_uring, region, offset=N);
The second use case is efficiently passing parameters. The following
patch enables back on top of regions IORING_ENTER_EXT_ARG_REG, which
optimises wait arguments. It'll also be useful for request arguments
replacing iovecs, msghdr, etc. pointers. Eventually it would also be
handy for BPF as well if it comes to fruition.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0798cf3a14fad19cfc96fc9feca5f3e11481691d.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We've got a good number of mappings we share with the userspace, that
includes the main rings, provided buffer rings, upcoming rings for
zerocopy rx and more. All of them duplicate user argument parsing and
some internal details as well (page pinnning, huge page optimisations,
mmap'ing, etc.)
Introduce a notion of regions. For userspace for now it's just a new
structure called struct io_uring_region_desc which is supposed to
parameterise all such mapping / queue creations. A region either
represents a user provided chunk of memory, in which case the user_addr
field should point to it, or a request for the kernel to allocate the
memory, in which case the user would need to mmap it after using the
offset returned in the mmap_offset field. With a uniform userspace API
we can avoid additional boiler plate code and apply future optimisation
to all of them at once.
Internally, there is a new structure struct io_mapped_region holding all
relevant runtime information and some helpers to work with it. This
patch limits it to user provided regions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0e6fe25818dfbaebd1bd90b870a6cac503fe1a24.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Disable wait argument registration as it'll be replaced with a more
generic feature. We'll still need IORING_ENTER_EXT_ARG_REG parsing
in a few commits so leave it be.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70b1d1d218c41ba77a76d1789c8641dab0b0563e.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A previous commit changed how requests are linked in the plug structure,
but unlike the previous method, it uses a new type for it rather than
struct request. The latter is available even for !CONFIG_BLOCK, while
struct rq_list is now. Move it outside CONFIG_BLOCK.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: a3396b99990d ("block: add a rq_list type")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags. This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.
Example of the warning:
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp : ffff800082ea7b40
x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
Call trace:
__debug_object_init+0x330/0x364
debug_object_init_on_stack+0x30/0x3c
schedule_hrtimeout_range_clock+0xac/0x26c
schedule_hrtimeout+0x1c/0x30
wait_task_inactive+0x1d4/0x25c
kthread_bind_mask+0x28/0x98
init_rescuer+0x1e8/0x280
workqueue_init+0x1a0/0x3cc
kernel_init_freeable+0x118/0x200
kernel_init+0x28/0x1f0
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Andrew Yang <andrew.yang@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Casper Li <casper.li@mediatek.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Quite calm week. No new regression under investigation.
Current release - regressions:
- eth: revert "igb: Disable threaded IRQ for igb_msix_other"
Current release - new code bugs:
- bluetooth: btintel: direct exception event to bluetooth stack
Previous releases - regressions:
- core: fix data-races around sk->sk_forward_alloc
- netlink: terminate outstanding dump on socket close
- mptcp: error out earlier on disconnect
- vsock: fix accept_queue memory leak
- phylink: ensure PHY momentary link-fails are handled
- eth: mlx5:
- fix null-ptr-deref in add rule err flow
- lock FTE when checking if active
- eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol
Previous releases - always broken:
- sched: fix u32's systematic failure to free IDR entries for hnodes.
- sctp: fix possible UAF in sctp_v6_available()
- eth: bonding: add ns target multicast address to slave device
- eth: mlx5: fix msix vectors to respect platform limit
- eth: icssg-prueth: fix 1 PPS sync"
* tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: sched: u32: Add test case for systematic hnode IDR leaks
selftests: bonding: add ns multicast group testing
bonding: add ns target multicast address to slave device
net: ti: icssg-prueth: Fix 1 PPS sync
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
net: Make copy_safe_from_sockptr() match documentation
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
ipmr: Fix access to mfc_cache_list without lock held
samples: pktgen: correct dev to DEV
net: phylink: ensure PHY momentary link-fails are handled
mptcp: pm: use _rcu variant under rcu_read_lock
mptcp: hold pm lock when deleting entry
mptcp: update local address flags when setting it
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
MAINTAINERS: Re-add cancelled Renesas driver sections
Revert "igb: Disable threaded IRQ for igb_msix_other"
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
virtio/vsock: Improve MSG_ZEROCOPY error handling
vsock: Fix sk_error_queue memory leak
...
|
|
* for-next/pkey-signal:
: Bring arm64 pkey signal delivery in line with the x86 behaviour
selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
selftests/mm: Define PKEY_UNRESTRICTED for pkey_sighandler_tests
selftests/mm: Enable pkey_sighandler_tests on arm64
selftests/mm: Use generic pkey register manipulation
arm64: signal: Remove unused macro
arm64: signal: Remove unnecessary check when saving POE state
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
kasan: Disable Software Tag-Based KASAN with GCC
Documentation/protection-keys: add AArch64 to documentation
arm64: set POR_EL0 for kernel threads
# Conflicts:
# arch/arm64/kernel/signal.c
|
|
'for-next/tlb', 'for-next/misc', 'for-next/mte', 'for-next/sysreg', 'for-next/stacktrace', 'for-next/hwcap3', 'for-next/kselftest', 'for-next/crc32', 'for-next/guest-cca', 'for-next/haft' and 'for-next/scs', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
perf: Switch back to struct platform_driver::remove()
perf: arm_pmuv3: Add support for Samsung Mongoose PMU
dt-bindings: arm: pmu: Add Samsung Mongoose core compatible
perf/dwc_pcie: Fix typos in event names
perf/dwc_pcie: Add support for Ampere SoCs
ARM: pmuv3: Add missing write_pmuacr()
perf/marvell: Marvell PEM performance monitor support
perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control
perf/dwc_pcie: Convert the events with mixed case to lowercase
perf/cxlpmu: Support missing events in 3.1 spec
perf: imx_perf: add support for i.MX91 platform
dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible
drivers perf: remove unused field pmu_node
* for-next/gcs: (42 commits)
: arm64 Guarded Control Stack user-space support
kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
arm64/gcs: Fix outdated ptrace documentation
kselftest/arm64: Ensure stable names for GCS stress test results
kselftest/arm64: Validate that GCS push and write permissions work
kselftest/arm64: Enable GCS for the FP stress tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Add GCS signal tests
kselftest/arm64: Add test coverage for GCS mode locking
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Verify the GCS hwcap
arm64: Add Kconfig for Guarded Control Stack (GCS)
arm64/ptrace: Expose GCS via ptrace and core files
arm64/signal: Expose GCS state in signal frames
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/mm: Implement map_shadow_stack()
...
* for-next/probes:
: Various arm64 uprobes/kprobes cleanups
arm64: insn: Simulate nop instruction for better uprobe performance
arm64: probes: Remove probe_opcode_t
arm64: probes: Cleanup kprobes endianness conversions
arm64: probes: Move kprobes-specific fields
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Remove broken LDR (literal) uprobe support
* for-next/asm-offsets:
: arm64 asm-offsets.c cleanup (remove unused offsets)
arm64: asm-offsets: remove PREEMPT_DISABLE_OFFSET
arm64: asm-offsets: remove DMA_{TO,FROM}_DEVICE
arm64: asm-offsets: remove VM_EXEC and PAGE_SZ
arm64: asm-offsets: remove MM_CONTEXT_ID
arm64: asm-offsets: remove COMPAT_{RT_,SIGFRAME_REGS_OFFSET
arm64: asm-offsets: remove VMA_VM_*
arm64: asm-offsets: remove TSK_ACTIVE_MM
* for-next/tlb:
: TLB flushing optimisations
arm64: optimize flush tlb kernel range
arm64: tlbflush: add __flush_tlb_range_limit_excess()
* for-next/misc:
: Miscellaneous patches
arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
arm64/ptrace: Clarify documentation of VL configuration via ptrace
acpi/arm64: remove unnecessary cast
arm64/mm: Change protval as 'pteval_t' in map_range()
arm64: uprobes: Optimize cache flushes for xol slot
acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings
arm64/mm: Sanity check PTE address before runtime P4D/PUD folding
arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont()
ACPI: GTDT: Tighten the check for the array of platform timer structures
arm64/fpsimd: Fix a typo
arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers
arm64: Return early when break handler is found on linked-list
arm64/mm: Re-organize arch_make_huge_pte()
arm64/mm: Drop _PROT_SECT_DEFAULT
arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV
arm64: head: Drop SWAPPER_TABLE_SHIFT
arm64: cpufeature: add POE to cpucap_is_possible()
arm64/mm: Change pgattr_change_is_safe() arguments as pteval_t
* for-next/mte:
: Various MTE improvements
selftests: arm64: add hugetlb mte tests
hugetlb: arm64: add mte support
* for-next/sysreg:
: arm64 sysreg updates
arm64/sysreg: Update ID_AA64MMFR1_EL1 to DDI0601 2024-09
* for-next/stacktrace:
: arm64 stacktrace improvements
arm64: preserve pt_regs::stackframe during exec*()
arm64: stacktrace: unwind exception boundaries
arm64: stacktrace: split unwind_consume_stack()
arm64: stacktrace: report recovered PCs
arm64: stacktrace: report source of unwind data
arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
arm64: use a common struct frame_record
arm64: pt_regs: swap 'unused' and 'pmr' fields
arm64: pt_regs: rename "pmr_save" -> "pmr"
arm64: pt_regs: remove stale big-endian layout
arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
* for-next/hwcap3:
: Add AT_HWCAP3 support for arm64 (also wire up AT_HWCAP4)
arm64: Support AT_HWCAP3
binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4
* for-next/kselftest: (30 commits)
: arm64 kselftest fixes/cleanups
kselftest/arm64: Try harder to generate different keys during PAC tests
kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
kselftest/arm64: Add FPMR coverage to fp-ptrace
kselftest/arm64: Expand the set of ZA writes fp-ptrace does
kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
kselftest/arm64: Enable build of PAC tests with LLVM=1
kselftest/arm64: Check that SVCR is 0 in signal handlers
kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
kselftest/arm64: Fix build with stricter assemblers
kselftest/arm64: Test signal handler state modification in fp-stress
kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test
kselftest/arm64: Implement irritators for ZA and ZT
kselftest/arm64: Remove unused ADRs from irritator handlers
kselftest/arm64: Correct misleading comments on fp-stress irritators
kselftest/arm64: Poll less often while waiting for fp-stress children
kselftest/arm64: Increase frequency of signal delivery in fp-stress
kselftest/arm64: Fix encoding for SVE B16B16 test
...
* for-next/crc32:
: Optimise CRC32 using PMULL instructions
arm64/crc32: Implement 4-way interleave using PMULL
arm64/crc32: Reorganize bit/byte ordering macros
arm64/lib: Handle CRC-32 alternative in C code
* for-next/guest-cca:
: Support for running Linux as a guest in Arm CCA
arm64: Document Arm Confidential Compute
virt: arm-cca-guest: TSM_REPORT support for realms
arm64: Enable memory encrypt for Realms
arm64: mm: Avoid TLBI when marking pages as valid
arm64: Enforce bounce buffers for realm DMA
efi: arm64: Map Device with Prot Shared
arm64: rsi: Map unprotected MMIO as decrypted
arm64: rsi: Add support for checking whether an MMIO is protected
arm64: realm: Query IPA size from the RMM
arm64: Detect if in a realm and set RIPAS RAM
arm64: rsi: Add RSI definitions
* for-next/haft:
: Support for arm64 FEAT_HAFT
arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()
arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
arm64: Add support for FEAT_HAFT
arm64: setup: name 'tcr2' register
arm64/sysreg: Update ID_AA64MMFR1_EL1 register
* for-next/scs:
: Dynamic shadow call stack fixes
arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
arm64/scs: Deal with 64-bit relative offsets in FDE frames
arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames
|