Age | Commit message (Collapse) | Author |
|
Calling conventions for __bpf_map_get() would be more convenient
if it left fpdut() on failure to callers. Makes for simpler logics
in the callers.
Among other things, the proof of memory safety no longer has to
rely upon file->private_data never being ERR_PTR(...) for bpffs files.
Original calling conventions made it impossible for the caller to tell
whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
the file not be a bpf map one (in which case it would've done fdput())
or because it found that ERR_PTR(-EINVAL) in file->private_data of a
bpf map file (in which case fdput() would _not_ have been done).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Factor out the logic to extract bpf_map instances from FD embedded in
bpf_insns, adding it to the list of used_maps (unless it's already
there, in which case we just reuse map's index). This simplifies the
logic in resolve_pseudo_ldimm64(), especially around `struct fd`
handling, as all that is now neatly contained in the helper and doesn't
leak into a dozen error handling paths.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Swith fdget_raw() use cases in bpf_inode_storage.c to CLASS(fd_raw).
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Irregularity here is fdput() not in the same scope as fdget();
just fold ____bpf_prog_get() into its (only) caller and that's
it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Merge Al Viro's struct fd refactorings.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
consume_remote_task() and dispatch_to_local_dsq() use
move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
move_task_to_local_dsq() expects the caller to lock both the source and
destination rq's. While this may save a few lock operations while the rq's
are not contended, under contention, the double locking can exacerbate the
situation significantly (refer to the linked message below).
Update the migration path so that double locking is not used.
move_task_to_local_dsq() now expects the caller to be locking the source rq,
drops it and then acquires the destination rq lock. Code is simpler this way
and, on a 2-way NUMA machine w/ Xeon Gold 6138, 'hackbench 100 thread 5000`
shows ~3% improvement with scx_simple.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net
Acked-by: David Vernet <void@manifault.com>
|
|
Add an interface for a user-defined workqueue lockdep map, which is
helpful when multiple workqueues are created for the same purpose. This
also helps avoid leaking lockdep maps on each workqueue creation.
v2:
- Add alloc_workqueue_lockdep_map (Tejun)
v3:
- Drop __WQ_USER_OWNED_LOCKDEP (Tejun)
- static inline alloc_ordered_workqueue_lockdep_map (Tejun)
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Will help enable user-defined lockdep maps for workqueues.
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Will help enable user-defined lockdep maps for workqueues.
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
`__bpf_ops_sched_ext_ops` was missing the initialization of some struct
attributes. With
https://lore.kernel.org/all/20240722183049.2254692-4-martin.lau@linux.dev/
every single attributes need to be initialized programs (like scx_layered)
will fail to load.
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_init not found in kernel, skipping it as it's set to zero
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_exit not found in kernel, skipping it as it's set to zero
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_prep_move not found in kernel, skipping it as it's set to zero
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_move not found in kernel, skipping it as it's set to zero
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_cancel_move not found in kernel, skipping it as it's set to zero
05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_set_weight not found in kernel, skipping it as it's set to zero
05:26:48 [WARN] libbpf: prog 'layered_dump': BPF program load failed: unknown error (-524)
05:26:48 [WARN] libbpf: prog 'layered_dump': -- BEGIN PROG LOAD LOG --
attach to unsupported member dump of struct sched_ext_ops
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
05:26:48 [WARN] libbpf: prog 'layered_dump': failed to load: -524
05:26:48 [WARN] libbpf: failed to load object 'bpf_bpf'
05:26:48 [WARN] libbpf: failed to load BPF skeleton 'bpf_bpf': -524
Error: Failed to load BPF program
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The regressing commit is new in 6.10. It assumed that anytime event->prog
is set bpf_overflow_handler() should be invoked to execute the attached bpf
program. This assumption is false for tracing events, and as a result the
regressing commit broke bpftrace by invoking the bpf handler with garbage
inputs on overflow.
Prior to the regression the overflow handlers formed a chain (of length 0,
1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
(the tracing case) did not. Both set event->prog. The chain of overflow
handlers was replaced by a single overflow handler slot and a fixed call to
bpf_overflow_handler() when appropriate. This modifies the condition there
to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
previous behavior and fixing bpftrace.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Joe Damato <jdamato@fastly.com>
Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/
Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery")
Cc: stable@vger.kernel.org
Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
to ringbuffer") disabled non-panic CPUs to further write messages to
ringbuffer after panicked.
Since the commit, non-panicked CPU's are not allowed to write to
ring buffer after panicked and CPU backtrace which is triggered
after panicked to sample non-panicked CPUs' backtrace no longer
serves its function as it has nothing to print.
Fix the issue by allowing non-panicked CPUs to write into ringbuffer
while CPU backtrace is in flight.
Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240812072703.339690-1-takakura@valinux.co.jp
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
When the domain suffix is not supplied alloc_fwnode_name() unconditionally
adds a separator.
Fix the format strings to get rid of the stray '-' separator.
Fixes: 1e7c05292531 ("irqdomain: Allow giving name suffix for domain")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240812193101.1266625-3-andriy.shevchenko@linux.intel.com
|
|
The code uses if (bus_token) and if (bus_token == DOMAIN_BUS_ANY).
Since bus_token is an enum, the latter is more robust against changes.
Convert all !bus_token checks to explicitely check for DOMAIN_BUS_ANY.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240812193101.1266625-2-andriy.shevchenko@linux.intel.com
|
|
We want the compiler to see that fdput() on empty instance
is a no-op. The emptiness check is that file reference is NULL,
while fdput() is "fput() if FDPUT_FPUT is present in flags".
The reason why fdput() on empty instance is a no-op is something
compiler can't see - it's that we never generate instances with
NULL file reference combined with non-zero flags.
It's not that hard to deal with - the real primitives behind
fdget() et.al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int. The lower bits are
used to store flags, while the rest encodes the pointer. Linus suggested
that keeping this unsigned long around with the extractions done by inlined
accessors should generate a sane code and that turns out to be the case.
Namely, turning struct fd into a struct-wrapped unsinged long, with
fd_empty(f) => unlikely(f.word == 0)
fd_file(f) => (struct file *)(f.word & ~3)
fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing. The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc.
Note that the sole member of that structure is an opaque
unsigned long - all accesses should be done via wrappers and I don't
want to use a name that would invite manual casts to file pointers,
etc. The value of that member is equal either to (unsigned long)p | flags,
p being an address of some struct file instance, or to 0 for an empty fd.
For now the new predicate (fd_empty(f)) has no users; all the
existing checks have form (!fd_file(f)). We will convert to fd_empty()
use later; here we only define it (and tell the compiler that it's
unlikely to return true).
This commit only deals with representation change; there will
be followups.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
Further investigation shows that the crash is due to invalid memory access
in stacksafe(). More specifically, it is the following code:
if (exact != NOT_EXACT &&
old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
cur->stack[spi].slot_type[i % BPF_REG_SIZE])
return false;
The 'i' iterates old->allocated_stack.
If cur->allocated_stack < old->allocated_stack the out-of-bound
access will happen.
To fix the issue add 'i >= cur->allocated_stack' check such that if
the condition is true, stacksafe() should fail. Otherwise,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Cc: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Daniel Hodges <hodgesd@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If a core is offline then enabling SMT should not online CPUs of
this core. By enabling SMT, what is intended is either changing the SMT
value from "off" to "on" or setting the SMT level (threads per core) from a
lower to higher value.
On PowerPC the ppc64_cpu utility can be used, among other things, to
perform the following functions:
ppc64_cpu --cores-on # Get the number of online cores
ppc64_cpu --cores-on=X # Put exactly X cores online
ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline
ppc64_cpu --smt={on|off|value} # Enable, disable or change SMT level
If the user has decided to offline certain cores, enabling SMT should
not online CPUs in those cores. This patch fixes the issue and changes
the behaviour as described, by introducing an arch specific function
topology_is_core_online(). It is currently implemented only for PowerPC.
Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support")
Reported-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731030126.956210-2-nysal@linux.ibm.com
|
|
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
SRCU keeps a count of the number of callbacks that the current
srcu_barrier() is waiting on, but there is currently no easy way to
work out which callback is stuck. One way to do this is to mark idle
SRCU-barrier callbacks by making the ->next pointer point to the callback
itself, and this commit does just that.
Later commits will use this for debug output.
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
SRCU maintains the ->srcu_n_exp_nodelay and ->reschedule_count values
to guide heuristics governing auto-expediting of normal SRCU grace
periods and grace-period-state-machine delays. This commit adds KCSAN
ASSERT_EXCLUSIVE_WRITER() calls to check for concurrent updates to
these fields.
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
Using a higher value for the initial gp sequence counters allows for
wrapping to occur faster. It can help with surfacing any issues that may
be happening as a result of the wrap around.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
It was discovered that isolated CPUs could sometimes be disturbed by
kworkers processing kfree_rcu() works causing higher than expected
latency. It is because the RCU core uses "system_wq" which doesn't have
the WQ_UNBOUND flag to handle all its work items. Fix this violation of
latency limits by using "system_unbound_wq" in the RCU core instead.
This will ensure that those work items will not be run on CPUs marked
as isolated.
Beside the WQ_UNBOUND flag, the other major difference between system_wq
and system_unbound_wq is their max_active count. The system_unbound_wq
has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-50220
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time keeping fixes from Thomas Gleixner:
- Fix a couple of issues in the NTP code where user supplied values are
neither sanity checked nor clamped to the operating range. This
results in integer overflows and eventualy NTP getting out of sync.
According to the history the sanity checks had been removed in favor
of clamping the values, but the clamping never worked correctly under
all circumstances. The NTP people asked to not bring the sanity
checks back as it might break existing applications.
Make the clamping work correctly and add it where it's missing
- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
so it can adjust and if the clock was set into the future expire
timers if needed. The caller should provide a bitmask to tell
hrtimers which clocks have been adjusted.
adjtimex() uses not the proper constant and uses CLOCK_REALTIME
instead, which is 0. So hrtimers adjusts only the clocks, but does
not check for expired timers, which might make them expire really
late. Use the proper bitmask constant instead.
* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
ntp: Safeguard against time_constant overflow
ntp: Clamp maxerror and esterror to operating range
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:
- The interrupt core fails to honor caller supplied affinity hints
for non-managed interrupts and uses the system default affinity on
startup instead. Set the missing flag in the descriptor to tell the
core to use the provided affinity.
- Fix a shift out of bounds error in the Xilinx driver
- Handle switching to level trigger correctly in the RISCV APLIC
driver. It failed to retrigger the interrupt which causes it to
become stale"
* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
irqchip/xilinx: Fix shift out of bounds
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
|
|
Instead of getting estimated free pages from memblock directly, we have
introduced an API, memblock_estimated_nr_free_pages(), which is more
friendly for users.
Just replace it with new API, no functional change.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport <rppt@kernel.org>
CC: David Hildenbrand <david@redhat.com>
CC: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240808001415.6298-2-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
|
|
Add the __counted_by compiler attribute to the flexible array member
records to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Increment nr_records before adding a new pointer to the records array.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any
meaning.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
ct_nmi_nesting_cpu()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any
meaning.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any
meaning.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
into .nmi_nesting
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any
meaning.
[ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ]
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any
meaning.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, reflect that change in the related helpers.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
.nesting
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, reflect that change in the related helpers.
[ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ]
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a deadlock with dma-debug and netconsole (Rik van Riel)
* tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: avoid deadlock between dma debug vs printk and netconsole
|
|
Merge the irqdomain changes which are required for regmap to apply
depending patches and therefore tagged.
|
|
Devices can provide multiple interrupt lines. One reason for this is that
a device has multiple subfunctions, each providing its own interrupt line.
Another reason is that a device can be designed to be used (also) on a
system where some of the interrupts can be routed to another processor.
A line often further acts as a demultiplex for specific interrupts
and has it's respective set of interrupt (status, mask, ack, ...)
registers.
Regmap supports the handling of these registers and demultiplexing
interrupts, but the interrupt domain code ends up assigning the same name
for the per interrupt line domains. This causes a naming collision in the
debugFS code and leads to confusion, as /proc/interrupts shows two separate
interrupts with the same domain name and hardware interrupt number.
Instead of adding a workaround in regmap or driver code, allow giving a
name suffix for the domain name when the domain is created.
Add a name_suffix field in the irq_domain_info structure and make
irq_domain_instantiate() use this suffix if it is given when a domain is
created.
[ tglx: Adopt it to the cleanup patch and fixup the invalid NULL return ]
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/871q2yvk5x.ffs@tglx
|
|
irq_domain_set_name() is truly unreadable gunk. Clean it up before adding
more.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://lore.kernel.org/all/874j7uvkbm.ffs@tglx
|
|
irq_domain_create_simple() and irq_domain_create_legacy() use
__irq_domain_instantiate(), but have extra handling of allocating interrupt
descriptors and associating interrupts in them. Some of that is duplicated.
There are also call sites which have conditonals to invoke different
interrupt domain creator functions, where one of them is usually
irq_domain_create_legacy(). Alternatively they associate the interrupts for
the legacy case after creating the domain.
Moving the extra logic of irq_domain_create_simple()/legacy() into
__irq_domain_instantiate() allows to consolidate that.
Introduce hwirq_base and virq_base members in the irq_domain_info
structure, which allows to transport the required information and add the
conditional interrupt descriptor allocation and interrupt association into
__irq_domain_instantiate().
This reduces irq_domain_create_legacy() and irq_domain_create_simple() to
trivial wrappers which fill in the info structure and allows call sites
which must support the legacy case along with more modern mechanism to
select the domain type via the parameters of the info struct.
[ tglx: Massaged change log ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/32d07bd79eb2b5416e24da9e9e8fe5955423dcf9.1723120028.git.mazziesaccount@gmail.com
|
|
When running the following:
# cd /sys/kernel/tracing/
# echo 1 > events/sched/sched_waking/enable
# echo 1 > events/sched/sched_switch/enable
# echo 0 > tracing_on
# dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
The dd task would get stuck in an infinite loop in the kernel. What would
happen is the following:
When ring_buffer_read_page() returns -1 (no data) then a check is made to
see if the buffer is empty (as happens when the page is not full), it will
call wait_on_pipe() to wait until the ring buffer has data. When it is it
will try again to read data (unless O_NONBLOCK is set).
The issue happens when there's a reader and the file descriptor is closed.
The wait_on_pipe() will return when that is the case. But this loop will
continue to try again and wait_on_pipe() will again return immediately and
the loop will continue and never stop.
Simply check if the file was closed before looping and exit out if it is.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home
Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull kprobe fixes from Masami Hiramatsu:
- Fix misusing str_has_prefix() parameter order to check symbol prefix
correctly
- bpf: remove unused declaring of bpf_kprobe_override
* tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobes: Fix to check symbol prefixes correctly
bpf: kprobe: remove unused declaring of bpf_kprobe_override
|
|
It turns out that the WARN_ON_ONCE() call in css_release_work_fn
introduced by commit ab0312526867 ("cgroup: Show # of subsystem CSSes
in cgroup.stat") is incorrect. Although css->nr_descendants must be
0 when a css is released and ready to be freed, the corresponding
cgrp->nr_dying_subsys[ss->id] may not be 0 if a subsystem is activated
and deactivated multiple times with one or more of its previous
activation leaving behind dying csses.
Fix the incorrect warning by removing the cgrp->nr_dying_subsys check.
Fixes: ab0312526867 ("cgroup: Show # of subsystem CSSes in cgroup.stat")
Closes: https://lore.kernel.org/cgroups/6f301773-2fce-4602-a391-8af7ef00b2fb@redhat.com/T/#t
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The recursive aes-arm-bs module load situation reported by Russell King
is getting fixed in the crypto layer, but this in the meantime fixes the
"recursive load hangs forever" by just making the waiting for the first
module load be interruptible.
This should now match the old behavior before commit 9b9879fc0327
("modules: catch concurrent module loads, treat them as idempotent"),
which used the different "wait for module to be ready" code in
module_patient_check_exists().
End result: a recursive module load will still block, but now a signal
will interrupt it and fail the second module load, at which point the
first module will successfully complete loading.
Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent")
Cc: Russell King <linux@armlinux.org.uk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
sched_ext currently doesn't generate messages when the BPF scheduler is
enabled and disabled unless there are errors. It is useful to have paper
trail. Improve logging around enable/disable:
- Generate info messages on enable and non-error disable.
- Update error exit message formatting so that it's consistent with
non-error message. Also, prefix ei->msg with the BPF scheduler's name to
make it clear where the message is coming from.
- Shorten scx_exit_reason() strings for SCX_EXIT_UNREG* for brevity and
consistency.
v2: Use pr_*() instead of KERN_* consistently. (David)
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: David Vernet <void@manifault.com>
|
|
SCX_RQ_ONLINE
scx_rq_online() currently only tests SCX_RQ_ONLINE. This isn't fully correct
- e.g. consume_dispatch_q() uses task_run_on_remote_rq() which tests
scx_rq_online() to see whether the current rq can run the task, and, if so,
calls consume_remote_task() to migrate the task to @rq. While the test
itself was done while locking @rq, @rq can be temporarily unlocked by
consume_remote_task() and nothing prevents SCX_RQ_ONLINE from going offline
before the migration takes place.
To address the issue, add cpu_active() test to scx_rq_online(). There is a
synchronize_rcu() between cpu_active() being cleared and the rq going
offline, so if an on-going scheduling operation sees cpu_active(), the
associated rq is guaranteed to not go offline until the scheduling operation
is complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 60c27fb59f6c ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
Acked-by: David Vernet <void@manifault.com>
|
|
process_ddsp_deferred_locals() executes deferred direct dispatches to the
local DSQs of remote CPUs. It iterates the tasks on
rq->scx.ddsp_deferred_locals list, removing and calling
dispatch_to_local_dsq() on each. However, the list is protected by the rq
lock that can be dropped by dispatch_to_local_dsq() temporarily, so the list
can be modified during the iteration, which can lead to oopses and other
failures.
Fix it by popping from the head of the list instead of iterating the list.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5b26f7b920f7 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
Acked-by: David Vernet <void@manifault.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have reading of event format files test if the metadata still exists.
When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata
is set to state that it is to prevent any new references to it from
happening while waiting for existing references to close. When the
last reference closes, the metadata is freed. But the "format" was
missing a check to this flag (along with some other files) that
allowed new references to happen, and a use-after-free bug to occur.
- Have the trace event meta data use the refcount infrastructure
instead of relying on its own atomic counters.
- Have tracefs inodes use alloc_inode_sb() for allocation instead of
using kmem_cache_alloc() directly.
- Have eventfs_create_dir() return an ERR_PTR instead of NULL as the
callers expect a real object or an ERR_PTR.
- Have release_ei() use call_srcu() and not call_rcu() as all the
protection is on SRCU and not RCU.
- Fix ftrace_graph_ret_addr() to use the task passed in and not
current.
- Fix overflow bug in get_free_elt() where the counter can overflow the
integer and cause an infinite loop.
- Remove unused function ring_buffer_nr_pages()
- Have tracefs freeing use the inode RCU infrastructure instead of
creating its own.
When the kernel had randomize structure fields enabled, the rcu field
of the tracefs_inode was overlapping the rcu field of the inode
structure, and corrupting it. Instead, use the destroy_inode()
callback to do the initial cleanup of the code, and then have
free_inode() free it.
* tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: Use generic inode RCU for synchronizing freeing
ring-buffer: Remove unused function ring_buffer_nr_pages()
tracing: Fix overflow in get_free_elt()
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
eventfs: Use SRCU for freeing eventfs_inodes
eventfs: Don't return NULL in eventfs_create_dir()
tracefs: Fix inode allocation
tracing: Use refcount for trace_event_file reference counter
tracing: Have format file honor EVENT_FILE_FL_FREED
|
|
Pull bcachefs fixes from Kent Overstreet:
"Assorted little stuff:
- lockdep fixup for lockdep_set_notrack_class()
- we can now remove a device when using erasure coding without
deadlocking, though we still hit other issues
- the 'allocator stuck' timeout is now configurable, and messages are
ratelimited. The default timeout has been increased from 10 seconds
to 30"
* tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Use bch2_wait_on_allocator() in btree node alloc path
bcachefs: Make allocator stuck timeout configurable, ratelimit messages
bcachefs: Add missing path_traverse() to btree_iter_next_node()
bcachefs: ec should not allocate from ro devs
bcachefs: Improved allocator debugging for ec
bcachefs: Add missing bch2_trans_begin() call
bcachefs: Add a comment for bucket helper types
bcachefs: Don't rely on implicit unsigned -> signed integer conversion
lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
bcachefs: Fix double free of ca->buckets_nouse
|
|
Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc0327 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:
"So what's happening here is that the first modprobe tries to load a
fallback CBC implementation, in doing so it triggers a load of the
exact same module due to module aliases.
IOW we're loading aes-arm-bs which provides cbc(aes). However, this
needs a fallback of cbc(aes) to operate, which is made out of the
generic cbc module + any implementation of aes, or ecb(aes). The
latter happens to also be provided by aes-arm-cb so that's why it
tries to load the same module again"
So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.
This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.
That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f695a ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.
See commit ac2263b588df (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.
Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.
At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.
End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.
Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/
Reported-by: Russell King <linux@armlinux.org.uk>
Debugged-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.12:
UAPI Changes:
virtio:
- Define DRM capset
Cross-subsystem Changes:
dma-buf:
- heaps: Clean up documentation
printk:
- Pass description to kmsg_dump()
Core Changes:
CI:
- Update IGT tests
- Point upstream repo to GitLab instance
modesetting:
- Introduce Power Saving Policy property for connectors
- Add might_fault() to drm_modeset_lock priming
- Add dynamic per-crtc vblank configuration support
panic:
- Avoid build-time interference with framebuffer console
docs:
- Document Colorspace property
scheduler:
- Remove full_recover from drm_sched_start
TTM:
- Make LRU walk restartable after dropping locks
- Allow direct reclaim to allocate local memory
Driver Changes:
amdgpu:
- Support Power Saving Policy connector property
ast:
- astdp: Support AST2600 with VGA; Clean up HPD
bridge:
- Silence error message on -EPROBE_DEFER
- analogix: Clean aup
- bridge-connector: Fix double free
- lt6505: Disable interrupt when powered off
- tc358767: Make default DP port preemphasis configurable
gma500:
- Update i2c terminology
ivpu:
- Add MODULE_FIRMWARE()
lcdif:
- Fix pixel clock
loongson:
- Use GEM refcount over TTM's
mgag200:
- Improve BMC handling
- Support VBLANK intterupts
nouveau:
- Refactor and clean up internals
- Use GEM refcount over TTM's
panel:
- Shutdown fixes plus documentation
- Refactor several drivers for better code sharing
- boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
DT; Fix porch parameter
- edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
CMN N116BCP-EA2, CSW MNB601LS1-4
- himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
- ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
- jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
for code sharing
sti:
- Fix module owner
stm:
- Avoid UAF wih managed plane and CRTC helpers
- Fix module owner
- Fix error handling in probe
- Depend on COMMON_CLK
- ltdc: Fix transparency after disabling plane; Remove unused interrupt
tegra:
- Call drm_atomic_helper_shutdown()
v3d:
- Clean up perfmon
vkms:
- Clean up
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
|