Age | Commit message (Collapse) | Author |
|
After the commit 7903f907a2260 ("pid: perform free_pid() calls outside
of tasklist_lock") __unhash_process() -> detach_pid() no longer calls
free_pid(), proc_flush_pid() can just use p->thread_pid without the
now pointless get_pid() + put_pid().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/20250411121857.GA10550@redhat.com
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
destroy_workqueue() does not ensure that non-pending work submitted with
queue_delayed_work() gets cancelled. The caller has to ensure that
manually.
Add this information about delayed_work in destroy_workqueue()'s
docstring.
Add a TODO for destroy_workqueue() to wait for all delayed_work.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
In the debug and resolution of an issue involving forced use of bounce
buffers, 7170130e4c72 ("x86/mm/init: Handle the special case of device
private pages in add_pages(), to not increase max_pfn and trigger
dma_addressing_limited() bounce buffers"). It would have been easier
to debug the issue if dma_addressing_limited() had debug information
about the device not being able to address all of memory and thus forcing
all accesses through a bounce buffer. Please see[2]
Implement dev_dbg to debug the potential use of bounce buffers
when we hit the condition. When swiotlb is used,
dma_addressing_limited() is used to determine the size of maximum dma
buffer size in dma_direct_max_mapping_size(). The debug prints could be
triggered in that check as well (when enabled).
Link: https://lore.kernel.org/lkml/20250401000752.249348-1-balbirs@nvidia.com/ [1]
Link: https://lore.kernel.org/lkml/20250310112206.4168-1-spasswolf@web.de/ [2]
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250414113752.3298276-1-balbirs@nvidia.com
|
|
If the test added in commit b5ffbd139688 ("sysctl: move the extra1/2
boundary check of u8 to sysctl_check_table_array") is run as a module, a
lingering reference to the module is left behind, and a 'sysctl -a'
leads to a panic.
To reproduce
CONFIG_KUNIT=y
CONFIG_SYSCTL_KUNIT_TEST=m
Then run these commands:
modprobe sysctl-test
rmmod sysctl-test
sysctl -a
The panic varies but generally looks something like this:
BUG: unable to handle page fault for address: ffffa4571c0c7db4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 100351067 PMD 114f5e067 PTE 0
Oops: Oops: 0000 [#1] SMP NOPTI
... ... ...
RIP: 0010:proc_sys_readdir+0x166/0x2c0
... ... ...
Call Trace:
<TASK>
iterate_dir+0x6e/0x140
__se_sys_getdents+0x6e/0x100
do_syscall_64+0x70/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Move the test to lib/test_sysctl.c where the registration reference is
handled on module exit
Fixes: b5ffbd139688 ("sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array")
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Hide get_vm_area() from MMUless builds
The function get_vm_area() is not defined when CONFIG_MMU is not
defined. Hide that function within #ifdef CONFIG_MMU.
- Fix output of synthetic events when they have dynamic strings
The print fmt of the synthetic event's format file use to have "%.*s"
for dynamic size strings even though the user space exported
arguments had only __get_str() macro that provided just a nul
terminated string. This was fixed so that user space could parse this
properly.
But the reason that it had "%.*s" was because internally it provided
the maximum size of the string as one of the arguments. The fix that
replaced "%.*s" with "%s" caused the trace output (when the kernel
reads the event) to write "(efault)" as it would now read the length
of the string as "%s".
As the string provided is always nul terminated, there's no reason
for the internal code to use "%.*s" anyway. Just remove the length
argument to match the "%s" that is now in the format.
- Fix the ftrace subops hash logic of the manager ops hash
The function_graph uses the ftrace subops code. The subops code is a
way to have a single ftrace_ops registered with ftrace to determine
what functions will call the ftrace_ops callback. More than one user
of function graph can register a ftrace_ops with it. The function
graph infrastructure will then add this ftrace_ops as a subops with
the main ftrace_ops it registers with ftrace. This is because the
functions will always call the function graph callback which in turn
calls the subops ftrace_ops callbacks.
The main ftrace_ops must add a callback to all the functions that the
subops want a callback from. When a subops is registered, it will
update the main ftrace_ops hash to include the functions it wants.
This is the logic that was broken.
The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
all the functions in the filter_hash but not in the notrace_hash are
attached by ftrace. The original logic would have the main ftrace_ops
filter_hash be a union of all the subops filter_hashes and the main
notrace_hash would be a intersect of all the subops filter hashes.
But this was incorrect because the notrace hash depends on the
filter_hash it is associated to and not the union of all
filter_hashes.
Instead, when a subops is added, just include all the functions of
the subops hash that are in its filter_hash but not in its
notrace_hash. The main subops hash should not use its notrace hash,
unless all of its subops hashes have an empty filter_hash (which
means to attach to all functions), and then, and only then, the main
ftrace_ops notrace hash can be the intersect of all the subops
hashes.
This not only fixes the bug, but also simplifies the code.
- Add a selftest to better test the subops filtering
Add a selftest that would catch the bug fixed by the above change.
- Fix extra newline printed in function tracing with retval
The function parameter code changed the output logic slightly and
called print_graph_retval() and also printed a newline. The
print_graph_retval() also prints a newline which caused blank lines
to be printed in the function graph tracer when retval was added.
This caused one of the selftests to fail if retvals were enabled.
Instead remove the new line output from print_graph_retval() and have
the callers always print the new line so that it doesn't have to do
special logic if it calls print_graph_retval() or not.
- Fix out-of-bound memory access in the runtime verifier
When rv_is_container_monitor() is called on the last entry on the
link list it references the next entry, which is the list head and
causes an out-of-bound memory access.
* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix out-of-bound memory access in rv_is_container_monitor()
ftrace: Do not have print_graph_retval() add a newline
tracing/selftest: Add test to better test subops filtering of function graph
ftrace: Fix accounting of subop hashes
ftrace: Properly merge notrace hashes
tracing: Do not add length to print format in synthetic events
tracing: Hide get_vm_area() from MMUless builds
|
|
Pull bpf fixes from Alexei Starovoitov:
- Followup fixes for resilient spinlock (Kumar Kartikeya Dwivedi):
- Make res_spin_lock test less verbose, since it was spamming BPF
CI on failure, and make the check for AA deadlock stronger
- Fix rebasing mistake and use architecture provided
res_smp_cond_load_acquire
- Convert BPF maps (queue_stack and ringbuf) to resilient spinlock
to address long standing syzbot reports
- Make sure that classic BPF load instruction from SKF_[NET|LL]_OFF
offsets works when skb is fragmeneted (Willem de Bruijn)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Convert ringbuf map to rqspinlock
bpf: Convert queue_stack map to rqspinlock
bpf: Use architecture provided res_smp_cond_load_acquire
selftests/bpf: Make res_spin_lock AA test condition stronger
selftests/net: test sk_filter support for SKF_NET_OFF on frags
bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
selftests/bpf: Make res_spin_lock test less verbose
|
|
When rv_is_container_monitor() is called on the last monitor in
rv_monitors_list, KASAN yells:
BUG: KASAN: global-out-of-bounds in rv_is_container_monitor+0x101/0x110
Read of size 8 at addr ffffffff97c7c798 by task setup/221
The buggy address belongs to the variable:
rv_monitors_list+0x18/0x40
This is due to list_next_entry() is called on the last entry in the list.
It wraps around to the first list_head, and the first list_head is not
embedded in struct rv_monitor_def.
Fix it by checking if the monitor is last in the list.
Cc: stable@vger.kernel.org
Cc: Gabriele Monaco <gmonaco@redhat.com>
Fixes: cb85c660fcd4 ("rv: Add option for nested monitors and include sched")
Link: https://lore.kernel.org/e85b5eeb7228bfc23b8d7d4ab5411472c54ae91b.1744355018.git.namcao@linutronix.de
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The retval and retaddr options for function_graph tracer will add a
comment at the end of a function for both leaf and non leaf functions that
looks like:
__wake_up_common(); /* ret=0x1 */
} /* pick_next_task_fair ret=0x0 */
The function print_graph_retval() adds a newline after the "*/". But if
that's not called, the caller function needs to make sure there's a
newline added.
This is confusing and when the function parameters code was added, it
added a newline even when calling print_graph_retval() as the fact that
the print_graph_retval() function prints a newline isn't obvious.
This caused an extra newline to be printed and that made it fail the
selftests when the retval option was set, as the selftests were not
expecting blank lines being injected into the trace.
Instead of having print_graph_retval() print a newline, just have the
caller always print the newline regardless if it calls print_graph_retval()
or not. This not only fixes this bug, but it also simplifies the code.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250411133015.015ca393@gandalf.local.home
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/ccc40f2b-4b9e-4abd-8daf-d22fce2a86f0@sirena.org.uk/
Fixes: ff5c9c576e754 ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In a prior patch series we tried to cleanly differentiate between:
(1) The task has already been reaped.
(2) The caller requested a pidfd for a thread-group leader but the pid
actually references a struct pid that isn't used as a thread-group
leader.
as this was causing issues for non-threaded workloads.
But there's cases where the current simple logic is wrong. Specifically,
if the pid was a leader pid and the check races with __unhash_process().
Stabilize this by using the pidfd waitqueue lock.
Link: https://lore.kernel.org/20250411-work-pidfs-enoent-v2-2-60b2d3bb545f@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Move the pidfd notification out of __change_pid() and into
__unhash_process(). The only valid call to __change_pid() with a NULL
argument and PIDTYPE_PID is from __unhash_process(). This is a lot more
obvious than calling it from __change_pid().
Link: https://lore.kernel.org/20250411-work-pidfs-enoent-v2-1-60b2d3bb545f@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The function graph infrastructure uses ftrace to hook to functions. It has
a single ftrace_ops to manage all the users of function graph. Each
individual user (tracing, bpf, fprobes, etc) has its own ftrace_ops to
track the functions it will have its callback called from. These
ftrace_ops are "subops" to the main ftrace_ops of the function graph
infrastructure.
Each ftrace_ops has a filter_hash and a notrace_hash that is defined as:
Only trace functions that are in the filter_hash but not in the
notrace_hash.
If the filter_hash is empty, it means to trace all functions.
If the notrace_hash is empty, it means do not disable any function.
The function graph main ftrace_ops needs to be a superset containing all
the functions to be traced by all the subops it has. The algorithm to
perform this merge was incorrect.
When the first subops was added to the main ops, it simply made the main
ops a copy of the subops (same filter_hash and notrace_hash).
When a second ops was added, it joined the new subops filter_hash with the
main ops filter_hash as a union of the two sets. The intersect between the
new subops notrace_hash and the main ops notrace_hash was created as the
new notrace_hash of the main ops.
The issue here is that it would then start tracing functions than no
subops were tracing. For example if you had two subops that had:
subops 1:
filter_hash = '*sched*' # trace all functions with "sched" in it
notrace_hash = '*time*' # except do not trace functions with "time"
subops 2:
filter_hash = '*lock*' # trace all functions with "lock" in it
notrace_hash = '*clock*' # except do not trace functions with "clock"
The intersect of '*time*' functions with '*clock*' functions could be the
empty set. That means the main ops will be tracing all functions with
'*time*' and all "*clock*" in it!
Instead, modify the algorithm to be a bit simpler and correct.
First, when adding a new subops, even if it's the first one, do not add
the notrace_hash if the filter_hash is not empty. Instead, just add the
functions that are in the filter_hash of the subops but not in the
notrace_hash of the subops into the main ops filter_hash. There's no
reason to add anything to the main ops notrace_hash.
The notrace_hash of the main ops should only be non empty iff all subops
filter_hashes are empty (meaning to trace all functions) and all subops
notrace_hashes include the same functions.
That is, the main ops notrace_hash is empty if any subops filter_hash is
non empty.
The main ops notrace_hash only has content in it if all subops
filter_hashes are empty, and the content are only functions that intersect
all the subops notrace_hashes. If any subops notrace_hash is empty, then
so is the main ops notrace_hash.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/20250409152720.216356767@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The global notrace hash should be jointly decided by the intersection of
each subops's notrace hash, but not the filter hash.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250408160258.48563-1-andybnac@gmail.com
Fixes: 5fccc7552ccb ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Andy Chiu <andybnac@gmail.com>
[ fixed removing of freeing of filter_hash ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When no audit rules are in place, AUDIT_ANOM_{LINK,CREAT} events
reported in audit_log_path_denied() are unconditionally dropped due to
an explicit check for the existence of any audit rules. Given this is a
report of a security violation, allow it to be recorded regardless of
the existence of any audit rules.
To test,
mkdir -p /root/tmp
chmod 1777 /root/tmp
touch /root/tmp/test.txt
useradd test
chown test /root/tmp/test.txt
{echo C0644 12 test.txt; printf 'hello\ntest1\n'; printf \\000;} | \
scp -t /root/tmp
Check with
ausearch -m ANOM_CREAT -ts recent
Link: https://issues.redhat.com/browse/RHEL-9065
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
audit_log_vformat() is using printf() type of format, and GCC compiler
(Debian 14.2.0-17) is not happy about this:
kernel/audit.c:1978:9: error: function ‘audit_log_vformat’
might be a candidate for ‘gnu_printf’ format attribute
kernel/audit.c:1987:17: error: function ‘audit_log_vformat’
might be a candidate for ‘gnu_printf’ format attribute
Fix the compilation errors (`make W=1` when CONFIG_WERROR=y, which is
default) by adding __printf() attribute.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[PM: commit description line wrap fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Convert the raw spinlock used by BPF ringbuf to rqspinlock. Currently,
we have an open syzbot report of a potential deadlock. In addition, the
ringbuf can fail to reserve spuriously under contention from NMI
context.
It is potentially attractive to enable unconstrained usage (incl. NMIs)
while ensuring no deadlocks manifest at runtime, perform the conversion
to rqspinlock to achieve this.
This change was benchmarked for BPF ringbuf's multi-producer contention
case on an Intel Sapphire Rapids server, with hyperthreading disabled
and performance governor turned on. 5 warm up runs were done for each
case before obtaining the results.
Before (raw_spinlock_t):
Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1 11.440 ± 0.019M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2 2.706 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3 3.130 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4 2.472 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8 2.352 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.813 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.988 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.245 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.148 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.190 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.490 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.180 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.201 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.226 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.164 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.874 ± 0.001M/s (drops 0.000 ± 0.000M/s)
After (rqspinlock_t):
Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1 11.078 ± 0.019M/s (drops 0.000 ± 0.000M/s) (-3.16%)
rb-libbpf nr_prod 2 2.801 ± 0.014M/s (drops 0.000 ± 0.000M/s) (3.51%)
rb-libbpf nr_prod 3 3.454 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.35%)
rb-libbpf nr_prod 4 2.567 ± 0.002M/s (drops 0.000 ± 0.000M/s) (3.84%)
rb-libbpf nr_prod 8 2.468 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.93%)
rb-libbpf nr_prod 12 2.510 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-10.77%)
rb-libbpf nr_prod 16 2.075 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.38%)
rb-libbpf nr_prod 20 2.640 ± 0.001M/s (drops 0.000 ± 0.000M/s) (17.59%)
rb-libbpf nr_prod 24 2.092 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-2.61%)
rb-libbpf nr_prod 28 2.426 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.78%)
rb-libbpf nr_prod 32 2.331 ± 0.004M/s (drops 0.000 ± 0.000M/s) (-6.39%)
rb-libbpf nr_prod 36 2.306 ± 0.003M/s (drops 0.000 ± 0.000M/s) (5.78%)
rb-libbpf nr_prod 40 2.178 ± 0.002M/s (drops 0.000 ± 0.000M/s) (-1.04%)
rb-libbpf nr_prod 44 2.293 ± 0.001M/s (drops 0.000 ± 0.000M/s) (3.01%)
rb-libbpf nr_prod 48 2.022 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-6.56%)
rb-libbpf nr_prod 52 1.809 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-3.47%)
There's a fair amount of noise in the benchmark, with numbers on reruns
going up and down by 10%, so all changes are in the range of this
disturbance, and we see no major regressions.
Reported-by: syzbot+850aaf14624dc0c6d366@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004aa700061379547e@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250411101759.4061366-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a tracepoint to monitor TCP send operations, enabling detailed
visibility into TCP message transmission.
Create a new tracepoint within the tcp_sendmsg_locked function,
capturing traditional fields along with size_goal, which indicates the
optimal data size for a single TCP segment. Additionally, a reference to
the struct sock sk is passed, allowing direct access for BPF programs.
The implementation is largely based on David's patch[1] and suggestions.
Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408-tcpsendmsg-v3-2-208b87064c28@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc2).
Conflict:
Documentation/networking/netdevices.rst
net/core/lock_debug.c
04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE")
03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc timer fixes from Ingo Molnar:
- Fix missing ACCESS_PRIVATE() that triggered a Sparse warning
- Fix lockdep false positive in tick_freeze() on CONFIG_PREEMPT_RT=y
- Avoid <vdso/unaligned.h> macro's variable shadowing to address build
warning that triggers under W=2 builds
* tag 'timers-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vdso: Address variable shadowing in macros
timekeeping: Add a lockdep override in tick_freeze()
hrtimer: Add missing ACCESS_PRIVATE() for hrtimer::function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc perf events fixes from Ingo Molnar:
- Fix __free_event() corner case splat
- Fix false-positive uprobes related lockdep splat on
CONFIG_PREEMPT_RT=y kernels
- Fix a complicated perf sigtrap race that may result in hangs
* tag 'perf-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix hang while freeing sigtrap event
uprobes: Avoid false-positive lockdep splat on CONFIG_PREEMPT_RT=y in the ri_timer() uprobe timer callback, use raw_write_seqcount_*()
perf/core: Fix WARN_ON(!ctx) in __free_event() for partial init
|
|
Replace all usage of raw_spinlock_t in queue_stack_maps.c with
rqspinlock. This is a map type with a set of open syzbot reports
reproducing possible deadlocks. Prior attempt to fix the issues
was at [0], but was dropped in favor of this approach.
Make sure we return the -EBUSY error in case of possible deadlocks or
timeouts, just to make sure user space or BPF programs relying on the
error code to detect problems do not break.
With these changes, the map should be safe to access in any context,
including NMIs.
[0]: https://lore.kernel.org/all/20240429165658.1305969-1-sidchintamaneni@gmail.com
Reported-by: syzbot+8bdfc2c53fb2b63e1871@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004c3fc90615f37756@google.com
Reported-by: syzbot+252bc5c744d0bba917e1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c80abd0616517df9@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410153142.2064340-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In v2 of rqspinlock [0], we fixed potential problems with WFE usage in
arm64 to fallback to a version copied from Ankur's series [1]. This
logic was moved into arch-specific headers in v3 [2].
However, we missed using the arch-provided res_smp_cond_load_acquire
in commit ebababcd0372 ("rqspinlock: Hardcode cond_acquire loops for arm64")
due to a rebasing mistake between v2 and v3 of the rqspinlock series.
Fix the typo to fallback to the arm64 definition as we did in v2.
[0]: https://lore.kernel.org/bpf/20250206105435.2159977-18-memxor@gmail.com
[1]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com
[2]: https://lore.kernel.org/bpf/20250303152305.3195648-9-memxor@gmail.com
Fixes: ebababcd0372 ("rqspinlock: Hardcode cond_acquire loops for arm64")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410145512.1876745-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The update of element in fd htab is in-place now, therefore, there is no
need to allocate per-cpu extra_elems, just remove it.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add is_fd_htab() helper to check whether the map is htab of maps.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As reported by Cody Haas [1], when there is concurrent map lookup and
map update operation in an existing element for htab of maps, the map
lookup procedure may return -ENOENT unexpectedly.
The root cause is twofold:
1) the update of existing element involves two separated list operation
In htab_map_update_elem(), it first inserts the new element at the head
of list, then it deletes the old element. Therefore, it is possible a
lookup operation has already iterated to the middle of the list when a
concurrent update operation begins, and the lookup operation will fail
to find the target element.
2) the immediate reuse of htab element.
It is more subtle. Even through the lookup operation finds the old
element, it is possible that the target element has been removed by a
concurrent update operation, and the element has been reused immediately
by other update operation which runs on the same CPU as the previous
update operation, and the element is inserted into the same bucket list.
After these steps above, when the lookup operation tries to compare the
key in the old element with the expected key, the match will fail
because the key in the old element have been overwritten by other update
operation.
The two-step update process is relatively straightforward to address.
The more challenging aspect is the immediate reuse. As Alexei pointed
out:
So since 2022 both prealloc and no_prealloc reuse elements.
We can consider a new flag for the hash map like F_REUSE_AFTER_RCU_GP
that will use _rcu() flavor of freeing into bpf_ma,
but it has to have a strong reason.
Given that htab of maps doesn't support special field in value and
directly stores the inner map pointer in htab_element, just do in-place
update for htab of maps instead of attempting to address the immediate
reuse issue.
[1]: https://lore.kernel.org/xdp-newbies/CAH7f-ULFTwKdoH_t2SFc5rWCVYLEg-14d1fBYWH2eekudsnTRg@mail.gmail.com/
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Rename __htab_percpu_map_update_elem to htab_map_update_elem_in_place,
and add a new percpu argument for the helper to support in-place update
for both per-cpu htab and htab of maps.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
All hash maps store map key and map value together. The relative offset
of the map value compared to the map key is round_up(key_size, 8).
Therefore, factor out a common helper htab_elem_value() to calculate the
address of the map value instead of duplicating the logic.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Recent change [0] resulted in a "BUG: using __this_cpu_read() in
preemptible" splat [1]. PREEMPT kernels have additional requirements
on what can and can not run with/without preemption enabled.
Expose those constrains in the debug kernels.
0: https://lore.kernel.org/netdev/20250314120048.12569-2-justin.iurman@uliege.be/
1: https://lore.kernel.org/netdev/20250402094458.006ba2a7@kernel.org/T/#mbf72641e9d7d274daee9003ef5edf6833201f1bc
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250402172305.1775226-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The link_create.flags are currently not used for multi-uprobes, so return
-EINVAL if it is set, same as for other attach APIs.
We allow target_fd to have an arbitrary value for multi-uprobe, though,
as there are existing users (libbpf) relying on this.
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-2-chen.dylane@linux.dev
|
|
The link_create.flags are currently not used for multi-kprobes, so return
-EINVAL if it is set, same as for other attach APIs.
We allow target_fd, on the other hand, to have an arbitrary value for
multi-kprobe, as there are existing users (libbpf) relying on this.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-1-chen.dylane@linux.dev
|
|
tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the
callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
acquires a spinlock which becomes a sleeping lock on PREEMPT_RT. Lockdep
complains about this lock nesting.
Add a lockdep override for this special case and a comment explaining
why it is okay.
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250404133429.pnAzf-eF@linutronix.de
Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
|
|
Move posix_timers_cache initialization to posixtimer_init(). At that point
the memory subsystem is already up and running.
Also move the cache pointer to the __timer_data variable to avoid
potential false sharing, since it never was marked as __ro_after_init.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250402133114.253901-1-edumazet@google.com
|
|
scx_has_op is used to encode which ops are implemented by the BPF scheduler
into an array of static_keys. While this saves a bit of branching overhead,
that is unlikely to be noticeable compared to the overall cost. As the
global static_keys can't work with the planned hierarchical multiple
scheduler support, replace the static_key array with a bitmap.
In repeated hackbench runs before and after static_keys removal on an AMD
Ryzen 3900X, I couldn't tell any measurable performance difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
|
|
scx_ops_allow_queued_wakeup is used to encode SCX_OPS_ALLOW_QUEUED_WAKEUP
into a static_key. The test is gated behind scx_enabled(), and, even when
sched_ext is enabled, is unlikely for the static_key usage to make any
meaningful difference. It is made to use a static_key mostly because there
was no reason not to. However, global static_keys can't work with the
planned hierarchical multiple scheduler support. Remove the static_key and
instead test SCX_OPS_ALLOW_QUEUED_WAKEUP directly.
In repeated hackbench runs before and after static_keys removal on an AMD
Ryzen 3900X, I couldn't tell any measurable performance difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
|
|
scx_ops_cpu_preempt is used to encode whether ops.cpu_acquire/release() are
implemented into a static_key. These tests aren't hot enough for static_key
usage to make any meaningful difference and are made to use a static_key
mostly because there was no reason not to. However, global static_keys can't
work with the planned hierarchical multiple scheduler support. Remove the
static_key and instead use an internal ops flag SCX_OPS_HAS_CPU_PREEMPT to
record and test whether ops.cpu_acquire/release() are implemented.
In repeated hackbench runs before and after static_keys removal on an AMD
Ryzen 3900X, I couldn't tell any measurable performance difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
|
|
scx_ops_enq_last/exiting/migration_disabled are used to encode the
corresponding SCX_OPS_ flags into static_keys. These flags aren't hot enough
for static_key usage to make any meaningful difference and are made
static_keys mostly because there was no reason not to. However, global
static_keys can't work with the planned hierarchical multiple scheduler
support. Remove the static_keys and test the ops flags directly.
In repeated hackbench runs before and after static_keys removal on an AMD
Ryzen 3900X, I couldn't tell any measurable performance difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
|
|
Purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Changwoo Min <changwoo@igalia.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
|
|
The "function" field of struct hrtimer has been changed to private, but
two instances have not been converted to use ACCESS_PRIVATE().
Convert them to use ACCESS_PRIVATE().
Fixes: 04257da0c99c ("hrtimers: Make callback function pointer private")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250408103854.1851093-1-namcao@linutronix.de
Closes: https://lore.kernel.org/oe-kbuild-all/202504071931.vOVl13tt-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202504072155.5UAZjYGU-lkp@intel.com/
|
|
Now that all abuse is gone and the legit users are converted to
guard(msi_descs_lock), rename the lock functions and document them as
internal.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
Link: https://lore.kernel.org/all/20250319105506.864699741@linutronix.de
|
|
Provide a lock guard for MSI descriptor locking and update the core code
accordingly.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250319105506.144672678@linutronix.de
|
|
The size parameter is optional and strscpy() automatically determines
the length of the destination buffer using sizeof() if the argument is
omitted. This makes the explicit sizeof() calls unnecessary. Remove
them to shorten and simplify the code.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250318080755.61126-2-thorsten.blum@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The following causes a vsnprintf fault:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
<idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50
The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When building with W=1, this variable is unused for configs with
CONFIG_CMA_SIZE_SEL_PERCENTAGE=y:
kernel/dma/contiguous.c:67:26: error: 'size_bytes' defined but not used [-Werror=unused-const-variable=]
Change this to a macro to avoid the warning.
Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250409151557.3890443-1-arnd@kernel.org
|
|
Move sparc sysctls (reboot-cmd, stop-a, scons-poweroff and tsb-ratio)
into a new file (arch/sparc/kernel/setup.c). This file will be included
for both 32 and 64 bit sparc. Leave "tsb-ratio" under SPARC64 ifdef as
it was in kernel/sysctl.c. The sysctl table register is called with
arch_initcall placing it after its original place in proc_root_init.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Move stack_tracer_enabled into trace_stack_sysctl_table. This is part of
a greater effort to move ctl tables into their respective subsystems
which will reduce the merge conflicts in kernel/sysctl.c.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Move trace ctl tables into their own const array in
kernel/trace/trace.c. The sysctl table register is called with
subsys_initcall placing if after its original place in proc_root_init.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Move print-fatal-signals into its own const ctl table array in
kernel/signal.c. This is part of a greater effort to move ctl tables
into their respective subsystems which will reduce the merge conflicts
in kernel/sysctl.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Move panic, panic_on_oops, panic_print, panic_on_warn into
kerne/panic.c. This is part of a greater effort to move ctl tables into
their respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
When booting BeagleBone Black:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4398 lockdep_hardirqs_on_prepare+0x23c/0x280
DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc1-boneblack-00004-g195298c3b116 #209 NONE
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
_raw_spin_unlock_irq from irq_map_generic_chip+0x144/0x190
irq_map_generic_chip from irq_domain_associate_locked+0x68/0x164
irq_domain_associate_locked from irq_create_fwspec_mapping+0x34c/0x43c
irq_create_fwspec_mapping from irq_create_of_mapping+0x64/0x8c
irq_create_of_mapping from irq_of_parse_and_map+0x54/0x7c
irq_of_parse_and_map from dmtimer_clkevt_init_common+0x54/0x15c
dmtimer_clkevt_init_common from dmtimer_systimer_init+0x41c/0x5b8
dmtimer_systimer_init from timer_probe+0x68/0xf0
timer_probe from start_kernel+0x4a4/0x6bc
start_kernel from 0x0
irq event stamp: 0
hardirqs last enabled at (0): [<00000000>] 0x0
hardirqs last disabled at (0): [<00000000>] 0x0
softirqs last enabled at (0): [<00000000>] 0x0
softirqs last disabled at (0): [<00000000>] 0x0
---[ end trace 0000000000000000 ]---
and:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at init/main.c:1022 start_kernel+0x4e8/0x6bc
Interrupts were enabled early
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.15.0-rc1-boneblack-00004-g195298c3b116 #209 NONE
Tainted: [W]=WARN
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x6c/0x90
dump_stack_lvl from __warn+0x70/0x1b0
__warn from warn_slowpath_fmt+0x1d4/0x1ec
warn_slowpath_fmt from start_kernel+0x4e8/0x6bc
start_kernel from 0x0
irq event stamp: 0
hardirqs last enabled at (0): [<00000000>] 0x0
hardirqs last disabled at (0): [<00000000>] 0x0
softirqs last enabled at (0): [<00000000>] 0x0
softirqs last disabled at (0): [<00000000>] 0x0
---[ end trace 0000000000000000 ]---
Fix this by correcting two misconversions of
raw_spin_{,un}lock_irq{save,restore}() to lock guards.
Fixes: 195298c3b11628a6 ("genirq/generic-chip: Convert core code to lock guards")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/514f94c5891c61ac0a4a7fdad113e75db1eea367.1744135467.git.geert+renesas@glider.be
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- fprobe: remove fprobe_hlist_node when module unloading
When a fprobe target module is removed, the fprobe_hlist_node should
be removed from the fprobe's hash table to prevent reusing
accidentally if another module is loaded at the same address.
- fprobe: lock module while registering fprobe
The module containing the function to be probeed is locked using a
reference counter until the fprobe registration is complete, which
prevents use after free.
- fprobe-events: fix possible UAF on modules
Basically as same as above, but in the fprobe-events layer we also
need to get module reference counter when we find the tracepoint in
the module.
* tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: fprobe: Cleanup fprobe hash when module unloading
tracing: fprobe events: Fix possible UAF on modules
tracing: fprobe: Fix to lock module while registering fprobe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- A number of cpuset remote partition related fixes and cleanups along
with selftest updates.
- A change from this merge window made cgroup_rstat_updated_list()
called outside cgroup_rstat_lock leading to list corruptions. Fix it
by relocating the call inside the lock.
* tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix race between newly created partition and dying one
cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
cgroup/cpuset: Code cleanup and comment update
cgroup/cpuset: Don't allow creation of local partition over a remote one
cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
cgroup/cpuset: Fix error handling in remote_partition_disable()
cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
|