summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2020-11-09module: fix comment styleSergey Shtylyov
Many comments in this module do not comply with the preferred multi-line comment style as reported by 'scripts/checkpatch.pl': WARNING: Block comments use * on subsequent lines WARNING: Block comments use a trailing */ on a separate line Fix those comments, along with (unreported for some reason?) the starts of the multi-line comments not being /* on their own line... Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-11-09module: add more 'kernel-doc' commentsSergey Shtylyov
Some functions have the proper 'kernel-doc' comments but these don't start with proper /** -- fix that, along with adding () to the function name on the following lines to fully comply with the 'kernel-doc' format. Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-11-09module: fix up 'kernel-doc' commentsSergey Shtylyov
Some 'kernel-doc' function comments do not fully comply with the specified format due to: - missing () after the function name; - "RETURNS:"/"Returns:" instead of "Return:" when documenting the function's result. - empty line before describing the function's arguments. Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-11-08fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parentEddy Wu
current->group_leader->exit_signal may change during copy_process() if current->real_parent exits. Move the assignment inside tasklist_lock to avoid the race. Signed-off-by: Eddy Wu <eddy_wu@trendmicro.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-08Merge tag 'perf-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the perf core plugging a memory leak in the address filter parser" * tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix a memory leak in perf_event_parse_addr_filter()
2020-11-08Merge tag 'locking-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fix from Thomas Gleixner: "A single fix for the futex code where an intermediate state in the underlying RT mutex was not handled correctly and triggering a BUG() instead of treating it as another variant of retry condition" * tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Handle transient "ownerless" rtmutex state correctly
2020-11-08Merge tag 'irq-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for interrupt chip drivers: - Fix the fallout of the IPI as interrupt conversion in Kconfig and the BCM2836 interrupt chip driver - Fixes for interrupt affinity setting and the handling of hierarchical irq domains in the SiFive PLIC driver - Make the unmapped event handling in the TI SCI driver work correctly - A few minor fixes and cleanups in various chip drivers and Kconfig" * tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events irqchip/ti-sci-inta: Add support for unmapped event handling dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm irqchip/sifive-plic: Fix chip_data access within a hierarchy irqchip/sifive-plic: Fix broken irq_set_affinity() callback irqchip/stm32-exti: Add all LP timer exti direct events support irqchip/bcm2836: Fix missing __init annotation irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY irqchip/mst: Make mst_intc_of_init static irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7 genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
2020-11-08Merge tag 'core-urgent-2020-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull entry code fix from Thomas Gleixner: "A single fix for the generic entry code to correct the wrong assumption that the lockdep interrupt state needs not to be established before calling the RCU check" * tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix the incorrect ordering of lockdep and RCU check
2020-11-07futex: Handle transient "ownerless" rtmutex state correctlyMike Galbraith
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner(). This is one possible chain of events leading to this: Task Prio Operation T1 120 lock(F) T2 120 lock(F) -> blocks (top waiter) T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter) XX timeout/ -> wakes T2 signal T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set) T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter and the lower priority T2 cannot steal the lock. -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON() The comment states that this is invalid and rt_mutex_real_owner() must return a non NULL owner when the trylock failed, but in case of a queued and woken up waiter rt_mutex_real_owner() == NULL is a valid transient state. The higher priority waiter has simply not yet managed to take over the rtmutex. The BUG_ON() is therefore wrong and this is just another retry condition in fixup_pi_state_owner(). Drop the locks, so that T3 can make progress, and then try the fixup again. Gratian provided a great analysis, traces and a reproducer. The analysis is to the point, but it confused the hell out of that tglx dude who had to page in all the futex horrors again. Condensed version is above. [ tglx: Wrote comment and changelog ] Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar
Conflicts: include/asm-generic/atomic-instrumented.h kernel/kprobes.c Use the upstream atomic-instrumented.h checksum, and pick the kprobes version of kernel/kprobes.c, which effectively reverts this upstream workaround: 645f224e7ba2: ("kprobes: Tell lockdep about kprobe nesting") Since the new code *should* be fine without nesting. Knock on wood ... Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar
Merge recent kprobes updates into perf/kprobes that came from -mm. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-07perf/core: Fix a memory leak in perf_event_parse_addr_filter()kiyin(尹亮)
As shown through runtime testing, the "filename" allocation is not always freed in perf_event_parse_addr_filter(). There are three possible ways that this could happen: - It could be allocated twice on subsequent iterations through the loop, - or leaked on the success path, - or on the failure path. Clean up the code flow to make it obvious that 'filename' is always freed in the reallocation path and in the two return paths as well. We rely on the fact that kfree(NULL) is NOP and filename is initialized with NULL. This fixes the leak. No other side effects expected. [ Dan Carpenter: cleaned up the code flow & added a changelog. ] [ Ingo Molnar: updated the changelog some more. ] Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu> Cc: Anthony Liguori <aliguori@amazon.com> -- kernel/events/core.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
2020-11-07irqdomain: Introduce irq_domain_create_legacy() APIAndy Shevchenko
Introduce irq_domain_create_legacy() API which is functional equivalent to the existing irq_domain_add_legacy(), but takes a pointer to the struct fwnode_handle as a parameter. This is useful for non OF systems. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20201030165919.86234-5-andriy.shevchenko@linux.intel.com
2020-11-07irqdomain: Replace open coded of_node_to_fwnode()Andy Shevchenko
of_node_to_fwnode() should be used for conversion. Replace the open coded variant of it in of_phandle_args_to_fwspec(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20201030165919.86234-4-andriy.shevchenko@linux.intel.com
2020-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf 2020-11-06 1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David. 2) Tighten bpf_lsm function check, from KP. 3) Fix bpftool attaching to flow dissector, from Lorenz. 4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Update verification logic for LSM programs bpf: Zero-fill re-used per-cpu map element bpf: BPF_PRELOAD depends on BPF_SYSCALL tools/bpftool: Fix attaching flow dissector libbpf: Fix possible use after free in xsk_socket__delete libbpf: Fix null dereference in xsk_socket__delete libbpf, hashmap: Fix undefined behavior in hash_bits bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE tools, bpftool: Remove two unused variables. tools, bpftool: Avoid array index warnings. xsk: Fix possible memory leak at socket close bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs samples/bpf: Set rlimit for memlock to infinity in all samples bpf: Fix -Wshadow warnings selftest/bpf: Fix profiler test using CO-RE relocation for enums ==================== Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06kcsan: Fix encoding masks and regain address bitMarco Elver
The watchpoint encoding masks for size and address were off-by-one bit each, with the size mask using 1 unnecessary bit and the address mask missing 1 bit. However, due to the way the size is shifted into the encoded watchpoint, we were effectively wasting and never using the extra bit. For example, on x86 with PAGE_SIZE==4K, we have 1 bit for the is-write bit, 14 bits for the size bits, and then 49 bits left for the address. Prior to this fix we would end up with this usage: [ write<1> | size<14> | wasted<1> | address<48> ] Fix it by subtracting 1 bit from the GENMASK() end and start ranges of size and address respectively. The added static_assert()s verify that the masks are as expected. With the fixed version, we get the expected usage: [ write<1> | size<14> | address<49> ] Functionally no change is expected, since that extra address bit is insignificant for enabled architectures. Acked-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcu-tasks: Make the units of ->init_fract be jiffiesPaul E. McKenney
Currently, the units of ->init_fract are milliseconds while those of ->gp_sleep are jiffies. For consistency with each other and with the argument of schedule_timeout_idle(), this commit changes the units of ->init_fract to jiffies. This change does affect the backoff algorithm, but only on systems where HZ is not 1000, and even there the change makes more sense, given that the current setup would "back off" to the same number of jiffies repeatedly. In contrast, with this change, the number of jiffies waited increases on each pass through the loop in the rcu_tasks_wait_gp() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Don't do need_resched() testing if ->sync is NULLPaul E. McKenney
If cur_ops->sync is NULL, rcu_torture_fwd_prog_nr() will nevertheless attempt to call through it. This commit therefore flags cases where neither need_resched() nor call_rcu() forward-progress testing can be performed due to NULL function pointers, and also causes rcu_torture_fwd_prog_nr() to take an early exit if cur_ops->sync() is NULL. Reported-by: Tom Rix <trix@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06locktorture: Invoke percpu_free_rwsem() to do percpu-rwsem cleanupHou Tao
When executing the LOCK06 locktorture scenario featuring percpu-rwsem, the RCU callback rcu_sync_func() may still be pending after locktorture module is removed. This can in turn lead to the following Oops: BUG: unable to handle page fault for address: ffffffffc00eb920 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 6500a067 P4D 6500a067 PUD 6500c067 PMD 13a36c067 PTE 800000013691c163 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:rcu_cblist_dequeue+0x12/0x30 Call Trace: <IRQ> rcu_core+0x1b1/0x860 __do_softirq+0xfe/0x326 asm_call_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x5f/0x80 irq_exit_rcu+0xaf/0xc0 sysvec_apic_timer_interrupt+0x2e/0xb0 asm_sysvec_apic_timer_interrupt+0x12/0x20 This commit avoids tis problem by adding an exit hook in lock_torture_ops and using it to call percpu_free_rwsem() for percpu rwsem torture during the module-cleanup function, thus ensuring that rcu_sync_func() completes before module exits. It is also necessary to call the exit hook if lock_torture_init() fails half-way, so this commit also adds an ->init_called field in lock_torture_cxt to indicate that exit hook, if present, must be called. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06scftorture: Add full-test stutter capabilityPaul E. McKenney
In virtual environments on systems with hardware assist, inter-processor interrupts must do very different things based on whether the target vCPU is running or not. This commit therefore enables torture-test stuttering to better test these running/not-running transitions. Suggested-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Small code cleanupsPaul E. McKenney
The rcu_torture_cleanup() function fails to NULL out the reader_tasks pointer after freeing it and its fakewriter_tasks loop has redundant braces. This commit therefore cleans these up. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Make stutter_wait() caller restore priorityPaul E. McKenney
Currently, stutter_wait() will happily spin waiting for the stutter interval to end even if the caller is running at a real-time priority level. This could starve normal-priority tasks for no good reason. This commit therefore drops the calling task's priority to SCHED_OTHER MAX_NICE if stutter_wait() needs to wait. But when it waits, stutter_wait() returns true, which allows the caller to restore the priority if needed. Callers that were already running at SCHED_OTHER MAX_NICE obviously do not need any changes, but this commit also restores priority for higher-priority callers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Prevent hangs for invalid argumentsPaul E. McKenney
If an rcutorture torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a rcu_torture_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06locktorture: Prevent hangs for invalid argumentsPaul E. McKenney
If an locktorture torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a lock_torture_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06locktorture: Ignore nreaders_stress if no readlock supportHou Tao
Exclusive locks do not have readlock support, which means that a locktorture run with the following module parameters will do nothing: torture_type=mutex_lock nwriters_stress=0 nreaders_stress=1 This commit therefore rejects this combination for exclusive locks by returning -EINVAL during module init. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06refscale: Prevent hangs for invalid argumentsPaul E. McKenney
If an refscale torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a ref_scale_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcuscale: Prevent hangs for invalid argumentsPaul E. McKenney
If an rcuscale torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a rcu_scale_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcuscale: Add RCU Tasks TracePaul E. McKenney
This commit adds the ability to test performance and scalability of RCU Tasks Trace updaters. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06scftorture: Add an alternative IPI vectorPaul E. McKenney
The scftorture tests currently use only smp_call_function() and friends, which means that these tests cannot locate bugs caused by interactions between different IPI vectors. This commit therefore adds the rescheduling IPI to the mix. Note that this commit permits resched_cpus() only when scftorture is built in. This is a workaround. Longer term, this will use real wakeups rather than resched_cpu(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06torture: Make torture_stutter() use hrtimerPaul E. McKenney
The torture_stutter() function uses schedule_timeout_interruptible() to time the stutter duration, but this can miss race conditions due to its being time-synchronized with everything else that is based on the timer wheels. This commit therefore converts torture_stutter() to use the high-resolution timers via schedule_hrtimeout(), and also to fuzz the stutter interval. While in the area, this commit also limits the spin-loop portion of the stutter_wait() function's wait loop to two jiffies, down from about one second. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06torture: Periodically pause in stutter_wait()Paul E. McKenney
Running locktorture scenario LOCK05 results in hangs: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05 The lock_torture_writer() kthreads set themselves to MAX_NICE while running SCHED_OTHER. Other locktorture kthreads run at default niceness, also SCHED_OTHER. This results in these other locktorture kthreads indefinitely preempting the lock_torture_writer() kthreads. Note that the cond_resched() in the stutter_wait() function's loop is ineffective because this scenario is built with CONFIG_PREEMPT=y. It is not clear that such indefinite preemption is supposed to happen, but in the meantime this commit prevents kthreads running in stutter_wait() from being completely CPU-bound, thus allowing the other threads to get some CPU in a timely fashion. This commit also uses hrtimers to provide very short sleeps to avoid degrading the sudden-on testing that stutter is supposed to provide. Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06locktorture: Track time of last ->writeunlock()Paul E. McKenney
This commit adds a last_lock_release variable that tracks the time of the last ->writeunlock() call, which allows easier diagnosing of lock hangs when using a kernel debugger. Acked-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUsPaul E. McKenney
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to learn their clock frequency. Which is a bit strange, given that waking them from idle likely significantly changes their clock frequency. This commit therefore avoids sending /proc/cpuinfo-induced IPIs to idle CPUs. [ paulmck: Also check for idle in arch_freq_prepare_all(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
2020-11-06bpf: Update verification logic for LSM programsKP Singh
The current logic checks if the name of the BTF type passed in attach_btf_id starts with "bpf_lsm_", this is not sufficient as it also allows attachment to non-LSM hooks like the very function that performs this check, i.e. bpf_lsm_verify_prog. In order to ensure that this verification logic allows attachment to only LSM hooks, the LSM_HOOK definitions in lsm_hook_defs.h are used to generate a BTF_ID set. Upon verification, the attach_btf_id of the program being attached is checked for presence in this set. Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution") Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201105230651.2621917-1-kpsingh@chromium.org
2020-11-06bpf: Implement get_current_task_btf and RET_PTR_TO_BTF_IDKP Singh
The currently available bpf_get_current_task returns an unsigned integer which can be used along with BPF_CORE_READ to read data from the task_struct but still cannot be used as an input argument to a helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct. In order to implement this helper a new return type, RET_PTR_TO_BTF_ID, is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not require checking the nullness of returned pointer. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201106103747.2780972-6-kpsingh@chromium.org
2020-11-06bpf: Implement task local storageKP Singh
Similar to bpf_local_storage for sockets and inodes add local storage for task_struct. The life-cycle of storage is managed with the life-cycle of the task_struct. i.e. the storage is destroyed along with the owning task with a callback to the bpf_task_storage_free from the task_free LSM hook. The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the security blob which are now stackable and can co-exist with other LSMs. The userspace map operations can be done by using a pid fd as a key passed to the lookup, update and delete operations. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201106103747.2780972-3-kpsingh@chromium.org
2020-11-06bpf: Allow LSM programs to use bpf spin locksKP Singh
Usage of spin locks was not allowed for tracing programs due to insufficient preemption checks. The verifier does not currently prevent LSM programs from using spin locks, but the helpers are not exposed via bpf_lsm_func_proto. Based on the discussion in [1], non-sleepable LSM programs should be able to use bpf_spin_{lock, unlock}. Sleepable LSM programs can be preempted which means that allowng spin locks will need more work (disabling preemption and the verifier ensuring that no sleepable helpers are called when a spin lock is held). [1]: https://lore.kernel.org/bpf/20201103153132.2717326-1-kpsingh@chromium.org/T/#md601a053229287659071600d3483523f752cd2fb Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201106103747.2780972-2-kpsingh@chromium.org
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: live-patching@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06ftrace: Reverse what the RECURSION flag means in the ftrace_opsSteven Rostedt (VMware)
Now that all callbacks are recursion safe, reverse the meaning of the RECURSION flag and rename it from RECURSION_SAFE to simply RECURSION. Now only callbacks that request to have recursion protecting it will have the added trampoline to do so. Also remove the outdated comment about "PER_CPU" when determining to use the ftrace_ops_assist_func. Link: https://lkml.kernel.org/r/20201028115613.742454631@goodmis.org Link: https://lkml.kernel.org/r/20201106023547.904270143@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06perf/ftrace: Check for rcu_is_watching() in callback functionSteven Rostedt (VMware)
If a ftrace callback requires "rcu_is_watching", then it adds the FTRACE_OPS_FL_RCU flag and it will not be called if RCU is not "watching". But this means that it will use a trampoline when called, and this slows down the function tracing a tad. By checking rcu_is_watching() from within the callback, it no longer needs the RCU flag set in the ftrace_ops and it can be safely called directly. Link: https://lkml.kernel.org/r/20201028115613.591878956@goodmis.org Link: https://lkml.kernel.org/r/20201106023547.711035826@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06perf/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/20201028115613.444477858@goodmis.org Link: https://lkml.kernel.org/r/20201106023547.466892083@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06livepatch: Trigger WARNING if livepatch function fails due to recursionSteven Rostedt (VMware)
If for some reason a function is called that triggers the recursion detection of live patching, trigger a warning. By not executing the live patch code, it is possible that the old unpatched function will be called placing the system into an unknown state. Link: https://lore.kernel.org/r/20201029145709.GD16774@alley Link: https://lkml.kernel.org/r/20201106023547.312639435@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: live-patching@vger.kernel.org Suggested-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06livepatch/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/20201028115613.291169246@goodmis.org Link: https://lkml.kernel.org/r/20201106023547.122802424@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: live-patching@vger.kernel.org Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06ftrace: Add ftrace_test_recursion_trylock() helper functionSteven Rostedt (VMware)
To make it easier for ftrace callbacks to have recursion protection, provide a ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() helper that tests for recursion. Link: https://lkml.kernel.org/r/20201028115612.634927593@goodmis.org Link: https://lkml.kernel.org/r/20201106023546.378584067@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06ftrace: Move the recursion testing into global headersSteven Rostedt (VMware)
Currently, if a callback is registered to a ftrace function and its ftrace_ops does not have the RECURSION flag set, it is encapsulated in a helper function that does the recursion for it. Really, all the callbacks should have their own recursion protection for performance reasons. But they should not all implement their own. Move the recursion helpers to global headers, so that all callbacks can use them. Link: https://lkml.kernel.org/r/20201028115612.460535535@goodmis.org Link: https://lkml.kernel.org/r/20201106023546.166456258@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06printk: remove unneeded dead-store assignmentLukas Bulwahn
make clang-analyzer on x86_64 defconfig caught my attention with: kernel/printk/printk_ringbuffer.c:885:3: warning: Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores] desc = to_desc(desc_ring, head_id); ^ Commit b6cf8b3f3312 ("printk: add lockless ringbuffer") introduced desc_reserve() with this unneeded dead-store assignment. As discussed with John Ogness privately, this is probably just some minor left-over from previous iterations of the ringbuffer implementation. So, simply remove this unneeded dead assignment to make clang-analyzer happy. As compilers will detect this unneeded assignment and optimize this anyway, the resulting object code is identical before and after this change. No functional change. No change to object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
2020-11-05bpf: Lift hashtab key_size limitFlorian Lehner
Currently key_size of hashtab is limited to MAX_BPF_STACK. As the key of hashtab can also be a value from a per cpu map it can be larger than MAX_BPF_STACK. The use-case for this patch originates to implement allow/disallow lists for files and file paths. The maximum length of file paths is defined by PATH_MAX with 4096 chars including nul. This limit exceeds MAX_BPF_STACK. Changelog: v5: - Fix cast overflow v4: - Utilize BPF skeleton in tests - Rebase v3: - Rebase v2: - Add a test for bpf side Signed-off-by: Florian Lehner <dev@der-flo.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201029201442.596690-1-dev@der-flo.net
2020-11-05bpf: Zero-fill re-used per-cpu map elementDavid Verbeiren
Zero-fill element values for all other cpus than current, just as when not using prealloc. This is the only way the bpf program can ensure known initial values for all cpus ('onallcpus' cannot be set when coming from the bpf program). The scenario is: bpf program inserts some elements in a per-cpu map, then deletes some (or userspace does). When later adding new elements using bpf_map_update_elem(), the bpf program can only set the value of the new elements for the current cpu. When prealloc is enabled, previously deleted elements are re-used. Without the fix, values for other cpus remain whatever they were when the re-used entry was previously freed. A selftest is added to validate correct operation in above scenario as well as in case of LRU per-cpu map element re-use. Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Signed-off-by: David Verbeiren <david.verbeiren@tessares.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net
2020-11-05bpf: BPF_PRELOAD depends on BPF_SYSCALLRandy Dunlap
Fix build error when BPF_SYSCALL is not set/enabled but BPF_PRELOAD is by making BPF_PRELOAD depend on BPF_SYSCALL. ERROR: modpost: "bpf_preload_ops" [kernel/bpf/preload/bpf_preload.ko] undefined! Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201105195109.26232-1-rdunlap@infradead.org