summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-03-26srcu: Check for in-flight callbacks in _cleanup_srcu_struct()Paul E. McKenney
If someone fails to drain the corresponding SRCU callbacks (for example, by failing to invoke srcu_barrier()) before invoking either cleanup_srcu_struct() or cleanup_srcu_struct_quiesced(), the resulting diagnostic is an ambiguous use-after-free diagnostic, and even then only if you are running something like KASAN. This commit therefore improves SRCU diagnostics by adding checks for in-flight callbacks at _cleanup_srcu_struct() time. Note that these diagnostics can still be defeated, for example, by invoking call_srcu() concurrently with cleanup_srcu_struct(). Which is a really bad idea, but sometimes all too easy to do. But even then, these diagnostics have at least some probability of catching the problem. Reported-by: Sagi Grimberg <sagi@grimberg.me> Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: Bart Van Assche <bvanassche@acm.org>
2019-03-26rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_specialPaul E. McKenney
The task_struct structure's ->rcu_read_unlock_special field is only ever read or written by the owning task, but it is accessed both at process and interrupt levels. It may therefore be accessed using plain reads and writes while interrupts are disabled, but must be accessed using READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a few adjustments to align with this discipline. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Fix typo in tree_exp.h commentPaul E. McKenney
This commit changes a rcu_exp_handler() comment from rcu_preempt_defer_qs() to rcu_preempt_deferred_qs() in order to better match reality. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Eliminate redundant NULL-pointer checkPaul E. McKenney
Because rcu_wake_cond() checks for a null task_struct pointer, there is no need for its callers to do so. This commit eliminates the redundant check. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Fix force_qs_rnp() header commentZhouyi Zhou
Previously, threads blocked on offlining CPUS were migrated to the root rcu_node structure, thus requiring RCU priority boosting on this structure. However, since commit d19fb8d1f3f6 ("rcu: Don't migrate blocked tasks even if all corresponding CPUs offline"), RCU does not migrate blocked tasks. Consequently, RCU no longer does RCU priority boosting on the root rcu_node structure as of commit 1be0085b515e ("rcu: Don't initiate RCU priority boosting on root rcu_node"). This commit therefore brings comments for the force_qs_rnp() function's header comment in line with this new no-root-boosting reality. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> [ paulmck: Also remove obsolete comment on suppressing new grace periods. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Update jiffies_to_sched_qs and adjust_jiffies_till_sched_qs() commentsPaul E. McKenney
This commit better documents the jiffies_to_sched_qs default-value strategy used by adjust_jiffies_till_sched_qs() Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Default jiffies_to_sched_qs to jiffies_till_sched_qsNeeraj Upadhyay
The current code only calls adjust_jiffies_till_sched_qs() if jiffies_till_sched_qs is left at its default value, so when the jiffies_till_sched_qs kernel-boot parameter actually is specified, jiffies_to_sched_qs will be left with the value zero, which will result in useless slowdowns of cond_resched(). This commit therefore changes rcu_init_geometry() to unconditionally invoke adjust_jiffies_till_sched_qs(), which ensures that jiffies_to_sched_qs will be initialized in all cases, thus maintaining good cond_resched() performance. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Fix self-wakeups for grace-period kthreadNeeraj Upadhyay
The current rcu_gp_kthread_wake() function uses in_interrupt() and thus does a self-wakeup from all interrupt contexts, including the pointless case where the GP kthread happens to be running with bottom halves disabled, along with the impossible case where the GP kthread is running within an NMI handler (you are not supposed to invoke rcu_gp_kthread_wake() from within an NMI handler. This commit therefore replaces the in_interrupt() with in_irq(), so that the self-wakeups happen only from handlers for hardware interrupts and softirqs. This also makes the code match the comment. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26rcu: Report error for bad rcu_nocbs= parameter valuesPaul E. McKenney
This commit prints a console message when cpulist_parse() reports a bad list of CPUs, and sets all CPUs' bits in that case. The reason for setting all CPUs' bits is that this is the safe(r) choice for real-time workloads, which would normally be the ones using the rcu_nocbs= kernel boot parameter. Either way, later RCU console log messages list the actual set of CPUs whose RCU callbacks will be offloaded. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Allow rcu_nocbs= to specify all CPUsPaul E. McKenney
Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Move common code out of if-else blockAkira Yokosawa
As the result of recent addition of "rdp->core_needs_qs = false;" in the "if" block, now both branches of the if-else have the same assignment. Factor it out and reduce line count. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-03-26rcu: Set rcutree.kthread_prio sysfs access to read-onlyLiu Song
The rcutree.kthread_prio kernel-boot parameter is used to set the priority for boost (rcub), per-CPU (rcuc), and grace-period (rcu_preempt or rcu_sched) kthreads. It is also used by rcutorture to check whether it is possible to meaningfully test RCU priority boosting. However, all of these cases will either ignore or be confused by any post-boot changes to rcutree.kthread_prio. Note that the user really can change the priorities of all of these kthreads using chrt, given sufficient privileges. Therefore, the read-write nature of sysfs access to rcutree.kthread_prio is thus at best an attractive nuisance. This commit therefore changes sysfs access to rcutree.kthread_prio to be read-only. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Make exit_rcu() handle non-preempted RCU readersPaul E. McKenney
The purpose of exit_rcu() is to handle cases where buggy code causes a task to exit within an RCU read-side critical section. It currently does that in the case where said RCU read-side critical section was preempted at least once, but fails to handle cases where preemption did not occur. This case needs to be handled because otherwise the final context switch away from the exiting task will incorrectly behave as if task exit were instead a preemption of an RCU read-side critical section, and will therefore queue the exiting task. The exiting task will have exited, and thus won't ever execute rcu_read_unlock(), which means that it will remain queued forever, blocking all subsequent grace periods, and eventually resulting in OOM. Although this is arguably better than letting grace periods proceed and having a later rcu_read_unlock() access the now-freed task structure that once belonged to the exiting tasks, it would obviously be better to correctly handle this case. This commit therefore sets ->rcu_read_lock_nesting to 1 in that case, so that the subsequence call to __rcu_read_unlock() causes the exiting task to exit its dangling RCU read-side critical section. Note that deferred quiescent states need not be considered. The reason is that removing the task from the ->blkd_tasks[] list in the call to rcu_preempt_deferred_qs() handles the per-task component of any deferred quiescent state, and all other components of any deferred quiescent state are associated with the CPU, which isn't going anywhere until some later CPU-hotplug operation, which will report any remaining deferred quiescent states from within the rcu_report_dead() function. Note also that negative values of ->rcu_read_lock_nesting need not be considered. First, these won't show up in exit_rcu() unless there is a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets the state so that the RCU read-side critical section will be exited normally. Again, this code has no effect unless there has been some prior bug that prevents a task from leaving an RCU read-side critical section before exiting. Furthermore, there have been no reports of the bug fixed by this commit appearing in production. This commit is therefore absolutely -not- recommended for backporting to -stable. Reported-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in> Reported-by: BHARATH Y MOURYA <bharathm@iisc.ac.in> Reported-by: Aravinda Prasad <aravinda@iisc.ac.in> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
2019-03-26rcu: rcu_qs -- Use raise_softirq_irqoff to not save irqs twiceCyrill Gorcunov
The rcu_qs is disabling IRQs by self so no need to do the same in raise_softirq but instead we can save some cycles using raise_softirq_irqoff directly. CC: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Avoid unnecessary softirq when system is idleJoel Fernandes (Google)
When there are no callbacks pending on an idle system, I noticed that RCU softirq is continuously firing. During this the cpu_no_qs is set to false, and core_needs_qs is set to true indefinitely. This causes rcu_process_callbacks to be repeatedly called, even though the node corresponding to the CPU has that CPU's mask bit cleared and the system is idle. I believe the race is when such mask clearing is done during idle CPU scan of the quiescent state forcing stage in the kthread instead of the softirq. Since the rnp mask is cleared, but the flags on the CPU's rdp are not cleared, the CPU thinks it still needs to report to core RCU. Cure this by clearing the core_needs_qs flag when the CPU detects that its node is already updated which will avoid the unwanted softirq raises to the benefit of real-time systems. Test: Ran rcutorture for various tree RCU configs. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26rcu: Unconditionally expedite during suspend/hibernatePaul E. McKenney
The rcu_pm_notify() function refuses to switch to/from expedited grace periods on systems with more than 256 CPUs due to the serialized initialization of expedited grace periods. However, expedited grace periods are now initialized in parallel, removing this concern. This commit therefore removes the checks from rcu_pm_notify(), so that expedited grace periods are used unconditionally during suspend/resume and hibernate/wake operations. As always, real-time workloads wishing to completely avoid expedited grace periods can use the rcupdate.rcu_normal= kernel parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26bpf: remove incorrect 'verifier bug' warningPaul Chaignon
The BPF verifier checks the maximum number of call stack frames twice, first in the main CFG traversal (do_check) and then in a subsequent traversal (check_max_stack_depth). If the second check fails, it logs a 'verifier bug' warning and errors out, as the number of call stack frames should have been verified already. However, the second check may fail without indicating a verifier bug: if the excessive function calls reside in dead code, the main CFG traversal may not visit them; the subsequent traversal visits all instructions, including dead code. This case raises the question of how invalid dead code should be treated. This patch implements the conservative option and rejects such code. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Tested-by: Xiao Han <xiao.han@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-26ftrace: Fix warning using plain integer as NULL & spelling correctionsHariprasad Kelam
Changed 0 --> NULL to avoid sparse warning Corrected spelling mistakes reported by checkpatch.pl Sparse warning below: sudo make C=2 CF=-D__CHECK_ENDIAN__ M=kernel/trace CHECK kernel/trace/ftrace.c kernel/trace/ftrace.c:3007:24: warning: Using plain integer as NULL pointer kernel/trace/ftrace.c:4758:37: warning: Using plain integer as NULL pointer Link: http://lkml.kernel.org/r/20190323183523.GA2244@hari-Inspiron-1545 Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26tracing: initialize variable in create_dyn_event()Frank Rowand
Fix compile warning in create_dyn_event(): 'ret' may be used uninitialized in this function [-Wuninitialized]. Link: http://lkml.kernel.org/r/1553237900-8555-1-git-send-email-frowand.list@gmail.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 5448d44c3855 ("tracing: Add unified dynamic event framework") Signed-off-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26tracing: Remove unnecessary var_ref destroy in track_data_destroy()Tom Zanussi
Commit 656fe2ba85e8 (tracing: Use hist trigger's var_ref array to destroy var_refs) centralized the destruction of all the var_refs in one place so that other code didn't have to do it. The track_data_destroy() added later ignored that and also destroyed the track_data var_ref, causing a double-free error flagged by KASAN. ================================================================== BUG: KASAN: use-after-free in destroy_hist_field+0x30/0x70 Read of size 8 at addr ffff888086df2210 by task bash/1694 CPU: 6 PID: 1694 Comm: bash Not tainted 5.1.0-rc1-test+ #15 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: dump_stack+0x71/0xa0 ? destroy_hist_field+0x30/0x70 print_address_description.cold.3+0x9/0x1fb ? destroy_hist_field+0x30/0x70 ? destroy_hist_field+0x30/0x70 kasan_report.cold.4+0x1a/0x33 ? __kasan_slab_free+0x100/0x150 ? destroy_hist_field+0x30/0x70 destroy_hist_field+0x30/0x70 track_data_destroy+0x55/0xe0 destroy_hist_data+0x1f0/0x350 hist_unreg_all+0x203/0x220 event_trigger_open+0xbb/0x130 do_dentry_open+0x296/0x700 ? stacktrace_count_trigger+0x30/0x30 ? generic_permission+0x56/0x200 ? __x64_sys_fchdir+0xd0/0xd0 ? inode_permission+0x55/0x200 ? security_inode_permission+0x18/0x60 path_openat+0x633/0x22b0 ? path_lookupat.isra.50+0x420/0x420 ? __kasan_kmalloc.constprop.12+0xc1/0xd0 ? kmem_cache_alloc+0xe5/0x260 ? getname_flags+0x6c/0x2a0 ? do_sys_open+0x149/0x2b0 ? do_syscall_64+0x73/0x1b0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ? _raw_write_lock_bh+0xe0/0xe0 ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x2f/0x50 ? __list_add_valid+0x2d/0x70 ? deactivate_slab.isra.62+0x1f4/0x5a0 ? getname_flags+0x6c/0x2a0 ? set_track+0x76/0x120 do_filp_open+0x11a/0x1a0 ? may_open_dev+0x50/0x50 ? _raw_spin_lock+0x7a/0xd0 ? _raw_write_lock_bh+0xe0/0xe0 ? __alloc_fd+0x10f/0x200 do_sys_open+0x1db/0x2b0 ? filp_open+0x50/0x50 do_syscall_64+0x73/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa7b24a4ca2 Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 85 7a 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 RSP: 002b:00007fffbafb3af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 000055d3648ade30 RCX: 00007fa7b24a4ca2 RDX: 0000000000000241 RSI: 000055d364a55240 RDI: 00000000ffffff9c RBP: 00007fffbafb3bf0 R08: 0000000000000020 R09: 0000000000000002 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000003 R14: 0000000000000001 R15: 000055d364a55240 ================================================================== So remove the track_data_destroy() destroy_hist_field() call for that var_ref. Link: http://lkml.kernel.org/r/1deffec420f6a16d11dd8647318d34a66d1989a9.camel@linux.intel.com Fixes: 466f4528fbc69 ("tracing: Generalize hist trigger onmax and save action") Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26bpf: fix use after free in bpf_evict_inodeDaniel Borkmann
syzkaller was able to generate the following UAF in bpf: BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline] BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318 Read of size 1 at addr ffff8801c4865c47 by task syz-executor2/9423 CPU: 0 PID: 9423 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+ #110 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 lookup_last fs/namei.c:2269 [inline] path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318 filename_lookup+0x26a/0x520 fs/namei.c:2348 user_path_at_empty+0x40/0x50 fs/namei.c:2608 user_path include/linux/namei.h:62 [inline] do_mount+0x180/0x1ff0 fs/namespace.c:2980 ksys_mount+0x12d/0x140 fs/namespace.c:3258 __do_sys_mount fs/namespace.c:3272 [inline] __se_sys_mount fs/namespace.c:3269 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3269 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457569 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fde6ed96c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569 RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000 RBP: 000000000072bf00 R08: 0000000020000340 R09: 0000000000000000 R10: 0000000000200000 R11: 0000000000000246 R12: 00007fde6ed976d4 R13: 00000000004c2c24 R14: 00000000004d4990 R15: 00000000ffffffff Allocated by task 9424: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 __do_kmalloc mm/slab.c:3722 [inline] __kmalloc_track_caller+0x157/0x760 mm/slab.c:3737 kstrdup+0x39/0x70 mm/util.c:49 bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356 vfs_symlink+0x37a/0x5d0 fs/namei.c:4127 do_symlinkat+0x242/0x2d0 fs/namei.c:4154 __do_sys_symlink fs/namei.c:4173 [inline] __se_sys_symlink fs/namei.c:4171 [inline] __x64_sys_symlink+0x59/0x80 fs/namei.c:4171 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9425: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3817 bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565 evict+0x4b9/0x980 fs/inode.c:558 iput_final fs/inode.c:1550 [inline] iput+0x674/0xa90 fs/inode.c:1576 do_unlinkat+0x733/0xa30 fs/namei.c:4069 __do_sys_unlink fs/namei.c:4110 [inline] __se_sys_unlink fs/namei.c:4108 [inline] __x64_sys_unlink+0x42/0x50 fs/namei.c:4108 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe In this scenario path lookup under RCU is racing with the final unlink in case of symlinks. As Linus puts it in his analysis: [...] We actually RCU-delay the inode freeing itself, but when we do the final iput(), the "evict()" function is called synchronously. Now, the simple fix would seem to just RCU-delay the kfree() of the symlink data in bpf_evict_inode(). Maybe that's the right thing to do. [...] Al suggested to piggy-back on the ->destroy_inode() callback in order to implement RCU deferral there which can then kfree() the inode->i_link eventually right before putting inode back into inode cache. By reusing free_inode_nonrcu() from there we can avoid the need for our own inode cache and just reuse generic one as we currently do. And in-fact on top of all this we should just get rid of the bpf_evict_inode() entirely. This means truncate_inode_pages_final() and clear_inode() will then simply be called by the fs core via evict(). Dropping the reference should really only be done when inode is unhashed and nothing reachable anymore, so it's better also moved into the final ->destroy_inode() callback. Fixes: 0f98621bef5d ("bpf, inode: add support for symlinks and fix mtime/ctime") Reported-by: syzbot+fb731ca573367b7f6564@syzkaller.appspotmail.com Reported-by: syzbot+a13e5ead792d6df37818@syzkaller.appspotmail.com Reported-by: syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/lkml/0000000000006946d2057bbd0eef@google.com/T/
2019-03-24genirq: Prevent use-after-free and work list corruptionPrasad Sodagudi
When irq_set_affinity_notifier() replaces the notifier, then the reference count on the old notifier is dropped which causes it to be freed. But nothing ensures that the old notifier is not longer queued in the work list. If it is queued this results in a use after free and possibly in work list corruption. Ensure that the work is canceled before the reference is dropped. Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: marc.zyngier@arm.com Link: https://lkml.kernel.org/r/1553439424-6529-1-git-send-email-psodagud@codeaurora.org
2019-03-24timer/trace: Improve timer tracingAnna-Maria Gleixner
Timers are added to the timer wheel off by one. This is required in case a timer is queued directly before incrementing jiffies to prevent early timer expiry. When reading a timer trace and relying only on the expiry time of the timer in the timer_start trace point and on the now in the timer_expiry_entry trace point, it seems that the timer fires late. With the current timer_expiry_entry trace point information only now=jiffies is printed but not the value of base->clk. This makes it impossible to draw a conclusion to the index of base->clk and makes it impossible to examine timer problems without additional trace points. Therefore add the base->clk value to the timer_expire_entry trace point, to be able to calculate the index the timer base is located at during collecting expired timers. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-maria@linutronix.de
2019-03-24timer: Move trace point to get proper indexAnna-Maria Gleixner
When placing the timer_start trace point before the timer wheel bucket index is calculated, the index information in the trace point is useless. It is not possible to simply move the debug_activate() call after the index calculation, because debug_object_activate() needs to be called before touching the object. Therefore split debug_activate() and move the trace point into enqueue_timer() after the new index has been calculated. The debug_object_activate() call remains at the original place. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20190321120921.16463-3-anna-maria@linutronix.de
2019-03-24tick/sched: Update tick_sched struct documentationAnna-Maria Gleixner
Adapt the documentation order of struct members to the effective order of struct members and add missing descriptions. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190321120921.16463-2-anna-maria@linutronix.de
2019-03-24Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: "Third more careful attempt for this set of fixes: - Prevent a 32bit math overflow in the cpufreq code - Fix a buffer overflow when scanning the cgroup2 cpu.max property - A set of fixes for the NOHZ scheduler logic to prevent waking up CPUs even if the capacity of the busy CPUs is sufficient along with other tweaks optimizing the behaviour for asymmetric systems (big/little)" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Skip LLC NOHZ logic for asymmetric systems sched/fair: Tune down misfit NOHZ kicks sched/fair: Comment some nohz_balancer_kick() kick conditions sched/core: Fix buffer overflow in cgroup2 property cpu.max sched/cpufreq: Fix 32-bit math overflow
2019-03-24Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "A larger set of perf updates. Not all of them are strictly fixes, but that's solely the tip maintainers fault as they let the timely -rc1 pull request fall through the cracks for various reasons including travel. So I'm sending this nevertheless because rebasing and distangling fixes and updates would be a mess and risky as well. As of tomorrow, a strict fixes separation is happening again. Sorry for the slip-up. Kernel: - Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different consumers of the mmap event get what they requested. Tools: - A larger set of updates to perf record/report/scripts vs. time stamp handling - More Python3 fixups - A pile of memory leak plumbing - perf BPF improvements and fixes - Finalize the perf.data directory storage" [ Note: the kernel part is strictly a fix, the updates are purely to tooling - Linus ] * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) perf bpf: Show more BPF program info in print_bpf_prog_info() perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog() perf tools: Save bpf_prog_info and BTF of new BPF programs perf evlist: Introduce side band thread perf annotate: Enable annotation of BPF programs perf build: Check what binutils's 'disassembler()' signature to use perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO perf feature detection: Add -lopcodes to feature-libbfd perf top: Add option --no-bpf-event perf bpf: Save BTF information as headers to perf.data perf bpf: Save BTF in a rbtree in perf_env perf bpf: Save bpf_prog_info information as headers to perf.data perf bpf: Save bpf_prog_info in a rbtree in perf_env perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear() bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump() tools lib bpf: Introduce bpf_program__get_prog_info_linear() perf record: Replace option --bpf-event with --no-bpf-event perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test() ...
2019-03-24Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A set of small fixes plus the removal of stale board support code: - Remove the board support code from the clpx711x clocksource driver. This change had fallen through the cracks and I'm sending it now rather than dealing with people who want to improve that stale code for 3 month. - Use the proper clocksource mask on RICSV - Make local scope functions and variables static" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/clps711x: Remove board support clocksource/drivers/riscv: Fix clocksource mask clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction static clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static clocksource/drivers/clps711x: Make clps711x_clksrc_init() static time/jiffies: Make refined_jiffies static
2019-03-24Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two small fixes: - Cure a recently introduces error path hickup which tries to unregister a not registered lockdep key in te workqueue code - Prevent unaligned cmpxchg() crashes in the robust list handling code by sanity checking the user space supplied futex pointer" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Ensure that futex address is aligned in handle_futex_death() workqueue: Only unregister a registered lockdep key
2019-03-24Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem: - Remove secondary GIC support on systems w/o device-tree support - A set of small fixlets in various irqchip drivers - static and fall-through annotations - Kernel doc and typo fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Mark expected switch case fall-through genirq/devres: Remove excess parameter from kernel doc irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static irqchip/mbigen: Don't clear eventid when freeing an MSI irqchip/stm32: Don't set rising configuration registers at init irqchip/stm32: Don't clear rising/falling config registers at init dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support irqchip/mmp: Make mmp_irq_domain_ops static irqchip/brcmstb-l2: Make two init functions static genirq: Fix typo in comment of IRQD_MOVE_PCNTXT irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp irqchip/gic: Drop support for secondary GIC in non-DT systems irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
2019-03-23tick: Remove outgoing CPU from broadcast masksThomas Gleixner
Valentin reported that unplugging a CPU occasionally results in a warning in the tick broadcast code which is triggered when an offline CPU is in the broadcast mask. This happens because the outgoing CPU is not removing itself from the broadcast masks, especially not from the broadcast_force_mask. The removal happens on the control CPU after the outgoing CPU is dead. It's a long standing issue, but the warning is harmless. Rework the hotplug mechanism so that the outgoing CPU removes itself from the broadcast masks after disabling interrupts and removing itself from the online mask. Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de
2019-03-23genirq: Mark expected switch case fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. With -Wimplicit-fallthrough added to CFLAGS: kernel/irq/manage.c: In function ‘irq_do_set_affinity’: kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=] cpumask_copy(desc->irq_common_data.affinity, mask); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/irq/manage.c:199:2: note: here case IRQ_SET_MASK_OK_NOCOPY: ^~~~ Annotate it. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190228213714.GA9246@embeddedor
2019-03-23timekeeping: Consistently use unsigned int for seqcount snapshotRasmus Villemoes
The timekeeping code uses a random mix of "unsigned long" and "unsigned int" for the seqcount snapshots (ratio 14:12). Since the seqlock.h API is entirely based on unsigned int, use that throughout. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190318195557.20773-1-linux@rasmusvillemoes.dk
2019-03-22Merge tag 'perf-core-for-mingo-5.1-20190311' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo: kernel: Stephane Eranian : - Restore mmap record type correctly when handling PERF_RECORD_MMAP2 events, as the same template is used for all the threads interested in mmap events, some may want just PERF_RECORD_MMAP, while some may want the extra info in MMAP2 records. perf probe: Adrian Hunter: - Fix getting the kernel map, because since changes related to x86 PTI entry trampolines handling, there are more than one kernel map. perf script: Andi Kleen: - Support insn output for normal samples, i.e.: perf script -F ip,sym,insn --xed Will fetch the sample IP from the thread address space and feed it to Intel's XED disassembler, producing lines such as: ffffffffa4068804 native_write_msr wrmsr ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx That match 'perf annotate's output. - Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in addition to PERF_RECORD_SAMPLE. perf report: - Add a new --samples option to save a small random number of samples per hist entry, using a reservoir technique to select a representative number of samples. Then allow browsing the samples using 'perf script' as part of the hist entry context menu. This automatically adds the right filters, so only the thread or CPU of the sample is displayed. Then we use less' search functionality to directly jump to the time stamp of the selected sample. It uses different menus for assembler and source display. Assembler needs xed installed and source needs debuginfo. - Fix the UI browser scripts pop up menu when there are many scripts available. perf report: Andi Kleen: - Add 'time' sort option. E.g.: % perf report --sort time,overhead,symbol --time-quantum 1ms --stdio ... 0.67% 277061.87300 [.] _dl_start 0.50% 277061.87300 [.] f1 0.50% 277061.87300 [.] f2 0.33% 277061.87300 [.] main 0.29% 277061.87300 [.] _dl_lookup_symbol_x 0.29% 277061.87300 [.] dl_main 0.29% 277061.87300 [.] do_lookup_x 0.17% 277061.87300 [.] _dl_debug_initialize 0.17% 277061.87300 [.] _dl_init_paths 0.08% 277061.87300 [.] check_match 0.04% 277061.87300 [.] _dl_count_modids 1.33% 277061.87400 [.] f1 1.33% 277061.87400 [.] f2 1.33% 277061.87400 [.] main 1.17% 277061.87500 [.] main 1.08% 277061.87500 [.] f1 1.08% 277061.87500 [.] f2 1.00% 277061.87600 [.] main 0.83% 277061.87600 [.] f1 0.83% 277061.87600 [.] f2 1.00% 277061.87700 [.] main tools headers: Arnaldo Carvalho de Melo: - Update x86's syscall_64.tbl, no change in tools/perf behaviour. - Sync copies asm-generic/unistd.h and linux/in with the kernel sources. perf data: Jiri Olsa: - Prep work to support having perf.data stored as a directory, with one file per CPU, that ultimately will allow having one ring buffer reading thread per CPU. Vendor events: Martin Liška: - perf PMU events for AMD Family 17h. perf script python: Tony Jones: - Add python3 support for the remaining Intel PT related scripts, with these we should have a clean build of perf with python3 while still supporting the build with python2. libbpf: Arnaldo Carvalho de Melo: - Fix the build on uCLibc, adding the missing stdarg.h since we use va_list in one typedef. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-22genetlink: make policy common to familyJohannes Berg
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22softirq: Remove tasklet_hrtimerThomas Gleixner
There are no more users of this interface. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Link: https://lkml.kernel.org/r/20190301224821.29843-4-bigeasy@linutronix.de
2019-03-22watchdog/core: Make variables staticValdis Kletnieks
sparse complains: CHECK kernel/watchdog.c kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available' was not declared. Should it be static? kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask' was not declared. Should it be static? They're not referenced by name from anyplace else, make them static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
2019-03-22time/jiffies: Make refined_jiffies staticValdis Kletnieks
sparse complains: CHECK kernel/time/jiffies.c kernel/time/jiffies.c:92:20: warning: symbol 'refined_jiffies' was not declared. Should it be static? Its only used in file scope. Make it static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/32342.1552379915@turing-police
2019-03-22genirq/devres: Remove excess parameter from kernel docValdis Kletnieks
Building with 'make W=1' complains: CC kernel/irq/devres.o kernel/irq/devres.c:104: warning: Excess function parameter 'thread_fn' description in 'devm_request_any_context_irq' Remove it. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/31207.1552378676@turing-police
2019-03-22futex: Ensure that futex address is aligned in handle_futex_death()Chen Jie
The futex code requires that the user space addresses of futexes are 32bit aligned. sys_futex() checks this in futex_get_keys() but the robust list code has no alignment check in place. As a consequence the kernel crashes on architectures with strict alignment requirements in handle_futex_death() when trying to cmpxchg() on an unaligned futex address which was retrieved from the robust list. [ tglx: Rewrote changelog, proper sizeof() based alignement check and add comment ] Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core") Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <dvhart@infradead.org> Cc: <peterz@infradead.org> Cc: <zengweilin@huawei.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
2019-03-21bpf: verifier: propagate liveness on all framesJakub Kicinski
Commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences") connected up parentage chains of all frames of the stack. It didn't, however, ensure propagate_liveness() propagates all liveness information along those chains. This means pruning happening in the callee may generate explored states with incomplete liveness for the chains in lower frames of the stack. The included selftest is similar to the prior one from commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences"), where callee would prune regardless of the difference in r8 state. Now we also initialize r9 to 0 or 1 based on a result from get_random(). r9 is never read so the walk with r9 = 0 gets pruned (correctly) after the walk with r9 = 1 completes. The selftest is so arranged that the pruning will happen in the callee. Since callee does not propagate read marks of r8, the explored state at the pruning point prior to the callee will now ignore r8. Propagate liveness on all frames of the stack when pruning. Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: add skc_lookup_tcp helperLorenz Bauer
Allow looking up a sock_common. This gives eBPF programs access to timewait and request sockets. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: allow helpers to return PTR_TO_SOCK_COMMONLorenz Bauer
It's currently not possible to access timewait or request sockets from eBPF, since there is no way to return a PTR_TO_SOCK_COMMON from a helper. Introduce RET_PTR_TO_SOCK_COMMON to enable this behaviour. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: track references based on is_acquire_funcLorenz Bauer
So far, the verifier only acquires reference tracking state for RET_PTR_TO_SOCKET_OR_NULL. Instead of extending this for every new return type which desires these semantics, acquire reference tracking state iff the called helper is an acquire function. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21bpf: do not restore dst_reg when cur_state is freedXu Yu
Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug. Call trace: dump_stack+0xbf/0x12e print_address_description+0x6a/0x280 kasan_report+0x237/0x360 sanitize_ptr_alu+0x85a/0x8d0 adjust_ptr_min_max_vals+0x8f2/0x1ca0 adjust_reg_min_max_vals+0x8ed/0x22e0 do_check+0x1ca6/0x5d00 bpf_check+0x9ca/0x2570 bpf_prog_load+0xc91/0x1030 __se_sys_bpf+0x61e/0x1f00 do_syscall_64+0xc8/0x550 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fault injection trace:  kfree+0xea/0x290  free_func_state+0x4a/0x60  free_verifier_state+0x61/0xe0  push_stack+0x216/0x2f0 <- inject failslab  sanitize_ptr_alu+0x2b1/0x8d0  adjust_ptr_min_max_vals+0x8f2/0x1ca0  adjust_reg_min_max_vals+0x8ed/0x22e0  do_check+0x1ca6/0x5d00  bpf_check+0x9ca/0x2570  bpf_prog_load+0xc91/0x1030  __se_sys_bpf+0x61e/0x1f00  do_syscall_64+0xc8/0x550  entry_SYSCALL_64_after_hwframe+0x49/0xbe When kzalloc() fails in push_stack(), free_verifier_state() will free current verifier state. As push_stack() returns, dst_reg was restored if ptr_is_dst_reg is false. However, as member of the cur_state, dst_reg is also freed, and error occurs when dereferencing dst_reg. Simply fix it by testing ret of push_stack() before restoring dst_reg. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-21workqueue: Only unregister a registered lockdep keyBart Van Assche
The recent change to prevent use after free and a memory leak introduced an unconditional call to wq_unregister_lockdep() in the error handling path. If the lockdep key had not been registered yet, then the lockdep core emits a warning. Only call wq_unregister_lockdep() if wq_register_lockdep() has been called first. Fixes: 009bb421b6ce ("workqueue, lockdep: Fix an alloc_workqueue() error path") Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Qian Cai <cai@lca.pw> Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org
2019-03-20bpf: Only print ref_obj_id for refcounted regMartin KaFai Lau
Naresh reported that test_align fails because of the mismatch at the verbose printout of the register states. The reason is due to the newly added ref_obj_id. ref_obj_id is only useful for refcounted reg. Thus, this patch fixes it by only printing ref_obj_id for refcounted reg. While at it, it also uses comma instead of space to separate between "id" and "ref_obj_id". Fixes: 1b986589680a ("bpf: Fix bpf_tcp_sock and bpf_sk_fullsock issue related to bpf_sk_release") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-20syscall_get_arch: add "struct task_struct *" argumentDmitry V. Levin
This argument is required to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going to be called from ptrace_request() along with syscall_get_nr(), syscall_get_arguments(), syscall_get_error(), and syscall_get_return_value() functions with a tracee as their argument. The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6) should describe what system call is being called and what its arguments are. Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments") Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()") Reviewed-by: Andy Lutomirski <luto@kernel.org> # for x86 Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Kees Cook <keescook@chromium.org> # seccomp parts Acked-by: Mark Salter <msalter@redhat.com> # for the c6x bit Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: x86@kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Cc: linux-audit@redhat.com Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20audit: Make audit_log_cap and audit_copy_inode staticYueHaibing
Fix sparse warning: kernel/auditsc.c:1150:6: warning: symbol 'audit_log_cap' was not declared. Should it be static? kernel/auditsc.c:1908:6: warning: symbol 'audit_copy_inode' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20audit: connect LOGIN record to its syscall recordRichard Guy Briggs
Currently the AUDIT_LOGIN event is a standalone record that isn't connected to any other records that may be part of its syscall event. To avoid the confusion of generating two events, connect the records by using its syscall context. Please see the github issue https://github.com/linux-audit/audit-kernel/issues/110 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>