summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-07-25sched/debug: Show the sum wait time of a task groupYun Wang
Although we can rely on cpuacct to present the CPU usage of task groups, it is hard to tell how intense the competition is between these groups on CPU resources. Monitoring the wait time or sched_debug of each process could be very expensive, and there is no good way to accurately represent the conflict with these info, we need the wait time on group dimension. Thus we introduce group's wait_sum to represent the resource conflict between task groups, which is simply the sum of the wait time of the group's cfs_rq. The 'cpu.stat' is modified to show the statistic, like: nr_periods 0 nr_throttled 0 throttled_time 0 wait_sum 2035098795584 Now we can monitor the changes of wait_sum to tell how much a a task group is suffering in the fight of CPU resources. For example: (wait_sum - last_wait_sum) * 100 / (nr_cpu * period_ns) == X% means the task group paid X percentage of period on waiting for the CPU. Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ff7dae3b-e5f9-7157-1caa-ff02c6b23dc1@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/fair: Remove #ifdefs from scale_rt_capacity()Vincent Guittot
Reuse cpu_util_irq() that has been defined for schedutil and set irq util to 0 when !CONFIG_IRQ_TIME_ACCOUNTING. But the compiler is not able to optimize the sequence (at least with aarch64 GCC 7.2.1): free *= (max - irq); free /= max; when irq is fixed to 0 Add a new inline function scale_irq_capacity() that will scale utilization when irq is accounted. Reuse this funciton in schedutil which applies similar formula. Suggested-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1532001606-6689-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25Merge branch 'sched/urgent' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHAREHailong Liu
NO_RT_RUNTIME_SHARE feature is used to prevent a CPU borrow enough runtime with a spin-rt-task. However, if RT_RUNTIME_SHARE feature is enabled and rt_rq has borrowd enough rt_runtime at the beginning, rt_runtime can't be restored to its initial bandwidth rt_runtime after we disable RT_RUNTIME_SHARE. E.g. on my PC with 4 cores, procedure to reproduce: 1) Make sure RT_RUNTIME_SHARE is enabled cat /sys/kernel/debug/sched_features GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS 2) Start a spin-rt-task ./loop_rr & 3) set affinity to the last cpu taskset -p 8 $pid_of_loop_rr 4) Observe that last cpu have borrowed enough runtime. cat /proc/sched_debug | grep rt_runtime .rt_runtime : 950.000000 .rt_runtime : 900.000000 .rt_runtime : 950.000000 .rt_runtime : 1000.000000 5) Disable RT_RUNTIME_SHARE echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features 6) Observe that rt_runtime can not been restored cat /proc/sched_debug | grep rt_runtime .rt_runtime : 950.000000 .rt_runtime : 900.000000 .rt_runtime : 950.000000 .rt_runtime : 1000.000000 This patch help to restore rt_runtime after we disable RT_RUNTIME_SHARE. Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: zhong.weidong@zte.com.cn Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/deadline: Update rq_clock of later_rq when pushing a taskDaniel Bristot de Oliveira
Daniel Casini got this warn while running a DL task here at RetisLab: [ 461.137582] ------------[ cut here ]------------ [ 461.137583] rq->clock_update_flags < RQCF_ACT_SKIP [ 461.137599] WARNING: CPU: 4 PID: 2354 at kernel/sched/sched.h:967 assert_clock_updated.isra.32.part.33+0x17/0x20 [a ton of modules] [ 461.137646] CPU: 4 PID: 2354 Comm: label_image Not tainted 4.18.0-rc4+ #3 [ 461.137647] Hardware name: ASUS All Series/Z87-K, BIOS 0801 09/02/2013 [ 461.137649] RIP: 0010:assert_clock_updated.isra.32.part.33+0x17/0x20 [ 461.137649] Code: ff 48 89 83 08 09 00 00 eb c6 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 98 7a 6c a5 c6 05 bc 0d 54 01 01 48 89 e5 e8 a9 84 fb ff <0f> 0b 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 83 7e 60 01 74 0a 48 3b [ 461.137673] RSP: 0018:ffffa77e08cafc68 EFLAGS: 00010082 [ 461.137674] RAX: 0000000000000000 RBX: ffff8b3fc1702d80 RCX: 0000000000000006 [ 461.137674] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8b3fded164b0 [ 461.137675] RBP: ffffa77e08cafc68 R08: 0000000000000026 R09: 0000000000000339 [ 461.137676] R10: ffff8b3fd060d410 R11: 0000000000000026 R12: ffffffffa4e14e20 [ 461.137677] R13: ffff8b3fdec22940 R14: ffff8b3fc1702da0 R15: ffff8b3fdec22940 [ 461.137678] FS: 00007efe43ee5700(0000) GS:ffff8b3fded00000(0000) knlGS:0000000000000000 [ 461.137679] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 461.137680] CR2: 00007efe30000010 CR3: 0000000301744003 CR4: 00000000001606e0 [ 461.137680] Call Trace: [ 461.137684] push_dl_task.part.46+0x3bc/0x460 [ 461.137686] task_woken_dl+0x60/0x80 [ 461.137689] ttwu_do_wakeup+0x4f/0x150 [ 461.137690] ttwu_do_activate+0x77/0x80 [ 461.137692] try_to_wake_up+0x1d6/0x4c0 [ 461.137693] wake_up_q+0x32/0x70 [ 461.137696] do_futex+0x7e7/0xb50 [ 461.137698] __x64_sys_futex+0x8b/0x180 [ 461.137701] do_syscall_64+0x5a/0x110 [ 461.137703] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 461.137705] RIP: 0033:0x7efe4918ca26 [ 461.137705] Code: 00 00 00 74 17 49 8b 48 20 44 8b 59 10 41 83 e3 30 41 83 fb 20 74 1e be 85 00 00 00 41 ba 01 00 00 00 41 b9 01 00 00 04 0f 05 <48> 3d 01 f0 ff ff 73 1f 31 c0 c3 be 8c 00 00 00 49 89 c8 4d 31 d2 [ 461.137738] RSP: 002b:00007efe43ee4928 EFLAGS: 00000283 ORIG_RAX: 00000000000000ca [ 461.137739] RAX: ffffffffffffffda RBX: 0000000005094df0 RCX: 00007efe4918ca26 [ 461.137740] RDX: 0000000000000001 RSI: 0000000000000085 RDI: 0000000005094e24 [ 461.137741] RBP: 00007efe43ee49c0 R08: 0000000005094e20 R09: 0000000004000001 [ 461.137741] R10: 0000000000000001 R11: 0000000000000283 R12: 0000000000000000 [ 461.137742] R13: 0000000005094df8 R14: 0000000000000001 R15: 0000000000448a10 [ 461.137743] ---[ end trace 187df4cad2bf7649 ]--- This warning happened in the push_dl_task(), because __add_running_bw()->cpufreq_update_util() is getting the rq_clock of the later_rq before its update, which takes place at activate_task(). The fix then is to update the rq_clock before calling add_running_bw(). To avoid double rq_clock_update() call, we set ENQUEUE_NOCLOCK flag to activate_task(). Reported-by: Daniel Casini <daniel.casini@santannapisa.it> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> Fixes: e0367b12674b sched/deadline: Move CPU frequency selection triggering points Link: http://lkml.kernel.org/r/ca31d073a4788acf0684a8b255f14fea775ccf20.1532077269.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25stop_machine: Disable preemption after queueing stopper threadsIsaac J. Manjarres
This commit: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads") does not fully address the race condition that can occur as follows: On one CPU, call it CPU 3, thread 1 invokes cpu_stop_queue_two_works(2, 3,...), and the execution is such that thread 1 queues the works for migration/2 and migration/3, and is preempted after releasing the locks for migration/2 and migration/3, but before waking the threads. Then, On CPU 2, a kworker, call it thread 2, is running, and it invokes cpu_stop_queue_two_works(1, 2,...), such that thread 2 queues the works for migration/1 and migration/2. Meanwhile, on CPU 3, thread 1 resumes execution, and wakes migration/2 and migration/3. This means that when CPU 2 releases the locks for migration/1 and migration/2, but before it wakes those threads, it can be preempted by migration/2. If thread 2 is preempted by migration/2, then migration/2 will execute the first work item successfully, since migration/3 was woken up by CPU 3, but when it goes to execute the second work item, it disables preemption, calls multi_cpu_stop(), and thus, CPU 2 will wait forever for migration/1, which should have been woken up by thread 2. However migration/1 cannot be woken up by thread 2, since it is a kworker, so it is affine to CPU 2, but CPU 2 is running migration/2 with preemption disabled, so thread 2 will never run. Disable preemption after queueing works for stopper threads to ensure that the operation of queueing the works and waking the stopper threads is atomic. Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: matt@codeblueprint.co.uk Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads") Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/topology: Check variable group before dereferencing itYi Wang
The 'group' variable in sched_domain_debug_one() is not checked when firstly used in cpumask_test_cpu(cpu, sched_group_span(group)), but it might be NULL (it is checked later in the following while loop) and may cause NULL pointer dereference. We need to check it before using to avoid NULL dereference. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: zhong.weidong@zte.com.cn Link: http://lkml.kernel.org/r/1532319547-33335-1-git-send-email-wang.yi59@zte.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/rtmutex: Allow specifying a subclass for nested lockingPeter Rosin
Needed for annotating rt_mutex locks. Tested-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Deepa Dinamani <deepadinamani@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Chang <dpf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Wolfram Sang <wsa@the-dreams.de> Link: http://lkml.kernel.org/r/20180720083914.1950-2-peda@axentia.se Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle stations tied to AP_VLANs properly during mac80211 hw reconfig. From Manikanta Pubbisetty. 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo. 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran Ben Elisha. 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann. 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan. 6) Fix cached netdev name leak in nf_tables, from Florian Westphal. 7) Fix memory leaks on chain rename, also from Florian Westphal. 8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk Cheng. 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing. 10) Fix link local address handling with VRF, from David Ahern. 11) Don't clobber 'err' on a successful call to __skb_linearize() in skb_segment(). From Eric Dumazet. 12) Fix vxlan fdb notification races, from Roopa Prabhu. 13) Hash UDP fragments consistently, from Paolo Abeni. 14) If TCP receives lots of out of order tiny packets, we do really silly stuff. Make the out-of-order queue ending more robust to this kind of behavior, from Eric Dumazet. 15) Don't leak netlink dump state in nf_tables, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: axienet: Fix double deregister of mdio qmi_wwan: fix interface number for DW5821e production firmware ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull bnx2x: Fix invalid memory access in rss hash config path. net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper r8169: restore previous behavior to accept BIOS WoL settings cfg80211: never ignore user regulatory hint sock: fix sg page frag coalescing in sk_alloc_sg netfilter: nf_tables: move dumper state allocation into ->start tcp: add tcp_ooo_try_coalesce() helper tcp: call tcp_drop() from tcp_data_queue_ofo() tcp: detect malicious patterns in tcp_collapse_ofo_queue() tcp: avoid collapses in tcp_prune_queue() if possible tcp: free batches of packets in tcp_prune_ofo_queue() ip: hash fragments consistently ipv6: use fib6_info_hold_safe() when necessary can: xilinx_can: fix power management handling can: xilinx_can: fix incorrect clear of non-processed interrupts can: xilinx_can: fix RX overflow interrupt not being enabled can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting ...
2018-07-24cpu/hotplug: detect SMT disabled by BIOSJosh Poimboeuf
If SMT is disabled in BIOS, the CPU code doesn't properly detect it. The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf' vulnerabilities file shows SMT as vulnerable. Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a case. Unfortunately the detection can only be done after bringing all the CPUs online, so we have to overwrite any previous writes to the variable. Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2018-07-24bpf: btf: Ensure the member->offset is in the right orderMartin KaFai Lau
This patch ensures the member->offset of a struct is in the correct order (i.e the later member's offset cannot go backward). The current "pahole -J" BTF encoder does not generate something like this. However, checking this can ensure future encoder will not violate this. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-23mm, dev_pagemap: Do not clear ->mapping on final putDan Williams
MEMORY_DEVICE_FS_DAX relies on typical page semantics whereby ->mapping is only ever cleared by truncation, not final put. Without this fix dax pages may forget their mapping association at the end of every page pin event. Move this atypical behavior that HMM wants into the HMM ->page_free() callback. Cc: <stable@vger.kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: d2c997c0f145 ("fs, dax: use page->mapping...") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-07-23fork: Unconditionally exit if a fatal signal is pendingEric W. Biederman
In practice this does not change anything as testing for fatal_signal_pending and exiting for with an error code duplicates the work of the next clause which recalculates pending signals and then exits fork if any are pending. In both cases the pending signal will trigger the slow path when existing to userspace, and the fatal signal will cause do_exit to be called. The advantage of making this a separate test is that it makes it clear processing the fatal signal will terminate the fork, and it allows the rest of the signal logic to be updated without fear that this important case will be lost. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-23fork: Move and describe why the code examines PIDNS_ADDINGEric W. Biederman
Normally this would be something that would be handled by handling signals that are sent to a group of processes but in this case the forking process is not a member of the group being signaled. Thus special code is needed to prevent a race with pid namespaces exiting, and fork adding new processes within them. Move this test up before the signal restart just in case signals are also pending. Fatal conditions should take presedence over restarts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-23livepatch: Validate module/old func name lengthKamalesh Babulal
livepatch module author can pass module name/old function name with more than the defined character limit. With obj->name length greater than MODULE_NAME_LEN, the livepatch module gets loaded but waits forever on the module specified by obj->name to be loaded. It also populates a /sys directory with an untruncated object name. In the case of funcs->old_name length greater then KSYM_NAME_LEN, it would not match against any of the symbol table entries. Instead loop through the symbol table comparing them against a nonexisting function, which can be avoided. The same issues apply, to misspelled/incorrect names. At least gatekeep the modules with over the limit string length, by checking for their length during livepatch module registration. Cc: stable@vger.kernel.org Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-07-21Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a stop-machine preemption fix and a SCHED_DEADLINE fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix switched_from_dl() warning stop_machine: Disable preemption when waking two stopper threads
2018-07-21mm: make vm_area_alloc() initialize core fieldsLinus Torvalds
Like vm_area_dup(), it initializes the anon_vma_chain head, and the basic mm pointer. The rest of the fields end up being different for different users, although the plan is to also initialize the 'vm_ops' field to a dummy entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21mm: make vm_area_dup() actually copy the old vma dataLinus Torvalds
.. and re-initialize th eanon_vma_chain head. This removes some boiler-plate from the users, and also makes it clear why it didn't need use the 'zalloc()' version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21mm: use helper functions for allocating and freeing vm_area structsLinus Torvalds
The vm_area_struct is one of the most fundamental memory management objects, but the management of it is entirely open-coded evertwhere, ranging from allocation and freeing (using kmem_cache_[z]alloc and kmem_cache_free) to initializing all the fields. We want to unify this in order to end up having some unified initialization of the vmas, and the first step to this is to at least have basic allocation functions. Right now those functions are literally just wrappers around the kmem_cache_*() calls. This is a purely mechanical conversion: # new vma: kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc() # copy old vma kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old) # free vma kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma) to the point where the old vma passed in to the vm_area_dup() function isn't even used yet (because I've left all the old manual initialization alone). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21signal: Push pid type down into complete_signal.Eric W. Biederman
This is the bottom and by pushing this down it simplifies the callers and otherwise leaves things as is. This is in preparation for allowing fork to implement better handling of signals set to groups of processes. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21signal: Push pid type down into __send_signalEric W. Biederman
This information is already available in the callers and by pushing it down it makes the code a little clearer, and allows implementing better handling of signales set to a group of processes in fork. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21signal: Push pid type down into send_signalEric W. Biederman
This information is already available in the callers and by pushing it down it makes the code a little clearer, and allows better group signal behavior in fork. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21signal: Pass pid type into do_send_sig_infoEric W. Biederman
This passes the information we already have at the call sight into do_send_sig_info. Ultimately allowing for better handling of signals sent to a group of processes during fork. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21signal: Pass pid type into group_send_sig_infoEric W. Biederman
This passes the information we already have at the call sight into group_send_sig_info. Ultimatelly allowing for to better handle signals sent to a group of processes. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21signal: Pass pid and pid type into send_sigqueueEric W. Biederman
Make the code more maintainable by performing more of the signal related work in send_sigqueue. A quick inspection of do_timer_create will show that this code path does not lookup a thread group by a thread's pid. Making it safe to find the task pointed to by it_pid with "pid_task(it_pid, type)"; This supports the changes needed in fork to tell if a signal was sent to a single process or a group of processes. Having the pid to task transition in signal.c will also make it easier to sort out races with de_thread and and the thread group leader exiting when it comes time to address that. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21posix-timers: Noralize good_sigeventEric W. Biederman
In good_sigevent directly compute the default return value as "task_tgid(current)". This is exactly the same as "task_pid(current->group_leader)" but written more clearly. In the thread case first compute the thread's pid. Then veify that attached to that pid is a thread of the current thread group. This has the net effect of making the code a little clearer, and making it obvious that posix timers never look up a process by a the pid of a thread. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21pid: Implement PIDTYPE_TGIDEric W. Biederman
Everywhere except in the pid array we distinguish between a tasks pid and a tasks tgid (thread group id). Even in the enumeration we want that distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid we almost have an implementation of PIDTYPE_TGID in struct signal_struct. Add PIDTYPE_TGID as a first class member of the pid_type enumeration and into the pids array. Then remove the __PIDTYPE_TGID special case and the leader_pid in signal_struct. The net size increase is just an extra pointer added to struct pid and an extra pair of pointers of an hlist_node added to task_struct. The effect on code maintenance is the removal of a number of special cases today and the potential to remove many more special cases as PIDTYPE_TGID gets used to it's fullest. The long term potential is allowing zombie thread group leaders to exit, which will remove a lot more special cases in the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21pids: Move the pgrp and session pid pointers from task_struct to signal_structEric W. Biederman
To access these fields the code always has to go to group leader so going to signal struct is no loss and is actually a fundamental simplification. This saves a little bit of memory by only allocating the pid pointer array once instead of once for every thread, and even better this removes a few potential races caused by the fact that group_leader can be changed by de_thread, while signal_struct can not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21pids: Compute task_tgid using signal->leader_pidEric W. Biederman
The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21pids: Move task_pid_type into sched/signal.hEric W. Biederman
The function is general and inline so there is no need to hide it inside of exit.c Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-07-20 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add sharing of BPF objects within one ASIC: this allows for reuse of the same program on multiple ports of a device, and therefore gains better code store utilization. On top of that, this now also enables sharing of maps between programs attached to different ports of a device, from Jakub. 2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature detections and unused variable exports, also from Jakub. 3) First batch of RCU annotation fixes in prog array handling, i.e. there are several __rcu markers which are not correct as well as some of the RCU handling, from Roman. 4) Two fixes in BPF sample files related to checking of the prog_cnt upper limit from sample loader, from Dan. 5) Minor cleanup in sockmap to remove a set but not used variable, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20kernfs: allow creating kernfs objects with arbitrary uid/gidDmitry Torokhov
This change allows creating kernfs files and directories with arbitrary uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments. The "simple" kernfs_create_file() and kernfs_create_dir() are left alone and always create objects belonging to the global root. When creating symlinks ownership (uid/gid) is taken from the target kernfs object. Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20sched/clock: Close a hole in sched_clock_init()Peter Zijlstra
All data required for the 'unstable' sched_clock must be set-up _before_ enabling it -- setting sched_clock_running. This includes the __gtod_offset but also a recent scd stamp. Make the gtod-offset update also set the csd stamp -- it requires the same two clock reads _anyway_. This doesn't hurt in the sched_clock_tick_stable() case and ensures sched_clock_init() gets everything set-up before use. Also switch to unconditional IRQ-disable/enable because the static key stuff already requires this is not ran with IRQs disabled. Fixes: 857baa87b642 ("sched/clock: Enable sched clock early") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180720080911.GM2494@hirez.programming.kicks-ass.net
2018-07-20bpf: btf: Clean up BTF_INT_BITS() in uapi btf.hMartin KaFai Lau
This patch shrinks the BTF_INT_BITS() mask. The current btf_int_check_meta() ensures the nr_bits of an integer cannot exceed 64. Hence, it is mostly an uapi cleanup. The actual btf usage (i.e. seq_show()) is also modified to use u8 instead of u16. The verification (e.g. btf_int_check_meta()) path stays as is to deal with invalid BTF situation. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-20Merge tag 'fortglx/4.19/time-part2' of ↵Thomas Gleixner
https://git.linaro.org/people/john.stultz/linux into timers/core Pull the second set of timekeeping things for 4.19 from John Stultz * NTP argument clenaups and constification from Ondrej Mosnacek * Fix to avoid RTC injecting sleeptime when suspend fails from Mukesh Ojha * Broading suspsend-timing to include non-stop clocksources that aren't currently used for timekeeping from Baolin Wang
2018-07-19time: Introduce one suspend clocksource to compensate the suspend timeBaolin Wang
On some hardware with multiple clocksources, we have coarse grained clocksources that support the CLOCK_SOURCE_SUSPEND_NONSTOP flag, but which are less than ideal for timekeeping whereas other clocksources can be better candidates but halt on suspend. Currently, the timekeeping core only supports timing suspend using CLOCK_SOURCE_SUSPEND_NONSTOP clocksources if that clocksource is the current clocksource for timekeeping. As a result, some architectures try to implement read_persistent_clock64() using those non-stop clocksources, but isn't really ideal, which will introduce more duplicate code. To fix this, provide logic to allow a registered SUSPEND_NONSTOP clocksource, which isn't the current clocksource, to be used to calculate the suspend time. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> [jstultz: minor tweaks to merge with previous resume changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-19time: Fix extra sleeptime injection when suspend failsMukesh Ojha
Currently, there exists a corner case assuming when there is only one clocksource e.g RTC, and system failed to go to suspend mode. While resume rtc_resume() injects the sleeptime as timekeeping_rtc_skipresume() returned 'false' (default value of sleeptime_injected) due to which we can see mismatch in timestamps. This issue can also come in a system where more than one clocksource are present and very first suspend fails. Success case: ------------ {sleeptime_injected=false} rtc_suspend() => timekeeping_suspend() => timekeeping_resume() => (sleeptime injected) rtc_resume() Failure case: ------------ {failure in sleep path} {sleeptime_injected=false} rtc_suspend() => rtc_resume() {sleeptime injected again which was not required as the suspend failed} Fix this by handling the boolean logic properly. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-19timekeeping/ntp: Constify some function argumentsOndrej Mosnacek
Add 'const' to some function arguments and variables to make it easier to read the code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [jstultz: Also fixup pre-existing checkpatch warnings for prototype arguments with no variable name] Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-20sched/clock: Use static key for sched_clock_runningPavel Tatashin
sched_clock_running may be read every time sched_clock_cpu() is called. Yet, this variable is updated only twice during boot, and never changes again, therefore it is better to make it a static key. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-25-pasha.tatashin@oracle.com
2018-07-20sched/clock: Enable sched clock earlyPavel Tatashin
Allow sched_clock() to be used before schec_clock_init() is called. This provides a way to get early boot timestamps on machines with unstable clocks. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-24-pasha.tatashin@oracle.com
2018-07-20sched/clock: Move sched clock initialization and merge with generic clockPavel Tatashin
sched_clock_postinit() initializes a generic clock on systems where no other clock is provided. This function may be called only after timekeeping_init(). Rename sched_clock_postinit to generic_clock_inti() and call it from sched_clock_init(). Move the call for sched_clock_init() until after time_init(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-23-pasha.tatashin@oracle.com
2018-07-20timekeeping: Default boot time offset to local_clock()Pavel Tatashin
read_persistent_wall_and_boot_offset() is called during boot to read both the persistent clock and also return the offset between the boot time and the value of persistent clock. Change the default boot_offset from zero to local_clock() so architectures, that do not have a dedicated boot_clock but have early sched_clock(), such as SPARCv9, x86, and possibly more will benefit from this change by getting a better and more consistent estimate of the boot time without need for an arch specific implementation. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-17-pasha.tatashin@oracle.com
2018-07-20timekeeping: Replace read_boot_clock64() with ↵Pavel Tatashin
read_persistent_wall_and_boot_offset() If architecture does not support exact boot time, it is challenging to estimate boot time without having a reference to the current persistent clock value. Yet, it cannot read the persistent clock time again, because this may lead to math discrepancies with the caller of read_boot_clock64() who have read the persistent clock at a different time. This is why it is better to provide two values simultaneously: the persistent clock value, and the boot time. Replace read_boot_clock64() with: read_persistent_wall_and_boot_offset(wall_time, boot_offset) Where wall_time is returned by read_persistent_clock() And boot_offset is wall_time - boot time, which defaults to 0. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-16-pasha.tatashin@oracle.com
2018-07-19ntp: Use kstrtos64 for s64 variableOndrej Mosnacek
...instead of kstrtol with a dirty cast. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-19ntp: Remove redundant argumentsOndrej Mosnacek
The 'ts' argument of process_adj_status() and process_adjtimex_modes() is unused and can be safely removed. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-19timer: Fix coding styleYi Wang
The call to wake_up_nohz_cpu() is incorrectly indented. Remove the surplus TAB. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: zhong.weidong@zte.com.cn CC: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: https://lkml.kernel.org/r/1531721337-30284-1-git-send-email-wang.yi59@zte.com.cn
2018-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, here goes: 1) NULL deref in qtnfmac, from Gustavo A. R. Silva. 2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih. 3) Lost completion messages in AF_XDP, from Magnus Karlsson. 4) Correct bogus self-assignment in rhashtable, from Rishabh Bhatnagar. 5) Fix regression in ipv6 route append handling, from David Ahern. 6) Fix masking in __set_phy_supported(), from Heiner Kallweit. 7) Missing module owner set in x_tables icmp, from Florian Westphal. 8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire. 9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy. 10) Fix NULL deref when using chains in act_csum, from Davide Caratti. 11) XDP_REDIRECT needs to check if the interface is up and whether the MTU is sufficient. From Toshiaki Makita. 12) Net diag can do a double free when killing TCP_NEW_SYN_RECV connections, from Lorenzo Colitti. 13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a full minute, delaying device unregister. From Eric Dumazet. 14) Update MAC entries in the correct order in ixgbe, from Alexander Duyck. 15) Don't leave partial mangles bpf program in jit_subprogs, from Daniel Borkmann. 16) Fix pfmemalloc SKB state propagation, from Stefano Brivio. 17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng. 18) Use after free in tun XDP_TX, from Toshiaki Makita. 19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole. 20) Don't reuse remainder of RX page when XDP is set in mlx4, from Saeed Mahameed. 21) Fix window probe handling of TCP rapair sockets, from Stefan Baranoff. 22) Missing socket locking in smc_ioctl(), from Ursula Braun. 23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann. 24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva. 25) Two spots in ipv6 do a rol32() on a hash value but ignore the result. Fixes from Colin Ian King" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits) tcp: identify cryptic messages as TCP seq # bugs ptp: fix missing break in switch hv_netvsc: Fix napi reschedule while receive completion is busy MAINTAINERS: Drop inactive Vitaly Bordug's email net: cavium: Add fine-granular dependencies on PCI net: qca_spi: Fix log level if probe fails net: qca_spi: Make sure the QCA7000 reset is triggered net: qca_spi: Avoid packet drop during initial sync ipv6: fix useless rol32 call on hash ipv6: sr: fix useless rol32 call on hash net: sched: Using NULL instead of plain integer net: usb: asix: replace mii_nway_restart in resume path net: cxgb3_main: fix potential Spectre v1 lib/rhashtable: consider param->min_size when setting initial table size net/smc: reset recv timeout after clc handshake net/smc: add error handling for get_user() net/smc: optimize consumer cursor updates net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL. ipv6: ila: select CONFIG_DST_CACHE net: usb: rtl8150: demote allmulti message to dev_dbg() ...
2018-07-18audit: fix use-after-free in audit_add_watchRonny Chevalier
audit_add_watch stores locally krule->watch without taking a reference on watch. Then, it calls audit_add_to_parent, and uses the watch stored locally. Unfortunately, it is possible that audit_add_to_parent updates krule->watch. When it happens, it also drops a reference of watch which could free the watch. How to reproduce (with KASAN enabled): auditctl -w /etc/passwd -F success=0 -k test_passwd auditctl -w /etc/passwd -F success=1 -k test_passwd2 The second call to auditctl triggers the use-after-free, because audit_to_parent updates krule->watch to use a previous existing watch and drops the reference to the newly created watch. To fix the issue, we grab a reference of watch and we release it at the end of the function. Signed-off-by: Ronny Chevalier <ronny.chevalier@hp.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-07-18bpf: offload: allow program and map sharing per-ASICJakub Kicinski
Allow programs and maps to be re-used across different netdevs, as long as they belong to the same struct bpf_offload_dev. Update the bpf_offload_prog_map_match() helper for the verifier and export a new helper for the drivers to use when checking programs at attachment time. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>