summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2017-06-06bpf: Introduce bpf_prog IDMartin KaFai Lau
This patch generates an unique ID for each BPF_PROG_LOAD-ed prog. It is worth to note that each BPF_PROG_LOAD-ed prog will have a different ID even they have the same bpf instructions. The ID is generated by the existing idr_alloc_cyclic(). The ID is ranged from [1, INT_MAX). It is allocated in cyclic manner, so an ID will get reused every 2 billion BPF_PROG_LOAD. The bpf_prog_alloc_id() is done after bpf_prog_select_runtime() because the jit process may have allocated a new prog. Hence, we need to ensure the value of pointer 'prog' will not be changed any more before storing the prog to the prog_idr. After bpf_prog_select_runtime(), the prog is read-only. Hence, the id is stored in 'struct bpf_prog_aux'. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05Merge branch 'for-4.12-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two cgroup fixes. One to address RCU delay of cpuset removal affecting userland visible behaviors. The other fixes a race condition between controller disable and cgroup removal" * 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: consider dying css as offline cgroup: Prevent kill_css() from being called more than once
2017-06-05sysctl: switch to use uuid_tChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05nohz: Fix buggy tick delay on IRQ stormsFrederic Weisbecker
When the tick is stopped and we reach the dynticks evaluation code on IRQ exit, we perform a soft tick restart if we observe an expired timer from there. It means we program the nearest possible tick but we stay in dynticks mode (ts->tick_stopped = 1) because we may need to stop the tick again after that expired timer is handled. Now this solution works most of the time but if we suffer an IRQ storm and those interrupts trigger faster than the hardware clockevents min delay, our tick won't fire until that IRQ storm is finished. Here is the problem: on IRQ exit we reprog the timer to at least NOW() + min_clockevents_delay. Another IRQ fires before the tick so we reschedule again to NOW() + min_clockevents_delay, etc... The tick is eternally rescheduled min_clockevents_delay ahead. A solution is to simply remove this soft tick restart. After all the normal dynticks evaluation path can handle 0 delay just fine. And by doing that we benefit from the optimization branch which avoids clock reprogramming if the clockevents deadline hasn't changed since the last reprog. This fixes our issue because we don't do repetitive clock reprog that always add hardware min delay. As a side effect it should even optimize the 0 delay path in general. Reported-and-tested-by: Octavian Purdila <octavian.purdila@nxp.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496328429-13317-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-04perf, bpf: Add BPF support to all perf_event typesAlexei Starovoitov
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all perf_event types, including HW_CACHE, RAW, and dynamic pmu events. Only tracepoint/kprobe events are treated differently which require BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly. Also add support for reading all event counters using bpf_perf_event_read() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04alarmtimer: Switch over to generic set/get/rearm routineThomas Gleixner
All required callbacks are in place. Switch the alarm timer based posix interval timer callbacks to the common implementation and remove the incorrect private implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.825471962@linutronix.de
2017-06-04alarmtimer: Implement arm callbackThomas Gleixner
Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.747567162@linutronix.de
2017-06-04alarmtimer: Implement try_to_cancel callbackThomas Gleixner
Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.670026824@linutronix.de
2017-06-04alarmtimer: Implement remaining callbackThomas Gleixner
Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.592676753@linutronix.de
2017-06-04alarmtimer: Implement forward callbackThomas Gleixner
Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.513694229@linutronix.de
2017-06-04alarmtimer: Implement timer_rearm() callbackThomas Gleixner
Preparatory change to utilize the common posix timer mechanisms. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.434598989@linutronix.de
2017-06-04posix-timers: Make use of cancel/arm callbacksThomas Gleixner
Replace the hrtimer calls by calls to the new try_to_cancel()/arm() kclock callbacks and move the hrtimer specific implementation into the corresponding callback functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.355396667@linutronix.de
2017-06-04posix-timers: Add cancel/arm callbacksThomas Gleixner
Add timer_try_to_cancel() and timer_arm() callbacks to kclock which allow to make common_timer_set() usable by both hrtimer and alarmtimer based clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.278022962@linutronix.de
2017-06-04posix-timers: Zero settings value in common codeThomas Gleixner
Zero out the settings struct in the common code so the callbacks do not have to do it themself. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.200870713@linutronix.de
2017-06-04posix-timers: Make use of forward/remaining callbacksThomas Gleixner
Replace the hrtimer calls by calls to the new forward/remaining kclock callbacks and move the hrtimer specific implementation into the corresponding callback functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.121437232@linutronix.de
2017-06-04posix-timers: Add forward/remaining callbacksThomas Gleixner
Add two callbacks to kclock which allow using common_)timer_get() for both hrtimer and alarm timer based clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211657.044915536@linutronix.de
2017-06-04posix-timers: Add active flag to k_itimerThomas Gleixner
Keep track of the activation state of posix timers. This is a preparatory change for making common_timer_get() usable by both hrtimer and alarm timer implementations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.967783982@linutronix.de
2017-06-04posix-timers: Use timer_rearm() callback in posixtimer_rearm()Thomas Gleixner
Use the new timer_rearm() callback to replace the conditional hardcoded calls into the hrtimer and cpu timer code. This allows later to bring the same logic to alarmtimers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.889661919@linutronix.de
2017-06-04posix-timers: Rename do_schedule_next_timerThomas Gleixner
That function is a misnomer. Rename it with a proper prefix to posixtimer_rearm(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.811362578@linutronix.de
2017-06-04posix-timers: Add timer_rearm() callbackThomas Gleixner
Add a timer_rearm() callback which is used to make the rescheduling of posix interval timers independent of the underlying clock implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.732632167@linutronix.de
2017-06-04posix-timers: Store k_clock pointer in k_itimerThomas Gleixner
Having the k_clock pointer in the k_itimer struct avoids the lookup in several code pathes and makes the next steps of unification of the hrtimer and alarmtimer based posix timers simpler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.641222072@linutronix.de
2017-06-04posix-timers: Move interval out of the unionThomas Gleixner
Preparatory patch to unify the alarm timer and hrtimer based posix interval timer handling. The interval is used as a criteria for rearming decisions so moving it out of the clock specific data structures allows later unification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.563922908@linutronix.de
2017-06-04posix-timers: Unify overrun/requeue_pending handlingThomas Gleixner
hrtimer based posix-timers and posix-cpu-timers handle the update of the rearming and overflow related status fields differently. Move that update to the common rearming code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.484936964@linutronix.de
2017-06-04posix-timers: Move posix-timer internals to coreThomas Gleixner
None of these declarations is required outside of kernel/time. Move them to an internal header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170530211656.394803853@linutronix.de
2017-06-04posix-timers: Avoid gazillions of forward declarationsThomas Gleixner
Move it below the actual implementations as there are new callbacks coming which would require even more forward declarations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.238209952@linutronix.de
2017-06-04posix-clocks: Remove interval timer facility and mmap/fasync callbacksThomas Gleixner
The only user of this facility is ptp_clock, which does not implement any of those functions. Remove them to prevent accidental users. Especially the interval timer interfaces are now more or less impossible to implement because the necessary infrastructure has been confined to the core code. Aside of that it's really complex to make these callbacks implemented according to spec as the alarm timer implementation demonstrates. If at all then a nanosleep callback might be a reasonable extension. For now keep just what ptp_clock needs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.145036286@linutronix.de
2017-06-04posix-timers: Remove unused export of posix_timer_event()Thomas Gleixner
Since the removal of the mmtimer driver the export is not longer needed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.052744418@linutronix.de
2017-06-04alarmtimer: Remove pointless config conditionalThomas Gleixner
Having a IF_ENABLED(CONFIG_POSIX_TIMERS) inside of a #ifdef CONFIG_POSIX_TIMERS section is pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211655.975218056@linutronix.de
2017-06-04Merge branch 'timers/urgent' into WIP.timersThomas Gleixner
Pick up urgent fixes to avoid conflicts.
2017-06-04alarmtimer: Rate limit periodic intervalsThomas Gleixner
The alarmtimer code has another source of potentially rearming itself too fast. Interval timers with a very samll interval have a similar CPU hog effect as the previously fixed overflow issue. The reason is that alarmtimers do not implement the normal protection against this kind of problem which the other posix timer use: timer expires -> queue signal -> deliver signal -> rearm timer This scheme brings the rearming under scheduler control and prevents permanently firing timers which hog the CPU. Bringing this scheme to the alarm timer code is a major overhaul because it lacks all the necessary mechanisms completely. So for a quick fix limit the interval to one jiffie. This is not problematic in practice as alarmtimers are usually backed by an RTC for suspend which have 1 second resolution. It could be therefor argued that the resolution of this clock should be set to 1 second in general, but that's outside the scope of this fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
2017-06-04alarmtimer: Prevent overflow of relative timersThomas Gleixner
Andrey reported a alartimer related RCU stall while fuzzing the kernel with syzkaller. The reason for this is an overflow in ktime_add() which brings the resulting time into negative space and causes immediate expiry of the timer. The following rearm with a small interval does not bring the timer back into positive space due to the same issue. This results in a permanent firing alarmtimer which hogs the CPU. Use ktime_add_safe() instead which detects the overflow and clamps the result to KTIME_SEC_MAX. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kostya Serebryany <kcc@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de
2017-06-04posix-timers: Move the do_schedule_next_timer declarationChristoph Hellwig
Having it in asm-generic/siginfo.h doesn't make any sense as it is in no way architecture specific. Move it to posix-timers.h instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-ia64@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: sparclinux@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20170603190102.28866-4-hch@lst.de
2017-06-04genirq: Warn when IRQ_NOAUTOEN is used with shared interruptsThomas Gleixner
Shared interrupts do not go well with disabling auto enable: 1) The sharing interrupt might request it while it's still disabled and then wait for interrupts forever. 2) The interrupt might have been requested by the driver sharing the line before IRQ_NOAUTOEN has been set. So the driver which expects that disabled state after calling request_irq() will not get what it wants. Even worse, when it calls enable_irq() later, it will trigger the unbalanced enable_irq() warning. Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dianders@chromium.org Cc: jeffy <jeffy.chen@rock-chips.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: tfiga@chromium.org Link: http://lkml.kernel.org/r/20170531100212.210682135@linutronix.de
2017-06-04genirq: Handle NOAUTOEN interrupt setup properThomas Gleixner
If an interrupt is marked NOAUTOEN then request_irq() installs the action, but does not enable the interrupt via startup_irq(). The interrupt is enabled via enable_irq() later from the driver. enable_irq() calls irq_enable(). That means that for interrupts which have a irq_startup() callback this callback is never invoked. Neither is irq_domain_activate_irq() invoked for such interrupts. If an interrupt depends on irq_startup() or irq_domain_activate_irq() then the enable via irq_enable() is not enough. Add a status flag IRQD_IRQ_STARTED_UP and use this to select the proper mechanism in enable_irq(). Use the flag also to avoid pointless calls into the low level functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: dianders@chromium.org Cc: jeffy <jeffy.chen@rock-chips.com> Cc: Brian Norris <briannorris@chromium.org> Cc: tfiga@chromium.org Link: http://lkml.kernel.org/r/20170531100212.130986205@linutronix.de
2017-06-03cpu/hotplug: Drop the device lock on errorSebastian Andrzej Siewior
If a custom CPU target is specified and that one is not available _or_ can't be interrupted then the code returns to userland without dropping a lock as notices by lockdep: |echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target | ================================================ | [ BUG: lock held when returning to user space! ] | ------------------------------------------------ | bash/503 is leaving the kernel with locks still held! | 1 lock held by bash/503: | #0: (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40 So release the lock then. Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
2017-06-03perf/core: Don't release cred_guard_mutex if not takenAlexander Levin
If we failed to acquire task's cred_guard_mutex we shouldn't proceed to release it in the error path. Fixes: a63fbed776c ("perf/tracing/cpuhotplug: Fix locking order") Signed-off-by: Alexander Levin <alexander.levin@verizon.com> Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: mhiramat@kernel.org Cc: paulmck@linux.vnet.ibm.com Cc: bigeasy@linutronix.de Link: http://lkml.kernel.org/r/20170603033903.12056-1-alexander.levin@verizon.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-02bpf: Remove the capability check for cgroup skb eBPF programChenbo Feng
Currently loading a cgroup skb eBPF program require a CAP_SYS_ADMIN capability while attaching the program to a cgroup only requires the user have CAP_NET_ADMIN privilege. We can escape the capability check when load the program just like socket filter program to make the capability requirement consistent. Change since v1: Change the code style in order to be compliant with checkpatch.pl preference Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02bpf: Allow CGROUP_SKB eBPF program to access sk_buffChenbo Feng
This allows cgroup eBPF program to classify packet based on their protocol or other detail information. Currently program need CAP_NET_ADMIN privilege to attach a cgroup eBPF program, and A process with CAP_NET_ADMIN can already see all packets on the system, for example, by creating an iptables rules that causes the packet to be passed to userspace via NFLOG. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "Kconfig dependency fix for livepatching infrastructure from Miroslav Benes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
2017-05-31bpf: use different interpreter depending on required stack sizeAlexei Starovoitov
16 __bpf_prog_run() interpreters for various stack sizes add .text but not a lot comparing to run-time stack savings text data bss dec hex filename 26350 10328 624 37302 91b6 kernel/bpf/core.o.before_split 25777 10328 624 36729 8f79 kernel/bpf/core.o.after_split 26970 10328 624 37922 9422 kernel/bpf/core.o.now Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31bpf: reconcile bpf_tail_call and stack_depthAlexei Starovoitov
The next set of patches will take advantage of stack_depth tracking, so make sure that the program that does bpf_tail_call() has stack depth large enough for the callee. We could have tracked the stack depth of the prog_array owner program and only allow insertion of the programs with stack depth less than the owner, but it will break existing applications. Some of them have trivial root bpf program that only does multiple bpf_tail_calls and at init time the prog array is empty. In the future we may add a flag to do such tracking optionally, but for now play simple and safe. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31bpf: teach verifier to track stack depthAlexei Starovoitov
teach verifier to track bpf program stack depth Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31bpf: split bpf core interpreterAlexei Starovoitov
split __bpf_prog_run() interpreter into stack allocation and execution parts. The code section shrinks which helps interpreter performance in some cases. text data bss dec hex filename 26350 10328 624 37302 91b6 kernel/bpf/core.o.before 25777 10328 624 36729 8f79 kernel/bpf/core.o.after Very short programs got slower (due to extra function call): Before: test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS After: test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS Longer programs got faster: Before: test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS After: test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS For real world production programs the difference is noise. This patch is first step towards reducing interpreter stack consumption. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31bpf: free up BPF_JMP | BPF_CALL | BPF_X opcodeAlexei Starovoitov
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual indirect call by register and use kernel internal opcode to mark call instruction into bpf_tail_call() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-30audit: add ambient capabilities to CAPSET and BPRM_FCAPS recordsRichard Guy Briggs
Capabilities were augmented to include ambient capabilities in v4.3 commit 58319057b784 ("capabilities: ambient capabilities"). Add ambient capabilities to the audit BPRM_FCAPS and CAPSET records. The record contains fields "old_pp", "old_pi", "old_pe", "new_pp", "new_pi", "new_pe" so in keeping with the previous record normalizations, change the "new_*" variants to simply drop the "new_" prefix. A sample of the replaced BPRM_FCAPS record: RAW: type=BPRM_FCAPS msg=audit(1491468034.252:237): fver=2 fp=0000000000200000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 old_pa=0000000000000000 pp=0000000000200000 pi=0000000000000000 pe=0000000000200000 pa=0000000000000000 INTERPRET: type=BPRM_FCAPS msg=audit(04/06/2017 04:40:34.252:237): fver=2 fp=sys_admin fi=none fe=chown old_pp=none old_pi=none old_pe=none old_pa=none pp=sys_admin pi=none pe=sys_admin pa=none A sample of the replaced CAPSET record: RAW: type=CAPSET msg=audit(1491469502.371:242): pid=833 cap_pi=0000003fffffffff cap_pp=0000003fffffffff cap_pe=0000003fffffffff cap_pa=0000000000000000 INTERPRET: type=CAPSET msg=audit(04/06/2017 05:05:02.371:242) : pid=833 cap_pi=chown,dac_override,dac_read_search,fowner,fsetid,kill, setgid,setuid,setpcap,linux_immutable,net_bind_service,net_broadcast, net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot, sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource,sys_time, sys_tty_config,mknod,lease,audit_write,audit_control,setfcap, mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read cap_pp=chown,dac_override,dac_read_search,fowner,fsetid,kill,setgid, setuid,setpcap,linux_immutable,net_bind_service,net_broadcast, net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot, sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource, sys_time,sys_tty_config,mknod,lease,audit_write,audit_control,setfcap, mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read cap_pe=chown,dac_override,dac_read_search,fowner,fsetid,kill,setgid, setuid,setpcap,linux_immutable,net_bind_service,net_broadcast, net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot, sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource, sys_time,sys_tty_config,mknod,lease,audit_write,audit_control,setfcap, mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read cap_pa=none See: https://github.com/linux-audit/audit-kernel/issues/40 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-30nohz: Reset next_tick cache even when the timer has no regsFrederic Weisbecker
Handle tick interrupts whose regs are NULL, out of general paranoia. It happens when hrtimer_interrupt() is called from non-interrupt contexts, such as hotplug CPU down events. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-27do_sigaltstack(): lift copying to/from userland into callersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27take compat_sys_old_getrlimit() to native syscallAl Viro
... and sanitize the ifdefs in there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixlet from Thomas Gleixner: "Silence dmesg spam by making the posix cpu timer printks depend on print_fatal_signals" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Make signal printks conditional
2017-05-27Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A fix for a state leak which was introduced in the recent rework of futex/rtmutex interaction" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()