summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUGMarc Zyngier
CONFIG_IRQ_DOMAIN_DEBUG is similar to CONFIG_GENERIC_IRQ_DEBUGFS, just with less information. Spring cleanup time. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shunyong <shunyong.yang@hxt-semitech.com> Link: https://lkml.kernel.org/r/20180117142647.23622-1-marc.zyngier@arm.com
2018-01-24sched/core: Fix cpu.max vs. cpuhotplug deadlockPeter Zijlstra
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion: tg_set_cfs_bandwidth() get_online_cpus() cpus_read_lock() cfs_bandwidth_usage_inc() static_key_slow_inc() cpus_read_lock() Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()Tejun Heo
debug_show_all_locks() iterates all tasks and print held locks whole holding tasklist_lock. This can take a while on a slow console device and may end up triggering NMI hardlockup detector if someone else ends up waiting for tasklist_lock. Touch the NMI watchdog while printing the held locks to avoid spuriously triggering the hardlockup detector. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24futex: Fix OWNER_DEAD fixupPeter Zijlstra
Both Geert and DaveJ reported that the recent futex commit: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") introduced a problem with setting OWNER_DEAD. We set the bit on an uninitialized variable and then entirely optimize it away as a dead-store. Move the setting of the bit to where it is more useful. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Link: http://lkml.kernel.org/r/20180122103947.GD2228@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-23ftrace: Mark function tracer test functions noinline/nocloneAndi Kleen
The ftrace function tracer self tests calls some functions to verify the get traced. This relies on them not being inlined. Previously this was ensured by putting them into another file, but with LTO the compiler can inline across files, which makes the tests fail. Mark these functions as noinline and noclone. Link: http://lkml.kernel.org/r/20171221233732.31896-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23trace_uprobe: Display correct offset in uprobe_eventsRavi Bangoria
Recently, how the pointers being printed with %p has been changed by commit ad67b74d2469 ("printk: hash addresses printed with %p"). This is causing a regression while showing offset in the uprobe_events file. Instead of %p, use %px to display offset. Before patch: # perf probe -vv -x /tmp/a.out main Opening /sys/kernel/debug/tracing//uprobe_events write=1 Writing event: p:probe_a/main /tmp/a.out:0x58c # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x0000000049a0f352 After patch: # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x000000000000058c Link: http://lkml.kernel.org/r/20180106054246.15375-1-ravi.bangoria@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23tracing: Make sure the parsed string always terminates with '\0'Changbin Du
Always mark the parsed string with a terminated nul '\0' character. This removes the need for the users to have to append the '\0' before using the parsed string. Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23tracing: Clear parser->idx if only spaces are readChangbin Du
If only spaces were read while parsing the next string, then parser->idx should be cleared in order to make trace_parser_loaded() return false. Link: http://lkml.kernel.org/r/1516093350-12045-3-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23tracing: Detect the string nul character when parsing user input stringChangbin Du
User space can pass in a C nul character '\0' along with its input. The function trace_get_user() will try to process it as a normal character, and that will fail to parse. open("/sys/kernel/debug/tracing//set_ftrace_pid", O_WRONLY|O_TRUNC) = 3 write(3, " \0", 2) = -1 EINVAL (Invalid argument) while parse can handle spaces, so below works. $ echo "" > set_ftrace_pid $ echo " " > set_ftrace_pid $ echo -n " " > set_ftrace_pid Have the parser stop on '\0' and cease any further parsing. Only process the characters up to the nul '\0' character and do not process it. Link: http://lkml.kernel.org/r/1516093350-12045-2-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23tracing: Update stack trace skipping for ORC unwinderSteven Rostedt (VMware)
With the addition of ORC unwinder and FRAME POINTER unwinder, the stack trace skipping requirements have changed. I went through the tracing stack trace dumps with ORC and with frame pointers and recalculated the proper values. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23ftrace, orc, x86: Handle ftrace dynamically allocated trampolinesSteven Rostedt (VMware)
The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23bpf: fix incorrect kmalloc usage in lpm_trie MAP_GET_NEXT_KEY rcu regionYonghong Song
In commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map"), the implemented MAP_GET_NEXT_KEY callback function is guarded with rcu read lock. In the function body, "kmalloc(size, GFP_USER | __GFP_NOWARN)" is used which may sleep and violate rcu read lock region requirements. This patch fixed the issue by using GFP_ATOMIC instead to avoid blocking kmalloc. Tested with CONFIG_DEBUG_ATOMIC_SLEEP=y as suggested by Eric Dumazet. Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Signed-off-by: Yonghong Song <yhs@fb.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-22signal/ptrace: Add force_sig_ptrace_errno_trap and use it where neededEric W. Biederman
There are so many places that build struct siginfo by hand that at least one of them is bound to get it wrong. A handful of cases in the kernel arguably did just that when using the errno field of siginfo to pass no errno values to userspace. The usage is limited to a single si_code so at least does not mess up anything else. Encapsulate this questionable pattern in a helper function so that the userspace ABI is preserved. Update all of the places that use this pattern to use the new helper function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22signal: Helpers for faults with specialized siginfo layoutsEric W. Biederman
The helpers added are: send_sig_mceerr force_sig_mceerr force_sig_bnderr force_sig_pkuerr Filling out siginfo properly can ge tricky. Especially for these specialized cases where the temptation is to share code with other cases which use a different subset of siginfo fields. Unfortunately that code sharing frequently results in bugs with the wrong siginfo fields filled in, and makes it harder to verify that the siginfo structure was properly initialized. Provide these helpers instead that get all of the details right, and guarantee that siginfo is properly initialized. send_sig_mceerr and force_sig_mceer are a little special as two si codes BUS_MCEERR_AO and BUS_MCEER_AR both use the same extended signinfo layout. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22signal: Add send_sig_fault and force_sig_faultEric W. Biederman
The vast majority of signals sent from architecture specific code are simple faults. Encapsulate this reality with two helper functions so that the nit-picky implementation of preparing a siginfo does not need to be repeated many times on each architecture. As only some architectures support the trapno field, make the trapno arguement only present on those architectures. Similary as ia64 has three fields: imm, flags, and isr that are specific to it. Have those arguments always present on ia64 and no where else. This ensures the architecture specific code always remembers which fields it needs to pass into the siginfo structure. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22signal: Replace memset(info,...) with clear_siginfo for clarityEric W. Biederman
The function clear_siginfo is just a nice wrapper around memset so this results in no functional change. This change makes mistakes a little more difficult and it makes it clearer what is going on. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22signal: Don't use structure initializers for struct siginfoEric W. Biederman
The siginfo structure has all manners of holes with the result that a structure initializer is not guaranteed to initialize all of the bits. As we have to copy the structure to userspace don't even try to use a structure initializer. Instead use clear_siginfo followed by initializing selected fields. This gives a guarantee that uninitialized kernel memory is not copied to userspace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22Merge branch 'for-4.16-console-waiter-logic' into for-4.16Petr Mladek
2018-01-22Merge branch 'for-4.16-print-symbol' into for-4.16Petr Mladek
2018-01-22Merge branch 'for-4.16-deprecate-printk-pf' into for-4.16Petr Mladek
2018-01-22printk: drop redundant devkmsg_log_str memsetsSergey Senozhatsky
We copy in null terminated strings "on" and "off", no need to zero out devkmsg_log_str in control_devkmsg(). Link: http://lkml.kernel.org/r/20180119043901.1728-1-sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-21Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for the new matrix allocator to prevent vector exhaustion by certain network drivers which allocate gazillions of unused vectors which cannot be put into reservation mode due to MSI and the lack of MSI entry masking. The fix/workaround is to spread the vectors across CPUs by searching the supplied target CPU mask for the CPU with the smallest number of allocated vectors" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq/matrix: Spread interrupts on allocation
2018-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-01-19 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) bpf array map HW offload, from Jakub. 2) support for bpf_get_next_key() for LPM map, from Yonghong. 3) test_verifier now runs loaded programs, from Alexei. 4) xdp cpumap monitoring, from Jesper. 5) variety of tests, cleanups and small x64 JIT optimization, from Daniel. 6) user space can now retrieve HW JITed program, from Jiong. Note there is a minor conflict between Russell's arm32 JIT fixes and removal of bpf_jit_enable variable by Daniel which should be resolved by keeping Russell's comment and removing that variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The BPF verifier conflict was some minor contextual issue. The TUN conflict was less trivial. Cong Wang fixed a memory leak of tfile->tx_array in 'net'. This is an skb_array. But meanwhile in net-next tun changed tfile->tx_arry into tfile->tx_ring which is a ptr_ring. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19bpf: add upper complexity limit to verifier logDaniel Borkmann
Given the limit could potentially get further adjustments in the future, add it to the log so it becomes obvious what the current limit is w/o having to check the source first. This may also be helpful for debugging complexity related issues on kernels that backport from upstream. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-19bpf: get rid of pure_initcall dependency to enable jitsDaniel Borkmann
Having a pure_initcall() callback just to permanently enable BPF JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave a small race window in future where JIT is still disabled on boot. Since we know about the setting at compilation time anyway, just initialize it properly there. Also consolidate all the individual bpf_jit_enable variables into a single one and move them under one location. Moreover, don't allow for setting unspecified garbage values on them. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-19bpf, verifier: detect misconfigured mem, size argument pairDaniel Borkmann
I've seen two patch proposals now for helper additions that used ARG_PTR_TO_MEM or similar in reg_X but no corresponding ARG_CONST_SIZE in reg_X+1. Verifier won't complain in such case, but it will omit verifying the memory passed to the helper thus ending up badly. Detect such buggy helper function signature and bail out during verification rather than finding them through review. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-19bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE mapYonghong Song
Current LPM_TRIE map type does not implement MAP_GET_NEXT_KEY command. This command is handy when users want to enumerate keys. Otherwise, a different map which supports key enumeration may be required to store the keys. If the map data is sparse and all map data are to be deleted without closing file descriptor, using MAP_GET_NEXT_KEY to find all keys is much faster than enumerating all key space. This patch implements MAP_GET_NEXT_KEY command for LPM_TRIE map. If user provided key pointer is NULL or the key does not have an exact match in the trie, the first key will be returned. Otherwise, the next key will be returned. In this implemenation, key enumeration follows a postorder traversal of internal trie. More specific keys will be returned first than less specific ones, given a sequence of MAP_GET_NEXT_KEY syscalls. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-19Merge tag 'trace-v4.15-rc4-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more small fixes - The conversion of enums into their actual numbers to display in the event format file had an off-by-one bug, that could cause an enum not to be converted, and break user space parsing tools. - A fix to a previous fix to bring back the context recursion checks. The interrupt case checks for NMI, IRQ and softirq, but the softirq returned the same number regardless if it was set or not, although the logic would force it to be set if it were hit" * tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix converting enum's from the map in trace_event_eval_update() ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
2018-01-19Merge branch 'for-4.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "cgroup.threads should be delegatable (ie. a container should be able to write to it from inside) but was missing the flag. The change is very low risk" * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: make cgroup.threads delegatable
2018-01-19Merge branch 'for-4.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixlet from Tejun Heo: "One patch to add touch_nmi_watchdog() while dumping workqueue debug messages to avoid triggering the lockup detector spuriously. The change is very low risk" * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: avoid hard lockups in show_workqueue_state()
2018-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix BPF divides by zero, from Eric Dumazet and Alexei Starovoitov. 2) Reject stores into bpf context via st and xadd, from Daniel Borkmann. 3) Fix a memory leak in TUN, from Cong Wang. 4) Disable RX aggregation on a specific troublesome configuration of r8152 in a Dell TB16b dock. 5) Fix sw_ctx leak in tls, from Sabrina Dubroca. 6) Fix program replacement in cls_bpf, from Daniel Borkmann. 7) Fix uninitialized station_info structures in cfg80211, from Johannes Berg. 8) Fix miscalculation of transport header offset field in flow dissector, from Eric Dumazet. 9) Fix LPM tree leak on failure in mlxsw driver, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) ibmvnic: Fix IPv6 packet descriptors ibmvnic: Fix IP offload control buffer ipv6: don't let tb6_root node share routes with other node ip6_gre: init dev->mtu and dev->hard_header_len correctly mlxsw: spectrum_router: Free LPM tree upon failure flow_dissector: properly cap thoff field fm10k: mark PM functions as __maybe_unused cfg80211: fix station info handling bugs netlink: reset extack earlier in netlink_rcv_skb can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once bpf: mark dst unknown on inconsistent {s, u}bounds adjustments bpf: fix cls_bpf on filter replace Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500 tls: reset crypto_info when do_tls_setsockopt_tx fails tls: return -EBUSY if crypto_info is already set tls: fix sw_ctx leak net/tls: Only attach to sockets in ESTABLISHED state net: fs_enet: do not call phy_stop() in interrupts r8152: disable RX aggregation on Dell TB16 dock ...
2018-01-19string: drop __must_check from strscpy() and restore strscpy() usages in cgroupTejun Heo
e7fd37ba1217 ("cgroup: avoid copying strings longer than the buffers") converted possibly unsafe strncpy() usages in cgroup to strscpy(). However, although the callsites are completely fine with truncated copied, because strscpy() is marked __must_check, it led to the following warnings. kernel/cgroup/cgroup.c: In function ‘cgroup_file_name’: kernel/cgroup/cgroup.c:1400:10: warning: ignoring return value of ‘strscpy’, declared with attribute warn_unused_result [-Wunused-result] strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX); ^ To avoid the warnings, 50034ed49645 ("cgroup: use strlcpy() instead of strscpy() to avoid spurious warning") switched them to strlcpy(). strlcpy() is worse than strlcpy() because it unconditionally runs strlen() on the source string, and the only reason we switched to strlcpy() here was because it was lacking __must_check, which doesn't reflect any material differences between the two function. It's just that someone added __must_check to strscpy() and not to strlcpy(). These basic string copy operations are used in variety of ways, and one of not-so-uncommon use cases is safely handling truncated copies, where the caller naturally doesn't care about the return value. The __must_check doesn't match the actual use cases and forces users to opt for inferior variants which lack __must_check by happenstance or spread ugly (void) casts. Remove __must_check from strscpy() and restore strscpy() usages in cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@ezchip.com>
2018-01-18bpf: offload: report device information about offloaded mapsJakub Kicinski
Tell user space about device on which the map was created. Unfortunate reality of user ABI makes sharing this code with program offload difficult but the information is the same. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18bpf: offload: allow array map offloadJakub Kicinski
The special handling of different map types is left to the driver. Allow offload of array maps by simply adding it to accepted types. For nfp we have to make sure array elements are not deleted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18bpf: arraymap: use bpf_map_init_from_attr()Jakub Kicinski
Arraymap was not converted to use bpf_map_init_from_attr() to avoid merge conflicts with emergency fixes. Do it now. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18bpf: arraymap: move checks out of alloc functionJakub Kicinski
Use the new callback to perform allocation checks for array maps. The fd maps don't need a special allocation callback, they only need a special check callback. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18bpf: allow socket_filter programs to use bpf_prog_test_runAlexei Starovoitov
in order to improve test coverage allow socket_filter program type to be run via bpf_prog_test_run command. Since such programs can be loaded by non-root tighten permissions for bpf_prog_test_run to be root only to avoid surprises. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18tracing: Fix converting enum's from the map in trace_event_eval_update()Steven Rostedt (VMware)
Since enums do not get converted by the TRACE_EVENT macro into their values, the event format displaces the enum name and not the value. This breaks tools like perf and trace-cmd that need to interpret the raw binary data. To solve this, an enum map was created to convert these enums into their actual numbers on boot up. This is done by TRACE_EVENTS() adding a TRACE_DEFINE_ENUM() macro. Some enums were not being converted. This was caused by an optization that had a bug in it. All calls get checked against this enum map to see if it should be converted or not, and it compares the call's system to the system that the enum map was created under. If they match, then they call is processed. To cut down on the number of iterations needed to find the maps with a matching system, since calls and maps are grouped by system, when a match is made, the index into the map array is saved, so that the next call, if it belongs to the same system as the previous call, could start right at that array index and not have to scan all the previous arrays. The problem was, the saved index was used as the variable to know if this is a call in a new system or not. If the index was zero, it was assumed that the call is in a new system and would keep incrementing the saved index until it found a matching system. The issue arises when the first matching system was at index zero. The next map, if it belonged to the same system, would then think it was the first match and increment the index to one. If the next call belong to the same system, it would begin its search of the maps off by one, and miss the first enum that should be converted. This left a single enum not converted properly. Also add a comment to describe exactly what that index was for. It took me a bit too long to figure out what I was thinking when debugging this issue. Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com Cc: stable@vger.kernel.org Fixes: 0c564a538aa93 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values") Reported-by: Chuck Lever <chuck.lever@oracle.com> Teste-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18ring-buffer: Fix duplicate results in mapping context to bits in recursive lockSteven Rostedt (VMware)
In bringing back the context checks, the code checks first if its normal (non-interrupt) context, and then for NMI then IRQ then softirq. The final check is redundant. Since the if branch is only hit if the context is one of NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if it is SOFTIRQ. The current code returns the same result even if its not a SOFTIRQ. Which is confusing. pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ Is redundant as RB_CTX_SOFTIRQ *is* 2! Fixes: a0e3a18f4baf ("ring-buffer: Bring back context level recursive checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-01-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a divide by zero due to wrong if (src_reg == 0) check in 64-bit mode. Properly handle this in interpreter and mask it also generically in verifier to guard against similar checks in JITs, from Eric and Alexei. 2) Fix a bug in arm64 JIT when tail calls are involved and progs have different stack sizes, from Daniel. 3) Reject stores into BPF context that are not expected BPF_STX | BPF_MEM variant, from Daniel. 4) Mark dst reg as unknown on {s,u}bounds adjustments when the src reg has derived bounds from dead branches, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18lockdep: Make lockdep checking constantMatthew Wilcox
There are several places in the kernel which would like to pass a const pointer to lockdep_is_held(). Constify the entire path so nobody has to trick the compiler. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/20180117151414.23686-3-willy@infradead.org
2018-01-18lockdep: Assign lock keys on registrationMatthew Wilcox
Lockdep is assigning lock keys when a lock was looked up. This is unnecessary; if the lock has never been registered then it is known that it is not locked. It also complicates the calling convention. Switch to assigning the lock key in register_lock_class(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/20180117151414.23686-2-willy@infradead.org
2018-01-18irq/matrix: Spread interrupts on allocationThomas Gleixner
Keith reported an issue with vector space exhaustion on a server machine which is caused by the i40e driver allocating 168 MSI interrupts when the driver is initialized, even when most of these interrupts are not used at all. The x86 vector allocation code tries to avoid the immediate allocation with the reservation mode, but the card uses MSI and does not support MSI entry masking, which prevents reservation mode and requires immediate vector allocation. The matrix allocator is a bit naive and prefers the first CPU in the cpumask which describes the possible target CPUs for an allocation. That results in allocating all 168 vectors on CPU0 which later causes vector space exhaustion when the NVMe driver tries to allocate managed interrupts on each CPU for the per CPU queues. Avoid this by finding the CPU which has the lowest vector allocation count to spread out the non managed interrupt accross the possible target CPUs. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Keith Busch <keith.busch@intel.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801171557330.1777@nanos
2018-01-18Merge branches 'acpi-pm' and 'pm-sleep'Rafael J. Wysocki
* acpi-pm: platform/x86: surfacepro3: Support for wakeup from suspend-to-idle ACPI / PM: Use Low Power S0 Idle on more systems ACPI / PM: Make it possible to ignore the system sleep blacklist * pm-sleep: PM / hibernate: Drop unused parameter of enough_swap block, scsi: Fix race between SPI domain validation and system suspend PM / sleep: Make lock/unlock_system_sleep() available to kernel modules PM: hibernate: Do not subtract NR_FILE_MAPPED in minimum_image_size()
2018-01-18Merge branches 'pm-domains', 'pm-kconfig', 'pm-cpuidle' and 'powercap'Rafael J. Wysocki
* pm-domains: PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks PM / Domains: Remove obsolete "samsung,power-domain" check * pm-kconfig: bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate PM: Provide a config snippet for disabling PM * pm-cpuidle: cpuidle: Avoid NULL argument in cpuidle_switch_governor() * powercap: powercap: intel_rapl: Fix trailing semicolon powercap: add suspend and resume mechanism for SOC power limit powercap: Simplify powercap_init()
2018-01-18bpf: change fake_ip for bpf_trace_printk helperYonghong Song
Currently, for bpf_trace_printk helper, fake ip address 0x1 is used with comments saying that fake ip will not be printed. This is indeed true for 4.12 and earlier version, but for 4.13 and later version, the ip address will be printed if it cannot be resolved with kallsym. Running samples/bpf/tracex5 program and you will have the following in the debugfs trace_pipe output: ... <...>-1819 [003] .... 443.497877: 0x00000001: mmap <...>-1819 [003] .... 443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid) ... The kernel commit changed this behavior is: commit feaf1283d11794b9d518fcfd54b6bf8bee1f0b4b Author: Steven Rostedt (VMware) <rostedt@goodmis.org> Date: Thu Jun 22 17:04:55 2017 -0400 tracing: Show address when function names are not found ... This patch changed the comment and also altered the fake ip address to 0x0 as users may think 0x1 has some special meaning while it doesn't. The new output: ... <...>-1799 [002] .... 25.953576: 0: mmap <...>-1799 [002] .... 25.953865: 0: read(fd=0, buf=00000000053936b5, size=512) ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18bpf: add new jited info fields in bpf_dev_offload and bpf_prog_infoJiong Wang
For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog used by all host JIT targets to get jited image and it's length. While for offload, targets are likely to have different offload mechanisms that these info are kept in device private data fields. Therefore, BPF_OBJ_GET_INFO_BY_FD syscall needs an unified way to get JIT length and contents info for offload targets. One way is to introduce new callback to parse device private data then fill those fields in bpf_prog_info. This might be a little heavy, the other way is to add generic fields which will be initialized by all offload targets. This patch follow the second approach to introduce two new fields in struct bpf_dev_offload and teach bpf_prog_get_info_by_fd about them to fill correct jited_prog_len and jited_prog_insns in bpf_prog_info. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>