path: root/kernel
Age  Commit message (Author)
2019-02-28  locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count() (Bart Van Assche)
This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-14-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Reuse list entries that are no longer in use (Bart Van Assche)
Instead of abandoning elements of list_entries[] that are no longer in use, make alloc_list_entry() reuse array elements that have been freed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-13-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
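A minimal sketch of the reuse pattern, assuming a bitmap tracks which list_entries[] slots are in use (the commit introduces such a bitmap; the free_list_entry() helper here is illustrative, not lockdep's literal code):

  static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
  static DECLARE_BITMAP(list_entries_in_use, MAX_LOCKDEP_ENTRIES);

  /* Hand out the first free slot instead of only bumping a counter. */
  static struct lock_list *alloc_list_entry(void)
  {
          unsigned long idx = find_first_zero_bit(list_entries_in_use,
                                                  ARRAY_SIZE(list_entries));

          if (idx >= ARRAY_SIZE(list_entries))
                  return NULL;            /* table exhausted */
          __set_bit(idx, list_entries_in_use);
          return list_entries + idx;
  }

  /* A cleared bit makes the slot allocatable again. */
  static void free_list_entry(struct lock_list *entry)
  {
          __clear_bit(entry - list_entries, list_entries_in_use);
  }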
2019-02-28  locking/lockdep: Free lock classes that are no longer in use (Bart Van Assche)
Instead of leaving lock classes that are no longer in use in the lock_classes array, reuse entries from that array that are no longer in use. Maintain a linked list of free lock classes with list head 'free_lock_classes'. Only add freed lock classes to the free_lock_classes list after a grace period, to avoid reusing a lock_classes[] element while an RCU reader is still accessing it. Since the lockdep selftests run in a context where sleeping is not allowed and since the selftests require that lock resetting/zapping works with debug_locks off, make the behavior of lockdep_free_key_range() and lockdep_reset_lock() depend on whether or not these are called from the context of the lockdep selftests. Thanks to Peter for having shown how to modify get_pending_free() so that it does not have to sleep. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-12-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
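An illustrative sketch of the deferred-reuse scheme, with names simplified and locking elided; the real code additionally uses two pending buckets so that classes zapped after a callback has already been scheduled wait for the next grace period:

  static LIST_HEAD(free_lock_classes);    /* reusable entries */
  static LIST_HEAD(pending_free);         /* zapped, awaiting grace period */
  static bool rcu_callback_scheduled;
  static struct rcu_head free_rcu;

  /* Runs after a grace period: pending entries may now be reused. */
  static void free_zapped_classes(struct rcu_head *rh)
  {
          list_splice_tail_init(&pending_free, &free_lock_classes);
          rcu_callback_scheduled = false;
  }

  static void zap_class_deferred(struct lock_class *class)
  {
          list_move_tail(&class->lock_entry, &pending_free);
          if (!rcu_callback_scheduled) {
                  rcu_callback_scheduled = true;
                  call_rcu(&free_rcu, free_zapped_classes);
          }
  }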
2019-02-28  locking/lockdep: Update two outdated comments (Bart Van Assche)
synchronize_sched() has been removed recently. Update the comments that refer to synchronize_sched(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Fixes: 51959d85f32d ("lockdep: Replace synchronize_sched() with synchronize_rcu()") # v5.0-rc1 Link: https://lkml.kernel.org/r/20190214230058.196511-11-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Make it easy to detect whether or not inside a selftest (Bart Van Assche)
The patch that frees unused lock classes will modify the behavior of lockdep_free_key_range() and lockdep_reset_lock() depending on whether or not these functions are called from the context of the lockdep selftests. Hence make it easy to detect whether or not lockdep code is called from the context of a lockdep selftest. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-10-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock() (Bart Van Assche)
This patch does not change the behavior of these functions but makes the patch that frees unused lock classes easier to read. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-9-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Initialize the locks_before and locks_after lists earlier (Bart Van Assche)
This patch does not change any functionality. A later patch will reuse lock classes that have been freed. In combination with that patch, this patch will have the effect of initializing lock class order lists once instead of every time a lock class structure is reinitialized. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-8-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Make zap_class() remove all matching lock order entries (Bart Van Assche)
Make sure that all lock order entries that refer to a class are removed from the list_entries[] array when a kernel module is unloaded. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-7-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache (Bart Van Assche)
Make sure that add_chain_cache() returns 0 and does not modify the chain hash if nr_chain_hlocks == MAX_LOCKDEP_CHAIN_HLOCKS before this function is called. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-5-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
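A simplified sketch of the guard, not the literal kernel code: the capacity check happens before the chain is linked into the hash, so a full chain_hlocks[] table can no longer leave a half-recorded chain behind:

  /* Fail before touching the chain hash if chain_hlocks[] is full. */
  if (nr_chain_hlocks + chain->depth > MAX_LOCKDEP_CHAIN_HLOCKS)
          return 0;                       /* chain hash left unmodified */

  /* ... copy the held-lock class ids into chain_hlocks[] ... */
  hlist_add_head_rcu(&chain->entry, hash_head);   /* only on success */
  return 1;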
2019-02-28  locking/lockdep: Fix reported required memory size (2/2) (Bart Van Assche)
Lock chains are only tracked with CONFIG_PROVE_LOCKING=y. Do not report the memory required for the lock chain array if CONFIG_PROVE_LOCKING=n. See also commit: ca58abcb4a6d ("lockdep: sanitise CONFIG_PROVE_LOCKING") Include the size of the chain_hlocks[] array. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-4-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  locking/lockdep: Fix reported required memory size (1/2) (Bart Van Assche)
Change the sizeof(array element type) * (array size) expressions into sizeof(array). This fixes the size computations of the classhash_table[] and chainhash_table[] arrays. The reason is that commit: a63f38cc4ccf ("locking/lockdep: Convert hash tables to hlists") changed the type of the elements of those arrays from 'struct list_head' into 'struct hlist_head'. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-3-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
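A small userspace illustration of why sizeof(array) is the robust form: when the element type shrinks, as in the list_head to hlist_head conversion, a hand-written element-size product silently goes stale. The struct definitions below are stand-ins for the demo, not the kernel's:

  #include <stdio.h>

  struct list_head  { void *next, *prev; };   /* two pointers */
  struct hlist_head { void *first; };         /* one pointer  */

  #define TABLE_SIZE 4096
  static struct hlist_head classhash_table[TABLE_SIZE];

  int main(void)
  {
          /* Stale computation: still assumes the old element type. */
          printf("wrong: %zu\n", sizeof(struct list_head) * TABLE_SIZE);
          /* Robust computation: follows the array's real element type. */
          printf("right: %zu\n", sizeof(classhash_table));
          return 0;
  }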
2019-02-28  locking/lockdep: Fix two 32-bit compiler warnings (Bart Van Assche)
Use %zu to format size_t instead of %lu to avoid that the compiler complains about a mismatch between format specifier and argument on 32-bit systems. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: johannes.berg@intel.com Cc: tj@kernel.org Link: https://lkml.kernel.org/r/20190214230058.196511-2-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
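For reference, a minimal example of the mismatch and its fix; on typical 32-bit ABIs size_t is unsigned int, so %lu warns while %zu is always correct for size_t:

  #include <stdio.h>

  int main(void)
  {
          size_t n = sizeof(int[100]);

          /* printf("size: %lu\n", n);  warns on 32-bit: size_t != unsigned long */
          printf("size: %zu bytes\n", n);  /* portable size_t specifier */
          return 0;
  }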
2019-02-28  locking/qspinlock: Remove unnecessary BUG_ON() call (Waiman Long)
With the > 4 nesting levels case handled by the commit: d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels") the BUG_ON() call in encode_tail() will never actually be triggered. Remove it. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1551057253-3231-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28  Merge branch 'linus' into locking/core, to pick up fixes (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-27  bpf: set inner_map_meta->spin_lock_off correctly (Yonghong Song)
Commit d83525ca62cf ("bpf: introduce bpf_spin_lock") introduced bpf_spin_lock, and the field spin_lock_off in the kernel-internal structure bpf_map has the following meaning: >=0 is a valid offset, <0 is an error. For every map created, the kernel will ensure spin_lock_off has the correct value. Currently, bpf_map->spin_lock_off is not copied from the inner map to the map_in_map inner_map_meta during map_in_map type map creation, so inner_map_meta->spin_lock_off = 0. This gives the verifier the wrong information that the inner map has a bpf_spin_lock defined at offset 0. An access to offset 0 of a value pointer will then trigger the following error:

  bpf_spin_lock cannot be accessed directly by load/store

This patch fixes the issue by copying the inner map's spin_lock_off value to inner_map_meta->spin_lock_off. Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock") Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
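The shape of the fix, following the field names in the commit text; the surrounding meta-map allocation code is elided and condensed:

  /* bpf_map_meta_alloc(): copy the inner map's attributes into the
   * meta map that the verifier later inspects. */
  inner_map_meta->map_type   = inner_map->map_type;
  inner_map_meta->key_size   = inner_map->key_size;
  inner_map_meta->value_size = inner_map->value_size;
  /* The missing copy: without it spin_lock_off stays 0, which the
   * verifier reads as "a bpf_spin_lock lives at value offset 0". */
  inner_map_meta->spin_lock_off = inner_map->spin_lock_off;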
2019-02-27  bpf: expose program stats via bpf_prog_info (Alexei Starovoitov)
Return bpf program run_time_ns and run_cnt via bpf_prog_info Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-27  bpf: enable program stats (Alexei Starovoitov)
JITed BPF programs are indistinguishable from kernel functions, but unlike kernel code, BPF code can change often. The typical "perf record" + "perf report" approach to profiling and tuning kernel code works just as well for BPF programs, but kernel code doesn't need to be monitored whereas BPF programs do. Users load and run a large number of BPF programs. These BPF stats allow tools to monitor the usage of BPF on the server. The monitoring tools will turn the sysctl kernel.bpf_stats_enabled on and off for a few seconds to sample the average cost of the programs. Aggregated data over hours and days will provide insight into the cost of BPF, and alarms can trigger in case a given program suddenly gets more expensive. The cost of two sched_clock() calls per program invocation adds ~20 nsec. Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down from ~10 nsec to ~30 nsec. A static_key minimizes the cost of the stats collection; there is no measurable difference before/after this patch with kernel.bpf_stats_enabled=0. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
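A simplified sketch of the mechanism: a static key keeps the disabled path free of clock reads, and per-CPU counters take two sched_clock() samples around the run when enabled. Structure and field names are condensed from the commit text, not the exact kernel definitions:

  DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);

  static u32 bpf_prog_run_stats(const struct bpf_prog *prog, const void *ctx)
  {
          struct bpf_prog_stats *stats;
          u64 start;
          u32 ret;

          if (!static_branch_unlikely(&bpf_stats_enabled_key))
                  return prog->bpf_func(ctx, prog->insnsi);  /* zero overhead */

          start = sched_clock();                  /* first clock read */
          ret = prog->bpf_func(ctx, prog->insnsi);
          stats = this_cpu_ptr(prog->aux->stats);
          u64_stats_update_begin(&stats->syncp);
          stats->nsecs += sched_clock() - start;  /* second clock read */
          stats->cnt++;
          u64_stats_update_end(&stats->syncp);
          return ret;
  }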
2019-02-27  kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig (Masahiro Yamada)
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched various false positives:

 - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building with -Os") turned off this option for -Os.
 - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for CONFIG_PROFILE_ALL_BRANCHES
 - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning for "make W=1"") turned off this option for GCC < 4.9

Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig. Link: https://github.com/ClangBuiltLinux/linux/issues/350 Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
2019-02-26  bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() (Peng Sun)
In bpf/syscall.c, bpf_map_get_fd_by_id() uses bpf_map_inc_not_zero() to increase both refcounts, map->refcnt and map->usercnt. If bpf_map_new_fd() then fails, the error path must drop map->usercnt too, not just map->refcnt. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
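The essence of the fix, as a sketch: bpf_map_put_with_uref() drops usercnt as well as refcnt, whereas a plain put would leak the user count taken by bpf_map_inc_not_zero():

  fd = bpf_map_new_fd(map, f_flags);
  if (fd < 0)
          bpf_map_put_with_uref(map);     /* undo refcnt AND usercnt */

  return fd;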
2019-02-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller)
Three conflicts, one of which, for marvell10g.c, is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Linus Torvalds)
Pull networking fixes from David Miller:
 "Hopefully the last pull request for this release. Fingers crossed:

  1) Only refcount ESP stats on full sockets, from Martin Willi.
  2) Missing barriers in AF_UNIX, from Al Viro.
  3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
  4) Avoid false positives in untrusted GSO validation, from Willem de Bruijn.
  5) Forwarded mesh packets in mac80211 need more tailroom allocated, from Felix Fietkau.
  6) Use operstate consistently for linkup in team driver, from George Wilkie.
  7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF and PF code paths.
  8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
  9) nfp eBPF code gen fixes from Jiong Wang.
  10) bnxt_en firmware timeout fix from Michael Chan.
  11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
  12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  net: phy: realtek: Dummy IRQ calls for RTL8366RB
  tcp: repaired skbs must init their tso_segs
  net/x25: fix a race in x25_bind()
  net: dsa: Remove documentation for port_fdb_prepare
  Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
  selftests: fib_tests: sleep after changing carrier. again.
  net: set static variable an initial value in atl2_probe()
  net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
  bpf, doc: add bpf list as secondary entry to maintainers file
  udp: fix possible user after free in error handler
  udpv6: fix possible user after free in error handler
  fou6: fix proto error handler argument type
  udpv6: add the required annotation to mib type
  mdio_bus: Fix use-after-free on device_register fails
  net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
  bnxt_en: Wait longer for the firmware message response to complete.
  bnxt_en: Fix typo in firmware message timeout logic.
  nfp: bpf: fix ALU32 high bits clearance bug
  nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
  Documentation: networking: switchdev: Update port parent ID section
  ...
2019-02-23  Merge tag 'irqchip-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core (Thomas Gleixner)
Pull irqchip updates from Marc Zyngier:

 - Core pseudo-NMI handling code
 - Allow the default irq domain to be retrieved
 - A new interrupt controller for the Loongson LS1X platform
 - Affinity support for the SiFive PLIC
 - Better support for the iMX irqsteer driver
 - NUMA aware memory allocations for GICv3
 - A handful of other fixes (i8259, GICv3, PLIC)
2019-02-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf (David S. Miller)
Daniel Borkmann says:

====================
pull-request: bpf 2019-02-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a bug in BPF's LPM deletion logic to match correct prefix length, from Alban.
2) Fix AF_XDP teardown by not destroying umem prematurely as it is still needed till all outstanding skbs are freed, from Björn.
3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking signal_pending() outside need_resched() condition which is never triggered there, from Stanislav.
4) Fix two nfp JIT bugs, one in code emission for K-based xor, and another one to explicitly clear upper bits in alu32, from Jiong.
5) Add bpf list address to maintainers file, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22  perf, pt, coresight: Fix address filters for vmas with non-zero offset (Alexander Shishkin)
Currently, the address range calculation for file-based filters works as long as the vma that maps the matching part of the object file starts from offset zero into the file (vm_pgoff==0). Otherwise, the resulting filter range would be off by vm_pgoff pages. Another related problem is that in case of a partially matching vma, that is, a vma that matches part of a filter region, the filter range size wouldn't be adjusted. Fix the arithmetic around address filter range calculations, taking into account the vma offset, so that the entire calculation is done before the filter configuration is passed to the PMU drivers instead of having those drivers do the final bit of arithmetic. Based on the patch by Adrian Hunter <adrian.hunter@intel.com>. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20190215115655.63469-3-alexander.shishkin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
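An illustrative reduction of the corrected arithmetic, ignoring the clamping and partial-match handling the commit also covers; the function and parameter names here are made up for the sketch:

  /* Translate a filter region given as a file offset into the address
   * it occupies in a vma that maps the file starting at vm_pgoff pages. */
  static unsigned long filter_addr(struct vm_area_struct *vma,
                                   unsigned long region_file_offset)
  {
          /* File offset where this vma's mapping begins: */
          unsigned long vma_file_start = vma->vm_pgoff << PAGE_SHIFT;

          /* The old code effectively assumed vma_file_start == 0. */
          return vma->vm_start + region_file_offset - vma_file_start;
  }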
2019-02-22  perf: Copy parent's address filter offsets on clone (Alexander Shishkin)
When a child event is allocated in the inherit_event() path, the VMA based filter offsets are not copied from the parent, even though the address space mapping of the new task remains the same, which leads to no trace for the new task until exec. Reported-by: Mansour Alharthi <malharthi9@gatech.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20190215115655.63469-2-alexander.shishkin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-22  bpf, lpm: fix lookup bug in map_delete_elem (Alban Crequy)
trie_delete_elem() was deleting an entry even when the key did not match, as long as the prefixlen was correct. This patch adds a check on matchlen.

Reproducer:

  $ sudo bpftool map create /sys/fs/bpf/mylpm type lpm_trie key 8 value 1 entries 128 name mylpm flags 1
  $ sudo bpftool map update pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 aa bb cc dd value hex 01
  $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
  key: 10 00 00 00 aa bb cc dd value: 01
  Found 1 element
  $ sudo bpftool map delete pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 ff ff ff ff
  $ echo $?
  0
  $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
  Found 0 elements

A similar reproducer is added in the selftests. Without the patch:

  $ sudo ./tools/testing/selftests/bpf/test_lpm_map
  test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
  Aborted

With the patch: test_lpm_map runs without errors.

Fixes: e454cf595853 ("bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE") Cc: Craig Gallek <kraig@google.com> Signed-off-by: Alban Crequy <alban@kinvolk.io> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
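The added check in sketch form, modeled on the commit description: besides the prefix length, the match length computed during the trie walk must also equal the key's prefixlen before the node may be deleted:

  /* After walking down to the candidate node: */
  matchlen = longest_prefix_match(trie, node, key);
  if (!node || node->prefixlen != key->prefixlen ||
      node->prefixlen != matchlen ||          /* the new matchlen check */
      (node->flags & LPM_TREE_NODE_FLAG_IM)) {
          ret = -ENOENT;
          goto out;
  }
  /* ... only now is it safe to unlink the node ... */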
2019-02-22  seccomp, bpf: disable preemption before calling into bpf prog (Alexei Starovoitov)
All BPF programs must be called with preemption disabled. Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
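The fix amounts to wrapping the program invocation in the seccomp filter runner, roughly:

  /* seccomp_run_filters(), simplified: BPF expects non-preemptible. */
  preempt_disable();
  ret = BPF_PROG_RUN(f->prog, sd);
  preempt_enable();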
2019-02-21  psi: avoid divide-by-zero crash inside virtual machines (Johannes Weiner)
We've been seeing hard-to-trigger psi crashes when running inside VM instances:

  divide error: 0000 [#1] SMP PTI
  Modules linked in: [...]
  CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
  Workqueue: events psi_clock
  RIP: 0010:psi_update_stats+0x270/0x490
  RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
  RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
  RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
  FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   psi_clock+0x12/0x50
   process_one_work+0x1e0/0x390
   worker_thread+0x2b/0x3c0
   ? rescuer_thread+0x330/0x330
   kthread+0x113/0x130
   ? kthread_create_worker_on_cpu+0x40/0x40
   ? SyS_exit_group+0x10/0x10
   ret_from_fork+0x35/0x40
  Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48

The Code line points to `period` being 0 inside update_stats(), and we divide by that when calculating that period's pressure percentage.

The elapsed period should never be 0. The reason this can happen is due to an off-by-one in the idle time / missing period calculation combined with a coarse sched_clock() in the virtual machine.

The target time for aggregation is advanced into the future on a fixed grid to prevent clock drift. So when an aggregation runs after some idle period, we can not just set it to "now + psi_period", but have to calculate the downtime and advance the target time relative to itself.

However, if the aggregator was disabled for exactly one psi_period (ns), we drop one idle period in the calculation due to a > when we should do >=. In that case, next_update will be advanced from 'now - psi_period' to 'now' when it should be moved to 'now + psi_period'. The run finishes with last_update == next_update == sched_clock().

With hardware clocks, this exact nanosecond match isn't likely in the first place; but if it does happen, the clock will still have moved on and the period will be non-zero by the time the worker runs. A pointlessly short period, but besides the extra work, no harm no foul. However, a slow sched_clock() like we have on VMs might not have advanced either by the time the worker runs again. And when we calculate the elapsed period, the result, our pressure divisor, will be 0. Ouch.

Fix this by correctly handling the situation when the elapsed time between aggregation runs is precisely two periods, and advance the expiration timestamp correctly one period into the future.

Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
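A condensed sketch of the corrected bookkeeping; variable names follow the commit text and the surrounding aggregation code is elided:

  u64 now = sched_clock();
  u64 missed_periods = 0;

  /* Was '>', which dropped exactly-one-period gaps and could leave
   * next_update == now, making the next elapsed period zero. */
  if (now - expires >= psi_period)
          missed_periods = div_u64(now - expires, psi_period);

  /* Advance on the fixed grid, relative to the old expiry: */
  next_update = expires + ((1 + missed_periods) * psi_period);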
2019-02-21  workqueue: fix typo in comment (Liu Song)
qeueue/queue Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-02-21  tracing/perf: Use strndup_user() instead of buggy open-coded version (Jann Horn)
The first version of this method was missing the check for `ret == PATH_MAX`; then such a check was added, but it didn't call kfree() on error, so there was still a small memory leak in the error case. Fix it by using strndup_user() instead of open-coding it. Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: 0eadcc7a7bc0 ("perf/core: Fix perf_uprobe_init()") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
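The resulting pattern, sketched with an illustrative variable name (user_path_addr is a stand-in): strndup_user() bounds the copy, returns an ERR_PTR on failure, and leaves nothing for the caller to free on the error path:

  char *path;

  path = strndup_user(u64_to_user_ptr(user_path_addr), PATH_MAX);
  if (IS_ERR(path))
          return PTR_ERR(path);   /* no kfree() needed here */
  /* ... use path ... */
  kfree(path);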
2019-02-21  Merge branch 'topic/dma' into next (Michael Ellerman)
Merge hch's big DMA rework series. This is in a topic branch in case he wants to merge it to minimise conflicts.
2019-02-21  Merge branch 'ib-qcom-ssbi' into devel (Linus Walleij)
2019-02-21  irqdomain: Allow the default irq domain to be retrieved (Marc Zyngier)
The default irq domain allows legacy code to create irqdomain mappings without having to track the domain it is allocating from. Setting the default domain is a one-shot, fire-and-forget operation, and no effort was made to be able to retrieve this information at a later point in time. Newer irqdomain APIs (the hierarchical stuff) rely on both the irqchip code tracking the irqdomain it is allocating from, and on some form of firmware abstraction to easily identify which piece of HW maps to which irq domain (DT, ACPI). For systems without such firmware (or legacy platforms that are getting dragged into the 21st century), things are a bit harder. For these cases (and these cases only!), let's provide a way to retrieve the default domain, allowing the use of the v2 API without having to resort to platform-specific hacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-21  printk: Pass caller information to log_store() (Tetsuo Handa)
When thread1 calls printk() with a message that does not end with '\n', and thread2 then calls printk() with a message that ends with '\n' before thread1 calls pr_cont(), the partial content saved into "struct cont" is flushed by thread2, even though the partial content was generated by thread1. This leads to confusing output, as if the partial content had been generated by thread2. Fix this problem by passing correct caller information to log_store().

Before:
  [ T8533] abcdefghijklm
  [ T8533] ABCDEFGHIJKLMNOPQRSTUVWXYZ
  [ T8532] nopqrstuvwxyz
  [ T8532] abcdefghijklmnopqrstuvwxyz
  [ T8533] abcdefghijklm
  [ T8533] ABCDEFGHIJKLMNOPQRSTUVWXYZ
  [ T8532] nopqrstuvwxyz

After:
  [ T8507] abcdefghijklm
  [ T8508] ABCDEFGHIJKLMNOPQRSTUVWXYZ
  [ T8507] nopqrstuvwxyz
  [ T8507] abcdefghijklmnopqrstuvwxyz
  [ T8507] abcdefghijklm
  [ T8508] ABCDEFGHIJKLMNOPQRSTUVWXYZ
  [ T8507] nopqrstuvwxyz

Link: http://lkml.kernel.org/r/1550314773-8607-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
To: Dmitry Vyukov <dvyukov@google.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek: broke 80-column rule where it made more harm than good]
Signed-off-by: Petr Mladek <pmladek@suse.com>
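One way to derive such a caller id, matching the T<pid> tags in the output above and mirroring printk's approach in simplified form: in task context use the thread pid, otherwise encode the CPU number with a flag bit:

  static u32 printk_caller_id(void)
  {
          return in_task() ? task_pid_nr(current) :
                             0x80000000 + raw_smp_processor_id();
  }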
2019-02-20  tracing: Fix spelling mistake: "analagous" -> "analogous" (Colin Ian King)
There is a spelling mistake in the mini-howto help text. Fix it. Link: http://lkml.kernel.org/r/20190217223222.16479-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Comment why cond_snapshot is checked outside of max_lock protection (Steven Rostedt (VMware))
tr->cond_snapshot must be NULL before it can be updated. It goes to NULL when a trace event hist trigger is created or removed, and can only be modified under the max_lock spin lock. But because it can only be set to something other than NULL while holding both the max_lock spin lock and the trace_types_lock, we can check whether it is non-NULL under the trace_types_lock alone and fail out without having to grab the max_lock spin lock. This is very subtle, and deserves a comment. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Add alternative synthetic event trace action syntax (Tom Zanussi)
Add a 'trace(synthetic_event_name, params)' alternative to synthetic_event_name(params). Currently, the syntax used for generating synthetic events is to invoke synthetic_event_name(params) i.e. use the synthetic event name as a function call. Users requested a new form that more explicitly shows that the synthetic event is in effect being traced. In this version, a new 'trace()' keyword is used, and the synthetic event name is passed in as the first argument. In addition, for the sake of consistency with other actions, change the documentation to emphasize the trace() form over the function-call form, which remains documented as equivalent. Link: http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Add hist trigger onchange() handler (Tom Zanussi)
Add support for a hist:onchange($var) handler, similar to the onmax() handler but triggering whenever there's any change in $var, not just a max. Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Add hist trigger snapshot() action (Tom Zanussi)
Add support for hist:handlerXXX($var).snapshot(), which will take a snapshot of the current trace buffer whenever handlerXXX is hit. As a first user, this also adds snapshot() action support for the onmax() handler i.e. hist:onmax($var).snapshot(). Also, the hist trigger key printing is moved into a separate function so the snapshot() action can print a histogram key outside the histogram display - add and use hist_trigger_print_key() for that purpose. Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Add conditional snapshot (Tom Zanussi)
Currently, tracing snapshots are context-free - they capture the ring buffer contents at the time the tracing_snapshot() function was invoked, and nothing else. Additionally, they're always taken unconditionally - the calling code can decide whether or not to take a snapshot, but the data used to make that decision is kept separately from the snapshot itself. This change adds the ability to associate with each trace instance some user data, along with an 'update' function that can use that data to determine whether or not to actually take a snapshot. The update function can then update that data along with any other state (as part of the data presumably), if warranted. Because snapshots are 'global' per-instance, only one user can enable and use a conditional snapshot for any given trace instance. To enable a conditional snapshot (see details in the function and data structure comments), the user calls tracing_snapshot_cond_enable(). Similarly, to disable a conditional snapshot and free it up for other users, tracing_snapshot_cond_disable() should be called. To actually initiate a conditional snapshot, tracing_snapshot_cond() should be called. tracing_snapshot_cond() will invoke the update() callback, allowing the user to decide whether or not to actually take the snapshot and update the user-defined data associated with the snapshot. If the callback returns 'true', tracing_snapshot_cond() will then actually take the snapshot and return. This scheme allows for flexibility in snapshot implementations - for example, by implementing slightly different update() callbacks, snapshots can be taken in situations where the user is only interested in taking a snapshot when a new maximum is hit versus when a value changes in any way at all. Future patches will demonstrate both cases. Link: http://lkml.kernel.org/r/1bea07828d5fd6864a585f83b1eed47ce097eb45.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
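A hedged usage sketch of the API as described above; the callback and function signatures are condensed, so consult the tracing headers for the exact prototypes:

  struct my_cond_data {
          u64 max_seen;
          u64 current_val;        /* set by the caller before each check */
  };
  static struct my_cond_data data;

  /* Decide whether this snapshot should actually be taken. */
  static bool my_update(struct trace_array *tr, void *cond_data)
  {
          struct my_cond_data *d = cond_data;

          if (d->current_val <= d->max_seen)
                  return false;           /* no new max: skip snapshot */
          d->max_seen = d->current_val;
          return true;                    /* take the snapshot */
  }

  /* One user per trace instance: */
  tracing_snapshot_cond_enable(tr, &data, my_update);
  /* ... */
  data.current_val = val;
  tracing_snapshot_cond(tr, &data);       /* snapshots only on a new max */
  /* ... */
  tracing_snapshot_cond_disable(tr);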
2019-02-20  tracing: Generalize hist trigger onmax and save action (Tom Zanussi)
The action refactor code allowed actions and handlers to be separated, but the existing onmax handler and save action code is still not flexible enough to handle arbitrary coupling. This change generalizes them and in the process makes additional handlers and actions easier to implement. The onmax action can be broken up and thought of as two separate components - a variable to be tracked (the parameter given to the onmax($var_to_track) function) and an invisible variable created to save the ongoing result of doing something with that variable, such as saving the max value of that variable so far seen. Separating it out like this and renaming it appropriately allows us to use the same code for similar tracking functions such as onchange($var_to_track), which would just track the last value seen rather than the max seen so far, which is useful in some situations. Additionally, because different handlers and actions may want to save and access data differently e.g. save and retrieve tracking values as local variables vs something more global, save_val() and get_val() interface functions are introduced and max-specific implementations are used instead. The same goes for the code that checks whether a maximum has been hit - a generic check_val() interface and max-checking implementation are used instead, which allows future patches to make use of the same code using their own implementations of similar functionality. Link: http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Split up onmatch action data (Tom Zanussi)
Currently, the onmatch action data binds the onmatch action to data related to synthetic event generation. Since we want to allow the onmatch handler to potentially invoke a different action, and because we expect other handlers to generate synthetic events, we need to separate the data related to these two functions. Also rename the onmatch data to something more descriptive, and create and use a common action data destroy function. Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Refactor hist trigger action code (Tom Zanussi)
The hist trigger action code currently implements two essentially hard-coded pairs of 'actions' - onmax(), which tracks a variable and saves some event fields when a max is hit, and onmatch(), which is hard-coded to generate a synthetic event. These hardcoded pairs (track max/save fields and detect match/generate synthetic event) should really be decoupled into separate components that can then be arbitrarily combined. The first component of each pair (track max/detect match) is called a 'handler' in the new code, while the second component (save fields/generate synthetic event) is called an 'action' in this scheme. This change refactors the action code to reflect this split by adding two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two actions, ACTION_SAVE and ACTION_TRACE. The new code combines them to produce the existing ONMATCH/TRACE and ONMAX/SAVE functionality, but doesn't implement the other combinations now possible. Future patches will expand these to further useful cases, such as ONMAX/TRACE, as well as add additional handlers and actions such as ONCHANGE and SNAPSHOT. Also, add abbreviated documentation for handlers and actions to README. Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  tracing: Do not free iter->trace in fail path of tracing_open_pipe() (zhangyi (F))
Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") uses the current tracer instead of a copy in tracing_open_pipe(), but it forgot to remove the kfree() in the error path. There's an error path that can call kfree(iter->trace) after iter->trace was assigned to tr->current_trace, which would be bad to free. Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com Cc: stable@vger.kernel.org Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20  dma-mapping: remove the DMA_MEMORY_EXCLUSIVE flag (Christoph Hellwig)
All users of dma_declare_coherent want their allocations to be exclusive, so default to exclusive allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20  dma-mapping: remove dma_mark_declared_memory_occupied (Christoph Hellwig)
This API is not used anywhere, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-20  dma-mapping: move CONFIG_DMA_CMA to kernel/dma/Kconfig (Christoph Hellwig)
This is where all the related code already lives. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20  dma-mapping: improve selection of dma_declare_coherent availability (Christoph Hellwig)
This API is primarily used through DT entries, but two architectures and two drivers call it directly. So instead of selecting the config symbol for random architectures pull it in implicitly for the actual users. Also rename the Kconfig option to describe the feature better. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS Acked-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20  Merge tag 'gpio-v5.1-updates-for-linus-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into devel (Linus Walleij)
gpio: updates for v5.1 - part 2

 - gpio-mockup updates improving the user-space testing interface and adding line state tracking for correct edge interrupts
 - interrupt simulator patch exposing the irq type configuration to users
2019-02-20  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller)
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>