summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-07-19Merge tag 'trace-v5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Eiichi Tsukata found a small bug from the fixup of the stack code Removing ULONG_MAX as the marker for the user stack trace end, made the tracing code not know where the end is. The end is now marked with a zero (NULL) pointer. Eiichi fixed this in the tracing code" * tag 'trace-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix user stack trace "??" output
2019-07-19Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: perf_event_get(): don't bother with fget_raw() vfs: update d_make_root() description
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix AF_XDP cq entry leak, from Ilya Maximets. 2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit. 3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika. 4) Fix handling of neigh timers wrt. entries added by userspace, from Lorenzo Bianconi. 5) Various cases of missing of_node_put(), from Nishka Dasgupta. 6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing. 7) Various RDS layer fixes, from Gerd Rausch. 8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from Cong Wang. 9) Fix FIB source validation checks over loopback, also from Cong Wang. 10) Use promisc for unsupported number of filters, from Justin Chen. 11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) tcp: fix tcp_set_congestion_control() use from bpf hook ag71xx: fix return value check in ag71xx_probe() ag71xx: fix error return code in ag71xx_probe() usb: qmi_wwan: add D-Link DWM-222 A2 device ID bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips. net: dsa: sja1105: Fix missing unlock on error in sk_buff() gve: replace kfree with kvfree selftests/bpf: fix test_xdp_noinline on s390 selftests/bpf: fix "valid read map access into a read-only array 1" on s390 net/mlx5: Replace kfree with kvfree MAINTAINERS: update netsec driver ipv6: Unlink sibling route in case of failure liquidio: Replace vmalloc + memset with vzalloc udp: Fix typo in net/ipv4/udp.c net: bcmgenet: use promisc for unsupported filters ipv6: rt6_check should return NULL if 'from' is NULL tipc: initialize 'validated' field of received packets selftests: add a test case for rp_filter fib: relax source validation check for loopback packets mlxsw: spectrum: Do not process learned records with a dummy FID ...
2019-07-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: "The rest of MM and a kernel-wide procfs cleanup. Summary of the more significant patches: - Patch series "mm/memory_hotplug: Factor out memory block devicehandling", v3. David Hildenbrand. Some spring-cleaning of the memory hotplug code, notably in drivers/base/memory.c - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang Shi. Fix /proc/pid/smaps output for THP pages used in shmem. - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit. Bugfix and speedup for kernel/resource.c - Patch series "mm: Further memory block device cleanups", David Hildenbrand. More spring-cleaning of the memory hotplug code. - Patch series "mm: Sub-section memory hotplug support". Dan Williams. Generalise the memory hotplug code so that pmem can use it more completely. Then remove the hacks from the libnvdimm code which were there to work around the memory-hotplug code's constraints. - "proc/sysctl: add shared variables for range check", Matteo Croce. We have about 250 instances of int zero; ... .extra1 = &zero, in the tree. This is a tree-wide sweep to make all those private "zero"s and "one"s use global variables. Alas, it isn't practical to make those two global integers const" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) proc/sysctl: add shared variables for range check mm: migrate: remove unused mode argument mm/sparsemem: cleanup 'section number' data types libnvdimm/pfn: stop padding pmem namespaces to section alignment libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields mm/devm_memremap_pages: enable sub-section remap mm: document ZONE_DEVICE memory-model implications mm/sparsemem: support sub-section hotplug mm/sparsemem: prepare for sub-section ranges mm: kill is_dev_zone() helper mm/hotplug: kill is_dev_zone() usage in __remove_pages() mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal mm/sparsemem: add helpers track active portions of a section at boot mm/sparsemem: introduce a SECTION_IS_EARLY flag mm/sparsemem: introduce struct mem_section_usage drivers/base/memory.c: get rid of find_memory_block_hinted() mm/memory_hotplug: move and simplify walk_memory_blocks() mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns mm: make register_mem_sect_under_node() static ...
2019-07-19tracing: Fix user stack trace "??" outputEiichi Tsukata
Commit c5c27a0a5838 ("x86/stacktrace: Remove the pointless ULONG_MAX marker") removes ULONG_MAX marker from user stack trace entries but trace_user_stack_print() still uses the marker and it outputs unnecessary "??". For example: less-1911 [001] d..2 34.758944: <user stack trace> => <00007f16f2295910> => ?? => ?? => ?? => ?? => ?? => ?? => ?? The user stack trace code zeroes the storage before saving the stack, so if the trace is shorter than the maximum number of entries it can terminate the print loop if a zero entry is detected. Link: http://lkml.kernel.org/r/20190630085438.25545-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/devm_memremap_pages: enable sub-section remapDan Williams
Teach devm_memremap_pages() about the new sub-section capabilities of arch_{add,remove}_memory(). Effectively, just replace all usage of align_start, align_end, and align_size with res->start, res->end, and resource_size(res). The existing sanity check will still make sure that the two separate remap attempts do not collide within a sub-section (2MB on x86). Link: http://lkml.kernel.org/r/156092355542.979959.10060071713397030576.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: Michal Hocko <mhocko@suse.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18resource: avoid unnecessary lookups in find_next_iomem_res()Nadav Amit
find_next_iomem_res() shows up to be a source for overhead in dax benchmarks. Improve performance by not considering children of the tree if the top level does not match. Since the range of the parents should include the range of the children such check is redundant. Running sysbench on dax (pmem emulation, with write_cache disabled): sysbench fileio --file-total-size=3G --file-test-mode=rndwr \ --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run Provides the following results: events (avg/stddev) ------------------- 5.2-rc3: 1247669.0000/16075.39 w/patch: 1286320.5000/16402.72 (+3%) Link: http://lkml.kernel.org/r/20190613045903.4922-3-namit@vmware.com Signed-off-by: Nadav Amit <namit@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18resource: fix locking in find_next_iomem_res()Nadav Amit
Since resources can be removed, locking should ensure that the resource is not removed while accessing it. However, find_next_iomem_res() does not hold the lock while copying the data of the resource. Keep holding the lock while the data is copied. While at it, change the return value to a more informative value. It is disregarded by the callers. [akpm@linux-foundation.org: fix find_next_iomem_res() documentation] Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface") Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) verifier precision propagation fix, from Andrii. 2) BTF size fix for typedefs, from Andrii. 3) a bunch of big endian fixes, from Ilya. 4) wide load from bpf_sock_addr fixes, from Stanislav. 5) a bunch of misc fixes from a number of developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge tag 'modules-for-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Summary of modules changes for the 5.3 merge window: - Code fixes and cleanups - Fix bug where set_memory_x() wasn't being called when rodata=n - Fix bug where -EEXIST was being returned for going modules - Allow arches to override module_exit_section()" * tag 'modules-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: modules: fix compile error if don't have strict module rwx ARM: module: recognize unwind exit sections module: allow arch overrides for .exit section names modules: fix BUG when load module with rodata=n kernel/module: Fix mem leak in module_add_modinfo_attrs kernel: module: Use struct_size() helper kernel/module.c: Only return -EEXIST for modules that have finished loading
2019-07-18Merge tag 'trace-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The main changes in this release include: - Add user space specific memory reading for kprobes - Allow kprobes to be executed earlier in boot The rest are mostly just various clean ups and small fixes" * tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Make trace_get_fields() global tracing: Let filter_assign_type() detect FILTER_PTR_STRING tracing: Pass type into tracing_generic_entry_update() ftrace/selftest: Test if set_event/ftrace_pid exists before writing ftrace/selftests: Return the skip code when tracing directory not configured in kernel tracing/kprobe: Check registered state using kprobe tracing/probe: Add trace_event_call accesses APIs tracing/probe: Add probe event name and group name accesses APIs tracing/probe: Add trace flag access APIs for trace_probe tracing/probe: Add trace_event_file access APIs for trace_probe tracing/probe: Add trace_event_call register API for trace_probe tracing/probe: Add trace_probe init and free functions tracing/uprobe: Set print format when parsing command tracing/kprobe: Set print format right after parsed command kprobes: Fix to init kprobes in subsys_initcall tracepoint: Use struct_size() in kmalloc() ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS ftrace: Enable trampoline when rec count returns back to one tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline tracing: Make a separate config for trace event self tests ...
2019-07-18Merge branch 'for-linus-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: "One compiler fix, and a bug-fix in swiotlb_nr_tbl() and swiotlb_max_segment() to check also for no_iotlb_memory" * 'for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: fix phys_addr_t overflow warning swiotlb: Return consistent SWIOTLB segments/nr_tbl swiotlb: Group identical cleanup in swiotlb_cleanup()
2019-07-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "VM: - z3fold fixes and enhancements by Henry Burns and Vitaly Wool - more accurate reclaimed slab caches calculations by Yafang Shao - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by Christoph Hellwig - !CONFIG_MMU fixes by Christoph Hellwig - new novmcoredd parameter to omit device dumps from vmcore, by Kairui Song - new test_meminit module for testing heap and pagealloc initialization, by Alexander Potapenko - ioremap improvements for huge mappings, by Anshuman Khandual - generalize kprobe page fault handling, by Anshuman Khandual - device-dax hotplug fixes and improvements, by Pavel Tatashin - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V - add pte_devmap() support for arm64, by Robin Murphy - unify locked_vm accounting with a helper, by Daniel Jordan - several misc fixes core/lib: - new typeof_member() macro including some users, by Alexey Dobriyan - make BIT() and GENMASK() available in asm, by Masahiro Yamada - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better code generation, by Alexey Dobriyan - rbtree code size optimizations, by Michel Lespinasse - convert struct pid count to refcount_t, by Joel Fernandes get_maintainer.pl: - add --no-moderated switch to skip moderated ML's, by Joe Perches misc: - ptrace PTRACE_GET_SYSCALL_INFO interface - coda updates - gdb scripts, various" [ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits) fs/select.c: use struct_size() in kmalloc() mm: add account_locked_vm utility function arm64: mm: implement pte_devmap support mm: introduce ARCH_HAS_PTE_DEVMAP mm: clean up is_device_*_page() definitions mm/mmap: move common defines to mman-common.h mm: move MAP_SYNC to asm-generic/mman-common.h device-dax: "Hotremove" persistent memory that is used like normal RAM mm/hotplug: make remove_memory() interface usable device-dax: fix memory and resource leak if hotplug fails include/linux/lz4.h: fix spelling and copy-paste errors in documentation ipc/mqueue.c: only perform resource calculation if user valid include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures scripts/gdb: add helpers to find and list devices scripts/gdb: add lx-genpd-summary command drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl kernel/pid.c: convert struct pid count to refcount_t drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining() select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR ...
2019-07-16kernel/pid.c: convert struct pid count to refcount_tJoel Fernandes (Google)
struct pid's count is an atomic_t field used as a refcount. Use refcount_t for it which is basically atomic_t but does additional checking to prevent use-after-free bugs. For memory ordering, the only change is with the following: - if ((atomic_read(&pid->count) == 1) || - atomic_dec_and_test(&pid->count)) { + if (refcount_dec_and_test(&pid->count)) { kmem_cache_free(ns->pid_cachep, pid); Here the change is from: Fully ordered --> RELEASE + ACQUIRE (as per refcount-vs-atomic.rst) This ACQUIRE should take care of making sure the free happens after the refcount_dec_and_test(). The above hunk also removes atomic_read() since it is not needed for the code to work and it is unclear how beneficial it is. The removal lets refcount_dec_and_test() check for cases where get_pid() happened before the object was freed. Link: http://lkml.kernel.org/r/20190701183826.191936-1-joel@joelfernandes.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16signal: simplify set_user_sigmask/restore_user_sigmaskOleg Nesterov
task->saved_sigmask and ->restore_sigmask are only used in the ret-from- syscall paths. This means that set_user_sigmask() can save ->blocked in ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked was modified. This way the callers do not need 2 sigset_t's passed to set/restore and restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns into the trivial helper which just calls restore_saved_sigmask(). Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Eric Wong <e@80x24.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Laight <David.Laight@aculab.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16ptrace: add PTRACE_GET_SYSCALL_INFO requestElvira Khabirova
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain details of the syscall the tracee is blocked in. There are two reasons for a special syscall-related ptrace request. Firstly, with the current ptrace API there are cases when ptracer cannot retrieve necessary information about syscalls. Some examples include: * The notorious int-0x80-from-64-bit-task issue. See [1] for details. In short, if a 64-bit task performs a syscall through int 0x80, its tracer has no reliable means to find out that the syscall was, in fact, a compat syscall, and misidentifies it. * Syscall-enter-stop and syscall-exit-stop look the same for the tracer. Common practice is to keep track of the sequence of ptrace-stops in order not to mix the two syscall-stops up. But it is not as simple as it looks; for example, strace had a (just recently fixed) long-standing bug where attaching strace to a tracee that is performing the execve system call led to the tracer identifying the following syscall-exit-stop as syscall-enter-stop, which messed up all the state tracking. * Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA and process_vm_readv become unavailable when the process dumpable flag is cleared. On such architectures as ia64 this results in all syscall arguments being unavailable for the tracer. Secondly, ptracers also have to support a lot of arch-specific code for obtaining information about the tracee. For some architectures, this requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall argument and return value. ptrace(2) man page: long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data); ... PTRACE_GET_SYSCALL_INFO Retrieve information about the syscall that caused the stop. The information is placed into the buffer pointed by "data" argument, which should be a pointer to a buffer of type "struct ptrace_syscall_info". The "addr" argument contains the size of the buffer pointed to by "data" argument (i.e., sizeof(struct ptrace_syscall_info)). The return value contains the number of bytes available to be written by the kernel. If the size of data to be written by the kernel exceeds the size specified by "addr" argument, the output is truncated. [ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO] Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org> Co-developed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greentime Hu <greentime@andestech.com> Cc: Helge Deller <deller@gmx.de> [parisc] Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: James Hogan <jhogan@kernel.org> Cc: kbuild test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16kernel: fix typos and some coding style in commentsWeitao Hou
fix lenght to length Link: http://lkml.kernel.org/r/20190521050937.4370-1-houweitaoo@gmail.com Signed-off-by: Weitao Hou <houweitaoo@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16Merge tag 'docs/v5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull rst conversion of docs from Mauro Carvalho Chehab: "As agreed with Jon, I'm sending this big series directly to you, c/c him, as this series required a special care, in order to avoid conflicts with other trees" * tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits) docs: kbuild: fix build with pdf and fix some minor issues docs: block: fix pdf output docs: arm: fix a breakage with pdf output docs: don't use nested tables docs: gpio: add sysfs interface to the admin-guide docs: locking: add it to the main index docs: add some directories to the main documentation index docs: add SPDX tags to new index files docs: add a memory-devices subdir to driver-api docs: phy: place documentation under driver-api docs: serial: move it to the driver-api docs: driver-api: add remaining converted dirs to it docs: driver-api: add xilinx driver API documentation docs: driver-api: add a series of orphaned documents docs: admin-guide: add a series of orphaned documents docs: cgroup-v1: add it to the admin-guide book docs: aoe: add it to the driver-api book docs: add some documentation dirs to the driver-api book docs: driver-model: move it to the driver-api book docs: lp855x-driver.rst: add it to the driver-api book ...
2019-07-16tracing: Make trace_get_fields() globalCong Wang
trace_get_fields() is the only way to read tracepoint fields at run time, as their fields are defined at compile-time with macros. Make this function visible to all users and it will be used by trace event injection code to calculate the size of a tracepoint entry. Link: http://lkml.kernel.org/r/20190525165802.25944-4-xiyou.wangcong@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing: Let filter_assign_type() detect FILTER_PTR_STRINGCong Wang
filter_assign_type() could detect dynamic string and static string, but not string pointers. Teach filter_assign_type() to detect string pointers, and this will be needed by trace event injection code. BTW, trace event hist uses FILTER_PTR_STRING too. Link: http://lkml.kernel.org/r/20190525165802.25944-3-xiyou.wangcong@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing: Pass type into tracing_generic_entry_update()Cong Wang
All callers of tracing_generic_entry_update() have to initialize entry->type, so let's just simply move it inside. Link: http://lkml.kernel.org/r/20190525165802.25944-2-xiyou.wangcong@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/kprobe: Check registered state using kprobeMasami Hiramatsu
Change registered check only by trace_kprobe and remove TP_FLAG_REGISTERED from trace_probe, since this feature is only used for trace_kprobe. Link: http://lkml.kernel.org/r/155931588704.28323.4952266828256245833.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add trace_event_call accesses APIsMasami Hiramatsu
Add trace_event_call access APIs for trace_probe. Instead of accessing trace_probe.call directly, use those accesses by trace_probe_event_call() method. This hides the relationship of trace_event_call and trace_probe from trace_kprobe and trace_uprobe. Link: http://lkml.kernel.org/r/155931587711.28323.8335129014686133120.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add probe event name and group name accesses APIsMasami Hiramatsu
Add trace_probe_name() and trace_probe_group_name() functions for accessing probe name and group name of trace_probe. Link: http://lkml.kernel.org/r/155931586717.28323.8738615064952254761.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add trace flag access APIs for trace_probeMasami Hiramatsu
Add trace_probe_test/set/clear_flag() functions for accessing trace_probe.flag field. This flags field should not be accessed directly. Link: http://lkml.kernel.org/r/155931585683.28323.314290023236905988.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add trace_event_file access APIs for trace_probeMasami Hiramatsu
Add trace_event_file access APIs for trace_probe data structure. This simplifies enabling/disabling operations in uprobe and kprobe events so that those don't touch deep inside the trace_probe. This also removing a redundant synchronization when the kprobe event is used from perf, since the perf itself uses tracepoint_synchronize_unregister() after disabling (ftrace- defined) event, thus we don't have to synchronize in that path. Also we don't need to identify local trace_kprobe too anymore. Link: http://lkml.kernel.org/r/155931584587.28323.372301976283354629.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add trace_event_call register API for trace_probeMasami Hiramatsu
Since trace_event_call is a field of trace_probe, these operations should be done in trace_probe.c. trace_kprobe and trace_uprobe use new functions to register/unregister trace_event_call. Link: http://lkml.kernel.org/r/155931583643.28323.14828411185591538876.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/probe: Add trace_probe init and free functionsMasami Hiramatsu
Add common trace_probe init and cleanup function in trace_probe.c, and use it from trace_kprobe.c and trace_uprobe.c Link: http://lkml.kernel.org/r/155931582664.28323.5934870189034740822.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/uprobe: Set print format when parsing commandMasami Hiramatsu
Set event call's print format right after parsed command for simplifying (un)register_uprobe_event(). Link: http://lkml.kernel.org/r/155931581659.28323.5404667166417404076.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16tracing/kprobe: Set print format right after parsed commandMasami Hiramatsu
Set event call's print format right after parsed command for simplifying (un)register_kprobe_event(). Link: http://lkml.kernel.org/r/155931580625.28323.5158822928646225903.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16kprobes: Fix to init kprobes in subsys_initcallMasami Hiramatsu
Since arm64 kernel initializes breakpoint trap vector in arch_initcall(), initializing kprobe (and run smoke test) in postcore_initcall() causes a kernel panic. To fix this issue, move the kprobe initialization in subsys_initcall() (which is called right afer the arch_initcall). In-kernel kprobe users (ftrace and bpf) are using fs_initcall() which is called after subsys_initcall(), so this shouldn't cause more problem. Link: http://lkml.kernel.org/r/155956708268.12228.10363800793132214198.stgit@devnote2 Link: http://lkml.kernel.org/r/20190709153755.GB10123@lakrids.cambridge.arm.com Reported-by: Anders Roxell <anders.roxell@linaro.org> Fixes: b5f8b32c93b2 ("kprobes: Initialize kprobes at postcore_initcall") Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16Merge tag 'for-linus-20190715' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd and clone3 fixes from Christian Brauner: "This contains a bugfix for CLONE_PIDFD when used with the legacy clone syscall, two fixes to ensure that syscall numbering and clone3 entrypoint implementations will stay consistent, and an update for the maintainers file: - The addition of clone3 broke CLONE_PIDFD for legacy clone on all architectures that use do_fork() directly instead of calling the clone syscall itself. (Fwiw, cleaning do_fork() up is on my todo.) The reason this happened was that during conversion of _do_fork() to use struct kernel_clone_args we missed that do_fork() is called directly by various architectures. This is fixed by making sure that the pidfd argument in struct kernel_clone_args is correctly initialized with the parent_tidptr argument passed down from do_fork(). Additionally, do_fork() missed a check to make CLONE_PIDFD and CLONE_PARENT_SETTID mutually exclusive just a clone() does. This is now fixed too. - When clone3() was introduced we skipped architectures that require special handling for fork-like syscalls. Their syscall tables did not contain any mention of clone3(). To make sure that Arnd's work to make syscall numbers on all architectures identical (minus alpha) was not for naught we are placing a comment in all syscall tables that do not yet implement clone3(). The comment makes it clear that 435 is reserved for clone3 and should not be used. - Also, this contains a patch to make the clone3() syscall definition in asm-generic/unist.h conditional on __ARCH_WANT_SYS_CLONE3. This lets us catch new architectures that implicitly make use of clone3 without setting __ARCH_WANT_SYS_CLONE3 which is a good indicator that they did not check whether it needs special treatment or not. - Finally, this contains a patch to add me as maintainer for pidfd stuff so people can start blaming me (more)" * tag 'for-linus-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: MAINTAINERS: add new entry for pidfd api unistd: protect clone3 via __ARCH_WANT_SYS_CLONE3 arch: mark syscall number 435 reserved for clone3 clone: fix CLONE_PIDFD support
2019-07-15Merge tag 'pci-v5.3-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration changes: - Evaluate PCI Boot Configuration _DSM to learn if firmware wants us to preserve its resource assignments (Benjamin Herrenschmidt) - Simplify resource distribution (Nicholas Johnson) - Decode 32 GT/s link speed (Gustavo Pimentel) Virtualization: - Fix incorrect caching of VF config space size (Alex Williamson) - Fix VF driver probing sysfs knobs (Alex Williamson) Peer-to-peer DMA: - Fix dma_virt_ops check (Logan Gunthorpe) Altera host bridge driver: - Allow building as module (Ley Foon Tan) Armada 8K host bridge driver: - add PHYs support (Miquel Raynal) DesignWare host bridge driver: - Export APIs to support removable loadable module (Vidya Sagar) - Enable Relaxed Ordering erratum workaround only on Tegra20 & Tegra30 (Vidya Sagar) Hyper-V host bridge driver: - Fix use-after-free in eject (Dexuan Cui) Mobiveil host bridge driver: - Clean up and fix many issues, including non-identify mapped windows, 64-bit windows, multi-MSI, class code, INTx clearing (Hou Zhiqiang) Qualcomm host bridge driver: - Use clk bulk API for 2.4.0 controllers (Bjorn Andersson) - Add QCS404 support (Bjorn Andersson) - Assert PERST for at least 100ms (Niklas Cassel) R-Car host bridge driver: - Add r8a774a1 DT support (Biju Das) Tegra host bridge driver: - Add support for Gen2, opportunistic UpdateFC and ACK (PCIe protocol details) AER, GPIO-based PERST# (Manikanta Maddireddy) - Fix many issues, including power-on failure cases, interrupt masking in suspend, UPHY settings, AFI dynamic clock gating, pending DLL transactions (Manikanta Maddireddy) Xilinx host bridge driver: - Fix NWL Multi-MSI programming (Bharat Kumar Gogada) Endpoint support: - Fix 64bit BAR support (Alan Mikhak) - Fix pcitest build issues (Alan Mikhak, Andy Shevchenko) Bug fixes: - Fix NVIDIA GPU multi-function power dependencies (Abhishek Sahu) - Fix NVIDIA GPU HDA enablement issue (Lukas Wunner) - Ignore lockdep for sysfs "remove" (Marek Vasut) Misc: - Convert docs to reST (Changbin Du, Mauro Carvalho Chehab)" * tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (107 commits) PCI: Enable NVIDIA HDA controllers tools: PCI: Fix installation when `make tools/pci_install` PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB PCI: Fix typos and whitespace errors PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr() PCI: mobiveil: Fix infinite-loop in the INTx handling function PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup PCI: mobiveil: Clear the control fields before updating it PCI: mobiveil: Add configured inbound windows counter PCI: mobiveil: Fix the valid check for inbound and outbound windows PCI: mobiveil: Clean-up program_{ib/ob}_windows() PCI: mobiveil: Remove an unnecessary return value check PCI: mobiveil: Fix error return values PCI: mobiveil: Refactor the MEM/IO outbound window initialization PCI: mobiveil: Make some register updates more readable PCI: mobiveil: Reformat the code for readability dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional ...
2019-07-15bpf: fix BTF verifier size resolution logicAndrii Nakryiko
BTF verifier has a size resolution bug which in some circumstances leads to invalid size resolution for, e.g., TYPEDEF modifier. This happens if we have [1] PTR -> [2] TYPEDEF -> [3] ARRAY, in which case due to being in pointer context ARRAY size won't be resolved (because for pointer it doesn't matter, so it's a sink in pointer context), but it will be permanently remembered as zero for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will be resolved correctly, but TYPEDEF resolved_size won't be updated anymore. This, subsequently, will lead to erroneous map creation failure, if that TYPEDEF is specified as either key or value, as key_size/value_size won't correspond to resolved size of TYPEDEF (kernel will believe it's zero). Note, that if BTF was ordered as [1] ARRAY <- [2] TYPEDEF <- [3] PTR, this won't be a problem, as by the time we get to TYPEDEF, ARRAY's size is already calculated and stored. This bug manifests itself in rejecting BTF-defined maps that use array typedef as a value type: typedef int array_t[16]; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(value, array_t); /* i.e., array_t *value; */ } test_map SEC(".maps"); The fix consists on not relying on modifier's resolved_size and instead using modifier's resolved_id (type ID for "concrete" type to which modifier eventually resolves) and doing size determination for that resolved type. This allow to preserve existing "early DFS termination" logic for PTR or STRUCT_OR_ARRAY contexts, but still do correct size determination for modifier types. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-15docs: cgroup-v1: add it to the admin-guide bookMauro Carvalho Chehab
Those files belong to the admin guide, so add them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: admin-guide: move sysctl directory to itMauro Carvalho Chehab
The stuff under sysctl describes /sys interface from userspace point of view. So, add it to the admin-guide and remove the :orphan: from its index file. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: sysctl: convert to ReSTMauro Carvalho Chehab
Rename the /proc/sys/ documentation files to ReST, using the README file as a template for an index.rst, adding the other files there via TOC markup. Despite being written on different times with different styles, try to make them somewhat coherent with a similar look and feel, ensuring that they'll look nice as both raw text file and as via the html output produced by the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: locking: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
Convert the locking documents to ReST and add them to the kernel development book where it belongs. Most of the stuff here is just to make Sphinx to properly parse the text file, as they're already in good shape, not requiring massive changes in order to be parsed. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
2019-07-14Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull HMM updates from Jason Gunthorpe: "Improvements and bug fixes for the hmm interface in the kernel: - Improve clarity, locking and APIs related to the 'hmm mirror' feature merged last cycle. In linux-next we now see AMDGPU and nouveau to be using this API. - Remove old or transitional hmm APIs. These are hold overs from the past with no users, or APIs that existed only to manage cross tree conflicts. There are still a few more of these cleanups that didn't make the merge window cut off. - Improve some core mm APIs: - export alloc_pages_vma() for driver use - refactor into devm_request_free_mem_region() to manage DEVICE_PRIVATE resource reservations - refactor duplicative driver code into the core dev_pagemap struct - Remove hmm wrappers of improved core mm APIs, instead have drivers use the simplified API directly - Remove DEVICE_PUBLIC - Simplify the kconfig flow for the hmm users and core code" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits) mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR mm: remove the HMM config option mm: sort out the DEVICE_PRIVATE Kconfig mess mm: simplify ZONE_DEVICE page private data mm: remove hmm_devmem_add mm: remove hmm_vma_alloc_locked_page nouveau: use devm_memremap_pages directly nouveau: use alloc_page_vma directly PCI/P2PDMA: use the dev_pagemap internal refcount device-dax: use the dev_pagemap internal refcount memremap: provide an optional internal refcount in struct dev_pagemap memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag memremap: remove the data field in struct dev_pagemap memremap: add a migrate_to_ram method to struct dev_pagemap_ops memremap: lift the devmap_enable manipulation into devm_memremap_pages memremap: pass a struct dev_pagemap to ->kill and ->cleanup memremap: move dev_pagemap callbacks into a separate structure memremap: validate the pagemap type passed to devm_memremap_pages mm: factor out a devm_request_free_mem_region helper mm: export alloc_pages_vma ...
2019-07-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A number of PMU driver corner case fixes, a race fix, an event grouping fix, plus a bunch of tooling fixes/updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) perf/x86/intel: Fix spurious NMI on fixed counter perf/core: Fix exclusive events' grouping perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs perf/core: Fix race between close() and fork() perf intel-pt: Fix potential NULL pointer dereference found by the smatch tool perf intel-bts: Fix potential NULL pointer dereference found by the smatch tool perf script: Assume native_arch for pipe mode perf scripts python: export-to-sqlite.py: Fix DROP VIEW power_events_view perf scripts python: export-to-postgresql.py: Fix DROP VIEW power_events_view perf hists browser: Fix potential NULL pointer dereference found by the smatch tool perf cs-etm: Fix potential NULL pointer dereference found by the smatch tool perf parse-events: Remove unused variable: error perf parse-events: Remove unused variable 'i' perf metricgroup: Add missing list_del_init() when flushing egroups list perf tools: Use list_del_init() more thorougly perf tools: Use zfree() where applicable tools lib: Adopt zalloc()/zfree() from tools/perf perf tools: Move get_current_dir_name() cond prototype out of util.h perf namespaces: Move the conditional setns() prototype to namespaces.h ...
2019-07-14Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "A single fix for a locking statistics bug" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix lock used or unused stats error
2019-07-14clone: fix CLONE_PIDFD supportDmitry V. Levin
The introduction of clone3 syscall accidentally broke CLONE_PIDFD support in traditional clone syscall on compat x86 and those architectures that use do_fork to implement clone syscall. This bug was found by strace test suite. Link: https://strace.io/logs/strace/2019-07-12 Fixes: 7f192e3cd316 ("fork: add clone3") Bisected-and-tested-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Link: https://lore.kernel.org/r/20190714162047.GB10389@altlinux.org Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-14Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a sched statistics related bug that would trigger a kernel warning on certain configs" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix preempt warning in ttwu
2019-07-13locking/lockdep: Fix lock used or unused stats errorYuyang Du
The stats variable nr_unused_locks is incremented every time a new lock class is register and decremented when the lock is first used in __lock_acquire(). And after all, it is shown and checked in lockdep_stats. However, under configurations that either CONFIG_TRACE_IRQFLAGS or CONFIG_PROVE_LOCKING is not defined: The commit: 091806515124b20 ("locking/lockdep: Consolidate lock usage bit initialization") missed marking the LOCK_USED flag at IRQ usage initialization because as mark_usage() is not called. And the commit: 886532aee3cd42d ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING") further made mark_lock() not defined such that the LOCK_USED cannot be marked at all when the lock is first acquired. As a result, we fix this by not showing and checking the stats under such configurations for lockdep_stats. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Yuyang Du <duyuyang@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: arnd@arndb.de Cc: frederic@kernel.org Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-13sched/core: Fix preempt warning in ttwuPeter Zijlstra
John reported a DEBUG_PREEMPT warning caused by commit: aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups") I overlooked that ttwu_stat() requires preemption disabled. Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: aacedf26fb76 ("sched/core: Optimize try_to_wake_up() for local wakeups") Link: https://lkml.kernel.org/r/20190710105736.GK3402@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-13perf/core: Fix exclusive events' groupingAlexander Shishkin
So far, we tried to disallow grouping exclusive events for the fear of complications they would cause with moving between contexts. Specifically, moving a software group to a hardware context would violate the exclusivity rules if both groups contain matching exclusive events. This attempt was, however, unsuccessful: the check that we have in the perf_event_open() syscall is both wrong (looks at wrong PMU) and insufficient (group leader may still be exclusive), as can be illustrated by running: $ perf record -e '{intel_pt//,cycles}' uname $ perf record -e '{cycles,intel_pt//}' uname ultimately successfully. Furthermore, we are completely free to trigger the exclusivity violation by: perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}' even though the helpful perf record will not allow that, the ABI will. The warning later in the perf_event_open() path will also not trigger, because it's also wrong. Fix all this by validating the original group before moving, getting rid of broken safeguards and placing a useful one to perf_install_in_context(). Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: mathieu.poirier@linaro.org Cc: will.deacon@arm.com Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events") Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-13perf/core: Fix race between close() and fork()Peter Zijlstra
Syzcaller reported the following Use-after-Free bug: close() clone() copy_process() perf_event_init_task() perf_event_init_context() mutex_lock(parent_ctx->mutex) inherit_task_group() inherit_group() inherit_event() mutex_lock(event->child_mutex) // expose event on child list list_add_tail() mutex_unlock(event->child_mutex) mutex_unlock(parent_ctx->mutex) ... goto bad_fork_* bad_fork_cleanup_perf: perf_event_free_task() perf_release() perf_event_release_kernel() list_for_each_entry() mutex_lock(ctx->mutex) mutex_lock(event->child_mutex) // event is from the failing inherit // on the other CPU perf_remove_from_context() list_move() mutex_unlock(event->child_mutex) mutex_unlock(ctx->mutex) mutex_lock(ctx->mutex) list_for_each_entry_safe() // event already stolen mutex_unlock(ctx->mutex) delayed_free_task() free_task() list_for_each_entry_safe() list_del() free_event() _free_event() // and so event->hw.target // is the already freed failed clone() if (event->hw.target) put_task_struct(event->hw.target) // WHOOPSIE, already quite dead Which puts the lie to the the comment on perf_event_free_task(): 'unexposed, unused context' not so much. Which is a 'fun' confluence of fail; copy_process() doing an unconditional free_task() and not respecting refcounts, and perf having creative locking. In particular: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp") seems to have overlooked this 'fun' parade. Solve it by using the fact that detached events still have a reference count on their (previous) context. With this perf_event_free_task() can detect when events have escaped and wait for their destruction. Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-12Merge tag 'kbuild-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - remove headers_{install,check}_all targets - remove unreasonable 'depends on !UML' from CONFIG_SAMPLES - re-implement 'make headers_install' more cleanly - add new header-test-y syntax to compile-test headers - compile-test exported headers to ensure they are compilable in user-space - compile-test headers under include/ to ensure they are self-contained - remove -Waggregate-return, -Wno-uninitialized, -Wno-unused-value flags - add -Werror=unknown-warning-option for Clang - add 128-bit built-in types support to genksyms - fix missed rebuild of modules.builtin - propagate 'No space left on device' error in fixdep to Make - allow Clang to use its integrated assembler - improve some coccinelle scripts - add a new flag KBUILD_ABS_SRCTREE to request Kbuild to use absolute path for $(srctree). - do not ignore errors when compression utility is missing - misc cleanups * tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (49 commits) kbuild: use -- separater intead of $(filter-out ...) for cc-cross-prefix kbuild: Inform user to pass ARCH= for make mrproper kbuild: fix compression errors getting ignored kbuild: add a flag to force absolute path for srctree kbuild: replace KBUILD_SRCTREE with boolean building_out_of_srctree kbuild: remove src and obj from the top Makefile scripts/tags.sh: remove unused environment variables from comments scripts/tags.sh: drop SUBARCH support for ARM kbuild: compile-test kernel headers to ensure they are self-contained kheaders: include only headers into kheaders_data.tar.xz kheaders: remove meaningless -R option of 'ls' kbuild: support header-test-pattern-y kbuild: do not create wrappers for header-test-y kbuild: compile-test exported headers to ensure they are self-contained init/Kconfig: add CONFIG_CC_CAN_LINK kallsyms: exclude kasan local symbols on s390 kbuild: add more hints about SUBDIRS replacement coccinelle: api/stream_open: treat all wait_.*() calls as blocking coccinelle: put_device: Add a cast to an expression for an assignment coccinelle: put_device: Adjust a message construction ...