linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-20	powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected	Christophe Leroy
	If CONFIG_PPC_SPLPAR is not selected, steal_time will always be NUL, so accounting it is pointless Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64	Christophe Leroy
	scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. Removing it on PPC32 significantly reduces the size of vtime_account_system() and vtime_account_idle() on an 8xx: Before: 00000000 l F .text 000000a8 vtime_delta 00000280 g F .text 0000010c vtime_account_system 0000038c g F .text 00000048 vtime_account_idle After: (vtime_delta gets inlined inside the two functions) 000001d8 g F .text 000000a0 vtime_account_system 00000278 g F .text 00000038 vtime_account_idle In terms of performance, we also get approximatly 7% improvement on task switch. The following small benchmark app is run with perf stat: void thread(void arg) { int i; for (i = 0; i < atoi((char)arg); i++) pthread_yield(); } int main(int argc, char *argv) { pthread_t th1, th2; pthread_create(&th1, NULL, thread, argv[1]); pthread_create(&th2, NULL, thread, argv[1]); pthread_join(th1, NULL); pthread_join(th2, NULL); return 0; } Before the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 8228.476465 task-clock (msec) # 0.954 CPUs utilized ( +- 0.23% ) 200004 context-switches # 0.024 M/sec ( +- 0.00% ) After the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 7649.070444 task-clock (msec) # 0.955 CPUs utilized ( +- 0.27% ) 200004 context-switches # 0.026 M/sec ( +- 0.00% ) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/time: isolate scaled cputime accounting in dedicated functions.	Christophe Leroy
	scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. In preparation of the following patch that will remove CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves all scaled cputing accounting logic into dedicated functions. This patch doesn't change any functionality. It's only code reorganisation. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()	Christophe Leroy
	Generic implementation fails to remove breakpoints after init when CONFIG_STRICT_KERNEL_RWX is selected: [ 13.251285] KGDB: BP remove failed: c001c338 [ 13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa [ 13.268969] KGDB: re-enter exception: ALL breakpoints killed [ 13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 #860 [ 13.282836] Call Trace: [ 13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable) [ 13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98 [ 13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700 [ 13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4 [ 13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4 [ 13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158 [ 13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c [ 13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720 [ 13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc [ 13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac [ 13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4 [ 13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38 [ 13.356346] Kernel panic - not syncing: Recursive entry to debugger This patch creates powerpc specific version of kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() using patch_instruction() Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/sysdev/ipic: check primary_ipic NULL pointer before using it	Christophe Leroy
	ipic_get_mcp_status() is used by targets implementing NMI watchdog in target specific machine check handler in order to known whether a machine check results from a watchdog NMI reset. In case of very early machine check, primary_ipic pointer might not have been set yet, so ipic_get_mcp_status() needs to check it for nullity before using it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/mm: fix always true/false warning in slice.c	Christophe Leroy
	This patch fixes the following warnings (obtained with make W=1). arch/powerpc/mm/slice.c: In function 'slice_range_to_mask': arch/powerpc/mm/slice.c:73:12: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (start < SLICE_LOW_TOP) { ^ arch/powerpc/mm/slice.c:81:20: error: comparison is always false due to limited range of data type [-Werror=type-limits] if ((start + len) > SLICE_LOW_TOP) { ^ arch/powerpc/mm/slice.c: In function 'slice_mask_for_free': arch/powerpc/mm/slice.c:136:17: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (high_limit <= SLICE_LOW_TOP) ^ arch/powerpc/mm/slice.c: In function 'slice_check_range_fits': arch/powerpc/mm/slice.c:185:12: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (start < SLICE_LOW_TOP) { ^ arch/powerpc/mm/slice.c:195:39: error: comparison is always false due to limited range of data type [-Werror=type-limits] if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) { ^ arch/powerpc/mm/slice.c: In function 'slice_scan_available': arch/powerpc/mm/slice.c:306:11: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (addr < SLICE_LOW_TOP) { ^ arch/powerpc/mm/slice.c: In function 'get_slice_psize': arch/powerpc/mm/slice.c:709:11: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (addr < SLICE_LOW_TOP) { ^ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/mm: fix missing prototypes in slice.c	Christophe Leroy
	This patch fixes the following warnings (obtained with make W=1). arch/powerpc/mm/slice.c: At top level: arch/powerpc/mm/slice.c:682:15: error: no previous prototype for 'arch_get_unmapped_area' [-Werror=missing-prototypes] unsigned long arch_get_unmapped_area(struct file filp, ^ arch/powerpc/mm/slice.c:692:15: error: no previous prototype for 'arch_get_unmapped_area_topdown' [-Werror=missing-prototypes] unsigned long arch_get_unmapped_area_topdown(struct file filp, ^ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/mm: Trace tlbia instruction	Christophe Leroy
	Add a trace point for tlbia (Translation Lookaside Buffer Invalidate All) instruction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/mm: Add missing tracepoint for tlbie	Christophe Leroy
	commit 0428491cba927 ("powerpc/mm: Trace tlbie(l) instructions") added tracepoints for tlbie calls, but _tlbil_va() was forgotten Fixes: 0428491cba927 ("powerpc/mm: Trace tlbie(l) instructions") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/book3s64: fix dump_linuxpagetables "present" flag	Christophe Leroy
	Since commit bd0dbb73e013 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid."), _PAGE_PRESENT doesn't mean exactly that a page is present. A page is also considered preset when _PAGE_INVALID is set. This patch changes the meaning of "present" and adds a status "valid" associated to the _PAGE_PRESENT flag. Fixes: bd0dbb73e013 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc/pseries: Export raw per-CPU VPA data via debugfs	Aravinda Prasad
	This patch exports the raw per-CPU VPA data via debugfs. A per-CPU file is created which exports the VPA data of that CPU to help debug some of the VPA related issues or to analyze the per-CPU VPA related statistics. v3: Removed offline CPU check. v2: Included offline CPU check and other review comments. Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	selftests/powerpc: Add test to verify rfi flush across a system call	Naveen N. Rao
	This adds a test to verify proper functioning of the rfi flush capability implemented to mitigate meltdown. The test works by measuring the number of L1d cache misses encountered while loading data from memory. Across a system call, since the L1d cache is flushed when rfi_flush is enabled, the number of cache misses is expected to be relative to the number of cachelines corresponding to the data being loaded. The current system setting is reflected via powerpc/rfi_flush under debugfs (assumed to be /sys/kernel/debug/). This test verifies the expected result with rfi_flush enabled as well as when it is disabled. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Add SPDX tags, clang format, skip if the debugfs is missing, use __u64 and SANE_USERSPACE_TYPES to avoid printf() build errors.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	selftests/powerpc: Move UCONTEXT_NIA() into utils.h	Naveen N. Rao
	... so that it can be used by others. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc64/module elfv1: Set opd addresses after module relocation	Naveen N. Rao
	module_frob_arch_sections() is called before the module is moved to its final location. The function descriptor section addresses we are setting here are thus invalid. Fix this by processing opd section during module_finalize() Fixes: 5633e85b2c313 ("powerpc64: Add .opd based function descriptor dereference") Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	powerpc: Add support for function error injection	Naveen N. Rao
	We implement regs_set_return_value() and override_function_with_return() for this purpose. On powerpc, a return from a function (blr) just branches to the location contained in the link register. So, we can just update pt_regs rather than redirecting execution to a dummy function that returns. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20	Merge tag 'drm-misc-fixes-2018-10-19' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Second pull request for v4.19: - Fix ulong overflow in sun4i - Fix a serious GPF in waiting for flip_done from commit_tail(). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/97d1ed42-1d99-fcc5-291e-cd1dc29a4252@linux.intel.com
2018-10-19	net: ethernet: lpc_eth: add device and device node local variables	Vladimir Zapolskiy
	Trivial non-functional change added to simplify getting multiple references to device pointer in lpc_eth_drv_probe(). Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	net: ethernet: lpc_eth: remove unused local variable	Vladimir Zapolskiy
	A trivial change which removes an unused local variable, the issue is reported as a compile time warning: drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_eth_drv_probe': drivers/net/ethernet/nxp/lpc_eth.c:1250:21: warning: variable 'phydev' set but not used [-Wunused-but-set-variable] struct phy_device *phydev; ^~~~~~ Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	net: ethernet: lpc_eth: remove CONFIG_OF guard from the driver	Vladimir Zapolskiy
	The MAC controller device is available on NXP LPC32xx platform only, and the LPC32xx platform supports OF builds only, so additional checks in the device driver are not needed. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	net: ethernet: lpc_eth: clean up the list of included headers	Vladimir Zapolskiy
	The change removes all unnecessary included headers from the driver source code, the remaining list is sorted in alphabetical order. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	Merge branch 'Microchip-Technology-KSZ9131'	David S. Miller
	Yuiko Oshino says: ==================== Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY This is the initial driver for Microchip KSZ9131 10/100/1000 Ethernet PHY v3: - KSZ9131 uses picosecond units for values of devicetree properties. - rewrite micrel.c and micrel-ksz90x1.txt to use the picosecond values. v2: - Creating a series from two related patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	dt-bindings: net: add support for Microchip KSZ9131	Yuiko Oshino
	Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	net: phy: micrel: add Microchip KSZ9131 initial driver	Yuiko Oshino
	Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19	netpoll: allow cleanup to be synchronous	Debabrata Banerjee
	This fixes a problem introduced by: commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock") When using netconsole on a bond, __netpoll_cleanup can asynchronously recurse multiple times, each __netpoll_free_async call can result in more __netpoll_free_async's. This means there is now a race between cleanup_work queues on multiple netpoll_info's on multiple devices and the configuration of a new netpoll. For example if a netconsole is set to enable 0, reconfigured, and enable 1 immediately, this netconsole will likely not work. Given the reason for __netpoll_free_async is it can be called when rtnl is not locked, if it is locked, we should be able to execute synchronously. It appears to be locked everywhere it's called from. Generalize the design pattern from the teaming driver for current callers of __netpoll_free_async. CC: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20	bpf: skmsg, fix psock create on existing kcm/tls port	John Fastabend
	Before using the psock returned by sk_psock_get() when adding it to a sockmap we need to ensure it is actually a sockmap based psock. Previously we were only checking this after incrementing the reference counter which was an error. This resulted in a slab-out-of-bounds error when the psock was not actually a sockmap type. This moves the check up so the reference counter is only used if it is a sockmap psock. Eric reported the following KASAN BUG, BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387 CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 sk_psock_get include/linux/skmsg.h:379 [inline] sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178 sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669 sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738 map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-19	scsi: myrs: Fix the processor absent message in processor_show()	Dan Carpenter
	If both processors are absent then it's supposed to print that, but instead we print that just the second processor is absent. Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller (SCSI interface)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-19	scsi: myrs: Fix a logical vs bitwise bug	Dan Carpenter
	The \|\| was supposed to be \|. The original code just sets ->result to 1. Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller (SCSI interface)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-19	scsi: hisi_sas: Fix NULL pointer dereference	Gustavo A. R. Silva
	There is a NULL pointer dereference in case slot happens to be NULL at lines 1053 and 1878: struct hisi_sas_cq cq = &hisi_hba->cq[slot->dlvry_queue]; Notice that slot* is being NULL checked at lines 1057 and 1881: if (slot), which implies it may be NULL. Fix this by placing the declaration and definition of variable cq, which contains the pointer dereference slot->dlvry_queue, after slot has been properly NULL checked. Addresses-Coverity-ID: 1474515 ("Dereference before null check") Addresses-Coverity-ID: 1474520 ("Dereference before null check") Fixes: 584f53fe5f52 ("scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-19	scsi: myrs: fix build failure on 32 bit	James Bottomley
	For 32 bit versions we have to be careful about divisions of 64 bit quantities so use do_div() instead of a direct division. This fixes a warning about _uldivmod being undefined in certain configurations Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-19	selftests: ftrace: Add synthetic event syntax testcase	Masami Hiramatsu
	Add a testcase to check the syntax and field types for synthetic_events interface. Link: http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox Acked-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19	tracing: Fix synthetic event to allow semicolon at end	Masami Hiramatsu
	Fix synthetic event to allow independent semicolon at end. The synthetic_events interface accepts a semicolon after the last word if there is no space. # echo "myevent u64 var;" >> synthetic_events But if there is a space, it returns an error. # echo "myevent u64 var ;" > synthetic_events sh: write error: Invalid argument This behavior is difficult for users to understand. Let's allow the last independent semicolon too. Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19	tracing: Fix synthetic event to accept unsigned modifier	Masami Hiramatsu
	Fix synthetic event to accept unsigned modifier for its field type correctly. Currently, synthetic_events interface returns error for "unsigned" modifiers as below; # echo "myevent unsigned long var" >> synthetic_events sh: write error: Invalid argument This is because argv_split() breaks "unsigned long" into "unsigned" and "long", but parse_synth_field() doesn't expected it. With this fix, synthetic_events can handle the "unsigned long" correctly like as below; # echo "myevent unsigned long var" >> synthetic_events # cat synthetic_events myevent unsigned long var Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19	bpf: remove unused variable	Alexei Starovoitov
	fix the following warning ../kernel/bpf/syscall.c: In function ‘map_lookup_and_delete_elem’: ../kernel/bpf/syscall.c:1010:22: warning: unused variable ‘ptr’ [-Wunused-variable] void key, value, *ptr; ^~~ Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	Merge branch 'cg_skb_direct_pkt_access'	Alexei Starovoitov
	Song Liu says: ==================== Changes v7 -> v8: 1. Dynamically allocate the dummy sk to avoid race conditions. Changes v6 -> v7: 1. Make dummy sk a global variable (test_run_sk). Changes v5 -> v6: 1. Fixed dummy sk in bpf_prog_test_run_skb() as suggested by Eric Dumazet. Changes v4 -> v5: 1. Replaced bpf_compute_and_save_data_pointers() with bpf_compute_and_save_data_end(); Replaced bpf_restore_data_pointers() with bpf_restore_data_end(). 2. Fixed indentation in test_verifier.c Changes v3 -> v4: 1. Fixed crash issue reported by Alexei. Changes v2 -> v3: 1. Added helper function bpf_compute_and_save_data_pointers() and bpf_restore_data_pointers(). Changes v1 -> v2: 1. Updated the list of read-only fields, and read-write fields. 2. Added dummy sk to bpf_prog_test_run_skb(). This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access some __skb_buff data directly. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf: add tests for direct packet access from CGROUP_SKB	Song Liu
	Tests are added to make sure CGROUP_SKB cannot access: tc_classid, data_meta, flow_keys and can read and write: mark, prority, and cb[0-4] and can read other fields. To make selftest with skb->sk work, a dummy sk is added in bpf_prog_test_run_skb(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB	Song Liu
	BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	Merge branch 'improve_perf_barriers'	Alexei Starovoitov
	Daniel Borkmann says: ==================== This set first adds smp_* barrier variants to tools infrastructure and updates perf and libbpf to make use of them. For details, please see individual patches, thanks! Arnaldo, if there are no objections, could this be routed via bpf-next with Acked-by's due to later dependencies in libbpf? Alternatively, I could also get the 2nd patch out during merge window, but perhaps it's okay to do in one go as there shouldn't be much conflict in perf itself. Thanks! v1 -> v2: - add common helper and switch to acquire/release variants when possible, thanks Peter! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf, libbpf: use correct barriers in perf ring buffer walk	Daniel Borkmann
	Given libbpf is a generic library and not restricted to x86-64 only, the compiler barrier in bpf_perf_event_read_simple() after fetching the head needs to be replaced with smp_rmb() at minimum. Also, writing out the tail we should use WRITE_ONCE() to avoid store tearing. Now that we have the logic in place in ring_buffer_read_head() and ring_buffer_write_tail() helper also used by perf tool which would select the correct and best variant for a given architecture (e.g. x86-64 can avoid CPU barriers entirely), make use of these in order to fix bpf_perf_event_read_simple(). Fixes: d0cabbb021be ("tools: bpf: move the event reading loop to libbpf") Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	tools, perf: add and use optimized ring_buffer_{read_head, write_tail} helpers	Daniel Borkmann
	Currently, on x86-64, perf uses LFENCE and MFENCE (rmb() and mb(), respectively) when processing events from the perf ring buffer which is unnecessarily expensive as we can do more lightweight in particular given this is critical fast-path in perf. According to Peter rmb()/mb() were added back then via a94d342b9cb0 ("tools/perf: Add required memory barriers") at a time where kernel still supported chips that needed it, but nowadays support for these has been ditched completely, therefore we can fix them up as well. While for x86-64, replacing rmb() and mb() with smp_() variants would result in just a compiler barrier for the former and LOCK + ADD for the latter (__sync_synchronize() uses slower MFENCE by the way), Peter suggested we can use smp_{load_acquire,store_release}() instead for architectures where its implementation doesn't resolve in slower smp_mb(). Thus, e.g. in x86-64 we would be able to avoid CPU barrier entirely due to TSO. For architectures where the latter needs to use smp_mb() e.g. on arm, we stick to cheaper smp_rmb() variant for fetching the head. This work adds helpers ring_buffer_read_head() and ring_buffer_write_tail() for tools infrastructure that either switches to smp_load_acquire() for architectures where it is cheaper or uses READ_ONCE() + smp_rmb() barrier for those where it's not in order to fetch the data_head from the perf control page, and it uses smp_store_release() to write the data_tail. Latter is smp_mb() + WRITE_ONCE() combination or a cheaper variant if architecture allows for it. Those that rely on smp_rmb() and smp_mb() can further improve performance in a follow up step by implementing the two under tools/arch//include/asm/barrier.h such that they don't have to fallback to rmb() and mb() in tools/include/asm/barrier.h. Switch perf to use ring_buffer_read_head() and ring_buffer_write_tail() so it can make use of the optimizations. Later, we convert libbpf as well to use the same helpers. Side note [0]: the topic has been raised of whether one could simply use the C11 gcc builtins [1] for the smp_load_acquire() and smp_store_release() instead: __atomic_load_n(ptr, __ATOMIC_ACQUIRE); __atomic_store_n(ptr, val, __ATOMIC_RELEASE); Kernel and (presumably) tooling shipped along with the kernel has a minimum requirement of being able to build with gcc-4.6 and the latter does not have C11 builtins. While generally the C11 memory models don't align with the kernel's, the C11 load-acquire and store-release alone /could/ suffice, however. Issue is that this is implementation dependent on how the load-acquire and store-release is done by the compiler and the mapping of supported compilers must align to be compatible with the kernel's implementation, and thus needs to be verified/tracked on a case by case basis whether they match (unless an architecture uses them also from kernel side). The implementations for smp_load_acquire() and smp_store_release() in this patch have been adapted from the kernel side ones to have a concrete and compatible mapping in place. [0] http://patchwork.ozlabs.org/patch/985422/ [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	selftests/bpf: add missing executables to .gitignore	Anders Roxell
	Fixes: 371e4fcc9d96 ("selftests/bpf: cgroup local storage-based network counters") Fixes: 370920c47b26 ("selftests/bpf: Test libbpf_{prog,attach}_type_by_name") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	Merge branch 'queue_stack_maps'	Alexei Starovoitov
	Mauricio Vasquez says: ==================== In some applications this is needed have a pool of free elements, for example the list of free L4 ports in a SNAT. None of the current maps allow to do it as it is not possible to get any element without having they key it is associated to, even if it were possible, the lack of locking mecanishms in eBPF would do it almost impossible to be implemented without data races. This patchset implements two new kind of eBPF maps: queue and stack. Those maps provide to eBPF programs the peek, push and pop operations, and for userspace applications a new bpf_map_lookup_and_delete_elem() is added. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> v2 -> v3: - Remove "almost dead code" in syscall.c - Remove unnecessary copy_from_user in bpf_map_lookup_and_delete_elem - Rebase v1 -> v2: - Put ARG_PTR_TO_UNINIT_MAP_VALUE logic into a separated patch - Fix missing __this_cpu_dec & preempt_enable calls in kernel/bpf/syscall.c RFC v4 -> v1: - Remove roundup to power of 2 in memory allocation - Remove count and use a free slot to check if queue/stack is empty - Use if + assigment for wrapping indexes - Fix some minor style issues - Squash two patches together RFC v3 -> RFC v4: - Revert renaming of kernel/bpf/stackmap.c - Remove restriction on value size - Remove len arguments from peek/pop helpers - Add new ARG_PTR_TO_UNINIT_MAP_VALUE RFC v2 -> RFC v3: - Return elements by value instead that by reference - Implement queue/stack base on array and head + tail indexes - Rename stack trace related files to avoid confusion and conflicts RFC v1 -> RFC v2: - Create two separate maps instead of single one + flags - Implement bpf_map_lookup_and_delete syscall - Support peek operation - Define replacement policy through flags in the update() method - Add eBPF side tests ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	selftests/bpf: add test cases for queue and stack maps	Mauricio Vasquez B
	test_maps: Tests that queue/stack maps are behaving correctly even in corner cases test_progs: Tests new ebpf helpers Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	Sync uapi/bpf.h to tools/include	Mauricio Vasquez B
	Sync both files. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall	Mauricio Vasquez B
	The previous patch implemented a bpf queue/stack maps that provided the peek/pop/push functions. There is not a direct relationship between those functions and the current maps syscalls, hence a new MAP_LOOKUP_AND_DELETE_ELEM syscall is added, this is mapped to the pop operation in the queue/stack maps and it is still to implement in other kind of maps. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf: add queue and stack maps	Mauricio Vasquez B
	Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs. These maps support peek, pop and push operations that are exposed to eBPF programs through the new bpf_map[peek/pop/push] helpers. Those operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push Queue/stack maps are implemented using a buffer, tail and head indexes, hence BPF_F_NO_PREALLOC is not supported. As opposite to other maps, queue and stack do not use RCU for protecting maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE argument that is a pointer to a memory zone where to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. Our main motivation for implementing queue/stack maps was to keep track of a pool of elements, like network ports in a SNAT, however we forsee other use cases, like for exampling saving last N kernel events in a map and then analysing from userspace. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE	Mauricio Vasquez B
	ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone used to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. This will be used in the following patch that implements some new helpers that receive a pointer to be filled with a map value. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf/syscall: allow key to be null in map functions	Mauricio Vasquez B
	This commit adds the required logic to allow key being NULL in case the key_size of the map is 0. A new __bpf_copy_key function helper only copies the key from userpsace when key_size != 0, otherwise it enforces that key must be null. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	bpf: rename stack trace map operations	Mauricio Vasquez B
	In the following patches queue and stack maps (FIFO and LIFO datastructures) will be implemented. In order to avoid confusion and a possible name clash rename stack_map_ops to stack_trace_map_ops Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19	parisc: Add PDC PAT cell_info() and pd_get_pdc_revisions() functions	Helge Deller
	Add wrappers for the PDC_PAT_CELL_GET_INFO and PDC_PAT_PD_GET_PDC_INTERF_REV PAT PDC subfunctions. Both provide access to the PAT capability bitfield which can guide us if simultaneous PTLBs are allowed on the bus, and if firmware will rendezvous all processors within PDCE_Check in case of an HPMC. Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-19	parisc: Drop two instructions from pte lookup code	Helge Deller
	Remove two instruction from the hot path. The temporary move to %r9 is unneccessary, and the zero-inialization of pte happens twice. Signed-off-by: Helge Deller <deller@gmx.de>