summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-06perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIPAdrian Hunter
It is possible to have a CBR packet between a FUP packet and corresponding TIP packet. Stop treating it as an error. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACINGAdrian Hunter
sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. In one case, INTEL_PT_SS_NOT_TRACING state was not correctly transitioning to INTEL_PT_SS_TRACING state due to a missing case clause. Add it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf script powerpc: Python script for hypervisor call statisticsRavi Bangoria
Add python script to show hypervisor call statistics. Ex, # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}" # perf script -s scripts/python/powerpc-hcalls.py hcall count min(ns) max(ns) avg(ns) -------------------------------------------------------------------- H_RANDOM 82 838 1164 904 H_PUT_TCE 47 1078 5928 2003 H_EOI 266 1336 3546 1654 H_ENTER 28 1646 4038 1952 H_PUT_TCE_INDIRECT 230 2166 18168 6109 H_IPI 238 1072 3232 1688 H_SEND_LOGICAL_LAN 42 5488 21366 7694 H_STUFF_TCE 294 986 6210 3591 H_XIRR 266 2286 6990 3783 H_PROTECT 10 2196 3556 2555 H_VIO_SIGNAL 294 1028 2784 1311 H_ADD_LOGICAL_LAN_BUFFER 53 1978 3450 2600 H_SEND_CRQ 77 1762 7240 2447 Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20180605124801.17210-1-ravi.bangoria@linux.ibm.com [ Fixup typo: table_loockup -> table_lookup ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf test record+probe_libc_inet_pton: Ask 'nm' for dynamic symbolsArnaldo Carvalho de Melo
Adrian reported that this test fails in his system where: probe libc's inet_pton & backtrace it with ping: FAILED! root@kbl04:~/git/linux-perf# nm -g /lib/x86_64-linux-gnu/libc-2.19.so | grep inet_pton nm: /lib/x86_64-linux-gnu/libc-2.19.so: no symbols This fails on ubuntu systems, with Adrian's being kubuntu 14.04, I tested with ubuntu 14.04.4 and 18.04, and there we need to use the -D/--dynamic 'nm' option to have this test working. And it works as well with that on fedora 27, so use it. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zlfnbauad3ljlmtjgo0v660u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf map: Consider PTI entry trampolines in rip_2objdump()Adrian Hunter
perf tools uses map__rip_2objdump() to calculate objdump virtual addresses. map__rip_2objdump() needs to be amended to deal with PTI entry trampolines. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1528183800-21577-1-git-send-email-adrian.hunter@intel.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf test code-reading: Fix perf_env setup for PTI entry trampolinesAdrian Hunter
The "Object code reading" test will not create maps for the PTI entry trampolines unless the machine environment exists to show that the arch is x86_64. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1528183800-21577-1-git-send-email-adrian.hunter@intel.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf tools: Fix pmu events parsing ruleJiri Olsa
Currently all the event parsing fails end up in the event_pmu rule, and display misleading help like: $ perf stat -e inst kill event syntax error: 'inst' \___ Cannot find PMU `inst'. Missing kernel support? ... The reason is that the event_pmu is too strong and match also single string. Changing it to force the '/' separators to be part of the rule, and getting the proper error now: $ perf stat -e inst kill event syntax error: 'inst' \___ parser error Run 'perf list' for a list of valid events ... Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180605121416.31645-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf stat: Display user and system timeJiri Olsa
Adding the support to read rusage data once the workload is finished and display the system/user time values: $ perf stat --null perf bench sched pipe ... Performance counter stats for 'perf bench sched pipe': 5.342599256 seconds time elapsed 2.544434000 seconds user 4.549691000 seconds sys It works only in non -r mode and only for workload target. So as of now, for workload targets, we display 3 types of timings. The time we meassure in perf stat from enable to disable+period: 5.342599256 seconds time elapsed The time spent in user and system lands, displayed only for workload session/target: 2.544434000 seconds user 4.549691000 seconds sys Those times are the very same displayed by 'time' tool. They are returned by wait4 call via the getrusage struct interface. Committer notes: Had to rename some variables to avoid this on older systems such as centos:6: builtin-stat.c: In function 'print_footer': builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration /usr/include/time.h:297: warning: shadowed declaration is here Committer testing: # perf stat --null time perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 5.526 [sec] 5.526534 usecs/op 180945 ops/sec 1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k 0inputs+0outputs (0major+606minor)pagefaults 0swaps Performance counter stats for 'time perf bench sched pipe': 5.530978744 seconds time elapsed 1.004037000 seconds user 6.259937000 seconds sys # Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf record: Enable arbitrary event names thru name= modifierAlexey Budankov
Enable complex event names containing [.:=,] symbols to be encoded into Perf trace using name= modifier e.g. like this: perf record -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',\ period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex Below is how it looks like in the report output. Please note explicit escaped quoting at cmdline string in the header so that thestring can be directly reused for another collection in shell: perf report --header # ======== ... # cmdline : /root/abudanko/kernel/tip/tools/perf/perf record -v -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex # event : name = OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM, , type = 4, size = 112, config = 0x100003c, { sample_period, sample_freq } = 3500000, sample_type = IP|TID|TIME, disabled = 1, inh ... # ======== # # # Total Lost Samples: 0 # # Samples: 24K of event 'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM' # Event count (approx.): 86492000000 # # Overhead Command Shared Object Symbol # ........ ....... ................ .............................................. # 14.75% futex [kernel.vmlinux] [k] __entry_trampoline_start ... perf stat -e cpu/name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex 10000000 process context switches in 16678890291ns (1667.9ns/ctxsw) Performance counter stats for './futex': 88,095,770,571 CPU_CLK_UNHALTED.THREAD:cmask=0x1 16.679542407 seconds time elapsed Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/c194b060-761d-0d50-3b21-bb4ed680002d@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf tools: Fix symbol and object code resolution for vdso32 and vdsox32Adrian Hunter
Fix __kmod_path__parse() so that perf tools does not treat vdso32 and vdsox32 as kernel modules and fail to find the object. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly") Link: http://lkml.kernel.org/r/1528117014-30032-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf tests kmod-path: Add tests for vdso32 and vdsox32Adrian Hunter
Add tests for vdso32 and vdsox32. This will cause the overall test to fail because __kmod_path__parse() does not handle vdso32 or vdsox32. Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1528117014-30032-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf hists: Check if a hist_entry has callchains before using themArnaldo Carvalho de Melo
So far if we use 'perf record -g' this will make symbol_conf.use_callchain 'true' and logic will assume that all events have callchains enabled, but ever since we added the possibility of setting up callchains for some events (e.g.: -e cycles/call-graph=dwarf/) while not for others, we limit usage scenarios by looking at that symbol_conf.use_callchain global boolean, we better look at each event attributes. On the road to that we need to look if a hist_entry has callchains, that is, to go from hist_entry->hists to the evsel that contains it, to then look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN. The next step is to add a symbol_conf.ignore_callchains global, to use in the places where what we really want to know is if callchains should be ignored, even if present. Then -g will mean just to select a callchain mode to be applied to all events not explicitely setting some other callchain mode, i.e. a default callchain mode, and --no-call-graph will set symbol_conf.ignore_callchains with that clear intention. That too will at some point become a per evsel thing, that tools can set for all or just a few of its evsels. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf hists: Introduce hist_entry__has_callchain() methodArnaldo Carvalho de Melo
We'll use this helper more frequently when reworking symbol_conf.use_callchain logic, where knowing if a hist_entry has callchains is the important bit, so make going from hist_entry to hists to evsel easier, compact. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-p6gioxkzpkpz71dtt4wcs36o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "This starts to support NVIDIA volta hardware with nouveau, and adds amdgpu support for the GPU in the Kabylake-G (the intel + radeon single package chip), along with some initial Intel icelake enabling. Summary: New Drivers: - v3d - driver for broadcom V3D V3.x+ hardware - xen-front - XEN PV display frontend core: - handle zpos normalization in the core - stop looking at legacy pointers in atomic paths - improved scheduler documentation - improved aspect ratio validation - aspect ratio support for 64:27 and 256:135 - drop unused control node code. i915: - Icelake (ICL) enabling - GuC/HuC refactoring - PSR/PSR2 enabling and fixes - DPLL management refactoring - DP MST fixes - NV12 enabling - HDCP improvements - GEM/Execlist/reset improvements - GVT improvements - stolen memory first 4k fix amdgpu: - Vega 20 support - VEGAM support (Kabylake-G) - preOS scanout buffer reservation - power management gfxoff support for raven - SR-IOV fixes - Vega10 power profiles and clock voltage control - scatter/gather display support on CZ/ST amdkfd: - GFX9 dGPU support - userptr memory mapping nouveau: - major refactoring for Volta GV100 support tda998x: - HDMI i2c CEC support etnaviv: - removed unused logging code - license text cleanups - MMU handling improvements - timeout fence fix for 50 days uptime tegra: - IOMMU support in gr2d/gr3d drivers - zpos support vc4: - syncobj support - CTM, plane alpha and async cursor support analogix_dp: - HPD and aux chan fixes sun4i: - MIPI DSI support tilcdc: - clock divider fixes for OMAP-l138 LCDK board rcar-du: - R8A77965 support - dma-buf fences fixes - hardware indexed crtc/du group handling - generic zplane property support atmel-hclcdc: - generic zplane property support mediatek: - use generic video mode function exynos: - S5PV210 FIMD variant support - IPP v2 framework - more HW overlays support" * tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm: (1286 commits) drm/amdgpu: fix 32-bit build warning drm/exynos: fimc: signedness bug in fimc_setup_clocks() drm/exynos: scaler: fix static checker warning drm/amdgpu: Use dev_info() to report amdkfd is not supported for this ASIC drm/amd/display: Remove use of division operator for long longs drm/amdgpu: Update GFX info structure to match what vega20 used drm/amdgpu/pp: remove duplicate assignment drm/sched: add rcu_barrier after entity fini drm/amdgpu: move VM BOs on LRU again drm/amdgpu: consistenly use VM moved flag drm/amdgpu: kmap PDs/PTs in amdgpu_vm_update_directories drm/amdgpu: further optimize amdgpu_vm_handle_moved drm/amdgpu: cleanup amdgpu_vm_validate_pt_bos v2 drm/amdgpu: rework VM state machine lock handling v2 drm/amdgpu: Add runtime VCN PG support drm/amdgpu: Enable VCN static PG by default on RV drm/amdgpu: Add VCN static PG support on RV drm/amdgpu: Enable VCN CG by default on RV drm/amdgpu: Add static CG control for VCN on RV drm/exynos: Fix default value for zpos plane property ...
2018-06-06block: pass failfast and driver-specific flags to flush requestsHannes Reinecke
If flush requests are being sent to the device we need to inherit the failfast and driver-specific flags, too, otherwise I/O will fail. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-06x86/apic/vector: Print APIC control bits in debugfsThomas Gleixner
Extend the debugability of the vector management by adding the state bits to the debugfs output. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.908136099@linutronix.de
2018-06-06genirq/affinity: Defer affinity setting if irq chip is busyThomas Gleixner
The case that interrupt affinity setting fails with -EBUSY can be handled in the kernel completely by using the already available generic pending infrastructure. If a irq_chip::set_affinity() fails with -EBUSY, handle it like the interrupts for which irq_chip::set_affinity() can only be invoked from interrupt context. Copy the new affinity mask to irq_desc::pending_mask and set the affinity pending bit. The next raised interrupt for the affected irq will check the pending bit and try to set the new affinity from the handler. This avoids that -EBUSY is returned when an affinity change is requested from user space and the previous change has not been cleaned up. The new affinity will take effect when the next interrupt is raised from the device. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
2018-06-06x86/platform/uv: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special uv_ack_apic() implementation which is merily a wrapper around ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de
2018-06-06x86/ioapic: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of directly invoking ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
2018-06-06irq_remapping: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special ir_ack_apic_edge() implementation which is merily a wrapper around ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de
2018-06-06x86/apic: Provide apic_ack_irq()Thomas Gleixner
apic_ack_edge() is explicitely for handling interrupt affinity cleanup when interrupt remapping is not available or disable. Remapped interrupts and also some of the platform specific special interrupts, e.g. UV, invoke ack_APIC_irq() directly. To address the issue of failing an affinity update with -EBUSY the delayed affinity mechanism can be reused, but ack_APIC_irq() does not handle that. Adding this to ack_APIC_irq() is not possible, because that function is also used for exceptions and directly handled interrupts like IPIs. Create a new function, which just contains the conditional invocation of irq_move_irq() and the final ack_APIC_irq(). Reuse the new function in apic_ack_edge(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06genirq/migration: Avoid out of line call if pending is not setThomas Gleixner
The upcoming fix for the -EBUSY return from affinity settings requires to use the irq_move_irq() functionality even on irq remapped interrupts. To avoid the out of line call, move the check for the pending bit into an inline helper. Preparatory change for the real fix. No functional change. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06genirq/generic_pending: Do not lose pending affinity updateThomas Gleixner
The generic pending interrupt mechanism moves interrupts from the interrupt handler on the original target CPU to the new destination CPU. This is required for x86 and ia64 due to the way the interrupt delivery and acknowledge works if the interrupts are not remapped. However that update can fail for various reasons. Some of them are valid reasons to discard the pending update, but the case, when the previous move has not been fully cleaned up is not a legit reason to fail. Check the return value of irq_do_set_affinity() for -EBUSY, which indicates a pending cleanup, and rearm the pending move in the irq dexcriptor so it's tried again when the next interrupt arrives. Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
2018-06-06x86/apic/vector: Prevent hlist corruption and leaksThomas Gleixner
Several people observed the WARN_ON() in irq_matrix_free() which triggers when the caller tries to free an vector which is not in the allocation range. Song provided the trace information which allowed to decode the root cause. The rework of the vector allocation mechanism failed to preserve a sanity check, which prevents setting a new target vector/CPU when the previous affinity change has not fully completed. As a result a half finished affinity change can be overwritten, which can cause the leak of a irq descriptor pointer on the previous target CPU and double enqueue of the hlist head into the cleanup lists of two or more CPUs. After one CPU cleaned up its vector the next CPU will invoke the cleanup handler with vector 0, which triggers the out of range warning in the matrix allocator. Prevent this by checking the apic_data of the interrupt whether the move_in_progress flag is false and the hlist node is not hashed. Return -EBUSY if not. This prevents the damage and restores the behaviour before the vector allocation rework, but due to other changes in that area it also widens the chance that user space can observe -EBUSY. In theory this should be fine, but actually not all user space tools handle -EBUSY correctly. Addressing that is not part of this fix, but will be addressed in follow up patches. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
2018-06-06enic: fix UDP rss bitsGovindarajulu Varadarajan
In commit 48398b6e7065 ("enic: set UDP rss flag") driver needed to set a single bit to enable UDP rss. This is changed to two bit. One for UDP IPv4 and other bit for UDP IPv6. The hardware which supports this is not released yet. When released, driver should set 2 bit to enable UDP rss for both IPv4 and IPv6. Also add spinlock around vnic_dev_capable_rss_hash_type(). Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-06objtool: Fix GCC 8 cold subfunction detection for aliased functionsJosh Poimboeuf
The kbuild test robot reported the following issue: kernel/time/posix-stubs.o: warning: objtool: sys_ni_posix_timers.cold.1()+0x0: unreachable instruction This file creates symbol aliases for the sys_ni_posix_timers() function. So there are multiple ELF function symbols for the same function: 23: 0000000000000150 26 FUNC GLOBAL DEFAULT 1 __x64_sys_timer_create 24: 0000000000000150 26 FUNC GLOBAL DEFAULT 1 sys_ni_posix_timers 25: 0000000000000150 26 FUNC GLOBAL DEFAULT 1 __ia32_sys_timer_create 26: 0000000000000150 26 FUNC GLOBAL DEFAULT 1 __x64_sys_timer_gettime Here's the corresponding cold subfunction: 11: 0000000000000000 45 FUNC LOCAL DEFAULT 6 sys_ni_posix_timers.cold.1 When analyzing overlapping functions, objtool only looks at the first one in the symbol list. The rest of the functions are basically ignored because they point to instructions which have already been analyzed. So in this case it analyzes the __x64_sys_timer_create() function, but then it fails to recognize that its cold subfunction is sys_ni_posix_timers.cold.1(), because the names are different. Make the subfunction detection a little smarter by associating each subfunction with the first function which jumps to it, since that's the one which will be analyzed. Unfortunately we still have to leave the original subfunction detection code in place, thanks to GCC switch tables. (See the comment for more details.) Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/d3ba52662cbc8e3a64a3b64d44b4efc5674fd9ab.1527855808.git.jpoimboe@redhat.com
2018-06-06x86/bugs: Switch the selection of mitigation from CPU vendor to CPU featuresKonrad Rzeszutek Wilk
Both AMD and Intel can have SPEC_CTRL_MSR for SSBD. However AMD also has two more other ways of doing it - which are !SPEC_CTRL MSR ways. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: andrew.cooper3@citrix.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180601145921.9500-4-konrad.wilk@oracle.com
2018-06-06x86/bugs: Add AMD's SPEC_CTRL MSR usageKonrad Rzeszutek Wilk
The AMD document outlining the SSBD handling 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf mentions that if CPUID 8000_0008.EBX[24] is set we should be using the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f) for speculative store bypass disable. This in effect means we should clear the X86_FEATURE_VIRT_SSBD flag so that we would prefer the SPEC_CTRL MSR. See the document titled: 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199889 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel <joro@8bytes.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
2018-06-06x86/bugs: Add AMD's variant of SSB_NOKonrad Rzeszutek Wilk
The AMD document outlining the SSBD handling 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf mentions that the CPUID 8000_0008.EBX[26] will mean that the speculative store bypass disable is no longer needed. A copy of this document is available at: https://bugzilla.kernel.org/show_bug.cgi?id=199889 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: andrew.cooper3@citrix.com Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180601145921.9500-2-konrad.wilk@oracle.com
2018-06-06x86/vector: Fix the args of vector_alloc tracepointDou Liyang
The vector_alloc tracepont reversed the reserved and ret aggs, that made the trace print wrong. Exchange them. Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management") Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/idt: Simplify the idt_setup_apic_and_irq_gates()Dou Liyang
The idt_setup_apic_and_irq_gates() sets the gates from FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR first. then secondly, from FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n into account. But for APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, all vectors has been set at the first step. Simplify the second step, make it just work for APIC=y. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180523023555.2933-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/platform/uv: Remove extra parenthesesVarsha Rao
Remove extra parentheses to fix the extraneous parentheses clang warning. Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180520080012.8215-1-rvarsha016@gmail.com
2018-06-06x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SMEKirill A. Shutemov
AMD SME claims one bit from physical address to indicate whether the page is encrypted or not. To achieve that we clear out the bit from __PHYSICAL_MASK. The capability to adjust __PHYSICAL_MASK is required beyond AMD SME. For instance for upcoming Intel Multi-Key Total Memory Encryption. Factor it out into a separate feature with own Kconfig handle. It also helps with overhead of AMD SME. It saves more than 3k in .text on defconfig + AMD_MEM_ENCRYPT: add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564) We would need to return to this once we have infrastructure to patch constants in code. That's good candidate for it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
2018-06-06x86: Mark native_set_p4d() as __always_inlineArnd Bergmann
When CONFIG_OPTIMIZE_INLINING is enabled, the function native_set_p4d() may not be fully inlined into the caller, resulting in a false-positive warning about an access to the __pgtable_l5_enabled variable from a non-__init function, despite the original caller being an __init function: WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_set_p4d() to the variable .init.data:__pgtable_l5_enabled WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_p4d_clear() to the variable .init.data:__pgtable_l5_enabled The function native_set_p4d() references the variable __initdata __pgtable_l5_enabled. This is often because native_set_p4d lacks a __initdata annotation or the annotation of __pgtable_l5_enabled is wrong. Marking the native_set_p4d function and its caller native_p4d_clear() avoids this problem. I did not bisect the original cause, but I assume this is related to the recent rework that turned pgtable_l5_enabled() into an inline function, which in turn caused the compiler to make different inlining decisions. Fixes: ad3fe525b950 ("x86/mm: Unify pgtable_l5_enabled usage in early boot code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Link: https://lkml.kernel.org/r/20180605113715.1133726-1-arnd@arndb.de
2018-06-06irqchip/ls-scfg-msi: Map MSIs in the iommuLaurentiu Tudor
Add the required iommu_dma_map_msi_msg() when composing the MSI message, otherwise the interrupts will not work. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: zhiqiang.hou@nxp.com Cc: minghuan.lian@nxp.com Link: https://lkml.kernel.org/r/20180605122727.12831-1-laurentiu.tudor@nxp.com
2018-06-06irqchip/stm32: Fix non-SMP build warningArnd Bergmann
A CONFIG_SMP=n build emits a harmless compile-time warning: drivers/irqchip/irq-stm32-exti.c:495:12: error: 'stm32_exti_h_set_affinity' defined but not used [-Werror=unused-function] The #ifdef is inconsistent here, and it's better to use an IS_ENABLED() check that lets the compiler silently drop that function. Fixes: 927abfc4461e ("irqchip/stm32: Add stm32mp1 support with hierarchy domain") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ludovic Barre <ludovic.barre@st.com> Cc: Rob Herring <robh@kernel.org> Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org> Cc: Radoslaw Pietrzyk <radoslaw.pietrzyk@gmail.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Link: https://lkml.kernel.org/r/20180605114347.1347128-1-arnd@arndb.de
2018-06-06rseq/selftests: Provide Makefile, scripts, gitignoreMathieu Desnoyers
A run_param_test.sh script runs many variants of the parametrizable tests. Wire up the rseq Makefile, add directory entry into MAINTAINERS file. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-17-mathieu.desnoyers@efficios.com
2018-06-06rseq/selftests: Provide parametrized testsMathieu Desnoyers
"param_test" is a parametrizable restartable sequences test. See the "--help" output for usage. "param_test_benchmark" is the same as "param_test", but it removes testing book-keeping code to allow accurate benchmarks. "param_test_compare_twice" is the same as "param_test", but it performs each comparison within rseq critical section twice, thus validating invariants. If any of the second comparisons fails, an error message is printed and the test aborts. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-16-mathieu.desnoyers@efficios.com
2018-06-06rseq/selftests: Provide basic percpu ops testMathieu Desnoyers
"basic_percpu_ops_test" is a slightly more "realistic" variant, implementing a few simple per-cpu operations and testing their correctness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-15-mathieu.desnoyers@efficios.com
2018-06-06rseq/selftests: Provide basic testMathieu Desnoyers
"basic_test" only asserts that RSEQ works moderately correctly. E.g. that the CPUID pointer works. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-14-mathieu.desnoyers@efficios.com
2018-06-06rseq/selftests: Provide rseq libraryMathieu Desnoyers
This rseq helper library provides a user-space API to the rseq() system call. The rseq fast-path exposes the instruction pointer addresses where the rseq assembly blocks begin and end, as well as the associated abort instruction pointer, in the __rseq_table section. This section allows debuggers may know where to place breakpoints when single-stepping through assembly blocks which may be aborted at any point by the kernel. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-13-mathieu.desnoyers@efficios.com
2018-06-06selftests/lib.mk: Introduce OVERRIDE_TARGETSMathieu Desnoyers
Introduce OVERRIDE_TARGETS to allow tests to express dependencies on header files and .so, which require to override the selftests lib.mk targets. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-12-mathieu.desnoyers@efficios.com
2018-06-06powerpc: Wire up restartable sequences system callBoqun Feng
Wire up the rseq system call on powerpc. This provides an ABI improving the speed of a user-space getcpu operation on powerpc by skipping the getcpu system call on the fast path, as well as improving the speed of user-space operations on per-cpu data compared to using load-reservation/store-conditional atomics. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-11-mathieu.desnoyers@efficios.com
2018-06-06powerpc: Add syscall detection for restartable sequencesBoqun Feng
Syscalls are not allowed inside restartable sequences, so add a call to rseq_syscall() at the very beginning of system call exiting path for CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there is a syscall issued inside restartable sequences. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-10-mathieu.desnoyers@efficios.com
2018-06-06powerpc: Add support for restartable sequencesBoqun Feng
Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal when a signal is delivered on top of a restartable sequence critical section. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-9-mathieu.desnoyers@efficios.com
2018-06-06x86: Wire up restartable sequence system callMathieu Desnoyers
Wire up the rseq system call on x86 32/64. This provides an ABI improving the speed of a user-space getcpu operation on x86 by removing the need to perform a function call, "lsl" instruction, or system call on the fast path, as well as improving the speed of user-space operations on per-cpu data. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-8-mathieu.desnoyers@efficios.com
2018-06-06x86: Add support for restartable sequencesMathieu Desnoyers
Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. Check that system calls are not invoked from within rseq critical sections by invoking rseq_signal() from syscall_return_slowpath(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-7-mathieu.desnoyers@efficios.com
2018-06-06arm: Wire up restartable sequences system callMathieu Desnoyers
Wire up the rseq system call on 32-bit ARM. This provides an ABI improving the speed of a user-space getcpu operation on ARM by skipping the getcpu system call on the fast path, as well as improving the speed of user-space operations on per-cpu data compared to using load-linked/store-conditional. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-6-mathieu.desnoyers@efficios.com
2018-06-06arm: Add syscall detection for restartable sequencesMathieu Desnoyers
Syscalls are not allowed inside restartable sequences, so add a call to rseq_syscall() at the very beginning of system call exiting path for CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there is a syscall issued inside restartable sequences. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-5-mathieu.desnoyers@efficios.com
2018-06-06arm: Add restartable sequences supportMathieu Desnoyers
Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-4-mathieu.desnoyers@efficios.com