summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-25tracing: Quiet gcc warning about maybe unused link variableSteven Rostedt (VMware)
Commit 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure") added an if statement that depends on another if statement that gcc doesn't see will initialize the "link" variable and gives the warning: "warning: 'link' may be used uninitialized in this function" It is really a false positive, but to quiet the warning, and also to make sure that it never actually is used uninitialized, initialize the "link" variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler thinks it could be used uninitialized. Cc: stable@vger.kernel.org Fixes: 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25tracing: Fix possible double free in event_enable_trigger_func()Steven Rostedt (VMware)
There was a case that triggered a double free in event_trigger_callback() due to the called reg() function freeing the trigger_data and then it getting freed again by the error return by the caller. The solution there was to up the trigger_data ref count. Code inspection found that event_enable_trigger_func() has the same issue, but is not as easy to trigger (requires harder to trigger failures). It needs to be solved slightly different as it needs more to clean up when the reg() function fails. Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 7862ad1846e99 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands") Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26xsk: fix poll/POLLIN premature returnsBjörn Töpel
Polling for the ingress queues relies on reading the producer/consumer pointers of the Rx queue. Prior this commit, a cached consumer pointer could be used, instead of the actual consumer pointer and therefore report POLLIN prematurely. This patch makes sure that the non-cached consumer pointer is used instead. Reported-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Qi Zhang <qi.z.zhang@intel.com> Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-26bpf, x32: Fix regression caused by commit 24dea04767e6Wang YanQing
Commit 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind") removed the 4 /* Extra space for skb_copy_bits buffer */ from _STACK_SIZE, but it didn't fix the concerned code in emit_prologue and emit_epilogue, and this error will bring very strange kernel runtime errors. This patch fixes it. Fixes: 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind") Reported-by: Meelis Roos <mroos@linux.ee> Bisected-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-25net: igmp: make function __ip_mc_inc_group() staticWei Yongjun
Fixes the following sparse warnings: net/ipv4/igmp.c:1391:6: warning: symbol '__ip_mc_inc_group' was not declared. Should it be static? Fixes: 6e2059b53f98 ("ipv4/igmp: init group mode as INCLUDE when join source group") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25tcp: ack immediately when a cwr packet arrivesLawrence Brakmo
We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The problem is triggered when the last packet of a request arrives CE marked. The reply will carry the ECE mark causing TCP to shrink its cwnd to 1 (because there are no packets in flight). When the 1st packet of the next request arrives, the ACK was sometimes delayed even though it is CWR marked, adding up to 40ms to the RPC latency. This patch insures that CWR marked data packets arriving will be acked immediately. Packetdrill script to reproduce the problem: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Modified based on comments by Neal Cardwell <ncardwell@google.com> Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25drm/i915/glk: Add Quirk for GLK NUC HDMI port issues.Clint Taylor
On GLK NUC platforms the HDMI retiming buffer needs additional disabled time to correctly sync to a faster incoming signal. When measured on a scope the highspeed lines of the HDMI clock turn off for ~400uS during a normal resolution change. The HDMI retimer on the GLK NUC appears to require at least a full frame of quiet time before a new faster clock can be correctly sync'd. Wait 100ms due to msleep inaccuracies while waiting for a completed frame. Add a quirk to the driver for GLK boards that use ITE66317 HDMI retimers. V2: Add more devices to the quirk list V3: Delay increased to 100ms, check to confirm crtc type is HDMI. V4: crtc type check extended to include _DDI and whitespace fixes v5: Fix white spaces, remove the macro for delay. Revert the crtc type check introduced in v4. Cc: Imre Deak <imre.deak@intel.com> Cc: <stable@vger.kernel.org> # v4.14+ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105887 Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com> Tested-by: Daniel Scheller <d.scheller.oss@gmail.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180710200205.1478-1-radhakrishna.sripada@intel.com (cherry picked from commit 90c3e2198777aaa355b6994a31a79c636c8d4306) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-07-25hinic: Link the logical network device to the pci device in sysfsdann frazier
Otherwise interfaces get exposed under /sys/devices/virtual, which doesn't give udev the context it needs for PCI-based predictable interface names. Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25Merge tag 'perf-core-for-mingo-4.19-20180725' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/cores fixes and improvements from Arnaldo Carvalho de Melo: Tools: top: - Fix 'struct comm_str' removal crash race, detected with refcount_t debugging (Jiri Olsa) - Use last_match threads cache only in single threaded mode, fixing a crash (Jiri Olsa) record: - Synthesize GROUP_DESC feature in pipe mode fixing display of event groups (Jiri Olsa) stat: - Get rid of extra clock display function (Jiri Olsa) perf script: - Show correct offsets for DWARF-based unwinding (Sandipan Das) test: - Check that complex event name is parsed correctly (Alexey Budankov) - Fix subtest number when showing results (Thomas Richter) Arch specific: arm64: - Generate syscall table from the kernel sources (asm/unistd.h) like other arches do, speeding up the support for new system calls in tools such as 'perf trace' (Kim Phillips) arm: - Bail out immediatelly on CoreSight hardware tracing instruction sample failure (Leo Yan) PowerPC: - Fix record+probe_libc_inet_pton.sh 'perf test' entry (Sandipan Das) - Callchain IP filtering fixes (Sandipan Das) S/390: - Add support for detailed S/390 PMU event description in 'perf list' (Thomas Richter) - Add transaction flag (-T) support in 'perf stat' for S/390 (Thomas Richter) - Fix 'perf kvm' S/390 subcommands (Thomas Richter) Infrastructure: hists: - Clarify callchain disabling when available (Arnaldo Carvalho de Melo) evsel: - Use perf_evsel__match instead of open coded equivalent (Jiri Olsa) Documentation: - Add missing documentation for 'perf list' --desc and --debug options (Sangwon Hong) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25virtio_net: Fix incosistent received bytes counterToshiaki Makita
When received packets are dropped in virtio_net driver, received packets counter is incremented but bytes counter is not. As a result, for instance if we drop all packets by XDP, only received is counted and bytes stays 0, which looks inconsistent. IMHO received packets/bytes should be counted if packets are produced by the hypervisor, like what common NICs on physical machines are doing. So fix the bytes counter. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formatsBoris Brezillon
This is needed to ensure ->is_unity is correct when the plane was previously configured to output a multi-planar format with scaling enabled, and is then being reconfigured to output a uniplanar format. Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20180724133601.32114-1-boris.brezillon@bootlin.com
2018-07-25drm/atomic: Initialize variables in drm_atomic_helper_async_check() to make ↵Boris Brezillon
gcc happy drm_atomic_helper_async_check() declares the plane, old_plane_state and new_plane_state variables to iterate over all planes of the atomic state and make sure only one plane is enabled. Unfortunately gcc is not smart enough to figure out that the check on n_planes is enough to guarantee that plane, new_plane_state and old_plane_state are initialized. Explicitly initialize those variables to NULL to make gcc happy. Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20180724133300.32023-1-boris.brezillon@bootlin.com
2018-07-25drm/atomic: Check old_plane_state->crtc in drm_atomic_helper_async_check()Boris Brezillon
Async plane update is supposed to work only when updating the FB or FB position of an already enabled plane. That does not apply to requests where the plane was previously disabled or assigned to a different CTRC. Check old_plane_state->crtc value to make sure async plane update is allowed. Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20180724133215.31917-1-boris.brezillon@bootlin.com
2018-07-25Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "One more round of updates for problems seen this -rc series. Drivers fixes are: - Amlogic Meson audio divider fix and CPU clk critical marking - Qualcomm multimedia GDSC marked as 'always on' to keep display working - Aspeed fixes for critical clks, resets causing clks to stay disabled, and an incorrect HPLL frequency calculation - Marvell Armada 3700 cpu clks would undervolt when switching from low frequencies to high frequencies because the voltage didn't stabilize in time so now we switch to an intermediate frequency Plus we have a core framework thinko that messed up the debugfs flag printing logic to make it not very useful" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: aspeed: Support HPLL strapping on ast2400 clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON clk: aspeed: Treat a gate in reset as disabled clk: Really show symbolic clock flags in debugfs clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock clk: meson: audio-divider is one based clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
2018-07-25Merge tag 'fscache-fixes-20180725' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache/cachefiles fixes from David Howells: - Allow cancelled operations to be queued so they can be cleaned up. - Fix a refcounting bug in the monitoring of reads on backend files whereby a race can occur between monitor objects being listed for work, the work processing being queued and the work processor running and destroying the monitor objects. - Fix a ref overput in object attachment, whereby a tentatively considered object is put in error handling without first being 'got'. - Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an assertion occurs when we retry because it seems the object is now active. - Wait rather BUG'ing on an object collision in the depths of cachefiles as the active object should be being cleaned up - also depends on the one above. * tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: cachefiles: Wait rather than BUG'ing on "Unexpected object collision" cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag fscache: Fix reference overput in fscache_attach_object() error handling cachefiles: Fix refcounting bug in backing-file read monitoring fscache: Allow cancelled operations to be enqueued
2018-07-25tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failureArtem Savkov
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe it returns an error, but does not unset the tp flags it set previously. This results in a probe being considered enabled and failures like being unable to remove the probe through kprobe_events file since probes_open() expects every probe to be disabled. Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 41a7dd420c57 ("tracing/kprobes: Support ftrace_event_file base multibuffer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25selftests/ftrace: Add snapshot and tracing_on test caseMasami Hiramatsu
Add a testcase for checking snapshot and tracing_on relationship. This ensures that the snapshotting doesn't affect current tracing on/off settings. Link: http://lkml.kernel.org/r/153149932412.11274.15289227592627901488.stgit@devbox Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kselftest@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25ring_buffer: tracing: Inherit the tracing setting to next ring bufferMasami Hiramatsu
Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot. Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below: /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0 We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers. Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp> Cc: stable@vger.kernel.org Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [ Updated commit log and comment in the code ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25tracing: Fix double free of event_trigger_dataSteven Rostedt (VMware)
Running the following: # cd /sys/kernel/debug/tracing # echo 500000 > buffer_size_kb [ Or some other number that takes up most of memory ] # echo snapshot > events/sched/sched_switch/trigger Triggers the following bug: ------------[ cut here ]------------ kernel BUG at mm/slub.c:296! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:kfree+0x16c/0x180 Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246 RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80 RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500 RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00 FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0 Call Trace: event_trigger_callback+0xee/0x1d0 event_trigger_write+0xfc/0x1a0 __vfs_write+0x33/0x190 ? handle_mm_fault+0x115/0x230 ? _cond_resched+0x16/0x40 vfs_write+0xb0/0x190 ksys_write+0x52/0xc0 do_syscall_64+0x5a/0x160 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f363e16ab50 Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24 RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50 RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001 RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009 R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper 86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e ---[ end trace d301afa879ddfa25 ]--- The cause is because the register_snapshot_trigger() call failed to allocate the snapshot buffer, and then called unregister_trigger() which freed the data that was passed to it. Then on return to the function that called register_snapshot_trigger(), as it sees it failed to register, it frees the trigger_data again and causes a double free. By calling event_trigger_init() on the trigger_data (which only ups the reference counter for it), and then event_trigger_free() afterward, the trigger_data would not get freed by the registering trigger function as it would only up and lower the ref count for it. If the register trigger function fails, then the event_trigger_free() called after it will free the trigger data normally. Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home Cc: stable@vger.kerne.org Fixes: 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command") Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25cachefiles: Wait rather than BUG'ing on "Unexpected object collision"Kiran Kumar Modukuri
If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in the active object tree, we have been emitting a BUG after logging information about it and the new object. Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared on the old object (or return an error). The ACTIVE flag should be cleared after it has been removed from the active object tree. A timeout of 60s is used in the wait, so we shouldn't be able to get stuck there. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-25cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flagKiran Kumar Modukuri
In cachefiles_mark_object_active(), the new object is marked active and then we try to add it to the active object tree. If a conflicting object is already present, we want to wait for that to go away. After the wait, we go round again and try to re-mark the object as being active - but it's already marked active from the first time we went through and a BUG is issued. Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again. Analysis from Kiran Kumar Modukuri: [Impact] Oops during heavy NFS + FSCache + Cachefiles CacheFiles: Error: Overlong wait for old active object to go away. BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 CacheFiles: Error: Object already active kernel BUG at fs/cachefiles/namei.c:163! [Cause] In a heavily loaded system with big files being read and truncated, an fscache object for a cookie is being dropped and a new object being looked. The new object being looked for has to wait for the old object to go away before the new object is moved to active state. [Fix] Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when retrying the object lookup. [Testcase] Have run ~100 hours of NFS stress tests and have not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-25fscache: Fix reference overput in fscache_attach_object() error handlingKiran Kumar Modukuri
When a cookie is allocated that causes fscache_object structs to be allocated, those objects are initialised with the cookie pointer, but aren't blessed with a ref on that cookie unless the attachment is successfully completed in fscache_attach_object(). If attachment fails because the parent object was dying or there was a collision, fscache_attach_object() returns without incrementing the cookie counter - but upon failure of this function, the object is released which then puts the cookie, whether or not a ref was taken on the cookie. Fix this by taking a ref on the cookie when it is assigned in fscache_object_init(), even when we're creating a root object. Analysis from Kiran Kumar: This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277 fscache cookie ref count updated incorrectly during fscache object allocation resulting in following Oops. kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321! kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639! [Cause] Two threads are trying to do operate on a cookie and two objects. (1) One thread tries to unmount the filesystem and in process goes over a huge list of objects marking them dead and deleting the objects. cookie->usage is also decremented in following path: nfs_fscache_release_super_cookie -> __fscache_relinquish_cookie ->__fscache_cookie_put ->BUG_ON(atomic_read(&cookie->usage) <= 0); (2) A second thread tries to lookup an object for reading data in following path: fscache_alloc_object 1) cachefiles_alloc_object -> fscache_object_init -> assign cookie, but usage not bumped. 2) fscache_attach_object -> fails in cant_attach_object because the cookie's backing object or cookie's->parent object are going away 3) fscache_put_object -> cachefiles_put_object ->fscache_object_destroy ->fscache_cookie_put ->BUG_ON(atomic_read(&cookie->usage) <= 0); [NOTE from dhowells] It's unclear as to the circumstances in which (2) can take place, given that thread (1) is in nfs_kill_super(), however a conflicting NFS mount with slightly different parameters that creates a different superblock would do it. A backtrace from Kiran seems to show that this is a possibility: kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639! ... RIP: __fscache_cookie_put+0x3a/0x40 [fscache] Call Trace: __fscache_relinquish_cookie+0x87/0x120 [fscache] nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs] nfs_kill_super+0x29/0x40 [nfs] deactivate_locked_super+0x48/0x80 deactivate_super+0x5c/0x60 cleanup_mnt+0x3f/0x90 __cleanup_mnt+0x12/0x20 task_work_run+0x86/0xb0 exit_to_usermode_loop+0xc2/0xd0 syscall_return_slowpath+0x4e/0x60 int_ret_from_sys_call+0x25/0x9f [Fix] Bump up the cookie usage in fscache_object_init, when it is first being assigned a cookie atomically such that the cookie is added and bumped up if its refcount is not zero. Remove the assignment in fscache_attach_object(). [Testcase] I have run ~100 hours of NFS stress tests and not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Fixes: ccc4fc3d11e9 ("FS-Cache: Implement the cookie management part of the netfs API") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-25cachefiles: Fix refcounting bug in backing-file read monitoringKiran Kumar Modukuri
cachefiles_read_waiter() has the right to access a 'monitor' object by virtue of being called under the waitqueue lock for one of the pages in its purview. However, it has no ref on that monitor object or on the associated operation. What it is allowed to do is to move the monitor object to the operation's to_do list, but once it drops the work_lock, it's actually no longer permitted to access that object. However, it is trying to enqueue the retrieval operation for processing - but it can only do this via a pointer in the monitor object, something it shouldn't be doing. If it doesn't enqueue the operation, the operation may not get processed. If the order is flipped so that the enqueue is first, then it's possible for the work processor to look at the to_do list before the monitor is enqueued upon it. Fix this by getting a ref on the operation so that we can trust that it will still be there once we've added the monitor to the to_do list and dropped the work_lock. The op can then be enqueued after the lock is dropped. The bug can manifest in one of a couple of ways. The first manifestation looks like: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! RIP: 0010:fscache_put_operation+0x1e3/0x1f0 ... fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 This is due to the operation being in the DEAD state (6) rather than INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through fscache_put_operation(). The bug can also manifest like the following: kernel BUG at fs/fscache/operation.c:69! ... [exception RIP: fscache_enqueue_operation+246] ... #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not entirely clear which assertion failed. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Lei Xue <carmark.dlut@gmail.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Reported-by: Anthony DeRobertis <aderobertis@metrics.net> Reported-by: NeilBrown <neilb@suse.com> Reported-by: Daniel Axtens <dja@axtens.net> Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Daniel Axtens <dja@axtens.net>
2018-07-25fscache: Allow cancelled operations to be enqueuedKiran Kumar Modukuri
Alter the state-check assertion in fscache_enqueue_operation() to allow cancelled operations to be given processing time so they can be cleaned up. Also fix a debugging statement that was requiring such operations to have an object assigned. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-25arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setupsJohannes Weiner
Arnd reports the following arm64 randconfig build error with the PSI patches that add another page flag: /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init': /git/arm-soc/include/linux/compiler.h:357:38: error: call to '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT) The additional page flag causes other information stored in page->flags to get bumped into their own struct page member: #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else #define LAST_CPUPID_WIDTH 0 #endif #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0 #define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif which in turn causes the struct page size to exceed the size set in STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the VMEMMAP page array according to address space and struct page size. However, the check is performed - and triggers here - on a !VMEMMAP config, which consumes an additional 22 page bits for the sparse section id. When VMEMMAP is enabled, those bits are returned, cpupid doesn't need its own member, and the page passes the VMEMMAP check. Restrict that check to the situation it was meant to check: that we are sizing the VMEMMAP page array correctly. Says Arnd: Further experiments show that the build error already existed before, but was only triggered with larger values of CONFIG_NR_CPU and/or CONFIG_NODES_SHIFT that might be used in actual configurations but not in randconfig builds. With longer CPU and node masks, I could recreate the problem with kernels as old as linux-4.7 when arm64 NUMA support got added. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-25arm64: Check for errata before evaluating cpu featuresDirk Mueller
Since commit d3aec8a28be3b8 ("arm64: capabilities: Restrict KPTI detection to boot-time CPUs") we rely on errata flags being already populated during feature enumeration. The order of errata and features was flipped as part of commit ed478b3f9e4a ("arm64: capabilities: Group handling of features and errata workarounds"). Return to the orginal order of errata and feature evaluation to ensure errata flags are present during feature evaluation. Fixes: ed478b3f9e4a ("arm64: capabilities: Group handling of features and errata workarounds") CC: Suzuki K Poulose <suzuki.poulose@arm.com> CC: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Dirk Mueller <dmueller@suse.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-25nvmet: only check for filebacking on -ENOTBLKHannes Reinecke
We only need to check for a file-backed namespace if nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error it's pointless as the open() error will remain the same. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-25nvmet: fixup crash on NULL device pathHannes Reinecke
When writing an empty string into the device_path attribute the kernel will crash with nvmet: failed to open block device (null): (-22) BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 This patch sanitizes the error handling for invalid device path settings. Fixes: a07b4970 ("nvmet: add a generic NVMe target") Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-25arm/asm/tlb.h: Fix build error implicit func declarationAnders Roxell
Building on arm 32 with LPAE enabled we don't include asm-generic/tlb.h, where we have tlb_flush_remove_tables_local and tlb_flush_remove_tables defined. The build fails with: mm/memory.c: In function ‘tlb_remove_table_smp_sync’: mm/memory.c:339:2: error: implicit declaration of function ‘tlb_flush_remove_tables_local’; did you mean ‘tlb_remove_table’? [-Werror=implicit-function-declaration] ... This bug got introduced in: 2ff6ddf19c0e ("x86/mm/tlb: Leave lazy TLB mode at page table free time") To fix this issue we define them in arm 32's specific asm/tlb.h file as well. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@armlinux.org.uk Cc: riel@surriel.com Cc: songliubraving@fb.com Fixes: 2ff6ddf19c0e ("x86/mm/tlb: Leave lazy TLB mode at page table free time") Link: http://lkml.kernel.org/r/20180725095557.19668-1-anders.roxell@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25x86/boot: Fix if_changed build flip/flop bugKees Cook
Dirk Gouders reported that two consecutive "make" invocations on an already compiled tree will show alternating behaviors: $ make CALL scripts/checksyscalls.sh DESCEND objtool CHK include/generated/compile.h DATAREL arch/x86/boot/compressed/vmlinux Kernel: arch/x86/boot/bzImage is ready (#48) Building modules, stage 2. MODPOST 165 modules $ make CALL scripts/checksyscalls.sh DESCEND objtool CHK include/generated/compile.h LD arch/x86/boot/compressed/vmlinux ZOFFSET arch/x86/boot/zoffset.h AS arch/x86/boot/header.o LD arch/x86/boot/setup.elf OBJCOPY arch/x86/boot/setup.bin OBJCOPY arch/x86/boot/vmlinux.bin BUILD arch/x86/boot/bzImage Setup is 15644 bytes (padded to 15872 bytes). System is 6663 kB CRC 3eb90f40 Kernel: arch/x86/boot/bzImage is ready (#48) Building modules, stage 2. MODPOST 165 modules He bisected it back to: commit 98f78525371b ("x86/boot: Refuse to build with data relocations") The root cause was the use of the "if_changed" kbuild function multiple times for the same target. It was designed to only be used once per target, otherwise it will effectively always trigger, flipping back and forth between the two commands getting recorded by "if_changed". Instead, this patch merges the two commands into a single function to get stable build artifacts (i.e. .vmlinux.cmd), and a single build behavior. Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net> Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/atomics: Rework ordering barriersMark Rutland
Currently architectures can override __atomic_op_*() to define the barriers used before/after a relaxed atomic when used to build acquire/release/fence variants. This has the unfortunate property of requiring the architecture to define the full wrapper for the atomics, rather than just the barriers they care about, and gets in the way of generating atomics which can be easily read. Instead, this patch has architectures define an optional set of barriers: * __atomic_acquire_fence() * __atomic_release_fence() * __atomic_pre_full_fence() * __atomic_post_full_fence() ... which <linux/atomic.h> uses to build the wrappers. It would be nice if we could undef these, along with the __atomic_op_*() wrappers, but that would break the cmpxchg() wrappers, which are written in preprocessor. Undefs would have been nice, but alas. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andy.shevchenko@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: catalin.marinas@arm.com Cc: dvyukov@google.com Cc: glider@google.com Cc: linux-arm-kernel@lists.infradead.org Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/20180716113017.3909-7-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/atomics: Instrument cmpxchg_double*()Mark Rutland
We currently don't instrument cmpxchg_double() and cmpxchg_double_local() due to compilation issues reported in the past, which are supposedly related to GCC bug 72873 [1], reported when GCC 7 was not yet released. This bug only applies to x86-64, and does not apply to other architectures. While the test case for GCC bug 72873 triggers issues with released versions of GCC, the instrumented kernel code compiles fine for all configurations I have tried, and it is unclear how the two cases are/were related. As we can't reproduce the kernel build failures, let's instrument cmpxchg_double*() again. We can revisit the issue if build failures reappear. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andy.shevchenko@gmail.com Cc: aryabinin@virtuozzo.com Cc: catalin.marinas@arm.com Cc: glider@google.com Cc: linux-arm-kernel@lists.infradead.org Cc: parri.andrea@gmail.com Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/20180716113017.3909-6-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/atomics: Instrument xchg()Mark Rutland
While we instrument all of the (non-relaxed) atomic_*() functions and cmpxchg(), we missed xchg(). Let's add instrumentation for xchg(), fixing up x86 to implement arch_xchg(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andy.shevchenko@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: catalin.marinas@arm.com Cc: glider@google.com Cc: linux-arm-kernel@lists.infradead.org Cc: parri.andrea@gmail.com Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/20180716113017.3909-5-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/atomics: Simplify cmpxchg() instrumentationMark Rutland
Currently we define some fairly verbose wrappers for the cmpxchg() family so that we can pass a pointer and size into kasan_check_write(). The wrappers duplicate the size-switching logic necessary in arch code, and only work for scalar types. On some architectures, (cmp)xchg are used on non-scalar types, and thus the instrumented wrappers need to be able to handle this. We could take the type-punning logic from {READ,WRITE}_ONCE(), but this makes the wrappers even more verbose, and requires several local variables in the macros. Instead, let's simplify the wrappers into simple macros which: * snapshot the pointer into a single local variable, called __ai_ptr to avoid conflicts with variables in the scope of the caller. * call kasan_check_write() on __ai_ptr. * invoke the relevant arch_*() function, passing the original arguments, bar __ai_ptr being substituted for ptr. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andy.shevchenko@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: catalin.marinas@arm.com Cc: glider@google.com Cc: linux-arm-kernel@lists.infradead.org Cc: parri.andrea@gmail.com Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/20180716113017.3909-4-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentationMark Rutland
Currently x86's arch_cmpxchg64() and arch_cmpxchg64_local() are instrumented twice, as they call into instrumented atomics rather than their arch_ equivalents. A call to cmpxchg64() results in: cmpxchg64() kasan_check_write() arch_cmpxchg64() cmpxchg() kasan_check_write() arch_cmpxchg() Let's fix this up and call the arch_ equivalents, resulting in: cmpxchg64() kasan_check_write() arch_cmpxchg64() arch_cmpxchg() Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: andy.shevchenko@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: catalin.marinas@arm.com Cc: glider@google.com Cc: linux-arm-kernel@lists.infradead.org Cc: parri.andrea@gmail.com Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/20180716113017.3909-3-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Support Extended PEBS for Goldmont PlusKan Liang
Enable the extended PEBS for Goldmont Plus. There is no specific PEBS constrains for Goldmont Plus. Removing the pebs_constraints for Goldmont Plus. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-4-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel/ds: Handle PEBS overflow for fixed countersKan Liang
The pebs_drain() need to support fixed counters. The DS Save Area now include "counter reset value" fields for each fixed counters. Extend the related variables (e.g. mask, counters, error) to support fixed counters. There is no extended PEBS in PEBS v2 and earlier PEBS format. Only need to change the code for PEBS v3 and later PEBS format. Extend the pebs_event_reset[] logic to support new "counter reset value" fields. Increase the reserve space for fixed counters. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-3-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Support PEBS on fixed countersKan Liang
The Extended PEBS feature supports PEBS on fixed-function performance counters as well as all four general purpose counters. It has to change the order of PEBS and fixed counter enabling to make sure PEBS is enabled for the fixed counters. The change of the order doesn't impact the behavior of current code on other platforms which don't support extended PEBS. Because there is no dependency among those enable/disable functions. Don't enable IRQ generation (0x8) for MSR_ARCH_PERFMON_FIXED_CTR_CTRL. The PEBS ucode will handle the interrupt generation. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-2-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Introduce PMU flag for Extended PEBSKan Liang
The Extended PEBS feature, introduced in the Goldmont Plus microarchitecture, supports all events as "Extended PEBS". Introduce flag PMU_FL_PEBS_ALL to indicate the platforms which support extended PEBS. To support all events, it needs to support all constraints for PEBS. To avoid duplicating all the constraints in the PEBS table, making the PEBS code search the normal constraints too. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20180309021542.11374-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/core: Fix crash when using HW tracing kernel filtersMathieu Poirier
In function perf_event_parse_addr_filter(), the path::dentry of each struct perf_addr_filter is left unassigned (as it should be) when the pattern being parsed is related to kernel space. But in function perf_addr_filter_match() the same dentries are given to d_inode() where the value is not expected to be NULL, resulting in the following splat: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 pc : perf_event_mmap+0x2fc/0x5a0 lr : perf_event_mmap+0x2c8/0x5a0 Process uname (pid: 2860, stack limit = 0x000000001cbcca37) Call trace: perf_event_mmap+0x2fc/0x5a0 mmap_region+0x124/0x570 do_mmap+0x344/0x4f8 vm_mmap_pgoff+0xe4/0x110 vm_mmap+0x2c/0x40 elf_map+0x60/0x108 load_elf_binary+0x450/0x12c4 search_binary_handler+0x90/0x290 __do_execve_file.isra.13+0x6e4/0x858 sys_execve+0x3c/0x50 el0_svc_naked+0x30/0x34 This patch is fixing the problem by introducing a new check in function perf_addr_filter_match() to see if the filter's dentry is NULL. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: miklos@szeredi.hu Cc: namhyung@kernel.org Cc: songliubraving@fb.com Fixes: 9511bce9fe8e ("perf/core: Fix bad use of igrab()") Link: http://lkml.kernel.org/r/1531782831-1186-1-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)Peter Zijlstra
Vince reported the perf_fuzzer giving various unwinder warnings and Josh reported: > Deja vu. Most of these are related to perf PEBS, similar to the > following issue: > > b8000586c90b ("perf/x86/intel: Cure bogus unwind from PEBS entries") > > This is basically the ORC version of that. setup_pebs_sample_data() is > assembling a franken-pt_regs which ORC isn't happy about. RIP is > inconsistent with some of the other registers (like RSP and RBP). And where the previous unwinder only needed BP,SP ORC also requires IP. But we cannot spoof IP because then the sample will get displaced, entirely negating the point of PEBS. So cure the whole thing differently by doing the unwind early; this does however require a means to communicate we did the unwind early. We (ab)use an unused sample_type bit for this, which we set on events that fill out the data->callchain before the normal perf_prepare_sample(). Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()Srikar Dronamraju
numa_migrate_preferred() is called periodically or when task preferred node changes. Preferred node evaluations happen once per scan sequence. If the scan completion happens just after the periodic NUMA migration, then we try to migrate to the preferred node and the preferred node might change, needing another node migration. Avoid this by checking for scan sequence completion only when checking for periodic migration. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25862.6 26158.1 1.14258 1 74357 72725 -2.19482 Running SPECjbb2005 on a 16 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 8 117019 113992 -2.58 1 179095 174947 -2.31 (numbers from v1 based on v4.17-rc5) Testcase Time: Min Max Avg StdDev numa01.sh Real: 449.46 770.77 615.22 101.70 numa01.sh Sys: 132.72 208.17 170.46 24.96 numa01.sh User: 39185.26 60290.89 50066.76 6807.84 numa02.sh Real: 60.85 61.79 61.28 0.37 numa02.sh Sys: 15.34 24.71 21.08 3.61 numa02.sh User: 5204.41 5249.85 5231.21 17.60 numa03.sh Real: 785.50 916.97 840.77 44.98 numa03.sh Sys: 108.08 133.60 119.43 8.82 numa03.sh User: 61422.86 70919.75 64720.87 3310.61 numa04.sh Real: 429.57 587.37 480.80 57.40 numa04.sh Sys: 240.61 321.97 290.84 33.58 numa04.sh User: 34597.65 40498.99 37079.48 2060.72 numa05.sh Real: 392.09 431.25 414.65 13.82 numa05.sh Sys: 229.41 372.48 297.54 53.14 numa05.sh User: 33390.86 34697.49 34222.43 556.42 Testcase Time: Min Max Avg StdDev %Change numa01.sh Real: 424.63 566.18 498.12 59.26 23.50% numa01.sh Sys: 160.19 256.53 208.98 37.02 -18.4% numa01.sh User: 37320.00 46225.58 42001.57 3482.45 19.20% numa02.sh Real: 60.17 62.47 60.91 0.85 0.607% numa02.sh Sys: 15.30 22.82 17.04 2.90 23.70% numa02.sh User: 5202.13 5255.51 5219.08 20.14 0.232% numa03.sh Real: 823.91 844.89 833.86 8.46 0.828% numa03.sh Sys: 130.69 148.29 140.47 6.21 -14.9% numa03.sh User: 62519.15 64262.20 63613.38 620.05 1.740% numa04.sh Real: 515.30 603.74 548.56 30.93 -12.3% numa04.sh Sys: 459.73 525.48 489.18 21.63 -40.5% numa04.sh User: 40561.96 44919.18 42047.87 1526.85 -11.8% numa05.sh Real: 396.58 454.37 421.13 19.71 -1.53% numa05.sh Sys: 208.72 422.02 348.90 73.60 -14.7% numa05.sh User: 33124.08 36109.35 34846.47 1089.74 -1.79% Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-20-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Use group_weights to identify if migration degrades localitySrikar Dronamraju
On NUMA_BACKPLANE and NUMA_GLUELESS_MESH systems, tasks/memory should be consolidated to the closest group of nodes. In such a case, relying on group_fault metric may not always help to consolidate. There can always be a case where a node closer to the preferred node may have lesser faults than a node further away from the preferred node. In such a case, moving to node with more faults might avoid numa consolidation. Using group_weight would help to consolidate task/memory around the preferred_node. While here, to be on the conservative side, don't override migrate thread degrades locality logic for CPU_NEWLY_IDLE load balancing. Note: Similar problems exist with should_numa_migrate_memory and will be dealt separately. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25645.4 25960 1.22 1 72142 73550 1.95 Running SPECjbb2005 on a 16 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 8 110199 120071 8.958 1 176303 176249 -0.03 (numbers from v1 based on v4.17-rc5) Testcase Time: Min Max Avg StdDev numa01.sh Real: 490.04 774.86 596.26 96.46 numa01.sh Sys: 151.52 242.88 184.82 31.71 numa01.sh User: 41418.41 60844.59 48776.09 6564.27 numa02.sh Real: 60.14 62.94 60.98 1.00 numa02.sh Sys: 16.11 30.77 21.20 5.28 numa02.sh User: 5184.33 5311.09 5228.50 44.24 numa03.sh Real: 790.95 856.35 826.41 24.11 numa03.sh Sys: 114.93 118.85 117.05 1.63 numa03.sh User: 60990.99 64959.28 63470.43 1415.44 numa04.sh Real: 434.37 597.92 504.87 59.70 numa04.sh Sys: 237.63 397.40 289.74 55.98 numa04.sh User: 34854.87 41121.83 38572.52 2615.84 numa05.sh Real: 386.77 448.90 417.22 22.79 numa05.sh Sys: 149.23 379.95 303.04 79.55 numa05.sh User: 32951.76 35959.58 34562.18 1034.05 Testcase Time: Min Max Avg StdDev %Change numa01.sh Real: 493.19 672.88 597.51 59.38 -0.20% numa01.sh Sys: 150.09 245.48 207.76 34.26 -11.0% numa01.sh User: 41928.51 53779.17 48747.06 3901.39 0.059% numa02.sh Real: 60.63 62.87 61.22 0.83 -0.39% numa02.sh Sys: 16.64 27.97 20.25 4.06 4.691% numa02.sh User: 5222.92 5309.60 5254.03 29.98 -0.48% numa03.sh Real: 821.52 902.15 863.60 32.41 -4.30% numa03.sh Sys: 112.04 130.66 118.35 7.08 -1.09% numa03.sh User: 62245.16 69165.14 66443.04 2450.32 -4.47% numa04.sh Real: 414.53 519.57 476.25 37.00 6.009% numa04.sh Sys: 181.84 335.67 280.41 54.07 3.327% numa04.sh User: 33924.50 39115.39 37343.78 1934.26 3.290% numa05.sh Real: 408.30 441.45 417.90 12.05 -0.16% numa05.sh Sys: 233.41 381.60 295.58 57.37 2.523% numa05.sh User: 33301.31 35972.50 34335.19 938.94 0.661% Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-16-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Update the scan period without holding the numa_group lockSrikar Dronamraju
The metrics for updating scan periods are local or task specific. Currently this update happens under the numa_group lock, which seems unnecessary. Hence move this update outside the lock. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25355.9 25645.4 1.141 1 72812 72142 -0.92 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-15-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Remove numa_has_capacity()Srikar Dronamraju
task_numa_find_cpu() helps to find the CPU to swap/move the task to. It's guarded by numa_has_capacity(). However node not having capacity shouldn't deter a task swapping if it helps NUMA placement. Further load_too_imbalanced(), which evaluates possibilities of move/swap, provides similar checks as numa_has_capacity. Hence remove numa_has_capacity() to enhance possibilities of task swapping even if load is imbalanced. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25657.9 25804.1 0.569 1 74435 73413 -1.37 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-13-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Modify migrate_swap() to accept additional parametersSrikar Dronamraju
There are checks in migrate_swap_stop() that check if the task/CPU combination is as per migrate_swap_arg before migrating. However atleast one of the two tasks to be swapped by migrate_swap() could have migrated to a completely different CPU before updating the migrate_swap_arg. The new CPU where the task is currently running could be a different node too. If the task has migrated, numa balancer might end up placing a task in a wrong node. Instead of achieving node consolidation, it may end up spreading the load across nodes. To avoid that pass the CPUs as additional parameters. While here, place migrate_swap under CONFIG_NUMA_BALANCING. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25377.3 25226.6 -0.59 1 72287 73326 1.437 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-10-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Remove unused task_capacity from 'struct numa_stats'Srikar Dronamraju
The task_capacity field in 'struct numa_stats' is redundant. Also move nr_running for better packing within the struct. No functional changes. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25308.6 25377.3 0.271 1 72964 72287 -0.92 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Rik van Riel <riel@surriel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-9-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/numa: Skip nodes that are at 'hoplimit'Srikar Dronamraju
When comparing two nodes at a distance of 'hoplimit', we should consider nodes only up to 'hoplimit'. Currently we also consider nodes at 'oplimit' distance too. Hence two nodes at a distance of 'hoplimit' will have same groupweight. Fix this by skipping nodes at hoplimit. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25375.3 25308.6 -0.26 1 72617 72964 0.477 Running SPECjbb2005 on a 16 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 8 113372 108750 -4.07684 1 177403 183115 3.21979 (numbers from v1 based on v4.17-rc5) Testcase Time: Min Max Avg StdDev numa01.sh Real: 478.45 565.90 515.11 30.87 numa01.sh Sys: 207.79 271.04 232.94 21.33 numa01.sh User: 39763.93 47303.12 43210.73 2644.86 numa02.sh Real: 60.00 61.46 60.78 0.49 numa02.sh Sys: 15.71 25.31 20.69 3.42 numa02.sh User: 5175.92 5265.86 5235.97 32.82 numa03.sh Real: 776.42 834.85 806.01 23.22 numa03.sh Sys: 114.43 128.75 121.65 5.49 numa03.sh User: 60773.93 64855.25 62616.91 1576.39 numa04.sh Real: 456.93 511.95 482.91 20.88 numa04.sh Sys: 178.09 460.89 356.86 94.58 numa04.sh User: 36312.09 42553.24 39623.21 2247.96 numa05.sh Real: 393.98 493.48 436.61 35.59 numa05.sh Sys: 164.49 329.15 265.87 61.78 numa05.sh User: 33182.65 36654.53 35074.51 1187.71 Testcase Time: Min Max Avg StdDev %Change numa01.sh Real: 414.64 819.20 556.08 147.70 -7.36% numa01.sh Sys: 77.52 205.04 139.40 52.05 67.10% numa01.sh User: 37043.24 61757.88 45517.48 9290.38 -5.06% numa02.sh Real: 60.80 63.32 61.63 0.88 -1.37% numa02.sh Sys: 17.35 39.37 25.71 7.33 -19.5% numa02.sh User: 5213.79 5374.73 5268.90 55.09 -0.62% numa03.sh Real: 780.09 948.64 831.43 63.02 -3.05% numa03.sh Sys: 104.96 136.92 116.31 11.34 4.591% numa03.sh User: 60465.42 73339.78 64368.03 4700.14 -2.72% numa04.sh Real: 412.60 681.92 521.29 96.64 -7.36% numa04.sh Sys: 210.32 314.10 251.77 37.71 41.74% numa04.sh User: 34026.38 45581.20 38534.49 4198.53 2.825% numa05.sh Real: 394.79 439.63 411.35 16.87 6.140% numa05.sh Sys: 238.32 330.09 292.31 38.32 -9.04% numa05.sh User: 33456.45 34876.07 34138.62 609.45 2.741% While there is a regression with this change, this change is needed from a correctness perspective. Also it helps consolidation as seen from perf bench output. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-8-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25sched/debug: Reverse the order of printing faultsSrikar Dronamraju
Fix the order in which the private and shared numa faults are getting printed. No functional changes. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25215.7 25375.3 0.63 1 72107 72617 0.70 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-7-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>