summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-08-01staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pagesJohn Stultz
Amit Pundir and Youling in parallel reported crashes with recent mainline kernels running Android: F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key F DEBUG : Revision: '0' F DEBUG : ABI: 'arm' F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 ... <snip> ... F DEBUG : backtrace: F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82) F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40) F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36) ... This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives"). create_mspace_with_base() in the trace above, utilizes ashmem, and with ashmem, for shared mappings we use shmem_zero_setup(), which sets the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings nothing sets the vma->vm_ops. Looking at the problematic patch, it seems to add a requirement that one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing unmapped pages. Thus, this patch adds a call to vma_set_anonymous() for ashmem private mappings and seems to avoid the reported problem. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Colin Cross <ccross@google.com> Cc: Matthew Wilcox <willy@infradead.org> Reported-by: Amit Pundir <amit.pundir@linaro.org> Reported-by: Youling 257 <youling257@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01ia64: mark special ia64 memory areas anonymousLinus Torvalds
Commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") made newly allocated vma's have a dummy vm_ops field so that they wouldn't be mistaken for anonymous mappings, and if you wanted an anonymous vma you had to explicitly say so by calling "vma_set_anonymous()" on it. However, it missed the two special vmas that ia64 processes have: the register backing store and the NaT page. So they wouldn't actually act like anonymous ranges, and page faults on them caused a SIGBUS rather than the creation of a new anon page in them. That obviously will make any ia64 binary very unhappy indeed, and the boot fails early. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Reported-by: Tony Luck <tony.luck@intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01net: dsa: Do not suspend/resume closed slave_devFlorian Fainelli
If a DSA slave network device was previously disabled, there is no need to suspend or resume it. Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01netlink: Fix spectre v1 gadget in netlink_create()Jeremy Cline
'protocol' is a user-controlled value, so sanitize it after the bounds check to avoid using it for speculative out-of-bounds access to arrays indexed by it. This addresses the following accesses detected with the help of smatch: * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_keys' [w] * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_key_strings' [w] * net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre issue 'nl_table' [w] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Documentation: dpaa2: Use correct heading adornmentIoana Ciornei
Add overline heading adornment to document title in order to comply with kernel doc requirements. Fixes: 60b9131 staging: fsl-mc: Convert documentation to rst format Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: stmmac: Fix WoL for PCI-based setupsJose Abreu
WoL won't work in PCI-based setups because we are not saving the PCI EP state before entering suspend state and not allowing D3 wake. Fix this by using a wrapper around stmmac_{suspend/resume} which correctly sets the PCI EP state. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01bonding: avoid lockdep confusion in bond_get_stats()Eric Dumazet
syzbot found that the following sequence produces a LOCKDEP splat [1] ip link add bond10 type bond ip link add bond11 type bond ip link set bond11 master bond10 To fix this, we can use the already provided nest_level. This patch also provides correct nesting for dev->addr_list_lock [1] WARNING: possible recursive locking detected 4.18.0-rc6+ #167 Not tainted -------------------------------------------- syz-executor751/4439 is trying to acquire lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 but task is already holding lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&bond->stats_lock)->rlock); lock(&(&bond->stats_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor751/4439: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215 stack backtrace: CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144 spin_lock include/linux/spinlock.h:310 [inline] bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169 rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611 rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268 rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300 rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline] rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] netdev_features_change net/core/dev.c:1321 [inline] netdev_change_features+0xb3/0x110 net/core/dev.c:7759 bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120 bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755 bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528 dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327 dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493 sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992 sock_ioctl+0x30d/0x680 net/socket.c:1093 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440859 Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859 RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390 R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.hArnaldo Carvalho de Melo
The next example scripts need the definition for the BPF functions, i.e. things like BPF_FUNC_probe_read, and in time will require lots of other definitions found in uapi/linux/bpf.h, so include it from the bpf.h file included from the eBPF scripts build with clang via '-e bpf_script.c' like in this example: $ tail -8 tools/perf/examples/bpf/5sec.c #include <bpf.h> int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec) { return sec == 5; } license(GPL); $ That 'bpf.h' include in the 5sec.c eBPF example will come from a set of header files crafted for building eBPF objects, that in a end-user system will come from: /usr/lib/perf/include/bpf/bpf.h And will include <uapi/linux/bpf.h> either from the place where the kernel was built, or from a kernel-devel rpm package like: -working-directory /lib/modules/4.17.9-100.fc27.x86_64/build That is set up by tools/perf/util/llvm-utils.c, and can be overriden by setting the 'kbuild-dir' variable in the "llvm" ~/.perfconfig file, like: # cat ~/.perfconfig [llvm] kbuild-dir = /home/foo/git/build/linux This usually doesn't need any change, just documenting here my findings while working with this code. In the future we may want to instead just use what is in /usr/include/linux/bpf.h, that comes from the UAPI provided from the kernel sources, for now, to avoid getting the kernel's non-UAPI "linux/bpf.h" file, that will cause clang to fail and is not what we want anyway (no BPF function definitions, etc), do it explicitely by asking for "uapi/linux/bpf.h". Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zd8zeyhr2sappevojdem9xxt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-01perf tools: Allow overriding MAX_NR_CPUS at compile timeChristophe Leroy
After update of kernel, the perf tool doesn't run anymore on my 32MB RAM powerpc board, but still runs on a 128MB RAM board: ~# strace perf execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory) --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} --- +++ killed by SIGSEGV +++ Segmentation fault objdump -x shows that .bss section has a huge size of 24Mbytes: 27 .bss 016baca8 101cebb8 101cebb8 001cd988 2**3 With especially the following objects having quite big size: 10205f80 l O .bss 00140000 runtime_cycles_stats 10345f80 l O .bss 00140000 runtime_stalled_cycles_front_stats 10485f80 l O .bss 00140000 runtime_stalled_cycles_back_stats 105c5f80 l O .bss 00140000 runtime_branches_stats 10705f80 l O .bss 00140000 runtime_cacherefs_stats 10845f80 l O .bss 00140000 runtime_l1_dcache_stats 10985f80 l O .bss 00140000 runtime_l1_icache_stats 10ac5f80 l O .bss 00140000 runtime_ll_cache_stats 10c05f80 l O .bss 00140000 runtime_itlb_cache_stats 10d45f80 l O .bss 00140000 runtime_dtlb_cache_stats 10e85f80 l O .bss 00140000 runtime_cycles_in_tx_stats 10fc5f80 l O .bss 00140000 runtime_transaction_stats 11105f80 l O .bss 00140000 runtime_elision_stats 11245f80 l O .bss 00140000 runtime_topdown_total_slots 11385f80 l O .bss 00140000 runtime_topdown_slots_retired 114c5f80 l O .bss 00140000 runtime_topdown_slots_issued 11605f80 l O .bss 00140000 runtime_topdown_fetch_bubbles 11745f80 l O .bss 00140000 runtime_topdown_recovery_bubbles This is due to commit 4d255766d28b1 ("perf: Bump max number of cpus to 1024"), because many tables are sized with MAX_NR_CPUS This patch gives the opportunity to redefine MAX_NR_CPUS via $ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170922112043.8349468C57@po15668-vm-win7.idsi0.si.c-s.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-01powerpc/64s/radix: Fix missing global invalidations when removing coproFrederic Barrat
With the optimizations for TLB invalidation from commit 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask"), the scope of a TLBI (global vs. local) can now be influenced by the value of the 'copros' counter of the memory context. When calling mm_context_remove_copro(), the 'copros' counter is decremented first before flushing. It may have the unintended side effect of sending local TLBIs when we explicitly need global invalidations in this case. Thus breaking any nMMU user in a bad and unpredictable way. Fix it by flushing first, before updating the 'copros' counter, so that invalidations will be global. Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-01gpiolib-acpi: make sure we trigger edge events at least once on bootBenjamin Tissoires
On some systems using edge triggered ACPI Event Interrupts, the initial state at boot is not setup by the firmware, instead relying on the edge irq event handler running at least once to setup the initial state. 2 known examples of this are: 1) The Surface 3 has its _LID state controlled by an ACPI operation region triggered by a GPIO event: OperationRegion (GPOR, GeneralPurposeIo, Zero, One) Field (GPOR, ByteAcc, NoLock, Preserve) { Connection ( GpioIo (Shared, PullNone, 0x0000, 0x0000, IoRestrictionNone, "\\_SB.GPO0", 0x00, ResourceConsumer, , ) { // Pin list 0x004C } ), HELD, 1 } Method (_E4C, 0, Serialized) // _Exx: Edge-Triggered GPE { If ((HELD == One)) { ^^LID.LIDB = One } Else { ^^LID.LIDB = Zero Notify (LID, 0x80) // Status Change } Notify (^^PCI0.SPI1.NTRG, One) // Device Check } Currently, the state of LIDB is wrong until the user actually closes or open the cover. We need to trigger the GPIO event once to update the internal ACPI state. Coincidentally, this also enables the Surface 2 integrated HID sensor hub which also requires an ACPI gpio operation region to start initialization. 2) Various Bay Trail based tablets come with an external USB mux and TI T1210B USB phy to enable USB gadget mode. The mux is controlled by a GPIO which is controlled by an edge triggered ACPI Event Interrupt which monitors the micro-USB ID pin. When the tablet is connected to a PC (or no cable is plugged in), the ID pin is high and the tablet should be in gadget mode. But the GPIO controlling the mux is initialized by the firmware so that the USB data lines are muxed to the host controller. This means that if the user wants to use gadget mode, the user needs to first plug in a host-cable to force the ID pin low and then unplug it and connect the tablet to a PC, to get the ACPI event handler to run and switch the mux to device mode, This commit fixes both by running the event-handler once on boot. Note that the running of the event-handler is done from a late_initcall, this is done because the handler AML code may rely on OperationRegions registered by other builtin drivers. This avoids errors like these: [ 0.133026] ACPI Error: No handler for Region [XSCG] ((____ptrval____)) [GenericSerialBus] (20180531/evregion-132) [ 0.133036] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20180531/exfldio-265) [ 0.133046] ACPI Error: Method parse/execution failed \_SB.GPO2._E12, AE_NOT_EXIST (20180531/psparse-516) Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> [hdegoede: Document BYT USB mux reliance on initial trigger] [hdegoede: Run event handler from a late_initcall, rather then immediately] Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-01Merge branch 'pm-tools'Rafael J. Wysocki
Merge turbostat utility fixes for final 4.18: - Fix the -S option on 1-CPU systems. - Fix computations using incorrect processor core counts. - Fix the x2apic debug message. - Fix logical node enumeration to allow for non-sequential physical nodes. - Fix reported family on modern AMD processors. - Clarify the RAPL column information in the man page. * pm-tools: tools/power turbostat: version 18.07.27 tools/power turbostat: Read extended processor family from CPUID tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes tools/power turbostat: fix x2apic debug message output file tools/power turbostat: fix bogus summary values tools/power turbostat: fix -S on UP systems tools/power turbostat: Update turbostat(8) RAPL throttling column description
2018-08-01s390/numa: move initial setup of node_to_cpumask_mapMartin Schwidefsky
The numa_init_early initcall sets the node_to_cpumask_map[0] to the full cpu_possible_mask. Unfortunately this early_initcall is too late, the NUMA setup for numa=emu is done even earlier. The order of calls is numa_setup() -> emu_update_cpu_topology(), then the early_initcalls(), followed by sched_init_domains(). Starting with git commit 051f3ca02e46432c0965e8948f00c07d8a2f09c0 "sched/topology: Introduce NUMA identity node sched domain" the incorrect node_to_cpumask_map[0] really screws up the domain setup and the kernel panics with the follow oops: Cc: <stable@vger.kernel.org> # v4.15+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-01Merge tag 'drm-misc-fixes-2018-07-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes pull request for v4.18-rc7: - Small fixes to drm_atomic_helper_async_check(). (bbrezillon) - Fix error handling in drm_legacy_addctx(). (Nicholas) - Handle register reset on hotplug in adv7511. (seanpaul) Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/90e0e966-bce5-15a4-286a-eda908788b03@linux.intel.com
2018-07-31enic: do not call enic_change_mtu in enic_probeGovindarajulu Varadarajan
In commit ab123fe071c9 ("enic: handle mtu change for vf properly") ASSERT_RTNL() is added to _enic_change_mtu() to prevent it from being called without rtnl held. enic_probe() calls enic_change_mtu() without rtnl held. At this point netdev is not registered yet. Remove call to enic_change_mtu and assign the mtu to netdev->mtu. Fixes: ab123fe071c9 ("enic: handle mtu change for vf properly") Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31ipv4: frags: handle possible skb truesize changeEric Dumazet
ip_frag_queue() might call pskb_pull() on one skb that is already in the fragment queue. We need to take care of possible truesize change, or we might have an imbalance of the netns frags memory usage. IPv6 is immune to this bug, because RFC5722, Section 4, amended by Errata ID 3089 states : When reassembling an IPv6 datagram, if one or more its constituent fragments is determined to be an overlapping fragment, the entire datagram (and any constituent fragments) MUST be silently discarded. Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31inet: frag: enforce memory limits earlierEric Dumazet
We currently check current frags memory usage only when a new frag queue is created. This allows attackers to first consume the memory budget (default : 4 MB) creating thousands of frag queues, then sending tiny skbs to exceed high_thresh limit by 2 to 3 order of magnitude. Note that before commit 648700f76b03 ("inet: frags: use rhashtables for reassembly units"), work queue could be starved under DOS, getting no cpu cycles. After commit 648700f76b03, only the per frag queue timer can eventually remove an incomplete frag queue and its skbs. Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Peter Oskolkov <posk@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31Merge tag 'mlx5-fixes-2018-07-31' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-07-31 The following series includes four mlx5 fixes. Please pull and let me know if there's any problem. For -stable v4.14 net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager For -stable v4.16 net/mlx5e: Set port trust mode to PCP as default For -stable v4.17 net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31Merge tag 'audit-pr-20180731' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single small audit fix to guard against memory allocation failures when logging information about a kernel module load. It's small, easy to understand, and self-contained; while nothing is zero risk, this should be pretty low" * tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix potential null dereference 'context->module.name'
2018-07-31nohz: Fix local_timer_softirq_pending()Anna-Maria Gleixner
local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
2018-07-31net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flowFeras Daoud
After introduction of the cited commit, mlx5e_build_nic_params receives the netdevice mtu in order to set the sw_mtu of mlx5e_params. For enhanced IPoIB, the netdevice mtu is not set in this stage, therefore, the initial sw_mtu equals zero. As a result, the hw_mtu of the receive queue will be calculated incorrectly causing traffic issues. To fix this issue, query for port mtu before building the nic params. Fixes: 472a1e44b349 ("net/mlx5e: Save MTU in channels params") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31net/mlx5e: Fix null pointer access when setting MTU of vport representorAdi Nissim
MTU helper function is used by both conventional mlx5e instances (PF/VF) and the eswitch representors. The representor shouldn't change the nic vport context MTU, the VF is responsible for that. Therefore set_mtu_cb has a null value when changing the representor MTU. Fixes: 250a42b6a764 ("net/mlx5e: Support configurable MTU for vport representors") Signed-off-by: Adi Nissim <adin@mellanox.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31net/mlx5e: Set port trust mode to PCP as defaultOr Gerlitz
The hairpin offload code has dependency on the trust mode being PCP. Hence we should set PCP as the default for handling cases where we are disallowed to read the trust mode from the FW, or failed to initialize it. Fixes: 106be53b6b0a ('net/mlx5e: Set per priority hairpin pairs') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31net/mlx5e: E-Switch, Initialize eswitch only if eswitch managerEli Cohen
Execute mlx5_eswitch_init() only if we have MLX5_ESWITCH_MANAGER capabilities. Do the same for mlx5_eswitch_cleanup(). Fixes: a9f7705ffd66 ("net/mlx5: Unify vport manager capability check") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341Andrew Lunn
Version 1 of the patch adding SERDES support to the 88E6141/6341 correctly added the ops to the 88E6141/6341. However, by the time version 3 was committed, the ops had moved to the 88E6085/6175. Put them back where they belong. Fixes: 5bafeb6e7e87 ("net: dsa: mv88e6xxx: 88E6141/6341 SERDES support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31Merge tag 'wireless-drivers-for-davem-2018-07-31' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.18 Last set of fixes before 4.18 is released iwlwifi * add new IDs for cards already available on the market brcmfmac * fix a regression introduced in v4.17 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Nine fixes, five in the qla2xxx driver, the most serious of which is the uninitialized list head crash which can be observed in most systems under a sufficiently loaded low memory environment. The two sg fixes are minor but obvious and two target ones which seem reasonable but not high impact" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Return error when TMF returns scsi: qla2xxx: Fix ISP recovery on unload scsi: qla2xxx: Fix driver unload by shutting down chip scsi: qla2xxx: Fix NPIV deletion by calling wait_for_sess_deletion scsi: qla2xxx: Fix unintialized List head crash scsi: sg: update comment for blk_get_request() scsi: sg: fix minor memory leak in error path scsi: libiscsi: fix possible NULL pointer dereference in case of TMF scsi: target: iscsi: cxgbit: fix max iso npdu calculation
2018-07-31Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Some bugfixes that seem important and safe enough to merge at the last minute" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix another race between migration and ballooning tools/virtio: add kmalloc_array stub tools/virtio: add dma barrier stubs
2018-07-31Merge tag 'acpi-urgent-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a recent ACPICA regression affecting control method execution at the table level and an earlier hibernation regression in the ACPI driver for Intel SoCs (LPSS) that was missed by a previous fix in this cycle. Specifics: - Fix a recent ACPICA regression introduced by a previous fix that caused control method execution at the table level to be mishandled by mistake (Erik Schmauss). - Fix a hibernation regression from the 4.15 cycle in the ACPI driver for Intel SoCs (LPSS) that caused the platform firmware to be confused during resume from hibernation by the driver's PM quirks which was fixed for system-wide suspend/resume (ACPI S3) earlier in this cycle, but that previous fix missed the hibernation (ACPI S4) case (Rafael Wysocki)" * tag 'acpi-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: AML Parser: ignore control method status in module-level code ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation
2018-07-31PCI: Fix is_added/is_busmaster race conditionHari Vyas
When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for one and then device is detached with clearing of proc and sys entries and at end, pdev->is_added is set to 0. is_added and is_busmaster are bit fields in pci_dev structure sharing same memory location. A strange issue was observed with multiple removal and rescan of a PCIe NVMe device using sysfs commands where is_added flag was observed as zero instead of one while removing device and proc,sys entries are not cleared. This causes issue in later device addition with warning message "proc_dir_entry" already registered. Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, that clears the is_added bit. Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283 Signed-off-by: Hari Vyas <hari.vyas@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31s390/kdump: Fix elfcorehdr size calculationPhilipp Rudo
Before the memory for the elfcorehdr is allocated the required size is estimated with alloc_size = 0x1000 + get_cpu_cnt() * 0x4a0 + mem_chunk_cnt * sizeof(Elf64_Phdr); Where 0x4a0 is used as size for the ELF notes to store the register contend. This size is 8 bytes too small. Usually this does not immediately cause a problem because the page reserved for overhead (Elf_Ehdr, vmcoreinfo, etc.) is pretty generous. So usually there is enough spare memory to counter the mis-calculated per cpu size. However, with growing overhead and/or a huge cpu count the allocated size gets too small for the elfcorehdr. Ultimately a BUG_ON is triggered causing the crash kernel to panic. Fix this by properly calculating the required size instead of relying on magic numbers. Fixes: a62bc07392539 ("s390/kdump: add support for vector extension") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-31perf bpf: Show better message when failing to load an objectArnaldo Carvalho de Melo
Before: libbpf: license of tools/perf/examples/bpf/etcsnoop.c is GPL libbpf: section(6) version, size 4, link 0, flags 3, type=1 libbpf: kernel version of tools/perf/examples/bpf/etcsnoop.c is 41200 libbpf: section(7) .symtab, size 120, link 1, flags 0, type=2 bpf: config program 'syscalls:sys_enter_openat' libbpf: load bpf program failed: Operation not permitted libbpf: failed to load program 'syscalls:sys_enter_openat' libbpf: failed to load object 'tools/perf/examples/bpf/etcsnoop.c' bpf: load objects failed After: (just the last line changes) bpf: load objects failed: err=-4009: (Incorrect kernel version) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-wi44iid0yjfht3lcvplc75fm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf list: Unify metric group description format with PMU event descriptionMichael Petlan
PMU event descriptions use 7 spaces + '[' or 8 spaces as indentation. Metric groups used a tab + '['. This patch unifies it to the way PMU event descriptions are indented. BEFORE: $ perf list [...] Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] [...] AFTER: $ perf list [...] Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] [...] Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> LPU-Reference: 771439042.22924766.1532986504631.JavaMail.zimbra@redhat.com Link: https://lkml.kernel.org/n/tip-mlo850517m6u1rbjndvd1bwr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf vendor events arm64: Update ThunderX2 implementation defined pmu core ↵Ganapatrao Kulkarni
events Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ganapatrao Kulkarni <gklkml16@gmail.com> Cc: Jan Glauber <jan.glauber@cavium.com> Cc: Jayachandran C <jnair@caviumnetworks.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@cavium.com> Cc: Vadim Lomovtsev <vadim.lomovtsev@cavium.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180731100251.23575-1-ganapatrao.kulkarni@cavium.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packetLeo Yan
CS_ETM_TRACE_ON packet itself can give the info that there have a discontinuity in the trace, this patch is to add branch sample for CS_ETM_TRACE_ON packet if it is inserted in the middle of CS_ETM_RANGE packets; as result we can have hint for the trace discontinuity. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-7-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packetLeo Yan
If one CS_ETM_TRACE_ON packet is inserted, we miss to generate branch sample for the previous CS_ETM_RANGE packet. This patch is to generate branch sample when receiving a CS_ETM_TRACE_ON packet, so this can save complete info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON packet. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-6-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packetLeo Yan
For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in the decoder layer as dummy value, but the dummy value is pointless for branch sample when we use 'perf script' command to check program flow. This patch is a preparation to support CS_ETM_TRACE_ON packet for branch sample, it converts the dummy address value to zero for more readable; this is accomplished by cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The later one is a new function introduced by this patch. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-5-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Fix start tracing packet handlingLeo Yan
Usually the start tracing packet is a CS_ETM_TRACE_ON packet, this packet is passed to cs_etm__flush(); cs_etm__flush() will check the condition 'prev_packet->sample_type == CS_ETM_RANGE' but 'prev_packet' is allocated by zalloc() so 'prev_packet->sample_type' is zero in initialization and this condition is false. So cs_etm__flush() will directly bail out without handling the start tracing packet. This patch is to introduce a new sample type CS_ETM_EMPTY, which is used to indicate the packet is an empty packet. cs_etm__flush() will swap packets when it finds the previous packet is empty, so this can record the start tracing packet into 'etmq->prev_packet'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-4-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf build: Fix installation directory for eBPFThomas Richter
The perf tool build and install is controlled via a Makefile. The 'install' rule creates directories and copies files. Among them are header files installed in /usr/lib/include/perf/bpf/. However all listed examples are installing its header files in /usr/lib/<tool-name>/...[/include]/header.h and not in /usr/lib/include/<tool-name>/.../header.h. Background information: Building the Fedora 28 glibc RPM on s390x and s390 fails on s390 (gcc -m31) as gcc is not able to find header-files like stdbool.h. In the glibc.spec file, you can see that glibc is configured with "--with-headers". In this case, first -nostdinc is added to the CFLAGS and then further include paths are added via -isystem. One of those paths should contain header files like stdbool.h. In order to get this path, gcc is invoked with: - on Fedora 28 (with 4.18 kernel): $ gcc -print-file-name=include /usr/lib/gcc/s390x-redhat-linux/8/include $ gcc -m31 -print-file-name=include /usr/lib/gcc/s390x-redhat-linux/8/../../../../lib/include => If perf is installed, this is: /usr/lib/include On my machine this directory is only containing the directory "perf". If perf is not installed gcc returns: /usr/lib/gcc/s390x-redhat-linux/8/include - on Ubuntu 18.04 (with 4.15 kernel): $ gcc -print-file-name=include /usr/lib/gcc/s390x-linux-gnu/7/include $ gcc -m31 -print-file-name=include /usr/lib/gcc/s390x-linux-gnu/7/include => gcc returns the correct path even if perf is installed. In each case, the introduction of the subdirectory /usr/lib/include leads to the regression that one can not build the glibc RPM for s390 anymore as gcc can not find headers like stdbool.h. To remedy this install bpf.h to /usr/lib/perf/include/bpf/bpf.h Output before using the command 'perf test -Fv 40': echo '...[bpf-program-source]...' | /usr/bin/clang ... \ -I/root/lib/include/perf/bpf ... ^^^^^^^^^^^^ ... [root@p23lp27 perf]# perf test -F 40 40: BPF filter : 40.1: Basic BPF filtering : Ok 40.2: BPF pinning : Ok 40.3: BPF prologue generation : Ok 40.4: BPF relocation checker : Ok [root@p23lp27 perf]# Output after using command 'perf test -Fv 40': echo '...[bpf-program-source]...' | /usr/bin/clang ... \ -I/root/lib/perf/include/bpf ... ^^^^^^^^^^^^ ... [root@p23lp27 perf]# perf test -F 40 40: BPF filter : 40.1: Basic BPF filtering : Ok 40.2: BPF pinning : Ok 40.3: BPF prologue generation : Ok 40.4: BPF relocation checker : Ok [root@p23lp27 perf]# Committer testing: While the above 'perf test -F 40' (or 'perf test bpf') will allow us to see that the correct path is now added via -I, to actually test this we better try to use a bpf script that includes files in the changed directory. We have the files that now reside in /root/lib/perf/examples/bpf/ to do just that: # tail -8 /root/lib/perf/examples/bpf/5sec.c #include <bpf.h> int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec) { return sec == 5; } license(GPL); # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 4 0.333 (4000.086 ms): sleep/9248 nanosleep(rqtp: 0x7ffc155f3300) = 0 # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 5 0.287 ( ): sleep/9659 nanosleep(rqtp: 0x7ffeafe38200) ... 0.290 ( ): perf_bpf_probe:hrtimer_nanosleep:(ffffffff9911efe0) tv_sec=5 0.287 (5000.059 ms): sleep/9659 ... [continued]: nanosleep()) = 0 # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 6 0.247 (5999.951 ms): sleep/10068 nanosleep(rqtp: 0x7fff2086d900) = 0 # perf trace -e *sleep -e /root/lib/perf/examples/bpf/5sec.c sleep 5.987 0.293 ( ): sleep/10489 nanosleep(rqtp: 0x7ffdd4fc10e0) ... 0.296 ( ): perf_bpf_probe:hrtimer_nanosleep:(ffffffff9911efe0) tv_sec=5 0.293 (5986.912 ms): sleep/10489 ... [continued]: nanosleep()) = 0 # Suggested-by: Stefan Liebler <stli@linux.ibm.com> Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Fixes: 1b16fffa389d ("perf llvm-utils: Add bpf include path to clang command line") Link: http://lkml.kernel.org/r/20180731073254.91090-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf c2c report: Fix crash for empty browserJiri Olsa
'perf c2c' scans read/write accesses and tries to find false sharing cases, so when the events it wants were not asked for or ended up not taking place, we get no histograms. So do not try to display entry details if there's not any. Currently this ends up in crash: $ perf c2c report # then press 'd' perf: Segmentation fault $ Committer testing: Before: Record a perf.data file without events of interest to 'perf c2c report', then call it and press 'd': # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (6 samples) ] # perf c2c report perf: Segmentation fault -------- backtrace -------- perf[0x5b1d2a] /lib64/libc.so.6(+0x346df)[0x7fcb566e36df] perf[0x46fcae] perf[0x4a9f1e] perf[0x4aa220] perf(main+0x301)[0x42c561] /lib64/libc.so.6(__libc_start_main+0xe9)[0x7fcb566cff29] perf(_start+0x29)[0x42c999] # After the patch the segfault doesn't take place, a follow up patch to tell the user why nothing changes when 'd' is pressed would be good. Reported-by: rodia@autistici.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: f1c5fd4d0bb9 ("perf c2c report: Add TUI cacheline browser") Link: http://lkml.kernel.org/r/20180724062008.26126-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf tests: Fix indexing when invoking subtestsSandipan Das
Recently, the subtest numbering was changed to start from 1. While it is fine for displaying results, this should not be the case when the subtests are actually invoked. Typically, the subtests are stored in zero-indexed arrays and invoked based on the index passed to the main test function. Since the index now starts from 1, the second subtest in the array (index 1) gets invoked instead of the first (index 0). This applies to all of the following subtests but for the last one, the subtest always fails because it does not meet the boundary condition of the subtest index being lesser than the number of subtests. This can be observed on powerpc64 and x86_64 systems running Fedora 28 as shown below. Before: # perf test "builtin clang support" 55: builtin clang support : 55.1: builtin clang compile C source to IR : Ok 55.2: builtin clang compile C source to ELF object : FAILED! # perf test "LLVM search and compile" 38: LLVM search and compile : 38.1: Basic BPF llvm compile : Ok 38.2: kbuild searching : Ok 38.3: Compile source for BPF prologue generation : Ok 38.4: Compile source for BPF relocation : FAILED! # perf test "BPF filter" 40: BPF filter : 40.1: Basic BPF filtering : Ok 40.2: BPF pinning : Ok 40.3: BPF prologue generation : Ok 40.4: BPF relocation checker : FAILED! After: # perf test "builtin clang support" 55: builtin clang support : 55.1: builtin clang compile C source to IR : Ok 55.2: builtin clang compile C source to ELF object : Ok # perf test "LLVM search and compile" 38: LLVM search and compile : 38.1: Basic BPF llvm compile : Ok 38.2: kbuild searching : Ok 38.3: Compile source for BPF prologue generation : Ok 38.4: Compile source for BPF relocation : Ok # perf test "BPF filter" 40: BPF filter : 40.1: Basic BPF filtering : Ok 40.2: BPF pinning : Ok 40.3: BPF prologue generation : Ok 40.4: BPF relocation checker : Ok Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Fixes: 9ef0112442bd ("perf test: Fix subtest number when showing results") Link: http://lkml.kernel.org/r/20180726171733.33208-1-sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' argsArnaldo Carvalho de Melo
For instance: $ trace -e socket* ssh sandy 0.000 ( 0.031 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 0.052 ( 0.015 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 1.568 ( 0.020 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 1.603 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 1.699 ( 0.014 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 1.724 ( 0.012 ms): ssh/19919 socket(family: LOCAL, type: STREAM|CLOEXEC|NONBLOCK ) = 3 1.804 ( 0.020 ms): ssh/19919 socket(family: INET, type: STREAM, protocol: TCP ) = 3 17.549 ( 0.098 ms): ssh/19919 socket(family: LOCAL, type: STREAM ) = 4 acme@sandy's password: Just like with other syscall args, the common bits are supressed so that the output is more compact, i.e. we use "TCP" instead of "IPPROTO_TCP", but we can make this show the original constant names if we like it by using some command line knob or ~/.perfconfig "[trace]" section variable. Also needed is to make perf's event parser accept things like: $ perf trace -e socket*/protocol=TCP/ By using both the tracefs event 'format' files and these tables built from the kernel sources. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-l39jz1vnyda0b6jsufuc8bz7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf trace beauty: Add beautifiers for 'socket''s 'protocol' argArnaldo Carvalho de Melo
It'll be wired to 'perf trace' in the next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-2i9vkvm1ik8yu4hgjmxhsyjv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf trace beauty: Do not print NULL strarray entriesArnaldo Carvalho de Melo
We may have string tables where not all slots have values, in those cases its better to print the numeric value, for instance: In the table below we would show "protocol: (null)" for socket_ipproto[3] Where it would be better to show "protocol: 3". $ tools/perf/trace/beauty/socket_ipproto.sh static const char *socket_ipproto[] = { [0] = "IP", [103] = "PIM", [108] = "COMP", [12] = "PUP", [132] = "SCTP", [136] = "UDPLITE", [137] = "MPLS", [17] = "UDP", [1] = "ICMP", [22] = "IDP", [255] = "RAW", [29] = "TP", [2] = "IGMP", [33] = "DCCP", [41] = "IPV6", [46] = "RSVP", [47] = "GRE", [4] = "IPIP", [50] = "ESP", [51] = "AH", [6] = "TCP", [8] = "EGP", [92] = "MTP", [94] = "BEETPH", [98] = "ENCAP", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-7djfak94eb3b9ltr79cpn3ti@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf beauty: Add a generator for IPPROTO_ socket's protocol constantsArnaldo Carvalho de Melo
It'll use tools/include copy of linux/in.h to generate a table to be used by tools, initially by the 'socket' and 'socketpair' beautifiers in 'perf trace', but that could also be used to translate from a string constant to the integer value to be used in a eBPF or tracefs tracepoint filter. When used without any args it produces: $ tools/perf/trace/beauty/socket_ipproto.sh static const char *socket_ipproto[] = { [0] = "IP", [103] = "PIM", [108] = "COMP", [12] = "PUP", [132] = "SCTP", [136] = "UDPLITE", [137] = "MPLS", [17] = "UDP", [1] = "ICMP", [22] = "IDP", [255] = "RAW", [29] = "TP", [2] = "IGMP", [33] = "DCCP", [41] = "IPV6", [46] = "RSVP", [47] = "GRE", [4] = "IPIP", [50] = "ESP", [51] = "AH", [6] = "TCP", [8] = "EGP", [92] = "MTP", [94] = "BEETPH", [98] = "ENCAP", }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-v9rafqh3qn6b9kp9vfvj9f8s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31tools include uapi: Grab a copy of linux/in.hArnaldo Carvalho de Melo
We'll use it to create tables for the 'protocol' argument to the socket syscall when the 'family' arg is one of AF_INET or AF_INET6. Add it to check_headers.sh so that when a new protocol gets added we get a notification during the build process. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-2amnveu1ns4emjn70xuavpje@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf tests: Fix complex event name parsingSandipan Das
The 'umask' event parameter is unsupported on some architectures like powerpc64. This can be observed on a powerpc64le system running Fedora 27 as shown below. # perf test "Parse event definition strings" -v 6: Parse event definition strings : --- start --- test child forked, pid 45915 ... running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2,umask=0x3/ukp'Invalid event/parameter 'umask' Invalid event/parameter 'umask' failed to parse event 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2,umask=0x3/ukp', err 1, str 'unknown term' event syntax error: '..,event=0x2,umask=0x3/ukp' \___ unknown term valid terms: event,mark,pmc,cache_sel,pmcxsel,unit,thresh_stop,thresh_start,combine,thresh_sel,thresh_cmp,sample_mode,config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,no-overwrite,overwrite,driver-config mem_access -> cpu/event=0x10401e0/ running test 0 'config=10,config1,config2=3,umask=1' test child finished with 1 ---- end ---- Parse event definition strings: FAILED! Committer testing: After applying the patch these test passes and in verbose mode we get: # perf test -v "event definition" 6: Parse event definition strings: --- start --- test child forked, pid 11061 running test 0 'syscalls:sys_enter_openat'Using CPUID GenuineIntel-6-9E <SNIP> running test 53 'cycles/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks'/Duk' running test 0 'cpu/config=10,config1,config2=3,period=1000/u' running test 1 'cpu/config=1,name=krava/u,cpu/config=2/u' running test 2 'cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/' running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp' <SNIP> test child finished with 0 ---- end ---- Parse event definition strings: Ok # Suggested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Fixes: 06dc5bf21f3f ("perf tests: Check that complex event name is parsed correctly") Link: http://lkml.kernel.org/r/20180726105502.31670-1-sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf evlist: Fix error out while applying initial delay and LBRKan Liang
'perf record' will error out if both --delay and LBR are applied. For example: # perf record -D 1000 -a -e cycles -j any -- sleep 2 Error: dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' # A dummy event is added implicitly for initial delay, which has the same configurations as real sampling events. The dummy event is a software event. If LBR is configured, perf must error out. The dummy event will only be used to track PERF_RECORD_MMAP while perf waits for the initial delay to enable the real events. The BRANCH_STACK bit can be safely cleared for the dummy event. After applying the patch: # perf record -D 1000 -a -e cycles -j any -- sleep 2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ] # Reported-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1531145722-16404-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf trace beauty: Default header_dir to cwd to work without parmsArnaldo Carvalho de Melo
Useful when checking the effects of header synchs for the files it uses as a input to generate string tables, in retrospect this is how it should've been done from day 1, not requiring the header_dir to be set on the Makefile, will change everything later, so that the only parm, common to all generators will be $(srctree) and $(beauty_outdir). So, to see what it generates, just call it without any parameters: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh static const char *vhost_virtio_ioctl_cmds[] = { [0x00] = "SET_FEATURES", [0x01] = "SET_OWNER", [0x02] = "RESET_OWNER", [0x03] = "SET_MEM_TABLE", [0x04] = "SET_LOG_BASE", [0x07] = "SET_LOG_FD", [0x10] = "SET_VRING_NUM", [0x11] = "SET_VRING_ADDR", [0x12] = "SET_VRING_BASE", [0x13] = "SET_VRING_ENDIAN", [0x14] = "GET_VRING_ENDIAN", [0x20] = "SET_VRING_KICK", [0x21] = "SET_VRING_CALL", [0x22] = "SET_VRING_ERR", [0x23] = "SET_VRING_BUSYLOOP_TIMEOUT", [0x24] = "GET_VRING_BUSYLOOP_TIMEOUT", [0x30] = "NET_SET_BACKEND", [0x40] = "SCSI_SET_ENDPOINT", [0x41] = "SCSI_CLEAR_ENDPOINT", [0x42] = "SCSI_GET_ABI_VERSION", [0x43] = "SCSI_SET_EVENTS_MISSED", [0x44] = "SCSI_GET_EVENTS_MISSED", [0x60] = "VSOCK_SET_GUEST_CID", [0x61] = "VSOCK_SET_RUNNING", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", [0x12] = "GET_VRING_BASE", }; $ Or: $ tools/perf/trace/beauty/sndrv_pcm_ioctl.sh static const char *sndrv_pcm_ioctl_cmds[] = { [0x00] = "PVERSION", [0x01] = "INFO", [0x02] = "TSTAMP", [0x03] = "TTSTAMP", [0x04] = "USER_PVERSION", [0x10] = "HW_REFINE", [0x11] = "HW_PARAMS", [0x12] = "HW_FREE", [0x13] = "SW_PARAMS", [0x20] = "STATUS", [0x21] = "DELAY", [0x22] = "HWSYNC", [0x23] = "SYNC_PTR", [0x24] = "STATUS_EXT", [0x32] = "CHANNEL_INFO", [0x40] = "PREPARE", [0x41] = "RESET", [0x42] = "START", [0x43] = "DROP", [0x44] = "DRAIN", [0x45] = "PAUSE", [0x46] = "REWIND", [0x47] = "RESUME", [0x48] = "XRUN", [0x49] = "FORWARD", [0x50] = "WRITEI_FRAMES", [0x51] = "READI_FRAMES", [0x52] = "WRITEN_FRAMES", [0x53] = "READN_FRAMES", [0x60] = "LINK", [0x61] = "UNLINK", }; $ Etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-90am4vm8hh1osms894dp2otr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31Merge remote-tracking branch 'tip/perf/urgent' into perf/coreArnaldo Carvalho de Melo
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>