summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-02-10genirq: Avoid summation loops for /proc/statThomas Gleixner
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. This can be largely avoided for interrupts which are not marked as 'PER_CPU' interrupts by simply adding a per interrupt summation counter which is incremented along with the per interrupt per cpu counter. The PER_CPU interrupts need to avoid that and use only per cpu accounting because they share the interrupt number and the interrupt descriptor and concurrent updates would conflict or require unwanted synchronization. Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de 8<------------- v2: Undo the unintentional layout change of struct irq_desc. include/linux/irqdesc.h | 1 + kernel/irq/chip.c | 12 ++++++++++-- kernel/irq/internals.h | 8 +++++++- kernel/irq/irqdesc.c | 7 ++++++- 4 files changed, 24 insertions(+), 4 deletions(-)
2019-02-10Merge tag 'y2038-new-syscalls' of ↵Thomas Gleixner
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038 Pull y2038 - time64 system calls from Arnd Bergmann: This series finally gets us to the point of having system calls with 64-bit time_t on all architectures, after a long time of incremental preparation patches. There was actually one conversion that I missed during the summer, i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes and review comments. The following system calls are now added on all 32-bit architectures using the same system call numbers: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 Each one of these corresponds directly to an existing system call that includes a 'struct timespec' argument, or a structure containing a timespec or (in case of clock_adjtime) timeval. Not included here are new versions of getitimer/setitimer and getrusage/waitid, which are planned for the future but only needed to make a consistent API rather than for correct operation beyond y2038. These four system calls are based on 'timeval', and it has not been finally decided what the replacement kernel interface will use instead. So far, I have done a lot of build testing across most architectures, which has found a number of bugs. Runtime testing so far included testing LTP on 32-bit ARM with the existing system calls, to ensure we do not regress for existing binaries, and a test with a 32-bit x86 build of LTP against a modified version of the musl C library that has been adapted to the new system call interface [3]. This library can be used for testing on all architectures supported by musl-1.1.21, but it is not how the support is getting integrated into the official musl release. Official musl support is planned but will require more invasive changes to the library. Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/ Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/ Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
2019-02-10Merge tag 'y2038-syscall-cleanup' of ↵Thomas Gleixner
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038 Pull preparatory work for y2038 changes from Arnd Bergmann: System call unification and cleanup The system call tables have diverged a bit over the years, and a number of the recent additions never made it into all architectures, for one reason or another. This is an attempt to clean it up as far as we can without breaking compatibility, doing a number of steps: - Add system calls that have not yet been integrated into all architectures but that we definitely want there. This includes {,f}statfs64() and get{eg,eu,g,p,u,pp}id() on alpha, which have been missing traditionally. - The s390 compat syscall handling is cleaned up to be more like what we do on other architectures, while keeping the 31-bit pointer extension. This was merged as a shared branch by the s390 maintainers and is included here in order to base the other patches on top. - Add the separate ipc syscalls on all architectures that traditionally only had sys_ipc(). This version is done without support for IPC_OLD that is we have in sys_ipc. The new semtimedop_time64 syscall will only be added here, not in sys_ipc - Add syscall numbers for a couple of syscalls that we probably don't need everywhere, in particular pkey_* and rseq, for the purpose of symmetry: if it's in asm-generic/unistd.h, it makes sense to have it everywhere. I expect that any future system calls will get assigned on all platforms together, even when they appear to be specific to a single architecture. - Prepare for having the same system call numbers for any future calls. In combination with the generated tables, this hopefully makes it easier to add new calls across all architectures together. All of the above are technically separate from the y2038 work, but are done as preparation before we add the new 64-bit time_t system calls everywhere, providing a common baseline set of system calls. I expect that glibc and other libraries that want to use 64-bit time_t will require linux-5.1 kernel headers for building in the future, and at a much later point may also require linux-5.1 or a later version as the minimum kernel at runtime. Having a common baseline then allows the removal of many architecture or kernel version specific workarounds.
2019-02-10genirq/affinity: Move allocation of 'node_to_cpumask' to ↵Ming Lei
irq_build_affinity_masks() 'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(), so move it into irq_build_affinity_masks(). No functioanl change. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com
2019-02-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A couple of kernel side fixes: - Fix the Intel uncore driver on certain hardware configurations - Fix a CPU hotplug related memory allocation bug - Remove a spurious WARN() ... plus also a handful of perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script python: Add Python3 support to tests/attr.py perf trace: Support multiple "vfs_getname" probes perf symbols: Filter out hidden symbols from labels perf symbols: Add fallback definitions for GELF_ST_VISIBILITY() tools headers uapi: Sync linux/in.h copy from the kernel sources perf clang: Do not use 'return std::move(something)' perf mem/c2c: Fix perf_mem_events to support powerpc perf tests evsel-tp-sched: Fix bitwise operator perf/core: Don't WARN() for impossible ring-buffer sizes perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() perf/x86/intel/uncore: Add Node ID mask
2019-02-10Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "An rtmutex (PI-futex) deadlock scenario fix, plus a locking documentation fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Handle early deadlock return correctly futex: Fix barrier comment
2019-02-09bpf: Fix narrow load on a bpf_sock returned from sk_lookup()Martin KaFai Lau
By adding this test to test_verifier: { "reference tracking: access sk->src_ip4 (narrow load)", .insns = { BPF_SK_LOOKUP, BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2), BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), BPF_EMIT_CALL(BPF_FUNC_sk_release), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, The above test loads 2 bytes from sk->src_ip4 where sk is obtained by bpf_sk_lookup_tcp(). It hits an internal verifier error from convert_ctx_accesses(): [root@arch-fb-vm1 bpf]# ./test_verifier 665 665 Failed to load prog 'Invalid argument'! 0: (b7) r2 = 0 1: (63) *(u32 *)(r10 -8) = r2 2: (7b) *(u64 *)(r10 -16) = r2 3: (7b) *(u64 *)(r10 -24) = r2 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 7: (bf) r2 = r10 8: (07) r2 += -48 9: (b7) r3 = 36 10: (b7) r4 = 0 11: (b7) r5 = 0 12: (85) call bpf_sk_lookup_tcp#84 13: (bf) r6 = r0 14: (15) if r0 == 0x0 goto pc+3 R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1 15: (69) r2 = *(u16 *)(r0 +26) 16: (bf) r1 = r6 17: (85) call bpf_sk_release#86 18: (95) exit from 14 to 18: safe processed 20 insns (limit 131072), stack depth 48 bpf verifier is misconfigured Summary: 0 PASSED, 0 SKIPPED, 1 FAILED The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly loaded (meaning load any 1 or 2 bytes of the src_ip4) by marking info->ctx_field_size. However, this marked ctx_field_size is not used. This patch fixes it. Due to the recent refactoring in test_verifier, this new test will be added to the bpf-next branch (together with the bpf_tcp_sock patchset) to avoid merge conflict. Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type") Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-09Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', ↵Paul E. McKenney
'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD doc.2019.01.26a: Documentation updates. fixes.2019.01.26a: Miscellaneous fixes. sil.2019.01.26a: Removal of a few more spin_is_locked() instances. spdx.2019.02.09a: Add SPDX identifiers to RCU files srcu.2019.01.26a: SRCU updates. torture.2019.01.26a: Torture-test updates.
2019-02-09locking/locktorture: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09torture: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/update: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/tree: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Update .h file SPDX comment format per Joe Perches. ] Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/tiny: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/sync: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/srcu: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/rcutorture: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/rcu_segcblist: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/rcuperf: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09rcu/rcu.h: Convert to SPDX license identifierPaul E. McKenney
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal fixes from Eric Biederman: "This contains four small fixes for signal handling. A missing range check, a regression fix, prioritizing signals we have already started a signal group exit for, and better detection of synchronous signals. The confused decision of which signals to handle failed spectacularly when a timer was pointed at SIGBUS and the stack overflowed. Resulting in an unkillable process in an infinite loop instead of a SIGSEGV and core dump" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Better detection of synchronous signals signal: Always notice exiting tasks signal: Always attempt to allocate siginfo for SIGSTOP signal: Make siginmask safe when passed a signal of 0
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "This pull request is dedicated to the upcoming snowpocalypse parts 2 and 3 in the Pacific Northwest: 1) Drop profiles are broken because some drivers use dev_kfree_skb* instead of dev_consume_skb*, from Yang Wei. 2) Fix IWLWIFI kconfig deps, from Luca Coelho. 3) Fix percpu maps updating in bpftool, from Paolo Abeni. 4) Missing station release in batman-adv, from Felix Fietkau. 5) Fix some networking compat ioctl bugs, from Johannes Berg. 6) ucc_geth must reset the BQL queue state when stopping the device, from Mathias Thore. 7) Several XDP bug fixes in virtio_net from Toshiaki Makita. 8) TSO packets must be sent always on queue 0 in stmmac, from Jose Abreu. 9) Fix socket refcounting bug in RDS, from Eric Dumazet. 10) Handle sparse cpu allocations in bpf selftests, from Martynas Pumputis. 11) Make sure mgmt frames have enough tailroom in mac80211, from Felix Feitkau. 12) Use safe list walking in sctp_sendmsg() asoc list traversal, from Greg Kroah-Hartman. 13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL ccid, from Eric Dumazet. 14) Need to reload WoL password into bcmsysport device after deep sleeps, from Florian Fainelli. 15) Remove filter from mask before freeing in cls_flower, from Petr Machata. 16) Missing release and use after free in error paths of s390 qeth code, from Julian Wiedmann. 17) Fix lockdep false positive in dsa code, from Marc Zyngier. 18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn. 19) Fix EQ firmware assert in qed driver, from Manish Chopra. 20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: dsa: b53: Fix for failure when irq is not defined in dt sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled net: Don't default Cavium PTP driver to 'y' net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net/mlx5e: Don't overwrite pedit action when multiple pedit used net/mlx5e: Update hw flows when encap source mac changed qed*: Advance drivers version to 8.37.0.20 qed: Change verbosity for coalescing message. qede: Fix system crash on configuring channels. qed: Consider TX tcs while deriving the max num_queues for PF. ...
2019-02-08Merge tag 'driver-core-5.0-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes for 5.0-rc6. Well, not so much "driver core" as "debugfs". There's a lot of outstanding debugfs cleanup patches coming in through different subsystem trees, and in that process the debugfs core was found that it really should return errors when something bad happens, to prevent random files from showing up in the root of debugfs afterward. So debugfs was fixed up to handle this properly, and then two fixes for the relay and blk-mq code was needed as it was making invalid assumptions about debugfs return values. There's also a cacheinfo fix in here that resolves a tiny issue. All of these have been in linux-next for over a week with no reported problems" * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: blk-mq: protect debugfs_create_files() from failures relay: check return of create_buf_file() properly debugfs: debugfs_lookup() should return NULL if not found debugfs: return error values, not NULL debugfs: fix debugfs_rename parameter checking cacheinfo: Keep the old value if of_property_read_u32 fails
2019-02-08printk: Export console_printkPrarit Bhargava
The fbcon can be built as a module and requires console_printk. Export console_printk. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2019-02-08futex: Handle early deadlock return correctlyThomas Gleixner
commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") changed the locking rules in the futex code so that the hash bucket lock is not longer held while the waiter is enqueued into the rtmutex wait list. This made the lock and the unlock path symmetric, but unfortunately the possible early exit from __rt_mutex_proxy_start() due to a detected deadlock was not updated accordingly. That allows a concurrent unlocker to observe inconsitent state which triggers the warning in the unlock path. futex_lock_pi() futex_unlock_pi() lock(hb->lock) queue(hb_waiter) lock(hb->lock) lock(rtmutex->wait_lock) unlock(hb->lock) // acquired hb->lock hb_waiter = futex_top_waiter() lock(rtmutex->wait_lock) __rt_mutex_proxy_start() ---> fail remove(rtmutex_waiter); ---> returns -EDEADLOCK unlock(rtmutex->wait_lock) // acquired wait_lock wake_futex_pi() rt_mutex_next_owner() --> returns NULL --> WARN lock(hb->lock) unqueue(hb_waiter) The problem is caused by the remove(rtmutex_waiter) in the failure case of __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the hash bucket but no waiter on the rtmutex, i.e. inconsistent state. The original commit handles this correctly for the other early return cases (timeout, signal) by delaying the removal of the rtmutex waiter until the returning task reacquired the hash bucket lock. Treat the failure case of __rt_mutex_proxy_start() in the same way and let the existing cleanup code handle the eventual handover of the rtmutex gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter removal for the failure case, so that the other callsites are still operating correctly. Add proper comments to the code so all these details are fully documented. Thanks to Peter for helping with the analysis and writing the really valuable code comments. Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Stefan Liebler <stli@linux.ibm.com> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
2019-02-08futex: Fix barrier commentDavidlohr Bueso
The current comment for the barrier that guarantees that waiter increment is always before taking the hb spinlock (barrier (A)) needs to be fixed as it is misplaced. This is obviously referring to hb_waiters_inc, which is a full barrier. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190206185602.949-1-dave@stgolabs.net
2019-02-07audit: hide auditsc_get_stamp and audit_serial prototypesRichard Guy Briggs
auditsc_get_stamp() and audit_serial() are internal audit functions so move their prototypes from include/linux/audit.h to kernel/audit.h so they are not visible to the rest of the kernel. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-02-07mm: make mm->pinned_vm an atomic64 counterDavidlohr Bueso
Taking a sleeping lock to _only_ increment a variable is quite the overkill, and pretty much all users do this. Furthermore, some drivers (ie: infiniband and scif) that need pinned semantics can go to quite some trouble to actually delay via workqueue (un)accounting for pinned pages when not possible to acquire it. By making the counter atomic we no longer need to hold the mmap_sem and can simply some code around it for pinned_vm users. The counter is 64-bit such that we need not worry about overflows such as rdma user input controlled from userspace. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07signal: Better detection of synchronous signalsEric W. Biederman
Recently syzkaller was able to create unkillablle processes by creating a timer that is delivered as a thread local signal on SIGHUP, and receiving SIGHUP SA_NODEFERER. Ultimately causing a loop failing to deliver SIGHUP but always trying. When the stack overflows delivery of SIGHUP fails and force_sigsegv is called. Unfortunately because SIGSEGV is numerically higher than SIGHUP next_signal tries again to deliver a SIGHUP. From a quality of implementation standpoint attempting to deliver the timer SIGHUP signal is wrong. We should attempt to deliver the synchronous SIGSEGV signal we just forced. We can make that happening in a fairly straight forward manner by instead of just looking at the signal number we also look at the si_code. In particular for exceptions (aka synchronous signals) the si_code is always greater than 0. That still has the potential to pick up a number of asynchronous signals as in a few cases the same si_codes that are used for synchronous signals are also used for asynchronous signals, and SI_KERNEL is also included in the list of possible si_codes. Still the heuristic is much better and timer signals are definitely excluded. Which is enough to prevent all known ways for someone sending a process signals fast enough to cause unexpected and arguably incorrect behavior. Cc: stable@vger.kernel.org Fixes: a27341cd5fcb ("Prioritize synchronous signals over 'normal' signals") Tested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-02-07signal: Always notice exiting tasksEric W. Biederman
Recently syzkaller was able to create unkillablle processes by creating a timer that is delivered as a thread local signal on SIGHUP, and receiving SIGHUP SA_NODEFERER. Ultimately causing a loop failing to deliver SIGHUP but always trying. Upon examination it turns out part of the problem is actually most of the solution. Since 2.5 signal delivery has found all fatal signals, marked the signal group for death, and queued SIGKILL in every threads thread queue relying on signal->group_exit_code to preserve the information of which was the actual fatal signal. The conversion of all fatal signals to SIGKILL results in the synchronous signal heuristic in next_signal kicking in and preferring SIGHUP to SIGKILL. Which is especially problematic as all fatal signals have already been transformed into SIGKILL. Instead of dequeueing signals and depending upon SIGKILL to be the first signal dequeued, first test if the signal group has already been marked for death. This guarantees that nothing in the signal queue can prevent a process that needs to exit from exiting. Cc: stable@vger.kernel.org Tested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Ref: ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-02-07Merge tag 'trace-v5.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This has two fixes for uprobe code. - Cut and paste fix to have uprobe printks say "uprobe" and not "kprobe" - Add terminating '\0' byte when copying function arguments" * tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobes: Fix output for multiple string arguments tracing: uprobes: Fix typo in pr_fmt string
2019-02-07y2038: rename old time and utime syscallsArnd Bergmann
The time, stime, utime, utimes, and futimesat system calls are only used on older architectures, and we do not provide y2038 safe variants of them, as they are replaced by clock_gettime64, clock_settime64, and utimensat_time64. However, for consistency it seems better to have the 32-bit architectures that still use them call the "time32" entry points (leaving the traditional handlers for the 64-bit architectures), like we do for system calls that now require two versions. Note: We used to always define __ARCH_WANT_SYS_TIME and __ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and __ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat mode. The resulting asm/unistd.h changes look a bit counterintuitive. This is only a cleanup patch and it should not change any behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-02-07y2038: syscalls: rename y2038 compat syscallsArnd Bergmann
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07timex: change syscalls to use struct __kernel_timexDeepa Dinamani
struct timex is not y2038 safe. Switch all the syscall apis to use y2038 safe __kernel_timex. Note that sys_adjtimex() does not have a y2038 safe solution. C libraries can implement it by calling clock_adjtime(CLOCK_REALTIME, ...). Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07timex: use __kernel_timex internallyDeepa Dinamani
struct timex is not y2038 safe. Replace all uses of timex with y2038 safe __kernel_timex. Note that struct __kernel_timex is an ABI interface definition. We could define a new structure based on __kernel_timex that is only available internally instead. Right now, there isn't a strong motivation for this as the structure is isolated to a few defined struct timex interfaces and such a structure would be exactly the same as struct timex. The patch was generated by the following coccinelle script: virtual patch @depends on patch forall@ identifier ts; expression e; @@ ( - struct timex ts; + struct __kernel_timex ts; | - struct timex ts = {}; + struct __kernel_timex ts = {}; | - struct timex ts = e; + struct __kernel_timex ts = e; | - struct timex *ts; + struct __kernel_timex *ts; | (memset \| copy_from_user \| copy_to_user \)(..., - sizeof(struct timex)) + sizeof(struct __kernel_timex)) ) @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts, + struct __kernel_timex *ts, ...) { ... } @depends on patch forall@ identifier ts; identifier fn; @@ fn(..., - struct timex *ts) { + struct __kernel_timex *ts) { ... } Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: linux-alpha@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07sparc64: add custom adjtimex/clock_adjtime functionsArnd Bergmann
sparc64 is the only architecture on Linux that has a 'timeval' definition with a 32-bit tv_usec but a 64-bit tv_sec. This causes problems for sparc32 compat mode when we convert it to use the new __kernel_timex type that has the same layout as all other 64-bit architectures. To avoid adding sparc64 specific code into the generic adjtimex implementation, this adds a wrapper in the sparc64 system call handling that converts the sparc64 'timex' into the new '__kernel_timex'. At this point, the two structures are defined to be identical, but that will change in the next step once we convert sparc32. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07time: make adjtime compat handling available for 32 bitArnd Bergmann
We want to reuse the compat_timex handling on 32-bit architectures the same way we are using the compat handling for timespec when moving to 64-bit time_t. Move all definitions related to compat_timex out of the compat code into the normal timekeeping code, along with a rename to old_timex32, corresponding to the timespec/timeval structures, and make it controlled by CONFIG_COMPAT_32BIT_TIME, which 32-bit architectures will then select. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-06ring-buffer: Remove unused function ring_buffer_page_len()Miroslav Benes
Commit 6b7e633fe9c2 ("tracing: Remove extra zeroing out of the ring buffer page") removed the only caller of ring_buffer_page_len(). The function is now unused and may be removed. Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbenes@suse.cz Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Show stacktrace for wakeup tracersChangbin Du
This align the behavior of wakeup tracers with irqsoff latency tracer that we record stacktrace at the beginning and end of waking up. The stacktrace shows us what is happening in the kernel. Link: http://lkml.kernel.org/r/20190116160249.7554-1-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Put a margin between flags and duration for wakeup tracersChangbin Du
Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */ 2 us | 0) <idle>-0 | dNh5 0.000 us | (null)(); 4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() { 4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub(); 5 us | 0) <idle>-0 | dNh3 1.063 us | } 5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat(); 6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub(); 6 us | 0) <idle>-0 | dNh2 0.818 us | } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # -------------------------------------------------------------------- # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)(); 67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat(); 69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore(); 71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq(); 72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq(); 73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0) <idle>-0 | dNs. | | call_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | __queue_work() { ... Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Show more info for funcgraph wakeup tracersChangbin Du
Add these info fields to funcgraph wakeup tracers: o Show CPU info since the waker could be on a different CPU. o Show function duration and overhead. o Show IRQ markers. Link: http://lkml.kernel.org/r/20190101154614.8887-3-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Add comment to predicate_parse() about "&&" or "||"Steven Rostedt (VMware)
As the predicat_parse() code is rather complex, commenting subtleties is important. The switch case statement should be commented to describe that it is only looking for two '&' or '|' together, which is why the fall through to an error is after the check. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Annotate implicit fall through in predicate_parse()Mathieu Malaterre
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/trace/trace_events_filter.c:494:8: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203039.16535-2-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06tracing: Annotate implicit fall through in parse_probe_arg()Mathieu Malaterre
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/trace/trace_probe.c:302:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203039.16535-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06function_graph: Support displaying relative timestampChangbin Du
When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06perf/aux: Make perf_event accessible to setup_aux()Mathieu Poirier
When pmu::setup_aux() is called the coresight PMU needs to know which sink to use for the session by looking up the information in the event's attr::config2 field. As such simply replace the cpu information by the complete perf_event structure and change all affected customers. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06livepatch: Module coming and going callbacks can proceed with all listed patchesPetr Mladek
Livepatches can no longer get enabled and disabled repeatedly. The list klp_patches contains only enabled patches and eventually the patch in transition. The module coming and going callbacks do no longer need to check for these state. They have to proceed with all listed patches. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-02-06livepatch: Introduce klp_for_each_patch macroPetr Mladek
There are already macros to iterate over struct klp_func and klp_object. Add also klp_for_each_patch(). But make it internal because also klp_patches list is internal. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-02-06livepatch: core: Return EOPNOTSUPP instead of ENOSYSAlice Ferrazzi
As a result of an unsupported operation is better to use EOPNOTSUPP as error code. ENOSYS is only used for 'invalid syscall nr' and nothing else. Signed-off-by: Alice Ferrazzi <alice.ferrazzi@miraclelinux.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-02-05irqdesc: Add domain handler for NMIsJulien Thierry
NMI handling code should be executed between calls to nmi_enter and nmi_exit. Add a separate domain handler to properly setup NMI context when handling an interrupt requested as NMI. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>