summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-08Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The changes in this cycle are: - RCU flavor consolidation cleanups and optmizations - Documentation updates - Miscellaneous fixes - SRCU updates - RCU-sync flavor consolidation - Torture-test updates - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) tools/memory-model: Improve data-race detection tools/memory-model: Change definition of rcu-fence tools/memory-model: Expand definition of barrier tools/memory-model: Do not use "herd" to refer to "herd7" tools/memory-model: Fix comment in MP+poonceonces.litmus Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() rcu: Don't return a value from rcu_assign_pointer() rcu: Force inlining of rcu_read_lock() rcu: Fix irritating whitespace error in rcu_assign_pointer() rcu: Upgrade sync_exp_work_done() to smp_mb() rcutorture: Upper case solves the case of the vanishing NULL pointer torture: Suppress propagating trace_printk() warning rcutorture: Dump trace buffer for callback pipe drain failures torture: Add --trust-make to suppress "make clean" torture: Make --cpus override idleness calculations torture: Run kernel build in source directory torture: Add function graph-tracing cheat sheet torture: Capture qemu output rcutorture: Tweak kvm options rcutorture: Add trivial RCU implementation ...
2019-07-08Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x96 apic updates from Thomas Gleixner: "Updates for the x86 APIC interrupt handling and APIC timer: - Fix a long standing issue with spurious interrupts which was caused by the big vector management rework a few years ago. Robert Hodaszi provided finally enough debug data and an excellent initial failure analysis which allowed to understand the underlying issues. This contains a change to the core interrupt management code which is required to handle this correctly for the APIC/IO_APIC. The core changes are NOOPs for most architectures except ARM64. ARM64 is not impacted by the change as confirmed by Marc Zyngier. - Newer systems allow to disable the PIT clock for power saving causing panic in the timer interrupt delivery check of the IO/APIC when the HPET timer is not enabled either. While the clock could be turned on this would cause an endless whack a mole game to chase the proper register in each affected chipset. These systems provide the relevant frequencies for TSC, CPU and the local APIC timer via CPUID and/or MSRs, which allows to avoid the PIT/HPET based calibration. As the calibration code is the only usage of the legacy timers on modern systems and is skipped anyway when the frequencies are known already, there is no point in setting up the PIT and actually checking for the interrupt delivery via IO/APIC. To achieve this on a wide variety of platforms, the CPUID/MSR based frequency readout has been made more robust, which also allowed to remove quite some workarounds which turned out to be not longer required. Thanks to Daniel Drake for analysis, patches and verification" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Seperate unused system vectors from spurious entry again x86/irq: Handle spurious interrupt after shutdown gracefully x86/ioapic: Implement irq_get_irqchip_state() callback genirq: Add optional hardware synchronization for shutdown genirq: Fix misleading synchronize_irq() documentation genirq: Delay deactivation in free_irq() x86/timer: Skip PIT initialization on modern chipsets x86/apic: Use non-atomic operations when possible x86/apic: Make apic_bsp_setup() static x86/tsc: Set LAPIC timer period to crystal clock frequency x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period' x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
2019-07-08Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer and timekeeping departement delivers: Core: - The consolidation of the VDSO code into a generic library including the conversion of x86 and ARM64. Conversion of ARM and MIPS are en route through the relevant maintainer trees and should end up in 5.4. This gets rid of the unnecessary different copies of the same code and brings all architectures on the same level of VDSO functionality. - Make the NTP user space interface more robust by restricting the TAI offset to prevent undefined behaviour. Includes a selftest. - Validate user input in the compat settimeofday() syscall to catch invalid values which would be turned into valid values by a multiplication overflow - Consolidate the time accessors - Small fixes, improvements and cleanups all over the place Drivers: - Support for the NXP system counter, TI davinci timer - Move the Microsoft HyperV clocksource/events code into the drivers/clocksource directory so it can be shared between x86 and ARM64. - Overhaul of the Tegra driver - Delay timer support for IXP4xx - Small fixes, improvements and cleanups as usual" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) time: Validate user input in compat_settimeofday() timer: Document TIMER_PINNED clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic clocksource/drivers: Make Hyper-V clocksource ISA agnostic MAINTAINERS: Fix Andy's surname and the directory entries of VDSO hrtimer: Use a bullet for the returns bullet list arm64: vdso: Fix compilation with clang older than 8 arm64: compat: Fix __arch_get_hw_counter() implementation arm64: Fix __arch_get_hw_counter() implementation lib/vdso: Make delta calculation work correctly MAINTAINERS: Add entry for the generic VDSO library arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system arm64: vdso: Remove unnecessary asm-offsets.c definitions vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h clocksource/drivers/davinci: Add support for clocksource clocksource/drivers/davinci: Add support for clockevents clocksource/drivers/tegra: Set up maximum-ticks limit properly clocksource/drivers/tegra: Cycles can't be 0 clocksource/drivers/tegra: Restore base address before cleanup clocksource/drivers/tegra: Add verbose definition for 1MHz constant ...
2019-07-08Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The irq departement provides the usual mixed bag: Core: - Further improvements to the irq timings code which aims to predict the next interrupt for power state selection to achieve better latency/power balance - Add interrupt statistics to the core NMI handlers - The usual small fixes and cleanups Drivers: - Support for Renesas RZ/A1, Annapurna Labs FIC, Meson-G12A SoC and Amazon Gravition AMR/GIC interrupt controllers. - Rework of the Renesas INTC controller driver - ACPI support for Socionext SoCs - Enhancements to the CSKY interrupt controller - The usual small fixes and cleanups" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) irq/irqdomain: Fix comment typo genirq: Update irq stats from NMI handlers irqchip/gic-pm: Remove PM_CLK dependency irqchip/al-fic: Introduce Amazon's Annapurna Labs Fabric Interrupt Controller Driver dt-bindings: interrupt-controller: Add Amazon's Annapurna Labs FIC softirq: Use __this_cpu_write() in takeover_tasklets() irqchip/mbigen: Stop printing kernel addresses irqchip/gic: Add dependency for ARM_GIC_MAX_NR genirq/affinity: Remove unused argument from [__]irq_build_affinity_masks() genirq/timings: Add selftest for next event computation genirq/timings: Add selftest for irqs circular buffer genirq/timings: Add selftest for circular array genirq/timings: Encapsulate storing function genirq/timings: Encapsulate timings push genirq/timings: Optimize the period detection speed genirq/timings: Fix timings buffer inspection genirq/timings: Fix next event index function irqchip/qcom: Use struct_size() in devm_kzalloc() irqchip/irq-csky-mpintc: Remove unnecessary loop in interrupt handler dt-bindings: interrupt-controller: Update csky mpintc ...
2019-07-08Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug updates from Thomas Gleixner: "A small set of updates for SMP and CPU hotplug: - Abort disabling secondary CPUs in the freezer when a wakeup is pending instead of evaluating it only after all CPUs have been offlined. - Remove the shared annotation for the strict per CPU cfd_data in the smp function call core code. - Remove the return values of smp_call_function() and on_each_cpu() as they are unconditionally 0. Fixup the few callers which actually bothered to check the return value" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove smp_call_function() and on_each_cpu() return values smp: Do not mark call_function_data as shared cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
2019-07-08Merge tag 's390-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Improve stop_machine wait logic: replace cpu_relax_yield call in generic stop_machine function with a weak stop_machine_yield function. This is overridden on s390, which yields the current cpu to the neighbouring cpu after a couple of retries, instead of blindly giving up the cpu to the hipervisor. This significantly improves stop_machine performance on s390 in overcommitted scenarios. This includes common code changes which have been Acked by Peter Zijlstra and Thomas Gleixner. - Improve jump label transformation speed: transform jump labels without using stop_machine. - Refactoring of the vfio-ccw cp handling, simplifying the code and avoiding unneeded allocating/copying. - Various vfio-ccw fixes (ccw translation, state machine). - Add support for vfio-ap queue interrupt control in the guest. This includes s390 kvm changes which have been Acked by Christian Borntraeger. - Add protected virtualization support for virtio-ccw. - Enforce both CONFIG_SMP and CONFIG_HOTPLUG_CPU, which allows to remove some code which most likely isn't working at all, besides that s390 didn't even compile for !CONFIG_SMP. - Support for special flagged EP11 CPRBs for zcrypt. - Handle PCI devices with no support for new MIO instructions. - Avoid KASAN false positives in reworked stack unwinder. - Couple of fixes for the QDIO layer. - Convert s390 specific documentation to ReST format. - Let s390 crypto modules return -ENODEV instead of -EOPNOTSUPP if hardware is missing. This way our modules behave like most other modules and which is also what systemd's systemd-modules-load.service expects. - Replace defconfig with performance_defconfig, so there is one config file less to maintain. - Remove the SCLP call home device driver, which was never useful. - Cleanups all over the place. * tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (83 commits) docs: s390: s390dbf: typos and formatting, update crash command docs: s390: unify and update s390dbf kdocs at debug.c docs: s390: restore important non-kdoc parts of s390dbf.rst vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1 s390/pci: correctly handle MIO opt-out s390/pci: deal with devices that have no support for MIO instructions s390: ap: kvm: Enable PQAP/AQIC facility for the guest s390: ap: implement PAPQ AQIC interception in kernel vfio: ap: register IOMMU VFIO notifier s390: ap: kvm: add PQAP interception for AQIC s390/unwind: cleanup unused READ_ONCE_TASK_STACK s390/kasan: avoid false positives during stack unwind s390/qdio: don't touch the dsci in tiqdio_add_input_queues() s390/qdio: (re-)initialize tiqdio list entries s390/dasd: Fix a precision vs width bug in dasd_feature_list() s390/cio: introduce driver_override on the css bus vfio-ccw: make convert_ccw0_to_ccw1 static vfio-ccw: Remove copy_ccw_from_iova() vfio-ccw: Factor out the ccw0-to-ccw1 transition vfio-ccw: Copy CCW data outside length calculation ...
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-07-07time: Validate user input in compat_settimeofday()zhengbin
The user value is validated after converting the timeval to a timespec, but for a wide range of negative tv_usec values the multiplication overflow turns them in positive numbers. So the 'validated later' is not catching the invalid input. Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1562460701-113301-1-git-send-email-zhengbin13@huawei.com
2019-07-06irq/irqdomain: Fix comment typoZenghui Yu
Fix typo in the comment on top of __irq_domain_add(). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1562388072-23492-1-git-send-email-yuzenghui@huawei.com
2019-07-06genirq: Update irq stats from NMI handlersShijith Thotton
The NMI handlers handle_percpu_devid_fasteoi_nmi() and handle_fasteoi_nmi() do not update the interrupt counts. Due to that the NMI interrupt count does not show up correctly in /proc/interrupts. Add the statistics and treat the NMI handlers in the same way as per cpu interrupts and prevent them from updating irq_desc::tot_count as this might be corrupted due to concurrency. [ tglx: Massaged changelog ] Fixes: 2dcf1fbcad35 ("genirq: Provide NMI handlers") Signed-off-by: Shijith Thotton <sthotton@marvell.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1562313336-11888-1-git-send-email-sthotton@marvell.com
2019-07-05ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEMEJann Horn
Fix two issues: When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU reference to the parent's objective credentials, then give that pointer to get_cred(). However, the object lifetime rules for things like struct cred do not permit unconditionally turning an RCU reference into a stable reference. PTRACE_TRACEME records the parent's credentials as if the parent was acting as the subject, but that's not the case. If a malicious unprivileged child uses PTRACE_TRACEME and the parent is privileged, and at a later point, the parent process becomes attacker-controlled (because it drops privileges and calls execve()), the attacker ends up with control over two processes with a privileged ptrace relationship, which can be abused to ptrace a suid binary and obtain root privileges. Fix both of these by always recording the credentials of the process that is requesting the creation of the ptrace relationship: current_cred() can't change under us, and current is the proper subject for access control. This change is theoretically userspace-visible, but I am not aware of any code that it will actually break. Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-04Merge tag 'trace-v5.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes three fixes: - Fix a deadlock from a previous fix to keep module loading and function tracing text modifications from stepping on each other (this has a few patches to help document the issue in comments) - Fix a crash when the snapshot buffer gets out of sync with the main ring buffer - Fix a memory leak when reading the memory logs" * tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() tracing/snapshot: Resize spare buffer if size changed tracing: Fix memory leak in tracing_err_log_open() ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare() ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
2019-07-03Merge branch 'timers/vdso' into timers/coreThomas Gleixner
so the hyper-v clocksource update can be applied.
2019-07-03genirq: Add optional hardware synchronization for shutdownThomas Gleixner
free_irq() ensures that no hardware interrupt handler is executing on a different CPU before actually releasing resources and deactivating the interrupt completely in a domain hierarchy. But that does not catch the case where the interrupt is on flight at the hardware level but not yet serviced by the target CPU. That creates an interesing race condition: CPU 0 CPU 1 IRQ CHIP interrupt is raised sent to CPU1 Unable to handle immediately (interrupts off, deep idle delay) mask() ... free() shutdown() synchronize_irq() release_resources() do_IRQ() -> resources are not available That might be harmless and just trigger a spurious interrupt warning, but some interrupt chips might get into a wedged state. Utilize the existing irq_get_irqchip_state() callback for the synchronization in free_irq(). synchronize_hardirq() is not using this mechanism as it might actually deadlock unter certain conditions, e.g. when called with interrupts disabled and the target CPU is the one on which the synchronization is invoked. synchronize_irq() uses it because that function cannot be called from non preemtible contexts as it might sleep. No functional change intended and according to Marc the existing GIC implementations where the driver supports the callback should be able to cope with that core change. Famous last words. Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode") Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
2019-07-03genirq: Fix misleading synchronize_irq() documentationThomas Gleixner
The function might sleep, so it cannot be called from interrupt context. Not even with care. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20190628111440.189241552@linutronix.de
2019-07-03genirq: Delay deactivation in free_irq()Thomas Gleixner
When interrupts are shutdown, they are immediately deactivated in the irqdomain hierarchy. While this looks obviously correct there is a subtle issue: There might be an interrupt in flight when free_irq() is invoking the shutdown. This is properly handled at the irq descriptor / primary handler level, but the deactivation might completely disable resources which are required to acknowledge the interrupt. Split the shutdown code and deactivate the interrupt after synchronization in free_irq(). Fixup all other usage sites where this is not an issue to invoke the combined shutdown_and_deactivate() function instead. This still might be an issue if the interrupt in flight servicing is delayed on a remote CPU beyond the invocation of synchronize_irq(), but that cannot be handled at that level and needs to be handled in the synchronize_irq() context. Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains") Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de
2019-07-01fork: return proper negative error codeChristian Brauner
Make sure to return a proper negative error code from copy_process() when anon_inode_getfile() fails with CLONE_PIDFD. Otherwise _do_fork() will not detect an error and get_task_pid() will operator on a nonsensical pointer: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c R13: 00007ffc15fbb0ff R14: 00007ff07e47e9c0 R15: 0000000000000000 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7990 Comm: syz-executor290 Not tainted 5.2.0-rc6+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] RIP: 0010:get_task_pid+0xe1/0x210 kernel/pid.c:372 Code: 89 ff e8 62 27 5f 00 49 8b 07 44 89 f1 4c 8d bc c8 90 01 00 00 eb 0c e8 0d fe 25 00 49 81 c7 38 05 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74 08 4c 89 ff e8 31 27 5f 00 4d 8b 37 e8 f9 47 12 00 RSP: 0018:ffff88808a4a7d78 EFLAGS: 00010203 RAX: 00000000000000a7 RBX: dffffc0000000000 RCX: ffff888088180600 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88808a4a7d90 R08: ffffffff814fb3a8 R09: ffffed1015d66bf8 R10: ffffed1015d66bf8 R11: 1ffff11015d66bf7 R12: 0000000000041ffc R13: 1ffff11011494fbc R14: 0000000000000000 R15: 000000000000053d FS: 00007ff07e47e700(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b5100 CR3: 0000000094df2000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _do_fork+0x1b9/0x5f0 kernel/fork.c:2360 __do_sys_clone kernel/fork.c:2454 [inline] __se_sys_clone kernel/fork.c:2448 [inline] __x64_sys_clone+0xc1/0xd0 kernel/fork.c:2448 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: https://lore.kernel.org/lkml/000000000000e0dc0d058c9e7142@google.com Reported-and-tested-by: syzbot+002e636502bc4b64eb5c@syzkaller.appspotmail.com Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") Cc: Jann Horn <jannh@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-30Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP fixes from Thomas Gleixner: "Two small changes for the cpu hotplug code: - Prevent out of bounds access which actually might crash the machine caused by a missing bounds check in the fail injection code - Warn about unsupported migitation mode command line arguments to make people aware that they typoed the paramater. Not necessarily a fix but quite some people tripped over that" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Fix out-of-bounds read when setting fail state cpu/speculation: Warn on unsupported mitigations= parameter
2019-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes, most of them related to bugs perf fuzzing found in the x86 code" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/regs: Use PERF_REG_EXTENDED_MASK perf/x86: Remove pmu->pebs_no_xmm_regs perf/x86: Clean up PEBS_XMM_REGS perf/x86/regs: Check reserved bits perf/x86: Disable extended registers for non-supported PMUs perf/ioctl: Add check for the sample_period value perf/core: Fix perf_sample_regs_user() mm check
2019-06-29Merge tag 'pm-5.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Avoid skipping bus-level PCI power management during system resume for PCIe ports left in D0 during the preceding suspend transition on platforms where the power states of those ports can change out of the PCI layer's control" * tag 'pm-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI: PM: Avoid skipping bus-level PM on platforms without ACPI
2019-06-29fork,memcg: alloc_thread_stack_node needs to set tsk->stackAndrea Arcangeli
Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") corrected two instances, but there was a third instance of this bug. Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll execute free_thread_stack() on a dangling pointer. Enterprise kernels are compiled with VMAP_STACK=y so this isn't critical, but custom VMAP_STACK=n builds should have some performance advantage, with the drawback of risking to fail fork because compaction didn't succeed. So as long as VMAP_STACK=n is a supported option it's worth fixing it upstream. Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29signal: remove the wrong signal_pending() check in restore_user_sigmask()Oleg Nesterov
This is the minimal fix for stable, I'll send cleanups later. Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout. Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers. Eric said: : For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered? Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Eric Wong <e@80x24.org> Tested-by: Eric Wong <e@80x24.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-28tracing/snapshot: Resize spare buffer if size changedEiichi Tsukata
Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28tracing: Fix memory leak in tracing_err_log_open()Takeshi Misawa
When tracing_err_log_open() calls seq_open(), allocated memory is not freed. kmemleak report: unreferenced object 0xffff92c0781d1100 (size 128): comm "tail", pid 15116, jiffies 4295163855 (age 22.704s) hex dump (first 32 bytes): 00 f0 08 e5 c0 92 ff ff 00 10 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000000d0687d5>] kmem_cache_alloc+0x11f/0x1e0 [<000000003e3039a8>] seq_open+0x2f/0x90 [<000000008dd36b7d>] tracing_err_log_open+0x67/0x140 [<000000005a431ae2>] do_dentry_open+0x1df/0x3a0 [<00000000a2910603>] vfs_open+0x2f/0x40 [<0000000038b0a383>] path_openat+0x2e8/0x1690 [<00000000fe025bda>] do_filp_open+0x9b/0x110 [<00000000483a5091>] do_sys_open+0x1ba/0x260 [<00000000c558b5fd>] __x64_sys_openat+0x20/0x30 [<000000006881ec07>] do_syscall_64+0x5a/0x130 [<00000000571c2e94>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling seq_release() in tracing_err_log_fops.release(). Link: http://lkml.kernel.org/r/20190628105640.GA1863@DESKTOP Fixes: 8a062902be725 ("tracing: Add tracing error log") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28ftrace/x86: Remove possible deadlock between register_kprobe() and ↵Petr Mladek
ftrace_run_update_code() The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") causes a possible deadlock between register_kprobe() and ftrace_run_update_code() when ftrace is using stop_machine(). The existing dependency chain (in reverse order) is: -> #1 (text_mutex){+.+.}: validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 __mutex_lock+0x88/0x908 mutex_lock_nested+0x32/0x40 register_kprobe+0x254/0x658 init_kprobes+0x11a/0x168 do_one_initcall+0x70/0x318 kernel_init_freeable+0x456/0x508 kernel_init+0x22/0x150 ret_from_fork+0x30/0x34 kernel_thread_starter+0x0/0xc -> #0 (cpu_hotplug_lock.rw_sem){++++}: check_prev_add+0x90c/0xde0 validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 cpus_read_lock+0x62/0xd0 stop_machine+0x2e/0x60 arch_ftrace_update_code+0x2e/0x40 ftrace_run_update_code+0x40/0xa0 ftrace_startup+0xb2/0x168 register_ftrace_function+0x64/0x88 klp_patch_object+0x1a2/0x290 klp_enable_patch+0x554/0x980 do_one_initcall+0x70/0x318 do_init_module+0x6e/0x250 load_module+0x1782/0x1990 __s390x_sys_finit_module+0xaa/0xf0 system_call+0xd8/0x2d0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); It is similar problem that has been solved by the commit 2d1e38f56622b9b ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved. To be on the safe side, text_mutex must become a low level lock taken after cpu_hotplug_lock.rw_sem. This can't be achieved easily with the current ftrace design. For example, arm calls set_all_modules_text_rw() already in ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c. This functions is called: + outside stop_machine() from ftrace_run_update_code() + without stop_machine() from ftrace_module_enable() Fortunately, the problematic fix is needed only on x86_64. It is the only architecture that calls set_all_modules_text_rw() in ftrace path and supports livepatching at the same time. Therefore it is enough to move text_mutex handling from the generic kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c: ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_post_process() This patch basically reverts the ftrace part of the problematic commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race"). And provides x86_64 specific-fix. Some refactoring of the ftrace code will be needed when livepatching is implemented for arm or nds32. These architectures call set_all_modules_text_rw() and use stop_machine() at the same time. Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") Acked-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> [ As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of ftrace_run_update_code() as it is a void function. ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-28Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull rcu/next + tools/memory-model changes from Paul E. McKenney: - RCU flavor consolidation cleanups and optmizations - Documentation updates - Miscellaneous fixes - SRCU updates - RCU-sync flavor consolidation - Torture-test updates - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-27hrtimer: Use a bullet for the returns bullet listMauro Carvalho Chehab
That gets rid of this warning: ./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent. and displays nicely both at the source code and at the produced documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+samsung@kernel.org
2019-06-27copy_process(): don't use ksys_close() on cleanupsAl Viro
anon_inode_getfd() should be used *ONLY* in situations when we are guaranteed to be past the last failure point (including copying the descriptor number to userland, at that). And ksys_close() should not be used for cleanups at all. anon_inode_getfile() is there for all nontrivial cases like that. Just use that... Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27cpu/hotplug: Fix out-of-bounds read when setting fail stateEiichi Tsukata
Setting invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail can control `struct cpuhp_step *sp` address, results in the following global-out-of-bounds read. Reproducer: # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail KASAN report: BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0 Read of size 8 at addr ffffffff89734438 by task bash/1941 CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31 Call Trace: write_cpuhp_fail+0x2cd/0x2e0 dev_attr_store+0x58/0x80 sysfs_kf_write+0x13d/0x1a0 kernfs_fop_write+0x2bc/0x460 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f05e4f4c970 The buggy address belongs to the variable: cpu_hotplug_lock+0x98/0xa0 Memory state around the buggy address: ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa ^ ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Add a sanity check for the value written from user space. Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com
2019-06-26PCI: PM: Avoid skipping bus-level PM on platforms without ACPIRafael J. Wysocki
There are platforms that do not call pm_set_suspend_via_firmware(), so pm_suspend_via_firmware() returns 'false' on them, but the power states of PCI devices (PCIe ports in particular) are changed as a result of powering down core platform components during system-wide suspend. Thus the pm_suspend_via_firmware() checks in pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to- idle") are not sufficient to determine that devices left in D0 during suspend will remain in D0 during resume and so the bus-level power management can be skipped for them. For this reason, introduce a new global suspend flag, PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only and replace the pm_suspend_via_firmware() checks mentioned above with checks against this flag. Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-26cpu/speculation: Warn on unsupported mitigations= parameterGeert Uytterhoeven
Currently, if the user specifies an unsupported mitigation strategy on the kernel command line, it will be ignored silently. The code will fall back to the default strategy, possibly leaving the system more vulnerable than expected. This may happen due to e.g. a simple typo, or, for a stable kernel release, because not all mitigation strategies have been backported. Inform the user by printing a message. Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.org
2019-06-25locking/lockdep: increase size of counters for lockdep statisticsKobe Wu
When system has been running for a long time, signed integer counters are not enough for some lockdep statistics. Using unsigned long counters can satisfy the requirement. Besides, most of lockdep statistics are unsigned. It is better to use unsigned int instead of int. Remove unused variables. - max_recursion_depth - nr_cyclic_check_recursions - nr_find_usage_forwards_recursions - nr_find_usage_backwards_recursions Signed-off-by: Kobe Wu <kobe-cp.wu@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <linux-mediatek@lists.infradead.org> Cc: <wsd_upstream@mediatek.com> Cc: Eason Lin <eason-yh.lin@mediatek.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/1561365348-16050-1-git-send-email-kobe-cp.wu@mediatek.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-25locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && ↵Arnd Bergmann
CONFIG_PROVE_LOCKING The last cleanup patch triggered another issue, as now another function should be moved into the same section: kernel/locking/lockdep.c:3580:12: error: 'mark_lock' defined but not used [-Werror=unused-function] static int mark_lock(struct task_struct *curr, struct held_lock *this, Move mark_lock() into the same #ifdef section as its only caller, and remove the now-unused mark_lock_irq() stub helper. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yuyang Du <duyuyang@gmail.com> Fixes: 0d2cc3b34532 ("locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING") Link: https://lkml.kernel.org/r/20190617124718.1232976-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Disable extended registers for non-supported PMUsKan Liang
The perf fuzzer caused Skylake machine to crash: [ 9680.085831] Call Trace: [ 9680.088301] <IRQ> [ 9680.090363] perf_output_sample_regs+0x43/0xa0 [ 9680.094928] perf_output_sample+0x3aa/0x7a0 [ 9680.099181] perf_event_output_forward+0x53/0x80 [ 9680.103917] __perf_event_overflow+0x52/0xf0 [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0 [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150 [ 9680.117475] ? check_preempt_wakeup+0x181/0x230 [ 9680.122091] ? check_preempt_curr+0x62/0x90 [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140 [ 9680.130355] ? try_to_wake_up+0x54/0x460 [ 9680.134366] ? reweight_entity+0x15b/0x1a0 [ 9680.138559] ? __queue_work+0x103/0x3f0 [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270 [ 9680.147194] ? timerqueue_del+0x1e/0x40 [ 9680.151092] ? __remove_hrtimer+0x35/0x70 [ 9680.155191] __hrtimer_run_queues+0x100/0x280 [ 9680.159658] hrtimer_interrupt+0x100/0x220 [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140 [ 9680.168555] apic_timer_interrupt+0xf/0x20 [ 9680.172756] </IRQ> The XMM registers can only be collected by PEBS hardware events on the platforms with PEBS baseline support, e.g. Icelake, not software/probe events. Add capabilities flag PERF_PMU_CAP_EXTENDED_REGS to indicate the PMU which support extended registers. For X86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code define the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/ioctl: Add check for the sample_period valueRavi Bangoria
perf_event_open() limits the sample_period to 63 bits. See: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Make ioctl() consistent with it. Also on PowerPC, negative sample_period could cause a recursive PMIs leading to a hang (reported when running perf-fuzzer). Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: maddy@linux.vnet.ibm.com Cc: mpe@ellerman.id.au Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bangoria@linux.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24fork: don't check parent_tidptr with CLONE_PIDFDDmitry V. Levin
Give userspace a cheap and reliable way to tell whether CLONE_PIDFD is supported by the kernel or not. The easiest way is to pass an invalid file descriptor value in parent_tidptr, perform the syscall and verify that parent_tidptr has been changed to a valid file descriptor value. CLONE_PIDFD uses parent_tidptr to return pidfds. CLONE_PARENT_SETTID will use parent_tidptr to return the tid of the parent. The two flags cannot be used together. Old kernels that only support CLONE_PARENT_SETTID will not verify the value pointed to by parent_tidptr. This behavior is unchanged even with the introduction of CLONE_PIDFD. However, if CLONE_PIDFD is specified the kernel will currently check the value pointed to by parent_tidptr before placing the pidfd in the memory pointed to. EINVAL will be returned if the value in parent_tidptr is not 0. If CLONE_PIDFD is supported and fd 0 is closed, then the returned pidfd can and likely will be 0 and parent_tidptr will be unchanged. This means userspace must either check CLONE_PIDFD support beforehand or check that fd 0 is not closed when invoking CLONE_PIDFD. The check for pidfd == 0 was introduced during the v5.2 merge window by commit b3e583825266 ("clone: add CLONE_PIDFD") to ensure that CLONE_PIDFD could be potentially extended by passing in flags through the return argument. However, that extension would look horrible, and with the upcoming introduction of the clone3 syscall in v5.3 there is no need to extend legacy clone syscall this way. (Even if it would need to be extended, CLONE_DETACHED can be reused with CLONE_PIDFD.) So remove the pidfd == 0 check. Userspace that needs to be portable to kernels without CLONE_PIDFD support can then be advised to initialize pidfd to -1 and check the pidfd value returned by CLONE_PIDFD. Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-23softirq: Use __this_cpu_write() in takeover_tasklets()Muchun Song
The code is executed with interrupts disabled, so it's safe to use __this_cpu_write(). [ tglx: Massaged changelog ] Signed-off-by: Muchun Song <smuchun@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: joel@joelfernandes.org Cc: rostedt@goodmis.org Cc: frederic@kernel.org Cc: paulmck@linux.vnet.ibm.com Cc: alexander.levin@verizon.com Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190618143305.2038-1-smuchun@gmail.com
2019-06-23smp: Remove smp_call_function() and on_each_cpu() return valuesNadav Amit
The return value is fixed. Remove it and amend the callers. [ tglx: Fixup arm/bL_switcher and powerpc/rtas ] Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20190613064813.8102-2-namit@vmware.com
2019-06-23smp: Do not mark call_function_data as sharedNadav Amit
cfd_data is marked as shared, but although it hold pointers to shared data structures, it is private per core. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190613064813.8102-8-namit@vmware.com
2019-06-23timer_list: Guard procfs specific codeNathan Huckleberry
With CONFIG_PROC_FS=n the following warning is emitted: kernel/time/timer_list.c:361:36: warning: unused variable 'timer_list_sops' [-Wunused-const-variable] static const struct seq_operations timer_list_sops = { Add #ifdef guard around procfs specific code. Signed-off-by: Nathan Huckleberry <nhuck@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/534 Link: https://lkml.kernel.org/r/20190614181604.112297-1-nhuck@google.com
2019-06-22timekeeping: Provide a generic update_vsyscall() implementationVincenzo Frascino
The new generic VDSO library allows to unify the update_vsyscall[_tz]() implementations. Provide a generic implementation based on the x86 code and the bindings which need to be implemented in architecture specific code. [ tglx: Moved it into kernel/time where it belongs. Removed the pointless line breaks in the stub functions. Massaged changelog ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shijith Thotton <sthotton@marvell.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Link: https://lkml.kernel.org/r/20190621095252.32307-4-vincenzo.frascino@arm.com
2019-06-22posix-timers: Use spin_lock_irq() in itimer_delete()Sebastian Andrzej Siewior
itimer_delete() uses spin_lock_irqsave() to obtain a `flags' variable which can then be passed to unlock_timer(). It uses already spin_lock locking for the structure instead of lock_timer() because it has a timer which can not be removed by others at this point. The cleanup is always performed with enabled interrupts. Use spin_lock_irq() / spin_unlock_irq() so the `flags' variable can be removed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190621143643.25649-3-bigeasy@linutronix.de
2019-06-22posix-timers: Remove "it_signal = NULL" assignment in itimer_delete()Sebastian Andrzej Siewior
itimer_delete() is invoked during do_exit(). At this point it is the last thread in the group dying and doing the clean up. Since it is the last thread in the group, there can not be any other task attempting to lock the itimer which means the NULL assignment (which avoids lookups in __lock_timer()) is not required. The assignment and comment was copied in commit 0e568881178ff ("[PATCH] fix posix-timers to have proper per-process scope") from sys_timer_delete() which was/is the syscall interface and requires the assignment. Remove the superfluous ->it_signal = NULL assignment. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190621143643.25649-2-bigeasy@linutronix.de
2019-06-22timekeeping: Use proper clock specifier names in functionsJason A. Donenfeld
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
2019-06-22timekeeping: Use proper ktime_add when adding nsecs in coarse offsetJason A. Donenfeld
While this doesn't actually amount to a real difference, since the macro evaluates to the same thing, every place else operates on ktime_t using these functions, so let's not break the pattern. Fixes: e3ff9c3678b4 ("timekeeping: Repair ktime_get_coarse*() granularity") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-1-Jason@zx2c4.com
2019-06-22Merge branch 'linus' into timers/coreThomas Gleixner
Pick up upstream fixes for pending changes.
2019-06-22ntp: Limit TAI-UTC offsetMiroslav Lichvar
Don't allow the TAI-UTC offset of the system clock to be set by adjtimex() to a value larger than 100000 seconds. This prevents an overflow in the conversion to int, prevents the CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock, and it is still large enough to allow leap seconds to be inserted at the maximum rate currently supported by the kernel (once per day) for the next ~270 years, however unlikely it is that someone can survive a catastrophic event which slowed down the rotation of the Earth so much. Reported-by: Weikang shi <swkhack@gmail.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190618154713.20929-1-mlichvar@redhat.com
2019-06-21Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-21arm64: Fix interrupt tracing in the presence of NMIsJulien Thierry
In the presence of any form of instrumentation, nmi_enter() should be done before calling any traceable code and any instrumentation code. Currently, nmi_enter() is done in handle_domain_nmi(), which is much too late as instrumentation code might get called before. Move the nmi_enter/exit() calls to the arch IRQ vector handler. On arm64, it is not possible to know if the IRQ vector handler was called because of an NMI before acknowledging the interrupt. However, It is possible to know whether normal interrupts could be taken in the interrupted context (i.e. if taking an NMI in that context could introduce a potential race condition). When interrupting a context with IRQs disabled, call nmi_enter() as soon as possible. In contexts with IRQs enabled, defer this to the interrupt controller, which is in a better position to know if an interrupt taken is an NMI. Fixes: bc3c03ccb464 ("arm64: Enable the support of pseudo-NMIs") Cc: <stable@vger.kernel.org> # 5.1.x- Cc: Will Deacon <will.deacon@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>