summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-06-28ftrace/x86: Remove possible deadlock between register_kprobe() and ↵Petr Mladek
ftrace_run_update_code() The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") causes a possible deadlock between register_kprobe() and ftrace_run_update_code() when ftrace is using stop_machine(). The existing dependency chain (in reverse order) is: -> #1 (text_mutex){+.+.}: validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 __mutex_lock+0x88/0x908 mutex_lock_nested+0x32/0x40 register_kprobe+0x254/0x658 init_kprobes+0x11a/0x168 do_one_initcall+0x70/0x318 kernel_init_freeable+0x456/0x508 kernel_init+0x22/0x150 ret_from_fork+0x30/0x34 kernel_thread_starter+0x0/0xc -> #0 (cpu_hotplug_lock.rw_sem){++++}: check_prev_add+0x90c/0xde0 validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 cpus_read_lock+0x62/0xd0 stop_machine+0x2e/0x60 arch_ftrace_update_code+0x2e/0x40 ftrace_run_update_code+0x40/0xa0 ftrace_startup+0xb2/0x168 register_ftrace_function+0x64/0x88 klp_patch_object+0x1a2/0x290 klp_enable_patch+0x554/0x980 do_one_initcall+0x70/0x318 do_init_module+0x6e/0x250 load_module+0x1782/0x1990 __s390x_sys_finit_module+0xaa/0xf0 system_call+0xd8/0x2d0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); It is similar problem that has been solved by the commit 2d1e38f56622b9b ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved. To be on the safe side, text_mutex must become a low level lock taken after cpu_hotplug_lock.rw_sem. This can't be achieved easily with the current ftrace design. For example, arm calls set_all_modules_text_rw() already in ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c. This functions is called: + outside stop_machine() from ftrace_run_update_code() + without stop_machine() from ftrace_module_enable() Fortunately, the problematic fix is needed only on x86_64. It is the only architecture that calls set_all_modules_text_rw() in ftrace path and supports livepatching at the same time. Therefore it is enough to move text_mutex handling from the generic kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c: ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_post_process() This patch basically reverts the ftrace part of the problematic commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race"). And provides x86_64 specific-fix. Some refactoring of the ftrace code will be needed when livepatching is implemented for arm or nds32. These architectures call set_all_modules_text_rw() and use stop_machine() at the same time. Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") Acked-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> [ As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of ftrace_run_update_code() as it is a void function. ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-19Merge tag 'kbuild-v5.2-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - remove unneeded use of cc-option, cc-disable-warning, cc-ldoption - exclude tracked files from .gitignore - re-enable -Wint-in-bool-context warning - refactor samples/Makefile - stop building immediately if syncconfig fails - do not sprinkle error messages when $(CC) does not exist - move arch/alpha/defconfig to the configs subdirectory - remove crappy header search path manipulation - add comment lines to .config to clarify the end of menu blocks - check uniqueness of module names (adding new warnings intentionally) * tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits) kconfig: use 'else ifneq' for Makefile to improve readability kbuild: check uniqueness of module names kconfig: Terminate menu blocks with a comment in the generated config kbuild: add LICENSES to KBUILD_ALLDIRS kbuild: remove 'addtree' and 'flags' magic for header search paths treewide: prefix header search paths with $(srctree)/ media: prefix header search paths with $(srctree)/ media: remove unneeded header search paths alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig kbuild: terminate Kconfig when $(CC) or $(LD) is missing kbuild: turn auto.conf.cmd into a mandatory include file .gitignore: exclude .get_maintainer.ignore and .gitattributes kbuild: add all Clang-specific flags unconditionally kbuild: Don't try to add '-fcatch-undefined-behavior' flag kbuild: add some extra warning flags unconditionally kbuild: add -Wvla flag unconditionally arch: remove dangling asm-generic wrappers samples: guard sub-directories with CONFIG options kbuild: re-enable int-in-bool-context warning MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux* ...
2019-05-19Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "This fixes a particularly thorny munmap() bug with MPX, plus fixes a host build environment assumption in objtool" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Allow AR to be overridden with HOSTAR x86/mpx, mm/core: Fix recursive munmap() corruption
2019-05-18treewide: prefix header search paths with $(srctree)/Masahiro Yamada
Currently, the Kbuild core manipulates header search paths in a crazy way [1]. To fix this mess, I want all Makefiles to add explicit $(srctree)/ to the search paths in the srctree. Some Makefiles are already written in that way, but not all. The goal of this work is to make the notation consistent, and finally get rid of the gross hacks. Having whitespaces after -I does not matter since commit 48f6e3cf5bc6 ("kbuild: do not drop -I without parameter"). [1]: https://patchwork.kernel.org/patch/9632347/ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - support for SVE and Pointer Authentication in guests - PMU improvements POWER: - support for direct access to the POWER9 XIVE interrupt controller - memory and performance optimizations x86: - support for accessing memory not backed by struct page - fixes and refactoring Generic: - dirty page tracking improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits) kvm: fix compilation on aarch64 Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" kvm: x86: Fix L1TF mitigation for shadow MMU KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing" KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete tests: kvm: Add tests for KVM_SET_NESTED_STATE KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID tests: kvm: Add tests to .gitignore KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one KVM: Fix the bitmap range to copy during clear dirty KVM: arm64: Fix ptrauth ID register masking logic KVM: x86: use direct accessors for RIP and RSP KVM: VMX: Use accessors for GPRs outside of dedicated caching logic KVM: x86: Omit caching logic for always-available GPRs kvm, x86: Properly check whether a pfn is an MMIO or not ...
2019-05-17Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull more vfs mount updates from Al Viro: "Propagation of new syscalls to other architectures + cosmetic change from Christian (fscontext didn't follow the convention for anon inode names)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: Wire up the mount API syscalls on non-x86 arches [ver #2] uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2] uapi, fsopen: use square brackets around "fscontext" [ver #2]
2019-05-16Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes and updates: - a handful of MDS documentation/comment updates - a cleanup related to hweight interfaces - a SEV guest fix for large pages - a kprobes LTO fix - and a final cleanup commit for vDSO HPET support removal" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mds: Improve CPU buffer clear documentation x86/speculation/mds: Revert CPU buffer clear on double fault exit x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHT x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large page x86/kprobes: Make trampoline_handler() global and visible x86/vdso: Remove hpet_page from vDSO
2019-05-16Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "An x86 PMU constraint fix, an interface fix, and a Sparse fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Allow PEBS multi-entry in watermark mode perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking perf/x86/amd/iommu: Make the 'amd_iommu_attr_groups' symbol static
2019-05-16uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]David Howells
Fix the syscall numbering of the mount API syscalls so that the numbers match between i386 and x86_64 and that they're in the common numbering scheme space. Fixes: a07b20004793 ("vfs: syscall: Add open_tree(2) to reference or clone a mount") Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around") Fixes: 24dcb3d90a1f ("vfs: syscall: Add fsopen() to prepare for superblock creation") Fixes: ecdab150fddb ("vfs: syscall: Add fsconfig() for configuring and managing a context") Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock") Fixes: cf3cba4a429b ("vfs: syscall: Add fspick() to select a superblock for reconfiguration") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-16x86/speculation/mds: Revert CPU buffer clear on double fault exitAndy Lutomirski
The double fault ESPFIX path doesn't return to user mode at all -- it returns back to the kernel by simulating a #GP fault. prepare_exit_to_usermode() will run on the way out of general_protection before running user code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jon Masters <jcm@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-16Merge branch 'linus' into x86/urgent, to pick up dependent changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-15Merge tag 'for-v5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "Core: - Add over-current health state - Add standard, adaptive and custom charge types - Add new properties for start/end charge threshold New Drivers / Hardware: - UCS1002 Programmable USB Port Power Controller - Ingenic JZ47xx Battery Fuel Gauge - AXP20x USB Power: Add AXP813 support - AT91 poweroff: Add SAM9X60 support - OLPC battery: Add XO-1.5 and XO-1.75 support Misc Changes: - syscon-reboot: support mask property - AXP288 fuel gauge: Blacklist ACEPC T8/T11. Looks like some vendor thought it's a good idea to build a desktop system with a fuel gauge, that slowly "discharges"... - cpcap-battery: Fix calculation errors - misc fixes" * tag 'for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (54 commits) power: supply: olpc_battery: force the le/be casts power: supply: ucs1002: Fix build error without CONFIG_REGULATOR power: supply: ucs1002: Fix wrong return value checking power: supply: Add driver for Microchip UCS1002 dt-bindings: power: supply: Add bindings for Microchip UCS1002 power: supply: core: Add POWER_SUPPLY_HEALTH_OVERCURRENT constant power: supply: core: fix clang -Wunsequenced power: supply: core: Add missing documentation for CHARGE_CONTROL_* properties power: supply: core: Add CHARGE_CONTROL_{START_THRESHOLD,END_THRESHOLD} properties power: supply: core: Add Standard, Adaptive, and Custom charge types power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist power: supply: bq27xxx_battery: Notify also about status changes power: supply: olpc_battery: Have the framework register sysfs files for us power: supply: olpc_battery: Add OLPC XO 1.75 support power: supply: olpc_battery: Avoid using platform_info power: supply: olpc_battery: Use devm_power_supply_register() power: supply: olpc_battery: Move priv data to a struct power: supply: olpc_battery: Use DT to get battery version x86/platform/olpc: Use a correct version when making up a battery node x86/platform/olpc: Trivial code move in DT fixup ...
2019-05-15Merge tag 'for-linus-5.2b-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some minor cleanups - two small corrections for Xen on ARM - two fixes for Xen PVH guest support - a patch for a new command line option to tune virtual timer handling * tag 'for-linus-5.2b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/arm: Use p2m entry with lock protection xen/arm: Free p2m entry if fail to add it to RB tree xen/pvh: correctly setup the PV EFI interface for dom0 xen/pvh: set xen_domain_type to HVM in xen_pvh_init xenbus: drop useless LIST_HEAD in xenbus_write_watch() and xenbus_file_write() xen-netfront: mark expected switch fall-through xen: xen-pciback: fix warning Using plain integer as NULL pointer x86/xen: Add "xen_timer_slop" command line option
2019-05-15Merge tag 'trace-v5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major changes in this tracing update includes: - Removal of non-DYNAMIC_FTRACE from 32bit x86 - Removal of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups" * tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) x86: Hide the int3_emulate_call/jmp functions from UML livepatch: Remove klp_check_compiler_support() ftrace/x86: Remove mcount support ftrace/x86_32: Remove support for non DYNAMIC_FTRACE tracing: Simplify "if" macro code tracing: Fix documentation about disabling options using trace_options tracing: Replace kzalloc with kcalloc tracing: Fix partial reading of trace event's id file tracing: Allow RCU to run between postponed startup tests tracing: Fix white space issues in parse_pred() function tracing: Eliminate const char[] auto variables ring-buffer: Fix mispelling of Calculate tracing: probeevent: Fix to make the type of $comm string tracing: probeevent: Do not accumulate on ret variable tracing: uprobes: Re-enable $comm support for uprobe events ftrace/x86_64: Emulate call function while updating in breakpoint handler x86_64: Allow breakpoints to emulate call instructions x86_64: Add gap to int3 to allow for call emulation tracing: kdb: Allow ftdump to skip all but the last few entries tracing: Add trace_total_entries() / trace_total_entries_cpu() ...
2019-05-15Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"Sean Christopherson
The RDPMC-exiting control is dependent on the existence of the RDPMC instruction itself, i.e. is not tied to the "Architectural Performance Monitoring" feature. For all intents and purposes, the control exists on all CPUs with VMX support since RDPMC also exists on all VCPUs with VMX supported. Per Intel's SDM: The RDPMC instruction was introduced into the IA-32 Architecture in the Pentium Pro processor and the Pentium processor with MMX technology. The earlier Pentium processors have performance-monitoring counters, but they must be read with the RDMSR instruction. Because RDPMC-exiting always exists, KVM requires the control and refuses to load if it's not available. As a result, hiding the PMU from a guest breaks nested virtualization if the guest attemts to use KVM. While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE checks, but before the counter referenced in ECX is checked for validity. In other words, the original KVM behavior of injecting a #GP was correct, and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP when the unit test guest (L3 in this case) executes RDPMC without RDPMC-exiting set in the unit test host (L2). This reverts commit e51bfdb68725dc052d16241ace40ea3140f938aa. Fixes: e51bfdb68725 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU") Reported-by: David Hill <hilld@binarystorm.net> Cc: Saar Amar <saaramar@microsoft.com> Cc: Mihai Carabas <mihai.carabas@oracle.com> Cc: Jim Mattson <jmattson@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-15kvm: x86: Fix L1TF mitigation for shadow MMUKai Huang
Currently KVM sets 5 most significant bits of physical address bits reported by CPUID (boot_cpu_data.x86_phys_bits) for nonpresent or reserved bits SPTE to mitigate L1TF attack from guest when using shadow MMU. However for some particular Intel CPUs the physical address bits of internal cache is greater than physical address bits reported by CPUID. Use the kernel's existing boot_cpu_data.x86_cache_bits to determine the five most significant bits. Doing so improves KVM's L1TF mitigation in the unlikely scenario that system RAM overlaps the high order bits of the "real" physical address space as reported by CPUID. This aligns with the kernel's warnings regarding L1TF mitigation, e.g. in the above scenario the kernel won't warn the user about lack of L1TF mitigation if x86_cache_bits is greater than x86_phys_bits. Also initialize shadow_nonpresent_or_rsvd_mask explicitly to make it consistent with other 'shadow_{xxx}_mask', and opportunistically add a WARN once if KVM's L1TF mitigation cannot be applied on a system that is marked as being susceptible to L1TF. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-15KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possibleSean Christopherson
If L1 is using an MSR bitmap, unconditionally merge the MSR bitmaps from L0 and L1 for MSR_{KERNEL,}_{FS,GS}_BASE. KVM unconditionally exposes MSRs L1. If KVM is also running in L1 then it's highly likely L1 is also exposing the MSRs to L2, i.e. KVM doesn't need to intercept L2 accesses. Based on code from Jintack Lim. Cc: Jintack Lim <jintack@xxxxxxxxxxxxxxx> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-15Merge tag 'pm-5.2-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix a recent regression causing kernels built with CONFIG_PM unset to crash on systems that support the Performance and Energy Bias Hint (EPB), clean up the cpufreq core and some users of transition notifiers and introduce a new power domain flag into the generic power domains framework (genpd). Specifics: - Fix recent regression causing kernels built with CONFIG_PM unset to crash on systems that support the Performance and Energy Bias Hint (EPB) by avoiding to compile the EPB-related code depending on CONFIG_PM when it is unset (Rafael Wysocki). - Clean up the transition notifier invocation code in the cpufreq core and change some users of cpufreq transition notifiers accordingly (Viresh Kumar). - Change MAINTAINERS to cover the schedutil governor as part of cpufreq (Viresh Kumar). - Simplify cpufreq_init_policy() to avoid redundant computations (Yue Hu). - Add explanatory comment to the cpufreq core (Rafael Wysocki). - Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic power domains (genpd) framework along with the first user of it (Leonard Crestez)" * tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619 PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag cpufreq: Update MAINTAINERS to include schedutil governor cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy() cpufreq: Explain the kobject_put() in cpufreq_policy_alloc() cpufreq: Call transition notifier only once for each policy x86: intel_epb: Take CONFIG_PM into account
2019-05-15Merge branches 'pm-cpufreq' and 'pm-domains'Rafael J. Wysocki
* pm-cpufreq: cpufreq: Update MAINTAINERS to include schedutil governor cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy() cpufreq: Explain the kobject_put() in cpufreq_policy_alloc() cpufreq: Call transition notifier only once for each policy * pm-domains: soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619 PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag
2019-05-14treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>Masahiro Yamada
Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h>"), <asm/sizes.h> and <asm-generic/sizes.h> are just wrappers of <linux/sizes.h>. This commit replaces all <asm/sizes.h> and <asm-generic/sizes.h> to prepare for the removal. Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14compiler: allow all arches to enable CONFIG_OPTIMIZE_INLININGMasahiro Yamada
Commit 60a3cdd06394 ("x86: add optimized inlining") introduced CONFIG_OPTIMIZE_INLINING, but it has been available only for x86. The idea is obviously arch-agnostic. This commit moves the config entry from arch/x86/Kconfig.debug to lib/Kconfig.debug so that all architectures can benefit from it. This can make a huge difference in kernel image size especially when CONFIG_OPTIMIZE_FOR_SIZE is enabled. For example, I got 3.5% smaller arm64 kernel for v5.1-rc1. dec file 18983424 arch/arm64/boot/Image.before 18321920 arch/arm64/boot/Image.after This also slightly improves the "Kernel hacking" Kconfig menu as e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen') suggested; this config option would be a good fit in the "compiler option" menu. Link: http://lkml.kernel.org/r/20190423034959.13525-12-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Brezillon <bbrezillon@kernel.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Stefan Agner <stefan@agner.ch> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headersMasahiro Yamada
The "WITH Linux-syscall-note" should be added to headers exported to the user-space. Some kernel-space headers have "WITH Linux-syscall-note", which seems a mistake. [1] arch/x86/include/asm/hyperv-tlfs.h Commit 5a4858032217 ("x86/hyper-v: move hyperv.h out of uapi") moved this file out of uapi, but missed to update the SPDX License tag. [2] include/asm-generic/shmparam.h Commit 76ce2a80a28e ("Rename include/{uapi => }/asm-generic/shmparam.h really") moved this file out of uapi, but missed to update the SPDX License tag. [3] include/linux/qcom-geni-se.h Commit eddac5af0654 ("soc: qcom: Add GENI based QUP Wrapper driver") added this file, but I do not see a good reason why its license tag must include "WITH Linux-syscall-note". Link: http://lkml.kernel.org/r/1554196104-3522-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14Merge tag 'pci-v5.2-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration changes: - Add _HPX Type 3 settings support, which gives firmware more influence over device configuration (Alexandru Gagniuc) - Support fixed bus numbers from bridge Enhanced Allocation capabilities (Subbaraya Sundeep) - Add "external-facing" DT property to identify cases where we require IOMMU protection against untrusted devices (Jean-Philippe Brucker) - Enable PCIe services for host controller drivers that use managed host bridge alloc (Jean-Philippe Brucker) - Log PCIe port service messages with pci_dev, not the pcie_device (Frederick Lawler) - Convert pciehp from pciehp_debug module parameter to generic dynamic debug (Frederick Lawler) Peer-to-peer DMA: - Add whitelist of Root Complexes that support peer-to-peer DMA between Root Ports (Christian König) Native controller drivers: - Add PCI host bridge DMA ranges for bridges that can't DMA everywhere, e.g., iProc (Srinath Mannam) - Add Amazon Annapurna Labs PCIe host controller driver (Jonathan Chocron) - Fix Tegra MSI target allocation so DMA doesn't generate unwanted MSIs (Vidya Sagar) - Fix of_node reference leaks (Wen Yang) - Fix Hyper-V module unload & device removal issues (Dexuan Cui) - Cleanup R-Car driver (Marek Vasut) - Cleanup Keystone driver (Kishon Vijay Abraham I) - Cleanup i.MX6 driver (Andrey Smirnov) Significant bug fixes: - Reset Lenovo ThinkPad P50 GPU so nouveau works after reboot (Lyude Paul) - Fix Switchtec firmware update performance issue (Wesley Sheng) - Work around Pericom switch link retraining erratum (Stefan Mätje)" * tag 'pci-v5.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (141 commits) MAINTAINERS: Add Karthikeyan Mitran and Hou Zhiqiang for Mobiveil PCI PCI: pciehp: Remove pointless MY_NAME definition PCI: pciehp: Remove pointless PCIE_MODULE_NAME definition PCI: pciehp: Remove unused dbg/err/info/warn() wrappers PCI: pciehp: Log messages with pci_dev, not pcie_device PCI: pciehp: Replace pciehp_debug module param with dyndbg PCI: pciehp: Remove pciehp_debug uses PCI/AER: Log messages with pci_dev, not pcie_device PCI/DPC: Log messages with pci_dev, not pcie_device PCI/PME: Replace dev_printk(KERN_DEBUG) with dev_info() PCI/AER: Replace dev_printk(KERN_DEBUG) with dev_info() PCI: Replace dev_printk(KERN_DEBUG) with dev_info(), etc PCI: Replace printk(KERN_INFO) with pr_info(), etc PCI: Use dev_printk() when possible PCI: Cleanup setup-bus.c comments and whitespace PCI: imx6: Allow asynchronous probing PCI: dwc: Save root bus for driver remove hooks PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code PCI: dwc: Free MSI in dw_pcie_host_init() error path PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi() ...
2019-05-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things and hotfixes - ocfs2 - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (139 commits) kernel/memremap.c: remove the unused device_private_entry_fault() export mm: delete find_get_entries_tag mm/huge_memory.c: make __thp_get_unmapped_area static mm/mprotect.c: fix compilation warning because of unused 'mm' variable mm/page-writeback: introduce tracepoint for wait_on_page_writeback() mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags mm/Kconfig: update "Memory Model" help text mm/vmscan.c: don't disable irq again when count pgrefill for memcg mm: memblock: make keeping memblock memory opt-in rather than opt-out hugetlbfs: always use address space in inode for resv_map pointer mm/z3fold.c: support page migration mm/z3fold.c: add structure for buddy handles mm/z3fold.c: improve compression by extending search mm/z3fold.c: introduce helper functions mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig mm/vmscan.c: simplify shrink_inactive_list() fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback xen/privcmd-buf.c: convert to use vm_map_pages_zero() xen/gntdev.c: convert to use vm_map_pages() ...
2019-05-14mm: memblock: make keeping memblock memory opt-in rather than opt-outMike Rapoport
Most architectures do not need the memblock memory after the page allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the arch Kconfig. Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the logic makes it clear which architectures actually use memblock after system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK to the architectures that are still missing that option. Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never failDavid Hildenbrand
All callers of arch_remove_memory() ignore errors. And we should really try to remove any errors from the memory removal path. No more errors are reported from __remove_pages(). BUG() in s390x code in case arch_remove_memory() is triggered. We may implement that properly later. WARN in case powerpc code failed to remove the section mapping, which is better than ignoring the error completely right now. Link: http://lkml.kernel.org/r/20190409100148.24703-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Stefan Agner <stefan@agner.ch> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Rob Herring <robh@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <mike.travis@hpe.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm, memory_hotplug: provide a more generic restrictions for memory hotplugMichal Hocko
arch_add_memory, __add_pages take a want_memblock which controls whether the newly added memory should get the sysfs memblock user API (e.g. ZONE_DEVICE users do not want/need this interface). Some callers even want to control where do we allocate the memmap from by configuring altmap. Add a more generic hotplug context for arch_add_memory and __add_pages. struct mhp_restrictions contains flags which contains additional features to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and altmap for alternative memmap allocator. This patch shouldn't introduce any functional change. [akpm@linux-foundation.org: build fix] Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14hugetlb: allow to free gigantic pages regardless of the configurationAlexandre Ghiti
On systems without CONTIG_ALLOC activated but that support gigantic pages, boottime reserved gigantic pages can not be freed at all. This patch simply enables the possibility to hand back those pages to memory allocator. Link: http://lkml.kernel.org/r/20190327063626.18421-5-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: David S. Miller <davem@davemloft.net> [sparc] Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm: simplify MEMORY_ISOLATION && COMPACTION || CMA into CONTIG_ALLOCAlexandre Ghiti
This condition allows to define alloc_contig_range, so simplify it into a more accurate naming. Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14Merge branch 'x86-mds-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MDS mitigations from Thomas Gleixner: "Microarchitectural Data Sampling (MDS) is a hardware vulnerability which allows unprivileged speculative access to data which is available in various CPU internal buffers. This new set of misfeatures has the following CVEs assigned: CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory MDS attacks target microarchitectural buffers which speculatively forward data under certain conditions. Disclosure gadgets can expose this data via cache side channels. Contrary to other speculation based vulnerabilities the MDS vulnerability does not allow the attacker to control the memory target address. As a consequence the attacks are purely sampling based, but as demonstrated with the TLBleed attack samples can be postprocessed successfully. The mitigation is to flush the microarchitectural buffers on return to user space and before entering a VM. It's bolted on the VERW instruction and requires a microcode update. As some of the attacks exploit data structures shared between hyperthreads, full protection requires to disable hyperthreading. The kernel does not do that by default to avoid breaking unattended updates. The mitigation set comes with documentation for administrators and a deeper technical view" * 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/speculation/mds: Fix documentation typo Documentation: Correct the possible MDS sysfs values x86/mds: Add MDSUM variant to the MDS documentation x86/speculation/mds: Add 'mitigations=' support for MDS x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off x86/speculation/mds: Fix comment x86/speculation/mds: Add SMT warning message x86/speculation: Move arch_smt_update() call to after mitigation decisions x86/speculation/mds: Add mds=full,nosmt cmdline option Documentation: Add MDS vulnerability documentation Documentation: Move L1TF to separate directory x86/speculation/mds: Add mitigation mode VMWERV x86/speculation/mds: Add sysfs reporting for MDS x86/speculation/mds: Add mitigation control for MDS x86/speculation/mds: Conditionally clear CPU buffers on idle entry x86/kvm/vmx: Add MDS protection when L1D Flush is not active x86/speculation/mds: Clear CPU buffers on exit to user x86/speculation/mds: Add mds_clear_cpu_buffers() x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests x86/speculation/mds: Add BUG_MSBDS_ONLY ...
2019-05-14perf/x86/intel: Allow PEBS multi-entry in watermark modeStephane Eranian
This patch fixes a restriction/bug introduced by: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") The original patch prevented using multi-entry PEBS when wakeup_events != 0. However given that wakeup_events is part of a union with wakeup_watermark, it means that in watermark mode, PEBS multi-entry is also disabled which is not the intent. This patch fixes this by checking is watermark mode is enabled. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: vincent.weaver@maine.edu Fixes: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") Link: http://lkml.kernel.org/r/20190514003400.224340-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-13x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHTMasahiro Yamada
Remove an unnecessary arch complication: arch/x86/include/asm/arch_hweight.h uses __sw_hweight{32,64} as alternatives, and they are implemented in arch/x86/lib/hweight.S x86 does not rely on the generic C implementation lib/hweight.c at all, so CONFIG_GENERIC_HWEIGHT should be disabled. __HAVE_ARCH_SW_HWEIGHT is not necessary either. No change in functionality intended. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uros Bizjak <ubizjak@gmail.com> Link: http://lkml.kernel.org/r/1557665521-17570-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-11x86: Hide the int3_emulate_call/jmp functions from UMLSteven Rostedt (VMware)
User Mode Linux does not have access to the ip or sp fields of the pt_regs, and accessing them causes UML to fail to build. Hide the int3_emulate_jmp() and int3_emulate_call() instructions from UML, as it doesn't need them anyway. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-10livepatch: Remove klp_check_compiler_support()Jiri Kosina
The only purpose of klp_check_compiler_support() is to make sure that we are not using ftrace on x86 via mcount (because that's executed only after prologue has already happened, and that's too late for livepatching purposes). Now that mcount is not supported by ftrace any more, there is no need for klp_check_compiler_support() either. Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1905102346100.17054@cbobk.fhfr.pm Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-10ftrace/x86: Remove mcount supportSteven Rostedt (VMware)
There's two methods of enabling function tracing in Linux on x86. One is with just "gcc -pg" and the other is "gcc -pg -mfentry". The former will use calls to a special function "mcount" after the frame is set up in all C functions. The latter will add calls to a special function called "fentry" as the very first instruction of all C functions. At compile time, there is a check to see if gcc supports, -mfentry, and if it does, it will use that, because it is more versatile and less error prone for function tracing. Starting with v4.19, the minimum gcc supported to build the Linux kernel, was raised to version 4.6. That also happens to be the first gcc version to support -mfentry. Since on x86, using gcc versions from 4.6 and beyond will unconditionally enable the -mfentry, it will no longer use mcount as the method for inserting calls into the C functions of the kernel. This means that there is no point in continuing to maintain mcount in x86. Remove support for using mcount. This makes the code less complex, and will also allow it to be simplified in the future. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-10ftrace/x86_32: Remove support for non DYNAMIC_FTRACESteven Rostedt (VMware)
When DYNAMIC_FTRACE is enabled in the kernel, all the functions that can be traced by the function tracer have a "nop" placeholder at the start of the function. When function tracing is enabled, the nop is converted into a call to the tracing infrastructure where the functions get traced. This also allows for specifying specific functions to trace, and a lot of infrastructure is built on top of this. When DYNAMIC_FTRACE is not enabled, all the functions have a call to the ftrace trampoline. A check is made to see if a function pointer is the ftrace_stub or not, and if it is not, it calls the function pointer to trace the code. This adds over 10% overhead to the kernel even when tracing is disabled. When an architecture supports DYNAMIC_FTRACE there really is no reason to use the static tracing. I have kept non DYNAMIC_FTRACE available in x86 so that the generic code for non DYNAMIC_FTRACE can be tested. There is no reason to support non DYNAMIC_FTRACE for both x86_64 and x86_32. As the non DYNAMIC_FTRACE for x86_32 does not even support fentry, and we want to remove mcount completely, there's no reason to keep non DYNAMIC_FTRACE around for x86_32. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-10cpufreq: Call transition notifier only once for each policyViresh Kumar
Currently, the notifiers are called once for each CPU of the policy->cpus cpumask. It would be more optimal if the notifier can be called only once and all the relevant information be provided to it. Out of the 23 drivers that register for the transition notifiers today, only 4 of them do per-cpu updates and the callback for the rest can be called only once for the policy without any impact. This would also avoid multiple function calls to the notifier callbacks and reduce multiple iterations of notifier core's code (which does locking as well). This patch adds pointer to the cpufreq policy to the struct cpufreq_freqs, so the notifier callback has all the information available to it with a single call. The five drivers which perform per-cpu updates are updated to use the cpufreq policy. The freqs->cpu field is redundant now and is removed. Acked-by: David S. Miller <davem@davemloft.net> (sparc) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-10x86: intel_epb: Take CONFIG_PM into accountRafael J. Wysocki
Commit b9c273babce7 ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface") caused kernels built with CONFIG_PM unset to crash on systems supporting the Performance and Energy Bias Hint (EPB), because it attempts to add files to sysfs directories that don't exist on those systems. Prevent that from happening by taking CONFIG_PM into account so that the code depending on it is not compiled at all when it is not set. Fixes: b9c273babce7 ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface") Reported-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2019-05-10perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* maskingStephane Eranian
On Intel Westmere, a cmdline as follows: $ perf record -e cpu/event=0xc4,umask=0x2,name=br_inst_retired.near_call/p .... was failing. Yet the event+ umask support PEBS. It turns out this is due to a bug in the the PEBS event constraint table for westmere. All forms of BR_INST_RETIRED.* support PEBS. Therefore the constraint mask should ignore the umask. The name of the macro INTEL_FLAGS_EVENT_CONSTRAINT() hint that this is the case but it was not. That macros was checking both the event code and event umask. Therefore, it was only matching on 0x00c4. There are code+umask macros, they all have *UEVENT*. This bug fixes the issue by checking only the event code in the mask. Both single and range version are modified. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/20190509214556.123493-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-09Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull intgrity updates from James Morris: "This contains just three patches, the remainder were either included in other pull requests (eg. audit, lockdown) or will be upstreamed via other subsystems (eg. kselftests, Power). Included here is one bug fix, one documentation update, and extending the x86 IMA arch policy rules to coordinate the different kernel module signature verification methods" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: doc/kernel-parameters.txt: Deprecate ima_appraise_tcb x86/ima: add missing include x86/ima: require signed kernel modules
2019-05-09Merge tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull DMA mapping updates from Christoph Hellwig: - remove the already broken support for NULL dev arguments to the DMA API calls - Kconfig tidyups * tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: add a Kconfig symbol to indicate arch_dma_prep_coherent presence dma-mapping: remove an unnecessary NULL check x86/dma: Remove the x86_dma_fallback_dev hack dma-mapping: remove leftover NULL device support arm: use a dummy struct device for ISA DMA use of the DMA API pxa3xx-gcu: pass struct device to dma_mmap_coherent gbefb: switch to managed version of the DMA allocator da8xx-fb: pass struct device to DMA API functions parport_ip32: pass struct device to DMA API functions dma: select GENERIC_ALLOCATOR for DMA_REMAP
2019-05-09x86/mpx, mm/core: Fix recursive munmap() corruptionDave Hansen
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody uses MPX. MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space. But, the recursive do_munmap() that gets called vi arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more. See the "long story" further below for the gory details. To fix this, call arch_unmap() before __do_unmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup. == UML / unicore32 impact == Remove unused 'vma' argument to arch_unmap(). No functional change. I compile tested this on UML but not unicore32. == powerpc impact == powerpc uses arch_unmap() well to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I can not find any likely impact to powerpc, other than the zeroing happening a little earlier. powerpc does not use the 'vma' argument and is unaffected by its removal. I compile-tested a 64-bit powerpc defconfig. == x86 impact == For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection since the bounds table got reset (zeroed). I can't imagine anyone doing this: ptr = mmap(); // use ptr ret = munmap(ptr); if (ret) // oh, there was an error, I'll // keep using ptr. Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_. This passes the original reproducer from Richard Biener as well as the existing mpx selftests/. The long story: munmap() has a couple of pieces: 1. Find the affected VMA(s) 2. Split the start/end one(s) if neceesary 3. Pull the VMAs out of the rbtree 4. Actually zap the memory via unmap_region(), including freeing page tables (or queueing them to be freed). 5. Fix up some of the accounting (like fput()) and actually free the VMA itself. This specific ordering was actually introduced by: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'. Richard Biener reported a test that shows this in dmesg: [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576 What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that has not been properly zapped. But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use. I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree which would force __do_munmap() to re-walk to restart. That spiralled out of control in complexity pretty fast. Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix. This was also reported in the following kernel bugzilla entry: https://bugzilla.kernel.org/show_bug.cgi?id=203123 There are some reports that this commit triggered this bug: dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap") While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits. [ mingo: Minor edits to the changelog and the patch. ] Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-08Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "This has two exciting community drivers for ARM Mali accelerators. Since ARM has never been open source friendly on the GPU side of the house, the community has had to create open source drivers for the Mali GPUs. Lima covers the older t4xx and panfrost the newer 6xx/7xx series. Well done to all involved and hopefully this will help ARM head in the right direction. There is also now the ability if you don't have any of the legacy drivers enabled (pre-KMS) to remove all the pre-KMS support code from the core drm, this saves 10% or so in codesize on my machine. i915 also enable Icelake/Elkhart Lake Gen11 GPUs by default, vboxvideo moves out of staging. There are also some rcar-du patches which crossover with media tree but all should be acked by Mauro. Summary: uapi changes: - Colorspace connector property - fourcc - new YUV formts - timeline sync objects initially merged - expose FB_DAMAGE_CLIPS to atomic userspace new drivers: - vboxvideo: moved out of staging - aspeed: ASPEED SoC BMC chip display support - lima: ARM Mali4xx GPU acceleration driver support - panfrost: ARM Mali6xx/7xx Midgard/Bitfrost acceleration driver support core: - component helper docs - unplugging fixes - devm device init - MIPI/DSI rate control - shmem backed gem objects - connector, display_info, edid_quirks cleanups - dma_buf fence chain support - 64-bit dma-fence seqno comparison fixes - move initial fb config code to core - gem fence array helpers for Lima - ability to remove legacy support code if no drivers requires it (removes 10% of drm.ko size) - lease fixes ttm: - unified DRM_FILE_PAGE_OFFSET handling - Account for kernel allocations in kernel zone only panel: - OSD070T1718-19TS panel support - panel-tpo-td028ttec1 backlight support - Ronbo RB070D30 MIPI/DSI - Feiyang FY07024DI26A30-D MIPI-DSI panel - Rocktech jh057n00900 MIPI-DSI panel i915: - Comet Lake (Gen9) PCI IDs - Updated Icelake PCI IDs - Elkhartlake (Gen11) support - DP MST property addtions - plane and watermark fixes - Icelake port sync and VEBOX disable fixes - struct_mutex usage reduction - Icelake gamma fix - GuC reset fixes - make mmap more asynchronous - sound display power well race fixes - DDI/MIPI-DSI clocks for Icelake - Icelake RPS frequency changing support - Icelake workarounds amdgpu: - Use HMM for userptr - vega20 experimental smu11 support - RAS support for vega20 - BACO support for vega12 + fixes for vega20 - reworked IH interrupt handling - amdkfd RAS support - Freesync improvements - initial timeline sync object support - DC Z ordering fixes - NV12 planes support - colorspace properties for planes= - eDP opts if eDP already initialized nouveau: - misc fixes etnaviv: - misc fixes msm: - GPU zap shader support expansion - robustness ABI addition exynos: - Logging cleanups tegra: - Shared reset fix - CPU cache maintenance fix cirrus: - driver rewritten using simple helpers meson: - G12A support vmwgfx: - Resource dirtying management improvements - Userspace logging improvements virtio: - PRIME fixes rockchip: - rk3066 hdmi support sun4i: - DSI burst mode support vc4: - load tracker to detect underflow v3d: - v3d v4.2 support malidp: - initial Mali D71 support in komeda driver tfp410: - omap related improvement omapdrm: - drm bridge/panel support - drop some omap specific panels rcar-du: - Display writeback support" * tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm: (1507 commits) drm/msm/a6xx: No zap shader is not an error drm/cma-helper: Fix drm_gem_cma_free_object() drm: Fix timestamp docs for variable refresh properties. drm/komeda: Mark the local functions as static drm/komeda: Fixed warning: Function parameter or member not described drm/komeda: Expose bus_width to Komeda-CORE drm/komeda: Add sysfs attribute: core_id and config_id drm: add non-desktop quirk for Valve HMDs drm/panfrost: Show stored feature registers drm/panfrost: Don't scream about deferred probe drm/panfrost: Disable PM on probe failure drm/panfrost: Set DMA masks earlier drm/panfrost: Add sanity checks to submit IOCTL drm/etnaviv: initialize idle mask before querying the HW db drm: introduce a capability flag for syncobj timeline support drm: report consistent errors when checking syncobj capibility drm/nouveau/nouveau: forward error generated while resuming objects tree drm/nouveau/fb/ramgk104: fix spelling mistake "sucessfully" -> "successfully" drm/nouveau/i2c: Disable i2c bus access after ->fini() drm/nouveau: Remove duplicate ACPI_VIDEO_NOTIFY_PROBE definition ...
2019-05-08x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large pageBrijesh Singh
The commit 0a9fe8ca844d ("x86/mm: Validate kernel_physical_mapping_init() PTE population") triggers this warning in SEV guests: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x30d/0x386 Call Trace: kernel_physical_mapping_init+0xce/0x259 early_set_memory_enc_dec+0x10f/0x160 kvm_smp_prepare_boot_cpu+0x71/0x9d start_kernel+0x1c9/0x50b secondary_startup_64+0xa4/0xb0 A SEV guest calls kernel_physical_mapping_init() to clear the encryption mask from an existing mapping. While doing so, it also splits large pages into smaller. To split a page, kernel_physical_mapping_init() allocates a new page and updates the existing entry. The set_{pud,pmd}_safe() helpers trigger a warning when updating an entry with a page in the present state. Add a new kernel_physical_mapping_change() helper which uses the non-safe variants of set_{pmd,pud,p4d}() and {pmd,pud,p4d}_populate() routines when updating the entry. Since kernel_physical_mapping_change() may replace an existing entry with a new entry, the caller is responsible to flush the TLB at the end. Change early_set_memory_enc_dec() to use kernel_physical_mapping_change() when it wants to clear the memory encryption mask from the page table entry. [ bp: - massage commit message. - flesh out comment according to dhansen's request. - align function arguments at opening brace. ] Fixes: 0a9fe8ca844d ("x86/mm: Validate kernel_physical_mapping_init() PTE population") Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190417154102.22613-1-brijesh.singh@amd.com
2019-05-08ftrace/x86_64: Emulate call function while updating in breakpoint handlerPeter Zijlstra
Nicolai Stange discovered[1] that if live kernel patching is enabled, and the function tracer started tracing the same function that was patched, the conversion of the fentry call site during the translation of going from calling the live kernel patch trampoline to the iterator trampoline, would have as slight window where it didn't call anything. As live kernel patching depends on ftrace to always call its code (to prevent the function being traced from being called, as it will redirect it). This small window would allow the old buggy function to be called, and this can cause undesirable results. Nicolai submitted new patches[2] but these were controversial. As this is similar to the static call emulation issues that came up a while ago[3]. But after some debate[4][5] adding a gap in the stack when entering the breakpoint handler allows for pushing the return address onto the stack to easily emulate a call. [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_w@mail.gmail.com [ Live kernel patching is not implemented on x86_32, thus the emulate calls are only for x86_64. ] Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolai Stange <nstange@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Changed to only implement emulated calls for x86_64 ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08x86_64: Allow breakpoints to emulate call instructionsPeter Zijlstra
In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolai Stange <nstange@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08x86_64: Add gap to int3 to allow for call emulationJosh Poimboeuf
To allow an int3 handler to emulate a call instruction, it must be able to push a return address onto the stack. Add a gap to the stack to allow the int3 handler to push the return address and change the return from int3 to jump straight to the emulated called function target. Link: http://lkml.kernel.org/r/20181130183917.hxmti5josgq4clti@treble Link: http://lkml.kernel.org/r/20190502162133.GX2623@hirez.programming.kicks-ass.net [ Note, this is needed to allow Live Kernel Patching to not miss calling a patched function when tracing is enabled. -- Steven Rostedt ] Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks completeAaron Lewis
nested_run_pending=1 implies we have successfully entered guest mode. Move setting from external state in vmx_set_nested_state() until after all other checks are complete. Based on a patch by Aaron Lewis. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-08KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting ↵Aaron Lewis
new state Move call to nested_enable_evmcs until after free_nested() is complete. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>