summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-10-22KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest deliveryNicholas Piggin
This consolidates the HV interrupt delivery logic into one place. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch opNicholas Piggin
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S: Define and use SRR1_MSR_BITSNicholas Piggin
Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPsGreg Kurz
Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that userspace can tell how many interrupt servers it needs. If a VM needs less than the current default of KVM_MAX_VCPUS (2048), we can allocate less VPs in OPAL. Combined with a core stride (VSMT) that matches the number of guest threads per core, this may substantially increases the number of VMs that can run concurrently with an in-kernel XIVE device. Since the legacy XIVE KVM device is exposed to userspace through the XICS KVM API, a new attribute group is added to it for this purpose. While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES in the XICS documentation. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Make VP block size configurableGreg Kurz
The XIVE VP is an internal structure which allow the XIVE interrupt controller to maintain the interrupt context state of vCPUs non dispatched on HW threads. When a guest is started, the XIVE KVM device allocates a block of XIVE VPs in OPAL, enough to accommodate the highest possible vCPU id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048). With a guest's core stride of 8 and a threading mode of 1 (QEMU's default), a VM must run at least 256 vCPUs to actually need such a range of VPs. A POWER9 system has a limited XIVE VP space : 512k and KVM is currently wasting this HW resource with large VP allocations, especially since a typical VM likely runs with a lot less vCPUs. Make the size of the VP block configurable. Add an nr_servers field to the XIVE structure and a function to set it for this purpose. Split VP allocation out of the device create function. Since the VP block isn't used before the first vCPU connects to the XIVE KVM device, allocation is now performed by kvmppc_xive_connect_vcpu(). This gives the opportunity to set nr_servers in between: kvmppc_xive_create() / kvmppc_xive_native_create() . . kvmppc_xive_set_nr_servers() . . kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu() The connect_vcpu() functions check that the vCPU id is below nr_servers and if it is the first vCPU they allocate the VP block. This is protected against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers() with the xive->lock mutex. Also, the block is allocated once for the device lifetime: nr_servers should stay constant otherwise connect_vcpu() could generate a boggus VP id and likely crash OPAL. It is thus forbidden to update nr_servers once the block is allocated. If the VP allocation fail, return ENOSPC which seems more appropriate to report the depletion of system wide HW resource than ENOMEM or ENXIO. A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence only need 256 VPs instead of 2048. If the stride is set to match the number of threads per core, this goes further down to 32. This will be exposed to userspace by a subsequent patch. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helperGreg Kurz
Reduce code duplication by consolidating the checking of vCPU ids and VP ids to a common helper used by both legacy and native XIVE KVM devices. And explain the magic with a comment. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Show VP id in debugfsGreg Kurz
Print out the VP id of each connected vCPU, this allow to see: - the VP block base in which OPAL encodes information that may be useful when debugging - the packed vCPU id which may differ from the raw vCPU id if the latter is >= KVM_MAX_VCPUS (2048) Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocatedGreg Kurz
If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive set forever since the release method isn't called in this case. Any subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will thus always fail (DoS). This is a problem for QEMU since it destroys and re-creates these devices when the VM is reset: the VM would be restricted to using the much slower emulated XIVE or XICS forever. As an alternative to adding rollback, do not assign kvm->arch.xive before making sure the XIVE VPs are allocated in OPAL. Cc: stable@vger.kernel.org # v5.2 Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: E500: Replace current->mm by kvm->mmLeonardo Bras
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Reduce calls to get current->mm by storing the value locallyLeonardo Bras
Reduces the number of calls to get_current() in order to get the value of current->mm by doing it once and storing the value, since it is not supposed to change inside the same process). Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-21tracing: Fix race in perf_trace_buf initializationPrateek Sood
A race condition exists while initialiazing perf_trace_buf from perf_trace_init() and perf_kprobe_init(). CPU0 CPU1 perf_trace_init() mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() total_ref_count == 0 buf = alloc_percpu() perf_trace_buf[i] = buf tp_event->class->reg() //fails perf_kprobe_init() goto fail perf_trace_event_init() perf_trace_event_reg() fail: total_ref_count == 0 total_ref_count == 0 buf = alloc_percpu() perf_trace_buf[i] = buf tp_event->class->reg() total_ref_count++ free_percpu(perf_trace_buf[i]) perf_trace_buf[i] = NULL Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0, causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should fix this race. The race caused the following bug: Unable to handle kernel paging request at virtual address 0000003106f2003c Mem abort info: ESR = 0x96000045 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000045 CM = 0, WnR = 1 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000045 [#1] PREEMPT SMP Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000) pstate: 80400005 (Nzcv daif +PAN -UAO) pc : __memset+0x20/0x1ac lr : memset+0x3c/0x50 sp : ffffffc09319fc50 __memset+0x20/0x1ac perf_trace_buf_alloc+0x140/0x1a0 perf_trace_sys_enter+0x158/0x310 syscall_trace_enter+0x348/0x7c0 el0_svc_common+0x11c/0x368 el0_svc_handler+0x12c/0x198 el0_svc+0x8/0xc Ramdumps showed the following: total_ref_count = 3 perf_trace_buf = ( 0x0 -> NULL, 0x0 -> NULL, 0x0 -> NULL, 0x0 -> NULL) Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org Cc: stable@vger.kernel.org Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU") Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-22x86/cpu/vmware: Fix platform detection VMWARE_PORT macroThomas Hellstrom
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT definition, but expects it to be an integer. However, when it was moved to the new vmware.h include file, it was changed to be a string to better fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the platform detection VMWARE_PORT functionality. Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB definitions to be integers, and use __stringify() for their stringified form when needed. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvmThomas Hellstrom
LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified. This was previously fixed for the VMWARE_PORT macro. Fix it also for the VMWARE_HYPERCALL macro. Suggested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: clang-built-linux@googlegroups.com Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-21xdp: Handle device unregister for devmap_hash map typeToke Høiland-Jørgensen
It seems I forgot to add handling of devmap_hash type maps to the device unregister hook for devmaps. This omission causes devices to not be properly released, which causes hangs. Fix this by adding the missing handler. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191019111931.2981954-1-toke@redhat.com
2019-10-21Merge tag 'arm-soc/for-5.4/devicetree-fixes-part2' of ↵Olof Johansson
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoC Device Tree fixes for 5.4, please pull the following: - Stefan removes the activity LED node from the CM3 DTS since there is no driver for that LED yet and leds-gpio cannot drive it either * tag 'arm-soc/for-5.4/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux: ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue Link: https://lore.kernel.org/r/20191021194302.21024-1-f.fainelli@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'arm-soc/for-5.4/devicetree-fixes' of ↵Olof Johansson
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 5.4, please pull the following: - Stefan fixes the MMC controller bus-width property for the Raspberry Pi Zero Wireless which was incorrect after a prior refactoring * tag 'arm-soc/for-5.4/devicetree-fixes' of https://github.com/Broadcom/stblinux: ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci Link: https://lore.kernel.org/r/20191015172356.9650-1-f.fainelli@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'davinci-fixes-for-v5.4' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into arm/fixes DaVinci fixes for v5.4 ====================== * fix GPIO backlight support on DA850 by enabling the needed config in davinci_all_defconfig. This is a fix because the driver and board support got converted to use BACKLIGHT_GPIO driver, but defconfig update is still missing in v5.4. * fix for McBSP DMA on DM365 * tag 'davinci-fixes-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: davinci_all_defconfig: enable GPIO backlight ARM: davinci: dm365: Fix McBSP dma_slave_map entry Link: https://lore.kernel.org/r/7f3393f9-59be-a2d4-c1e1-ba6e407681d1@ti.com Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'v5.4-rockchip-dtsfixes1' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes A number of fixes for individual boards like the rockpro64, and Hugsun X99 as well as a fix for the Gru-Kevin display override and fixing the dt- binding for Theobroma boards to the correct naming that is also actually used in the wild. * tag 'v5.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix override mode for rk3399-kevin panel arm64: dts: rockchip: Fix usb-c on Hugsun X99 TV Box arm64: dts: rockchip: fix RockPro64 sdmmc settings arm64: dts: rockchip: fix RockPro64 sdhci settings arm64: dts: rockchip: fix RockPro64 vdd-log regulator settings dt-bindings: arm: rockchip: fix Theobroma-System board bindings arm64: dts: rockchip: fix Rockpro64 RK808 interrupt line Link: https://lore.kernel.org/r/1599050.HRXuSXmxRg@phil Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'imx-fixes-5.4' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.4: - Re-enable SNVS power key for imx6q-logicpd board which was accidentally disabled by a SoC level change. - Fix I2C switches on vf610-zii-scu4-aib board by specifying property i2c-mux-idle-disconnect. - A fix on imx-scu API that reads UID from firmware to avoid kernel NULL pointer dump. - A series from Anson to correct i.MX7 GPT and i.MX8 USDHC IPG clock. - A fix on DRM_MSM Kconfig regression on i.MX5 by adding the option explicitly into imx_v6_v7_defconfig. - Fix ARM regulator states issue for zii-ultra board, which is impacting stability of the board. - A correction on CPU core idle state name for LayerScape LX2160A SoC. * tag 'imx-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk ARM: dts: imx7s: Correct GPT's ipg clock source ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect' ARM: dts: imx6q-logicpd: Re-Enable SNVS power key arm64: dts: lx2160a: Correct CPU core idle state name arm64: dts: zii-ultra: fix ARM regulator states soc: imx: imx-scu: Getting UID from SCU should have response Link: https://lore.kernel.org/r/20191017141851.GA22506@dragon Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21Merge tag 'omap-for-v5.4/fixes-rc3-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Fixes for omaps for v5.4-rc cycle More fixes for omap variants: - Update more panel options in omap2plus_defconfig that got changed as we moved to use generic LCD panels - Remove unused twl_keypad for logicpd-torpedo-som to avoid boot time warnings. This is only a cosmetic fix, but at least dmesg output is now getting more readable after all the fixes to remove pointless warnings - Fix gpu_cm node name as we still have a non-standard node name dependency for clocks. This should eventually get fixed by use of domain specific compatible property - Fix use of i2c-mux-idle-disconnect for m3874-iceboard - Use level interrupt for omap4 & 5 wlcore to avoid lost edge interrupts * tag 'omap-for-v5.4/fixes-rc3-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: Use level interrupt for omap4 & 5 wlcore ARM: dts: am3874-iceboard: Fix 'i2c-mux-idle-disconnect' usage ARM: dts: omap5: fix gpu_cm clock provider name ARM: dts: logicpd-torpedo-som: Remove twl_keypad ARM: omap2plus_defconfig: Fix selected panels after generic panel changes Link: https://lore.kernel.org/r/pull-1571242890-118432@atomide.com Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21r8152: add device id for Lenovo ThinkPad USB-C Dock Gen 2Kazutoshi Noguchi
This device is sold as 'ThinkPad USB-C Dock Gen 2 (40AS)'. Chipset is RTL8153 and works with r8152. Without this, the generic cdc_ether grabs the device, and the device jam connected networks up when the machine suspends. Signed-off-by: Kazutoshi Noguchi <noguchi.kazutosi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21Merge tag 'arm-soc/for-5.4/devicetree-arm64-fixes' of ↵Olof Johansson
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM64-based SoCs Device Tree fixes for 5.4, please pull the following: - Rayangonda fixes the GPIO pins assignment for the Stringray SoCs * tag 'arm-soc/for-5.4/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux: arm64: dts: Fix gpio to pinmux mapping Link: https://lore.kernel.org/r/20191015172356.9650-2-f.fainelli@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-21arm64: Retrieve stolen time as paravirtualized guestSteven Price
Enable paravirtualization features when running under a hypervisor supporting the PV_TIME_ST hypercall. For each (v)CPU, we ask the hypervisor for the location of a shared page which the hypervisor will use to report stolen time to us. We set pv_time_ops to the stolen time function which simply reads the stolen value from the shared page for a VCPU. We guarantee single-copy atomicity using READ_ONCE which means we can also read the stolen time for another VCPU than the currently running one while it is potentially being updated by the hypervisor. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21arm/arm64: Make use of the SMCCC 1.1 wrapperSteven Price
Rather than directly choosing which function to use based on psci_ops.conduit, use the new arm_smccc_1_1 wrapper instead. In some cases we still need to do some operations based on the conduit, but the code duplication is removed. No functional change. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21arm/arm64: Provide a wrapper for SMCCC 1.1 callsSteven Price
SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI conduit. Rather than coding this in every call site, provide a macro which uses the correct instruction. The macro also handles the case where no conduit is configured/available returning a not supported error in res, along with returning the conduit used for the call. This allow us to remove some duplicated code and will be useful later when adding paravirtualized time hypervisor calls. Signed-off-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Provide VCPU attributes for stolen timeSteven Price
Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located. User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM. The address is given in terms of the physical address visible to the guest and must be 64 byte aligned. The guest will discover the address via a hypercall. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Allow kvm_device_ops to be constSteven Price
Currently a kvm_device_ops structure cannot be const without triggering compiler warnings. However the structure doesn't need to be written to and, by marking it const, it can be read-only in memory. Add some more const keywords to allow this. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Support stolen time reporting via shared structureSteven Price
Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host. User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds. Whenever stolen time is enabled by the guest, the stolen time counter is reset. The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Implement kvm_put_guest()Steven Price
kvm_put_guest() is analogous to put_user() - it writes a single value to the guest physical address. The implementation is built upon put_user() and so it has the same single copy atomic properties. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Implement PV_TIME_FEATURES callSteven Price
This provides a mechanism for querying which paravirtualized time features are available in this hypervisor. Also add the header file which defines the ABI for the paravirtualized time features we're about to add. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm/arm64: Factor out hypercall handling from PSCI codeChristoffer Dall
We currently intertwine the KVM PSCI implementation with the general dispatch of hypercall handling, which makes perfect sense because PSCI is the only category of hypercalls we support. However, as we are about to support additional hypercalls, factor out this functionality into a separate hypercall handler file. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [steven.price@arm.com: rebased] Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Document PV-time interfaceSteven Price
Introduce a paravirtualization interface for KVM/arm64 based on the "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A. This only adds the details about "Stolen Time" as the details of "Live Physical Time" have not been fully agreed. User space can specify a reserved area of memory for the guest and inform KVM to populate the memory with information on time that the host kernel has stolen from the guest. A hypercall interface is provided for the guest to interrogate the hypervisor's support for this interface and the location of the shared memory structures. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21Merge remote-tracking branch 'arm64/for-next/smccc-conduit-cleanup' into ↵Marc Zyngier
kvm-arm64/stolen-time
2019-10-21KVM: arm/arm64: Allow user injection of external data abortsChristoffer Dall
In some scenarios, such as buggy guest or incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA, which is not mapped to any MMIO region. For this purpose, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this. It turns out, we already have the set event functionality which we can perfectly reuse for this. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm/arm64: Allow reporting non-ISV data aborts to userspaceChristoffer Dall
For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which doesn't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace: load/store instruction decoding not implemented The reason behind this message is that KVM assumes that all accesses outside a memslot is an MMIO access which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set. However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel. What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years. It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system. Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21ipv4: fix IPSKB_FRAG_PMTU handling with fragmentationEric Dumazet
This patch removes the iph field from the state structure, which is not properly initialized. Instead, add a new field to make the "do we want to set DF" be the state bit and move the code to set the DF flag from ip_frag_next(). Joint work with Pablo and Linus. Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators") Reported-by: Patrick Schönthaler <patrick@notvads.ovh> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21ARM: 8926/1: v7m: remove register save to stack before svcafzal mohammed
r0-r3 & r12 registers are saved & restored, before & after svc respectively. Intention was to preserve those registers across thread to handler mode switch. On v7-M, hardware saves the register context upon exception in AAPCS complaint way. Restoring r0-r3 & r12 is done from stack location where hardware saves it, not from the location on stack where these registers were saved. To clarify, on stm32f429 discovery board: 1. before svc, sp - 0x90009ff8 2. r0-r3,r12 saved to 0x90009ff8 - 0x9000a00b 3. upon svc, h/w decrements sp by 32 & pushes registers onto stack 4. after svc, sp - 0x90009fd8 5. r0-r3,r12 restored from 0x90009fd8 - 0x90009feb Above means r0-r3,r12 is not restored from the location where they are saved, but since hardware pushes the registers onto stack, the registers are restored correctly. Note that during register saving to stack (step 2), it goes past 0x9000a000. And it seems, based on objdump, there are global symbols residing there, and it perhaps can cause issues on a non-XIP Kernel (on XIP, data section is setup later). Based on the analysis above, manually saving registers onto stack is at best no-op and at worst can cause data section corruption. Hence remove storing of registers onto stack before svc. Fixes: b70cd406d7fe ("ARM: 8671/1: V7M: Preserve registers across switch from Thread to Handler mode") Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Acked-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-10-21Input: st1232 - fix reporting multitouch coordinatesDixit Parmar
For Sitronix st1633 multi-touch controller driver the coordinates reported for multiple fingers were wrong, as it was always taking LSB of coordinates from the first contact data. Signed-off-by: Dixit Parmar <dixitparmar19@gmail.com> Reviewed-by: Martin Kepplinger <martink@posteo.de> Cc: stable@vger.kernel.org Fixes: 351e0592bfea ("Input: st1232 - add support for st1633") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204561 Link: https://lore.kernel.org/r/1566209314-21767-1-git-send-email-dixitparmar19@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-10-21Merge tag 'mlx5-fixes-2019-10-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 kTLS fixes 18-10-2019 This series introduces kTLS related fixes to mlx5 driver from Tariq, and two misc memory leak fixes form Navid Emamdoost. Please pull and let me know if there is any problem. I would appreciate it if you queue up kTLS fixes from the list below to stable kernel v5.3 ! For -stable v4.13: nett/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21Revert "pwm: Let pwm_get_state() return the last implemented state"Thierry Reding
It turns out that commit 01ccf903edd6 ("pwm: Let pwm_get_state() return the last implemented state") causes backlight failures on a number of boards. The reason is that some of the drivers do not write the full state through to the hardware registers, which means that ->get_state() subsequently does not return the correct state. Consumers which rely on pwm_get_state() returning the current state will therefore get confused and subsequently try to program a bad state. Before this change can be made, existing drivers need to be more carefully audited and fixed to behave as the framework expects. Until then, keep the original behaviour of returning the software state that was applied rather than reading the state back from hardware. Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Tested-by: Michal Vokáč <michal.vokac@ysoft.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-10-21mmc: mxs: fix flags passed to dmaengine_prep_slave_sgSascha Hauer
Since ceeeb99cd821 we no longer abuse the DMA_CTRL_ACK flag for custom driver use and introduced the MXS_DMA_CTRL_WAIT4END instead. We have not changed all users to this flag though. This patch fixes it for the mxs-mmc driver. Fixes: ceeeb99cd821 ("dmaengine: mxs: rename custom flag") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Fabio Estevam <festevam@gmail.com> Reported-by: Bruno Thomsen <bruno.thomsen@gmail.com> Tested-by: Bruno Thomsen <bruno.thomsen@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-10-21pinctrl: cherryview: Fix irq_valid_mask calculationHans de Goede
Commit 03c4749dd6c7 ("gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation") has made the cherryview gpio numbers sparse, to get a 1:1 mapping between ACPI pin numbers and gpio numbers in Linux. This has greatly simplified things, but the code setting the irq_valid_mask was not updated for this, so the valid mask is still in the old "compressed" numbering with the gaps in the pin numbers skipped, which is wrong as irq_valid_mask needs to be expressed in gpio numbers. This results in the following error on devices using pin 24 (0x0018) on the north GPIO controller as an ACPI event source: [ 0.422452] cherryview-pinctrl INT33FF:01: Failed to translate GPIO to IRQ This has been reported (by email) to be happening on a Caterpillar CAT T20 tablet and I've reproduced this myself on a Medion Akoya e2215t 2-in-1. This commit uses the pin number instead of the compressed index into community->pins to clear the correct bits in irq_valid_mask for GPIOs using GPEs for interrupts, fixing these errors and in case of the Medion Akoya e2215t also fixing the LID switch not working. Cc: stable@vger.kernel.org Fixes: 03c4749dd6c7 ("gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-10-21virtiofs: Retry request submission from worker contextVivek Goyal
If regular request queue gets full, currently we sleep for a bit and retrying submission in submitter's context. This assumes submitter is not holding any spin lock. But this assumption is not true for background requests. For background requests, we are called with fc->bg_lock held. This can lead to deadlock where one thread is trying submission with fc->bg_lock held while request completion thread has called fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As request completion thread gets blocked, it does not make further progress and that means queue does not get empty and submitter can't submit more requests. To solve this issue, retry submission with the help of a worker, instead of retrying in submitter's context. We already do this for hiprio/forget requests. Reported-by: Chirantan Ekbote <chirantan@chromium.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21virtiofs: Count pending forgets as in_flight forgetsVivek Goyal
If virtqueue is full, we put forget requests on a list and these forgets are dispatched later using a worker. As of now we don't count these forgets in fsvq->in_flight variable. This means when queue is being drained, we have to have special logic to first drain these pending requests and then wait for fsvq->in_flight to go to zero. By counting pending forgets in fsvq->in_flight, we can get rid of special logic and just wait for in_flight to go to zero. Worker thread will kick and drain all the forgets anyway, leading in_flight to zero. I also need similar logic for normal request queue in next patch where I am about to defer request submission in the worker context if queue is full. This simplifies the code a bit. Also add two helper functions to inc/dec in_flight. Decrement in_flight helper will later used to call completion when in_flight reaches zero. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21virtiofs: Set FR_SENT flag only after request has been sentVivek Goyal
FR_SENT flag should be set when request has been sent successfully sent over virtqueue. This is used by interrupt logic to figure out if interrupt request should be sent or not. Also add it to fqp->processing list after sending it successfully. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21virtiofs: No need to check fpq->connected stateVivek Goyal
In virtiofs we keep per queue connected state in virtio_fs_vq->connected and use that to end request if queue is not connected. And virtiofs does not even touch fpq->connected state. We probably need to merge these two at some point of time. For now, simplify the code a bit and do not worry about checking state of fpq->connected. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21virtiofs: Do not end request in submission contextVivek Goyal
Submission context can hold some locks which end request code tries to hold again and deadlock can occur. For example, fc->bg_lock. If a background request is being submitted, it might hold fc->bg_lock and if we could not submit request (because device went away) and tried to end request, then deadlock happens. During testing, I also got a warning from deadlock detection code. So put requests on a list and end requests from a worker thread. I got following warning from deadlock detector. [ 603.137138] WARNING: possible recursive locking detected [ 603.137142] -------------------------------------------- [ 603.137144] blogbench/2036 is trying to acquire lock: [ 603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse] [ 603.140701] [ 603.140701] but task is already holding lock: [ 603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse] [ 603.140713] [ 603.140713] other info that might help us debug this: [ 603.140714] Possible unsafe locking scenario: [ 603.140714] [ 603.140715] CPU0 [ 603.140716] ---- [ 603.140716] lock(&(&fc->bg_lock)->rlock); [ 603.140718] lock(&(&fc->bg_lock)->rlock); [ 603.140719] [ 603.140719] *** DEADLOCK *** Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21fuse: don't advise readdirplus for negative lookupMiklos Szeredi
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a directory before/during readdir are used as an indication that READDIRPLUS should be used instead of READDIR. However if the lookup turns out to be negative, then selecting READDIRPLUS makes no sense. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21drm/komeda: Fix typos in komeda_splitter_validateMihail Atanassov
Fix both the string and the struct member being printed. Changes since v1: - Now with a bonus grammar fix, too. Fixes: 264b9436d23b ("drm/komeda: Enable writeback split support") Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190930122231.33029-1-mihail.atanassov@arm.com
2019-10-21drm/komeda: Don't flush inactive pipesMihail Atanassov
HW doesn't allow flushing inactive pipes and raises an MERR interrupt if you try to do so. Stop triggering the MERR interrupt in the middle of a commit by calling drm_atomic_helper_commit_planes with the ACTIVE_ONLY flag. Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010102950.56253-1-mihail.atanassov@arm.com