summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from Jakub Kicinski: - fix failure to add bond interfaces to a bridge, the offload-handling code was too defensive there and recent refactoring unearthed that. Users complained (Ido) - fix unnecessarily reflecting ECN bits within TOS values / QoS marking in TCP ACK and reset packets (Wei) - fix a deadlock with bpf iterator. Hopefully we're in the clear on this front now... (Yonghong) - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel) - fix AQL on mt76 devices with FW rate control and add a couple of AQL issues in mac80211 code (Felix) - fix authentication issue with mwifiex (Maximilian) - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro) - fix exception handling for multipath routes via same device (David Ahern) - revert back to a BH spin lock flavor for nsid_lock: there are paths which do require the BH context protection (Taehee) - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke) - fix ife module load deadlock (Cong) - make an adjustment to netlink reply message type for code added in this release (the sole change touching uAPI here) (Michal) - a number of fixes for small NXP and Microchip switches (Vladimir) [ Pull request acked by David: "you can expect more of this in the future as I try to delegate more things to Jakub" ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits) net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU net: Update MAINTAINERS for MediaTek switch driver net/mlx5e: mlx5e_fec_in_caps() returns a boolean net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock net/mlx5e: kTLS, Fix leak on resync error flow net/mlx5e: kTLS, Add missing dma_unmap in RX resync net/mlx5e: kTLS, Fix napi sync and possible use-after-free net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() net/mlx5e: Fix multicast counter not up-to-date in "ip -s" net/mlx5e: Fix endianness when calculating pedit mask first bit net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported net/mlx5e: CT: Fix freeing ct_label mapping net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready net/mlx5e: Use synchronize_rcu to sync with NAPI net/mlx5e: Use RCU to protect rq->xdp_prog ...
2020-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - fix fault on page table writes during instruction fetch s390: - doc improvement x86: - The obvious patches are always the ones that turn out to be completely broken. /me hangs his head in shame" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Revert "KVM: Check the allocation of pv cpu mask" KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch docs: kvm: add documentation for KVM_CAP_S390_DIAG318
2020-09-20Merge tag 'x86_urgent_for_v5.9_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A defconfig fix (Daniel Díaz) - Disable relocation relaxation for the compressed kernel when not built as -pie as in that case kernels built with clang and linked with LLD fail to boot due to the linker optimizing some instructions in non-PIE form; the gory details in the commit message (Arvind Sankar) - A fix for the "bad bp value" warning issued by the frame-pointer unwinder (Josh Poimboeuf) * tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/fp: Fix FP unwinding in ret_from_fork x86/boot/compressed: Disable relocation relaxation x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig
2020-09-20Merge tag 'kvmarm-fixes-5.9-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.9, take #2 - Fix handling of S1 Page Table Walk permission fault at S2 on instruction fetch - Cleanup kvm_vcpu_dabt_iswrite()
2020-09-20Revert "KVM: Check the allocation of pv cpu mask"Vitaly Kuznetsov
The commit 0f990222108d ("KVM: Check the allocation of pv cpu mask") we have in 5.9-rc5 has two issue: 1) Compilation fails for !CONFIG_SMP, see: https://bugzilla.kernel.org/show_bug.cgi?id=209285 2) This commit completely disables PV TLB flush, see https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/ The allocation problem is likely a theoretical one, if we don't have memory that early in boot process we're likely doomed anyway. Let's solve it properly later. This reverts commit 0f990222108d214a0924d920e6095b58107d7b59. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-20Merge tag 'riscv-for-linus-5.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for a lockdep issue to avoid an asserting triggering during early boot. There shouldn't be any incorrect behavior as the system isn't concurrent at the time. - The addition of a missing fence when installing early fixmap mappings. - A corretion to the K210 device tree's interrupt map. - A fix for M-mode timer handling on the K210. * tag 'riscv-for-linus-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Resurrect the MMIO timer implementation for M-mode systems riscv: Fix Kendryte K210 device tree riscv: Add sfence.vma after early page table changes RISC-V: Take text_mutex in ftrace_init_nop()
2020-09-19RISC-V: Resurrect the MMIO timer implementation for M-mode systemsPalmer Dabbelt
The K210 doesn't implement rdtime in M-mode, and since that's where Linux runs in the NOMMU systems that means we can't use rdtime. The K210 is the only system that anyone is currently running NOMMU or M-mode on, so here we're just inlining the timer read directly. This also adds the CLINT driver as an !MMU dependency, as it's currently the only timer driver availiable for these systems and without it we get a build failure for some configurations. Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-19riscv: Fix Kendryte K210 device treeDamien Le Moal
The Kendryte K210 SoC CLINT is compatible with Sifive clint v0 (sifive,clint0). Fix the Kendryte K210 device tree clint entry to be inline with the sifive timer definition documented in Documentation/devicetree/bindings/timer/sifive,clint.yaml. The device tree clint entry is renamed similarly to u-boot device tree definition to improve compatibility with u-boot defined device tree. To ensure correct initialization, the interrup-cells attribute is added and the interrupt-extended attribute definition fixed. This fixes boot failures with Kendryte K210 SoC boards. Note that the clock referenced is kept as K210_CLK_ACLK, which does not necessarilly match the clint MTIME increment rate. This however does not seem to cause any problem for now. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-19riscv: Add sfence.vma after early page table changesGreentime Hu
This invalidates local TLB after modifying the page tables during early init as it's too early to handle suprious faults as we otherwise do. Fixes: f2c17aabc917 ("RISC-V: Implement compile-time fixed mappings") Reported-by: Syven Wang <syven.wang@sifive.com> Signed-off-by: Syven Wang <syven.wang@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> [Palmer: Cleaned up the commit text] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-18Merge tag 's390-5.9-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix order in trace_hardirqs_off_caller() to make locking state consistent even if the IRQ tracer calls into lockdep again. Touches common code. Acked-by Peter Zijlstra. - Correctly handle secure storage violation exception to avoid kernel panic triggered by user space misbehaviour. - Switch the idle->seqcount over to using raw_write_*() to avoid "suspicious RCU usage". - Fix memory leaks on hard unplug in pci code. - Use kvmalloc instead of kmalloc for larger allocations in zcrypt. - Add few missing __init annotations to static functions to avoid section mismatch complains when functions are not inlined. * tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: add 3f program exception handler lockdep: fix order in trace_hardirqs_off_caller() s390/pci: fix leak of DMA tables on hard unplug s390/init: add missing __init annotations s390/zcrypt: fix kmalloc 256k failure s390/idle: fix suspicious RCU usage
2020-09-18Merge tag 'sh-for-5.9-part2' of git://git.libc.org/linux-shLinus Torvalds
Pull arch/sh fixes from Rich Felker: "Fixes for build and function regression" * tag 'sh-for-5.9-part2' of git://git.libc.org/linux-sh: sh: fix syscall tracing sh: remove spurious circular inclusion from asm/smp.h
2020-09-18Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Allow CPUs affected by erratum 1418040 to come online late (previously we only fixed the other case - CPUs not affected by the erratum coming up late). - Fix branch offset in BPF JIT. - Defer the stolen time initialisation to the CPU online time from the CPU starting time to avoid a (sleep-able) memory allocation in an atomic context. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: paravirt: Initialize steal time when cpu is online arm64: bpf: Fix branch offset in JIT arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late
2020-09-18Merge tag 'powerpc-5.9-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.9: - Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing crashes. - Fix a long standing bug in our DMA mask handling that was hidden until recently, and which caused problems with some drivers. - Fix a boot failure on systems with large amounts of RAM, and no hugepage support and using Radix MMU, only seen in the lab. - A few other minor fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy, Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav Jain, and Vaidyanathan Srinivasan" * tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute cpuidle: pseries: Fix CEDE latency conversion from tb to us powerpc/dma: Fix dma_map_ops::get_required_mask Revert "powerpc/build: vdso linker warning for orphan sections" powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc selftests/powerpc: Skip PROT_SAO test in guests/LPARS powerpc/book3s64/radix: Fix boot failure with large amount of guest memory
2020-09-18Merge tag 'pm-5.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new CPU ID to the RAPL power capping driver and prevent the ACPI processor idle driver from triggering RCU-lockdep complaints. Specifics: - Add support for the Lakefield chip to the RAPL power capping driver (Ricardo Neri). - Modify the ACPI processor idle driver to prevent it from triggering RCU-lockdep complaints which has started to happen after recent changes in that area (Peter Zijlstra)" * tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: processor: Take over RCU-idle for C3-BM idle cpuidle: Allow cpuidle drivers to take over RCU-idle ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP powercap: RAPL: Add support for Lakefield
2020-09-18KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite()Marc Zyngier
Now that kvm_vcpu_trap_is_write_fault() checks for S1PTW, there is no need for kvm_vcpu_dabt_iswrite() to do the same thing, as we already check for this condition on all existing paths. Drop the check and add a comment instead. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200915104218.1284701-3-maz@kernel.org
2020-09-18KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetchMarc Zyngier
KVM currently assumes that an instruction abort can never be a write. This is in general true, except when the abort is triggered by a S1PTW on instruction fetch that tries to update the S1 page tables (to set AF, for example). This can happen if the page tables have been paged out and brought back in without seeing a direct write to them (they are thus marked read only), and the fault handling code will make the PT executable(!) instead of writable. The guest gets stuck forever. In these conditions, the permission fault must be considered as a write so that the Stage-1 update can take place. This is essentially the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM: Take S1 walks into account when determining S2 write faults"). Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce kvm_vcpu_trap_is_exec_fault() that only return true when no faulting on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't specific to data abort. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org
2020-09-18x86/unwind/fp: Fix FP unwinding in ret_from_forkJosh Poimboeuf
There have been some reports of "bad bp value" warnings printed by the frame pointer unwinder: WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000 This warning happens when unwinding from an interrupt in ret_from_fork(). If entry code gets interrupted, the state of the frame pointer (rbp) may be undefined, which can confuse the unwinder, resulting in warnings like the above. There's an in_entry_code() check which normally silences such warnings for entry code. But in this case, ret_from_fork() is getting interrupted. It recently got moved out of .entry.text, so the in_entry_code() check no longer works. It could be moved back into .entry.text, but that would break the noinstr validation because of the call to schedule_tail(). Instead, initialize each new task's RBP to point to the task's entry regs via an encoded frame pointer. That will allow the unwinder to reach the end of the stack gracefully. Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com
2020-09-17Merge tag 'mips_fixes_5.9_2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: "Two small fixes for SNI machines" * tag 'mips_fixes_5.9_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: SNI: Fix spurious interrupts MIPS: SNI: Fix MIPS_L1_CACHE_SHIFT
2020-09-17arm64: paravirt: Initialize steal time when cpu is onlineAndrew Jones
Steal time initialization requires mapping a memory region which invokes a memory allocation. Doing this at CPU starting time results in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at mm/slab.h:498 in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1 Call trace: dump_backtrace+0x0/0x208 show_stack+0x1c/0x28 dump_stack+0xc4/0x11c ___might_sleep+0xf8/0x130 __might_sleep+0x58/0x90 slab_pre_alloc_hook.constprop.101+0xd0/0x118 kmem_cache_alloc_node_trace+0x84/0x270 __get_vm_area_node+0x88/0x210 get_vm_area_caller+0x38/0x40 __ioremap_caller+0x70/0xf8 ioremap_cache+0x78/0xb0 memremap+0x9c/0x1a8 init_stolen_time_cpu+0x54/0xf0 cpuhp_invoke_callback+0xa8/0x720 notify_cpu_starting+0xc8/0xd8 secondary_start_kernel+0x114/0x180 CPU1: Booted secondary processor 0x0000000001 [0x431f0a11] However we don't need to initialize steal time at CPU starting time. We can simply wait until CPU online time, just sacrificing a bit of accuracy by returning zero for steal time until we know better. While at it, add __init to the functions that are only called by pv_time_init() which is __init. Signed-off-by: Andrew Jones <drjones@redhat.com> Fixes: e0685fa228fd ("arm64: Retrieve stolen time as paravirtualized guest") Cc: stable@vger.kernel.org Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-17arm64: bpf: Fix branch offset in JITIlias Apalodimas
Running the eBPF test_verifier leads to random errors looking like this: [ 6525.735488] Unexpected kernel BRK exception at EL1 [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP [ 6525.741609] Modules linked in: nls_utf8 cifs libdes libarc4 dns_resolver fscache binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce gf128mul efi_pstore sha2_ce sha256_arm64 sha1_ce evdev efivars efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq libcrc32c crc32c_generic ahci xhci_pci libahci xhci_hcd igb libata i2c_algo_bit nvme realtek usbcore nvme_core scsi_mod t10_pi netsec mdio_devres of_mdio gpio_keys fixed_phy libphy gpio_mb86s7x [ 6525.787760] CPU: 3 PID: 7881 Comm: test_verifier Tainted: G W 5.9.0-rc1+ #47 [ 6525.796111] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #1 Jun 6 2020 [ 6525.804812] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) [ 6525.810390] pc : bpf_prog_c3d01833289b6311_F+0xc8/0x9f4 [ 6525.815613] lr : bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c [ 6525.820832] sp : ffff8000130cbb80 [ 6525.824141] x29: ffff8000130cbbb0 x28: 0000000000000000 [ 6525.829451] x27: 000005ef6fcbf39b x26: 0000000000000000 [ 6525.834759] x25: ffff8000130cbb80 x24: ffff800011dc7038 [ 6525.840067] x23: ffff8000130cbd00 x22: ffff0008f624d080 [ 6525.845375] x21: 0000000000000001 x20: ffff800011dc7000 [ 6525.850682] x19: 0000000000000000 x18: 0000000000000000 [ 6525.855990] x17: 0000000000000000 x16: 0000000000000000 [ 6525.861298] x15: 0000000000000000 x14: 0000000000000000 [ 6525.866606] x13: 0000000000000000 x12: 0000000000000000 [ 6525.871913] x11: 0000000000000001 x10: ffff8000000a660c [ 6525.877220] x9 : ffff800010951810 x8 : ffff8000130cbc38 [ 6525.882528] x7 : 0000000000000000 x6 : 0000009864cfa881 [ 6525.887836] x5 : 00ffffffffffffff x4 : 002880ba1a0b3e9f [ 6525.893144] x3 : 0000000000000018 x2 : ffff8000000a4374 [ 6525.898452] x1 : 000000000000000a x0 : 0000000000000009 [ 6525.903760] Call trace: [ 6525.906202] bpf_prog_c3d01833289b6311_F+0xc8/0x9f4 [ 6525.911076] bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c [ 6525.915957] bpf_dispatcher_xdp_func+0x14/0x20 [ 6525.920398] bpf_test_run+0x70/0x1b0 [ 6525.923969] bpf_prog_test_run_xdp+0xec/0x190 [ 6525.928326] __do_sys_bpf+0xc88/0x1b28 [ 6525.932072] __arm64_sys_bpf+0x24/0x30 [ 6525.935820] el0_svc_common.constprop.0+0x70/0x168 [ 6525.940607] do_el0_svc+0x28/0x88 [ 6525.943920] el0_sync_handler+0x88/0x190 [ 6525.947838] el0_sync+0x140/0x180 [ 6525.951154] Code: d4202000 d4202000 d4202000 d4202000 (d4202000) [ 6525.957249] ---[ end trace cecc3f93b14927e2 ]--- The reason is the offset[] creation and later usage, while building the eBPF body. The code currently omits the first instruction, since build_insn() will increase our ctx->idx before saving it. That was fine up until bounded eBPF loops were introduced. After that introduction, offset[0] must be the offset of the end of prologue which is the start of the 1st insn while, offset[n] holds the offset of the end of n-th insn. When "taken loop with back jump to 1st insn" test runs, it will eventually call bpf2a64_offset(-1, 2, ctx). Since negative indexing is permitted, the current outcome depends on the value stored in ctx->offset[-1], which has nothing to do with our array. If the value happens to be 0 the tests will work. If not this error triggers. commit 7c2e988f400e ("bpf: fix x64 JIT code generation for jmp to 1st insn") fixed an indentical bug on x86 when eBPF bounded loops were introduced. So let's fix it by creating the ctx->offset[] differently. Track the beginning of instruction and account for the extra instruction while calculating the arm instruction offsets. Fixes: 2589726d12a1 ("bpf: introduce bounded loops") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Jiri Olsa <jolsa@kernel.org> Co-developed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Co-developed-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200917084925.177348-1-ilias.apalodimas@linaro.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-16MIPS: SNI: Fix spurious interruptsThomas Bogendoerfer
On A20R machines the interrupt pending bits in cause register need to be updated by requesting the chipset to do it. This needs to be done to find the interrupt cause and after interrupt service. In commit 0b888c7f3a03 ("MIPS: SNI: Convert to new irq_chip functions") the function to do after service update got lost, which caused spurious interrupts. Fixes: 0b888c7f3a03 ("MIPS: SNI: Convert to new irq_chip functions") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-16ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHEDPeter Zijlstra
Make acpi_processor_idle() use the generic TLB flushing code. This again removes RCU usage after rcu_idle_enter(). (XXX make every C3 invalidate TLBs, not just C3-BM) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-15MIPS: SNI: Fix MIPS_L1_CACHE_SHIFTThomas Bogendoerfer
Commit 930beb5ac09a ("MIPS: introduce MIPS_L1_CACHE_SHIFT_<N>") forgot to select the correct MIPS_L1_CACHE_SHIFT for SNI RM. This breaks non coherent DMA because of a wrong allocation alignment. Fixes: 930beb5ac09a ("MIPS: introduce MIPS_L1_CACHE_SHIFT_<N>") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-14vgacon: remove software scrollback supportLinus Torvalds
Yunhai Zhang recently fixed a VGA software scrollback bug in commit ebfdfeeae8c0 ("vgacon: Fix for missing check in scrollback handling"), but that then made people look more closely at some of this code, and there were more problems on the vgacon side, but also the fbcon software scrollback. We don't really have anybody who maintains this code - probably because nobody actually _uses_ it any more. Sure, people still use both VGA and the framebuffer consoles, but they are no longer the main user interfaces to the kernel, and haven't been for decades, so these kinds of extra features end up bitrotting and not really being used. So rather than try to maintain a likely unused set of code, I'll just aggressively remove it, and see if anybody even notices. Maybe there are people who haven't jumped on the whole GUI badnwagon yet, and think it's just a fad. And maybe those people use the scrollback code. If that turns out to be the case, we can resurrect this again, once we've found the sucker^Wmaintainer for it who actually uses it. Reported-by: NopNop Nop <nopitydays@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: 张云海 <zhangyunhai@nsfocus.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-14x86/boot/compressed: Disable relocation relaxationArvind Sankar
The x86-64 psABI [0] specifies special relocation types (R_X86_64_[REX_]GOTPCRELX) for indirection through the Global Offset Table, semantically equivalent to R_X86_64_GOTPCREL, which the linker can take advantage of for optimization (relaxation) at link time. This is supported by LLD and binutils versions 2.26 onwards. The compressed kernel is position-independent code, however, when using LLD or binutils versions before 2.27, it must be linked without the -pie option. In this case, the linker may optimize certain instructions into a non-position-independent form, by converting foo@GOTPCREL(%rip) to $foo. This potential issue has been present with LLD and binutils-2.26 for a long time, but it has never manifested itself before now: - LLD and binutils-2.26 only relax movq foo@GOTPCREL(%rip), %reg to leaq foo(%rip), %reg which is still position-independent, rather than mov $foo, %reg which is permitted by the psABI when -pie is not enabled. - GCC happens to only generate GOTPCREL relocations on mov instructions. - CLang does generate GOTPCREL relocations on non-mov instructions, but when building the compressed kernel, it uses its integrated assembler (due to the redefinition of KBUILD_CFLAGS dropping -no-integrated-as), which has so far defaulted to not generating the GOTPCRELX relocations. Nick Desaulniers reports [1,2]: "A recent change [3] to a default value of configuration variable (ENABLE_X86_RELAX_RELOCATIONS OFF -> ON) in LLVM now causes Clang's integrated assembler to emit R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX relocations. LLD will relax instructions with these relocations based on whether the image is being linked as position independent or not. When not, then LLD will relax these instructions to use absolute addressing mode (R_RELAX_GOT_PC_NOPIC). This causes kernels built with Clang and linked with LLD to fail to boot." Patch series [4] is a solution to allow the compressed kernel to be linked with -pie unconditionally, but even if merged is unlikely to be backported. As a simple solution that can be applied to stable as well, prevent the assembler from generating the relaxed relocation types using the -mrelax-relocations=no option. For ease of backporting, do this unconditionally. [0] https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/linker-optimization.tex#L65 [1] https://lore.kernel.org/lkml/20200807194100.3570838-1-ndesaulniers@google.com/ [2] https://github.com/ClangBuiltLinux/linux/issues/1121 [3] https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0 [4] https://lore.kernel.org/lkml/20200731202738.2577854-1-nivedita@alum.mit.edu/ Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200812004308.1448603-1-nivedita@alum.mit.edu
2020-09-14s390: add 3f program exception handlerJanosch Frank
Program exception 3f (secure storage violation) can only be detected when the CPU is running in SIE with a format 4 state description, e.g. running a protected guest. Because of this and because user space partly controls the guest memory mapping and can trigger this exception, we want to send a SIGSEGV to the process running the guest and not panic the kernel. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # 5.7 Fixes: 084ea4d611a3 ("s390/mm: add (non)secure page access exceptions handlers") Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/pci: fix leak of DMA tables on hard unplugNiklas Schnelle
commit f606b3ef47c9 ("s390/pci: adapt events for zbus") removed the zpci_disable_device() call for a zPCI event with PEC 0x0304 because the device is already deconfigured by the platform. This however skips the Linux side of the disable in particular it leads to leaking the DMA tables and bitmaps because zpci_dma_exit_device() is never called on the device. If the device transitions to the Reserved state we call zpci_zdev_put() but zpci_release_device() will not call zpci_disable_device() because the state of the zPCI function is already ZPCI_FN_STATE_STANDBY. If the device is put into the Standby state, zpci_disable_device() is not called and the device is assumed to have been put in Standby through platform action. At this point the device may be removed by a subsequent event with PEC 0x0308 or 0x0306 which calls zpci_zdev_put() with the same problem as above or the device may be configured again in which case zpci_disable_device() is also not called. Fix this by calling zpci_disable_device() explicitly for PEC 0x0304 as before. To make it more clear that zpci_disable_device() may be called, even if the lower level device has already been disabled by the platform, add a comment to zpci_disable_device(). Cc: <stable@vger.kernel.org> # 5.8 Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/init: add missing __init annotationsIlya Leoshkevich
Add __init to reserve_memory_end, reserve_oldmem and remove_oldmem. Sometimes these functions are not inlined, and then the build complains about section mismatch. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14s390/idle: fix suspicious RCU usagePeter Zijlstra
After commit eb1f00237aca ("lockdep,trace: Expose tracepoints") the lock tracepoints are visible to lockdep and RCU-lockdep is finding a bunch more RCU violations that were previously hidden. Switch the idle->seqcount over to using raw_write_*() to avoid the lockdep annotation and thus the lock tracepoints. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-13sh: fix syscall tracingRich Felker
Addition of SECCOMP_FILTER exposed a longstanding bug in do_syscall_trace_enter, whereby r0 (the 5th argument register) was mistakenly used where r3 (syscall_nr) was intended. By overwriting r0 rather than r3 with -1 when attempting to block a syscall, the existing code would instead have caused the syscall to execute with an argument clobbered. Commit 0bb605c2c7f2b4b3 then introduced skipping of the syscall when do_syscall_trace_enter returns -1, so that the return value set by seccomp filters would not be clobbered by -ENOSYS. This eliminated the clobbering of the 5th argument register, but instead caused syscalls made with a 5th argument of -1 to be misinterpreted as a request by do_syscall_trace_enter to suppress the syscall. Fixes: 0bb605c2c7f2b4b3 ("sh: Add SECCOMP_FILTER") Fixes: ab99c733ae73cce3 ("sh: Make syscall tracer use tracehook notifiers, add TIF_NOTIFY_RESUME.") Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Rich Felker <dalias@libc.org>
2020-09-13sh: remove spurious circular inclusion from asm/smp.hRich Felker
Commit 0cd39f4600ed4de8 added inclusion of smp.h to lockdep.h, creating a circular include dependency where arch/sh's asm/smp.h in turn includes spinlock.h which depends on lockdep.h. Since our asm/smp.h does not actually need spinlock.h, just remove it. Fixes: 0cd39f4600ed4de8 ("locking/seqlock, headers: Untangle the spaghetti monster") Tested-by: Rob Landley <rob@landley.net> Signed-off-by: Rich Felker <dalias@libc.org>
2020-09-13Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "A collection of fixes I've been accruing over the last few weeks, none of them have been severe enough to warrant flushing the queue but it's been long enough now that it's a good idea to send them in. A handful of them are fixups for QSPI DT/bindings/compatibles, some smaller fixes for system DMA clock control and TMU interrupts on i.MX, a handful of fixes for OMAP, including a fix for DSI (display) on omap5" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits) arm64: dts: ns2: Fixed QSPI compatible string ARM: dts: BCM5301X: Fixed QSPI compatible string ARM: dts: NSP: Fixed QSPI compatible string ARM: dts: bcm: HR2: Fixed QSPI compatible string dt-bindings: spi: Fix spi-bcm-qspi compatible ordering ARM: dts: imx6sx: fix the pad QSPI1B_SCLK mux mode for uart3 arm64: dts: imx8mp: correct sdma1 clk setting arm64: dts: imx8mq: Fix TMU interrupt property ARM: dts: imx7d-zii-rmu2: fix rgmii phy-mode for ksz9031 phy ARM: dts: vfxxx: Add syscon compatible with OCOTP ARM: dts: imx6q-logicpd: Fix broken PWM arm64: dts: imx: Add missing imx8mm-beacon-kit.dtb to build ARM: dts: imx6q-prtwd2: Remove unneeded i2c unit name ARM: dts: imx6qdl-gw51xx: Remove unneeded #address-cells/#size-cells ARM: dts: imx7ulp: Correct gpio ranges ARM: dts: ls1021a: fix QuadSPI-memory reg range arm64: defconfig: Enable ptn5150 extcon driver arm64: defconfig: Enable USB gadget with configfs ARM: configs: Update Integrator defconfig ARM: dts: omap5: Fix DSI base address and clocks ...
2020-09-13Merge tag 'arm-soc/for-5.9/devicetree-fixes' of ↵Olof Johansson
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 5.9, please pull the following: - Florian fixes the Broadcom QSPI controller binding such that the most specific compatible string is the left most one, and all existing in-tree users are updated as well. * tag 'arm-soc/for-5.9/devicetree-fixes' of https://github.com/Broadcom/stblinux: arm64: dts: ns2: Fixed QSPI compatible string ARM: dts: BCM5301X: Fixed QSPI compatible string ARM: dts: NSP: Fixed QSPI compatible string ARM: dts: bcm: HR2: Fixed QSPI compatible string dt-bindings: spi: Fix spi-bcm-qspi compatible ordering Link: https://lore.kernel.org/r/20200909211857.4144718-1-f.fainelli@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13Merge tag 'imx-fixes-5.9-2' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.9, round 2: - Fix the misspelling of 'interrupts' property in i.MX8MQ TMU DT node. - Correct 'ahb' clock for i.MX8MP SDMA1 in device tree. - Fix pad QSPI1B_SCLK mux mode for UART3 on i.MX6SX. * tag 'imx-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6sx: fix the pad QSPI1B_SCLK mux mode for uart3 arm64: dts: imx8mp: correct sdma1 clk setting arm64: dts: imx8mq: Fix TMU interrupt property Link: https://lore.kernel.org/r/20200909143844.GA25109@dragon Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13Merge tag 'omap-for-v5.9/fixes-rc3' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Fixes for omaps for v5.9-rc cycle Few fixes for omap based devices: - Fix of_clk_get() error handling for omap-iommu - Fix missing audio pinctrl entries for logicpd boards - Fix video for logicpd-som-lv after switch to generic panels - Fix omap5 DSI clocks base * tag 'omap-for-v5.9/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: omap5: Fix DSI base address and clocks ARM: dts: logicpd-som-lv-baseboard: Fix missing video ARM: dts: logicpd-som-lv-baseboard: Fix broken audio ARM: dts: logicpd-torpedo-baseboard: Fix broken audio ARM: OMAP2+: Fix an IS_ERR() vs NULL check in _get_pwrdm() Link: https://lore.kernel.org/r/pull-1599132064-54898@atomide.com Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A bit on the bigger side, mostly due to me being on vacation, then busy, then on parental leave, but there's nothing worrisome. ARM: - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values x86: - nSVM state restore fixes - Async page fault fixes - Lots of small fixes everywhere" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: emulator: more strict rsm checks. KVM: nSVM: more strict SMM checks when returning to nested guest SVM: nSVM: setup nested msr permission bitmap on nested state load SVM: nSVM: correctly restore GIF on vmexit from nesting after migration x86/kvm: don't forget to ACK async PF IRQ x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit KVM: SVM: avoid emulation with stale next_rip KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN KVM: SVM: Periodically schedule when unregistering regions on destroy KVM: MIPS: Change the definition of kvm type kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control KVM: fix memory leak in kvm_io_bus_unregister_dev() KVM: Check the allocation of pv cpu mask KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected KVM: arm64: Update page shift if stage 2 block mapping not supported KVM: arm64: Fix address truncation in traces KVM: arm64: Do not try to map PUDs when they are folded into PMD arm64/x86: KVM: Introduce steal-time cap ...
2020-09-13arm64: Allow CPUs unffected by ARM erratum 1418040 to come in lateMarc Zyngier
Now that we allow CPUs affected by erratum 1418040 to come in late, this prevents their unaffected sibblings from coming in late (or coming back after a suspend or hotplug-off, which amounts to the same thing). To allow this, we need to add ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU, which amounts to set .type to ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE. Fixes: bf87bb0881d0 ("arm64: Allow booting of late CPUs affected by erratum 1418040") Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200911181611.2073183-1-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-12Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC fixes from Stafford Horne: "Fixes for compile issues pointed out by kbuild and one bug I found in initrd with the 5.9 patches" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: Fix issue with get_user for 64-bit values openrisc: Fix cache API compile issue when not inlining openrisc: Reserve memblock for initrd
2020-09-12KVM: emulator: more strict rsm checks.Maxim Levitsky
Don't ignore return values in rsm_load_state_64/32 to avoid loading invalid state from SMM state area if it was tampered with by the guest. This is primarly intended to avoid letting guest set bits in EFER (like EFER.SVME when nesting is disabled) by manipulating SMM save area. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12KVM: nSVM: more strict SMM checks when returning to nested guestMaxim Levitsky
* check that guest is 64 bit guest, otherwise the SVM related fields in the smm state area are not defined * If the SMM area indicates that SMM interrupted a running guest, check that EFER.SVME which is also saved in this area is set, otherwise the guest might have tampered with SMM save area, and so indicate emulation failure which should triple fault the guest. * Check that that guest CPUID supports SVM (due to the same issue as above) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12SVM: nSVM: setup nested msr permission bitmap on nested state loadMaxim Levitsky
This code was missing and was forcing the L2 run with L1's msr permission bitmap Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12SVM: nSVM: correctly restore GIF on vmexit from nesting after migrationMaxim Levitsky
Currently code in svm_set_nested_state copies the current vmcb control area to L1 control area (hsave->control), under assumption that it mostly reflects the defaults that kvm choose, and later qemu overrides these defaults with L2 state using standard KVM interfaces, like KVM_SET_REGS. However nested GIF (which is AMD specific thing) is by default is true, and it is copied to hsave area as such. This alone is not a big deal since on VMexit, GIF is always set to false, regardless of what it was on VM entry. However in nested_svm_vmexit we were first were setting GIF to false, but then we overwrite the control fields with value from the hsave area. (including the nested GIF field itself if GIF virtualization is enabled). Now on normal vm entry this is not a problem, since GIF is usually false prior to normal vm entry, and this is the value that copied to hsave, and then restored, but this is not always the case when the nested state is loaded as explained above. To fix this issue, move svm_set_gif after we restore the L1 control state in nested_svm_vmexit, so that even with wrong GIF in the saved L1 control area, we still clear GIF as the spec says. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12openrisc: Fix issue with get_user for 64-bit valuesStafford Horne
A build failure was raised by kbuild with the following error. drivers/android/binder.c: Assembler messages: drivers/android/binder.c:3861: Error: unrecognized keyword/register name `l.lwz ?ap,4(r24)' drivers/android/binder.c:3866: Error: unrecognized keyword/register name `l.addi ?ap,r0,0' The issue is with 64-bit get_user() calls on openrisc. I traced this to a problem where in the internally in the get_user macros there is a cast to long __gu_val this causes GCC to think the get_user call is 32-bit. This binder code is really long and GCC allocates register r30, which triggers the issue. The 64-bit get_user asm tries to get the 64-bit pair register, which for r30 overflows the general register names and returns the dummy register ?ap. The fix here is to move the temporary variables into the asm macros. We use a 32-bit __gu_tmp for 32-bit and smaller macro and a 64-bit tmp in the 64-bit macro. The cast in the 64-bit macro has a trick of casting through __typeof__((x)-(x)) which avoids the below warning. This was barrowed from riscv. arch/openrisc/include/asm/uaccess.h:240:8: warning: cast to pointer from integer of different size I tested this in a small unit test to check reading between 64-bit and 32-bit pointers to 64-bit and 32-bit values in all combinations. Also I ran make C=1 to confirm no new sparse warnings came up. It all looks clean to me. Link: https://lore.kernel.org/lkml/202008200453.ohnhqkjQ%25lkp@intel.com/ Signed-off-by: Stafford Horne <shorne@gmail.com> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-09-12x86/kvm: don't forget to ACK async PF IRQVitaly Kuznetsov
Merge commit 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") tried to adapt the new interrupt based async PF mechanism to the newly introduced IDTENTRY magic but unfortunately it missed the fact that DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and all DEFINE_IDTENTRY_SYSVEC() users have to call it manually. As the result all multi-CPU KVM guest hang on boot when KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism is currently disabled) but we're about to change that. Fixes: 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-3-vkuznets@redhat.com> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macroVitaly Kuznetsov
DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/ irqentry_exit(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exitWanpeng Li
According to SDM 27.2.4, Event delivery causes an APIC-access VM exit. Don't report internal error and freeze guest when event delivery causes an APIC-access exit, it is handleable and the event will be re-injected during the next vmentry. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12KVM: SVM: avoid emulation with stale next_ripWanpeng Li
svm->next_rip is reset in svm_vcpu_run() only after calling svm_exit_handlers_fastpath(), which will cause SVM's skip_emulated_instruction() to write a stale RIP. We can move svm_exit_handlers_fastpath towards the end of svm_vcpu_run(). To align VMX with SVM, keep svm_complete_interrupts() close as well. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paul K. <kronenpj@kronenpj.dyndns.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Also move vmcb_mark_all_clean before any possible write to the VMCB. - Paolo]
2020-09-11RISC-V: Take text_mutex in ftrace_init_nop()Palmer Dabbelt
Without this we get lockdep failures. They're spurious failures as SMP isn't up when ftrace_init_nop() is called. As far as I can tell the easiest fix is to just take the lock, which also seems like the safest fix. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Guo Ren <guoren@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-11KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_ENVitaly Kuznetsov
Even without in-kernel LAPIC we should allow writing '0' to MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In particular, QEMU with 'kernel-irqchip=off' fails to start a guest with qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0 Fixes: 9d3c447c72fb2 ("KVM: X86: Fix async pf caused null-ptr-deref") Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200911093147.484565-1-vkuznets@redhat.com> [Actually commit the version proposed by Sean Christopherson. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: SVM: Periodically schedule when unregistering regions on destroyDavid Rientjes
There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15: watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130 Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel. Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable. Signed-off-by: David Rientjes <rientjes@google.com> Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>