summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-11Documentation/kvm/arm: improve description of HVC_SOFT_RESTARTPingfan Liu
Besides disabling MMU, HVC_SOFT_RESTART also clears I+D bits. These behaviors are what kexec-reboot expects, so describe it more precisely. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: linux-doc@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/1598621998-20563-2-git-send-email-kernelfans@gmail.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11Merge tag 'regulator-fix-v5.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "The biggest set of fixes here is those from Michał Mirosław fixing some locking issues with coupled regulators that are triggered in cases where a coupled regulator is used by a device involved in fs_reclaim like eMMC storage. These are relatively serious for the affected systems, though the circumstances where they trigger are very rare" * tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: pwm: Fix machine constraints application regulator: core: Fix slab-out-of-bounds in regulator_unlock_recursive() regulator: remove superfluous lock in regulator_resolve_coupling() regulator: cleanup regulator_ena_gpio_free() regulator: plug of_node leak in regulator_register()'s error path regulator: push allocation in set_consumer_device_supply() out of lock regulator: push allocations in create_regulator() outside of lock regulator: push allocation in regulator_ena_gpio_request() out of lock regulator: push allocation in regulator_init_coupling() outside of lock regulator: fix spelling mistake "Cant" -> "Can't" regulator: cros-ec-regulator: Add NULL test for devm_kmemdup call
2020-09-11arm64/relocate_kernel: remove redundant codePingfan Liu
Kernel startup entry point requires disabling MMU and D-cache. As for kexec-reboot, taking a close look at "msr sctlr_el1, x12" in __cpu_soft_restart as the following: -1. booted at EL1 The instruction is enough to disable MMU and I/D cache for EL1 regime. -2. booted at EL2, using VHE Access to SCTLR_EL1 is redirected to SCTLR_EL2 in EL2. So the instruction is enough to disable MMU and clear I+C bits for EL2 regime. -3. booted at EL2, not using VHE The instruction itself can not affect EL2 regime. But The hyp-stub doesn't enable the MMU and I/D cache for EL2 regime. And KVM also disable them for EL2 regime when its unloaded, or execute a HVC_SOFT_RESTART call. So when kexec-reboot, the code in KVM has prepare the requirement. As a conclusion, disabling MMU and clearing I+C bits in SYM_CODE_START(arm64_relocate_new_kernel) is redundant, and can be removed Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/1598621998-20563-1-git-send-email-kernelfans@gmail.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_ENVitaly Kuznetsov
Even without in-kernel LAPIC we should allow writing '0' to MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In particular, QEMU with 'kernel-irqchip=off' fails to start a guest with qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0 Fixes: 9d3c447c72fb2 ("KVM: X86: Fix async pf caused null-ptr-deref") Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200911093147.484565-1-vkuznets@redhat.com> [Actually commit the version proposed by Sean Christopherson. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: SVM: Periodically schedule when unregistering regions on destroyDavid Rientjes
There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15: watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130 Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel. Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable. Signed-off-by: David Rientjes <rientjes@google.com> Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: MIPS: Change the definition of kvm typeHuacai Chen
MIPS defines two kvm types: #define KVM_VM_MIPS_TE 0 #define KVM_VM_MIPS_VZ 1 In Documentation/virt/kvm/api.rst it is said that "You probably want to use 0 as machine type", which implies that type 0 be the "automatic" or "default" type. And, in user-space libvirt use the null-machine (with type 0) to detect the kvm capability, which returns "KVM not supported" on a VZ platform. I try to fix it in QEMU but it is ugly: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html And Thomas Huth suggests me to change the definition of kvm type: https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html So I define like this: #define KVM_VM_MIPS_AUTO 0 #define KVM_VM_MIPS_VZ 1 #define KVM_VM_MIPS_TE 2 Since VZ and TE cannot co-exists, using type 0 on a TE platform will still return success (so old user-space tools have no problems on new kernels); the advantage is that using type 0 on a VZ platform will not return failure. So, the only problem is "new user-space tools use type 2 on old kernels", but if we treat this as a kernel bug, we can backport this patch to old stable kernels. Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11Merge tag 'mmc-v5.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - sdio: Restore ~20% performance drop for SDHCI drivers, by using mmc_pre_req() and mmc_post_req() for SDIO requests. MMC host: - sdhci-of-esdhc: Fix support for erratum eSDHC7 - mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset - sdhci-msm: Use retries to fix tuning - sdhci-acpi: Fix resume for eMMC HS400 mode" * tag 'mmc-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdio: Use mmc_pre_req() / mmc_post_req() mmc: sdhci-of-esdhc: Don't walk device-tree on every interrupt mmc: mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset mmc: sdhci-msm: Add retries when all tuning phases are found valid mmc: sdhci-acpi: Clear amd_sdhci_host on reset
2020-09-11kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when neededLai Jiangshan
When kvm_mmu_get_page() gets a page with unsynced children, the spt pagetable is unsynchronized with the guest pagetable. But the guest might not issue a "flush" operation on it when the pagetable entry is changed from zero or other cases. The hypervisor has the responsibility to synchronize the pagetables. KVM behaved as above for many years, But commit 8c8560b83390 ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes") inadvertently included a line of code to change it without giving any reason in the changelog. It is clear that the commit's intention was to change KVM_REQ_TLB_FLUSH -> KVM_REQ_TLB_FLUSH_CURRENT, so we don't needlessly flush other contexts; however, one of the hunks changed a nearby KVM_REQ_MMU_SYNC instead. This patch changes it back. Link: https://lore.kernel.org/lkml/20200320212833.3507-26-sean.j.christopherson@intel.com/ Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20200902135421.31158-1-jiangshanlai@gmail.com> fixes: 8c8560b83390 ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL controlChenyi Qiang
A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field in exit_ctls_high. Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200828085622.8365-5-chenyi.qiang@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: fix memory leak in kvm_io_bus_unregister_dev()Rustam Kovhaev
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing the bus, we should iterate over all other devices linked to it and call kvm_iodevice_destructor() for them Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: Check the allocation of pv cpu maskHaiwei Li
check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when successful. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detectedPeter Shier
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call load_pdptrs to read the possibly updated PDPTEs from the guest physical address referenced by CR3. It loads them into vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. At the subsequent assumed reentry into L2, the mmu will call vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works if the L2 CRn write intercept always resumes L2. The resume path calls vmx_check_nested_events which checks for exceptions, MTF, and expired VMX preemption timers. If vmx_check_nested_events finds any of these conditions pending it will reflect the corresponding exit into L1. Live migration at this point would also cause a missed immediate reentry into L2. After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale values stored at last L2 exit. A repro of this bug showed L2 entering triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values. When L2 is in PAE paging mode add a call to ept_load_pdptrs before leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in vcpu->arch.walk_mmu->pdptrs[]. Tested: kvm-unit-tests with new directed test: vmx_mtf_pdpte_test. Verified that test fails without the fix. Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix would repro readily. Ran 14 simultaneous L2s for 140 iterations with no failures. Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200820230545.2411347-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11Merge tag 'kvmarm-fixes-5.9-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for Linux 5.9, take #1 - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values
2020-09-11Merge tag 'drm-fixes-2020-09-11' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes, not much a major amount. One thing though is Laurent fixed some Kconfig issues, and I'm carrying the rapidio kconfig change so the drm one for xlnx driver works. He hadn't got a response from rapidio maintainers. Otherwise, virtio, sun4i, tve200, ingenic have some fixes, one audio fix for i915 and a core docs fix. kconfig: - rapidio/xlnx kconfig fix core: - Documentation fix i915: - audio regression fix virtio: - Fix double free in virtio - Fix virtio unblank - Remove output->enabled from virtio, as it should use crtc_state sun4i: - Add missing put_device in sun4i, and other fixes - Handle sun4i alpha on lowest plane correctly tv200: - Fix tve200 enable/disable ingenic - Small ingenic fixes" * tag 'drm-fixes-2020-09-11' of git://anongit.freedesktop.org/drm/drm: drm/i915: fix regression leading to display audio probe failure on GLK drm: xlnx: dpsub: Fix DMADEVICES Kconfig dependency rapidio: Replace 'select' DMAENGINES 'with depends on' drm/virtio: drop virtio_gpu_output->enabled drm/sun4i: backend: Disable alpha on the lowest plane on the A20 drm/sun4i: backend: Support alpha property on lowest plane drm/sun4i: Fix DE2 YVU handling drm/tve200: Stabilize enable/disable dma-buf: fence-chain: Document missing dma_fence_chain_init() parameter in kerneldoc dma-buf: Fix kerneldoc of dma_buf_set_name() drm/virtio: fix unblank Documentation: fix dma-buf.rst underline length warning drm/sun4i: Fix dsi dcs long write function drm/ingenic: Fix driver not probing when IPU port is missing drm/ingenic: Fix leak of device_node pointer drm/sun4i: add missing put_device() call in sun8i_r40_tcon_tv_set_mux() drm/virtio: Revert "drm/virtio: Call the right shmem helpers"
2020-09-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A number of driver bug fixes and a few recent regressions: - Several bug fixes for bnxt_re. Crashing, incorrect data reported, and corruption on new HW - Memory leak and crash in rxe - Fix sysfs corruption in rxe if the netdev name is too long - Fix a crash on error unwind in the new cq_pool code - Fix kobject panics in rtrs by working device lifetime properly - Fix a data corruption bug in iser target related to misaligned buffers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/isert: Fix unaligned immediate-data handling RDMA/rtrs-srv: Set .release function for rtrs srv device during device init RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx' RDMA/core: Fix reported speed and width RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address RDMA/bnxt_re: Restrict the max_gids to 256 RDMA/bnxt_re: Static NQ depth allocation RDMA/bnxt_re: Fix the qp table indexing RDMA/bnxt_re: Do not report transparent vlan from QP1 RDMA/mlx4: Read pkey table length instead of hardcoded value RDMA/rxe: Fix panic when calling kmem_cache_create() RDMA/rxe: Fix memleak in rxe_mem_init_user RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
2020-09-11gcov: add support for GCC 10.1Peter Oberparleiter
Using gcov to collect coverage data for kernels compiled with GCC 10.1 causes random malfunctions and kernel crashes. This is the result of a changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between the layout of the gcov_info structure created by GCC profiling code and the related structure used by the kernel. Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable config GCOV_KERNEL for use with GCC 10. Reported-by: Colin Ian King <colin.king@canonical.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-11arm64: Remove the unused include statementsTian Tao
linux/arm-smccc.h is included more than once, Remove the one that isn't necessary. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/1599643682-10404-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11Merge tag 'asoc-fix-v5.9-rc4' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.9 Most of this is various driver specific fixes, none of which are terribly exciting in themselves, plus one core fix adding and using a new DAI lookup function to deal with a lockdep warning.
2020-09-11arm64/mm: Unify CONT_PMD_SHIFTGavin Shan
Similar to how CONT_PTE_SHIFT is determined, this introduces a new kernel option (CONFIG_CONT_PMD_SHIFT) to determine CONT_PMD_SHIFT. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-3-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Unify CONT_PTE_SHIFTGavin Shan
CONT_PTE_SHIFT actually depends on CONFIG_ARM64_CONT_SHIFT. It's reasonable to reflect the dependency: * This renames CONFIG_ARM64_CONT_SHIFT to CONFIG_ARM64_CONT_PTE_SHIFT, so that we can introduce CONFIG_ARM64_CONT_PMD_SHIFT later. * CONT_{SHIFT, SIZE, MASK}, defined in page-def.h are removed as they are not used by anyone. * CONT_PTE_SHIFT is determined by CONFIG_ARM64_CONT_PTE_SHIFT. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-2-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Remove CONT_RANGE_OFFSETGavin Shan
The macro was introduced by commit <ecf35a237a85> ("arm64: PTE/PMD contiguous bit definition") at the beginning. It's only used by commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"), which was reverted later by commit <667c27597ca8>. This makes the macro unused. This removes the unused macro (CONT_RANGE_OFFSET). Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200910095936.20307-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitionsAnshuman Khandual
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are scanned for /proc/cpuinfo are detached from their bit definitions making it vulnerable and difficult to correlate. It is also bit problematic because during /proc/cpuinfo dump these arrays get traversed sequentially assuming they reflect and match actual HWCAP bit sequence, to test various features for a given CPU. This redefines name arrays per their HWCAP bit definitions . It also warns after detecting any feature which is not expected on arm64. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11Merge branch 'powercap'Rafael J. Wysocki
* powercap: powercap: make documentation reflect code powercap/intel_rapl: add support for AlderLake powercap/intel_rapl: add support for RocketLake powercap/intel_rapl: add support for TigerLake Desktop
2020-09-11arm64/mm: Enable THP migrationAnshuman Khandual
In certain page migration situations, a THP page can be migrated without being split into it's constituent subpages. This saves time required to split a THP and put it back together when required. But it also saves an wider address range translation covered by a single TLB entry, reducing future page fault costs. A previous patch changed platform THP helpers per generic memory semantics, clearing the path for THP migration support. This adds two more THP helpers required to create PMD migration swap entries. Now enable THP migration via ARCH_ENABLE_THP_MIGRATION. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11arm64/mm: Change THP helpers to comply with generic MM semanticsAnshuman Khandual
pmd_present() and pmd_trans_huge() are expected to behave in the following manner during various phases of a given PMD. It is derived from a previous detailed discussion on this topic [1] and present THP documentation [2]. pmd_present(pmd): - Returns true if pmd refers to system RAM with a valid pmd_page(pmd) - Returns false if pmd refers to a migration or swap entry pmd_trans_huge(pmd): - Returns true if pmd refers to system RAM and is a trans huge mapping ------------------------------------------------------------------------- | PMD states | pmd_present | pmd_trans_huge | ------------------------------------------------------------------------- | Mapped | Yes | Yes | ------------------------------------------------------------------------- | Splitting | Yes | Yes | ------------------------------------------------------------------------- | Migration/Swap | No | No | ------------------------------------------------------------------------- The problem: PMD is first invalidated with pmdp_invalidate() before it's splitting. This invalidation clears PMD_SECT_VALID as below. PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re- affirm pmd_present() as true during the THP split process. To comply with above mentioned semantics, pmd_trans_huge() should also check pmd_present() first before testing presence of an actual transparent huge mapping. The solution: Ideally PMD_TYPE_SECT should have been used here instead. But it shares the bit position with PMD_SECT_VALID which is used for THP invalidation. Hence it will not be there for pmd_present() check after pmdp_invalidate(). A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD entry during invalidation which can help pmd_present() return true and in recognizing the fact that it still points to memory. This bit is transient. During the split process it will be overridden by a page table page representing normal pages in place of erstwhile huge page. Other pmdp_invalidate() callers always write a fresh PMD value on the entry overriding this transient PMD_PRESENT_INVALID bit, which makes it safe. [1]: https://lkml.org/lkml/2018/10/17/231 [2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-11spi: spi-fsl-dspi: use XSPI mode instead of DMA for DPAA2 SoCsVladimir Oltean
The arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi device tree lacks DMA channels for DSPI, so naturally, the driver fails to probe: [ 2.945302] fsl-dspi 2100000.spi: rx dma channel not available [ 2.951134] fsl-dspi 2100000.spi: can't get dma channels In retrospect, this should have been obvious, because LS2080A, LS2085A LS2088A and LX2160A don't appear to have an eDMA module at all. Looking again at their datasheets, the CTARE register (which is specific to XSPI functionality) seems to be documented, so switch them to XSPI mode instead. Fixes: 0feaf8f5afe0 ("spi: spi-fsl-dspi: Convert the instantiations that support it to DMA") Reported-by: Qiang Zhao <qiang.zhao@nxp.com> Tested-by: Qiang Zhao <qiang.zhao@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20200910121532.1138596-1-olteanv@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-11x86/mce: Make mce_rdmsrl() panic on an inaccessible MSRBorislav Petkov
If an exception needs to be handled while reading an MSR - which is in most of the cases caused by a #GP on a non-existent MSR - then this is most likely the incarnation of a BIOS or a hardware bug. Such bug violates the architectural guarantee that MCA banks are present with all MSRs belonging to them. The proper fix belongs in the hardware/firmware - not in the kernel. Handling an #MC exception which is raised while an NMI is being handled would cause the nasty NMI nesting issue because of the shortcoming of IRET of reenabling NMIs when executed. And the machine is in an #MC context already so <Deity> be at its side. Tracing MSR accesses while in #MC is another no-no due to tracing being inherently a bad idea in atomic context: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section so remove all that "additional" functionality from mce_rdmsrl() and provide it with a special exception handler which panics the machine when that MSR is not accessible. The exception handler prints a human-readable message explaining what the panic reason is but, what is more, it panics while in the #GP handler and latter won't have executed an IRET, thus opening the NMI nesting issue in the case when the #MC has happened while handling an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been cleared). Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> [ Add missing prototypes for ex_handler_* ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200906212130.GA28456@zn.tnic
2020-09-10docs/bpf: Fix ringbuf documentationAndrii Nakryiko
Remove link to litmus tests that didn't make it to upstream. Fix ringbuf benchmark link. I wasn't able to test this with `make htmldocs`, unfortunately, because of Sphinx dependencies. But bench_ringbufs.c path is certainly correct now. Fixes: 97abb2b39682 ("docs/bpf: Add BPF ring buffer design notes") Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200910225245.2896991-1-andriin@fb.com
2020-09-11Merge tag 'drm-misc-fixes-2020-09-09' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.9-rc5: - Fix double free in virtio. - Add missing put_device in sun4i, and other fixes. - Small ingenic fixes. - Handle sun4i alpha on lowest plane correctly. - Remove output->enabled from virtio, as it should use crtc_state. - Fix tve200 enable/disable. - Documentation fix. - Fix virtio unblank. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/478b49d1-b1b3-c983-7056-8a89249be435@mblankhorst.nl
2020-09-11Merge tag 'drm-intel-fixes-2020-09-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.9-rc5: - Fix regression leading to audio probe failure Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/875z8m2hss.fsf@intel.com
2020-09-10net: dec: de2104x: Increase receive ring size for TulipLucy Yan
Increase Rx ring size to address issue where hardware is reaching the receive work limit. Before: [ 102.223342] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.245695] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.251387] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.267444] de2104x 0000:17:00.0 eth0: rx work limit reached Signed-off-by: Lucy Yan <lucyyan@google.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10netlink: fix doc about nlmsg_parse/nla_validateNicolas Dichtel
There is no @validate argument. CC: Johannes Berg <johannes.berg@intel.com> Fixes: 3de644035446 ("netlink: re-add parse/validate functions in strict mode") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: DCB: Validate DCB_ATTR_DCB_BUFFER argumentPetr Machata
The parameter passed via DCB_ATTR_DCB_BUFFER is a struct dcbnl_buffer. The field prio2buffer is an array of IEEE_8021Q_MAX_PRIORITIES bytes, where each value is a number of a buffer to direct that priority's traffic to. That value is however never validated to lie within the bounds set by DCBX_MAX_BUFFERS. The only driver that currently implements the callback is mlx5 (maintainers CCd), and that does not do any validation either, in particual allowing incorrect configuration if the prio2buffer value does not fit into 4 bits. Instead of offloading the need to validate the buffer index to drivers, do it right there in core, and bounce the request if the value is too large. CC: Parav Pandit <parav@nvidia.com> CC: Saeed Mahameed <saeedm@nvidia.com> Fixes: e549f6f9c098 ("net/dcb: Add dcbnl buffer attribute") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge branch 'net-Fix-bridge-enslavement-failure'David S. Miller
Ido Schimmel says: ==================== net: Fix bridge enslavement failure Patch #1 fixes an issue in which an upper netdev cannot be enslaved to a bridge when it has multiple netdevs with different parent identifiers beneath it. Patch #2 adds a test case using two netdevsim instances. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10selftests: rtnetlink: Test bridge enslavement with different parent IDsIdo Schimmel
Test that an upper device of netdevs with different parent IDs can be enslaved to a bridge. The test fails without previous commit. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Fix bridge enslavement failureIdo Schimmel
When a netdev is enslaved to a bridge, its parent identifier is queried. This is done so that packets that were already forwarded in hardware will not be forwarded again by the bridge device between netdevs belonging to the same hardware instance. The operation fails when the netdev is an upper of netdevs with different parent identifiers. Instead of failing the enslavement, have dev_get_port_parent_id() return '-EOPNOTSUPP' which will signal the bridge to skip the query operation. Other callers of the function are not affected by this change. Fixes: 7e1146e8c10c ("net: devlink: introduce devlink_compat_switch_id_get() helper") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: mvneta: fix possible use-after-free in mvneta_xdp_put_buffLorenzo Bianconi
Release first buffer as last one since it contains references to subsequent fragments. This code will be optimized introducing multi-buffer bit in xdp_buff structure. Fixes: ca0e014609f05 ("net: mvneta: move skb build after descriptors processing") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10epoll: EPOLL_CTL_ADD: close the race in decision to take fast pathAl Viro
Checking for the lack of epitems refering to the epoll we want to insert into is not enough; we might have an insertion of that epoll into another one that has already collected the set of files to recheck for excessive reverse paths, but hasn't gotten to creating/inserting the epitem for it. However, any such insertion in progress can be detected - it will update the generation count in our epoll when it's done looking through it for files to check. That gets done under ->mtx of our epoll and that allows us to detect that safely. We are *not* holding epmutex here, so the generation count is not stable. However, since both the update of ep->gen by loop check and (later) insertion into ->f_ep_link are done with ep->mtx held, we are fine - the sequence is grab epmutex bump loop_check_gen ... grab tep->mtx // 1 tep->gen = loop_check_gen ... drop tep->mtx // 2 ... grab tep->mtx // 3 ... insert into ->f_ep_link ... drop tep->mtx // 4 bump loop_check_gen drop epmutex and if the fastpath check in another thread happens for that eventpoll, it can come * before (1) - in that case fastpath is just fine * after (4) - we'll see non-empty ->f_ep_link, slow path taken * between (2) and (3) - loop_check_gen is stable, with ->mtx providing barriers and we end up taking slow path. Note that ->f_ep_link emptiness check is slightly racy - we are protected against insertions into that list, but removals can happen right under us. Not a problem - in the worst case we'll end up taking a slow path for no good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-10clk: qcom: lpass: Correct goto target in lpass_core_sc7180_probe()Jing Xiangfeng
lpass_core_sc7180_probe() misses to call pm_clk_destroy() and pm_runtime_disable() in error paths. Correct goto target to fix it. This issue is found by code inspection. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Link: https://lore.kernel.org/r/20200827141629.101802-1-jingxiangfeng@huawei.com Fixes: edab812d802d ("clk: qcom: lpass: Add support for LPASS clock controller for SC7180") Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-09-10s390/qeth: delay draining the TX buffersJulian Wiedmann
Wait until the QDIO data connection is severed. Otherwise the device might still be processing the buffers, and end up accessing skb data that we already freed. Fixes: 8b5026bc1693 ("s390/qeth: fix qdio teardown after early init error") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Fix broken NETIF_F_CSUM_MASK spell in netdev_features.hMiaohe Lin
Remove the weird space inside the NETIF_F_CSUM_MASK. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: Correct the comment of dst_dev_put()Miaohe Lin
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of loopback device anymore. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10Merge tag 'f2fs-for-5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Small bug fixes for: - SMR drive fix - infinite loop when building free node ids - EOF at DIO read" * tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Return EOF on unaligned end of file DIO read f2fs: fix indefinite loop scanning for free nid f2fs: Fix type of section block count variables
2020-09-10hdlc_ppp: add range checks in ppp_cp_parse_cr()Dan Carpenter
There are a couple bugs here: 1) If opt[1] is zero then this results in a forever loop. If the value is less than 2 then it is invalid. 2) It assumes that "len" is more than sizeof(valid_accm) or 6 which can result in memory corruption. In the case of LCP_OPTION_ACCM, then we should check "opt[1]" instead of "len" because, if "opt[1]" is less than sizeof(valid_accm) then "nak_len" gets out of sync and it can lead to memory corruption in the next iterations through the loop. In case of LCP_OPTION_MAGIC, the only valid value for opt[1] is 6, but the code is trying to log invalid data so we should only discard the data when "len" is less than 6 because that leads to a read overflow. Reported-by: ChenNan Of Chaitin Security Research Lab <whutchennan@gmail.com> Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: phy: call phy_disable_interrupts() in phy_attach_direct() insteadYoshihiro Shimoda
Since the micrel phy driver calls phy_init_hw() as a workaround, the commit 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") disables the interrupt unexpectedly. So, call phy_disable_interrupts() in phy_attach_direct() instead. Otherwise, the phy cannot link up after the ethernet cable was disconnected. Note that other drivers (like at803x.c) also calls phy_init_hw(). So, perhaps, the driver caused a similar issue too. Fixes: 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10hv_netvsc: Cache the current data path to avoid duplicate call and messageDexuan Cui
The previous change "hv_netvsc: Switch the data path at the right time during hibernation" adds the call of netvsc_vf_changed() upon NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message when the VF is brought UP or DOWN. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10hv_netvsc: Switch the data path at the right time during hibernationDexuan Cui
When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet, so in the future the host might sliently fail the call netvsc_vf_changed() -> netvsc_switch_datapath() there, even if the call works now. Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time the mlx5 VF NIC has been resumed. Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: sch_generic: aviod concurrent reset and enqueue op for lockless qdiscYunsheng Lin
Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net: dsa: microchip: look for phy-mode in port nodesHelmut Grohne
Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode property should be specified on port nodes. However, the microchip drivers read it from the switch node. Let the driver use the per-port property and fall back to the old location with a warning. Fix in-tree users. Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/ Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10mptcp: fix kmalloc flag in mptcp_pm_nl_get_local_idGeliang Tang
mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context. [ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498 [ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3 [ 280.209814] INFO: lockdep is turned off. [ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146 [ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 280.209820] Workqueue: events mptcp_worker [ 280.209822] Call Trace: [ 280.209824] <IRQ> [ 280.209826] dump_stack+0x77/0xa0 [ 280.209829] ___might_sleep.cold+0xa6/0xb6 [ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290 [ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410 [ 280.209840] subflow_init_req+0x1e9/0x2ea [ 280.209843] ? inet_reqsk_alloc+0x1c/0x120 [ 280.209845] ? kmem_cache_alloc+0x264/0x290 [ 280.209849] tcp_conn_request+0x303/0xae0 [ 280.209854] ? printk+0x53/0x6a [ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374 [ 280.209859] tcp_rcv_state_process+0x28f/0x1374 [ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209869] tcp_v4_rcv+0xed6/0xfa0 [ 280.209873] ip_protocol_deliver_rcu+0x28/0x270 [ 280.209875] ip_local_deliver_finish+0x89/0x120 [ 280.209877] ip_local_deliver+0x180/0x220 [ 280.209881] ip_rcv+0x166/0x210 [ 280.209885] __netif_receive_skb_one_core+0x82/0x90 [ 280.209888] process_backlog+0xd6/0x230 [ 280.209891] net_rx_action+0x13a/0x410 [ 280.209895] __do_softirq+0xcf/0x468 [ 280.209899] asm_call_on_stack+0x12/0x20 [ 280.209901] </IRQ> [ 280.209903] ? ip_finish_output2+0x240/0x9a0 [ 280.209906] do_softirq_own_stack+0x4d/0x60 [ 280.209908] do_softirq.part.0+0x2b/0x60 [ 280.209911] __local_bh_enable_ip+0x9a/0xa0 [ 280.209913] ip_finish_output2+0x264/0x9a0 [ 280.209916] ? rcu_read_lock_held+0x4d/0x60 [ 280.209920] ? ip_output+0x7a/0x250 [ 280.209922] ip_output+0x7a/0x250 [ 280.209925] ? __ip_finish_output+0x330/0x330 [ 280.209928] __ip_queue_xmit+0x1dc/0x5a0 [ 280.209931] __tcp_transmit_skb+0xa0f/0xc70 [ 280.209937] tcp_connect+0xb03/0xff0 [ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209942] ? ktime_get_with_offset+0x125/0x150 [ 280.209944] ? trace_hardirqs_on+0x1c/0xe0 [ 280.209948] tcp_v4_connect+0x449/0x550 [ 280.209953] __inet_stream_connect+0xbb/0x320 [ 280.209955] ? mark_held_locks+0x49/0x70 [ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0 [ 280.209963] inet_stream_connect+0x32/0x50 [ 280.209966] __mptcp_subflow_connect+0x1fd/0x242 [ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600 [ 280.209975] mptcp_worker+0x543/0x7a0 [ 280.209980] process_one_work+0x26d/0x5b0 [ 280.209984] ? process_one_work+0x5b0/0x5b0 [ 280.209987] worker_thread+0x48/0x3d0 [ 280.209990] ? process_one_work+0x5b0/0x5b0 [ 280.209993] kthread+0x117/0x150 [ 280.209996] ? kthread_park+0x80/0x80 [ 280.209998] ret_from_fork+0x22/0x30 Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>