Age | Commit message (Collapse) | Author |
|
Besides disabling MMU, HVC_SOFT_RESTART also clears I+D bits. These behaviors
are what kexec-reboot expects, so describe it more precisely.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-doc@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/1598621998-20563-2-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"The biggest set of fixes here is those from Michał Mirosław fixing
some locking issues with coupled regulators that are triggered in
cases where a coupled regulator is used by a device involved in
fs_reclaim like eMMC storage.
These are relatively serious for the affected systems, though the
circumstances where they trigger are very rare"
* tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pwm: Fix machine constraints application
regulator: core: Fix slab-out-of-bounds in regulator_unlock_recursive()
regulator: remove superfluous lock in regulator_resolve_coupling()
regulator: cleanup regulator_ena_gpio_free()
regulator: plug of_node leak in regulator_register()'s error path
regulator: push allocation in set_consumer_device_supply() out of lock
regulator: push allocations in create_regulator() outside of lock
regulator: push allocation in regulator_ena_gpio_request() out of lock
regulator: push allocation in regulator_init_coupling() outside of lock
regulator: fix spelling mistake "Cant" -> "Can't"
regulator: cros-ec-regulator: Add NULL test for devm_kmemdup call
|
|
Kernel startup entry point requires disabling MMU and D-cache.
As for kexec-reboot, taking a close look at "msr sctlr_el1, x12" in
__cpu_soft_restart as the following:
-1. booted at EL1
The instruction is enough to disable MMU and I/D cache for
EL1 regime.
-2. booted at EL2, using VHE
Access to SCTLR_EL1 is redirected to SCTLR_EL2 in EL2. So the instruction
is enough to disable MMU and clear I+C bits for EL2 regime.
-3. booted at EL2, not using VHE
The instruction itself can not affect EL2 regime. But The hyp-stub doesn't
enable the MMU and I/D cache for EL2 regime. And KVM also disable them for EL2
regime when its unloaded, or execute a HVC_SOFT_RESTART call. So when
kexec-reboot, the code in KVM has prepare the requirement.
As a conclusion, disabling MMU and clearing I+C bits in
SYM_CODE_START(arm64_relocate_new_kernel) is redundant, and can be removed
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/1598621998-20563-1-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Even without in-kernel LAPIC we should allow writing '0' to
MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In
particular, QEMU with 'kernel-irqchip=off' fails to start
a guest with
qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0
Fixes: 9d3c447c72fb2 ("KVM: X86: Fix async pf caused null-ptr-deref")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200911093147.484565-1-vkuznets@redhat.com>
[Actually commit the version proposed by Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed. This can lead to soft lockups. For example, on a
host running 4.15:
watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
[<0>] release_pages+0x159/0x3d0
[<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
[<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
[<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
[<0>] kvm_arch_destroy_vm+0x47/0x200
[<0>] kvm_put_kvm+0x1a8/0x2f0
[<0>] kvm_vm_release+0x25/0x30
[<0>] do_exit+0x335/0xc10
[<0>] do_group_exit+0x3f/0xa0
[<0>] get_signal+0x1bc/0x670
[<0>] do_signal+0x31/0x130
Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.
Periodically schedule if necessary. This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.
Signed-off-by: David Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
MIPS defines two kvm types:
#define KVM_VM_MIPS_TE 0
#define KVM_VM_MIPS_VZ 1
In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.
I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
So I define like this:
#define KVM_VM_MIPS_AUTO 0
#define KVM_VM_MIPS_VZ 1
#define KVM_VM_MIPS_TE 2
Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- sdio: Restore ~20% performance drop for SDHCI drivers, by using
mmc_pre_req() and mmc_post_req() for SDIO requests.
MMC host:
- sdhci-of-esdhc: Fix support for erratum eSDHC7
- mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset
- sdhci-msm: Use retries to fix tuning
- sdhci-acpi: Fix resume for eMMC HS400 mode"
* tag 'mmc-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdio: Use mmc_pre_req() / mmc_post_req()
mmc: sdhci-of-esdhc: Don't walk device-tree on every interrupt
mmc: mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset
mmc: sdhci-msm: Add retries when all tuning phases are found valid
mmc: sdhci-acpi: Clear amd_sdhci_host on reset
|
|
When kvm_mmu_get_page() gets a page with unsynced children, the spt
pagetable is unsynchronized with the guest pagetable. But the
guest might not issue a "flush" operation on it when the pagetable
entry is changed from zero or other cases. The hypervisor has the
responsibility to synchronize the pagetables.
KVM behaved as above for many years, But commit 8c8560b83390
("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes")
inadvertently included a line of code to change it without giving any
reason in the changelog. It is clear that the commit's intention was to
change KVM_REQ_TLB_FLUSH -> KVM_REQ_TLB_FLUSH_CURRENT, so we don't
needlessly flush other contexts; however, one of the hunks changed
a nearby KVM_REQ_MMU_SYNC instead. This patch changes it back.
Link: https://lore.kernel.org/lkml/20200320212833.3507-26-sean.j.christopherson@intel.com/
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20200902135421.31158-1-jiangshanlai@gmail.com>
fixes: 8c8560b83390 ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field
in exit_ctls_high.
Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200828085622.8365-5-chenyi.qiang@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them
Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when
successful.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
load_pdptrs to read the possibly updated PDPTEs from the guest
physical address referenced by CR3. It loads them into
vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
vcpu->arch.regs_dirty.
At the subsequent assumed reentry into L2, the mmu will call
vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
if the L2 CRn write intercept always resumes L2.
The resume path calls vmx_check_nested_events which checks for
exceptions, MTF, and expired VMX preemption timers. If
vmx_check_nested_events finds any of these conditions pending it will
reflect the corresponding exit into L1. Live migration at this point
would also cause a missed immediate reentry into L2.
After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
values stored at last L2 exit. A repro of this bug showed L2 entering
triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
When L2 is in PAE paging mode add a call to ept_load_pdptrs before
leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
vcpu->arch.walk_mmu->pdptrs[].
Tested:
kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
Verified that test fails without the fix.
Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
failures.
Signed-off-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200820230545.2411347-1-pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for Linux 5.9, take #1
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
(dirty logging, for example)
- Fix tracing output of 64bit values
|
|
Pull drm fixes from Dave Airlie:
"Regular fixes, not much a major amount. One thing though is Laurent
fixed some Kconfig issues, and I'm carrying the rapidio kconfig change
so the drm one for xlnx driver works. He hadn't got a response from
rapidio maintainers.
Otherwise, virtio, sun4i, tve200, ingenic have some fixes, one audio
fix for i915 and a core docs fix.
kconfig:
- rapidio/xlnx kconfig fix
core:
- Documentation fix
i915:
- audio regression fix
virtio:
- Fix double free in virtio
- Fix virtio unblank
- Remove output->enabled from virtio, as it should use crtc_state
sun4i:
- Add missing put_device in sun4i, and other fixes
- Handle sun4i alpha on lowest plane correctly
tv200:
- Fix tve200 enable/disable
ingenic
- Small ingenic fixes"
* tag 'drm-fixes-2020-09-11' of git://anongit.freedesktop.org/drm/drm:
drm/i915: fix regression leading to display audio probe failure on GLK
drm: xlnx: dpsub: Fix DMADEVICES Kconfig dependency
rapidio: Replace 'select' DMAENGINES 'with depends on'
drm/virtio: drop virtio_gpu_output->enabled
drm/sun4i: backend: Disable alpha on the lowest plane on the A20
drm/sun4i: backend: Support alpha property on lowest plane
drm/sun4i: Fix DE2 YVU handling
drm/tve200: Stabilize enable/disable
dma-buf: fence-chain: Document missing dma_fence_chain_init() parameter in kerneldoc
dma-buf: Fix kerneldoc of dma_buf_set_name()
drm/virtio: fix unblank
Documentation: fix dma-buf.rst underline length warning
drm/sun4i: Fix dsi dcs long write function
drm/ingenic: Fix driver not probing when IPU port is missing
drm/ingenic: Fix leak of device_node pointer
drm/sun4i: add missing put_device() call in sun8i_r40_tcon_tv_set_mux()
drm/virtio: Revert "drm/virtio: Call the right shmem helpers"
|
|
Pull rdma fixes from Jason Gunthorpe:
"A number of driver bug fixes and a few recent regressions:
- Several bug fixes for bnxt_re. Crashing, incorrect data reported,
and corruption on new HW
- Memory leak and crash in rxe
- Fix sysfs corruption in rxe if the netdev name is too long
- Fix a crash on error unwind in the new cq_pool code
- Fix kobject panics in rtrs by working device lifetime properly
- Fix a data corruption bug in iser target related to misaligned
buffers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/isert: Fix unaligned immediate-data handling
RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
RDMA/core: Fix reported speed and width
RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
RDMA/bnxt_re: Restrict the max_gids to 256
RDMA/bnxt_re: Static NQ depth allocation
RDMA/bnxt_re: Fix the qp table indexing
RDMA/bnxt_re: Do not report transparent vlan from QP1
RDMA/mlx4: Read pkey table length instead of hardcoded value
RDMA/rxe: Fix panic when calling kmem_cache_create()
RDMA/rxe: Fix memleak in rxe_mem_init_user
RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
|
|
Using gcov to collect coverage data for kernels compiled with GCC 10.1
causes random malfunctions and kernel crashes. This is the result of a
changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
the layout of the gcov_info structure created by GCC profiling code and
the related structure used by the kernel.
Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable
config GCOV_KERNEL for use with GCC 10.
Reported-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
linux/arm-smccc.h is included more than once, Remove the one that isn't
necessary.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/1599643682-10404-1-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.9
Most of this is various driver specific fixes, none of which are
terribly exciting in themselves, plus one core fix adding and using a
new DAI lookup function to deal with a lockdep warning.
|
|
Similar to how CONT_PTE_SHIFT is determined, this introduces a new
kernel option (CONFIG_CONT_PMD_SHIFT) to determine CONT_PMD_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-3-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
CONT_PTE_SHIFT actually depends on CONFIG_ARM64_CONT_SHIFT. It's
reasonable to reflect the dependency:
* This renames CONFIG_ARM64_CONT_SHIFT to CONFIG_ARM64_CONT_PTE_SHIFT,
so that we can introduce CONFIG_ARM64_CONT_PMD_SHIFT later.
* CONT_{SHIFT, SIZE, MASK}, defined in page-def.h are removed as they
are not used by anyone.
* CONT_PTE_SHIFT is determined by CONFIG_ARM64_CONT_PTE_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-2-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The macro was introduced by commit <ecf35a237a85> ("arm64: PTE/PMD
contiguous bit definition") at the beginning. It's only used by
commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"),
which was reverted later by commit <667c27597ca8>. This makes the
macro unused.
This removes the unused macro (CONT_RANGE_OFFSET).
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
scanned for /proc/cpuinfo are detached from their bit definitions making it
vulnerable and difficult to correlate. It is also bit problematic because
during /proc/cpuinfo dump these arrays get traversed sequentially assuming
they reflect and match actual HWCAP bit sequence, to test various features
for a given CPU. This redefines name arrays per their HWCAP bit definitions
. It also warns after detecting any feature which is not expected on arm64.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
* powercap:
powercap: make documentation reflect code
powercap/intel_rapl: add support for AlderLake
powercap/intel_rapl: add support for RocketLake
powercap/intel_rapl: add support for TigerLake Desktop
|
|
In certain page migration situations, a THP page can be migrated without
being split into it's constituent subpages. This saves time required to
split a THP and put it back together when required. But it also saves an
wider address range translation covered by a single TLB entry, reducing
future page fault costs.
A previous patch changed platform THP helpers per generic memory semantics,
clearing the path for THP migration support. This adds two more THP helpers
required to create PMD migration swap entries. Now enable THP migration via
ARCH_ENABLE_THP_MIGRATION.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599627183-14453-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pmd_present() and pmd_trans_huge() are expected to behave in the following
manner during various phases of a given PMD. It is derived from a previous
detailed discussion on this topic [1] and present THP documentation [2].
pmd_present(pmd):
- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd refers to a migration or swap entry
pmd_trans_huge(pmd):
- Returns true if pmd refers to system RAM and is a trans huge mapping
-------------------------------------------------------------------------
| PMD states | pmd_present | pmd_trans_huge |
-------------------------------------------------------------------------
| Mapped | Yes | Yes |
-------------------------------------------------------------------------
| Splitting | Yes | Yes |
-------------------------------------------------------------------------
| Migration/Swap | No | No |
-------------------------------------------------------------------------
The problem:
PMD is first invalidated with pmdp_invalidate() before it's splitting. This
invalidation clears PMD_SECT_VALID as below.
PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID
Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re-
affirm pmd_present() as true during the THP split process. To comply with
above mentioned semantics, pmd_trans_huge() should also check pmd_present()
first before testing presence of an actual transparent huge mapping.
The solution:
Ideally PMD_TYPE_SECT should have been used here instead. But it shares the
bit position with PMD_SECT_VALID which is used for THP invalidation. Hence
it will not be there for pmd_present() check after pmdp_invalidate().
A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD
entry during invalidation which can help pmd_present() return true and in
recognizing the fact that it still points to memory.
This bit is transient. During the split process it will be overridden by a
page table page representing normal pages in place of erstwhile huge page.
Other pmdp_invalidate() callers always write a fresh PMD value on the entry
overriding this transient PMD_PRESENT_INVALID bit, which makes it safe.
[1]: https://lkml.org/lkml/2018/10/17/231
[2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi device tree lacks DMA
channels for DSPI, so naturally, the driver fails to probe:
[ 2.945302] fsl-dspi 2100000.spi: rx dma channel not available
[ 2.951134] fsl-dspi 2100000.spi: can't get dma channels
In retrospect, this should have been obvious, because LS2080A, LS2085A
LS2088A and LX2160A don't appear to have an eDMA module at all. Looking
again at their datasheets, the CTARE register (which is specific to XSPI
functionality) seems to be documented, so switch them to XSPI mode
instead.
Fixes: 0feaf8f5afe0 ("spi: spi-fsl-dspi: Convert the instantiations that support it to DMA")
Reported-by: Qiang Zhao <qiang.zhao@nxp.com>
Tested-by: Qiang Zhao <qiang.zhao@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20200910121532.1138596-1-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If an exception needs to be handled while reading an MSR - which is in
most of the cases caused by a #GP on a non-existent MSR - then this
is most likely the incarnation of a BIOS or a hardware bug. Such bug
violates the architectural guarantee that MCA banks are present with all
MSRs belonging to them.
The proper fix belongs in the hardware/firmware - not in the kernel.
Handling an #MC exception which is raised while an NMI is being handled
would cause the nasty NMI nesting issue because of the shortcoming of
IRET of reenabling NMIs when executed. And the machine is in an #MC
context already so <Deity> be at its side.
Tracing MSR accesses while in #MC is another no-no due to tracing being
inherently a bad idea in atomic context:
vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section
so remove all that "additional" functionality from mce_rdmsrl() and
provide it with a special exception handler which panics the machine
when that MSR is not accessible.
The exception handler prints a human-readable message explaining what
the panic reason is but, what is more, it panics while in the #GP
handler and latter won't have executed an IRET, thus opening the NMI
nesting issue in the case when the #MC has happened while handling
an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been
cleared).
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
[ Add missing prototypes for ex_handler_* ]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200906212130.GA28456@zn.tnic
|
|
Remove link to litmus tests that didn't make it to upstream. Fix ringbuf
benchmark link.
I wasn't able to test this with `make htmldocs`, unfortunately, because of
Sphinx dependencies. But bench_ringbufs.c path is certainly correct now.
Fixes: 97abb2b39682 ("docs/bpf: Add BPF ring buffer design notes")
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200910225245.2896991-1-andriin@fb.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.9-rc5:
- Fix double free in virtio.
- Add missing put_device in sun4i, and other fixes.
- Small ingenic fixes.
- Handle sun4i alpha on lowest plane correctly.
- Remove output->enabled from virtio, as it should use crtc_state.
- Fix tve200 enable/disable.
- Documentation fix.
- Fix virtio unblank.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/478b49d1-b1b3-c983-7056-8a89249be435@mblankhorst.nl
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.9-rc5:
- Fix regression leading to audio probe failure
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/875z8m2hss.fsf@intel.com
|
|
Increase Rx ring size to address issue where hardware is reaching
the receive work limit.
Before:
[ 102.223342] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.245695] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.251387] de2104x 0000:17:00.0 eth0: rx work limit reached
[ 102.267444] de2104x 0000:17:00.0 eth0: rx work limit reached
Signed-off-by: Lucy Yan <lucyyan@google.com>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no @validate argument.
CC: Johannes Berg <johannes.berg@intel.com>
Fixes: 3de644035446 ("netlink: re-add parse/validate functions in strict mode")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The parameter passed via DCB_ATTR_DCB_BUFFER is a struct dcbnl_buffer. The
field prio2buffer is an array of IEEE_8021Q_MAX_PRIORITIES bytes, where
each value is a number of a buffer to direct that priority's traffic to.
That value is however never validated to lie within the bounds set by
DCBX_MAX_BUFFERS. The only driver that currently implements the callback is
mlx5 (maintainers CCd), and that does not do any validation either, in
particual allowing incorrect configuration if the prio2buffer value does
not fit into 4 bits.
Instead of offloading the need to validate the buffer index to drivers, do
it right there in core, and bounce the request if the value is too large.
CC: Parav Pandit <parav@nvidia.com>
CC: Saeed Mahameed <saeedm@nvidia.com>
Fixes: e549f6f9c098 ("net/dcb: Add dcbnl buffer attribute")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
net: Fix bridge enslavement failure
Patch #1 fixes an issue in which an upper netdev cannot be enslaved to a
bridge when it has multiple netdevs with different parent identifiers
beneath it.
Patch #2 adds a test case using two netdevsim instances.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test that an upper device of netdevs with different parent IDs can be
enslaved to a bridge.
The test fails without previous commit.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a netdev is enslaved to a bridge, its parent identifier is queried.
This is done so that packets that were already forwarded in hardware
will not be forwarded again by the bridge device between netdevs
belonging to the same hardware instance.
The operation fails when the netdev is an upper of netdevs with
different parent identifiers.
Instead of failing the enslavement, have dev_get_port_parent_id() return
'-EOPNOTSUPP' which will signal the bridge to skip the query operation.
Other callers of the function are not affected by this change.
Fixes: 7e1146e8c10c ("net: devlink: introduce devlink_compat_switch_id_get() helper")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Release first buffer as last one since it contains references
to subsequent fragments. This code will be optimized introducing
multi-buffer bit in xdp_buff structure.
Fixes: ca0e014609f05 ("net: mvneta: move skb build after descriptors processing")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Checking for the lack of epitems refering to the epoll we want to insert into
is not enough; we might have an insertion of that epoll into another one that
has already collected the set of files to recheck for excessive reverse paths,
but hasn't gotten to creating/inserting the epitem for it.
However, any such insertion in progress can be detected - it will update the
generation count in our epoll when it's done looking through it for files
to check. That gets done under ->mtx of our epoll and that allows us to
detect that safely.
We are *not* holding epmutex here, so the generation count is not stable.
However, since both the update of ep->gen by loop check and (later)
insertion into ->f_ep_link are done with ep->mtx held, we are fine -
the sequence is
grab epmutex
bump loop_check_gen
...
grab tep->mtx // 1
tep->gen = loop_check_gen
...
drop tep->mtx // 2
...
grab tep->mtx // 3
...
insert into ->f_ep_link
...
drop tep->mtx // 4
bump loop_check_gen
drop epmutex
and if the fastpath check in another thread happens for that
eventpoll, it can come
* before (1) - in that case fastpath is just fine
* after (4) - we'll see non-empty ->f_ep_link, slow path
taken
* between (2) and (3) - loop_check_gen is stable,
with ->mtx providing barriers and we end up taking slow path.
Note that ->f_ep_link emptiness check is slightly racy - we are protected
against insertions into that list, but removals can happen right under us.
Not a problem - in the worst case we'll end up taking a slow path for
no good reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
lpass_core_sc7180_probe() misses to call pm_clk_destroy() and
pm_runtime_disable() in error paths. Correct goto target to fix it.
This issue is found by code inspection.
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Link: https://lore.kernel.org/r/20200827141629.101802-1-jingxiangfeng@huawei.com
Fixes: edab812d802d ("clk: qcom: lpass: Add support for LPASS clock controller for SC7180")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Wait until the QDIO data connection is severed. Otherwise the device
might still be processing the buffers, and end up accessing skb data
that we already freed.
Fixes: 8b5026bc1693 ("s390/qeth: fix qdio teardown after early init error")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the weird space inside the NETIF_F_CSUM_MASK.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to
invalidate dst entries"), we use blackhole_netdev to invalidate dst entries
instead of loopback device anymore.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"Small bug fixes for:
- SMR drive fix
- infinite loop when building free node ids
- EOF at DIO read"
* tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: Return EOF on unaligned end of file DIO read
f2fs: fix indefinite loop scanning for free nid
f2fs: Fix type of section block count variables
|
|
There are a couple bugs here:
1) If opt[1] is zero then this results in a forever loop. If the value
is less than 2 then it is invalid.
2) It assumes that "len" is more than sizeof(valid_accm) or 6 which can
result in memory corruption.
In the case of LCP_OPTION_ACCM, then we should check "opt[1]" instead
of "len" because, if "opt[1]" is less than sizeof(valid_accm) then
"nak_len" gets out of sync and it can lead to memory corruption in the
next iterations through the loop. In case of LCP_OPTION_MAGIC, the
only valid value for opt[1] is 6, but the code is trying to log invalid
data so we should only discard the data when "len" is less than 6
because that leads to a read overflow.
Reported-by: ChenNan Of Chaitin Security Research Lab <whutchennan@gmail.com>
Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the micrel phy driver calls phy_init_hw() as a workaround,
the commit 9886a4dbd2aa ("net: phy: call phy_disable_interrupts()
in phy_init_hw()") disables the interrupt unexpectedly. So,
call phy_disable_interrupts() in phy_attach_direct() instead.
Otherwise, the phy cannot link up after the ethernet cable was
disconnected.
Note that other drivers (like at803x.c) also calls phy_init_hw().
So, perhaps, the driver caused a similar issue too.
Fixes: 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous change "hv_netvsc: Switch the data path at the right time
during hibernation" adds the call of netvsc_vf_changed() upon
NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message
when the VF is brought UP or DOWN.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet,
so in the future the host might sliently fail the call netvsc_vf_changed()
-> netvsc_switch_datapath() there, even if the call works now.
Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time
the mlx5 VF NIC has been resumed.
Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently there is concurrent reset and enqueue operation for the
same lockless qdisc when there is no lock to synchronize the
q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
qdisc_deactivate() called by dev_deactivate_queue(), which may cause
out-of-bounds access for priv->ring[] in hns3 driver if user has
requested a smaller queue num when __dev_xmit_skb() still enqueue a
skb with a larger queue_mapping after the corresponding qdisc is
reset, and call hns3_nic_net_xmit() with that skb later.
Reused the existing synchronize_net() in dev_deactivate_many() to
make sure skb with larger queue_mapping enqueued to old qdisc(which
is saved in dev_queue->qdisc_sleeping) will always be reset when
dev_reset_queue() is called.
Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode
property should be specified on port nodes. However, the microchip
drivers read it from the switch node.
Let the driver use the per-port property and fall back to the old
location with a warning.
Fix in-tree users.
Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to
use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context.
[ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498
[ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3
[ 280.209814] INFO: lockdep is turned off.
[ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146
[ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 280.209820] Workqueue: events mptcp_worker
[ 280.209822] Call Trace:
[ 280.209824] <IRQ>
[ 280.209826] dump_stack+0x77/0xa0
[ 280.209829] ___might_sleep.cold+0xa6/0xb6
[ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290
[ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410
[ 280.209840] subflow_init_req+0x1e9/0x2ea
[ 280.209843] ? inet_reqsk_alloc+0x1c/0x120
[ 280.209845] ? kmem_cache_alloc+0x264/0x290
[ 280.209849] tcp_conn_request+0x303/0xae0
[ 280.209854] ? printk+0x53/0x6a
[ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374
[ 280.209859] tcp_rcv_state_process+0x28f/0x1374
[ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0
[ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0
[ 280.209869] tcp_v4_rcv+0xed6/0xfa0
[ 280.209873] ip_protocol_deliver_rcu+0x28/0x270
[ 280.209875] ip_local_deliver_finish+0x89/0x120
[ 280.209877] ip_local_deliver+0x180/0x220
[ 280.209881] ip_rcv+0x166/0x210
[ 280.209885] __netif_receive_skb_one_core+0x82/0x90
[ 280.209888] process_backlog+0xd6/0x230
[ 280.209891] net_rx_action+0x13a/0x410
[ 280.209895] __do_softirq+0xcf/0x468
[ 280.209899] asm_call_on_stack+0x12/0x20
[ 280.209901] </IRQ>
[ 280.209903] ? ip_finish_output2+0x240/0x9a0
[ 280.209906] do_softirq_own_stack+0x4d/0x60
[ 280.209908] do_softirq.part.0+0x2b/0x60
[ 280.209911] __local_bh_enable_ip+0x9a/0xa0
[ 280.209913] ip_finish_output2+0x264/0x9a0
[ 280.209916] ? rcu_read_lock_held+0x4d/0x60
[ 280.209920] ? ip_output+0x7a/0x250
[ 280.209922] ip_output+0x7a/0x250
[ 280.209925] ? __ip_finish_output+0x330/0x330
[ 280.209928] __ip_queue_xmit+0x1dc/0x5a0
[ 280.209931] __tcp_transmit_skb+0xa0f/0xc70
[ 280.209937] tcp_connect+0xb03/0xff0
[ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190
[ 280.209942] ? ktime_get_with_offset+0x125/0x150
[ 280.209944] ? trace_hardirqs_on+0x1c/0xe0
[ 280.209948] tcp_v4_connect+0x449/0x550
[ 280.209953] __inet_stream_connect+0xbb/0x320
[ 280.209955] ? mark_held_locks+0x49/0x70
[ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190
[ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0
[ 280.209963] inet_stream_connect+0x32/0x50
[ 280.209966] __mptcp_subflow_connect+0x1fd/0x242
[ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600
[ 280.209975] mptcp_worker+0x543/0x7a0
[ 280.209980] process_one_work+0x26d/0x5b0
[ 280.209984] ? process_one_work+0x5b0/0x5b0
[ 280.209987] worker_thread+0x48/0x3d0
[ 280.209990] ? process_one_work+0x5b0/0x5b0
[ 280.209993] kthread+0x117/0x150
[ 280.209996] ? kthread_park+0x80/0x80
[ 280.209998] ret_from_fork+0x22/0x30
Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|