Age | Commit message (Collapse) | Author |
|
The init_kvm_*mmu functions, with the exception of shadow NPT,
do not need to know the full values of CR0/CR4/EFER; they only
need to know the bits that make up the "role". This cleanup
however will take quite a few incremental steps. As a start,
pull the common computation of the struct kvm_mmu_role_regs
into their caller: all of them extract the struct from the vcpu
as the very first step.
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
struct kvm_mmu_role_regs is computed just once and then accessed. Use
const to make this clearer, even though the const fields of struct
kvm_mmu_role_regs already prevent (or make it harder...) to modify
the contents of the struct.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The role.base.smm flag is always zero when setting up shadow EPT,
do not bother copying it over from vcpu->arch.root_mmu.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Clear enable_mmio_caching if hardware can't support MMIO caching and use
the dedicated flag to detect if MMIO caching is enabled instead of
assuming shadow_mmio_value==0 means MMIO caching is disabled. TDX will
use a zero value even when caching is enabled, and is_mmio_spte() isn't
so hot that it needs to avoid an extra memory access, i.e. there's no
reason to be super clever. And the clever approach may not even be more
performant, e.g. gcc-11 lands the extra check on a non-zero value inline,
but puts the enable_mmio_caching out-of-line, i.e. avoids the few extra
uops for non-MMIO SPTEs.
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220420002747.3287931-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When determining whether or not a SPTE needs to have SME/SEV's memory
encryption flag set, do the moderately expensive host MMIO pfn check if
and only if the memory encryption mask is non-zero.
Note, KVM could further optimize the host MMIO checks by making a single
call to kvm_is_mmio_pfn(), but the tdp_enabled path (for EPT's memtype
handling) will likely be split out to a separate flow[*]. At that point,
a better approach would be to shove the call to kvm_is_mmio_pfn() into
VMX code so that AMD+NPT without SME doesn't get hit with an unnecessary
lookup.
[*] https://lkml.kernel.org/r/20220321224358.1305530-3-bgardon@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220415004909.2216670-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TSC_AUX virtualization feature allows AMD SEV-ES guests to securely use
TSC_AUX (auxiliary time stamp counter data) in the RDTSCP and RDPID
instructions. The TSC_AUX value is set using the WRMSR instruction to the
TSC_AUX MSR (0xC0000103). It is read by the RDMSR, RDTSCP and RDPID
instructions. If the read/write of the TSC_AUX MSR is intercepted, then
RDTSCP and RDPID must also be intercepted when TSC_AUX virtualization
is present. However, the RDPID instruction can't be intercepted. This means
that when TSC_AUX virtualization is present, RDTSCP and TSC_AUX MSR
read/write must not be intercepted for SEV-ES (or SEV-SNP) guests.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <165040164424.1399644.13833277687385156344.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The TSC_AUX Virtualization feature allows AMD SEV-ES guests to securely use
TSC_AUX (auxiliary time stamp counter data) MSR in RDTSCP and RDPID
instructions.
The TSC_AUX MSR is typically initialized to APIC ID or another unique
identifier so that software can quickly associate returned TSC value
with the logical processor.
Add the feature bit and also include it in the kvm for detection.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Message-Id: <165040157111.1399644.6123821125319995316.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees.
The merge reconciles the ABI fixes for KVM_EXIT_SYSTEM_EVENT between
5.18 and commit c24a950ec7d6 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata
for SEV-ES", 2022-04-13).
|
|
Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down. A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level
down.
This reverts commit 13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <YmwIi3bXr/1yhYV/@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:
* Fix potential races when walking host page table
* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
* Fix shadow page table leak when KVM runs nested
|
|
KVM uses lookup_address_in_mm() to detect the hugepage size that the host
uses to map a pfn. The function suffers from several issues:
- no usage of READ_ONCE(*). This allows multiple dereference of the same
page table entry. The TOCTOU problem because of that may cause KVM to
incorrectly treat a newly generated leaf entry as a nonleaf one, and
dereference the content by using its pfn value.
- the information returned does not match what KVM needs; for non-present
entries it returns the level at which the walk was terminated, as long
as the entry is not 'none'. KVM needs level information of only 'present'
entries, otherwise it may regard a non-present PXE entry as a present
large page mapping.
- the function is not safe for mappings that can be torn down, because it
does not disable IRQs and because it returns a PTE pointer which is never
safe to dereference after the function returns.
So implement the logic for walking host page tables directly in KVM, and
stop using lookup_address_in_mm().
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220429031757.2042406-1-mizhang@google.com>
[Inline in host_pfn_mapping_level, ensure no semantic change for its
callers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused. Unfortunately this extensibility
mechanism has several issues:
- x86 is not writing the member, so it would not be possible to use it
on x86 except for new events
- the member is not aligned to 64 bits, so the definition of the
uAPI struct is incorrect for 32- on 64-bit userspace. This is a
problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
usage of flags was only introduced in 5.18.
Since padding has to be introduced, place a new field in there
that tells if the flags field is valid. To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid. The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.
To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's
MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR.
The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the
guest, possibly with help from userspace, manages to coerce KVM into
creating a SPTE for an "impossible" gfn, KVM will leak the associated
shadow pages (page tables):
WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57
kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
Modules linked in: kvm_intel kvm irqbypass
CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G W 5.18.0-rc1+ #293
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
Call Trace:
<TASK>
kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
kvm_destroy_vm+0x162/0x2d0 [kvm]
kvm_vm_release+0x1d/0x30 [kvm]
__fput+0x82/0x240
task_work_run+0x5b/0x90
exit_to_user_mode_prepare+0xd2/0xe0
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
On bare metal, encountering an impossible gpa in the page fault path is
well and truly impossible, barring CPU bugs, as the CPU will signal #PF
during the gva=>gpa translation (or a similar failure when stuffing a
physical address into e.g. the VMCS/VMCB). But if KVM is running as a VM
itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR
of the underlying hardware, in which case the hardware will not fault on
the illegal-from-KVM's-perspective gpa.
Alternatively, KVM could continue allowing the dodgy behavior and simply
zap the max possible range. But, for hosts with MAXPHYADDR < 52, that's
a (minor) waste of cycles, and more importantly, KVM can't reasonably
support impossible memslots when running on bare metal (or with an
accurate MAXPHYADDR as a VM). Note, limiting the overhead by checking if
KVM is running as a guest is not a safe option as the host isn't required
to announce itself to the guest in any way, e.g. doesn't need to set the
HYPERVISOR CPUID bit.
A second alternative to disallowing the memslot behavior would be to
disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR. That
restriction is undesirable as there are legitimate use cases for doing
so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous
systems so that VMs can be migrated between hosts with different
MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess.
Note that any guest.MAXPHYADDR is valid with shadow paging, and it is
even useful in order to test KVM with MAXPHYADDR=52 (i.e. without
any reserved physical address bits).
The now common kvm_mmu_max_gfn() is inclusive instead of exclusive.
The memslot and TDP MMU code want an exclusive value, but the name
implies the returned value is inclusive, and the MMIO path needs an
inclusive check.
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220428233416.2446833-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.18, take #2
- Take care of faults occuring between the PARange and
IPA range by injecting an exception
- Fix S2 faults taken from a host EL0 in protected mode
- Work around Oops caused by a PMU access from a 32bit
guest when PMU has been created. This is a temporary
bodge until we fix it for good.
|
|
For reasons best known to the author, gss-proxy does not implement a
NULL procedure, and returns RPC_PROC_UNAVAIL. However we still want to
ensure that we connect to the service at setup time.
So add a quirk-flag specially for this case.
Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
KASAN report null-ptr-deref as follows:
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:ipmi_unregister_smi+0x7d/0xd50 drivers/char/ipmi/ipmi_msghandler.c:3680
Call Trace:
ipmi_ipmb_remove+0x138/0x1a0 drivers/char/ipmi/ipmi_ipmb.c:443
ipmi_ipmb_probe+0x409/0xda1 drivers/char/ipmi/ipmi_ipmb.c:548
i2c_device_probe+0x959/0xac0 drivers/i2c/i2c-core-base.c:563
really_probe+0x3f3/0xa70 drivers/base/dd.c:541
In ipmi_ipmb_probe(), 'iidev->intf' is not set before
ipmi_register_smi() success. And in the error handling case,
ipmi_ipmb_remove() is called to release resources, ipmi_unregister_smi()
is called without check 'iidev->intf', this will cause KASAN
null-ptr-deref issue.
General kernel style is to allow NULL to be passed into unregister
calls, so fix it that way. This allows a NULL check to be removed in
other code.
Fixes: 57c9e3c9a374 ("ipmi:ipmi_ipmb: Unregister the SMI on remove")
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: stable@vger.kernel.org # v5.17+
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
|
A chunk was dropped when the code handling send messages was rewritten.
Those messages shouldn't be processed normally, they are just an
indication that the message was successfully sent and the timers should
be started for the real response that should be coming later.
Add back in the missing chunk to just discard the message and go on.
Fixes: 059747c245f0 ("ipmi: Add support for IPMB direct messages")
Reported-by: Joe Wiese <jwiese@rackspace.com>
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Joe Wiese <jwiese@rackspace.com>
|
|
Fix following coccicheck error:
./arch/arm64/kernel/process.c:322:2-23: alloc with no test, possible model on line 326
Here should be dst->thread.sve_state.
Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviwed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220426113054.630983-1-wanjiabing@vivo.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In case the DTB provided by the bootloader/BootROM is before the kernel
image or outside /memory, we won't be able to access it through the
linear mapping, and get a segfault on setup_arch(). Currently OpenSBI
relocates DTB but that's not always the case (e.g. if FW_JUMP_FDT_ADDR
is not specified), and it's also not the most portable approach since
the default FW_JUMP_FDT_ADDR of the generic platform relocates the DTB
at a specific offset that may not be available. To avoid this situation
copy DTB so that it's visible through the linear mapping.
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Link: https://lore.kernel.org/r/20220322132839.3653682-1-mick@ics.forth.gr
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: f105aa940e78 ("riscv: add BUILTIN_DTB support for MMU-enabled targets")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
ARM: tegra: Default configuration fixes for v5.18
This contains two updates to the default configuration needed because of
a Kconfig symbol name change. This fixes a failure that was detected in
the NVIDIA automated test farm.
* tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver
ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option
Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Only allow data field to be 0 in struct io_uring_rsrc_update user
arguments to allow for future possible usage.
Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.18, 2nd round:
- Fix one sparse warning on imx-weim driver.
- Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board.
- Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix
bd718xx-clk probe error.
* tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
ARM: dts: imx6ull-colibri: fix vqmmc regulator
bus: imx-weim: make symbol 'weim_of_notifier' static
Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
Fix return value in RSB bus driver
* tag 'sunxi-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
Link: https://lore.kernel.org/r/Ymbkd+/dDmRJz66w@kista.localdomain
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Fix the discrepancy between the two places we check for the CP0 counter
erratum in along with the incorrect comparison of the R4400 revision
number against 0x30 which matches none and consistently consider all
R4000 and R4400 processors affected, as documented in processor errata
publications[1][2][3], following the mapping between CP0 PRId register
values and processor models:
PRId | Processor Model
---------+--------------------
00000422 | R4000 Revision 2.2
00000430 | R4000 Revision 3.0
00000440 | R4400 Revision 1.0
00000450 | R4400 Revision 2.0
00000460 | R4400 Revision 3.0
No other revision of either processor has ever been spotted.
Contrary to what has been stated in commit ce202cbb9e0b ("[MIPS] Assume
R4000/R4400 newer than 3.0 don't have the mfc0 count bug") marking the
CP0 counter as buggy does not preclude it from being used as either a
clock event or a clock source device. It just cannot be used as both at
a time, because in that case clock event interrupts will be occasionally
lost, and the use as a clock event device takes precedence.
Compare against 0x4ff in `can_use_mips_counter' so that a single machine
instruction is produced.
References:
[1] "MIPS R4000PC/SC Errata, Processor Revision 2.2 and 3.0", MIPS
Technologies Inc., May 10, 1994, Erratum 53, p.13
[2] "MIPS R4400PC/SC Errata, Processor Revision 1.0", MIPS Technologies
Inc., February 9, 1994, Erratum 21, p.4
[3] "MIPS R4400PC/SC Errata, Processor Revision 2.0 & 3.0", MIPS
Technologies Inc., January 24, 1995, Erratum 14, p.3
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: ce202cbb9e0b ("[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug")
Cc: stable@vger.kernel.org # v2.6.24+
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
People are occasionally reporting a warning bfqq_request_over_limit()
triggering reporting that BFQ's idea of cgroup hierarchy (and its depth)
does not match what generic blkcg code thinks. This can actually happen
when bfqq gets moved between BFQ groups while bfqq_request_over_limit()
is running. Make sure the code is safe against BFQ queue being moved to
a different BFQ group.
Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup")
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When the rpcbind server closes the socket, we need to ensure that the
socket is closed by the kernel as soon as feasible, so add a
sk_state_change callback to trigger this close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.
This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.
Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.
Fixes: 809f9267bbab ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
|
|
The previous split budget between TX and RX made it return not using
the entire budget but at the same time not having calling called
napi_complete. This sometimes led to the poll to not be called, and at
the same time having TX and RX interrupts disabled resulting in the
driver getting stuck.
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-4-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
needs
The systemid property was checked for in the wrong place of the device
tree and compared to the wrong value.
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-3-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Use the device of the device tree node should be rather than the
device of the struct net_device when allocating DMA buffers.
The driver got away with it on sparc32 until commit 53b7670e5735
("sparc: factor the dma coherent mapping into helper") after which the
driver oopses.
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-2-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
There are deadlocks caused by del_timer_sync(&priv->hang_timer) and
del_timer_sync(&priv->rr_timer) in grcan_close(), one of the deadlocks
are shown below:
(Thread 1) | (Thread 2)
| grcan_reset_timer()
grcan_close() | mod_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | grcan_initiate_running_reset()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold priv->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler also need
priv->lock in position (2) of thread 2. As a result, grcan_close()
will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain the
needed lock.
Link: https://lore.kernel.org/all/20220425042400.66517-1-duoming@zju.edu.cn
Fixes: 6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Cc: stable@vger.kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
As a carry over from the CAN_RAW socket (which allows to change the CAN
interface while mantaining the filter setup) the re-binding of the
CAN_ISOTP socket needs to take care about CAN ID address information and
subscriptions. It turned out that this feature is so limited (e.g. the
sockopts remain fix) that it finally has never been needed/used.
In opposite to the stateless CAN_RAW socket the switching of the CAN ID
subscriptions might additionally lead to an interrupted ongoing PDU
reception. So better remove this unneeded complexity.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220422082337.1676-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The following VM_BUG_ON_FOLIO() is triggered when memory error event
happens on the (thp/folio) pages which are about to be freed:
[ 1160.232771] page:00000000b36a8a0f refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x16a000
[ 1160.236916] page:00000000b36a8a0f refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x16a000
[ 1160.240684] flags: 0x57ffffc0800000(hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 1160.243458] raw: 0057ffffc0800000 dead000000000100 dead000000000122 0000000000000000
[ 1160.246268] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[ 1160.249197] page dumped because: VM_BUG_ON_FOLIO(!folio_test_large(folio))
[ 1160.251815] ------------[ cut here ]------------
[ 1160.253438] kernel BUG at include/linux/mm.h:788!
[ 1160.256162] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 1160.258172] CPU: 2 PID: 115368 Comm: mceinj.sh Tainted: G E 5.18.0-rc1-v5.18-rc1-220404-2353-005-g83111+ #3
[ 1160.262049] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[ 1160.265103] RIP: 0010:dump_page.cold+0x27e/0x2bd
[ 1160.266757] Code: fe ff ff 48 c7 c6 81 f1 5a 98 e9 4c fe ff ff 48 c7 c6 a1 95 59 98 e9 40 fe ff ff 48 c7 c6 50 bf 5a 98 48 89 ef e8 9d 04 6d ff <0f> 0b 41 f7 c4 ff 0f 00 00 0f 85 9f fd ff ff 49 8b 04 24 a9 00 00
[ 1160.273180] RSP: 0018:ffffaa2c4d59fd18 EFLAGS: 00010292
[ 1160.274969] RAX: 000000000000003e RBX: 0000000000000001 RCX: 0000000000000000
[ 1160.277263] RDX: 0000000000000001 RSI: ffffffff985995a1 RDI: 00000000ffffffff
[ 1160.279571] RBP: ffffdc9c45a80000 R08: 0000000000000000 R09: 00000000ffffdfff
[ 1160.281794] R10: ffffaa2c4d59fb08 R11: ffffffff98940d08 R12: ffffdc9c45a80000
[ 1160.283920] R13: ffffffff985b6f94 R14: 0000000000000000 R15: ffffdc9c45a80000
[ 1160.286641] FS: 00007eff54ce1740(0000) GS:ffff99c67bd00000(0000) knlGS:0000000000000000
[ 1160.289498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1160.291106] CR2: 00005628381a5f68 CR3: 0000000104712003 CR4: 0000000000170ee0
[ 1160.293031] Call Trace:
[ 1160.293724] <TASK>
[ 1160.294334] get_hwpoison_page+0x47d/0x570
[ 1160.295474] memory_failure+0x106/0xaa0
[ 1160.296474] ? security_capable+0x36/0x50
[ 1160.297524] hard_offline_page_store+0x43/0x80
[ 1160.298684] kernfs_fop_write_iter+0x11c/0x1b0
[ 1160.299829] new_sync_write+0xf9/0x160
[ 1160.300810] vfs_write+0x209/0x290
[ 1160.301835] ksys_write+0x4f/0xc0
[ 1160.302718] do_syscall_64+0x3b/0x90
[ 1160.303664] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1160.304981] RIP: 0033:0x7eff54b018b7
As shown in the RIP address, this VM_BUG_ON in folio_entire_mapcount() is
called from dump_page("hwpoison: unhandlable page") in get_any_page().
The below explains the mechanism of the race:
CPU 0 CPU 1
memory_failure
get_hwpoison_page
get_any_page
dump_page
compound = PageCompound
free_pages_prepare
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP
folio_entire_mapcount
VM_BUG_ON_FOLIO(!folio_test_large(folio))
So replace dump_page() with safer one, pr_err().
Link: https://lkml.kernel.org/r/20220427053220.719866-1-naoya.horiguchi@linux.dev
Fixes: 74e8ee4708a8 ("mm: Turn head_compound_mapcount() into folio_entire_mapcount()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Kernel panic when injecting memory_failure for the global huge_zero_page,
when CONFIG_DEBUG_VM is enabled, as follows.
Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2499!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
RIP: 0010:split_huge_page_to_list+0x66a/0x880
Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
try_to_split_thp_page+0x3a/0x130
memory_failure+0x128/0x800
madvise_inject_error.cold+0x8b/0xa1
__x64_sys_madvise+0x54/0x60
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc3754f8bf9
Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
We think that raising BUG is overkilling for splitting huge_zero_page, the
huge_zero_page can't be met from normal paths other than memory failure,
but memory failure is a valid caller. So we tend to replace the BUG to
WARN + returning -EBUSY, and thus the panic above won't happen again.
Link: https://lkml.kernel.org/r/f35f8b97377d5d3ede1bc5ac3114da888c57cbce.1651052574.git.xuyu@linux.alibaba.com
Fixes: d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/memory-failure: rework fix on huge_zero_page splitting".
This patch (of 2):
This reverts commit d173d5417fb67411e623d394aab986d847e47dad.
The commit d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in
memory_failure()") explicitly skips huge_zero_page in memory_failure(), in
order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
split_huge_page_to_list().
This works, but Yang Shi thinks that,
Raising BUG is overkilling for splitting huge_zero_page. The
huge_zero_page can't be met from normal paths other than memory
failure, but memory failure is a valid caller. So I tend to replace
the BUG to WARN + returning -EBUSY. If we don't care about the
reason code in memory failure, we don't have to touch memory
failure.
And for the issue that huge_zero_page will be set PG_has_hwpoisoned,
Yang Shi comments that,
The anonymous page fault doesn't check if the page is poisoned or
not since it typically gets a fresh allocated page and assumes the
poisoned page (isolated successfully) can't be reallocated again.
But huge zero page and base zero page are reused every time. So no
matter what fix we pick, the issue is always there.
Finally, Yang, David, Anshuman and Naoya all agree to fix the bug, i.e.,
to split huge_zero_page, in split_huge_page_to_list().
This reverts the commit d173d5417fb6 ("mm/memory-failure.c: skip
huge_zero_page in memory_failure()"), and the original bug will be fixed
by the next patch.
Link: https://lkml.kernel.org/r/872cefb182ba1dd686b0e7db1e6b2ebe5a4fff87.1651039624.git.xuyu@linux.alibaba.com
Fixes: d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
(cherry picked from commit 1f682dc9fb3790aa7ec27d3d122ff32b1eda1365 in wireless-next)
Currently ath11k will wait 11d scan complete while add interface in
ath11k_mac_op_add_interface(), when system resume without enable
wowlan, ath11k_mac_op_add_interface() is called for each resume, thus
it increase the resume time of system. And ath11k_mac_op_hw_scan()
after ath11k_mac_op_add_interface() also needs some time cost because
the previous 11d scan need more than 5 seconds when 6 GHz is enabled,
then the scan started event will indicated to ath11k after the 11d
scan completed.
While 11d scan/hw scan is running in firmware, if ath11k update channel
list to firmware by WMI_SCAN_CHAN_LIST_CMDID, then firmware will cancel
the current scan which is running, it lead the scan failed. The patch
commit 9dcf6808b253 ("ath11k: add 11d scan offload support") used
finish_11d_scan/finish_11d_ch_list/pending_11d to synchronize the 11d
scan/hw scan/channel list between ath11k/firmware/mac80211 and to avoid
the scan fail.
Add wait operation before ath11k update channel list, function
ath11k_reg_update_chan_list() will wait until the current 11d scan/hw
scan completed. And remove the wait operation of start 11d scan and
waiting channel list complete in hw scan. After these changes, resume
time cost reduce about 5 seconds and also hw scan time cost reduced
obviously, and scan failed not seen.
The 11d scan is sent to firmware only one time for each interface added
in mac.c, and it is moved after the 1st hw scan because 11d scan will
cost some time and thus leads the AP scan result update to UI delay.
Currently priority of ath11k's hw scan is WMI_SCAN_PRIORITY_LOW, and
priority of 11d scan in firmware is WMI_SCAN_PRIORITY_MEDIUM, then the
11d scan which sent after hw scan will cancel the hw scan in firmware,
so change the priority to WMI_SCAN_PRIORITY_MEDIUM for the hw scan which
is in front of the 11d scan, thus it will not happen scan cancel in
firmware.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: 9dcf6808b253 ("ath11k: add 11d scan offload support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215777
Cc: <stable@vger.kernel.org>
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220328035832.14122-1-quic_wgong@quicinc.com
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220427111619.9758-1-kvalo@kernel.org
|
|
Pull drm fixes from Dave Airlie:
"Another relatively quiet week, amdgpu leads the way, some i915 display
fixes, and a single sunxi fix.
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
i915:
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
sunxi:
- Single fix removing applying PHYS_OFFSET twice"
* tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
drm/amd/pm: fix the deadlock issue observed on SI
drm/amd/display: Fix memory leak in dcn21_clock_source_create
drm/amdgpu: don't runtime suspend if there are displays attached (v3)
drm/amdkfd: CRIU add support for GWS queues
drm/amdkfd: Fix GWS queue count
drm/sun4i: Remove obsolete references to PHYS_OFFSET
drm/i915/fbc: Consult hw.crtc instead of uapi.crtc
drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
drm/i915: Check EDID for HDR static metadata when choosing blc
drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.18-2022-04-27:
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.18-rc5:
- Single fix removing applying PHYS_OFFSET twice in sunxi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
|
|
Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are
used in tracing to retrieve timestamps, so they should not recurse.
Fixes: 4498e7467e9e ("time: Parametrize all tk_fast_mono users")
Fixes: f09cb9a1808e ("time: Introduce tk_fast_raw")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220426175338.3807ca4f@gandalf.local.home/
Link: https://lore.kernel.org/r/20220428062432.61063-1-kurt@linutronix.de
|
|
Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.
With the necessary symbols now in kallsyms we can symbolize nVHE
addresses using the %p print format specifier:
[ 98.916444][ T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-7-kaleshsingh@google.com
|
|
The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are aligned such that any valid stack address has PAGE_SHIFT bit as 1.
This allows us to conveniently check for overflow in the exception entry
without corrupting any GPRs. We won't recover from a stack overflow so
panic the hypervisor.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-6-kaleshsingh@google.com
|
|
Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
so that any valid stack address has PAGE_SHIFT bit as 1 - this is used
for overflow detection (implemented in a subsequent patch in the series)
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-5-kaleshsingh@google.com
|
|
Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
so that any valid stack address has PAGE_SHIFT bit as 1 - this is used
for overflow detection (implemented in a subsequent patch in the series).
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-4-kaleshsingh@google.com
|
|
pkvm_hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the pKVM nVHE hypervisor. Allocations are aligned based on the order of
the requested size.
This will be used to implement stack guard pages for pKVM nVHE hypervisor
(in a subsequent patch in the series).
Credits to Quentin Perret <qperret@google.com> for the idea of moving
private VA allocation out of __pkvm_create_private_mapping()
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-3-kaleshsingh@google.com
|
|
hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the nVHE hypervisor. Allocations are aligned based on the order of
the requested size.
This will be used to implement stack guard pages for KVM nVHE hypervisor
(nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-2-kaleshsingh@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
|
|
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes: 3e08773c3841 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
|