Invalidate all MMUs' roles after a CPUID update to force reinitialization
of the MMU context/helpers. Despite the efforts of commit de3ccd26fafc
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role. E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.
The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.
Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...
Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint. The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.
In other words, resetting the MMU after a CPUID update is largely a
superficial fix. But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported. For cases where KVM botches guest behavior, the damage is
limited to that guest. But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.
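For illustration, a minimal sketch of the approach (field and function
names follow KVM's existing MMU role plumbing and are assumed here, not
quoted from the patch):

    static void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
    {
        /*
         * Invalidate every cached role so the next reset is guaranteed
         * to rebuild each MMU from the new CPUID state, even if the
         * recomputed role happens to match the stale one.
         */
        vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
        vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
        vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
        kvm_mmu_reset_context(vcpu);
    }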
Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
virtualization. When KVM (L0) is using TDP, CR4.LA57 is not reflected in
mmu_role.base.level because that tracks the shadow root level, i.e. TDP
level. Normally, this is not an issue because LA57 can't be toggled
while long mode is active, i.e. the guest has to first disable paging,
then toggle LA57, then re-enable paging, thus ensuring an MMU
reinitialization.
But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
without having to bounce through an unpaged section. L1 can also load a
new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
single entry PML5 is sufficient. Such shenanigans are only problematic
if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
Note, in the L2 case with nested TDP, even though L1 can switch between
L2s with different LA57 settings, thus bypassing the paging requirement,
in that case KVM's nested_mmu will track LA57 in base.level.
This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea.
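An illustrative fragment of the restored tracking (the cr4_la57 field
comes from the reverted commit; the surrounding code is paraphrased):

    union kvm_mmu_extended_role ext = {0};

    /*
     * Capture LA57 in the extended role: with TDP, base.level holds the
     * TDP root level rather than the guest's paging level, so a CR4.LA57
     * toggle on VM-Exit would otherwise leave the role unchanged and
     * skip MMU reinitialization.
     */
    ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);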
Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
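Roughly, the fetch error code should be derived from the MMU being
walked rather than from current vCPU state; a sketch, assuming the
extended role's cr4_smep bit:

    /*
     * When walking L1's NPT with L2 active, mmu reflects L1's paging
     * configuration, whereas vcpu->arch reflects L2.
     */
    if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
        errcode |= PFERR_FETCH_MASK;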
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT. Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.
This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX. Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG. VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.
In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c468ed ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.
In SVM, commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset") forced
an MMU reset on every INIT to work around the flaw in common x86. Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.
The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.
Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.
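A sketch of the resulting common-x86 check (placement and variable
naming assumed; this is not the literal patch):

    unsigned long old_cr0 = kvm_read_cr0(vcpu);

    /*
     * Runs after the architectural register state has been reset:
     * CR0.PG is unconditionally cleared by INIT, and CR0.WP plus all
     * CR4/EFER paging bits are ignored while CR0.PG=0, so checking the
     * pre-INIT value of CR0.PG alone is sufficient.
     */
    if (old_cr0 & X86_CR0_PG)
        kvm_mmu_reset_context(vcpu);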
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
NX bit for huge SPTEs if the iTLB multi-hit mitigation is enabled.
Checking the mitigation itself is not sufficient as it can be toggled on
at any time and KVM doesn't reset MMU contexts when that happens. KVM
could reset the contexts, but that would require purging all SPTEs in all
MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is
disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
for shadow MMUs.
Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove a misguided WARN that attempts to detect the scenario where using
a special A/D tracking flag will set reserved bits on a non-MMIO spte.
The WARN triggers false positives when using EPT with 32-bit KVM because
of the !64-bit clause, which is just flat out wrong. The whole A/D
tracking goo is specific to EPT, and one of the big selling points of EPT
is that EPT is decoupled from the host's native paging mode.
Drop the WARN instead of trying to salvage the check. Keeping a check
specific to A/D tracking bits would essentially regurgitate the same code
that led to KVM needing the tracking bits in the first place.
A better approach would be to add a generic WARN on reserved bits being
set, which would naturally cover the A/D tracking bits, work for all
flavors of paging, and be self-documenting to some extent.
Fixes: 8a406c89532c ("KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To remove code duplication, use the binary stats descriptors in the
implementation of the debugfs interface for statistics. This unifies
the definition of statistics for the binary and debugfs interfaces.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-8-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add selftest to check KVM stats descriptors validity.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-7-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This new API provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
The statistics descriptors are defined by KVM in the kernel and can be
used by userspace to discover VM/VCPU statistics during the one-time
setup stage.
The statistics data itself can then be read out periodically by the
userspace telemetry application without any extra parsing or setup
effort.
There are a few existing interface protocols and definitions, but none
of them fulfils all the requirements this interface addresses:
1. During high-frequency periodic stats reading, there should be no
overhead beyond the stats data read itself.
2. Support stats annotation, like type (cumulative, instantaneous,
peak, histogram, etc) and unit (counter, time, size, cycles, etc).
3. Reading the stats data should be free of locks/synchronization.
Consistency between all the stats is not required, since not all of
them can be read at exactly the same time anyway; what matters is the
change or trend of the data. The lock-free approach is not just about
efficiency and scalability, but also about accuracy and usability of
the stats data: if all stats readings were protected by a global lock
and one VCPU died with that lock held, every stats read would block
and the stats themselves could no longer tell us which VCPU had died.
4. The stats reading workload can be handed over to an unprivileged
process.
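As a usage sketch, a self-contained userspace reader might look like the
following (assuming <linux/kvm.h> exports KVM_GET_STATS_FD and the
kvm_stats_header/kvm_stats_desc layouts added by this series; error
handling is minimal):

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Dump every stat exposed by a KVM VM or VCPU file descriptor. */
    static void dump_kvm_stats(int kvm_obj_fd)
    {
        struct kvm_stats_header hdr;
        size_t desc_sz;
        char *descs;
        uint32_t i;
        int stats_fd;

        stats_fd = ioctl(kvm_obj_fd, KVM_GET_STATS_FD, NULL);
        if (stats_fd < 0)
            return;
        if (pread(stats_fd, &hdr, sizeof(hdr), 0) != sizeof(hdr))
            goto out;

        /* One-time setup: each descriptor is sizeof(desc) + name_size bytes. */
        desc_sz = sizeof(struct kvm_stats_desc) + hdr.name_size;
        descs = malloc(desc_sz * hdr.num_desc);
        if (!descs ||
            pread(stats_fd, descs, desc_sz * hdr.num_desc, hdr.desc_offset) < 0)
            goto out_free;

        /* Periodic pull: only the data block needs to be re-read. */
        for (i = 0; i < hdr.num_desc; i++) {
            struct kvm_stats_desc *d = (void *)(descs + i * desc_sz);
            uint64_t val = 0;

            /* First value only; d->size gives the number of u64 elements. */
            if (pread(stats_fd, &val, sizeof(val),
                      hdr.data_offset + d->offset) == sizeof(val))
                printf("%s: %llu\n", d->name, (unsigned long long)val);
        }
    out_free:
        free(descs);
    out:
        close(stats_fd);
    }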
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-6-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a VCPU ioctl to get a statistics file descriptor by which a read
functionality is provided for userspace to read out VCPU stats header,
descriptors and data.
Define VCPU statistics descriptors and header for all architectures.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-5-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a VM ioctl to get a statistics file descriptor by which a read
functionality is provided for userspace to read out VM stats header,
descriptors and data.
Define VM statistics descriptors and header for all architectures.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-4-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull drm fixes from Dave Airlie:
"This is a bit bigger than I'd like at this stage, and I guess last
week was extra quiet, but it's mostly one fix across three drivers to
wait for buffer move pinning to complete.
There was one locking change that got reverted so it's just noise.
Otherwise the amdgpu/nouveau changes are for known regressions, and
otherwise it's just misc changes in kmb/atmel/vc4 drivers.
Summary:
core:
- auth locking change + brown paper bag revert
radeon/nouveau/amdgpu/ttm:
- wait for BO to be pinned after moving it (same fix in three
drivers)
amdgpu:
- Revert GFX9/10 doorbell fixes, we just end up trading one bug for
another
- Potential memory corruption fix in framebuffer handling
nouveau:
- fix regression checking dma addresses
kmb:
- error return fix
atmel-hlcdc:
- fix kernel warnings at boot
- enable async flips
vc4:
- fix CPU hang due to power management"
* tag 'drm-fixes-2021-06-25' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau: fix dma_address check for CPU/GPU sync
drm/kmb: Fix error return code in kmb_hw_init()
drm/amdgpu: wait for moving fence after pinning
drm/radeon: wait for moving fence after pinning
drm/nouveau: wait for moving fence after pinning v2
Revert "drm: add a locked version of drm_is_current_master"
Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."
Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell."
drm/amdgpu: Call drm_framebuffer_init last for framebuffer init
drm: add a locked version of drm_is_current_master
drm/atmel-hlcdc: Allow async page flips
drm/panel: ld9040: reference spi_device_id table
drm: atmel_hlcdc: Enable the crtc vblank prior to crtc usage.
drm/vc4: hdmi: Make sure the controller is powered in detect
drm/vc4: hdmi: Move the HSM clock enable to runtime_pm
|
|
The direction of the pipe argument must match the request-type direction
bit or control requests may fail depending on the host-controller-driver
implementation.
Control transfers without a data stage are treated as OUT requests by
the USB stack and should be using usb_sndctrlpipe(). Failing to do so
will now trigger a warning.
Fix the OSIFI2C_SET_BIT_RATE and OSIFI2C_STOP requests which erroneously
used the osif_usb_read() helper and set the IN direction bit.
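An illustrative corrected call for such a data-less (OUT) request, with
the device, request and timeout parameters as placeholders:

    /*
     * No data stage means an OUT transfer: the pipe direction must match
     * the USB_DIR_OUT bit in the request type.
     */
    ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), cmd,
                          USB_TYPE_VENDOR | USB_RECIP_INTERFACE | USB_DIR_OUT,
                          value, index, NULL, 0, USB_CTRL_SET_TIMEOUT);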
Reported-by: syzbot+9d7dadd15b8819d73f41@syzkaller.appspotmail.com
Fixes: 83e53a8f120f ("i2c: Add bus driver for for OSIF USB i2c device.")
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A DMA address check for nouveau, an error code return fix for kmb, fixes
to wait for a moving fence after pinning the BO for amdgpu, nouveau and
radeon, a crtc and async page flip fix for atmel-hlcdc and a cpu hang
fix for vc4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624190353.wyizoil3wqrrxz5d@gilmour
|
|
Fix Sparse warnings:
drivers/i2c/i2c-dev.c:546:19: warning: incorrect type in assignment (different address spaces)
drivers/i2c/i2c-dev.c:549:53: warning: incorrect type in argument 2 (different address spaces)
compat_ptr() returns a pointer tagged __user which gets assigned to a
pointer missing the __user annotation. The same pointer is passed to
copy_from_user() as an argument where it is expected to have the __user
annotation. Fix both by adding the __user annotation to the pointer.
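The pattern, roughly (function and variable names are illustrative):

    static int example_compat_copy(compat_uptr_t uptr32, void *dst, size_t len)
    {
        /*
         * compat_ptr() returns a __user-tagged pointer; the local variable
         * must carry the same annotation so that passing it on to
         * copy_from_user() type-checks cleanly under sparse.
         */
        void __user *datap = compat_ptr(uptr32);

        return copy_from_user(dst, datap, len) ? -EFAULT : 0;
    }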
Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()")
Signed-off-by: Andreas Hecht <andreas.e.hecht@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Commit 61ca49a9105f ("libceph: don't set global_id until we get an
auth ticket") delayed the setting of global_id too much. It is set
only after all tickets are received, but in pre-nautilus clusters an
auth ticket and the service tickets are obtained in separate steps
(for a total of three MAuth replies). When the service tickets are
requested, global_id is used to build an authorizer; if global_id is
still 0 we never get them and fail to establish the session.
Move the setting of global_id into protocol implementations. This
way global_id can be set exactly when an auth ticket is received, not
sooner nor later.
Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
There is no result to pass in the msgr2 case because authentication
failures are reported through the auth_bad_method frame, and in the
MAuth case an error is returned immediately.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
There is an assignment of ancillary->mode to itself, which looks
dubious since the preceding comment states that the speed and mode
are taken over from the SPI main device, indicating that
ancillary->mode should be assigned the value of spi->mode.
Fix this.
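That is, the assignment should read:

    /* take over the SPI mode (and speed) from the main device */
    ancillary->mode = spi->mode;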
Addresses-Coverity: ("Self assignment")
Fixes: 0c79378c0199 ("spi: add ancillary device support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210623172300.161484-1-colin.king@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"Use memcpy_to/fromio for dram-access-quirk in the meson-gx host
driver"
* tag 'mmc-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sigqueue cache fix from Ingo Molnar:
"Fix a memory leak in the recently introduced sigqueue cache"
* tag 'core-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signal: Prevent sigqueue caching after task got released
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"A last minute cgroup bandwidth scheduling fix for a recently
introduced logic fail which triggered a kernel warning by LTP's
cfs_bandwidth01 test"
* tag 'sched-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Ensure that the CFS parent is added after unthrottling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Ingo Molnar:
"An LBR buffer fix for code that probably only worked accidentally"
* tag 'perf-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/lbr: Zero the xstate buffer on allocation
|
|
It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc.
family of APIs, which take a reference to the page, raising its
refcount from 0 to 1.
When the reference is dropped, this will free the page incorrectly.
Fix this by only taking a reference on valid pages if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).
This addresses CVE-2021-22543.
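A sketch of the check, with a hypothetical helper name:

    static bool kvm_try_get_pfn(kvm_pfn_t pfn)
    {
        if (kvm_is_reserved_pfn(pfn))
            return true;

        /*
         * Only elevate the refcount if it is already non-zero, i.e. the
         * page participates in normal refcounting and can be released
         * with put_page().  Tail pages of non-compound higher-order
         * allocations start out at zero and must not be pinned this way.
         */
        return get_page_unless_zero(pfn_to_page(pfn));
    }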
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This commit defines the API for userspace and prepares the common
functionality to support per-VM/VCPU binary stats data readings.
KVM stats are currently only accessible through debugfs, which has some
shortcomings this series is meant to fix:
1. The current debugfs stats solution in KVM can be disabled
when kernel Lockdown mode is enabled, which is a potential
risk for production.
2. The current debugfs stats solution in KVM is organized as "one
stat per file"; it is good for debugging, but not efficient
for production.
3. Stats read/clear in the current debugfs solution in KVM are
protected by the global kvm_lock.
Besides that, there are some other benefits with this change:
1. All KVM VM/VCPU stats can be read out in bulk with one copy
to userspace.
2. A schema is used to describe KVM statistics. From userspace's
perspective, the KVM statistics are self-describing.
3. With the fd-based solution, a separate telemetry process can
read KVM stats in a less privileged environment.
4. After the initial setup of reading in the stats descriptors, a
telemetry process only needs to read the stats data itself; no
further parsing or setup is needed.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com> #arm64
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-3-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Generic KVM stats are those collected in architecture independent code
or those supported by all architectures; put all generic statistics in
a separate structure. This ensures that they are defined the same way
in the statistics API which is being added, removing duplication among
different architectures in the declaration of the descriptors.
No functional change intended.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Treat a NULL shadow page in the "is a TDP MMU" check as valid, non-TDP
root. KVM uses a "direct" PAE paging MMU when TDP is disabled and the
guest is running with paging disabled. In that case, root_hpa points at
the pae_root page (of which only 32 bytes are used), not a standard
shadow page, and the WARN fires (a lot).
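A sketch of the adjusted check, treating a NULL shadow page as a valid
non-TDP root (helper names as in KVM's MMU internals, usage assumed):

    struct kvm_mmu_page *sp = to_shadow_page(mmu->root_hpa);

    /*
     * With TDP disabled and an unpaged guest, root_hpa points at the
     * special pae_root page rather than a shadow page, so a NULL result
     * simply means "not a TDP MMU root".
     */
    return sp && is_tdp_mmu_page(sp) && sp->root_count;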
Fixes: 0b873fd7fb53 ("KVM: x86/mmu: Remove redundant is_tdp_mmu_enabled check")
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622072454.3449146-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add an x86-only test to verify that x86's MMU reacts to CPUID updates
that impact the MMU. KVM has had multiple bugs where it fails to
reconfigure the MMU after the guest's vCPU model changes.
Sadly, this test is effectively limited to shadow paging because the
hardware page walk handler doesn't support software disabling of GBPAGES
support, and KVM doesn't manually walk the GVA->GPA on faults for
performance reasons (doing so would largely defeat the benefits of TDP).
Don't require !TDP for the tests as there is still value in running the
tests with TDP, even though the tests will fail (barring KVM hacks).
E.g. KVM should not completely explode if MAXPHYADDR results in KVM using
4-level vs. 5-level paging for the guest.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add x86-64 hugepage support in the form of a x86-only variant of
virt_pg_map() that takes an explicit page size. To keep things simple,
follow the existing logic for 4k pages and disallow creating a hugepage
if the upper-level entry is present, even if the desired pfn matches.
Opportunistically fix a double "beyond beyond" reported by checkpatch.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In preparation for adding hugepage support, replace "pageMapL4Entry",
"pageDirectoryPointerEntry", and "pageDirectoryEntry" with a common
"pageUpperEntry", and add a helper to create an upper level entry. All
upper level entries have the same layout, using unique structs provides
minimal value and requires a non-trivial amount of code duplication.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a helper to retrieve a PTE pointer given a PFN, address, and level
in preparation for adding hugepage support.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rename the "address" field to "pfn" in x86's page table structs to match
reality.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a helper to allocate a page for use in constructing the guest's page
tables. All architectures have identical address and memslot
requirements (which appear to be arbitrary anyways).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the EPTP memslot param from all EPT helpers and shove the hardcoded
'0' down to the vm_phy_page_alloc() calls.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the memslot param from virt_pg_map() and virt_map() and shove the
hardcoded '0' down to the vm_phy_page_alloc() calls.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622200529.3650424-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the memslot param(s) from vm_vaddr_alloc() now that all callers
directly specify '0' as the memslot. Drop the memslot param from
virt_pgd_alloc() as well since vm_vaddr_alloc() is its only user.
I.e. shove the hardcoded '0' down to the vm_phy_pages_alloc() calls.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Address a number of objtool warnings that got reported.
No change in behavior intended, but code generation might be impacted
by commit 1f008d46f124 ("x86: Always inline task_size_max()")"
* tag 'objtool-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Improve noinstr vs errors
x86: Always inline task_size_max()
x86/xen: Fix noinstr fail in exc_xen_unknown_trap()
x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall()
x86/entry: Fix noinstr fail in __do_fast_syscall_32()
objtool/x86: Ignore __x86_indirect_alt_* symbols
|
|
Support the set_trips() callback of the thermal device ops. This allows a
HWMON device to promptly notify the thermal core about temperature
changes, which is very handy to have where the HWMON sensor is used by a
CPU thermal zone that performs passive cooling and emergency shutdown on
overheat. The thermal core will be able to react faster to temperature
changes.
The set_trips() callback is entirely optional. If the HWMON sensor
doesn't support setting thermal trips, then the callback is a no-op; the
dummy callback has no effect on the thermal core. The temperature trips
either complement the temperature polling mechanism of the thermal core
or replace the polling if the sensor can set trips and polling is
disabled for a particular device in the device tree.
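One possible shape for the hwmon core wiring, shown here with a no-op
trip setter (names are modeled on the existing hwmon thermal glue and
not taken from this patch):

    static int hwmon_thermal_set_trips(void *data, int low, int high)
    {
        /*
         * A sensor without programmable trip points simply accepts the
         * requested window; the thermal core then keeps (or falls back
         * to) temperature polling.
         */
        return 0;
    }

    static const struct thermal_zone_of_device_ops hwmon_thermal_ops = {
        .get_temp  = hwmon_thermal_get_temp,
        .set_trips = hwmon_thermal_set_trips,
    };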
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20210623042231.16008-3-digetx@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
The min/max/crit and all other temperature values that are passed to
the driver are unlimited and value that is close to INT_MIN results in
integer underflow of the temperature calculations made by the driver
for LM99 sensor. Temperature hysteresis is among those values that need
to be limited, but limiting of hysteresis is independent from the sensor
version. Add the missing limits.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20210623042231.16008-2-digetx@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Setting a page flag without holding a reference to the page
is living dangerously. In the tag-writing path, we drop the
reference to the page by calling kvm_release_pfn_dirty(),
and only then set the PG_mte_tagged bit.
It would be safer to do it the other way round.
Fixes: f0376edb1ddca ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87k0mjidwb.wl-maz@kernel.org
|
|
AGP for example doesn't have a dma_address array.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christian.koenig@amd.com
|
|
Optimise SVE switching for CPUs with 128-bit implementations.
* for-next/sve:
arm64/sve: Skip flushing Z registers with 128 bit vectors
arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
arm64/sve: Split _sve_flush macro into separate Z and predicate flushes
|
|
Add support for versions 1.2 and 1.3 of the SMC calling convention.
* for-next/smccc:
arm64: smccc: Support SMCCC v1.3 SVE register saving hint
arm64: smccc: Add support for SMCCCv1.2 extended input/output registers
|
|
Fix output format from SVE selftest.
* for-next/selftests:
kselftest/arm64: Add missing newline to SVE test skipping output
|
|
Allow Pointer Authentication to be configured independently for kernel
and userspace.
* for-next/ptrauth:
arm64: Conditionally configure PTR_AUTH key of the kernel.
arm64: Add ARM64_PTR_AUTH_KERNEL config option
|
|
PMU driver cleanups for managing IRQ affinity and exposing event
attributes via sysfs.
* for-next/perf: (36 commits)
drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe()
perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number
arm64: perf: Simplify EVENT ATTR macro in perf_event.c
drivers/perf: Simplify EVENT ATTR macro in fsl_imx8_ddr_perf.c
drivers/perf: Simplify EVENT ATTR macro in xgene_pmu.c
drivers/perf: Simplify EVENT ATTR macro in qcom_l3_pmu.c
drivers/perf: Simplify EVENT ATTR macro in qcom_l2_pmu.c
drivers/perf: Simplify EVENT ATTR macro in SMMU PMU driver
perf: Add EVENT_ATTR_ID to simplify event attributes
perf/smmuv3: Don't trample existing events with global filter
perf/hisi: Constify static attribute_group structs
perf: qcom: Remove redundant dev_err call in qcom_l3_cache_pmu_probe()
drivers/perf: hisi: Fix data source control
arm64: perf: Add more support on caps under sysfs
perf: qcom_l2_pmu: move to use request_irq by IRQF_NO_AUTOEN flag
arm_pmu: move to use request_irq by IRQF_NO_AUTOEN flag
perf: arm_spe: use DEVICE_ATTR_RO macro
perf: xgene_pmu: use DEVICE_ATTR_RO macro
perf: qcom: use DEVICE_ATTR_RO macro
perf: arm_pmu: use DEVICE_ATTR_RO macro
...
|
|
KASAN optimisations for the hardware tagging (MTE) implementation.
* for-next/mte:
kasan: disable freed user page poisoning with HW tags
arm64: mte: handle tags zeroing at page allocation time
kasan: use separate (un)poison implementation for integrated init
mm: arch: remove indirection level in alloc_zeroed_user_highpage_movable()
kasan: speed up mte_set_mem_tag_range
|
|
Lots of cleanup to our various page-table definitions, but also some
non-critical fixes and removal of some unnecessary memory types. The
most interesting change here is the reduction of ARCH_DMA_MINALIGN back
to 64 bytes, since we're not aware of any machines that need a higher
value with the way the code is structured (only needed for non-coherent
DMA).
* for-next/mm:
arm64: tlb: fix the TTL value of tlb_get_level
arm64/mm: Rename ARM64_SWAPPER_USES_SECTION_MAPS
arm64: head: fix code comments in set_cpu_boot_mode_flag
arm64: mm: drop unused __pa(__idmap_text_start)
arm64: mm: fix the count comments in compute_indices
arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan
arm64: mm: Pass original fault address to handle_mm_fault()
arm64/mm: Drop SECTION_[SHIFT|SIZE|MASK]
arm64/mm: Use CONT_PMD_SHIFT for ARM64_MEMSTART_SHIFT
arm64/mm: Drop SWAPPER_INIT_MAP_SIZE
arm64: mm: decode xFSC in mem_abort_decode()
arm64: mm: Add is_el1_data_abort() helper
arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
arm64: mm: Remove unused support for Normal-WT memory type
arm64: acpi: Map EFI_MEMORY_WT memory as Normal-NC
arm64: mm: Remove unused support for Device-GRE memory type
arm64: mm: Use better bitmap_zalloc()
arm64/mm: Make vmemmap_free() available only with CONFIG_MEMORY_HOTPLUG
arm64/mm: Remove [PUD|PMD]_TABLE_BIT from [pud|pmd]_bad()
arm64/mm: Validate CONFIG_PGTABLE_LEVELS
|
|
Reduce loglevel of useless print during CPU offlining.
* for-next/misc:
arm64: smp: Bump debugging information print down to KERN_DEBUG
|
|
Optimise out-of-line KASAN checking when using software tagging.
* for-next/kasan:
kasan: arm64: support specialized outlined tag mismatch checks
|
|
Refactoring of our instruction decoding routines and addition of some
missing encodings.
* for-next/insn:
arm64: insn: avoid circular include dependency
arm64: insn: move AARCH64_INSN_SIZE into <asm/insn.h>
arm64: insn: decouple patching from insn code
arm64: insn: Add load/store decoding helpers
arm64: insn: Add some opcodes to instruction decoder
arm64: insn: Add barrier encodings
arm64: insn: Add SVE instruction class
arm64: Move instruction encoder/decoder under lib/
arm64: Move aarch32 condition check functions
arm64: Move patching utilities out of instruction encoding/decoding
|