summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2021-09-13x86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFEThomas Gleixner
Provide exception fixup types which can be used to identify fixups which allow in kernel #MC recovery and make them invoke the existing handlers. These will be used at places where #MC recovery is handled correctly by the caller. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210908132525.269689153@linutronix.de
2021-09-13x86/extable: Rework the exception table mechanicsThomas Gleixner
The exception table entries contain the instruction address, the fixup address and the handler address. All addresses are relative. Storing the handler address has a few downsides: 1) Most handlers need to be exported 2) Handlers can be defined everywhere and there is no overview about the handler types 3) MCE needs to check the handler type to decide whether an in kernel #MC can be recovered. The functionality of the handler itself is not in any way special, but for these checks there need to be separate functions which in the worst case have to be exported. Some of these 'recoverable' exception fixups are pretty obscure and just reuse some other handler to spare code. That obfuscates e.g. the #MC safe copy functions. Cleaning that up would require more handlers and exports Rework the exception fixup mechanics by storing a fixup type number instead of the handler address and invoke the proper handler for each fixup type. Also teach the extable sort to leave the type field alone. This makes most handlers static except for special cases like the MCE MSR fixup and the BPF fixup. This allows to add more types for cleaning up the obscure places without adding more handler code and exports. There is a marginal code size reduction for a production config and it removes _eight_ exported symbols. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de
2021-09-13x86/mce: Get rid of stray semicolonsThomas Gleixner
and the random number of tabs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210908132525.154428878@linutronix.de
2021-09-13x86/mce: Deduplicate exception handlingThomas Gleixner
Prepare code for further simplification. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210908132525.096452100@linutronix.de
2021-09-11Merge branch 'linus' into smp/urgentThomas Gleixner
Ensure that all usage sites of get/put_online_cpus() except for the struggler in drivers/thermal are gone. So the last user and the deprecated inlines can be removed.
2021-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-09-06Merge tag 'kvmarm-5.15' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.15 - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups
2021-09-06x86/kvm: Don't enable IRQ when IRQ enabled in kvm_waitLai Jiangshan
Commit f4e61f0c9add3 ("x86/kvm: Fix broken irq restoration in kvm_wait") replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable() when IRQ enabled" to suppress a warnning. Although there is no similar debugging warnning for doing local_irq_enable() when IRQ enabled as doing local_irq_restore() in the same IRQ situation. But doing local_irq_enable() when IRQ enabled is no less broken as doing local_irq_restore() and we'd better avoid it. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03memblock: make memblock_find_in_range method privateMike Rapoport
There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with an equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv] Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03memcg: enable accounting for ldt_struct objectsVasily Averin
Each task can request own LDT and force the kernel to allocate up to 64Kb memory per-mm. There are legitimate workloads with hundreds of processes and there can be hundreds of workloads running on large machines. The unaccounted memory can cause isolation issues between the workloads particularly on highly utilized machines. It makes sense to account for this objects to restrict the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/38010594-50fe-c06d-7cb0-d1f77ca422f3@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yutian Yang <nglaive@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-02Merge tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - fix debugfs initialization order (Anthony Iliopoulos) - use memory_intersects() directly (Kefeng Wang) - allow to return specific errors from ->map_sg (Logan Gunthorpe, Martin Oliveira) - turn the dma_map_sg return value into an unsigned int (me) - provide a common global coherent pool іmplementation (me) * tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits) hexagon: use the generic global coherent pool dma-mapping: make the global coherent pool conditional dma-mapping: add a dma_init_global_coherent helper dma-mapping: simplify dma_init_coherent_memory dma-mapping: allow using the global coherent pool for !ARM ARM/nommu: use the generic dma-direct code for non-coherent devices dma-direct: add support for dma_coherent_default_memory dma-mapping: return an unsigned int from dma_map_sg{,_attrs} dma-mapping: disallow .map_sg operations from returning zero on error dma-mapping: return error code from dma_dummy_map_sg() x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR x86/amd_gart: return error code from gart_map_sg() xen: swiotlb: return error code from xen_swiotlb_map_sg() parisc: return error code from .map_sg() ops sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR sparc/iommu: return error codes from .map_sg() ops s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR s390/pci: return error code from s390_dma_map_sg() powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR powerpc/iommu: return error code from .map_sg() ops ...
2021-09-01Merge tag 'printk-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01Merge tag 'hyperv-next-signed-20210831' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - make Hyper-V code arch-agnostic (Michael Kelley) - fix sched_clock behaviour on Hyper-V (Ani Sinha) - fix a fault when Linux runs as the root partition on MSHV (Praveen Kumar) - fix VSS driver (Vitaly Kuznetsov) - cleanup (Sonia Sharma) * tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer Drivers: hv: Enable Hyper-V code to be built on ARM64 arm64: efi: Export screen_info arm64: hyperv: Initialize hypervisor on boot arm64: hyperv: Add panic handler arm64: hyperv: Add Hyper-V hypercall and register access utilities x86/hyperv: fix root partition faults when writing to VP assist page MSR hv: hyperv.h: Remove unused inline functions drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0 Drivers: hv: Move Hyper-V misc functionality to arch-neutral code Drivers: hv: Add arch independent default functions for some Hyper-V handlers Drivers: hv: Make portions of Hyper-V init code be arch neutral x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable asm-generic/hyperv: Add missing #include of nmi.h
2021-09-01Merge branch 'siginfo-si_trapno-for-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo si_trapno updates from Eric Biederman: "The full set of si_trapno changes was not appropriate as a fix for the newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the related cleanups. This is the rest of the cleanups for si_trapno that reduces it from being a really weird arch special case that is expect to be always present (but isn't) on the architectures that support it to being yet another field in the _sigfault union of struct siginfo. The changes have been reviewed and marinated in linux-next. With the removal of this awkward special case new code (like SIGTRAP TRAP_PERF) that works across architectures should be easier to write and maintain" * 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency signal: Verify the alignment and size of siginfo_t signal: Remove the generic __ARCH_SI_TRAPNO support signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP arm64: Add compile-time asserts for siginfo_t offsets arm: Add compile-time asserts for siginfo_t offsets sparc64: Add compile-time asserts for siginfo_t offsets
2021-09-01Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - i915 has seen a lot of refactoring and uAPI cleanups due to a change in the upstream direction going forward This has all been audited with known userspace, but there may be some pitfalls that were missed. - i915 now uses common TTM to enable discrete memory on DG1/2 GPUs - i915 enables Jasper and Elkhart Lake by default and has preliminary XeHP/DG2 support - amdgpu adds support for Cyan Skillfish - lots of implicit fencing rules documented and fixed up in drivers - msm now uses the core scheduler - the irq midlayer has been removed for non-legacy drivers - the sysfb code now works on more than x86. Otherwise the usual smattering of stuff everywhere, panels, bridges, refactorings. Detailed summary: core: - extract i915 eDP backlight into core - DP aux bus support - drm_device.irq_enabled removed - port drivers to native irq interfaces - export gem shadow plane handling for vgem - print proper driver name in framebuffer registration - driver fixes for implicit fencing rules - ARM fixed rate compression modifier added - updated fb damage handling - rmfb ioctl logging/docs - drop drm_gem_object_put_locked - define DRM_FORMAT_MAX_PLANES - add gem fb vmap/vunmap helpers - add lockdep_assert(once) helpers - mark drm irq midlayer as legacy - use offset adjusted bo mapping conversion vgaarb: - cleanups fbdev: - extend efifb handling to all arches - div by 0 fixes for multiple drivers udmabuf: - add hugepage mapping support dma-buf: - non-dynamic exporter fixups - document implicit fencing rules amdgpu: - Initial Cyan Skillfish support - switch virtual DCE over to vkms based atomic - VCN/JPEG power down fixes - NAVI PCIE link handling fixes - AMD HDMI freesync fixes - Yellow Carp + Beige Goby fixes - Clockgating/S0ix/SMU/EEPROM fixes - embed hw fence in job - rework dma-resv handling - ensure eviction to system ram amdkfd: - uapi: SVM address range query added - sysfs leak fix - GPUVM TLB optimizations - vmfault/migration counters i915: - Enable JSL and EHL by default - preliminary XeHP/DG2 support - remove all CNL support (never shipped) - move to TTM for discrete memory support - allow mixed object mmap handling - GEM uAPI spring cleaning - add I915_MMAP_OBJECT_FIXED - reinstate ADL-P mmap ioctls - drop a bunch of unused by userspace features - disable and remove GPU relocations - revert some i915 misfeatures - major refactoring of GuC for Gen11+ - execbuffer object locking separate step - reject caching/set-domain on discrete - Enable pipe DMC loading on XE-LPD and ADL-P - add PSF GV point support - Refactor and fix DDI buffer translations - Clean up FBC CFB allocation code - Finish INTEL_GEN() and friends macro conversions nouveau: - add eDP backlight support - implicit fence fix msm: - a680/7c3 support - drm/scheduler conversion panfrost: - rework GPU reset virtio: - fix fencing for planes ast: - add detect support bochs: - move to tiny GPU driver vc4: - use hotplug irqs - HDMI codec support vmwgfx: - use internal vmware device headers ingenic: - demidlayering irq rcar-du: - shutdown fixes - convert to bridge connector helpers zynqmp-dsub: - misc fixes mgag200: - convert PLL handling to atomic mediatek: - MT8133 AAL support - gem mmap object support - MT8167 support etnaviv: - NXP Layerscape LS1028A SoC support - GEM mmap cleanups tegra: - new user API exynos: - missing unlock fix - build warning fix - use refcount_t" * tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits) drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box drm/amd/display: Remove duplicate dml init drm/amd/display: Update bounding box states (v2) drm/amd/display: Update number of DCN3 clock states drm/amdgpu: disable GFX CGCG in aldebaran drm/amdgpu: Clear RAS interrupt status on aldebaran drm/amdgpu: Add support for RAS XGMI err query drm/amdkfd: Account for SH/SE count when setting up cu masks. drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain drm/amdgpu: drop redundant cancel_delayed_work_sync call drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend drm/amdkfd: map SVM range with correct access permission drm/amdkfd: check access permisson to restore retry fault drm/amdgpu: Update RAS XGMI Error Query drm/amdgpu: Add driver infrastructure for MCA RAS drm/amd/display: Add Logging for HDMI color depth information drm/amd/amdgpu: consolidate PSP TA init shared buf functions drm/amd/amdgpu: add name field back to ras_common_if drm/amdgpu: Fix build with missing pm_suspend_target_state module export ...
2021-09-01x86/setup: Explicitly include acpi.hNathan Chancellor
After commit 342f43af70db ("iscsi_ibft: fix crash due to KASLR physical memory remapping") x86_64_defconfig shows the following errors: arch/x86/kernel/setup.c: In function ‘setup_arch’: arch/x86/kernel/setup.c:916:13: error: implicit declaration of function ‘acpi_mps_check’ [-Werror=implicit-function-declaration] 916 | if (acpi_mps_check()) { | ^~~~~~~~~~~~~~ arch/x86/kernel/setup.c:1110:9: error: implicit declaration of function ‘acpi_table_upgrade’ [-Werror=implicit-function-declaration] 1110 | acpi_table_upgrade(); | ^~~~~~~~~~~~~~~~~~ [... more acpi noise ...] acpi.h was being implicitly included from iscsi_ibft.h in this configuration so the removal of that header means these functions have no definition or declaration. In most other configurations, <linux/acpi.h> continued to be included through at least <linux/tboot.h> if CONFIG_INTEL_TXT was enabled, and there were probably other implicit include paths too. Add acpi.h explicitly so there is no more error, and so that we don't continue to depend on these unreliable implicit include paths. Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Cc: Maurizio Lombardi <mlombard@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-01drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()Thomas Gleixner
DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework to ensure that the cache related functions are called on the upcoming CPU because the notifier itself could run on any online CPU. The hotplug state machine guarantees that the callbacks are invoked on the upcoming CPU. So there is no need to have this SMP function call obfuscation. That indirection was missed when the hotplug notifiers were converted. This also solves the problem of ARM64 init_cache_level() invoking ACPI functions which take a semaphore in that context. That's invalid as SMP function calls run with interrupts disabled. Running it just from the callback in context of the CPU hotplug thread solves this. Fixes: 8571890e1513 ("arm64: Add support for ACPI based firmware tables") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
2021-08-31Merge branch 'stable/for-linus-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft Pull ibft updates from Konrad Rzeszutek Wilk: "A fix for iBFT parsing code badly interfacing when KASLR is enabled" * 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft: iscsi_ibft: fix warning in reserve_ibft_region() iscsi_ibft: fix crash due to KASLR physical memory remapping
2021-08-31Merge tag 'hwmon-for-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "New drivers for: - Aquacomputer D5 Next - SB-RMI power module Added chip support to existing drivers: - Support for various Zen2 and Zen3 APUs and for Yellow Carp (SMU v13) added to k10temp driver - Support for Silicom n5010 PAC added to intel-m10-bmc driver - Support for BPD-RS600 added to pmbus/bpa-rs600 driver Other notable changes: - In k10temp, do not display Tdie on Zen CPUs if there is no difference between Tdie and Tctl - Converted adt7470 and dell-smm drivers to use devm_hwmon_device_register_with_info API - Support for temperature/pwm tables added to axi-fan-control driver - Enabled fan control for Dell Precision 7510 in dell-smm driver Various other minor improvements and fixes in several drivers" * tag 'hwmon-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (41 commits) hwmon: add driver for Aquacomputer D5 Next hwmon: (adt7470) Convert to devm_hwmon_device_register_with_info API hwmon: (adt7470) Convert to use regmap hwmon: (adt7470) Fix some style issues hwmon: (k10temp) Add support for yellow carp hwmon: (k10temp) Rework the temperature offset calculation hwmon: (k10temp) Don't show Tdie for all Zen/Zen2/Zen3 CPU/APU hwmon: (k10temp) Add additional missing Zen2 and Zen3 APUs hwmon: remove amd_energy driver in Makefile hwmon: (dell-smm) Rework SMM function debugging hwmon: (dell-smm) Mark i8k_get_fan_nominal_speed as __init hwmon: (dell-smm) Mark tables as __initconst hwmon: (pmbus/bpa-rs600) Add workaround for incorrect Pin max hwmon: (pmbus/bpa-rs600) Don't use rated limits as warn limits hwmon: (axi-fan-control) Support temperature vs pwm points hwmon: (axi-fan-control) Handle irqs in natural order hwmon: (axi-fan-control) Make sure the clock is enabled hwmon: (pmbus/ibm-cffps) Fix write bits for LED control hwmon: (w83781d) Match on device tree compatibles dt-bindings: hwmon: Add bindings for Winbond W83781D ...
2021-08-30Merge tag 'x86-misc-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Thomas Gleixner: "A set of updates for the x86 reboot code: - Limit the Dell Optiplex 990 quirk to early BIOS versions to avoid the full 'power cycle' alike reboot which is required for the buggy BIOSes. - Update documentation for the reboot=pci command line option and document how DMI platform quirks can be overridden" * tag 'x86-misc-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions x86/reboot: Document how to override DMI platform quirks x86/reboot: Document the "reboot=pci" option
2021-08-30Merge tag 'x86-irq-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PIRQ updates from Thomas Gleixner: "A set of updates to support port 0x22/0x23 based PCI configuration space which can be found on various ALi chipsets and is also available on older Intel systems which expose a PIRQ router. While the Intel support is more or less nostalgia, the ALi chips are still in use on popular embedded boards used for routers" * tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix typo s/ECLR/ELCR/ for the PIC register x86: Avoid magic number with ELCR register accesses x86/PCI: Add support for the Intel 82426EX PIRQ router x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router x86: Add support for 0x22/0x23 port I/O configuration space
2021-08-30Merge tag 'x86-cpu-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache flush updates from Thomas Gleixner: "A reworked version of the opt-in L1D flush mechanism. This is a stop gap for potential future speculation related hardware vulnerabilities and a mechanism for truly security paranoid applications. It allows a task to request that the L1D cache is flushed when the kernel switches to a different mm. This can be requested via prctl(). Changes vs the previous versions: - Get rid of the software flush fallback - Make the handling consistent with other mitigations - Kill the task when it ends up on a SMT enabled core which defeats the purpose of L1D flushing obviously" * tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: Add L1D flushing Documentation x86, prctl: Hook L1D flushing in via prctl x86/mm: Prepare for opt-in based L1D flush in switch_mm() x86/process: Make room for TIF_SPEC_L1D_FLUSH sched: Add task_work callback for paranoid L1D flush x86/mm: Refactor cond_ibpb() to support other use cases x86/smp: Add a per-cpu view of SMT state
2021-08-30Merge tag 'perf-core-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Add support for Intel Sapphire Rapids server CPU uncore events - Allow the AMD uncore driver to be built as a module - Misc cleanups and fixes * tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header perf/amd/uncore: Allow the driver to be built as a module x86/cpu: Add get_llc_id() helper function perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/ perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL perf/hw_breakpoint: Replace deprecated CPU-hotplug functions perf/x86/intel: Replace deprecated CPU-hotplug functions perf/x86: Remove unused assignment to pointer 'e' perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() perf/x86/intel/uncore: Add alias PMU name perf/x86/intel/uncore: Add Sapphire Rapids server MDF support perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support perf/x86/intel/uncore: Add Sapphire Rapids server UPI support perf/x86/intel/uncore: Add Sapphire Rapids server M2M support perf/x86/intel/uncore: Add Sapphire Rapids server IMC support perf/x86/intel/uncore: Add Sapphire Rapids server PCU support perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support ...
2021-08-30Merge tag 'x86_cleanups_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: "The usual round of minor cleanups and fixes" * tag 'x86_cleanups_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kaslr: Have process_mem_region() return a boolean x86/power: Fix kernel-doc warnings in cpu.c x86/mce/inject: Replace deprecated CPU-hotplug functions. x86/microcode: Replace deprecated CPU-hotplug functions. x86/mtrr: Replace deprecated CPU-hotplug functions. x86/mmiotrace: Replace deprecated CPU-hotplug functions.
2021-08-30Merge tag 'x86_cache_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "A first round of changes towards splitting the arch-specific bits from the filesystem bits of resctrl, the ultimate goal being to support ARM's equivalent technology MPAM, with the same fs interface (James Morse)" * tag 'x86_cache_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/resctrl: Make resctrl_arch_get_config() return its value x86/resctrl: Merge the CDP resources x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param range x86/resctrl: Remove rdt_cdp_peer_get() x86/resctrl: Merge the ctrl_val arrays x86/resctrl: Calculate the index from the configuration type x86/resctrl: Apply offset correction when config is staged x86/resctrl: Make ctrlval arrays the same size x86/resctrl: Pass configuration type to resctrl_arch_get_config() x86/resctrl: Add a helper to read a closid's configuration x86/resctrl: Rename update_domains() to resctrl_arch_update_domains() x86/resctrl: Allow different CODE/DATA configurations to be staged x86/resctrl: Group staged configuration into a separate struct x86/resctrl: Move the schemata names into struct resctrl_schema x86/resctrl: Add a helper to read/set the CDP configuration x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region x86/resctrl: Pass the schema to resctrl filesystem functions x86/resctrl: Add resctrl_arch_get_num_closid() x86/resctrl: Store the effective num_closid in the schema x86/resctrl: Walk the resctrl schema list instead of an arch list ...
2021-08-30Merge tag 'ras_core_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS update from Borislav Petkov: "A single RAS change for 5.15: - Do not start processing MCEs logged early because the decoding chain is not up yet - delay that processing until everything is ready" * tag 'ras_core_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Defer processing of early errors
2021-08-27hwmon: (k10temp) Add support for yellow carpMario Limonciello
Yellow carp matches same behavior as green sardine and other Zen3 products, but have different CCD offsets. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20210827201527.24454-3-mario.limonciello@amd.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-08-26x86/cpu: Add get_llc_id() helper functionKim Phillips
Factor out a helper function rather than export cpu_llc_id, which is needed in order to be able to build the AMD uncore driver as a module. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-7-kim.phillips@amd.com
2021-08-24x86/mce: Defer processing of early errorsBorislav Petkov
When a fatal machine check results in a system reset, Linux does not clear the error(s) from machine check bank(s) - hardware preserves the machine check banks across a warm reset. During initialization of the kernel after the reboot, Linux reads, logs, and clears all machine check banks. But there is a problem. In: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") the call to mce_register_decode_chain() moved later in the boot sequence. This means that /dev/mcelog doesn't see those early error logs. This was partially fixed by: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers") which made sure that the logs were not lost completely by printing to the console. But parsing console logs is error prone. Users of /dev/mcelog should expect to find any early errors logged to standard places. Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early machine check initialization to indicate that any errors found should just be queued to genpool. When mcheck_late_init() is called it will call mce_schedule_work() to actually log and flush any errors queued in the genpool. [ Based on an original patch, commit message by and completely productized by Tony Luck. ] Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") Reported-by: Sumanth Kamatala <skamatala@juniper.net> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
2021-08-22x86/resctrl: Fix a maybe-uninitialized build warning treated as errorBabu Moger
The recent commit 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") caused a RHEL build failure with an uninitialized variable warning treated as an error because it removed the default case snippet. The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly uninitialized variable warnings to be treated as errors. This is also reported by smatch via the 0day robot. The error from the RHEL build is: arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’: arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used uninitialized in this function [-Werror=maybe-uninitialized] m->chunks += chunks; ^~ The upstream Makefile does not build using '-Werror=maybe-uninitialized'. So, the problem is not seen there. Fix the problem by putting back the default case snippet. [ bp: note that there's nothing wrong with the code and other compilers do not trigger this warning - this is being done just so the RHEL compiler is happy. ] Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") Reported-by: Terry Bowman <Terry.Bowman@amd.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-15Merge tag 'irq-urgent-2021-08-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-12x86/resctrl: Fix default monitoring groups reportingBabu Moger
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-12x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versionsPaul Gortmaker
When this platform was relatively new in November 2011, with early BIOS revisions, a reboot quirk was added in commit 6be30bb7d750 ("x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot") However, this quirk (and several others) are open-ended to all BIOS versions and left no automatic expiry if/when the system BIOS fixed the issue, meaning that nobody is likely to come along and re-test. What is really problematic with using PCI reboot as this quirk does, is that it causes this platform to do a full power down, wait one second, and then power back on. This is less than ideal if one is using it for boot testing and/or bisecting kernels when legacy rotating hard disks are installed. It was only by chance that the quirk was noticed in dmesg - and when disabled it turned out that it wasn't required anymore (BIOS A24), and a default reboot would work fine without the "harshness" of power cycling the machine (and disks) down and up like the PCI reboot does. Doing a bit more research, it seems that the "newest" BIOS for which the issue was reported[1] was version A06, however Dell[2] seemed to suggest only up to and including version A05, with the A06 having a large number of fixes[3] listed. As is typical with a new platform, the initial BIOS updates come frequently and then taper off (and in this case, with a revival for CPU CVEs); a search for O990-A<ver>.exe reveals the following dates: A02 16 Mar 2011 A03 11 May 2011 A06 14 Sep 2011 A07 24 Oct 2011 A10 08 Dec 2011 A14 06 Sep 2012 A16 15 Oct 2012 A18 30 Sep 2013 A19 23 Sep 2015 A20 02 Jun 2017 A23 07 Mar 2018 A24 21 Aug 2018 While it's overkill to flash and test each of the above, it would seem likely that the issue was contained within A0x BIOS versions, given the dates above and the dates of issue reports[4] from distros. So rather than just throw out the quirk entirely, limit the scope to just those early BIOS versions, in case people are still running systems from 2011 with the original as-shipped early A0x BIOS versions. [1] https://lore.kernel.org/lkml/1320373471-3942-1-git-send-email-trenn@suse.de/ [2] https://www.dell.com/support/kbdoc/en-ca/000131908/linux-based-operating-systems-stall-upon-reboot-on-optiplex-390-790-990-systems [3] https://www.dell.com/support/home/en-ca/drivers/driversdetails?driverid=85j10 [4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/768039 Fixes: 6be30bb7d750 ("x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210530162447.996461-4-paul.gortmaker@windriver.com
2021-08-11x86/resctrl: Make resctrl_arch_get_config() return its valueJames Morse
resctrl_arch_get_config() has no return, but does pass a single value back via one of its arguments. Return the value instead. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210811163831.14917-1-james.morse@arm.com
2021-08-11x86/resctrl: Merge the CDP resourcesJames Morse
resctrl uses struct rdt_resource to describe the available hardware resources. The domains of the CDP aliases share a single ctrl_val[] array. The only differences between the struct rdt_hw_resource aliases is the name and conf_type. The name from struct rdt_hw_resource is visible to user-space. To support another architecture, as many user-visible details should be handled in the filesystem parts of the code that is common to all architectures. The name and conf_type go together. Remove conf_type and the CDP aliases. When CDP is supported and enabled, schemata_list_create() can create two schemata using the single resource, generating the CODE/DATA suffix to the schema name itself. This allows the alloc_ctrlval_array() and complications around free()ing the ctrl_val arrays to be removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-25-james.morse@arm.com
2021-08-11x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param rangeJames Morse
resctrl_arch_update_domains() specifies the one closid that has been modified and needs copying to the hardware. resctrl_arch_update_domains() takes a struct rdt_resource and a closid as arguments, but copies all the staged configurations for that closid into the ctrl_val[] array. resctrl_arch_update_domains() is called once per schema, but once the resources and domains are merged, the second call of a L2CODE/L2DATA pair will find no staged configurations, as they were previously applied. The msr_param of the first call only has one index, so would only have update the hardware for the last staged configuration. To avoid a second round of IPIs when changing L2CODE and L2DATA in one go, expand the range of the msr_param if multiple staged configurations are found. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-24-james.morse@arm.com
2021-08-11x86/resctrl: Remove rdt_cdp_peer_get()James Morse
When CDP is enabled, rdt_cdp_peer_get() finds the alternative CODE/DATA resource and returns the alternative domain. This is used to determine if bitmaps overlap when there are aliased entries in the two struct rdt_hw_resources. Now that the ctrl_val[] used by the CODE/DATA resources is the same, the search for an alternate resource/domain is not needed. Replace rdt_cdp_peer_get() with resctrl_peer_type(), which returns the alternative type. This can be passed to resctrl_arch_get_config() with the same resource and domain. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-23-james.morse@arm.com
2021-08-11x86/resctrl: Merge the ctrl_val arraysJames Morse
Each struct rdt_hw_resource has its own ctrl_val[] array. When CDP is enabled, two resources are in use, each with its own ctrl_val[] array that holds half of the configuration used by hardware. One uses the odd slots, the other the even. rdt_cdp_peer_get() is the helper to find the alternate resource, its domain, and corresponding entry in the other ctrl_val[] array. Once the CDP resources are merged there will be one struct rdt_hw_resource and one ctrl_val[] array for each hardware resource. This will include changes to rdt_cdp_peer_get(), making it hard to bisect any issue. Merge the ctrl_val[] arrays for three CODE/DATA/NONE resources first. Doing this before merging the resources temporarily complicates allocating and freeing the ctrl_val arrays. Add a helper to allocate the ctrl_val array, that returns the value on the L2 or L3 resource if it already exists. This gets removed once the resources are merged, and there really is only one ctrl_val[] array. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-22-james.morse@arm.com
2021-08-11x86/resctrl: Calculate the index from the configuration typeJames Morse
resctrl uses cbm_idx() to map a closid to an index in the configuration array. This is based on a multiplier and offset that are held in the resource. To merge the resources, the resctrl arch code needs to calculate the index from something else, as there will only be one resource. Decide based on the staged configuration type. This makes the static mult and offset parameters redundant. [ bp: Remove superfluous brackets in get_config_index() ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-21-james.morse@arm.com
2021-08-11x86/resctrl: Apply offset correction when config is stagedJames Morse
When resctrl comes to copy the CAT MSR values from the ctrl_val[] array into hardware, it applies an offset adjustment based on the type of the resource. CODE and DATA resources have their closid mapped into an odd/even range. This mapping is based on a property of the resource. This happens once the new control value has been written to the ctrl_val[] array. Once the CDP resources are merged, there will only be a single property that needs to cover both odd/even mappings to the single ctrl_val[] array. The offset adjustment must be applied before the new value is written to the array. Move the logic from cat_wrmsr() to resctrl_arch_update_domains(). The value provided to apply_config() is now an index in the array, not the closid. The parameters provided via struct msr_param are now indexes too. As resctrl's use of closid is a u32, struct msr_param's type is changed to match. With this, the CODE and DATA resources only use the odd or even indexes in the array. This allows the temporary num_closid/2 fixes in domain_setup_ctrlval() and reset_all_ctrls() to be removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-20-james.morse@arm.com
2021-08-11x86/resctrl: Make ctrlval arrays the same sizeJames Morse
The CODE and DATA resources report a num_closid that is half the actual size supported by the hardware. This behaviour is visible to user-space when CDP is enabled. The CODE and DATA resources have their own ctrlval arrays which are half the size of the underlying hardware because num_closid was already adjusted. One holds the odd configurations values, the other even. Before the CDP resources can be merged, the 'half the closids' behaviour needs to be implemented by schemata_list_create(), but this causes the ctrl_val[] array to be full sized. Remove the logic from the architecture specific rdt_get_cdp_config() setup, and add it to schemata_list_create(). Functions that walk all the configurations, such as domain_setup_ctrlval() and reset_all_ctrls(), take num_closid directly from struct rdt_hw_resource also have to halve num_closid as only the lower half of each array is in use. domain_setup_ctrlval() and reset_all_ctrls() both copy struct rdt_hw_resource's num_closid to a struct msr_param. Correct the value here. This is temporary as a subsequent patch will merge all three ctrl_val[] arrays such that when CDP is in use, the CODA/DATA layout in the array matches the hardware. reset_all_ctrls()'s loop over the whole of ctrl_val[] is not touched as this is harmless, and will be required as it is once the resources are merged. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-19-james.morse@arm.com
2021-08-11x86/resctrl: Pass configuration type to resctrl_arch_get_config()James Morse
The ctrl_val[] array for a struct rdt_hw_resource only holds configurations of one type. The type is implicit. Once the CDP resources are merged, the ctrl_val[] array will hold all the configurations for the hardware resource. When a particular type of configuration is needed, it must be specified explicitly. Pass the expected type from the schema into resctrl_arch_get_config(). Nothing uses this yet, but once a single ctrl_val[] array is used for the three struct rdt_hw_resources that share hardware, the type will be used to return the correct configuration value from the shared array. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-18-james.morse@arm.com
2021-08-11x86/resctrl: Add a helper to read a closid's configurationJames Morse
Functions like show_doms() reach into the architecture's private structure to retrieve the configuration from the struct rdt_hw_resource. The hardware configuration may look completely different to the values resctrl gets from user-space. The staged configuration and resctrl_arch_update_domains() allow the architecture to convert or translate these values. Resctrl shouldn't read or write the ctrl_val[] values directly. Add a helper to read the current configuration. This will allow another architecture to scale the bitmaps if necessary, and possibly use controls that don't take the user-space control format at all. Of the remaining functions that access ctrl_val[] directly, apply_config() is part of the architecture-specific code, and is called via resctrl_arch_update_domains(). reset_all_ctrls() will be an architecture specific helper. update_mba_bw() manipulates both ctrl_val[], mbps_val[] and the hardware. The mbps_val[] that matches the mba_sc state of the resource is changed, but the other is left unchanged. Abstracting this is the subject of later patches that affect set_mba_sc() too. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-17-james.morse@arm.com
2021-08-11x86/resctrl: Rename update_domains() to resctrl_arch_update_domains()James Morse
update_domains() merges the staged configuration changes into the arch codes configuration array. Rename to make it clear it is part of the arch code interface to resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-16-james.morse@arm.com
2021-08-11x86/resctrl: Allow different CODE/DATA configurations to be stagedJames Morse
Before the CDP resources can be merged, struct rdt_domain will need an array of struct resctrl_staged_config, one per type of configuration. Use the type as an index to the array to ensure that a schema configuration string can't specify the same domain twice. This will allow two schemata to apply configuration changes to one resource. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-15-james.morse@arm.com
2021-08-11x86/resctrl: Group staged configuration into a separate structJames Morse
When configuration changes are made, the new value is written to struct rdt_domain's new_ctrl field and the have_new_ctrl flag is set. Later new_ctrl is copied to hardware by a call to update_domains(). Once the CDP resources are merged, there will be one new_ctrl field in use by two struct resctrl_schema requiring a per-schema IPI to copy the value to hardware. Move new_ctrl and have_new_ctrl into a new struct resctrl_staged_config. Before the CDP resources can be merged, struct rdt_domain will need an array of these, one per type of configuration. Using the type as an index to the array will ensure that a schema configuration string can't specify the same domain twice. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-14-james.morse@arm.com
2021-08-11x86/resctrl: Move the schemata names into struct resctrl_schemaJames Morse
resctrl 'info' directories and schema parsing use the schema name. This lives in the struct rdt_resource, and is specified by the architecture code. Once the CDP resources are merged, there will only be one resource (and one name) in use by two schemata. To allow the CDP CODE/DATA property to be the type of configuration the schema uses, the name should also be per-schema. Add a name field to struct resctrl_schema, and use this wherever the schema name is exposed (or read from) user-space. Calculating max_name_width for padding the schemata file also moves as this is visible to user-space. As the names in struct rdt_resource already include the CDP information, schemata_list_create() copies them. schemata_list_create() includes the length of the CDP suffix when calculating max_name_width in preparation for CDP resources being merged. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-13-james.morse@arm.com
2021-08-11x86/resctrl: Add a helper to read/set the CDP configurationJames Morse
Whether CDP is enabled for a hardware resource like the L3 cache can be found by inspecting the alloc_enabled flags of the L3CODE/L3DATA struct rdt_hw_resources, even if they aren't in use. Once these resources are merged, the flags can't be compared. Whether CDP is enabled needs tracking explicitly. If another architecture is emulating CDP the behaviour may not be per-resource. 'cdp_capable' needs to be visible to resctrl, even if its not in use, as this affects the padding of the schemata table visible to user-space. Add cdp_enabled to struct rdt_hw_resource and cdp_capable to struct rdt_resource. Add resctrl_arch_set_cdp_enabled() to let resctrl enable or disable CDP on a resource. resctrl_arch_get_cdp_enabled() lets it read the current state. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-12-james.morse@arm.com
2021-08-11x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_regionJames Morse
struct pseudo_lock_region points to the rdt_resource. Once the resources are merged, this won't be unique. The resource name is moving into the schema, so that the filesystem portions of resctrl can generate it. Swap pseudo_lock_region's rdt_resource pointer for a schema pointer. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-11-james.morse@arm.com