summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2021-08-30Merge tag 'x86-cpu-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache flush updates from Thomas Gleixner: "A reworked version of the opt-in L1D flush mechanism. This is a stop gap for potential future speculation related hardware vulnerabilities and a mechanism for truly security paranoid applications. It allows a task to request that the L1D cache is flushed when the kernel switches to a different mm. This can be requested via prctl(). Changes vs the previous versions: - Get rid of the software flush fallback - Make the handling consistent with other mitigations - Kill the task when it ends up on a SMT enabled core which defeats the purpose of L1D flushing obviously" * tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: Add L1D flushing Documentation x86, prctl: Hook L1D flushing in via prctl x86/mm: Prepare for opt-in based L1D flush in switch_mm() x86/process: Make room for TIF_SPEC_L1D_FLUSH sched: Add task_work callback for paranoid L1D flush x86/mm: Refactor cond_ibpb() to support other use cases x86/smp: Add a per-cpu view of SMT state
2021-08-30Merge tag 'perf-core-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Add support for Intel Sapphire Rapids server CPU uncore events - Allow the AMD uncore driver to be built as a module - Misc cleanups and fixes * tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header perf/amd/uncore: Allow the driver to be built as a module x86/cpu: Add get_llc_id() helper function perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/ perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL perf/hw_breakpoint: Replace deprecated CPU-hotplug functions perf/x86/intel: Replace deprecated CPU-hotplug functions perf/x86: Remove unused assignment to pointer 'e' perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() perf/x86/intel/uncore: Add alias PMU name perf/x86/intel/uncore: Add Sapphire Rapids server MDF support perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support perf/x86/intel/uncore: Add Sapphire Rapids server UPI support perf/x86/intel/uncore: Add Sapphire Rapids server M2M support perf/x86/intel/uncore: Add Sapphire Rapids server IMC support perf/x86/intel/uncore: Add Sapphire Rapids server PCU support perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support ...
2021-08-30Merge tag 'x86_cleanups_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: "The usual round of minor cleanups and fixes" * tag 'x86_cleanups_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kaslr: Have process_mem_region() return a boolean x86/power: Fix kernel-doc warnings in cpu.c x86/mce/inject: Replace deprecated CPU-hotplug functions. x86/microcode: Replace deprecated CPU-hotplug functions. x86/mtrr: Replace deprecated CPU-hotplug functions. x86/mmiotrace: Replace deprecated CPU-hotplug functions.
2021-08-30Merge tag 'x86_cache_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "A first round of changes towards splitting the arch-specific bits from the filesystem bits of resctrl, the ultimate goal being to support ARM's equivalent technology MPAM, with the same fs interface (James Morse)" * tag 'x86_cache_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/resctrl: Make resctrl_arch_get_config() return its value x86/resctrl: Merge the CDP resources x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param range x86/resctrl: Remove rdt_cdp_peer_get() x86/resctrl: Merge the ctrl_val arrays x86/resctrl: Calculate the index from the configuration type x86/resctrl: Apply offset correction when config is staged x86/resctrl: Make ctrlval arrays the same size x86/resctrl: Pass configuration type to resctrl_arch_get_config() x86/resctrl: Add a helper to read a closid's configuration x86/resctrl: Rename update_domains() to resctrl_arch_update_domains() x86/resctrl: Allow different CODE/DATA configurations to be staged x86/resctrl: Group staged configuration into a separate struct x86/resctrl: Move the schemata names into struct resctrl_schema x86/resctrl: Add a helper to read/set the CDP configuration x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region x86/resctrl: Pass the schema to resctrl filesystem functions x86/resctrl: Add resctrl_arch_get_num_closid() x86/resctrl: Store the effective num_closid in the schema x86/resctrl: Walk the resctrl schema list instead of an arch list ...
2021-08-30Merge tag 'x86_build_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: - Remove cc-option checks which are old and already supported by the minimal compiler version the kernel uses and thus avoid the need to invoke the compiler unnecessarily. - Cleanups * tag 'x86_build_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Move the install rule to arch/x86/Makefile x86/build: Remove the left-over bzlilo target x86/tools/relocs: Mark die() with the printf function attr format x86/build: Remove stale cc-option checks
2021-08-30Merge tag 'ras_core_for_v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS update from Borislav Petkov: "A single RAS change for 5.15: - Do not start processing MCEs logged early because the decoding chain is not up yet - delay that processing until everything is ready" * tag 'ras_core_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Defer processing of early errors
2021-08-30Merge tag 's390-5.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Improve ftrace code patching so that stop_machine is not required anymore. This requires a small common code patch acked by Steven Rostedt: https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/ - Enable KCSAN for s390. This comes with a small common code change to fix a compile warning. Acked by Marco Elver: https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com - Add KFENCE support for s390. This also comes with a minimal x86 patch from Marco Elver who said also this can be carried via the s390 tree: https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/ - More changes to prepare the decompressor for relocation. - Enable DAT also for CPU restart path. - Final set of register asm removal patches; leaving only three locations where needed and sane. - Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to hwcaps flags. - Cleanup hwcaps implementation. - Add new instructions to in-kernel disassembler. - Various QDIO cleanups. - Add SCLP debug feature. - Various other cleanups and improvements all over the place. * tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits) s390: remove SCHED_CORE from defconfigs s390/smp: do not use nodat_stack for secondary CPU start s390/smp: enable DAT before CPU restart callback is called s390: update defconfigs s390/ap: fix state machine hang after failure to enable irq KVM: s390: generate kvm hypercall functions s390/sclp: add tracing of SCLP interactions s390/debug: add early tracing support s390/debug: fix debug area life cycle s390/debug: keep debug data on resize s390/diag: make restart_part2 a local label s390/mm,pageattr: fix walk_pte_level() early exit s390: fix typo in linker script s390: remove do_signal() prototype and do_notify_resume() function s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c s390/pci: improve DMA translation init and exit s390/pci: simplify CLP List PCI handling s390/pci: handle FH state mismatch only on disable s390/pci: fix misleading rc in clp_set_pci_fn() s390/boot: factor out offset_vmlinux_info() function ...
2021-08-30Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Algorithms: - Add AES-NI/AVX/x86_64 implementation of SM4. Drivers: - Add Arm SMCCC TRNG based driver" [ And obviously a lot of random fixes and updates - Linus] * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (84 commits) crypto: sha512 - remove imaginary and mystifying clearing of variables crypto: aesni - xts_crypt() return if walk.nbytes is 0 padata: Remove repeated verbose license text crypto: ccp - Add support for new CCP/PSP device ID crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation crypto: x86/sm4 - export reusable AESNI/AVX functions crypto: rmd320 - remove rmd320 in Makefile crypto: skcipher - in_irq() cleanup crypto: hisilicon - check _PS0 and _PR0 method crypto: hisilicon - change parameter passing of debugfs function crypto: hisilicon - support runtime PM for accelerator device crypto: hisilicon - add runtime PM ops crypto: hisilicon - using 'debugfs_create_file' instead of 'debugfs_create_regset32' crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm crypto: tcrypt - Fix missing return value check crypto: hisilicon/sec - modify the hardware endian configuration crypto: hisilicon/sec - fix the abnormal exiting process crypto: qat - store vf.compatible flag crypto: qat - do not export adf_iov_putmsg() ...
2021-08-29Merge tag 'perf_urgent_for_v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Prevent the amd/power module from being removed while in use - Mark AMD IBS as not supporting content exclusion - Add a workaround for AMD erratum #1197 where IBS registers might not be restored properly after exiting CC6 state - Fix a potential truncation of a 32-bit variable due to shifting - Read the correct bits describing the number of configurable address ranges on Intel PT * tag 'perf_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/power: Assign pmu.module perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op perf/x86/amd/ibs: Work around erratum #1197 perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32 perf/x86/intel/pt: Fix mask of num_address_ranges
2021-08-29Merge tag 'x86_urgent_for_v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix build error on RHEL where -Werror=maybe-uninitialized is set. - Restore the firmware's IDT when calling EFI boot services and before ExitBootServices() has been called. This fixes a boot failure on what appears to be a tablet with 32-bit UEFI running a 64-bit kernel. * tag 'x86_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix a maybe-uninitialized build warning treated as error x86/efi: Restore Firmware IDT before calling ExitBootServices()
2021-08-27crypto: aesni - xts_crypt() return if walk.nbytes is 0Shreyansh Chouhan
xts_crypt() code doesn't call kernel_fpu_end() after calling kernel_fpu_begin() if walk.nbytes is 0. The correct behavior should be not calling kernel_fpu_begin() if walk.nbytes is 0. Reported-by: syzbot+20191dc583eff8602d2d@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-27crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementationTianjia Zhang
Like the implementation of AESNI/AVX, this patch adds an accelerated implementation of AESNI/AVX2. In terms of code implementation, by reusing AESNI/AVX mode-related codes, the amount of code is greatly reduced. From the benchmark data, it can be seen that when the block size is 1024, compared to AVX acceleration, the performance achieved by AVX2 has increased by about 70%, it is also 7.7 times of the pure software implementation of sm4-generic. The main algorithm implementation comes from SM4 AES-NI work by libgcrypt and Markku-Juhani O. Saarinen at: https://github.com/mjosaarinen/sm4ni This optimization supports the four modes of SM4, ECB, CBC, CFB, and CTR. Since CBC and CFB do not support multiple block parallel encryption, the optimization effect is not obvious. Benchmark on Intel i5-6200U 2.30GHz, performance data of three implementation methods, pure software sm4-generic, aesni/avx acceleration, and aesni/avx2 acceleration, the data comes from the 218 mode and 518 mode of tcrypt. The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: block-size | 16 64 128 256 1024 1420 4096 sm4-generic ECB enc | 60.94 70.41 72.27 73.02 73.87 73.58 73.59 ECB dec | 61.87 70.53 72.15 73.09 73.89 73.92 73.86 CBC enc | 56.71 66.31 68.05 69.84 70.02 70.12 70.24 CBC dec | 54.54 65.91 68.22 69.51 70.63 70.79 70.82 CFB enc | 57.21 67.24 69.10 70.25 70.73 70.52 71.42 CFB dec | 57.22 64.74 66.31 67.24 67.40 67.64 67.58 CTR enc | 59.47 68.64 69.91 71.02 71.86 71.61 71.95 CTR dec | 59.94 68.77 69.95 71.00 71.84 71.55 71.95 sm4-aesni-avx ECB enc | 44.95 177.35 292.06 316.98 339.48 322.27 330.59 ECB dec | 45.28 178.66 292.31 317.52 339.59 322.52 331.16 CBC enc | 57.75 67.68 69.72 70.60 71.48 71.63 71.74 CBC dec | 44.32 176.83 284.32 307.24 328.61 312.61 325.82 CFB enc | 57.81 67.64 69.63 70.55 71.40 71.35 71.70 CFB dec | 43.14 167.78 282.03 307.20 328.35 318.24 325.95 CTR enc | 42.35 163.32 279.11 302.93 320.86 310.56 317.93 CTR dec | 42.39 162.81 278.49 302.37 321.11 310.33 318.37 sm4-aesni-avx2 ECB enc | 45.19 177.41 292.42 316.12 339.90 322.53 330.54 ECB dec | 44.83 178.90 291.45 317.31 339.85 322.55 331.07 CBC enc | 57.66 67.62 69.73 70.55 71.58 71.66 71.77 CBC dec | 44.34 176.86 286.10 501.68 559.58 483.87 527.46 CFB enc | 57.43 67.60 69.61 70.52 71.43 71.28 71.65 CFB dec | 43.12 167.75 268.09 499.33 558.35 490.36 524.73 CTR enc | 42.42 163.39 256.17 493.95 552.45 481.58 517.19 CTR dec | 42.49 163.11 256.36 493.34 552.62 481.49 516.83 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-27crypto: x86/sm4 - export reusable AESNI/AVX functionsTianjia Zhang
Export the reusable functions in the SM4 AESNI/AVX implementation, mainly public functions, which are used to develop the SM4 AESNI/AVX2 implementation, and eliminate unnecessary duplication of code. At the same time, in order to make the public function universal, minor fixes was added. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-26perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> headerKim Phillips
Add <asm/amd-ibs.h> with bitfield definitions for IBS MSRs, and demonstrate usage within the driver. Also move 'struct perf_ibs_data' where it can be shared with the perf tool that will soon be using it. No functional changes. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-9-kim.phillips@amd.com
2021-08-26perf/amd/uncore: Allow the driver to be built as a moduleKim Phillips
Add support to build the AMD uncore driver as a module. This is in order to facilitate development without having to reboot the kernel in most cases. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-8-kim.phillips@amd.com
2021-08-26x86/cpu: Add get_llc_id() helper functionKim Phillips
Factor out a helper function rather than export cpu_llc_id, which is needed in order to be able to build the AMD uncore driver as a module. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-7-kim.phillips@amd.com
2021-08-26perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/Kim Phillips
Found by checkpatch. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-6-kim.phillips@amd.com
2021-08-26perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULLKim Phillips
free_percpu() has its own check for NULL, no need to open-code it. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-5-kim.phillips@amd.com
2021-08-26perf/x86/intel: Replace deprecated CPU-hotplug functionsSebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210803141621.780504-11-bigeasy@linutronix.de
2021-08-26perf/x86: Remove unused assignment to pointer 'e'Colin Ian King
The pointer 'e' is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210804115710.109608-1-colin.king@canonical.com
2021-08-26Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-26perf/x86/amd/power: Assign pmu.moduleKim Phillips
Assign pmu.module so the driver can't be unloaded whilst in use. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-4-kim.phillips@amd.com
2021-08-26perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS OpKim Phillips
Commit: 2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs") neglected to do so. Fixes: 2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210817221048.88063-2-kim.phillips@amd.com
2021-08-26perf/x86/amd/ibs: Work around erratum #1197Kim Phillips
Erratum #1197 "IBS (Instruction Based Sampling) Register State May be Incorrect After Restore From CC6" is published in a document: "Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021 https://bugzilla.kernel.org/show_bug.cgi?id=206537 Implement the erratum's suggested workaround and ignore IBS samples if MSRC001_1031 == 0. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-3-kim.phillips@amd.com
2021-08-26perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32Colin Ian King
The u32 variable pci_dword is being masked with 0x1fffffff and then left shifted 23 places. The shift is a u32 operation,so a value of 0x200 or more in pci_dword will overflow the u32 and only the bottow 32 bits are assigned to addr. I don't believe this was the original intent. Fix this by casting pci_dword to a resource_size_t to ensure no overflow occurs. Note that the mask and 12 bit left shift operation does not need this because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit value. Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com
2021-08-25perf/x86/intel/pt: Fix mask of num_address_rangesXiaoyao Li
Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.com
2021-08-25x86/build: Move the install rule to arch/x86/MakefileMasahiro Yamada
Currently, the install target in arch/x86/Makefile descends into arch/x86/boot/Makefile to invoke the shell script, but there is no good reason to do so. arch/x86/Makefile can run the shell script directly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210729140023.442101-2-masahiroy@kernel.org
2021-08-25x86/build: Remove the left-over bzlilo targetMasahiro Yamada
Commit f279b49f13bd ("x86/boot: Modernize genimage script; hdimage+EFI support") removed the bzlilo target from arch/x86/boot/Makefile. Remove the left-over from arch/x86/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210729140023.442101-1-masahiroy@kernel.org
2021-08-24x86/kaslr: Have process_mem_region() return a booleanJing Yangyang
Fix the following coccicheck warning: ./arch/x86/boot/compressed/kaslr.c:671:10-11:WARNING:return of 0/1 in function 'process_mem_region' with return type bool Generated by: scripts/coccinelle/misc/boolreturn.cocci Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jing Yangyang <jing.yangyang@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210824070515.61065-1-deng.changcheng@zte.com.cn
2021-08-24x86/mce: Defer processing of early errorsBorislav Petkov
When a fatal machine check results in a system reset, Linux does not clear the error(s) from machine check bank(s) - hardware preserves the machine check banks across a warm reset. During initialization of the kernel after the reboot, Linux reads, logs, and clears all machine check banks. But there is a problem. In: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") the call to mce_register_decode_chain() moved later in the boot sequence. This means that /dev/mcelog doesn't see those early error logs. This was partially fixed by: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers") which made sure that the logs were not lost completely by printing to the console. But parsing console logs is error prone. Users of /dev/mcelog should expect to find any early errors logged to standard places. Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early machine check initialization to indicate that any errors found should just be queued to genpool. When mcheck_late_init() is called it will call mce_schedule_work() to actually log and flush any errors queued in the genpool. [ Based on an original patch, commit message by and completely productized by Tony Luck. ] Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver") Reported-by: Sumanth Kamatala <skamatala@juniper.net> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
2021-08-23x86/tools/relocs: Mark die() with the printf function attr formatBorislav Petkov
Mark die() as a function which accepts printf-style arguments so that the compiler can typecheck them against the supplied format string. Use the C99 inttypes.h format specifiers as relocs.c gets built for both 32- and 64-bit. Original version of the patch by Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/YNnb6Q4QHtNYC049@zn.tnic
2021-08-22x86/build: Remove stale cc-option checksNick Desaulniers
cc-option, __cc-option, cc-option-yn, and cc-disable-warning all invoke the compiler during build time, and can slow down the build when these checks become stale for our supported compilers, whose minimally supported versions increases over time. See Documentation/process/changes.rst for the current supported minimal versions (GCC 4.9+, clang 10.0.1+). Compiler version support for these flags may be verified on godbolt.org. The following flags are supported by all supported versions of GCC and Clang. Remove their cc-option, __cc-option, and cc-option-yn tests. -Wno-address-of-packed-member -mno-avx -m32 -mno-80387 -march=k8 -march=nocona -march=core2 -march=atom -mtune=generic -mfentry [ mingo: Fixed regression on GCC, via partial revert of the stack-boundary changes. ] Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1436 Link: https://lkml.kernel.org/r/20210812183848.1519994-1-ndesaulniers@google.com
2021-08-22x86/resctrl: Fix a maybe-uninitialized build warning treated as errorBabu Moger
The recent commit 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") caused a RHEL build failure with an uninitialized variable warning treated as an error because it removed the default case snippet. The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly uninitialized variable warnings to be treated as errors. This is also reported by smatch via the 0day robot. The error from the RHEL build is: arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’: arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used uninitialized in this function [-Werror=maybe-uninitialized] m->chunks += chunks; ^~ The upstream Makefile does not build using '-Werror=maybe-uninitialized'. So, the problem is not seen there. Fix the problem by putting back the default case snippet. [ bp: note that there's nothing wrong with the code and other compilers do not trigger this warning - this is being done just so the RHEL compiler is happy. ] Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") Reported-by: Terry Bowman <Terry.Bowman@amd.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-21x86/efi: Restore Firmware IDT before calling ExitBootServices()Joerg Roedel
Commit 79419e13e808 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path") introduced an IDT into the 32-bit boot path of the decompressor stub. But the IDT is set up before ExitBootServices() is called, and some UEFI firmwares rely on their own IDT. Save the firmware IDT on boot and restore it before calling into EFI functions to fix boot failures introduced by above commit. Fixes: 79419e13e808 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path") Reported-by: Fabio Aiuto <fabioaiuto83@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org # 5.13+ Link: https://lkml.kernel.org/r/20210820125703.32410-1-joro@8bytes.org
2021-08-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Two nested virtualization fixes for AMD processors" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
2021-08-16KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)Maxim Levitsky
If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-16KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)Maxim Levitsky
* Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-15Merge tag 'irq-urgent-2021-08-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-15Merge tag 'x86_urgent_for_v5.14_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Two fixes: - An objdump checker fix to ignore parenthesized strings in the objdump version - Fix resctrl default monitoring groups reporting when new subgroups get created" * tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix default monitoring groups reporting x86/tools: Fix objdump version check again
2021-08-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM: - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM x86: - Fixes for the new MMU, especially a memory leak on hosts with <39 physical address bits - Remove bogus EFER.NX checks on 32-bit non-PAE hosts - WAITPKG fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault KVM: x86: remove dead initialization KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE KVM: arm64: Fix off-by-one in range_is_memory
2021-08-13Merge branch 'kvm-tdpmmu-fixes' into kvm-masterPaolo Bonzini
Merge topic branch with fixes for both 5.14-rc6 and 5.15.
2021-08-13KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlockSean Christopherson
Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEsSean Christopherson
Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEsSean Christopherson
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PFSean Christopherson
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13kvm: vmx: Sync all matching EPTPs when injecting nested EPT faultJunaid Shahid
When a nested EPT violation/misconfig is injected into the guest, the shadow EPT PTEs associated with that address need to be synced. This is done by kvm_inject_emulated_page_fault() before it calls nested_ept_inject_page_fault(). However, that will only sync the shadow EPT PTE associated with the current L1 EPTP. Since the ASID is based on EP4TA rather than the full EPTP, so syncing the current EPTP is not enough. The SPTEs associated with any other L1 EPTPs in the prev_roots cache with the same EP4TA also need to be synced. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20210806222229.1645356-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13Merge branch 'kvm-vmx-secctl' into kvm-masterPaolo Bonzini
Merge common topic branch for 5.14-rc6 and 5.15 merge window.
2021-08-13KVM: x86: remove dead initializationPaolo Bonzini
hv_vcpu is initialized again a dozen lines below, and at this point vcpu->arch.hyperv is not valid. Remove the initializer. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernelsSean Christopherson
Remove an ancient restriction that disallowed exposing EFER.NX to the guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU. The motivation of the check, added by commit 2cc51560aed0 ("KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run the guest with the host's EFER.NX and thus avoid context switching EFER if the only divergence was the NX bit. Fast forward to today, and KVM has long since stopped running the guest with the host's EFER.NX. Not only does KVM context switch EFER if host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 && guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore, the entire motivation for the restriction was made obsolete over a decade ago when Intel added dedicated host and guest EFER fields in the VMCS (Nehalem timeframe), which reduced the overhead of context switching EFER from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles. In practice, the removed restriction only affects non-PAE 32-bit kernels, as EFER.NX is set during boot if NX is supported and the kernel will use PAE paging (32-bit or 64-bit), regardless of whether or not the kernel will actually use NX itself (mark PTEs non-executable). Alternatively and/or complementarily, startup_32_smp() in head_32.S could be modified to set EFER.NX=1 regardless of paging mode, thus eliminating the scenario where NX is supported but not enabled. However, that runs the risk of breaking non-KVM non-PAE kernels (though the risk is very, very low as there are no known EFER.NX errata), and also eliminates an easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER across nested virtualization transitions. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210805183804.1221554-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-12x86/resctrl: Fix default monitoring groups reportingBabu Moger
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu