path: root/arch
Age  Commit message  Author
2023-12-12  riscv: dts: thead: Add TH1520 mmc controllers and sdhci clock  [Drew Fustini]
Add a node for the fixed reference clock used by the emmc and sdio nodes. Add an emmc node for the 1st dwcmshc instance, which is typically connected to an eMMC device. Add an sdio0 node for the 2nd dwcmshc instance, which is typically connected to a microSD slot. Add an sdio1 node for the 3rd dwcmshc instance, which is typically connected to an SDIO WiFi module. The node names are based on Table 1-2 (C910/C906 memory map) in the TH1520 System User Manual. Link: https://git.beagleboard.org/beaglev-ahead/beaglev-ahead/-/tree/main/docs Signed-off-by: Drew Fustini <dfustini@baylibre.com> Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-12  x86/virt/tdx: Disable TDX host support when kexec is enabled  [Dave Hansen]
TDX host support currently lacks the ability to handle kexec. Disable TDX when kexec is enabled. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-20-dave.hansen%40intel.com
2023-12-12  x86/mce: Differentiate real hardware #MCs from TDX erratum ones  [Kai Huang]
The first few generations of TDX hardware have an erratum. Triggering it in Linux requires some kind of kernel bug involving relatively exotic memory writes to TDX private memory and will manifest via spurious-looking machine checks when reading the affected memory.

Make an effort to detect these TDX-induced machine checks and spit out a new blurb to dmesg so folks do not think their hardware is failing.

== Background ==

Virtually all kernel memory access operations happen in full cachelines. In practice, writing a "byte" of memory usually reads a 64-byte cacheline of memory, modifies it, then writes the whole line back. Those operations do not trigger this problem.

This problem is triggered by "partial" writes, where a write transaction of less than a cacheline lands at the memory controller. The CPU does these via non-temporal write instructions (like MOVNTI), or through UC/WC memory mappings. The issue can also be triggered away from the CPU by devices doing partial writes via DMA.

== Problem ==

A partial write to a TDX private memory cacheline will silently "poison" the line. Subsequent reads will consume the poison and generate a machine check. According to the TDX hardware spec, neither of these things should have happened.

To add insult to injury, the Linux machine check code will present these as a literal "Hardware error" when they were, in fact, a software-triggered issue.

== Solution ==

In the end, this issue is hard to trigger. Rather than do something rash (and incomplete) like unmap TDX private memory from the direct map, improve the machine check handler.

Currently, the #MC handler doesn't distinguish whether the memory is TDX private memory or not; it just dumps, for instance, the message below:

 [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
 [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
 ...
 [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
 [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
 [...] Kernel panic - not syncing: Fatal local machine check

which says "Hardware Error" and "Data load in unrecoverable area of kernel". Ideally, it would be better for the log to say "software bug around TDX private memory" instead of "Hardware Error", but in reality real hardware memory errors can happen, and sadly such software-triggered #MCs cannot be distinguished from real hardware errors. Also, the error message is parsed by the userspace tool 'mcelog', so changing the output may break userspace. So keep the "Hardware Error". The "Data load in unrecoverable area of kernel" is also helpful, so keep it too.

Instead of modifying the above error log, improve it by printing an additional TDX-related message so the log looks like:

 ...
 [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
 [...] mce: [Hardware Error]: Machine Check: TDX private memory error. Possible kernel bug.

Adding this additional message requires determining whether the memory page is TDX private memory. There is no existing infrastructure to do that, so add an interface to query the TDX module to fill this gap.

== Impact ==

This issue requires some kind of kernel bug to trigger.

TDX private memory should never be mapped UC/WC. A partial write originating from these mappings would require *two* bugs: first mapping the wrong page, then writing the wrong memory. It would also be detectable using traditional memory corruption techniques like DEBUG_PAGEALLOC.

MOVNTI (and friends) could cause this issue with something like a simple buffer overrun or use-after-free on the direct map. It should also be detectable with normal debug techniques.

The one place where this might get nasty would be if the CPU read data then wrote back the same data. That would trigger this problem but would not, for instance, set off mechanisms like slab redzoning because it doesn't actually corrupt data.

With an IOMMU at least, the DMA exposure is similar to the UC/WC issue: TDX private memory would first need to be incorrectly mapped into the I/O space, and then a later DMA to that mapping would actually cause the poisoning event.

[ dhansen: changelog tweaks ]

Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-18-dave.hansen%40intel.com
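As an illustrative sketch only (not the actual patch), the extra dmesg hint could be emitted from the #MC path roughly as below, assuming a helper such as tdx_is_private_mem() that queries the TDX module and a synthetic bug bit named X86_BUG_TDX_PW_MCE for the affected parts; both names are assumptions here:

    /* Sketch only: append the TDX hint when an #MC hits TDX private memory. */
    #include <asm/cpufeature.h>
    #include <linux/printk.h>

    bool tdx_is_private_mem(unsigned long phys);    /* assumed TDX-module query helper */

    static void mce_print_tdx_hint(unsigned long mc_addr)
    {
            /* Only relevant on parts carrying the partial-write erratum. */
            if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
                    return;

            if (tdx_is_private_mem(mc_addr))
                    pr_err("Machine Check: TDX private memory error. Possible kernel bug.\n");
    }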
2023-12-12  x86/cpu: Detect TDX partial write machine check erratum  [Kai Huang]
TDX memory has integrity and confidentiality protections. Violations of this integrity protection are supposed to only affect TDX operations and are never supposed to affect the host kernel itself. In other words, the host kernel should never, itself, see machine checks induced by the TDX integrity hardware. Alas, the first few generations of TDX hardware have an erratum. A partial write to a TDX private memory cacheline will silently "poison" the line. Subsequent reads will consume the poison and generate a machine check. According to the TDX hardware spec, neither of these things should have happened. Virtually all kernel memory access operations happen in full cachelines. In practice, writing a "byte" of memory usually reads a 64-byte cacheline of memory, modifies it, then writes the whole line back. Those operations do not trigger this problem. This problem is triggered by "partial" writes, where a write transaction of less than a cacheline lands at the memory controller. The CPU does these via non-temporal write instructions (like MOVNTI), or through UC/WC memory mappings. The issue can also be triggered away from the CPU by devices doing partial writes via DMA. With this erratum, there are additional things that need to be done. To prepare for those changes, add a CPU bug bit to indicate this erratum. Note that this bug reflects the hardware, so it is detected regardless of whether the kernel is built with TDX support or not. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-17-dave.hansen%40intel.com
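A minimal sketch of what forcing such a synthetic bug bit can look like during early CPU setup; the gating condition and the X86_BUG_TDX_PW_MCE name are placeholders for the real model list in the vendor code:

    #include <asm/cpufeature.h>
    #include <asm/processor.h>

    /* Sketch: force a synthetic CPU bug bit for the TDX partial-write erratum. */
    static void check_tdx_partial_write_erratum(struct cpuinfo_x86 *c)
    {
            /* Placeholder condition: the real code keys off the affected TDX-capable models. */
            if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6)
                    setup_force_cpu_bug(X86_BUG_TDX_PW_MCE);
    }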
2023-12-12  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify  [James Houghton]
It is currently possible for a userspace application to enter an infinite page fault loop when using HugeTLB pages implemented with contiguous PTEs when HAFDBS is not available. This happens because:

1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling the memory access on a system without HAFDBS, we will get a page fault.
3. HugeTLB will check if it needs to update the dirty bits on the PTE. For contiguous PTEs, it will check to see if the pgprot bits need updating. In this case, HugeTLB wants to write a sequence of sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()), so it thinks no update is necessary.

We can get the kernel to write a sw-dirty, hw-clean PTE with the following steps (showing the relevant VMA flags and pgprot bits):

i. Create a valid, writable contiguous PTE.
   VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
   VMA pgprot bits: PTE_RDONLY | PTE_WRITE
   PTE pgprot bits: PTE_DIRTY | PTE_WRITE

ii. mprotect the VMA to PROT_NONE.
   VMA vmflags:     VM_SHARED
   VMA pgprot bits: PTE_RDONLY
   PTE pgprot bits: PTE_DIRTY | PTE_RDONLY

iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
   VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
   VMA pgprot bits: PTE_RDONLY | PTE_WRITE
   PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

Make it impossible to create a writeable sw-dirty, hw-clean PTE with pte_modify(). Such a PTE should be impossible to create, and there may be places that assume that pte_dirty() implies pte_hw_dirty().

Signed-off-by: James Houghton <jthoughton@google.com> Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags") Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
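A minimal sketch of the invariant being enforced, using the existing arm64 pte accessors; the helper name is illustrative and the real change lives inside pte_modify():

    /* Sketch: never return a PTE that is sw-dirty but hw-clean. */
    static inline pte_t pte_fixup_sw_dirty(pte_t pte)
    {
            if (pte_sw_dirty(pte))
                    pte = pte_mkdirty(pte); /* sets PTE_DIRTY and, for writable PTEs, clears PTE_RDONLY */
            return pte;
    }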
2023-12-12  ARM: dts: microchip: sama5d27_som1_ek: Remove mmc-ddr-3_3v property from sdmmc0 node  [Mihai Sain]
On board, the sdmmc0 interface is wired to an SD card socket. According to the mmc-controller bindings, the mmc-ddr-3_3v property is used for eMMC devices to enable high-speed DDR mode (3.3V I/O). Remove the mmc-ddr-3_3v property from the sdmmc0 node. Signed-off-by: Mihai Sain <mihai.sain@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20231211070345.2792-1-mihai.sain@microchip.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2023-12-12  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD  [Ard Biesheuvel]
Now that kernel mode FPSIMD state is context switched along with other task state, we can enable the existing logic that keeps track of which task's FPSIMD state the CPU is holding in its registers. If it is the context of the task that we are switching to, we can elide the reload of the FPSIMD state from memory. Note that we also need to check whether the FPSIMD state on this CPU is the most recent: if a task gets migrated away and back again, the state in memory may be more recent than the state in the CPU. So add another CPU id field to task_struct to keep track of this. (We could reuse the existing CPU id field used for user mode context, but that might result in user state being discarded unnecessarily, given that two distinct CPUs could be holding the most recent user mode state and the most recent kernel mode state.) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231208113218.3001940-9-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
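A sketch of the lazy-restore test described above, assuming the existing per-CPU fpsimd_last_state tracking plus new per-task fields named kernel_fpsimd_state and kernel_fpsimd_cpu; the names are illustrative:

    /* Sketch: skip the memory reload if this CPU still holds next's kernel FPSIMD state. */
    static bool kernel_fpsimd_state_still_live(struct task_struct *next)
    {
            return next->thread.kernel_fpsimd_cpu == smp_processor_id() &&
                   __this_cpu_read(fpsimd_last_state.st) == &next->thread.kernel_fpsimd_state;
    }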
2023-12-12  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch  [Ard Biesheuvel]
Currently, the FPSIMD register file is not preserved and restored along with the general registers on exception entry/exit or context switch. For this reason, we disable preemption when enabling FPSIMD for kernel mode use in task context, and suspend the processing of softirqs so that there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not be used at all in other contexts). Disabling preemption while doing CPU intensive work on inputs of potentially unbounded size is bad for real-time performance, which is why we try and ensure that SIMD crypto code does not operate on more than ~4k at a time, which is an arbitrary limit and requires assembler code to implement efficiently. We can avoid the need for disabling preemption if we can ensure that any in-kernel users of the NEON will not lose the FPSIMD register state across a context switch. And given that disabling softirqs implicitly disables preemption as well, we will also have to ensure that a softirq that runs code using FPSIMD can safely interrupt an in-kernel user. So introduce a thread_info flag TIF_KERNEL_FPSTATE, and modify the context switch hook for FPSIMD to preserve and restore the kernel mode FPSIMD to/from struct thread_struct when it is set. This avoids any scheduling blackouts due to prolonged use of FPSIMD in kernel mode, without the need for manual yielding. In order to support softirq processing while FPSIMD is being used in kernel task context, use the same flag to decide whether the kernel mode FPSIMD state needs to be preserved and restored before allowing FPSIMD to be used in softirq context. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231208113218.3001940-8-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
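A sketch of the context-switch hook described above, with TIF_KERNEL_FPSTATE gating the kernel mode save/restore; the save/load helper names are assumptions:

    /* Sketch: preserve/restore kernel mode FPSIMD state across a context switch. */
    static void fpsimd_switch_kernel_state(struct task_struct *next)
    {
            /* Outgoing task was using FPSIMD in kernel mode: stash its registers. */
            if (test_thread_flag(TIF_KERNEL_FPSTATE))
                    fpsimd_save_kernel_state(current);

            /* Incoming task was preempted while using kernel mode FPSIMD: reload it. */
            if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE))
                    fpsimd_load_kernel_state(next);
    }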
2023-12-12  arm64: fpsimd: Drop unneeded 'busy' flag  [Ard Biesheuvel]
Kernel mode NEON will preserve the user mode FPSIMD state by saving it into the task struct before clobbering the registers. In order to avoid the need for preserving kernel mode state too, we disallow nested use of kernel mode NEON, i.e., use in softirq context while the interrupted task context was using kernel mode NEON too. Originally, this policy was implemented using a per-CPU flag which was exposed via may_use_simd(), requiring users of kernel mode NEON to deal with the possibility that it might return false, and to provide both NEON and non-NEON code paths. This policy was changed by commit 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs disabled"), and now, softirq processing is disabled entirely instead, and so may_use_simd() can never fail when called from task or softirq context. This means we can drop the fpsimd_context_busy flag entirely, and instead, ensure that we disable softirq processing in places where we formerly relied on the flag for preventing races in the FPSIMD preserve routines. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20231208113218.3001940-7-ardb@google.com [will: Folded in fix from CAMj1kXFhzbJRyWHELCivQW1yJaF=p07LLtbuyXYX3G1WtsdyQg@mail.gmail.com] Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64/kernel: Move 'nokaslr' parsing out of early idreg code  [Ard Biesheuvel]
Parsing and ignoring 'nokaslr' can be done from anywhere, except from the code that runs very early and is therefore built with limitations on the kind of relocations it is permitted to use. So move it to a source file that is part of the ordinary kernel build. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-63-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit  [Ard Biesheuvel]
All ID register value overrides are =0 with the exception of the nokaslr pseudo feature which uses =1. In order to remove the dependency on kstrtou64(), which is part of the core kernel and no longer usable once we move idreg-override into the early mini C runtime, let's just parse a single hex digit (with optional leading 0x) and set the output value accordingly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-62-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
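A minimal sketch of such a parser (hypothetical helper name), accepting one hex digit with an optional "0x" prefix:

    #include <linux/ctype.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    /* Sketch: parse exactly one hex digit, e.g. "1" or "0x1", into *val. */
    static int parse_single_hex_digit(const char *p, u64 *val)
    {
            if (p[0] == '0' && tolower(p[1]) == 'x')
                    p += 2;

            if (!isxdigit(p[0]) || p[1] != '\0')
                    return -EINVAL;

            *val = isdigit(p[0]) ? p[0] - '0' : tolower(p[0]) - 'a' + 10;
            return 0;
    }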
2023-12-12  arm64: idreg-override: Avoid sprintf() for simple string concatenation  [Ard Biesheuvel]
Instead of using sprintf() with the "%s.%s=" format, where the first string argument is always the same in the inner loop of match_options(), use simple memcpy() for string concatenation, and move the first copy to the outer loop. This removes the dependency on sprintf(), which will be difficult to fulfil when we move this code into the early mini C runtime. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-61-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
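A sketch of the memcpy()-based concatenation, building the "<name>.<field>=" pattern mentioned above; the function name and buffer contract are illustrative:

    #include <linux/string.h>

    /* Sketch: write "<name>.<field>=" into buf without sprintf(); buf must be large enough. */
    static size_t build_override_prefix(char *buf, const char *name, const char *field)
    {
            size_t nlen = strlen(name);
            size_t flen = strlen(field);

            memcpy(buf, name, nlen);
            buf[nlen] = '.';
            memcpy(buf + nlen + 1, field, flen);
            buf[nlen + 1 + flen] = '=';
            buf[nlen + flen + 2] = '\0';

            return nlen + flen + 2; /* length of the "<name>.<field>=" string */
    }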
2023-12-12  arm64: idreg-override: avoid strlen() to check for empty strings  [Ard Biesheuvel]
strlen() is a costly way to decide whether a string is empty, as in that case, the first character will be NUL so we can check for that directly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-60-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
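In code, the cheap check amounts to something like this sketch:

    #include <linux/types.h>

    /* Sketch: empty-string test without scanning the whole string. */
    static inline bool str_is_empty(const char *s)
    {
            return *s == '\0'; /* same result as strlen(s) == 0 */
    }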
2023-12-12  arm64: idreg-override: Avoid parameq() and parameqn()  [Ard Biesheuvel]
The only way parameq() and parameqn() deviate from the ordinary string and memory routines is that they ignore the difference between dashes and underscores. Since we copy each command line argument into a buffer before passing it to parameq() and parameqn() numerous times, let's convert all dashes to underscores just once, and update the alias array accordingly. This also helps reduce the dependency on kernel APIs that are no longer available once we move this code into the early mini C runtime. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-59-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
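A sketch of the one-time normalization pass (hypothetical helper name):

    /* Sketch: convert dashes to underscores once, so later comparisons can use plain memcmp(). */
    static void normalize_dashes(char *arg)
    {
            for (; *arg; arg++)
                    if (*arg == '-')
                            *arg = '_';
    }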
2023-12-12  arm64: idreg-override: Prepare for place relative reloc patching  [Ard Biesheuvel]
The ID reg override handling code uses a rather elaborate data structure that relies on statically initialized absolute address values in pointer fields. This means that this code cannot run until relocation fixups have been applied, and this is unfortunate, because it means we cannot discover overrides for KASLR or LVA/LPA without creating the kernel mapping and performing the relocations first. This can be solved by switching to place-relative relocations, which can be applied by the linker at build time. This means some additional arithmetic is required when dereferencing these pointers, as we can no longer dereference the pointer members directly. So let's implement this for idreg-override.c in a preliminary way, i.e., convert all the references in code to use a special accessor that produces the correct absolute value at runtime. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-58-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
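A sketch of the kind of accessor involved, mirroring the kernel's offset_to_ptr() helper: the structure field stores a linker-computed signed offset, and the absolute address is reconstructed from the field's own address at runtime:

    #include <linux/types.h>

    /* Sketch: turn a place-relative (signed offset) field back into an absolute pointer. */
    static inline void *prel_to_ptr(const s32 *offset)
    {
            return (void *)((unsigned long)offset + *offset);
    }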
2023-12-12  arm64: idreg-override: Omit non-NULL checks for override pointer  [Ard Biesheuvel]
Now that override pointers are always set, we can drop the various non-NULL checks that we have in the code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-57-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64: mm: get rid of kimage_vaddr global variable  [Ard Biesheuvel]
We store the address of _text in kimage_vaddr, but since commit 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings decision"), we no longer reference this variable from modules so we no longer need to export it. In fact, we don't need it at all so let's just get rid of it. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231129111555.3594833-46-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64: mm: Take potential load offset into account when KASLR is off  [Ard Biesheuvel]
We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is disabled, and this permits the loader (i.e., EFI) to place the kernel anywhere in physical memory as long as the base address is 64k aligned. This means that the 'KASLR' case described in the header that defines the size of the statically allocated page tables could take effect even when CONFIG_RANDOMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231129111555.3594833-45-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime  [Ard Biesheuvel]
In subsequent patches, portions of the early C code will be marked as __init. Unfortunately, __init implies __latent_entropy, and this would result in the early C code being instrumented in an unsafe manner. Disable the latent entropy plugin for the early C code. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231129111555.3594833-44-ardb@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm64: defconfig: Enable configs for MT8195-Cherry-Tomato Chromebook  [Nícolas F. R. A. Prado]
Enable missing configs needed to boot the MT8195-Cherry-Tomato Chromebook with full support on the defconfig. The configs enabled bring in support for the DSP and sound card, display, thermal sensor and keyboard backlight. Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231122181335.535498-1-nfraprado@collabora.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2023-12-12  arm64: defconfig: Enable DA9211 regulator  [Vignesh Raman]
Mediatek mt8173 board fails to boot with DA9211 regulator disabled. Enabling CONFIG_REGULATOR_DA9211=y in drm-ci fixes the issue. So enable it in the defconfig since kernel-ci also requires it. Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230911104139.617448-1-vignesh.raman@collabora.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2023-12-12  x86/CPU/AMD: Add X86_FEATURE_ZEN1  [Borislav Petkov (AMD)]
Add a synthetic feature flag specifically for first generation Zen machines. There is also a need for a generic flag for all Zen generations, so make X86_FEATURE_ZEN be that flag. Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags") Suggested-by: Brian Gerst <brgerst@gmail.com> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com
2023-12-12  arm64: dts: juno: Align thermal zone names with bindings  [Krzysztof Kozlowski]
Thermal bindings require thermal zone node names to match certain patterns: | juno.dtb: thermal-zones: 'big-cluster', 'gpu0', 'gpu1', | 'little-cluster', 'pmic', 'soc' | do not match any of the regexes: | '^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$', 'pinctrl-[0-9]+' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://lore.kernel.org/r/20231209171612.250868-1-krzysztof.kozlowski@linaro.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-12-12  arm: pmu: Move error message and -EOPNOTSUPP to individual PMUs  [James Clark]
-EPERM or -EINVAL always get converted to -EOPNOTSUPP, so replace them. This will allow __hw_perf_event_init() to return a different code or not print that particular message for a different error in the next commit. Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20231211161331.1277825-10-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12  arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_N  [James Clark]
This is so that FIELD_GET and FIELD_PREP can be used and so that the fields are in a format consistent with arm64/tools/sysreg. Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20231211161331.1277825-3-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org>
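For illustration, a GENMASK-style definition and accessor might look like the sketch below (PMCR_EL0.N occupies bits [15:11]); treat the exact macro and function names as assumptions:

    #include <linux/bitfield.h>
    #include <linux/bits.h>

    #define ARMV8_PMU_PMCR_N        GENMASK(15, 11) /* number of event counters */

    /* Sketch: FIELD_GET() does the shift and mask implied by the GENMASK definition. */
    static inline unsigned int pmcr_num_counters(u64 pmcr)
    {
            return FIELD_GET(ARMV8_PMU_PMCR_N, pmcr);
    }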
2023-12-12  iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()  [Jason Gunthorpe]
This is not being used to pass ops, it is just a way to tell if an iommu driver was probed. These days this can be detected directly via device_iommu_mapped(). Call device_iommu_mapped() in the two places that need to check it and remove the iommu parameter everywhere. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rob Herring <robh@kernel.org> Tested-by: Hector Martin <marcan@marcan.st> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
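A sketch of the resulting call-site pattern, with device_iommu_mapped() replacing the old ops-pointer test; the surrounding function is illustrative:

    #include <linux/device.h>

    /* Sketch: decide between IOMMU-backed and direct DMA setup per device. */
    static void setup_dma_for_dev(struct device *dev)
    {
            if (device_iommu_mapped(dev)) {
                    /* An IOMMU driver has been probed for this device. */
                    /* ... set up IOMMU DMA ops ... */
            } else {
                    /* ... fall back to direct mapping ... */
            }
    }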
2023-12-12  iommu: Add mm_get_enqcmd_pasid() helper function  [Tina Zhang]
mm_get_enqcmd_pasid() should be used by architecture code and closely related code to learn the PASID value that the x86 ENQCMD operation should use for the mm. For the moment, SMMUv3 uses this without any connection to ENQCMD; it will be cleaned up similarly to how the prior patch made VT-d use the PASID argument of set_dev_pasid(). The motivation is to replace mm->pasid with an iommu private data structure that is introduced in a later patch. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-4-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
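At this stage the helper is essentially a thin accessor; a sketch of its shape, assuming mm->pasid still exists as a field:

    #include <linux/mm_types.h>

    /* Sketch: architecture code asks for the ENQCMD PASID instead of reading mm->pasid directly. */
    static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
    {
            return mm->pasid;
    }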
2023-12-12  iommu: Change kconfig around IOMMU_SVA  [Jason Gunthorpe]
Linus suggested that the kconfig here is confusing: https://lore.kernel.org/all/CAHk-=wgUiAtiszwseM1p2fCJ+sC4XWQ+YN4TanFhUgvUqjr9Xw@mail.gmail.com/

Let's break it into three kconfigs controlling distinct things:

 - CONFIG_IOMMU_MM_DATA controls if the mm_struct has the additional fields for the IOMMU. Currently only PASID, but later patches store a struct iommu_mm_data *
 - CONFIG_ARCH_HAS_CPU_PASID controls if the arch needs the scheduling bit for keeping track of the ENQCMD instruction. x86 will select this if IOMMU_SVA is enabled
 - IOMMU_SVA controls if the IOMMU core compiles in the SVA support code for iommu driver use and the IOMMU exported API

This way ARM will not enable CONFIG_ARCH_HAS_CPU_PASID

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-2-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12  KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs()  [Marc Zyngier]
Although we implicitly depend on slots_lock being held when registering IO devices with the IO bus infrastructure, we don't enforce this requirement. Make it explicit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12  KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy  [Marc Zyngier]
When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12  KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()  [Marc Zyngier]
As we are going to need to call into kvm_vgic_vcpu_destroy() without prior holding of the slots_lock, introduce __kvm_vgic_vcpu_destroy() as a non-locking primitive of kvm_vgic_vcpu_destroy(). Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
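The locked-wrapper/non-locking-primitive split follows a common kernel pattern; a sketch using the names from the commit message:

    /* Sketch: the locked wrapper takes slots_lock, the __-prefixed primitive assumes it is held. */
    void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
    {
            struct kvm *kvm = vcpu->kvm;

            mutex_lock(&kvm->slots_lock);
            __kvm_vgic_vcpu_destroy(vcpu);
            mutex_unlock(&kvm->slots_lock);
    }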
2023-12-12  KVM: arm64: vgic: Simplify kvm_vgic_destroy()  [Marc Zyngier]
When destroying a vgic, we have rather cumbersome rules about when slots_lock and config_lock are held, resulting in fun buglets. The first port of call is to simplify kvm_vgic_map_resources() so that there is only one call to kvm_vgic_destroy() instead of two, with the second only holding half of the locks. For that, we kill the non-locking primitive and move the call outside of the locking altogether. This doesn't change anything (we re-acquire the locks and teardown the whole vgic), and simplifies the code significantly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-11  MIPS: SGI-IP27: hubio: fix nasid kernel-doc warning  [Randy Dunlap]
ip27-hubio.c:32: warning: Function parameter or member 'nasid' not described in 'hub_pio_map'
ip27-hubio.c:32: warning: Excess function parameter 'hub' description in 'hub_pio_map'

Fixes: 4bf841ebf17a ("MIPS: SGI-IP27: get rid of compact node ids") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202311101336.BUL1JuvU-lkp@intel.com Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-mips@vger.kernel.org Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-12-11  arm64/sysreg: Add new system registers for GCS  [Mark Brown]
FEAT_GCS introduces a number of new system registers. Add the registers available up to EL2 to sysreg as per DDI0601 2022-12. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-13-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Add definition for FPMR  [Mark Brown]
DDI0601 2023-09 defines a new system register FPMR (Floating Point Mode Register) which configures the new FP8 features. Add a definition of this register. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-12-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09  [Mark Brown]
DDI0601 2023-09 defines new fields in HCRX_EL2 controlling access to new system registers, update our definition of HCRX_EL2 to reflect this. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-11-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09  [Mark Brown]
DDI0601 2023-09 defines some new fields in SCTLR_EL1 controlling new MTE and floating point features. Update our sysreg definition to reflect these. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-10-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09  [Mark Brown]
The 2023-09 release of DDI0601 defines a number of new feature enumeration fields in ID_AA64SMFR0_EL1. Add these fields. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-9-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Add definition for ID_AA64FPFR0_EL1  [Mark Brown]
DDI0601 2023-09 defines a new feature register ID_AA64FPFR0_EL1 which enumerates a number of FP8 related features. Add a definition for it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-8-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Add definition for ID_AA64ISAR3_EL1  [Mark Brown]
DDI0601 2023-09 adds a new system register ID_AA64ISAR3_EL1 enumerating new floating point and TLB invalidation features. Add a definition for it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-7-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Update ID_AA64ISAR2_EL1 definition for DDI0601 2023-09  [Mark Brown]
DDI0601 2023-09 defines some new fields in previously RES0 space in ID_AA64ISAR2_EL1, together with one new enum value. Update the system register definition to reflect this. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-6-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Add definition for ID_AA64PFR2_EL1  [Mark Brown]
DDI0601 2023-09 defines a new system register ID_AA64PFR2_EL1 which enumerates FPMR and some new MTE features. Add a definition of this register. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-5-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: update CPACR_EL1 register  [Joey Gouly]
Add E0POE bit that traps accesses to POR_EL0 from EL0. Updated according to DDI0601 2023-03. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-4-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: add system register POR_EL{0,1}  [Joey Gouly]
Add POR_EL{0,1} according to DDI0601 2023-03. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-3-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Add definition for HAFGRTR_EL2  [Fuad Tabba]
Add a definition of HAFGRTR_EL2 (fine grained trap control for the AMU) as per DDI0601 2023-09. This was extracted from Fuad Tabba's patch "KVM: arm64: Handle HAFGRTR_EL2 trapping in nested virt". Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231206100503.564090-6-tabba@google.com [Extract sysreg update and rewrite commit message -- broonie] Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-2-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  arm64/sysreg: Update HFGITR_EL2 definition to DDI0601 2023-09  [Fuad Tabba]
The 2023-09 release of the architecture XML (DDI0601) adds a new field ATS1E1A to HFGITR_EL2; update our definition of the register to match. This was extracted from Fuad Tabba's patch "KVM: arm64: Add latest HFGITR_EL2 FGT entries to nested virt". [Extracted the sysreg definition from Fuad's original patch and reworded the subject to match -- broonie] Signed-off-by: Fuad Tabba <tabba@google.com> Message-Id: <20231206100503.564090-4-tabba@google.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-1-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11  s390: update defconfigs  [Heiko Carstens]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-11  s390/mm: convert pgste locking functions to C  [Claudio Imbrenda]
Convert pgste_get_lock() and pgste_set_unlock() to C. There is no real reason to keep them in assembler. Having them in C makes them more readable and maintainable, and better instructions are used automatically when available. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20231205173252.62305-1-imbrenda@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-11  s390/fpu: get rid of MACHINE_HAS_VX  [Heiko Carstens]
Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(), which is a short, readable wrapper for "test_facility(129)". Facility bit 129 is set if the vector facility is present. test_facility() also returns true for all bits which are set in the architecture level set of the cpu that the kernel is compiled for. This means that test_facility(129) is a compile-time constant which returns true for z13 and later, since the vector facility bit is part of the z13 kernel ALS. As a result, the compiled code will have fewer runtime checks, and less code. Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
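A sketch of the wrapper as described (the real definition may differ in detail):

    #include <asm/facility.h>

    /* Sketch: facility bit 129 indicates the vector facility; folds to a constant when it is in the ALS. */
    static inline bool cpu_has_vx(void)
    {
            return test_facility(129);
    }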
2023-12-11  s390/als: add vector facility to z13 architecture level set  [Heiko Carstens]
Add the vector facility to the z13 architecture level set (ALS). All hypervisors have supported the vector facility for many years. Adding the facility to the ALS allows for compile-time optimizations of the kernel: if the kernel is compiled for z13 or later, all tests which verify whether the vector facility is present can return "true". This will be implemented with a subsequent patch. Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>