summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2024-03-24Merge tag 'efi-fixes-for-v6.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Fix logic that is supposed to prevent placement of the kernel image below LOAD_PHYSICAL_ADDR - Use the firmware stack in the EFI stub when running in mixed mode - Clear BSS only once when using mixed mode - Check efi.get_variable() function pointer for NULL before trying to call it * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: fix panic in kdump kernel x86/efistub: Don't clear BSS twice in mixed mode x86/efistub: Call mixed mode boot services on the firmware's stack efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
2024-03-24Merge tag 'x86-urgent-2024-03-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Ensure that the encryption mask at boot is properly propagated on 5-level page tables, otherwise the PGD entry is incorrectly set to non-encrypted, which causes system crashes during boot. - Undo the deferred 5-level page table setup as it cannot work with memory encryption enabled. - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset to the default value but the cached variable is not, so subsequent comparisons might yield the wrong result and as a consequence the result prevents updating the MSR. - Register the local APIC address only once in the MPPARSE enumeration to prevent triggering the related WARN_ONs() in the APIC and topology code. - Handle the case where no APIC is found gracefully by registering a fake APIC in the topology code. That makes all related topology functions work correctly and does not affect the actual APIC driver code at all. - Don't evaluate logical IDs during early boot as the local APIC IDs are not yet enumerated and the invoked function returns an error code. Nothing requires the logical IDs before the final CPUID enumeration takes place, which happens after the enumeration. - Cure the fallout of the per CPU rework on UP which misplaced the copying of boot_cpu_data to per CPU data so that the final update to boot_cpu_data got lost which caused inconsistent state and boot crashes. - Use copy_from_kernel_nofault() in the kprobes setup as there is no guarantee that the address can be safely accessed. - Reorder struct members in struct saved_context to work around another kmemleak false positive - Remove the buggy code which tries to update the E820 kexec table for setup_data as that is never passed to the kexec kernel. - Update the resource control documentation to use the proper units. - Fix a Kconfig warning observed with tinyconfig * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Move 5-level paging global variable assignments back x86/boot/64: Apply encryption mask to 5-level pagetable update x86/cpu: Add model number for another Intel Arrow Lake mobile processor x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD Documentation/x86: Document that resctrl bandwidth control units are MiB x86/mpparse: Register APIC address only once x86/topology: Handle the !APIC case gracefully x86/topology: Don't evaluate logical IDs during early boot x86/cpu: Ensure that CPU info updates are propagated on UP kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address x86/pm: Work around false positive kmemleak report in msr_build_context() x86/kexec: Do not update E820 kexec table for setup_data x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
2024-03-24x86/efistub: Call mixed mode boot services on the firmware's stackArd Biesheuvel
Normally, the EFI stub calls into the EFI boot services using the stack that was live when the stub was entered. According to the UEFI spec, this stack needs to be at least 128k in size - this might seem large but all asynchronous processing and event handling in EFI runs from the same stack and so quite a lot of space may be used in practice. In mixed mode, the situation is a bit different: the bootloader calls the 32-bit EFI stub entry point, which calls the decompressor's 32-bit entry point, where the boot stack is set up, using a fixed allocation of 16k. This stack is still in use when the EFI stub is started in 64-bit mode, and so all calls back into the EFI firmware will be using the decompressor's limited boot stack. Due to the placement of the boot stack right after the boot heap, any stack overruns have gone unnoticed. However, commit 5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code") moved the definition of the boot heap into C code, and now the boot stack is placed right at the base of BSS, where any overruns will corrupt the end of the .data section. While it would be possible to work around this by increasing the size of the boot stack, doing so would affect all x86 systems, and mixed mode systems are a tiny (and shrinking) fraction of the x86 installed base. So instead, record the firmware stack pointer value when entering from the 32-bit firmware, and switch to this stack every time a EFI boot service call is made. Cc: <stable@kernel.org> # v6.1+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24x86/boot/64: Move 5-level paging global variable assignments backTom Lendacky
Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") moved assignment of 5-level global variables to later in the boot in order to avoid having to use RIP relative addressing in order to set them. However, when running with 5-level paging and SME active (mem_encrypt=on), the variables are needed as part of the page table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(), etc.). Since the variables haven't been set, the page table manipulation is done as if 4-level paging is active, causing the system to crash on boot. While only a subset of the assignments that were moved need to be set early, move all of the assignments back into check_la57_support() so that these assignments aren't spread between two locations. Instead of just reverting the fix, this uses the new RIP_REL_REF() macro when assigning the variables. Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24x86/boot/64: Apply encryption mask to 5-level pagetable updateTom Lendacky
When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24x86/cpu: Add model number for another Intel Arrow Lake mobile processorTony Luck
This one is the regular laptop CPU. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24x86/fpu: Keep xfd_state in sync with MSR_IA32_XFDAdamos Ttofari
Commit 672365477ae8 ("x86/fpu: Update XFD state where required") and commit 8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in order to avoid unnecessary writes to the MSR. On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which wipes out any stale state. But the per CPU cached xfd value is not reset, which brings them out of sync. As a consequence a subsequent xfd_update_state() might fail to update the MSR which in turn can result in XRSTOR raising a #NM in kernel space, which crashes the kernel. To fix this, introduce xfd_set_state() to write xfd_state together with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD. Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required") Signed-off-by: Adamos Ttofari <attofari@amazon.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-23x86/mpparse: Register APIC address only onceThomas Gleixner
The APIC address is registered twice. First during the early detection and afterwards when actually scanning the table for APIC IDs. The APIC and topology core warn about the second attempt. Restrict it to the early detection call. Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
2024-03-23x86/topology: Handle the !APIC case gracefullyThomas Gleixner
If there is no local APIC enumerated and registered then the topology bitmaps are empty. Therefore, topology_init_possible_cpus() will die with a division by zero exception. Prevent this by registering a fake APIC id to populate the topology bitmap. This also allows to use all topology query interfaces unconditionally. It does not affect the actual APIC code because either the local APIC address was not registered or no local APIC could be detected. Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
2024-03-23x86/topology: Don't evaluate logical IDs during early bootThomas Gleixner
The local APICs have not yet been enumerated so the logical ID evaluation from the topology bitmaps does not work and would return an error code. Skip the evaluation during the early boot CPUID evaluation and only apply it on the final run. Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
2024-03-23x86/cpu: Ensure that CPU info updates are propagated on UPThomas Gleixner
The boot sequence evaluates CPUID information twice: 1) During early boot 2) When finalizing the early setup right before mitigations are selected and alternatives are patched. In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance. Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore. This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose. Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
2024-03-22kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe addressMasami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in arch_adjust_kprobe_addr() because this function is used before checking the address is in text or not. Syzcaller bot found a bug and reported the case if user specifies inaccessible data area, arch_adjust_kprobe_addr() will cause a kernel panic. [ mingo: Clarified the comment. ] Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes") Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com> Tested-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
2024-03-22x86/pm: Work around false positive kmemleak report in msr_build_context()Anton Altaparmakov
Since: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue: unreferenced object 0xf68241e0 (size 32): comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s) hex dump (first 32 bytes): 00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)........... 00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B.............. backtrace: [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260 [<ea65e13b>] __kmalloc+0x54/0x160 [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100 [<46635aff>] pm_check_save_msr+0x63/0x80 [<6b6bb938>] do_one_initcall+0x41/0x1f0 [<3f3add60>] kernel_init_freeable+0x199/0x1e8 [<3b538fde>] kernel_init+0x1a/0x110 [<938ae2b2>] ret_from_fork+0x1c/0x28 Which is a false positive. Reproducer: - Run rsync of whole kernel tree (multiple times if needed). - start a kmemleak scan - Note this is just an example: a lot of our internal tests hit these. The root cause is similar to the fix in: b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context() ie. the alignment within the packed struct saved_context which has everything unaligned as there is only "u16 gs;" at start of struct where in the past there were four u16 there thus aligning everything afterwards. The issue is with the fact that Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c) so when the struct members are not aligned it doesn't see them. Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures. Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
2024-03-22x86/kexec: Do not update E820 kexec table for setup_dataDave Young
crashkernel reservation failed on a Thinkpad t440s laptop recently. Actually the memblock reservation succeeded, but later insert_resource() failed. Test steps: kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */ kexec reboot -> dmesg|grep "crashkernel reserved"; crashkernel memory range like below reserved successfully: 0x00000000d0000000 - 0x00000000da000000 But no such "Crash kernel" region in /proc/iomem The background story: Currently the E820 code reserves setup_data regions for both the current kernel and the kexec kernel, and it inserts them into the resources list. Before the kexec kernel reboots nobody passes the old setup_data, and kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed. Thus the old setup data memory is not used at all. Due to old kernel updates the kexec e820 table as well so kexec kernel sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data regions are inserted into resources list in the kexec kernel by e820__reserve_resources(). Note, due to no setup_data is passed in for those old regions they are not early reserved (by function early_reserve_memory), and the crashkernel memblock reservation will just treat them as usable memory and it could reserve the crashkernel region which overlaps with the old setup_data regions. And just like the bug I noticed here, kdump insert_resource failed because e820__reserve_resources has added the overlapped chunks in /proc/iomem already. Finally, looking at the code, the old setup_data regions are not used at all as no setup_data is passed in by the kexec boot loader. Although something like SETUP_PCI etc could be needed, kexec should pass the info as new setup_data so that kexec kernel can take care of them. This should be taken care of in other separate patches if needed. Thus drop the useless buggy code here. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
2024-03-21Merge tag 'kbuild-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ...
2024-03-21Merge tag 'char-misc-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver subsystem updates from Greg KH: "Here is the big set of char/misc and a number of other driver subsystem updates for 6.9-rc1. Included in here are: - IIO driver updates, loads of new ones and evolution of existing ones - coresight driver updates - const cleanups for many driver subsystems - speakup driver additions - platform remove callback void cleanups - mei driver updates - mhi driver updates - cdx driver updates for MSI interrupt handling - nvmem driver updates - other smaller driver updates and cleanups, full details in the shortlog All of these have been in linux-next for a long time with no reported issue, other than a build warning for the speakup driver" The build warning hits clang and is a gcc (and C23) extension, and is fixed up in the merge. Link: https://lore.kernel.org/all/20240321134831.GA2762840@dev-arch.thelio-3990X/ * tag 'char-misc-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (279 commits) binder: remove redundant variable page_addr uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion uio_pruss: UIO_MEM_DMA_COHERENT conversion cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT uio: introduce UIO_MEM_DMA_COHERENT type cdx: add MSI support for CDX bus pps: use cflags-y instead of EXTRA_CFLAGS speakup: Add /dev/synthu device speakup: Fix 8bit characters from direct synth parport: sunbpp: Convert to platform remove callback returning void parport: amiga: Convert to platform remove callback returning void char: xillybus: Convert to platform remove callback returning void vmw_balloon: change maintainership MAINTAINERS: change the maintainer for hpilo driver char: xilinx_hwicap: Fix NULL vs IS_ERR() bug hpet: remove hpets::hp_clocksource platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISH char: xilinx_hwicap: drop casting to void in dev_set_drvdata greybus: move is_gb_* functions out of greybus.h greybus: Remove usage of the deprecated ida_simple_xx() API ...
2024-03-21Merge tag 'hyperv-next-signed-20240320' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Use Hyper-V entropy to seed guest random number generator (Michael Kelley) - Convert to platform remove callback returning void for vmbus (Uwe Kleine-König) - Introduce hv_get_hypervisor_version function (Nuno Das Neves) - Rename some HV_REGISTER_* defines for consistency (Nuno Das Neves) - Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_* (Nuno Das Neves) - Cosmetic changes for hv_spinlock.c (Purna Pavan Chandra Aekkaladevi) - Use per cpu initial stack for vtl context (Saurabh Sengar) * tag 'hyperv-next-signed-20240320' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Use Hyper-V entropy to seed guest random number generator x86/hyperv: Cosmetic changes for hv_spinlock.c hyperv-tlfs: Rename some HV_REGISTER_* defines for consistency hv: vmbus: Convert to platform remove callback returning void mshyperv: Introduce hv_get_hypervisor_version function x86/hyperv: Use per cpu initial stack for vtl context hyperv-tlfs: Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_*
2024-03-21x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'Masahiro Yamada
Kconfig emits a warning for the following command: $ make ARCH=x86_64 tinyconfig ... .config:1380:warning: override: UNWINDER_GUESS changes choice state When X86_64=y, the unwinder is exclusively selected from the following three options: - UNWINDER_ORC - UNWINDER_FRAME_POINTER - UNWINDER_GUESS However, arch/x86/configs/tiny.config only specifies the values of the last two. UNWINDER_ORC must be explicitly disabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240320154313.612342-1-masahiroy@kernel.org
2024-03-19Merge tag 'for-linus-6.9-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - Xen event channel handling fix for a regression with a rare kernel config and some added hardening - better support of running Xen dom0 in PVH mode - a cleanup for the xen grant-dma-iommu driver * tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: increment refcnt only if event channel is refcounted xen/evtchn: avoid WARN() when unbinding an event channel x86/xen: attempt to inflate the memory balloon on PVH xen/grant-dma-iommu: Convert to platform remove callback returning void
2024-03-18x86/hyperv: Use Hyper-V entropy to seed guest random number generatorMichael Kelley
A Hyper-V host provides its guest VMs with entropy in a custom ACPI table named "OEM0". The entropy bits are updated each time Hyper-V boots the VM, and are suitable for seeding the Linux guest random number generator (rng). See a brief description of OEM0 in [1]. Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL. Via this path, the rng is seeded very early during boot with good entropy. The ACPI OEM0 table provided in such VMs is an additional source of entropy. Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux doesn't currently get any entropy from the Hyper-V host. While this is not fundamentally broken because Linux can generate its own entropy, using the Hyper-V host provided entropy would get the rng off to a better start and would do so earlier in the boot process. Improve the rng seeding for Generation 1 VMs by having Hyper-V specific code in Linux take advantage of the OEM0 table to seed the rng. For Generation 2 VMs, use the OEM0 table to provide additional entropy beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to Hyper-V, parse it directly in the Hyper-V code in the Linux kernel and use add_bootloader_randomness() to add it to the rng. Once the entropy bits are read from OEM0, zero them out in the table so they don't appear in /sys/firmware/acpi/tables/OEM0 in the running VM. The zero'ing is done out of an abundance of caution to avoid potential security risks to the rng. Also set the OEM0 data length to zero so a kexec or other subsequent use of the table won't try to use the zero'ed bits. [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20240318155408.216851-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240318155408.216851-1-mhklinux@outlook.com>
2024-03-18x86/hyperv: Cosmetic changes for hv_spinlock.cPurna Pavan Chandra Aekkaladevi
Fix issues reported by checkpatch.pl script for hv_spinlock.c file. - Place __initdata after variable name - Add missing blank line after enum declaration No functional changes intended. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/1710763751-14137-1-git-send-email-paekkaladevi@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1710763751-14137-1-git-send-email-paekkaladevi@linux.microsoft.com>
2024-03-18hyperv-tlfs: Rename some HV_REGISTER_* defines for consistencyNuno Das Neves
Rename HV_REGISTER_GUEST_OSID to HV_REGISTER_GUEST_OS_ID. This matches the existing HV_X64_MSR_GUEST_OS_ID. Rename HV_REGISTER_CRASH_* to HV_REGISTER_GUEST_CRASH_*. Including GUEST_ is consistent with other #defines such as HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE. The new names also match the TLFS document more accurately, i.e. HvRegisterGuestCrash*. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Link: https://lore.kernel.org/r/1710285687-9160-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1710285687-9160-1-git-send-email-nunodasneves@linux.microsoft.com>
2024-03-17Merge tag 'perf-urgent-2024-03-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fixes from Ingo Molnar: - Work around AMD erratum to filter out bogus LBR stack entries - Fix incorrect PMU reset that can result in warnings (or worse) during suspend/hibernation * tag 'perf-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Avoid register reset when CPU is dead perf/x86/amd/lbr: Discard erroneous branch entries
2024-03-16x86/CPU/AMD: Update the Zenbleed microcode revisionsBorislav Petkov (AMD)
Update them to the correct revision numbers. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
2024-03-15Merge tag 'devicetree-for-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DT core: - Add cleanup.h based auto release of struct device_node pointers via __free marking and new for_each_child_of_node_scoped() iterator to use it. - Always create a base skeleton DT when CONFIG_OF is enabled. This supports several usecases of adding DT data on non-DT booted systems. - Move around some /reserved-memory code in preparation for further improvements - Add a stub for_each_property_of_node() for !OF - Adjust the printk levels on some messages - Fix __be32 sparse warning - Drop RESERVEDMEM_OF_DECLARE usage from Freescale qbman driver (currently orphaned) - Add Saravana Kannan and drop Frank Rowand as DT maintainers DT bindings: - Convert Mediatek timer, Mediatek sysirq, fsl,imx6ul-tsc, fsl,imx6ul-pinctrl, Atmel AIC, Atmel HLCDC, FPGA region, and xlnx,sd-fec to DT schemas - Add existing, but undocumented fsl,imx-anatop binding - Add bunch of undocumented vendor prefixes used in compatible strings - Drop obsolete brcm,bcm2835-pm-wdt binding - Drop obsolete i2c.txt which as been replaced with schema in dtschema - Add DPS310 device and sort trivial-devices.yaml - Enable undocumented compatible checks on DT binding examples - More QCom maintainer fixes/updates - Updates to writing-schema.rst and DT submitting-patches.rst to cover some frequent review comments - Clean-up SPDX tags to use 'OR' rather than 'or'" * tag 'devicetree-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (56 commits) dt-bindings: soc: imx: fsl,imx-anatop: add imx6q regulators of: unittest: Use for_each_child_of_node_scoped() of: Introduce for_each_*_child_of_node_scoped() to automate of_node_put() handling of: Add cleanup.h based auto release via __free(device_node) markings of: Move all FDT reserved-memory handling into of_reserved_mem.c of: Add KUnit test to confirm DTB is loaded of: unittest: treat missing of_root as error instead of fixing up x86/of: Unconditionally call unflatten_and_copy_device_tree() um: Unconditionally call unflatten_device_tree() of: Create of_root if no dtb provided by firmware of: Always unflatten in unflatten_and_copy_device_tree() dt-bindings: timer: mediatek: Convert to json-schema dt-bindings: interrupt-controller: fsl,intmux: Include power-domains support soc: fsl: qbman: Remove RESERVEDMEM_OF_DECLARE usage dt-bindings: fsl-imx-sdma: fix HDMI audio index dt-bindings: soc: imx: fsl,imx-iomuxc-gpr: add imx6 dt-bindings: soc: imx: fsl,imx-anatop: add binding dt-bindings: input: touchscreen: fsl,imx6ul-tsc convert to YAML dt-bindings: pinctrl: fsl,imx6ul-pinctrl: convert to YAML of: make for_each_property_of_node() available to to !OF ...
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-14Merge tag 'probes-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: "x86 kprobes: - Use boolean for some function return instead of 0 and 1 - Prohibit probing on INT/UD. This prevents user to put kprobe on INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are used for a special purpose in the kernel - Boost Grp instructions. Because a few percent of kernel instructions are Grp 2/3/4/5 and those are safe to be executed without ip register fixup, allow those to be boosted (direct execution on the trampoline buffer with a JMP) tracing: - Add function argument access from return events (kretprobe and fprobe). This allows user to compare how a data structure field is changed after executing a function. With BTF, return event also accepts function argument access by name. - Fix a wrong comment (using "Kretprobe" in fprobe) - Cleanup a big probe argument parser function into three parts, type parser, post-processing function, and main parser - Cleanup to set nr_args field when initializing trace_probe instead of counting up it while parsing - Cleanup a redundant #else block from tracefs/README source code - Update selftests to check entry argument access from return probes - Documentation update about entry argument access from return probes" * tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Add entry argument access at function exit selftests/ftrace: Add test cases for entry args at function exit tracing/probes: Support $argN in return probe (kprobe and fprobe) tracing: Remove redundant #else block for BTF args from README tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init tracing/probes: Cleanup probe argument parser tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event x86/kprobes: Boost more instructions from grp2/3/4/5 x86/kprobes: Prohibit kprobing on INT and UD x86/kprobes: Refactor can_{probe,boost} return type to bool
2024-03-14Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K pages), the dpISA extension and Rust enabled on arm64. The changes are mostly contained within the usual arch/arm64/, drivers/perf, the arm64 Documentation and kselftests. The exception is the Rust support which touches some generic build files. Summary: - Reorganise the arm64 kernel VA space and add support for LPA2 (at stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address range with 4KB and 16KB pages - Enable Rust on arm64 - Support for the 2023 dpISA extensions (data processing ISA), host only - arm64 perf updates: - StarFive's StarLink (integrates one or more CPU cores with a shared L3 memory system) PMU support - Enable HiSilicon Erratum 162700402 quirk for HIP09 - Several updates for the HiSilicon PCIe PMU driver - Arm CoreSight PMU support - Convert all drivers under drivers/perf/ to use .remove_new() - Miscellaneous: - Don't enable workarounds for "rare" errata by default - Clean up the DAIF flags handling for EL0 returns (in preparation for NMI support) - Kselftest update for ptrace() - Update some of the sysreg field definitions - Slight improvement in the code generation for inline asm I/O accessors to permit offset addressing - kretprobes: acquire regs via a BRK exception (previously done via a trampoline handler) - SVE/SME cleanups, comment updates - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously disabled due to gcc silently ignoring -falign-functions=N)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits) Revert "mm: add arch hook to validate mmap() prot flags" Revert "arm64: mm: add support for WXN memory translation attribute" Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512" ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu ...
2024-03-14Merge tag 'pci-v6.9-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate interrupt related code in irq.c (Ilpo Järvinen) - Reduce kernel size by replacing sysfs resource macros with functions (Ilpo Järvinen) - Reduce kernel size by compiling sysfs support only when CONFIG_SYSFS=y (Lukas Wunner) - Avoid using Extended Tags on 3ware-9650SE Root Port to work around an apparent hardware defect (Jörg Wedekind) Resource management: - Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner) - Move pci_iomap.c and other PCI-specific devres code to drivers/pci (Philipp Stanner) - Consolidate PCI devres code in devres.c (Philipp Stanner) Power management: - Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't know how to return correctly to D0, and remove previous quirk that wasn't as specific (Daniel Drake) - Allow runtime PM when the driver enables it but doesn't need any runtime PM callbacks (Raag Jadav) - Drain runtime-idle callbacks before driver removal to avoid races between .remove() and .runtime_idle(), which caused intermittent page faults when the rtsx .runtime_idle() accessed registers that its .remove() had already unmapped (Rafael J. Wysocki) Virtualization: - Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs with VFIO, e.g., for professional audio software on many Apple machines, at the cost of leaking state between VMs (Edmund Raile) Error handling: - Print all logged TLP Prefixes, not just the first, after AER or DPC errors (Ilpo Järvinen) - Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which still don't advertise a legal size (Paul Menzel) - Ignore expected DPC Surprise Down errors on hot removal (Smita Koralahalli) - Block runtime suspend while handling AER errors to avoid races that prevent the device form being resumed from D3hot (Stanislaw Gruszka) Peer-to-peer DMA: - Use atomic XA allocation in RCU read section (Christophe JAILLET) ASPM: - Collect bits of ASPM-related code that we need even without CONFIG_PCIEASPM into aspm.c (David E. Box) - Save/restore L1 PM Substates config for suspend/resume (David E. Box) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) Endpoint framework: - Refactor and improve pci_epf_alloc_space() API (Niklas Cassel) - Clean up endpoint BAR descriptions (Niklas Cassel) - Fix ntb_register_device() name leak in error path (Yang Yingliang) - Return actual error code for pci_vntb_probe() failure (Yang Yingliang) Broadcom STB PCIe controller driver: - Fix MDIO write polling, which previously never waited for completion (Jonathan Bell) Cadence PCIe endpoint driver: - Clear the ARI "Next Function Number" of last function (Jasko-EXT Wojciech) Freescale i.MX6 PCIe controller driver: - Simplify by replacing switch statements with function pointers for different hardware variants (Frank Li) - Simplify by using clk_bulk*() API (Frank Li) - Remove redundant DT clock and reg/reg-name details (Frank Li) - Add i.MX95 DT and driver support for both Root Complex and Endpoint mode (Frank Li) Microsoft Hyper-V host bridge driver: - Reduce memory usage by limiting ring buffer size to 16KB instead of 4 pages (Michael Kelley) Qualcomm PCIe controller driver: - Add X1E80100 DT and driver support (Abel Vesa) - Add DT 'required-opps' for SoCs that require a minimum performance level (Johan Hovold) - Make DT 'msi-map-mask' optional, depending on how MSI interrupts are mapped (Johan Hovold) - Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY configuration isn't tuned correctly for L0s (Johan Hovold) - Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150, SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof Kozlowski) - Enable BDF to SID translation by disabling bypass mode (Manivannan Sadhasivam) - Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay Sarkar) Synopsys DesignWare PCIe controller driver: - Allocate 64-bit MSI address if no 32-bit address is available (Ajay Agarwal) - Fix endpoint Resizable BAR to actually advertise the required 1MB size (Niklas Cassel) MicroSemi Switchtec management driver: - Release resources if the .probe() fails (Christophe JAILLET) Miscellaneous: - Make pcie_port_bus_type const (Ricardo B. Marliere)" * tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits) PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI: hv: Fix ring buffer size calculation PCI: dwc: endpoint: Fix advertised resizable BAR size PCI: cadence: Clear the ARI Capability Next Function Number of the last function PCI: dwc: Strengthen the MSI address allocation logic PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling PCI: qcom: Add X1E80100 PCIe support dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller PCI: qcom: Enable BDF to SID translation properly PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p dt-bindings: PCI: qcom: Do not require 'msi-map-mask' dt-bindings: PCI: qcom: Allow 'required-opps' PCI/AER: Block runtime suspend when handling errors PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c ...
2024-03-14Merge tag 'platform-drivers-x86-v6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - New acer-wmi HW support - Support for new revision of amd/pmf heartbeat notify - Correctly handle asus-wmi HW without LEDs - fujitsu-laptop battery charge control support - Support for new hp-wmi thermal profiles - Support ideapad-laptop refresh rate key - Put intel/pmc AI accelerator (GNA) into D3 if it has no driver to allow entry into low-power modes, and temporarily removed Lunar Lake SSRAM support due to breaking FW changes causing probe fail (further breaking FW changes are still pending) - Report pmc/punit_atom devices that prevent reacing low power levels - Surface Fan speed function support - Support for more sperial keys and complete the list of models with non-standard fan registers in thinkpad_acpi - New DMI touchscreen HW support - Continued modernization efforts of wmi - Removal of obsoleted ledtrig-audio call and the related dependency - Debug & metrics interface improvements - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (87 commits) platform/x86/intel/pmc: Improve PKGC residency counters debug platform/x86: asus-wmi: Consider device is absent when the read is ~0 Documentation/x86/amd/hsmp: Updating urls platform/mellanox: mlxreg-hotplug: Remove redundant NULL-check platform/x86/amd/pmf: Update sps power thermals according to the platform-profiles platform/x86/amd/pmf: Add support to get sps default APTS index values platform/x86/amd/pmf: Add support to get APTS index numbers for static slider platform/x86/amd/pmf: Add support to notify sbios heart beat event platform/x86/amd/pmf: Add support to get sbios requests in PMF driver platform/x86/amd/pmf: Disable debugfs support for querying power thermals platform/x86/amd/pmf: Differentiate PMF ACPI versions x86/platform/atom: Check state of Punit managed devices on s2idle platform/x86: pmc_atom: Check state of PMC clocks on s2idle platform/x86: pmc_atom: Check state of PMC managed devices on s2idle platform/x86: pmc_atom: Annotate d3_sts register bit defines clk: x86: Move clk-pmc-atom register defines to include/linux/platform_data/x86/pmc_atom.h platform/x86: make fw_attr_class constant platform/x86/intel/tpmi: Change vsec offset to u64 platform/x86: intel_scu_pcidrv: Remove unused intel-mid.h platform/x86: intel_scu_wdt: Remove unused intel-mid.h ...
2024-03-13Merge tag 'efi-next-for-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Measure initrd and command line using the CC protocol if the ordinary TCG2 protocol is not implemented, typically on TDX confidential VMs - Avoid creating mappings that are both writable and executable while running in the EFI boot services. This is a prerequisite for getting the x86 shim loader signed by MicroSoft again, which allows the distros to install on x86 PCs that ship with EFI secure boot enabled. - API update for struct platform_driver::remove() * tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: virt: efi_secret: Convert to platform remove callback returning void x86/efistub: Remap kernel text read-only before dropping NX attribute efi/libstub: Add get_event_log() support for CC platforms efi/libstub: Measure into CC protocol if TCG2 protocol is absent efi/libstub: Add Confidential Computing (CC) measurement typedefs efi/tpm: Use symbolic GUID name from spec for final events table efi/libstub: Use TPM event typedefs from the TCG PC Client spec
2024-03-13Merge tag 'acpi-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These modify the ACPI device events and processor enumeration code to take the 'enabled' _STA bit into account as mandated by the ACPI specification, convert several platform drivers to using a remove callback that returns void, add some new quirks for ACPI IRQ override and other things, address assorted issues and clean up code. Specifics: - Rearrange Device Check and Bus Check notification handling in the ACPI device hotplug code to make it get the "enabled" _STA bit into account (Rafael Wysocki) - Modify acpi_processor_add() to skip processors with the "enabled" _STA bit clear, as per the specification (Rafael Wysocki) - Stop failing Device Check notification handling without a valid reason (Rafael Wysocki) - Defer enumeration of devices that depend on a device with an ACPI device ID equalt to INTC10CF to address probe ordering issues on some platforms (Wentong Wu) - Constify acpi_bus_type (Ricardo Marliere) - Make the ACPI-specific suspend-to-idle code take the Low-Power S0 Idle MSFT UUID into account on non-AMD systems (Rafael Wysocki) - Add ACPI IRQ override quirks for some new platforms (Sergey Kalinichev, Maxim Kudinov, Alexey Froloff, Sviatoslav Harasymchuk, Nicolas Haye) - Make the NFIT parsing code use acpi_evaluate_dsm_typed() (Andy Shevchenko) - Fix a memory leak in acpi_processor_power_exit() (Armin Wolf) - Make it possible to quirk the CSI-2 and MIPI DisCo for Imaging properties parsing and add a quirk for Dell XPS 9315 (Sakari Ailus) - Prevent false-positive static checker warnings from triggering by intializing some variables in the ACPI thermal code to zero (Colin Ian King) - Add DELL0501 handling to acpi_quirk_skip_serdev_enumeration() and make that function generic (Hans de Goede) - Make the ACPI backlight code handle fetching EDID that is longer than 256 bytes (Mario Limonciello) - Skip initialization of GHES_ASSIST structures for Machine Check Architecture in APEI (Avadhut Naik) - Convert several plaform drivers in the ACPI subsystem to using a remove callback that returns void (Uwe Kleine-König) - Drop the long-deprecated custom_method debugfs interface that is problematic from the security standpoint (Rafael Wysocki) - Use %pe in a couple of places in the ACPI code for easier error decoding (Onkarnath) - Fix register width information handling during system memory accesses in the ACPI CPPC library (Jarred White) - Add AMD CPPC V2 support for family 17h processors to the ACPI CPPC library (Perry Yuan)" * tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) ACPI: resource: Use IRQ override on Maibenben X565 ACPI: CPPC: Use access_width over bit_width for system memory accesses ACPI: CPPC: enable AMD CPPC V2 support for family 17h processors ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture ACPI: scan: Consolidate Device Check and Bus Check notification handling ACPI: scan: Rework Device Check and Bus Check notification handling ACPI: scan: Make acpi_processor_add() check the device enabled bit ACPI: scan: Relocate acpi_bus_trim_one() ACPI: scan: Fix device check notification handling ACPI: resource: Add MAIBENBEN X577 to irq1_edge_low_force_override ACPI: pfr_update: Convert to platform remove callback returning void ACPI: pfr_telemetry: Convert to platform remove callback returning void ACPI: fan: Convert to platform remove callback returning void ACPI: GED: Convert to platform remove callback returning void ACPI: DPTF: Convert to platform remove callback returning void ACPI: AGDI: Convert to platform remove callback returning void ACPI: TAD: Convert to platform remove callback returning void ACPI: APEI: GHES: Convert to platform remove callback returning void ACPI: property: Polish ignoring bad data nodes ACPI: thermal_lib: Initialize temp_decik to zero ...
2024-03-13Merge tag 'pm-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "From the functional perspective, the most significant change here is the addition of support for Energy Models that can be updated dynamically at run time. There is also the addition of LZ4 compression support for hibernation, the new preferred core support in amd-pstate, new platforms support in the Intel RAPL driver, new model-specific EPP handling in intel_pstate and more. Apart from that, the cpufreq default transition delay is reduced from 10 ms to 2 ms (along with some related adjustments), the system suspend statistics code undergoes a significant rework and there is a usual bunch of fixes and code cleanups all over. Specifics: - Allow the Energy Model to be updated dynamically (Lukasz Luba) - Add support for LZ4 compression algorithm to the hibernation image creation and loading code (Nikhil V) - Fix and clean up system suspend statistics collection (Rafael Wysocki) - Simplify device suspend and resume handling in the power management core code (Rafael Wysocki) - Fix PCI hibernation support description (Yiwei Lin) - Make hibernation take set_memory_ro() return values into account as appropriate (Christophe Leroy) - Set mem_sleep_current during kernel command line setup to avoid an ordering issue with handling it (Maulik Shah) - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a driver's system suspend callback (Qingliang Li) - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus) - Add a tracepoint for runtime_status changes tracking (Vilas Bhat) - Fix section title markdown in the runtime PM documentation (Yiwei Lin) - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li) - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li) - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada) - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby) - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar) - Change default transition delay in cpufreq to 2ms (Qais Yousef) - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois) - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef) - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar) - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova) - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan) - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois) - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle) - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng) - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang) - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui) - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano) - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li) - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda) - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil) - Fix a couple of warnings in the OPP core code related to W=1 builds (Viresh Kumar) - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar) - Extend dev_pm_opp_data with turbo support (Sibi Sankar) - dt-bindings: drop maxItems from inner items (David Heidelberg)" * tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits) dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support Fix cpupower-frequency-info.1 man page typo cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpuidle: ACPI/intel: fix MWAIT hint target C-state computation PM: sleep: wakeirq: fix wake irq warning in system suspend powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function cpufreq: Don't unregister cpufreq cooling on CPU hotplug PM: suspend: Set mem_sleep_current during kernel command line setup cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max Documentation: PM: Fix runtime_pm.rst markdown syntax cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake ...
2024-03-13x86/xen: attempt to inflate the memory balloon on PVHRoger Pau Monne
When running as PVH or HVM Linux will use holes in the memory map as scratch space to map grants, foreign domain pages and possibly miscellaneous other stuff. However the usage of such memory map holes for Xen purposes can be problematic. The request of holesby Xen happen quite early in the kernel boot process (grant table setup already uses scratch map space), and it's possible that by then not all devices have reclaimed their MMIO space. It's not unlikely for chunks of Xen scratch map space to end up using PCI bridge MMIO window memory, which (as expected) causes quite a lot of issues in the system. At least for PVH dom0 we have the possibility of using regions marked as UNUSABLE in the e820 memory map. Either if the region is UNUSABLE in the native memory map, or it has been converted into UNUSABLE in order to hide RAM regions from dom0, the second stage translation page-tables can populate those areas without issues. PV already has this kind of logic, where the balloon driver is inflated at boot. Re-use the current logic in order to also inflate it when running as PVH. onvert UNUSABLE regions up to the ratio specified in EXTRA_MEM_RATIO to RAM, while reserving them using xen_add_extra_mem() (which is also moved so it's no longer tied to CONFIG_PV). [jgross: fixed build for CONFIG_PVH without CONFIG_XEN_PVH] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20240220174341.56131-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2024-03-13perf/x86/amd/core: Avoid register reset when CPU is deadSandipan Das
When bringing a CPU online, some of the PMC and LBR related registers are reset. The same is done when a CPU is taken offline although that is unnecessary. This currently happens in the "cpu_dead" callback which is also incorrect as the callback runs on a control CPU instead of the one that is being taken offline. This also affects hibernation and suspend to RAM on some platforms as reported in the link below. Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support") Reported-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/550a026764342cf7e5812680e3e2b91fe662b5ac.1706526029.git.sandipan.das@amd.com
2024-03-13perf/x86/amd/lbr: Discard erroneous branch entriesSandipan Das
The Revision Guide for AMD Family 19h Model 10-1Fh processors declares Erratum 1452 which states that non-branch entries may erroneously be recorded in the Last Branch Record (LBR) stack with the valid and spec bits set. Such entries can be recognized by inspecting bit 61 of the corresponding LastBranchStackToIp register. This bit is currently reserved but if found to be set, the associated branch entry should be discarded. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://bugzilla.kernel.org/attachment.cgi?id=305518 Link: https://lore.kernel.org/r/3ad2aa305f7396d41a40e3f054f740d464b16b7f.1706526029.git.sandipan.das@amd.com
2024-03-12Merge tag 'net-next-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
2024-03-12Merge tag 'hardening-v6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-12Merge tag 'asm-generic-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Just two small updates this time: - A series I did to unify the definition of PAGE_SIZE through Kconfig, intended to help with a vdso rework that needs the constant but cannot include the normal kernel headers when building the compat VDSO on arm64 and potentially others - a patch from Yan Zhao to remove the pfn_to_virt() definitions from a couple of architectures after finding they were both incorrect and entirely unused" * tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: define CONFIG_PAGE_SIZE_*KB on all architectures arch: simplify architecture specific page size configuration arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
2024-03-12Merge tag 'x86-boot-2024-03-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Continuing work by Ard Biesheuvel to improve the x86 early startup code, with the long-term goal to make it position independent: - Get rid of early accesses to global objects, either by moving them to the stack, deferring the access until later, or dropping the globals entirely - Move all code that runs early via the 1:1 mapping into .head.text, and move code that does not out of it, so that build time checks can be added later to ensure that no inadvertent absolute references were emitted into code that does not tolerate them - Remove fixup_pointer() and occurrences of __pa_symbol(), which rely on the compiler emitting absolute references, which is not guaranteed - Improve the early console code - Add early console message about ignored NMIs, so that users are at least warned about their existence - even if we cannot do anything about them - Improve the kexec code's kernel load address handling - Enable more X86S (simplified x86) bits - Simplify early boot GDT handling - Micro-optimize the boot code a bit - Misc cleanups * tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/sev: Move early startup code into .head.text section x86/sme: Move early SME kernel encryption handling into .head.text x86/boot: Move mem_encrypt= parsing to the decompressor efi/libstub: Add generic support for parsing mem_encrypt= x86/startup_64: Simplify virtual switch on primary boot x86/startup_64: Simplify calculation of initial page table address x86/startup_64: Defer assignment of 5-level paging global variables x86/startup_64: Simplify CR4 handling in startup code x86/boot: Use 32-bit XOR to clear registers efi/x86: Set the PE/COFF header's NX compat flag unconditionally x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[] x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[] x86/boot/64: Use RIP_REL_REF() to access early page tables x86/boot/64: Use RIP_REL_REF() to access '__supported_pte_mask' x86/boot/64: Use RIP_REL_REF() to access early_dynamic_pgts[] x86/boot/64: Use RIP_REL_REF() to assign 'phys_base' x86/boot/64: Simplify global variable accesses in GDT/IDT programming x86/trampoline: Bypass compat mode in trampoline_start64() if not needed kexec: Allocate kernel above bzImage's pref_address x86/boot: Add a message about ignored early NMIs ...
2024-03-12Merge tag 'x86-apic-2024-03-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC fixup from Dave Hansen: "Revert VERW fixed addressing patch. The reverted commit is not x86/apic material and was cruft left over from a merge. I believe the sequence of events went something like this: - The commit in question was added to x86/urgent - x86/urgent was merged into x86/apic to resolve a conflict - The commit was zapped from x86/urgent, but *not* from x86/apic - x86/apic got pullled (yesterday) I think we need to be a bit more vigilant when zapping things to make sure none of the other branches are depending on the zapped material" * tag 'x86-apic-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86/bugs: Use fixed addressing for VERW operand"
2024-03-12Merge tag 'rfds-for-linus-2024-03-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RFDS mitigation from Dave Hansen: "RFDS is a CPU vulnerability that may allow a malicious userspace to infer stale register values from kernel space. Kernel registers can have all kinds of secrets in them so the mitigation is basically to wait until the kernel is about to return to userspace and has user values in the registers. At that point there is little chance of kernel secrets ending up in the registers and the microarchitectural state can be cleared. This leverages some recent robustness fixes for the existing MDS vulnerability. Both MDS and RFDS use the VERW instruction for mitigation" * tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests x86/rfds: Mitigate Register File Data Sampling (RFDS) Documentation/hw-vuln: Add documentation for RFDS x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
2024-03-12Revert "x86/bugs: Use fixed addressing for VERW operand"Dave Hansen
This was reverts commit 8009479ee919b9a91674f48050ccbff64eafedaa. It was originally in x86/urgent, but was deemed wrong so got zapped. But in the meantime, x86/urgent had been merged into x86/apic to resolve a conflict. I didn't notice the merge so didn't zap it from x86/apic and it managed to make it up with the x86/apic material. The reverted commit is known to cause some KASAN problems. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2024-03-12x86/platform/atom: Check state of Punit managed devices on s2idleJohannes Stezenbach
For the Bay Trail or Cherry Trail SoC to enter the S0i3 power-level at s2idle suspend requires most of the hw-blocks / devices in the SoC to be in D3 when entering s2idle suspend. If some devices are not in D3 then the SoC will stay in a higher power state, consuming much more power from the battery then in S0i3. Use the new acpi_s2idle_dev_ops and acpi_register_lps0_dev() functionality to register a new s2idle check function which checks that all hardware blocks in the North complex (controlled by Punit) are in a state that allows the SoC to enter S0i3 and prints an error message for any device in D0. Signed-off-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: "Borislav Petkov (AMD)" <bp@alien8.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [hdegoede: Use acpi_s2idle_dev_ops] Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240305105915.76242-6-hdegoede@redhat.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-03-12Merge branch 'linus' into x86/boot, to resolve conflictIngo Molnar
There's a new conflict with Linus's upstream tree, because in the following merge conflict resolution in <asm/coco.h>: 38b334fc767e Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Linus has resolved the conflicting placement of 'cc_mask' better than the original commit: 1c811d403afd x86/sev: Fix position dependent variable references in startup code ... which was also done by an internal merge resolution: 2e5fc4786b7a Merge branch 'x86/sev' into x86/boot, to resolve conflicts and to pick up dependent tree But Linus is right in 38b334fc767e, the 'cc_mask' declaration is sufficient within the #ifdef CONFIG_ARCH_HAS_CC_PLATFORM block. So instead of forcing Linus to do the same resolution again, merge in Linus's tree and follow his conflict resolution. Conflicts: arch/x86/include/asm/coco.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-03-12mshyperv: Introduce hv_get_hypervisor_version functionNuno Das Neves
Introduce x86_64 and arm64 functions to get the hypervisor version information and store it in a structure for simpler parsing. Use the new function to get and parse the version at boot time. While at it, move the printing code to hv_common_init() so it is not duplicated. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1709852618-29110-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1709852618-29110-1-git-send-email-nunodasneves@linux.microsoft.com>
2024-03-11Merge tag 'x86_tdx_for_6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tdx update from Dave Hansen: - Fix sparse warning from TDX use of movdir64b() * tag 'x86_tdx_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove the __iomem annotation of movdir64b()'s dst argument
2024-03-11Merge tag 'x86_mm_for_6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Dave Hansen: - Add a warning when memory encryption conversions fail. These operations require VMM cooperation, even in CoCo environments where the VMM is untrusted. While it's _possible_ that memory pressure could trigger the new warning, the odds are that a guest would only see this from an attacking VMM. - Simplify page fault code by re-enabling interrupts unconditionally - Avoid truncation issues when pfns are passed in to pfn_to_kaddr() with small (<64-bit) types. * tag 'x86_mm_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/cpa: Warn for set_memory_XXcrypted() VMM fails x86/mm: Get rid of conditional IF flag handling in page fault path x86/mm: Ensure input to pfn_to_kaddr() is treated as a 64-bit type