From a2f7a0bfcaaa3928e4876d15edd4dfdc09e139b6 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Wed, 25 Sep 2019 09:44:53 +0800 Subject: x86/mm: Fix function name typo in pmd_read_atomic() comment The function involved should be pte_offset_map_lock() and we never have function pmd_offset_map_lock defined. Signed-off-by: Wei Yang Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190925014453.20236-1-richardw.yang@linux.intel.com [ Minor edits. ] Signed-off-by: Ingo Molnar --- arch/x86/include/asm/pgtable-3level.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index e3633795fb22..1796462ff143 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -44,10 +44,10 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte) * pmd_populate rightfully does a set_64bit, but if we're reading the * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen * because gcc will not read the 64bit of the pmd atomically. To fix - * this all places running pmd_offset_map_lock() while holding the + * this all places running pte_offset_map_lock() while holding the * mmap_sem in read mode, shall read the pmdp pointer using this * function to know if the pmd is null nor not, and in turn to know if - * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd + * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd * operations. * * Without THP if the mmap_sem is hold for reading, the pmd can only -- cgit From 44e09568cf2d874cb2a8e2ac35acf71a9ae3402b Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 25 Sep 2019 08:38:57 +0200 Subject: x86/mm: Clean up the pmd_read_atomic() comments Fix spelling, consistent parenthesis and grammar - and also clarify the language where needed. Reviewed-by: Wei Yang Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Ingo Molnar --- arch/x86/include/asm/pgtable-3level.h | 44 ++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 21 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index 1796462ff143..5afb5e0fe903 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte) #define pmd_read_atomic pmd_read_atomic /* - * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with - * a "*pmdp" dereference done by gcc. Problem is, in certain places - * where pte_offset_map_lock is called, concurrent page faults are + * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with + * a "*pmdp" dereference done by GCC. Problem is, in certain places + * where pte_offset_map_lock() is called, concurrent page faults are * allowed, if the mmap_sem is hold for reading. An example is mincore * vs page faults vs MADV_DONTNEED. On the page fault side - * pmd_populate rightfully does a set_64bit, but if we're reading the + * pmd_populate() rightfully does a set_64bit(), but if we're reading the * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen - * because gcc will not read the 64bit of the pmd atomically. 
To fix - * this all places running pte_offset_map_lock() while holding the + * because GCC will not read the 64-bit value of the pmd atomically. + * + * To fix this all places running pte_offset_map_lock() while holding the * mmap_sem in read mode, shall read the pmdp pointer using this - * function to know if the pmd is null nor not, and in turn to know if + * function to know if the pmd is null or not, and in turn to know if * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd * operations. * - * Without THP if the mmap_sem is hold for reading, the pmd can only - * transition from null to not null while pmd_read_atomic runs. So + * Without THP if the mmap_sem is held for reading, the pmd can only + * transition from null to not null while pmd_read_atomic() runs. So * we can always return atomic pmd values with this function. * - * With THP if the mmap_sem is hold for reading, the pmd can become + * With THP if the mmap_sem is held for reading, the pmd can become * trans_huge or none or point to a pte (and in turn become "stable") - * at any time under pmd_read_atomic. We could read it really - * atomically here with a atomic64_read for the THP enabled case (and + * at any time under pmd_read_atomic(). We could read it truly + * atomically here with an atomic64_read() for the THP enabled case (and * it would be a whole lot simpler), but to avoid using cmpxchg8b we * only return an atomic pmdval if the low part of the pmdval is later - * found stable (i.e. pointing to a pte). And we're returning a none - * pmdval if the low part of the pmd is none. In some cases the high - * and low part of the pmdval returned may not be consistent if THP is - * enabled (the low part may point to previously mapped hugepage, - * while the high part may point to a more recently mapped hugepage), - * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part - * of the pmd to be read atomically to decide if the pmd is unstable - * or not, with the only exception of when the low part of the pmd is - * zero in which case we return a none pmd. + * found to be stable (i.e. pointing to a pte). We are also returning a + * 'none' (zero) pmdval if the low part of the pmd is zero. + * + * In some cases the high and low part of the pmdval returned may not be + * consistent if THP is enabled (the low part may point to previously + * mapped hugepage, while the high part may point to a more recently + * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only + * needs the low part of the pmd to be read atomically to decide if the + * pmd is unstable or not, with the only exception when the low part + * of the pmd is zero, in which case we return a 'none' pmd. */ static inline pmd_t pmd_read_atomic(pmd_t *pmdp) { -- cgit From 87d6021b814353d7b353afcc3698ffe49de7d4ec Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 1 Oct 2019 16:23:35 +0200 Subject: x86/math-emu: Limit MATH_EMULATION to 486SX compatibles MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The FPU emulation code is old and fragile in places, try to limit its use to builds for CPUs that actually use it. As far as I can tell, this is only true for i486sx compatibles, including the Cyrix 486SLC, AMD Am486SX and ÉLAN SC410, UMC U5S amd DM&P VortexSX86, all of which were relatively short-lived and got replaced with i486DX compatible processors soon after introduction, though some of the embedded versions remained available much longer. 
Signed-off-by: Arnd Bergmann Signed-off-by: Borislav Petkov Reviewed-by: Kees Cook Cc: "H. Peter Anvin" Cc: Andrew Morton Cc: Bill Metzenthen Cc: Ingo Molnar Cc: Masahiro Yamada Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20191001142344.1274185-2-arnd@arndb.de --- arch/x86/include/asm/module.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h index 7948a17febb4..c215d2762488 100644 --- a/arch/x86/include/asm/module.h +++ b/arch/x86/include/asm/module.h @@ -15,6 +15,8 @@ struct mod_arch_specific { #ifdef CONFIG_X86_64 /* X86_64 does not define MODULE_PROC_FAMILY */ +#elif defined CONFIG_M486SX +#define MODULE_PROC_FAMILY "486SX " #elif defined CONFIG_M486 #define MODULE_PROC_FAMILY "486 " #elif defined CONFIG_M586 -- cgit From 0959f8256ada0431c1470d89e5a2811ff2305c88 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 10 Sep 2019 09:58:41 -0500 Subject: x86/platform/uv: Return UV Hubless System Type Return the type of UV hubless system for UV specific code that depends on that. Add a function to convert UV system type to bit pattern needed for is_uv_hubless(). Signed-off-by: Mike Travis Reviewed-by: Steve Wahl Reviewed-by: Dimitri Sivanich Cc: Andrew Morton Cc: Borislav Petkov Cc: Christoph Hellwig Cc: H. Peter Anvin Cc: Hedi Berriche Cc: Justin Ernst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Russ Anderson Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190910145839.814880843@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uv/uv.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h index 6bc6d89d8e2a..792faab9df64 100644 --- a/arch/x86/include/asm/uv/uv.h +++ b/arch/x86/include/asm/uv/uv.h @@ -12,6 +12,14 @@ struct mm_struct; #ifdef CONFIG_X86_UV #include +static inline int uv(int uvtype) +{ + /* uv(0) is "any" */ + if (uvtype >= 0 && uvtype <= 30) + return 1 << uvtype; + return 1; +} + extern unsigned long uv_systab_phys; extern enum uv_system_type get_uv_system_type(void); @@ -20,7 +28,7 @@ static inline bool is_early_uv_system(void) return uv_systab_phys && uv_systab_phys != EFI_INVALID_TABLE_ADDR; } extern int is_uv_system(void); -extern int is_uv_hubless(void); +extern int is_uv_hubless(int uvtype); extern void uv_cpu_init(void); extern void uv_nmi_init(void); extern void uv_system_init(void); @@ -32,7 +40,7 @@ extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask, static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; } static inline bool is_early_uv_system(void) { return 0; } static inline int is_uv_system(void) { return 0; } -static inline int is_uv_hubless(void) { return 0; } +static inline int is_uv_hubless(int uv) { return 0; } static inline void uv_cpu_init(void) { } static inline void uv_system_init(void) { } static inline const struct cpumask * -- cgit From 9743cb68f736d986481edba4d00de454d2faa0ec Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 10 Sep 2019 09:58:42 -0500 Subject: x86/platform/uv: Add return code to UV BIOS Init function Add a return code to the UV BIOS init function that indicates the successful initialization of the kernel/BIOS callback interface. Signed-off-by: Mike Travis Reviewed-by: Steve Wahl Reviewed-by: Dimitri Sivanich Cc: Andrew Morton Cc: Borislav Petkov Cc: Christoph Hellwig Cc: H. 
Peter Anvin Cc: Hedi Berriche Cc: Justin Ernst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Russ Anderson Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190910145839.895739629@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uv/bios.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 6e7caf65fa40..389174eaec79 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -138,7 +138,7 @@ extern s64 uv_bios_change_memprotect(u64, u64, enum uv_memprotect); extern s64 uv_bios_reserved_page_pa(u64, u64 *, u64 *, u64 *); extern int uv_bios_set_legacy_vga_target(bool decode, int domain, int bus); -extern void uv_bios_init(void); +extern int uv_bios_init(void); extern unsigned long sn_rtc_cycles_per_second; extern int uv_type; -- cgit From 8785968bce1cc7368ea95c3e1e5b9210f56f6667 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 10 Sep 2019 09:58:44 -0500 Subject: x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files Indicate to UV user utilities that UV hubless support is available on this system via the existing /proc infterface. The current interface is maintained with the addition of new /proc leaves ("hubbed", "hubless", and "oemid") that contain the specific type of UV arch this one is. Signed-off-by: Mike Travis Reviewed-by: Steve Wahl Reviewed-by: Dimitri Sivanich Cc: Andrew Morton Cc: Borislav Petkov Cc: Christoph Hellwig Cc: H. Peter Anvin Cc: Hedi Berriche Cc: Justin Ernst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Russ Anderson Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190910145840.055590900@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uv/uv.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h index 792faab9df64..45ea95ce79b4 100644 --- a/arch/x86/include/asm/uv/uv.h +++ b/arch/x86/include/asm/uv/uv.h @@ -12,6 +12,8 @@ struct mm_struct; #ifdef CONFIG_X86_UV #include +#define UV_PROC_NODE "sgi_uv" + static inline int uv(int uvtype) { /* uv(0) is "any" */ @@ -28,6 +30,7 @@ static inline bool is_early_uv_system(void) return uv_systab_phys && uv_systab_phys != EFI_INVALID_TABLE_ADDR; } extern int is_uv_system(void); +extern int is_uv_hubbed(int uvtype); extern int is_uv_hubless(int uvtype); extern void uv_cpu_init(void); extern void uv_nmi_init(void); @@ -40,6 +43,7 @@ extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask, static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; } static inline bool is_early_uv_system(void) { return 0; } static inline int is_uv_system(void) { return 0; } +static inline int is_uv_hubbed(int uv) { return 0; } static inline int is_uv_hubless(int uv) { return 0; } static inline void uv_cpu_init(void) { } static inline void uv_system_init(void) { } -- cgit From 4fb7d08707565d27ec84a364d011043ade8c38b4 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 10 Sep 2019 09:58:47 -0500 Subject: x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops The references in the is_uvX_hub() function uses the hub_info pointer which will be NULL when the system is hubless. This change avoids that NULL dereference. It is also an optimization in performance. Signed-off-by: Mike Travis Reviewed-by: Steve Wahl Reviewed-by: Dimitri Sivanich Cc: Andrew Morton Cc: Borislav Petkov Cc: Christoph Hellwig Cc: H. 
Peter Anvin Cc: Hedi Berriche Cc: Justin Ernst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Russ Anderson Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190910145840.294981941@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uv/uv_hub.h | 61 +++++++++++++--------------------------- 1 file changed, 20 insertions(+), 41 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/uv/uv_hub.h b/arch/x86/include/asm/uv/uv_hub.h index 44cf6d6deb7a..950cd1395d5d 100644 --- a/arch/x86/include/asm/uv/uv_hub.h +++ b/arch/x86/include/asm/uv/uv_hub.h @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -243,83 +244,61 @@ static inline int uv_hub_info_check(int version) #define UV4_HUB_REVISION_BASE 7 #define UV4A_HUB_REVISION_BASE 8 /* UV4 (fixed) rev 2 */ -#ifdef UV1_HUB_IS_SUPPORTED +/* WARNING: UVx_HUB_IS_SUPPORTED defines are deprecated and will be removed */ static inline int is_uv1_hub(void) { - return uv_hub_info->hub_revision < UV2_HUB_REVISION_BASE; -} +#ifdef UV1_HUB_IS_SUPPORTED + return is_uv_hubbed(uv(1)); #else -static inline int is_uv1_hub(void) -{ return 0; -} #endif +} -#ifdef UV2_HUB_IS_SUPPORTED static inline int is_uv2_hub(void) { - return ((uv_hub_info->hub_revision >= UV2_HUB_REVISION_BASE) && - (uv_hub_info->hub_revision < UV3_HUB_REVISION_BASE)); -} +#ifdef UV2_HUB_IS_SUPPORTED + return is_uv_hubbed(uv(2)); #else -static inline int is_uv2_hub(void) -{ return 0; -} #endif +} -#ifdef UV3_HUB_IS_SUPPORTED static inline int is_uv3_hub(void) { - return ((uv_hub_info->hub_revision >= UV3_HUB_REVISION_BASE) && - (uv_hub_info->hub_revision < UV4_HUB_REVISION_BASE)); -} +#ifdef UV3_HUB_IS_SUPPORTED + return is_uv_hubbed(uv(3)); #else -static inline int is_uv3_hub(void) -{ return 0; -} #endif +} /* First test "is UV4A", then "is UV4" */ -#ifdef UV4A_HUB_IS_SUPPORTED -static inline int is_uv4a_hub(void) -{ - return (uv_hub_info->hub_revision >= UV4A_HUB_REVISION_BASE); -} -#else static inline int is_uv4a_hub(void) { +#ifdef UV4A_HUB_IS_SUPPORTED + if (is_uv_hubbed(uv(4))) + return (uv_hub_info->hub_revision == UV4A_HUB_REVISION_BASE); +#endif return 0; } -#endif -#ifdef UV4_HUB_IS_SUPPORTED static inline int is_uv4_hub(void) { - return uv_hub_info->hub_revision >= UV4_HUB_REVISION_BASE; -} +#ifdef UV4_HUB_IS_SUPPORTED + return is_uv_hubbed(uv(4)); #else -static inline int is_uv4_hub(void) -{ return 0; -} #endif +} static inline int is_uvx_hub(void) { - if (uv_hub_info->hub_revision >= UV2_HUB_REVISION_BASE) - return uv_hub_info->hub_revision; - - return 0; + return (is_uv_hubbed(-2) >= uv(2)); } static inline int is_uv_hub(void) { -#ifdef UV1_HUB_IS_SUPPORTED - return uv_hub_info->hub_revision; -#endif - return is_uvx_hub(); + return is_uv1_hub() || is_uvx_hub(); } union uvh_apicid { -- cgit From 9d40b85bb46a99bc95dad3a07787da93b0a018e9 Mon Sep 17 00:00:00 2001 From: Babu Moger Date: Mon, 7 Oct 2019 15:48:39 -0500 Subject: x86/cpufeatures: Add feature bit RDPRU on AMD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit AMD Zen 2 introduces a new RDPRU instruction which is used to give access to some processor registers that are typically only accessible when the privilege level is zero. ECX is used as the implicit register to specify which register to read. RDPRU places the specified register’s value into EDX:EAX. For example, the RDPRU instruction can be used to read MPERF and APERF at CPL > 0. Add the feature bit so it is visible in /proc/cpuinfo. 
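As an illustration only (not part of this patch), a minimal user-space sketch of how RDPRU could be invoked, assuming the 0f 01 fd opcode bytes and the register selectors documented in the APM (ECX = 0 for MPERF, ECX = 1 for APERF):

#include <stdint.h>

/* Assumed selector values per the APM: 0 = MPERF, 1 = APERF */
#define RDPRU_ECX_MPERF	0
#define RDPRU_ECX_APERF	1

static inline uint64_t rdpru(uint32_t reg)
{
	uint32_t lo, hi;

	/* RDPRU: ECX selects the register, the value is returned in EDX:EAX */
	asm volatile(".byte 0x0f, 0x01, 0xfd"
		     : "=a" (lo), "=d" (hi)
		     : "c" (reg));

	return ((uint64_t)hi << 32) | lo;
}
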
Details are available in the AMD64 Architecture Programmer’s Manual: https://www.amd.com/system/files/TechDocs/24594.pdf Signed-off-by: Babu Moger Signed-off-by: Borislav Petkov Cc: Aaron Lewis Cc: ak@linux.intel.com Cc: Fenghua Yu Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Josh Poimboeuf Cc: "Peter Zijlstra (Intel)" Cc: robert.hu@linux.intel.com Cc: Thomas Gleixner Cc: Thomas Hellstrom Cc: x86-ml Link: https://lkml.kernel.org/r/20191007204839.5727.10803.stgit@localhost.localdomain --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 0652d3eed9bd..1db10a112537 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -292,6 +292,7 @@ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ #define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */ #define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* Always save/restore FP error pointers */ +#define X86_FEATURE_RDPRU (13*32+ 4) /* Read processor register at user level */ #define X86_FEATURE_WBNOINVD (13*32+ 9) /* WBNOINVD instruction */ #define X86_FEATURE_AMD_IBPB (13*32+12) /* "" Indirect Branch Prediction Barrier */ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ -- cgit From 0935e5f7527ccd46163b42e1540409c98e29fe17 Mon Sep 17 00:00:00 2001 From: Ralf Ramsauer Date: Thu, 10 Oct 2019 12:21:01 +0200 Subject: x86/jailhouse: Improve setup data version comparison Soon, setup_data will contain information on passed-through platform UARTs. This requires some preparational work for the sanity check of the header and the check of the version. Use the following strategy: 1. Ensure that the header declares at least enough space for the version and the compatible_version as it must hold that fields for any version. The location and semantics of header+version fields will never change. 2. Copy over data -- as much as as possible. The length is either limited by the header length or the length of setup_data. 3. Things are now in place -- sanity check if the header length complies the actual version. For future versions of the setup_data, only step 3 requires alignment. Signed-off-by: Ralf Ramsauer Signed-off-by: Borislav Petkov Reviewed-by: Jan Kiszka Cc: Baoquan He Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: jailhouse-dev@googlegroups.com Cc: Juergen Gross Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20191010102102.421035-2-ralf.ramsauer@oth-regensburg.de --- arch/x86/include/uapi/asm/bootparam.h | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index c895df5482c5..43be437c9c71 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -139,15 +139,19 @@ struct boot_e820_entry { * setup data structure. 
*/ struct jailhouse_setup_data { - __u16 version; - __u16 compatible_version; - __u16 pm_timer_address; - __u16 num_cpus; - __u64 pci_mmconfig_base; - __u32 tsc_khz; - __u32 apic_khz; - __u8 standard_ioapic; - __u8 cpu_ids[255]; + struct { + __u16 version; + __u16 compatible_version; + } __attribute__((packed)) hdr; + struct { + __u16 pm_timer_address; + __u16 num_cpus; + __u64 pci_mmconfig_base; + __u32 tsc_khz; + __u32 apic_khz; + __u8 standard_ioapic; + __u8 cpu_ids[255]; + } __attribute__((packed)) v1; } __attribute__((packed)); /* The so-called "zeropage" */ -- cgit From 7a56b81c474619fa84c60d07eaa287c8fc33ac3c Mon Sep 17 00:00:00 2001 From: Ralf Ramsauer Date: Thu, 10 Oct 2019 12:21:02 +0200 Subject: x86/jailhouse: Only enable platform UARTs if available ACPI tables aren't available if Linux runs as guest of the hypervisor Jailhouse. This makes the 8250 driver probe for all platform UARTs as it assumes that all UARTs are present in case of !ACPI. Jailhouse will stop execution of Linux guest due to port access violation. So far, these access violations were solved by tuning the 8250.nr_uarts cmdline parameter, but this has limitations: Only consecutive platform UARTs can be mapped to Linux, and only in the sequence 0x3f8, 0x2f8, 0x3e8, 0x2e8. Beginning from setup_data version 2, Jailhouse will place information of available platform UARTs in setup_data. This allows for selective activation of platform UARTs. Query setup_data version and only activate available UARTS. This patch comes with backward compatibility, and will still support older setup_data versions. In case of older setup_data versions, Linux falls back to the old behaviour. Signed-off-by: Ralf Ramsauer Signed-off-by: Borislav Petkov Reviewed-by: Jan Kiszka Cc: Baoquan He Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: jailhouse-dev@googlegroups.com Cc: Juergen Gross Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20191010102102.421035-3-ralf.ramsauer@oth-regensburg.de --- arch/x86/include/uapi/asm/bootparam.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index 43be437c9c71..db1e24e56e94 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -152,6 +152,9 @@ struct jailhouse_setup_data { __u8 standard_ioapic; __u8 cpu_ids[255]; } __attribute__((packed)) v1; + struct { + __u32 flags; + } __attribute__((packed)) v2; } __attribute__((packed)); /* The so-called "zeropage" */ -- cgit From 8661d769ab77c675b5eb6c3351a372b9fbc1bf40 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Tue, 8 Oct 2019 15:40:45 -0700 Subject: syscalls/x86: Use the correct function type in SYSCALL_DEFINE0 Although a syscall defined using SYSCALL_DEFINE0 doesn't accept parameters, use the correct function type to avoid type mismatches with Control-Flow Integrity (CFI) checking. Signed-off-by: Sami Tolvanen Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: H . 
Peter Anvin Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20191008224049.115427-2-samitolvanen@google.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/syscall_wrapper.h | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h index e046a405743d..90eb70df0b18 100644 --- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -48,12 +48,13 @@ * To keep the naming coherent, re-define SYSCALL_DEFINE0 to create an alias * named __ia32_sys_*() */ -#define SYSCALL_DEFINE0(sname) \ - SYSCALL_METADATA(_##sname, 0); \ - asmlinkage long __x64_sys_##sname(void); \ - ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \ - SYSCALL_ALIAS(__ia32_sys_##sname, __x64_sys_##sname); \ - asmlinkage long __x64_sys_##sname(void) + +#define SYSCALL_DEFINE0(sname) \ + SYSCALL_METADATA(_##sname, 0); \ + asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused);\ + ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \ + SYSCALL_ALIAS(__ia32_sys_##sname, __x64_sys_##sname); \ + asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused) #define COND_SYSCALL(name) \ cond_syscall(__x64_sys_##name); \ @@ -181,11 +182,11 @@ * macros to work correctly. */ #ifndef SYSCALL_DEFINE0 -#define SYSCALL_DEFINE0(sname) \ - SYSCALL_METADATA(_##sname, 0); \ - asmlinkage long __x64_sys_##sname(void); \ - ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \ - asmlinkage long __x64_sys_##sname(void) +#define SYSCALL_DEFINE0(sname) \ + SYSCALL_METADATA(_##sname, 0); \ + asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused);\ + ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \ + asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused) #endif #ifndef COND_SYSCALL -- cgit From cf3b83e19d7c928e05a5d193c375463182c6029a Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 8 Oct 2019 15:40:46 -0700 Subject: syscalls/x86: Wire up COMPAT_SYSCALL_DEFINE0 x86 has special handling for COMPAT_SYSCALL_DEFINEx, but there was no override for COMPAT_SYSCALL_DEFINE0. Wire it up so that we can use it for rt_sigreturn. Signed-off-by: Andy Lutomirski Signed-off-by: Sami Tolvanen Cc: Borislav Petkov Cc: H . Peter Anvin Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20191008224049.115427-3-samitolvanen@google.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/syscall_wrapper.h | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h index 90eb70df0b18..3dab04841494 100644 --- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -28,13 +28,21 @@ * kernel/sys_ni.c and SYS_NI in kernel/time/posix-stubs.c to cover this * case as well. */ +#define __IA32_COMPAT_SYS_STUB0(x, name) \ + asmlinkage long __ia32_compat_sys_##name(const struct pt_regs *regs);\ + ALLOW_ERROR_INJECTION(__ia32_compat_sys_##name, ERRNO); \ + asmlinkage long __ia32_compat_sys_##name(const struct pt_regs *regs)\ + { \ + return __se_compat_sys_##name(); \ + } + #define __IA32_COMPAT_SYS_STUBx(x, name, ...) 
\ asmlinkage long __ia32_compat_sys##name(const struct pt_regs *regs);\ ALLOW_ERROR_INJECTION(__ia32_compat_sys##name, ERRNO); \ asmlinkage long __ia32_compat_sys##name(const struct pt_regs *regs)\ { \ return __se_compat_sys##name(SC_IA32_REGS_TO_ARGS(x,__VA_ARGS__));\ - } \ + } #define __IA32_SYS_STUBx(x, name, ...) \ asmlinkage long __ia32_sys##name(const struct pt_regs *regs); \ @@ -76,15 +84,24 @@ * of the x86-64-style parameter ordering of x32 syscalls. The syscalls common * with x86_64 obviously do not need such care. */ +#define __X32_COMPAT_SYS_STUB0(x, name, ...) \ + asmlinkage long __x32_compat_sys_##name(const struct pt_regs *regs);\ + ALLOW_ERROR_INJECTION(__x32_compat_sys_##name, ERRNO); \ + asmlinkage long __x32_compat_sys_##name(const struct pt_regs *regs)\ + { \ + return __se_compat_sys_##name();\ + } + #define __X32_COMPAT_SYS_STUBx(x, name, ...) \ asmlinkage long __x32_compat_sys##name(const struct pt_regs *regs);\ ALLOW_ERROR_INJECTION(__x32_compat_sys##name, ERRNO); \ asmlinkage long __x32_compat_sys##name(const struct pt_regs *regs)\ { \ return __se_compat_sys##name(SC_X86_64_REGS_TO_ARGS(x,__VA_ARGS__));\ - } \ + } #else /* CONFIG_X86_X32 */ +#define __X32_COMPAT_SYS_STUB0(x, name) #define __X32_COMPAT_SYS_STUBx(x, name, ...) #endif /* CONFIG_X86_X32 */ @@ -95,6 +112,17 @@ * mapping of registers to parameters, we need to generate stubs for each * of them. */ +#define COMPAT_SYSCALL_DEFINE0(name) \ + static long __se_compat_sys_##name(void); \ + static inline long __do_compat_sys_##name(void); \ + __IA32_COMPAT_SYS_STUB0(x, name) \ + __X32_COMPAT_SYS_STUB0(x, name) \ + static long __se_compat_sys_##name(void) \ + { \ + return __do_compat_sys_##name(); \ + } \ + static inline long __do_compat_sys_##name(void) + #define COMPAT_SYSCALL_DEFINEx(x, name, ...) \ static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ -- cgit From 6e4847640c6aebcaa2d9b3686cecc91b41f09269 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Tue, 8 Oct 2019 15:40:49 -0700 Subject: syscalls/x86: Fix function types in COND_SYSCALL Define a weak function in COND_SYSCALL instead of a weak alias to sys_ni_syscall(), which has an incompatible type. This fixes indirect call mismatches with Control-Flow Integrity (CFI) checking. Signed-off-by: Sami Tolvanen Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: H . Peter Anvin Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20191008224049.115427-6-samitolvanen@google.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/syscall_wrapper.h | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h index 3dab04841494..e2389ce9bf58 100644 --- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -6,6 +6,8 @@ #ifndef _ASM_X86_SYSCALL_WRAPPER_H #define _ASM_X86_SYSCALL_WRAPPER_H +struct pt_regs; + /* Mapping of registers to parameters for syscalls on x86-64 and x32 */ #define SC_X86_64_REGS_TO_ARGS(x, ...) 
\ __MAP(x,__SC_ARGS \ @@ -64,9 +66,15 @@ SYSCALL_ALIAS(__ia32_sys_##sname, __x64_sys_##sname); \ asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused) -#define COND_SYSCALL(name) \ - cond_syscall(__x64_sys_##name); \ - cond_syscall(__ia32_sys_##name) +#define COND_SYSCALL(name) \ + asmlinkage __weak long __x64_sys_##name(const struct pt_regs *__unused) \ + { \ + return sys_ni_syscall(); \ + } \ + asmlinkage __weak long __ia32_sys_##name(const struct pt_regs *__unused)\ + { \ + return sys_ni_syscall(); \ + } #define SYS_NI(name) \ SYSCALL_ALIAS(__x64_sys_##name, sys_ni_posix_timers); \ @@ -218,7 +226,11 @@ #endif #ifndef COND_SYSCALL -#define COND_SYSCALL(name) cond_syscall(__x64_sys_##name) +#define COND_SYSCALL(name) \ + asmlinkage __weak long __x64_sys_##name(const struct pt_regs *__unused) \ + { \ + return sys_ni_syscall(); \ + } #endif #ifndef SYS_NI @@ -230,7 +242,6 @@ * For VSYSCALLS, we need to declare these three syscalls with the new * pt_regs-based calling convention for in-kernel use. */ -struct pt_regs; asmlinkage long __x64_sys_getcpu(const struct pt_regs *regs); asmlinkage long __x64_sys_gettimeofday(const struct pt_regs *regs); asmlinkage long __x64_sys_time(const struct pt_regs *regs); -- cgit From f53e2cd0b8ab7d9e390414470bdbd830f660133f Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Fri, 13 Sep 2019 14:14:02 -0700 Subject: x86/mm: Use the correct function type for native_set_fixmap() We call native_set_fixmap indirectly through the function pointer struct pv_mmu_ops::set_fixmap, which expects the first parameter to be 'unsigned' instead of 'enum fixed_addresses'. This patch changes the function type for native_set_fixmap to match the pointer, which fixes indirect call mismatches with Control-Flow Integrity (CFI) checking. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H . Peter Anvin Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190913211402.193018-1-samitolvanen@google.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/fixmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 0c47aa82e2e2..28183ee3cc42 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -156,7 +156,7 @@ extern pte_t *kmap_pte; extern pte_t *pkmap_page_table; void __native_set_fixmap(enum fixed_addresses idx, pte_t pte); -void native_set_fixmap(enum fixed_addresses idx, +void native_set_fixmap(unsigned /* enum fixed_addresses */ idx, phys_addr_t phys, pgprot_t flags); #ifndef CONFIG_PARAVIRT_XXL -- cgit From f7919fd943abf0c77aed4441ea9897a323d132f5 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Fri, 6 Sep 2019 22:13:48 +0900 Subject: x86/asm: Allow to pass macros to __ASM_FORM() Use __stringify() at __ASM_FORM() so that user can pass code including macros to __ASM_FORM(). 
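For illustration (using a made-up macro name, not taken from the tree): the plain "#x" operator stringifies its argument without macro-expanding it first, whereas __stringify() adds one level of expansion, which is what lets macros be passed through:

#define MY_PREFIX	.byte 0x0f, 0x0b

/* old form, " " #x " ":             __ASM_FORM(MY_PREFIX) -> " MY_PREFIX "        */
/* new form, " " __stringify(x) " ": __ASM_FORM(MY_PREFIX) -> " .byte 0x0f, 0x0b " */
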
Signed-off-by: Masami Hiramatsu Signed-off-by: Peter Zijlstra (Intel) Cc: Juergen Gross Cc: x86@kernel.org Cc: Boris Ostrovsky Cc: Ingo Molnar Cc: Stefano Stabellini Cc: Andrew Cooper Cc: Borislav Petkov Cc: xen-devel@lists.xenproject.org Cc: Randy Dunlap Cc: Josh Poimboeuf Link: https://lkml.kernel.org/r/156777562873.25081.2288083344657460959.stgit@devnote2 --- arch/x86/include/asm/asm.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h index 3ff577c0b102..1b563f9167ea 100644 --- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -7,9 +7,11 @@ # define __ASM_FORM_RAW(x) x # define __ASM_FORM_COMMA(x) x, #else -# define __ASM_FORM(x) " " #x " " -# define __ASM_FORM_RAW(x) #x -# define __ASM_FORM_COMMA(x) " " #x "," +#include + +# define __ASM_FORM(x) " " __stringify(x) " " +# define __ASM_FORM_RAW(x) __stringify(x) +# define __ASM_FORM_COMMA(x) " " __stringify(x) "," #endif #ifndef __x86_64__ -- cgit From b3dc0695fa40c3b280230fb6fb7fb7a94ce28bf4 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Fri, 6 Sep 2019 22:13:59 +0900 Subject: x86: xen: kvm: Gather the definition of emulate prefixes Gather the emulate prefixes, which forcibly make the following instruction emulated on virtualization, in one place. Suggested-by: Peter Zijlstra Signed-off-by: Masami Hiramatsu Signed-off-by: Peter Zijlstra (Intel) Cc: Juergen Gross Cc: x86@kernel.org Cc: Ingo Molnar Cc: Boris Ostrovsky Cc: Andrew Cooper Cc: Stefano Stabellini Cc: Borislav Petkov Cc: xen-devel@lists.xenproject.org Cc: Randy Dunlap Cc: Josh Poimboeuf Link: https://lkml.kernel.org/r/156777563917.25081.7286628561790289995.stgit@devnote2 --- arch/x86/include/asm/emulate_prefix.h | 14 ++++++++++++++ arch/x86/include/asm/xen/interface.h | 11 ++++------- 2 files changed, 18 insertions(+), 7 deletions(-) create mode 100644 arch/x86/include/asm/emulate_prefix.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/emulate_prefix.h b/arch/x86/include/asm/emulate_prefix.h new file mode 100644 index 000000000000..70f5b98a5286 --- /dev/null +++ b/arch/x86/include/asm/emulate_prefix.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_EMULATE_PREFIX_H +#define _ASM_X86_EMULATE_PREFIX_H + +/* + * Virt escape sequences to trigger instruction emulation; + * ideally these would decode to 'whole' instruction and not destroy + * the instruction stream; sadly this is not true for the 'kvm' one :/ + */ + +#define __XEN_EMULATE_PREFIX 0x0f,0x0b,0x78,0x65,0x6e /* ud2 ; .ascii "xen" */ +#define __KVM_EMULATE_PREFIX 0x0f,0x0b,0x6b,0x76,0x6d /* ud2 ; .ascii "kvm" */ + +#endif diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h index 62ca03ef5c65..9139b3e86316 100644 --- a/arch/x86/include/asm/xen/interface.h +++ b/arch/x86/include/asm/xen/interface.h @@ -379,12 +379,9 @@ struct xen_pmu_arch { * Prefix forces emulation of some non-trapping instructions. * Currently only CPUID. 
*/ -#ifdef __ASSEMBLY__ -#define XEN_EMULATE_PREFIX .byte 0x0f,0x0b,0x78,0x65,0x6e ; -#define XEN_CPUID XEN_EMULATE_PREFIX cpuid -#else -#define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; " -#define XEN_CPUID XEN_EMULATE_PREFIX "cpuid" -#endif +#include + +#define XEN_EMULATE_PREFIX __ASM_FORM(.byte __XEN_EMULATE_PREFIX ;) +#define XEN_CPUID XEN_EMULATE_PREFIX __ASM_FORM(cpuid) #endif /* _ASM_X86_XEN_INTERFACE_H */ -- cgit From 4d65adfcd1196818659d3bd9b42dccab291e1751 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Fri, 6 Sep 2019 22:14:10 +0900 Subject: x86: xen: insn: Decode Xen and KVM emulate-prefix signature Decode Xen and KVM's emulate-prefix signature by x86 insn decoder. It is called "prefix" but actually not x86 instruction prefix, so this adds insn.emulate_prefix_size field instead of reusing insn.prefixes. If x86 decoder finds a special sequence of instructions of XEN_EMULATE_PREFIX and 'ud2a; .ascii "kvm"', it just counts the length, set insn.emulate_prefix_size and fold it with the next instruction. In other words, the signature and the next instruction is treated as a single instruction. Signed-off-by: Masami Hiramatsu Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Cc: Juergen Gross Cc: x86@kernel.org Cc: Boris Ostrovsky Cc: Ingo Molnar Cc: Stefano Stabellini Cc: Andrew Cooper Cc: Borislav Petkov Cc: xen-devel@lists.xenproject.org Cc: Randy Dunlap Link: https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2 --- arch/x86/include/asm/insn.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 154f27be8bfc..5c1ae3eff9d4 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -45,6 +45,7 @@ struct insn { struct insn_field immediate2; /* for 64bit imm or seg16 */ }; + int emulate_prefix_size; insn_attr_t attr; unsigned char opnd_bytes; unsigned char addr_bytes; @@ -128,6 +129,11 @@ static inline int insn_is_evex(struct insn *insn) return (insn->vex_prefix.nbytes == 4); } +static inline int insn_has_emulate_prefix(struct insn *insn) +{ + return !!insn->emulate_prefix_size; +} + /* Ensure this instruction is decoded completely */ static inline int insn_complete(struct insn *insn) { -- cgit From ffedeeb780dc554eff3d3b16e6a462a26a41d7ec Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Fri, 11 Oct 2019 13:50:41 +0200 Subject: linkage: Introduce new macros for assembler symbols Introduce new C macros for annotations of functions and data in assembly. There is a long-standing mess in macros like ENTRY, END, ENDPROC and similar. They are used in different manners and sometimes incorrectly. So introduce macros with clear use to annotate assembly as follows: a) Support macros for the ones below SYM_T_FUNC -- type used by assembler to mark functions SYM_T_OBJECT -- type used by assembler to mark data SYM_T_NONE -- type used by assembler to mark entries of unknown type They are defined as STT_FUNC, STT_OBJECT, and STT_NOTYPE respectively. According to the gas manual, this is the most portable way. I am not sure about other assemblers, so this can be switched back to %function and %object if this turns into a problem. Architectures can also override them by something like ", @function" if they need. SYM_A_ALIGN, SYM_A_NONE -- align the symbol? 
SYM_L_GLOBAL, SYM_L_WEAK, SYM_L_LOCAL -- linkage of symbols b) Mostly internal annotations, used by the ones below SYM_ENTRY -- use only if you have to (for non-paired symbols) SYM_START -- use only if you have to (for paired symbols) SYM_END -- use only if you have to (for paired symbols) c) Annotations for code SYM_INNER_LABEL_ALIGN -- only for labels in the middle of code SYM_INNER_LABEL -- only for labels in the middle of code SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for one function SYM_FUNC_START_ALIAS -- use where there are two global names for one function SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function SYM_FUNC_START -- use for global functions SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment SYM_FUNC_START_LOCAL -- use for local functions SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o alignment SYM_FUNC_START_WEAK -- use for weak functions SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START, SYM_FUNC_START_WEAK, ... For functions with special (non-C) calling conventions: SYM_CODE_START -- use for non-C (special) functions SYM_CODE_START_NOALIGN -- use for non-C (special) functions, w/o alignment SYM_CODE_START_LOCAL -- use for local non-C (special) functions SYM_CODE_START_LOCAL_NOALIGN -- use for local non-C (special) functions, w/o alignment SYM_CODE_END -- the end of SYM_CODE_START_LOCAL or SYM_CODE_START d) For data SYM_DATA_START -- global data symbol SYM_DATA_START_LOCAL -- local data symbol SYM_DATA_END -- the end of the SYM_DATA_START symbol SYM_DATA_END_LABEL -- the labeled end of SYM_DATA_START symbol SYM_DATA -- start+end wrapper around simple global data SYM_DATA_LOCAL -- start+end wrapper around simple local data ========== The macros allow to pair starts and ends of functions and mark functions correctly in the output ELF objects. All users of the old macros in x86 are converted to use these in further patches. Signed-off-by: Jiri Slaby Signed-off-by: Borislav Petkov Acked-by: Rafael J. Wysocki Cc: Andrew Morton Cc: Andrey Ryabinin Cc: Boris Ostrovsky Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Len Brown Cc: Linus Torvalds Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: Mark Rutland Cc: Pavel Machek Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Cc: x86-ml Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-2-jslaby@suse.cz --- arch/x86/include/asm/linkage.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h index 14caa9d9fb7f..e07188e8d763 100644 --- a/arch/x86/include/asm/linkage.h +++ b/arch/x86/include/asm/linkage.h @@ -13,9 +13,13 @@ #ifdef __ASSEMBLY__ -#define GLOBAL(name) \ - .globl name; \ - name: +/* + * GLOBAL is DEPRECATED + * + * use SYM_DATA_START, SYM_FUNC_START, SYM_INNER_LABEL, SYM_CODE_START, or + * similar + */ +#define GLOBAL(name) SYM_ENTRY(name, SYM_L_GLOBAL, SYM_A_NONE) #if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16) #define __ALIGN .p2align 4, 0x90 -- cgit From b4edca150106a68d05eaf823d665a355ff19e28b Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Fri, 11 Oct 2019 13:50:59 +0200 Subject: x86/asm: Remove the last GLOBAL user and remove the macro Convert the remaining 32bit users and remove the GLOBAL macro finally. In particular, this means to use SYM_ENTRY for the singlestepping hack region. Exclude the global definition of GLOBAL from x86 too. Signed-off-by: Jiri Slaby Signed-off-by: Borislav Petkov Cc: Andrew Morton Cc: Andrey Ryabinin Cc: Andy Lutomirski Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-arch@vger.kernel.org Cc: Mark Rutland Cc: "Rafael J. Wysocki" Cc: Thomas Gleixner Cc: Will Deacon Cc: x86-ml Link: https://lkml.kernel.org/r/20191011115108.12392-20-jslaby@suse.cz --- arch/x86/include/asm/linkage.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h index e07188e8d763..365111789cc6 100644 --- a/arch/x86/include/asm/linkage.h +++ b/arch/x86/include/asm/linkage.h @@ -13,14 +13,6 @@ #ifdef __ASSEMBLY__ -/* - * GLOBAL is DEPRECATED - * - * use SYM_DATA_START, SYM_FUNC_START, SYM_INNER_LABEL, SYM_CODE_START, or - * similar - */ -#define GLOBAL(name) SYM_ENTRY(name, SYM_L_GLOBAL, SYM_A_NONE) - #if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16) #define __ALIGN .p2align 4, 0x90 #define __ALIGN_STR __stringify(__ALIGN) -- cgit From f2c4e5970cece75a895fcc45f0cd66b5a5ec0819 Mon Sep 17 00:00:00 2001 From: Jia He Date: Fri, 11 Oct 2019 22:09:38 +0800 Subject: x86/mm: implement arch_faults_on_old_pte() stub on x86 arch_faults_on_old_pte is a helper to indicate that it might cause page fault when accessing old pte. But on x86, there is feature to setting pte access flag by hardware. Hence implement an overriding stub which always returns false. 
Signed-off-by: Jia He Suggested-by: Will Deacon Signed-off-by: Catalin Marinas --- arch/x86/include/asm/pgtable.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 0bc530c4eb13..ad97dc155195 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1463,6 +1463,12 @@ static inline bool arch_has_pfn_modify_check(void) return boot_cpu_has_bug(X86_BUG_L1TF); } +#define arch_faults_on_old_pte arch_faults_on_old_pte +static inline bool arch_faults_on_old_pte(void) +{ + return false; +} + #include #endif /* __ASSEMBLY__ */ -- cgit From db633a4e0e6eda69b6065e3e106f9ea13a0676c3 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Mon, 21 Oct 2019 19:24:02 +0200 Subject: x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified. This was previously fixed for the VMWARE_PORT macro. Fix it also for the VMWARE_HYPERCALL macro. Suggested-by: Sami Tolvanen Signed-off-by: Thomas Hellstrom Reviewed-by: Nick Desaulniers Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sean Christopherson Cc: Thomas Gleixner Cc: clang-built-linux@googlegroups.com Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/vmware.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h index e00c9e875933..3caac90f9761 100644 --- a/arch/x86/include/asm/vmware.h +++ b/arch/x86/include/asm/vmware.h @@ -29,7 +29,8 @@ /* The low bandwidth call. The low word of edx is presumed clear. */ #define VMWARE_HYPERCALL \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \ + ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; " \ + "inl (%%dx), %%eax", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) -- cgit From 6fee2a0be0ecae939d4b6cd8297d88b5cbb61654 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Mon, 21 Oct 2019 19:24:03 +0200 Subject: x86/cpu/vmware: Fix platform detection VMWARE_PORT macro The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT definition, but expects it to be an integer. However, when it was moved to the new vmware.h include file, it was changed to be a string to better fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the platform detection VMWARE_PORT functionality. Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB definitions to be integers, and use __stringify() for their stringified form when needed. Signed-off-by: Thomas Hellstrom Cc: Borislav Petkov Cc: H. 
Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sean Christopherson Cc: Thomas Gleixner Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/vmware.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h index 3caac90f9761..ac9fc51e2b18 100644 --- a/arch/x86/include/asm/vmware.h +++ b/arch/x86/include/asm/vmware.h @@ -4,6 +4,7 @@ #include #include +#include /* * The hypercall definitions differ in the low word of the %edx argument @@ -20,8 +21,8 @@ */ /* Old port-based version */ -#define VMWARE_HYPERVISOR_PORT "0x5658" -#define VMWARE_HYPERVISOR_PORT_HB "0x5659" +#define VMWARE_HYPERVISOR_PORT 0x5658 +#define VMWARE_HYPERVISOR_PORT_HB 0x5659 /* Current vmcall / vmmcall version */ #define VMWARE_HYPERVISOR_HB BIT(0) @@ -29,7 +30,7 @@ /* The low bandwidth call. The low word of edx is presumed clear. */ #define VMWARE_HYPERCALL \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; " \ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \ "inl (%%dx), %%eax", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) @@ -39,7 +40,8 @@ * HB and OUT bits set. */ #define VMWARE_HYPERCALL_HB_OUT \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep outsb", \ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \ + "rep outsb", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) @@ -48,7 +50,8 @@ * HB bit set. */ #define VMWARE_HYPERCALL_HB_IN \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep insb", \ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \ + "rep insb", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) #endif -- cgit From f8845541e93c5b41618405de6735edd6f0cc8984 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 27 Sep 2019 14:45:21 -0700 Subject: KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg' Now that indexing into arch.regs is either protected by WARN_ON_ONCE or done with hardcoded enums, combine all definitions for registers that are tracked by regs_avail and regs_dirty into 'enum kvm_reg'. Having a single enum type will simplify additional cleanup related to regs_avail and regs_dirty. Signed-off-by: Sean Christopherson Reviewed-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 50eb430b0ad8..c86c95a499af 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -156,10 +156,8 @@ enum kvm_reg { VCPU_REGS_R15 = __VCPU_REGS_R15, #endif VCPU_REGS_RIP, - NR_VCPU_REGS -}; + NR_VCPU_REGS, -enum kvm_reg_ex { VCPU_EXREG_PDPTR = NR_VCPU_REGS, VCPU_EXREG_CR3, VCPU_EXREG_RFLAGS, -- cgit From 34059c2570102870df8d8a31bd42f8d9c19cce87 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 27 Sep 2019 14:45:23 -0700 Subject: KVM: x86: Fold decache_cr3() into cache_reg() Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). 
The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM. Change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 - 1 file changed, 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c86c95a499af..cdde7488430d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1033,7 +1033,6 @@ struct kvm_x86_ops { struct kvm_segment *var, int seg); void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l); void (*decache_cr0_guest_bits)(struct kvm_vcpu *vcpu); - void (*decache_cr3)(struct kvm_vcpu *vcpu); void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu); void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0); void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); -- cgit From 2cf9af0b566823de418eb2ff357a2f8233c718e9 Mon Sep 17 00:00:00 2001 From: "Suthikulpanit, Suravee" Date: Fri, 13 Sep 2019 19:00:49 +0000 Subject: kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameter Generally, APICv for all vcpus in the VM are enable/disable in the same manner. So, get_enable_apicv() should represent APICv status of the VM instead of each VCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter instead of struct kvm_vcpu. Reviewed-by: Vitaly Kuznetsov Signed-off-by: Suravee Suthikulpanit Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index cdde7488430d..5d8056ff7390 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1081,7 +1081,7 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); - bool (*get_enable_apicv)(struct kvm_vcpu *vcpu); + bool (*get_enable_apicv)(struct kvm *kvm); void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu); void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr); void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr); -- cgit From 4be946728f65c10c9bb1a1580ec47a316f5ee6ac Mon Sep 17 00:00:00 2001 From: Like Xu Date: Mon, 21 Oct 2019 18:55:04 +0800 Subject: KVM: x86/vPMU: Declare kvm_pmu->reprogram_pmi field using DECLARE_BITMAP Replace the explicit declaration of "u64 reprogram_pmi" with the generic macro DECLARE_BITMAP for all possible appropriate number of bits. 
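As a sketch of what the change buys (pmc_idx below is a placeholder, not taken from the patch): DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX) declares an unsigned long array sized for X86_PMC_IDX_MAX bits, so the field can be driven with the standard bitops helpers instead of open-coded u64 masks:

/* expansion of DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX): */
unsigned long reprogram_pmi[BITS_TO_LONGS(X86_PMC_IDX_MAX)];

/* mark a counter as needing reprogramming: */
__set_bit(pmc_idx, pmu->reprogram_pmi);

/* ... and consume the request later: */
if (test_and_clear_bit(pmc_idx, pmu->reprogram_pmi)) {
	/* reprogram that counter */
}
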
Suggested-by: Paolo Bonzini Signed-off-by: Like Xu Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5d8056ff7390..62f32a61c250 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -467,7 +467,7 @@ struct kvm_pmu { struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; struct irq_work irq_work; - u64 reprogram_pmi; + DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX); }; struct kvm_pmu_ops; -- cgit From e095cb7a0f57e1e2dcab93c4213808b1cd57e206 Mon Sep 17 00:00:00 2001 From: Lianbo Jiang Date: Thu, 17 Oct 2019 17:43:46 +0800 Subject: x86/kdump: Remove the unused crash_copy_backup_region() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The crash_copy_backup_region() function is unused so remove it. Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: bhe@redhat.com Cc: dhowells@redhat.com Cc: dyoung@redhat.com Cc: ebiederm@xmission.com Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jürgen Gross Cc: kexec@lists.infradead.org Cc: Thomas Gleixner Cc: Tom Lendacky Cc: vgoyal@redhat.com Cc: x86-ml Cc: Yi Wang Link: https://lkml.kernel.org/r/20191017094347.20327-3-lijiang@redhat.com --- arch/x86/include/asm/crash.h | 1 - 1 file changed, 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h index 0acf5ee45a21..089b2850f9d1 100644 --- a/arch/x86/include/asm/crash.h +++ b/arch/x86/include/asm/crash.h @@ -3,7 +3,6 @@ #define _ASM_X86_CRASH_H int crash_load_segments(struct kimage *image); -int crash_copy_backup_region(struct kimage *image); int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params); void crash_smp_send_stop(void); -- cgit From 7204160eb7809345d10c983d9d1dfbd98060a56d Mon Sep 17 00:00:00 2001 From: Aaron Lewis Date: Mon, 21 Oct 2019 16:30:20 -0700 Subject: KVM: x86: Introduce vcpu->arch.xsaves_enabled Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: Jim Mattson Signed-off-by: Aaron Lewis Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 62f32a61c250..6f6b8886a8eb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -560,6 +560,7 @@ struct kvm_vcpu_arch { u64 smbase; u64 smi_count; bool tpr_access_reporting; + bool xsaves_enabled; u64 ia32_xss; u64 microcode_version; u64 arch_capabilities; -- cgit From 671ddc700fd08b94967b1e2a937020e30c838609 Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 15 Oct 2019 10:44:05 -0700 Subject: KVM: nVMX: Don't leak L1 MMIO regions to L2 If the "virtualize APIC accesses" VM-execution control is set in the VMCS, the APIC virtualization hardware is triggered when a page walk in VMX non-root mode terminates at a PTE wherein the address of the 4k page frame matches the APIC-access address specified in the VMCS. On hardware, the APIC-access address may be any valid 4k-aligned physical address. KVM's nVMX implementation enforces the additional constraint that the APIC-access address specified in the vmcs12 must be backed by a "struct page" in L1. 
If not, L0 will simply clear the "virtualize APIC accesses" VM-execution control in the vmcs02. The problem with this approach is that the L1 guest has arranged the vmcs12 EPT tables--or shadow page tables, if the "enable EPT" VM-execution control is clear in the vmcs12--so that the L2 guest physical address(es)--or L2 guest linear address(es)--that reference the L2 APIC map to the APIC-access address specified in the vmcs12. Without the "virtualize APIC accesses" VM-execution control in the vmcs02, the APIC accesses in the L2 guest will directly access the APIC-access page in L1. When there is no mapping whatsoever for the APIC-access address in L1, the L2 VM just loses the intended APIC virtualization. However, when the APIC-access address is mapped to an MMIO region in L1, the L2 guest gets direct access to the L1 MMIO device. For example, if the APIC-access address specified in the vmcs12 is 0xfee00000, then L2 gets direct access to L1's APIC. Since this vmcs12 configuration is something that KVM cannot faithfully emulate, the appropriate response is to exit to userspace with KVM_INTERNAL_ERROR_EMULATION. Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12") Reported-by: Dan Cross Signed-off-by: Jim Mattson Reviewed-by: Peter Shier Reviewed-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 50eb430b0ad8..24d6598dea29 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1189,7 +1189,7 @@ struct kvm_x86_ops { int (*set_nested_state)(struct kvm_vcpu *vcpu, struct kvm_nested_state __user *user_kvm_nested_state, struct kvm_nested_state *kvm_state); - void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu); + bool (*get_vmcs12_pages)(struct kvm_vcpu *vcpu); int (*smi_allowed)(struct kvm_vcpu *vcpu); int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate); -- cgit From c2955f270a84762343000f103e0640d29c7a96f3 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 10:45:50 +0200 Subject: x86/msr: Add the IA32_TSX_CTRL MSR Transactional Synchronization Extensions (TSX) may be used on certain processors as part of a speculative side channel attack. A microcode update for existing processors that are vulnerable to this attack will add a new MSR - IA32_TSX_CTRL to allow the system administrator the option to disable TSX as one of the possible mitigations. The CPUs which get this new MSR after a microcode upgrade are the ones which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all CPU buffers takes care of the TAA case as well. [ Note that future processors that are not vulnerable will also support the IA32_TSX_CTRL MSR. ] Add defines for the new IA32_TSX_CTRL MSR and its bits. TSX has two sub-features: 1. Restricted Transactional Memory (RTM) is an explicitly-used feature where new instructions begin and end TSX transactions. 2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of "old" style locks are used by software. Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the IA32_TSX_CTRL MSR. There are two control bits in IA32_TSX_CTRL MSR: Bit 0: When set, it disables the Restricted Transactional Memory (RTM) sub-feature of TSX (will force all transactions to abort on the XBEGIN instruction). 
Bit 1: When set, it disables the enumeration of the RTM and HLE feature (i.e. it will make CPUID(EAX=7).EBX{bit4} and CPUID(EAX=7).EBX{bit11} read as 0). The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled by the new microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}, unless disabled by IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR. Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Tested-by: Neelima Krishnan Reviewed-by: Mark Gross Reviewed-by: Tony Luck Reviewed-by: Josh Poimboeuf --- arch/x86/include/asm/msr-index.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 20ce682a2540..da4caf6da739 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -93,6 +93,7 @@ * Microarchitectural Data * Sampling (MDS) vulnerabilities. */ +#define ARCH_CAP_TSX_CTRL_MSR BIT(7) /* MSR for TSX control is available. */ #define MSR_IA32_FLUSH_CMD 0x0000010b #define L1D_FLUSH BIT(0) /* @@ -103,6 +104,10 @@ #define MSR_IA32_BBL_CR_CTL 0x00000119 #define MSR_IA32_BBL_CR_CTL3 0x0000011e +#define MSR_IA32_TSX_CTRL 0x00000122 +#define TSX_CTRL_RTM_DISABLE BIT(0) /* Disable RTM feature */ +#define TSX_CTRL_CPUID_CLEAR BIT(1) /* Disable TSX enumeration */ + #define MSR_IA32_SYSENTER_CS 0x00000174 #define MSR_IA32_SYSENTER_ESP 0x00000175 #define MSR_IA32_SYSENTER_EIP 0x00000176 -- cgit From 1b42f017415b46c317e71d41c34ec088417a1883 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 11:30:45 +0200 Subject: x86/speculation/taa: Add mitigation for TSX Async Abort TSX Async Abort (TAA) is a side channel vulnerability to the internal buffers in some Intel processors similar to Microachitectural Data Sampling (MDS). In this case, certain loads may speculatively pass invalid data to dependent operations when an asynchronous abort condition is pending in a TSX transaction. This includes loads with no fault or assist condition. Such loads may speculatively expose stale data from the uarch data structures as in MDS. Scope of exposure is within the same-thread and cross-thread. This issue affects all current processors that support TSX, but do not have ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES. On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0, CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers using VERW or L1D_FLUSH, there is no additional mitigation needed for TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by disabling the Transactional Synchronization Extensions (TSX) feature. A new MSR IA32_TSX_CTRL in future and current processors after a microcode update can be used to control the TSX feature. There are two bits in that MSR: * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted Transactional Memory (RTM). * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled with updated microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}. The second mitigation approach is similar to MDS which is clearing the affected CPU buffers on return to user space and when entering a guest. Relevant microcode update is required for the mitigation to work. 
More details on this approach can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html The TSX feature can be controlled by the "tsx" command line parameter. If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is deployed. The effective mitigation state can be read from sysfs. [ bp: - massage + comments cleanup - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh. - remove partial TAA mitigation in update_mds_branch_idle() - Josh. - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g ] Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/msr-index.h | 4 ++++ arch/x86/include/asm/nospec-branch.h | 4 ++-- arch/x86/include/asm/processor.h | 7 +++++++ 4 files changed, 14 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 0652d3eed9bd..989e03544f18 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -399,5 +399,6 @@ #define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */ #define X86_BUG_MSBDS_ONLY X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */ #define X86_BUG_SWAPGS X86_BUG(21) /* CPU is affected by speculation through SWAPGS */ +#define X86_BUG_TAA X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index da4caf6da739..b3a8bb2af0b6 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -94,6 +94,10 @@ * Sampling (MDS) vulnerabilities. */ #define ARCH_CAP_TSX_CTRL_MSR BIT(7) /* MSR for TSX control is available. */ +#define ARCH_CAP_TAA_NO BIT(8) /* + * Not susceptible to + * TSX Async Abort (TAA) vulnerabilities. 
+ */ #define MSR_IA32_FLUSH_CMD 0x0000010b #define L1D_FLUSH BIT(0) /* diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 80bc209c0708..5c24a7b35166 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -314,7 +314,7 @@ DECLARE_STATIC_KEY_FALSE(mds_idle_clear); #include /** - * mds_clear_cpu_buffers - Mitigation for MDS vulnerability + * mds_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability * * This uses the otherwise unused and obsolete VERW instruction in * combination with microcode which triggers a CPU buffer flush when the @@ -337,7 +337,7 @@ static inline void mds_clear_cpu_buffers(void) } /** - * mds_user_clear_cpu_buffers - Mitigation for MDS vulnerability + * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability * * Clear CPU buffers if the corresponding static key is enabled */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6e0a3b43d027..54f5d54280f6 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -988,4 +988,11 @@ enum mds_mitigations { MDS_MITIGATION_VMWERV, }; +enum taa_mitigations { + TAA_MITIGATION_OFF, + TAA_MITIGATION_UCODE_NEEDED, + TAA_MITIGATION_VERW, + TAA_MITIGATION_TSX_DISABLED, +}; + #endif /* _ASM_X86_PROCESSOR_H */ -- cgit From db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae Mon Sep 17 00:00:00 2001 From: Vineela Tummalapalli Date: Mon, 4 Nov 2019 12:22:01 +0100 Subject: x86/bugs: Add ITLB_MULTIHIT bug infrastructure Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. 
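A minimal sketch of how such a capability bit is typically consumed when the bug bits are set (simplified; the real logic lives in the common CPU setup code and additionally honors a whitelist of known-unaffected models):

    u64 ia32_cap = 0;

    if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
            rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

    if (!(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
            setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);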
Signed-off-by: Vineela Tummalapalli Co-developed-by: Pawan Gupta Signed-off-by: Pawan Gupta Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/msr-index.h | 7 +++++++ 2 files changed, 8 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 989e03544f18..c4fbe379cc0b 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -400,5 +400,6 @@ #define X86_BUG_MSBDS_ONLY X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */ #define X86_BUG_SWAPGS X86_BUG(21) /* CPU is affected by speculation through SWAPGS */ #define X86_BUG_TAA X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */ +#define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index b3a8bb2af0b6..6a3124664289 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -93,6 +93,13 @@ * Microarchitectural Data * Sampling (MDS) vulnerabilities. */ +#define ARCH_CAP_PSCHANGE_MC_NO BIT(6) /* * The processor is not susceptible to a * machine check error due to modifying the * code page size along with either the * physical address or cache type * without TLB invalidation. + */ #define ARCH_CAP_TSX_CTRL_MSR BIT(7) /* MSR for TSX control is available. */ #define ARCH_CAP_TAA_NO BIT(8) /* * Not susceptible to -- cgit From b8e8c8303ff28c61046a4d0f6ea99aea609a7dc0 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 4 Nov 2019 12:22:02 +0100 Subject: kvm: mmu: ITLB_MULTIHIT mitigation With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check resulting in a CPU lockup. Unfortunately when EPT page tables use huge pages, it is possible for a malicious guest to cause this situation. Add a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT, the nested guest can again cause problems, so shadow and direct EPT are treated in the same way.
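A condensed sketch of the policy (illustrative only, not the actual SPTE construction code):

    /* When installing a large-page SPTE for an executable mapping: */
    if (nx_huge_pages && level > PT_PAGE_TABLE_LEVEL)
            pte_access &= ~ACC_EXEC_MASK;   /* map it NX; a later exec fault splits it into 4K pages */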
[ tglx: Fixup default to auto and massage wording a bit ] Originally-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 24d6598dea29..a37b03483b66 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -315,6 +315,7 @@ struct kvm_mmu_page { bool unsync; u8 mmu_valid_gen; bool mmio_cached; + bool lpage_disallowed; /* Can't be replaced by an equiv large page */ /* * The following two entries are used to key the shadow page in the @@ -946,6 +947,7 @@ struct kvm_vm_stat { ulong mmu_unsync; ulong remote_tlb_flush; ulong lpages; + ulong nx_lpage_splits; ulong max_mmu_page_hash_collisions; }; -- cgit From b907693883fdcff5b492cf0cd02a0e264623055e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 29 Oct 2019 14:13:37 -0700 Subject: x86/vmlinux: Actually use _etext for the end of the text segment Various calculations are using the end of the exception table (which does not need to be executable) as the end of the text segment. Instead, in preparation for moving the exception table into RO_DATA, move _etext after the exception table and update the calculations. Signed-off-by: Kees Cook Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Dave Hansen Cc: Heiko Carstens Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: Michael Ellerman Cc: Michal Simek Cc: Nick Desaulniers Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Ross Zwisler Cc: Segher Boessenkool Cc: Thomas Gleixner Cc: Thomas Lendacky Cc: Will Deacon Cc: x86-ml Cc: Yoshinori Sato Link: https://lkml.kernel.org/r/20191029211351.13243-16-keescook@chromium.org --- arch/x86/include/asm/sections.h | 1 - 1 file changed, 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h index 71b32f2570ab..036c360910c5 100644 --- a/arch/x86/include/asm/sections.h +++ b/arch/x86/include/asm/sections.h @@ -6,7 +6,6 @@ #include extern char __brk_base[], __brk_limit[]; -extern struct exception_table_entry __stop___ex_table[]; extern char __end_rodata_aligned[]; #if defined(CONFIG_X86_64) -- cgit From 5494c3a6a0b965906ffdcb620d94079ea4cb69ea Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 29 Oct 2019 14:13:49 -0700 Subject: x86/mm: Report which part of kernel image is freed The memory freeing report wasn't very useful for figuring out which parts of the kernel image were being freed. Add the details for clearer reporting in dmesg. Before: Freeing unused kernel image memory: 1348K Write protecting the kernel read-only data: 20480k Freeing unused kernel image memory: 2040K Freeing unused kernel image memory: 172K After: Freeing unused kernel image (initmem) memory: 1348K Write protecting the kernel read-only data: 20480k Freeing unused kernel image (text/rodata gap) memory: 2040K Freeing unused kernel image (rodata/data gap) memory: 172K Signed-off-by: Kees Cook Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Dave Hansen Cc: Heiko Carstens Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: Michael Ellerman Cc: Michal Simek Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Segher Boessenkool Cc: Thomas Gleixner Cc: Will Deacon Cc: x86-ml Cc: Yoshinori Sato Link: https://lkml.kernel.org/r/20191029211351.13243-28-keescook@chromium.org --- arch/x86/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6e0a3b43d027..790f250d39a8 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -958,7 +958,7 @@ static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) extern unsigned long arch_align_stack(unsigned long sp); void free_init_pages(const char *what, unsigned long begin, unsigned long end); -extern void free_kernel_image_pages(void *begin, void *end); +extern void free_kernel_image_pages(const char *what, void *begin, void *end); void default_idle(void); #ifdef CONFIG_XEN -- cgit From 1aa9b9572b10529c2e64e2b8f44025d86e124308 Mon Sep 17 00:00:00 2001 From: Junaid Shahid Date: Mon, 4 Nov 2019 20:26:00 +0100 Subject: kvm: x86: mmu: Recovery of shattered NX large pages The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/kvm_host.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a37b03483b66..4fc61483919a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -312,6 +312,8 @@ struct kvm_rmap_head { struct kvm_mmu_page { struct list_head link; struct hlist_node hash_link; + struct list_head lpage_disallowed_link; + bool unsync; u8 mmu_valid_gen; bool mmio_cached; @@ -860,6 +862,7 @@ struct kvm_arch { */ struct list_head active_mmu_pages; struct list_head zapped_obsolete_pages; + struct list_head lpage_disallowed_mmu_pages; struct kvm_page_track_notifier_node mmu_sp_tracker; struct kvm_page_track_notifier_head track_notifier_head; @@ -934,6 +937,7 @@ struct kvm_arch { bool exception_payload_enabled; struct kvm_pmu_event_filter *pmu_event_filter; + struct task_struct *nx_lpage_recovery_thread; }; struct kvm_vm_stat { -- cgit From b971880fe79f4042aaaf426744a5b19521bf77b3 Mon Sep 17 00:00:00 2001 From: Babu Moger Date: Tue, 5 Nov 2019 21:25:32 +0000 Subject: x86/Kconfig: Rename UMIP config parameter AMD 2nd generation EPYC processors support the UMIP (User-Mode Instruction Prevention) feature. So, rename X86_INTEL_UMIP to generic X86_UMIP and modify the text to cover both Intel and AMD. [ bp: take of the disabled-features.h copy in tools/ too. ] Signed-off-by: Babu Moger Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Ricardo Neri Cc: Thomas Gleixner Cc: "x86@kernel.org" Link: https://lkml.kernel.org/r/157298912544.17462.2018334793891409521.stgit@naples-babu.amd.com --- arch/x86/include/asm/disabled-features.h | 2 +- arch/x86/include/asm/umip.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index a5ea841cc6d2..8e1d0bb46361 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -22,7 +22,7 @@ # define DISABLE_SMAP (1<<(X86_FEATURE_SMAP & 31)) #endif -#ifdef CONFIG_X86_INTEL_UMIP +#ifdef CONFIG_X86_UMIP # define DISABLE_UMIP 0 #else # define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31)) diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h index db43f2a0d92c..aeed98c3c9e1 100644 --- a/arch/x86/include/asm/umip.h +++ b/arch/x86/include/asm/umip.h @@ -4,9 +4,9 @@ #include #include -#ifdef CONFIG_X86_INTEL_UMIP +#ifdef CONFIG_X86_UMIP bool fixup_umip_exception(struct pt_regs *regs); #else static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; } -#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* CONFIG_X86_UMIP */ #endif /* _ASM_X86_UMIP_H */ -- cgit From 6950e31b35fdf4588cbbdec1813091bb02cf8871 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:05 -0800 Subject: x86/efi: Push EFI_MEMMAP check into leaf routines In preparation for adding another EFI_MEMMAP dependent call that needs to occur before e820__memblock_setup() fixup the existing efi calls to check for EFI_MEMMAP internally. This ends up being cleaner than the alternative of checking EFI_MEMMAP multiple times in setup_arch(). Reviewed-by: Dave Hansen Reviewed-by: Ard Biesheuvel Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- arch/x86/include/asm/efi.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 43a82e59c59d..45f853bce869 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -140,7 +140,6 @@ extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); extern void efi_recover_from_page_fault(unsigned long phys_addr); extern void efi_free_boot_services(void); -extern void efi_reserve_boot_services(void); struct efi_setup_data { u64 fw_vendor; @@ -244,6 +243,8 @@ static inline bool efi_is_64bit(void) extern bool efi_reboot_required(void); extern bool efi_is_table_address(unsigned long phys_addr); +extern void efi_find_mirror(void); +extern void efi_reserve_boot_services(void); #else static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {} static inline bool efi_reboot_required(void) @@ -254,6 +255,12 @@ static inline bool efi_is_table_address(unsigned long phys_addr) { return false; } +static inline void efi_find_mirror(void) +{ +} +static inline void efi_reserve_boot_services(void) +{ +} #endif /* CONFIG_EFI */ #endif /* _ASM_X86_EFI_H */ -- cgit From 262b45ae3ab4bf8e2caf1fcfd0d8307897519630 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:16 -0800 Subject: x86/efi: EFI soft reservation to E820 enumeration UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". 
The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. This patch introduces 2 new concepts at once given the entanglement between early boot enumeration relative to memory that can optionally be reserved from the kernel page allocator by default. The new concepts are: - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP attribute on EFI_CONVENTIONAL memory, update the E820 map with this new type. Only perform this classification if the CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as typical ram. - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for a device driver to search iomem resources for application specific memory. Teach the iomem code to identify such ranges as "Soft Reserved". Note that the comment for do_add_efi_memmap() needed refreshing since it seemed to imply that the efi map might overflow the e820 table, but that is not an issue as of commit 7b6e4ba3cb1f "x86/boot/e820: Clean up the E820_X_MAX definition" that removed the 128 entry limit for e820__range_add(). A follow-on change integrates parsing of the ACPI HMAT to identify the node and sub-range boundaries of EFI_MEMORY_SP designated memory. For now, just identify and reserve memory of this type. Acked-by: Ard Biesheuvel Reported-by: kbuild test robot Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- arch/x86/include/asm/e820/types.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h index c3aa4b5e49e2..314f75d886d0 100644 --- a/arch/x86/include/asm/e820/types.h +++ b/arch/x86/include/asm/e820/types.h @@ -28,6 +28,14 @@ enum e820_type { */ E820_TYPE_PRAM = 12, + /* + * Special-purpose memory is indicated to the system via the + * EFI_MEMORY_SP attribute. Define an e820 translation of this + * memory type for the purpose of reserving this range and + * marking it with the IORES_DESC_SOFT_RESERVED designation. + */ + E820_TYPE_SOFT_RESERVED = 0xefffffff, + /* * Reserved RAM used by the kernel itself if * CONFIG_INTEL_TXT=y is enabled, memory of this type -- cgit From 199c8471761273b7e287914cee968ddf21dfbfe0 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:26 -0800 Subject: x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP Given that EFI_MEMORY_SP is platform BIOS policy decision for marking memory ranges as "reserved for a specific purpose" there will inevitably be scenarios where the BIOS omits the attribute in situations where it is desired. Unlike other attributes if the OS wants to reserve this memory from the kernel the reservation needs to happen early in init. So early, in fact, that it needs to happen before e820__memblock_setup() which is a pre-requisite for efi_fake_memmap() that wants to allocate memory for the updated table. 
Introduce an x86 specific efi_fake_memmap_early() that can search for attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table accordingly. The KASLR code that scans the command line looking for user-directed memory reservations also needs to be updated to consider "efi_fake_mem=nn@ss:0x40000" requests. Acked-by: Ard Biesheuvel Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- arch/x86/include/asm/efi.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 45f853bce869..d028e9acdf1c 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -263,4 +263,12 @@ static inline void efi_reserve_boot_services(void) } #endif /* CONFIG_EFI */ +#ifdef CONFIG_EFI_FAKE_MEMMAP +extern void __init efi_fake_memmap_early(void); +#else +static inline void efi_fake_memmap_early(void) +{ +} +#endif + #endif /* _ASM_X86_EFI_H */ -- cgit From e380a0394c36a3a878c858418d5dd7f5f195b6fc Mon Sep 17 00:00:00 2001 From: Nicolas Saenz Julienne Date: Thu, 7 Nov 2019 16:06:45 +0100 Subject: x86/PCI: sta2x11: use default DMA address translation The devices found behind this PCIe chip have unusual DMA mapping constraints as there is an AMBA interconnect placed in between them and the different PCI endpoints. The offset between physical memory addresses and AMBA's view is provided by reading a PCI config register, which is saved and used whenever DMA mapping is needed. It turns out that this DMA setup can be represented by properly setting 'dma_pfn_offset', 'dma_bus_mask' and 'dma_mask' during the PCI device enable fixup. And ultimately allows us to get rid of this device's custom DMA functions. Aside from the code deletion and DMA setup, sta2x11_pdev_to_mapping() is moved to avoid warnings whenever CONFIG_PM is not enabled. 
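A hypothetical sketch of the shape of such an enable fixup (the config register offset below is made up, and the bus DMA mask/limit would be set alongside the device mask):

    static void sta2x11_dma_fixup(struct pci_dev *pdev)
    {
            u32 amba_base;

            pci_read_config_dword(pdev, 0xfc, &amba_base);          /* hypothetical offset register */
            pdev->dev.dma_pfn_offset = PFN_DOWN(amba_base);         /* offset between PCI and AMBA views */
            dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
    }
    DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_STMICRO, PCI_ANY_ID, sta2x11_dma_fixup);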
Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Christoph Hellwig --- arch/x86/include/asm/device.h | 3 --- arch/x86/include/asm/dma-direct.h | 9 --------- 2 files changed, 12 deletions(-) delete mode 100644 arch/x86/include/asm/dma-direct.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h index a8f6c809d9b1..5e12c63b47aa 100644 --- a/arch/x86/include/asm/device.h +++ b/arch/x86/include/asm/device.h @@ -6,9 +6,6 @@ struct dev_archdata { #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU) void *iommu; /* hook for IOMMU specific extension */ #endif -#ifdef CONFIG_STA2X11 - bool is_sta2x11; -#endif }; #if defined(CONFIG_X86_DEV_DMA_OPS) && defined(CONFIG_PCI_DOMAINS) diff --git a/arch/x86/include/asm/dma-direct.h b/arch/x86/include/asm/dma-direct.h deleted file mode 100644 index 1a19251eaac9..000000000000 --- a/arch/x86/include/asm/dma-direct.h +++ /dev/null @@ -1,9 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef ASM_X86_DMA_DIRECT_H -#define ASM_X86_DMA_DIRECT_H 1 - -bool dma_capable(struct device *dev, dma_addr_t addr, size_t size); -dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr); -phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr); - -#endif /* ASM_X86_DMA_DIRECT_H */ -- cgit From f036c7fa0ab60b7ea47560c32c78e435eb1cd214 Mon Sep 17 00:00:00 2001 From: Yian Chen Date: Thu, 17 Oct 2019 04:39:19 -0700 Subject: iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved VT-d RMRR (Reserved Memory Region Reporting) regions are reserved for device use only and should not be part of the allocatable memory pool of the OS. The BIOS e820_table reports the complete memory map to the OS, including OS-usable memory ranges, BIOS-reserved memory ranges, etc. The x86 BIOS may not be trusted to include RMRR regions as reserved-type memory in its e820 memory map, hence validate every RMRR entry against the e820 memory map to make sure the RMRR regions will not be used by the OS for any other purpose. ia64 EFI is working fine, so implement RMRR validation as a dummy function there. Reviewed-by: Lu Baolu Reviewed-by: Sohil Mehta Signed-off-by: Yian Chen Signed-off-by: Joerg Roedel --- arch/x86/include/asm/iommu.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h index b91623d521d9..bf1ed2ddc74b 100644 --- a/arch/x86/include/asm/iommu.h +++ b/arch/x86/include/asm/iommu.h @@ -2,10 +2,28 @@ #ifndef _ASM_X86_IOMMU_H #define _ASM_X86_IOMMU_H +#include + +#include + extern int force_iommu, no_iommu; extern int iommu_detected; /* 10 seconds */ #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) +static inline int __init +arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr) +{ + u64 start = rmrr->base_address; + u64 end = rmrr->end_address + 1; + + if (e820__mapped_all(start, end, E820_TYPE_RESERVED)) + return 0; + + pr_err(FW_BUG "No firmware reserved region can cover this RMRR [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n", + start, end - 1); + return -EINVAL; +} + #endif /* _ASM_X86_IOMMU_H */ -- cgit From c0d94aa54bd893bd41ca35e2a2de332742bb167d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 12 Aug 2019 23:35:59 +0200 Subject: x86: Clean up ioremap() Use ioremap() as the main implemented function, and define ioremap_nocache() as a deprecated alias of ioremap() in preparation for removing ioremap_nocache() entirely.
Signed-off-by: Christoph Hellwig Reviewed-by: Thomas Gleixner --- arch/x86/include/asm/io.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6bed97ff6db2..6b5cc41319a7 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -180,8 +180,6 @@ static inline unsigned int isa_virt_to_bus(volatile void *address) * The default ioremap() behavior is non-cached; if you need something * else, you probably want one of the following. */ -extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); -#define ioremap_nocache ioremap_nocache extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size); #define ioremap_uc ioremap_uc extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); @@ -205,11 +203,9 @@ extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long * If the area you are trying to map is a PCI BAR you should have a * look at pci_iomap(). */ -static inline void __iomem *ioremap(resource_size_t offset, unsigned long size) -{ - return ioremap_nocache(offset, size); -} +void __iomem *ioremap(resource_size_t offset, unsigned long size); #define ioremap ioremap +#define ioremap_nocache ioremap extern void iounmap(volatile void __iomem *addr); #define iounmap iounmap -- cgit From d092a87073269677b7ff09e71a8d91912b7f969a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Oct 2019 08:09:38 +0200 Subject: arch: rely on asm-generic/io.h for default ioremap_* definitions Various architectures that use asm-generic/io.h still defined their own default versions of ioremap_nocache, ioremap_wt and ioremap_wc that point back to plain ioremap directly or indirectly. Remove these definitions and rely on asm-generic/io.h instead. For this to work, the backup ioremap_* definitions need to be changed to pure cpp macros instead of inlines to cover architectures like openrisc that only define ioremap after including <asm-generic/io.h>. Signed-off-by: Christoph Hellwig Reviewed-by: Arnd Bergmann Reviewed-by: Palmer Dabbelt --- arch/x86/include/asm/io.h | 1 - 1 file changed, 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6b5cc41319a7..9997521fc5cd 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -205,7 +205,6 @@ extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long */ void __iomem *ioremap(resource_size_t offset, unsigned long size); #define ioremap ioremap -#define ioremap_nocache ioremap extern void iounmap(volatile void __iomem *addr); #define iounmap iounmap -- cgit From b264f57fde0c686c5c1dfdd0c21992c49196bb87 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Sun, 27 Oct 2019 16:19:38 +0100 Subject: x86/hyperv: Micro-optimize send_ipi_one() When sending an IPI to a single CPU there is no need to deal with cpumasks. With a 2 CPU guest on WS2019, a minor (like 3%, 8043 -> 7761 CPU cycles) improvement with an smp_call_function_single() loop benchmark can be seen. The optimization, however, is tiny and straightforward. Also, send_ipi_one() is important for the PV spinlock kick. Switching to the regular APIC IPI send for the CPU > 64 case does not make sense as it is twice as expensive (12650 CPU cycles for the __send_ipi_mask_ex() call, 26000 for orig_apic.send_IPI(cpu, vector)).
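A rough sketch of the resulting single-CPU fast path (names approximate the existing Hyper-V APIC code; treat this as a sketch rather than the patch itself):

    static bool __send_ipi_one(int cpu, int vector)
    {
            int vp = hv_cpu_number_to_vp_number(cpu);

            trace_hyperv_send_ipi_one(cpu, vector);

            if (vp >= 64)   /* fall back to the _ex hypercall for high VP numbers */
                    return __send_ipi_mask_ex(cpumask_of(cpu), vector);

            return hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector, BIT_ULL(vp)) == HV_STATUS_SUCCESS;
    }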
Signed-off-by: Vitaly Kuznetsov Signed-off-by: Thomas Gleixner Reviewed-by: Michael Kelley Reviewed-by: Roman Kagan Link: https://lkml.kernel.org/r/20191027151938.7296-1-vkuznets@redhat.com --- arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h index ace464f09681..4d705cb4d63b 100644 --- a/arch/x86/include/asm/trace/hyperv.h +++ b/arch/x86/include/asm/trace/hyperv.h @@ -71,6 +71,21 @@ TRACE_EVENT(hyperv_send_ipi_mask, __entry->ncpus, __entry->vector) ); +TRACE_EVENT(hyperv_send_ipi_one, + TP_PROTO(int cpu, + int vector), + TP_ARGS(cpu, vector), + TP_STRUCT__entry( + __field(int, cpu) + __field(int, vector) + ), + TP_fast_assign(__entry->cpu = cpu; + __entry->vector = vector; + ), + TP_printk("cpu %d vector %x", + __entry->cpu, __entry->vector) + ); + #endif /* CONFIG_HYPERV */ #undef TRACE_INCLUDE_PATH -- cgit From dce7cd62754b5d4a6e401b8b0769ec94cf971041 Mon Sep 17 00:00:00 2001 From: Andrea Parri Date: Thu, 3 Oct 2019 17:52:00 +0200 Subject: x86/hyperv: Allow guests to enable InvariantTSC If the hardware supports TSC scaling, Hyper-V will set bit 15 of the HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V configuration version. Bit 15 corresponds to the AccessTscInvariantControls privilege. If this privilege bit is set, guests can access the HvSyntheticInvariantTscControl MSR: guests can set bit 0 of this synthetic MSR to enable the InvariantTSC feature. After setting the synthetic MSR, CPUID will enumerate support for InvariantTSC. Signed-off-by: Andrea Parri Signed-off-by: Thomas Gleixner Reviewed-by: Michael Kelley Reviewed-by: Vitaly Kuznetsov Link: https://lkml.kernel.org/r/20191003155200.22022-1-parri.andrea@gmail.com --- arch/x86/include/asm/hyperv-tlfs.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h index 7a2705694f5b..887b1d6a15ab 100644 --- a/arch/x86/include/asm/hyperv-tlfs.h +++ b/arch/x86/include/asm/hyperv-tlfs.h @@ -86,6 +86,8 @@ #define HV_X64_ACCESS_FREQUENCY_MSRS BIT(11) /* AccessReenlightenmentControls privilege */ #define HV_X64_ACCESS_REENLIGHTENMENT BIT(13) +/* AccessTscInvariantControls privilege */ +#define HV_X64_ACCESS_TSC_INVARIANT BIT(15) /* * Feature identification: indicates which flags were specified at partition @@ -270,6 +272,9 @@ #define HV_X64_MSR_TSC_EMULATION_CONTROL 0x40000107 #define HV_X64_MSR_TSC_EMULATION_STATUS 0x40000108 +/* TSC invariant control */ +#define HV_X64_MSR_TSC_INVARIANT_CONTROL 0x40000118 + /* * Declare the MSR used to setup pages used to communicate with the hypervisor. */ -- cgit From 2c33c27fd6033ced942c9a591b8ac15c07c57d70 Mon Sep 17 00:00:00 2001 From: Daniel Kiper Date: Tue, 12 Nov 2019 14:46:38 +0100 Subject: x86/boot: Introduce kernel_info The relationships between the headers are analogous to the various data sections: setup_header = .data boot_params/setup_data = .bss What is missing from the above list? That's right: kernel_info = .rodata We have been (ab)using .data for things that could go into .rodata or .bss for a long time, for lack of alternatives and -- especially early on -- inertia. Also, the BIOS stub is responsible for creating boot_params, so it isn't available to a BIOS-based loader (setup_data is, though). 
setup_header is permanently limited to 144 bytes due to the reach of the 2-byte jump field, which doubles as a length field for the structure, combined with the size of the "hole" in struct boot_params that a protected-mode loader or the BIOS stub has to copy it into. It is currently 119 bytes long, which leaves us with 25 very precious bytes. This isn't something that can be fixed without revising the boot protocol entirely, breaking backwards compatibility. boot_params proper is limited to 4096 bytes, but can be arbitrarily extended by adding setup_data entries. It cannot be used to communicate properties of the kernel image, because it is .bss and has no image-provided content. kernel_info solves this by providing an extensible place for information about the kernel image. It is readonly, because the kernel cannot rely on a bootloader copying its contents anywhere, but that is OK; if it becomes necessary it can still contain data items that an enabled bootloader would be expected to copy into a setup_data chunk. Do not bump setup_header version in arch/x86/boot/header.S because it will be followed by additional changes coming into the Linux/x86 boot protocol. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Signed-off-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Philipson Reviewed-by: H. Peter Anvin (Intel) Cc: Andy Lutomirski Cc: ard.biesheuvel@linaro.org Cc: Boris Ostrovsky Cc: dave.hansen@linux.intel.com Cc: eric.snowberg@oracle.com Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Juergen Gross Cc: kanth.ghatraju@oracle.com Cc: linux-doc@vger.kernel.org Cc: linux-efi Cc: Peter Zijlstra Cc: rdunlap@infradead.org Cc: ross.philipson@oracle.com Cc: Thomas Gleixner Cc: x86-ml Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191112134640.16035-2-daniel.kiper@oracle.com --- arch/x86/include/uapi/asm/bootparam.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index c895df5482c5..a1ebcd7a991c 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -88,6 +88,7 @@ struct setup_header { __u64 pref_address; __u32 init_size; __u32 handover_offset; + __u32 kernel_info_offset; } __attribute__((packed)); struct sys_desc_table { -- cgit From 00cd1c154d565c62ad5e065bf3530f68bdf59490 Mon Sep 17 00:00:00 2001 From: Daniel Kiper Date: Tue, 12 Nov 2019 14:46:39 +0100 Subject: x86/boot: Introduce kernel_info.setup_type_max This field contains maximal allowed type for setup_data. Do not bump setup_header version in arch/x86/boot/header.S because it will be followed by additional changes coming into the Linux/x86 boot protocol. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Signed-off-by: Borislav Petkov Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Philipson Reviewed-by: H. 
Peter Anvin (Intel) Cc: Andy Lutomirski Cc: ard.biesheuvel@linaro.org Cc: Boris Ostrovsky Cc: dave.hansen@linux.intel.com Cc: eric.snowberg@oracle.com Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Juergen Gross Cc: kanth.ghatraju@oracle.com Cc: linux-doc@vger.kernel.org Cc: linux-efi Cc: Peter Zijlstra Cc: rdunlap@infradead.org Cc: ross.philipson@oracle.com Cc: Thomas Gleixner Cc: x86-ml Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191112134640.16035-3-daniel.kiper@oracle.com --- arch/x86/include/uapi/asm/bootparam.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index a1ebcd7a991c..dbb41128e5a0 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -11,6 +11,9 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 +/* max(SETUP_*) */ +#define SETUP_TYPE_MAX SETUP_JAILHOUSE + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG 0x8000 -- cgit From b3c72fc9a78e74161f9d05ef7191706060628f8c Mon Sep 17 00:00:00 2001 From: Daniel Kiper Date: Tue, 12 Nov 2019 14:46:40 +0100 Subject: x86/boot: Introduce setup_indirect The setup_data is a bit awkward to use for extremely large data objects, both because the setup_data header has to be adjacent to the data object and because it has a 32-bit length field. However, it is important that intermediate stages of the boot process have a way to identify which chunks of memory are occupied by kernel data. Thus introduce an uniform way to specify such indirect data as setup_indirect struct and SETUP_INDIRECT type. And finally bump setup_header version in arch/x86/boot/header.S. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Signed-off-by: Borislav Petkov Reviewed-by: Ross Philipson Reviewed-by: H. Peter Anvin (Intel) Acked-by: Konrad Rzeszutek Wilk Cc: Andy Lutomirski Cc: ard.biesheuvel@linaro.org Cc: Boris Ostrovsky Cc: dave.hansen@linux.intel.com Cc: eric.snowberg@oracle.com Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Juergen Gross Cc: kanth.ghatraju@oracle.com Cc: linux-doc@vger.kernel.org Cc: linux-efi Cc: Peter Zijlstra Cc: rdunlap@infradead.org Cc: ross.philipson@oracle.com Cc: Thomas Gleixner Cc: x86-ml Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191112134640.16035-4-daniel.kiper@oracle.com --- arch/x86/include/uapi/asm/bootparam.h | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index dbb41128e5a0..949066b5398a 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_BOOTPARAM_H #define _ASM_X86_BOOTPARAM_H -/* setup_data types */ +/* setup_data/setup_indirect types */ #define SETUP_NONE 0 #define SETUP_E820_EXT 1 #define SETUP_DTB 2 @@ -11,8 +11,10 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 -/* max(SETUP_*) */ -#define SETUP_TYPE_MAX SETUP_JAILHOUSE +#define SETUP_INDIRECT (1<<31) + +/* SETUP_INDIRECT | max(SETUP_*) */ +#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE) /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF @@ -52,6 +54,14 @@ struct setup_data { __u8 data[0]; }; +/* extensible setup indirect data node */ +struct setup_indirect { + __u32 type; + __u32 reserved; /* Reserved, must be set to zero. 
*/ + __u64 len; + __u64 addr; +}; + struct setup_header { __u8 setup_sects; __u16 root_flags; -- cgit From 562955fe6a558b9ef98ad87c470314946338cb2f Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 8 Nov 2019 13:11:39 -0500 Subject: ftrace/x86: Add register_ftrace_direct() for custom trampolines Enable x86 to allow for register_ftrace_direct(), where a custom trampoline may be called directly from an ftrace mcount/fentry location. Signed-off-by: Steven Rostedt (VMware) --- arch/x86/include/asm/ftrace.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h index c38a66661576..c2a7458f912c 100644 --- a/arch/x86/include/asm/ftrace.h +++ b/arch/x86/include/asm/ftrace.h @@ -28,6 +28,19 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr) return addr; } +/* + * When a ftrace registered caller is tracing a function that is + * also set by a register_ftrace_direct() call, it needs to be + * differentiated in the ftrace_caller trampoline. To do this, we + * place the direct caller in the ORIG_AX part of pt_regs. This + * tells the ftrace_caller that there's a direct caller. + */ +static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr) +{ + /* Emulate a call */ + regs->orig_ax = addr; +} + #ifdef CONFIG_DYNAMIC_FTRACE struct dyn_arch_ftrace { -- cgit From 77ac117b3a82251b109ffc5daf7d1c5392734be3 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 8 Nov 2019 16:51:00 -0600 Subject: ftrace/x86: Tell objtool to ignore nondeterministic ftrace stack layout Objtool complains about the new ftrace direct trampoline code: arch/x86/kernel/ftrace_64.o: warning: objtool: ftrace_regs_caller()+0x190: stack state mismatch: cfa1=7+16 cfa2=7+24 Typically, code has a deterministic stack layout, such that at a given instruction address, the stack frame size is always the same. That's not the case for the new ftrace_regs_caller() code after it adjusts the stack for the direct case. Just plead ignorance and assume it's always the non-direct path. Note this creates a tiny window for ORC to get confused. Link: http://lkml.kernel.org/r/20191108225100.ea3bhsbdf6oerj6g@treble Reported-by: Steven Rostedt Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (VMware) --- arch/x86/include/asm/unwind_hints.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h index 0bcdb1279361..f5e2eb12cb71 100644 --- a/arch/x86/include/asm/unwind_hints.h +++ b/arch/x86/include/asm/unwind_hints.h @@ -86,6 +86,14 @@ UNWIND_HINT sp_offset=\sp_offset .endm +.macro UNWIND_HINT_SAVE + UNWIND_HINT type=UNWIND_HINT_TYPE_SAVE +.endm + +.macro UNWIND_HINT_RESTORE + UNWIND_HINT type=UNWIND_HINT_TYPE_RESTORE +.endm + #else /* !__ASSEMBLY__ */ #define UNWIND_HINT(sp_reg, sp_offset, type, end) \ -- cgit From 4e3f77d8419b6787f3eb4d4f5178f459d693f9bb Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Mon, 11 Nov 2019 15:46:26 +0100 Subject: xen/mcelog: add PPIN to record when available This is to augment commit 3f5a7896a5 ("x86/mce: Include the PPIN in MCE records when available"). I'm also adding "synd" and "ipid" fields to struct xen_mce, in an attempt to keep field offsets in sync with struct mce. These two fields won't get populated for now, though. 
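For context, a sketch of how a PPIN is typically read once these defines exist (assuming the usual convention that bit 1 of the _CTL MSR is the enable bit; this is not the Xen-specific record-filling code):

    u64 ppin_ctl, ppin = 0;

    if (!rdmsrl_safe(MSR_AMD_PPIN_CTL, &ppin_ctl) && (ppin_ctl & BIT_ULL(1)))
            rdmsrl_safe(MSR_AMD_PPIN, &ppin);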
Signed-off-by: Jan Beulich Reviewed-by: Boris Ostrovsky Signed-off-by: Juergen Gross --- arch/x86/include/asm/msr-index.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 20ce682a2540..6ec319ecb001 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -393,6 +393,8 @@ #define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140 #define MSR_AMD64_OSVW_STATUS 0xc0010141 +#define MSR_AMD_PPIN_CTL 0xc00102f0 +#define MSR_AMD_PPIN 0xc00102f1 #define MSR_AMD64_LS_CFG 0xc0011020 #define MSR_AMD64_DC_CFG 0xc0011022 #define MSR_AMD64_BU_CFG2 0xc001102a -- cgit From 112eee5d06007dae561f14458bde7f2a4879ef4e Mon Sep 17 00:00:00 2001 From: Lianbo Jiang Date: Fri, 8 Nov 2019 17:00:27 +0800 Subject: x86/crash: Add a forward declaration of struct kimage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a forward declaration of struct kimage to the crash.h header because future changes will invoke a crash-specific function from the realmode init path and the compiler will complain otherwise like this: In file included from arch/x86/realmode/init.c:11: ./arch/x86/include/asm/crash.h:5:32: warning: ‘struct kimage’ declared inside\ parameter list will not be visible outside of this definition or declaration 5 | int crash_load_segments(struct kimage *image); | ^~~~~~ ./arch/x86/include/asm/crash.h:6:37: warning: ‘struct kimage’ declared inside\ parameter list will not be visible outside of this definition or declaration 6 | int crash_copy_backup_region(struct kimage *image); | ^~~~~~ ./arch/x86/include/asm/crash.h:7:39: warning: ‘struct kimage’ declared inside\ parameter list will not be visible outside of this definition or declaration 7 | int crash_setup_memmap_entries(struct kimage *image, | [ bp: Rewrite the commit message. ] Reported-by: kbuild test robot Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: bhe@redhat.com Cc: d.hatayama@fujitsu.com Cc: dhowells@redhat.com Cc: dyoung@redhat.com Cc: ebiederm@xmission.com Cc: horms@verge.net.au Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jürgen Gross Cc: kexec@lists.infradead.org Cc: Thomas Gleixner Cc: Tom Lendacky Cc: vgoyal@redhat.com Cc: x86-ml Link: https://lkml.kernel.org/r/20191108090027.11082-4-lijiang@redhat.com Link: https://lkml.kernel.org/r/201910310233.EJRtTMWP%25lkp@intel.com --- arch/x86/include/asm/crash.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h index 0acf5ee45a21..ef5638f641f2 100644 --- a/arch/x86/include/asm/crash.h +++ b/arch/x86/include/asm/crash.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_CRASH_H #define _ASM_X86_CRASH_H +struct kimage; + int crash_load_segments(struct kimage *image); int crash_copy_backup_region(struct kimage *image); int crash_setup_memmap_entries(struct kimage *image, -- cgit From 6f599d84231fd27e42f4ca2a786a6641e8cddf00 Mon Sep 17 00:00:00 2001 From: Lianbo Jiang Date: Fri, 8 Nov 2019 17:00:25 +0800 Subject: x86/kdump: Always reserve the low 1M when the crashkernel option is specified MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On x86, purgatory() copies the first 640K of memory to a backup region because the kernel needs those first 640K for the real mode trampoline during boot, among others. 
However, when SME is enabled, the kernel cannot properly copy the old memory to the backup area but reads only its encrypted contents. The result is that the crash tool gets invalid pointers when parsing vmcore: crash> kmem -s|grep -i invalid kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4 kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4 crash> So reserve the remaining low 1M memory when the crashkernel option is specified (after reserving real mode memory) so that allocated memory does not fall into the low 1M area and thus the copying of the contents of the first 640k to a backup region in purgatory() can be avoided altogether. This way, it does not need to be included in crash dumps or used for anything except the trampolines that must live in the low 1M. [ bp: Heavily rewrite commit message, flip check logic in crash_reserve_low_1M().] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: bhe@redhat.com Cc: Dave Young Cc: d.hatayama@fujitsu.com Cc: dhowells@redhat.com Cc: ebiederm@xmission.com Cc: horms@verge.net.au Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jürgen Gross Cc: kexec@lists.infradead.org Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tom Lendacky Cc: vgoyal@redhat.com Cc: x86-ml Link: https://lkml.kernel.org/r/20191108090027.11082-2-lijiang@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=204793 --- arch/x86/include/asm/crash.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h index ef5638f641f2..88eadd08ad70 100644 --- a/arch/x86/include/asm/crash.h +++ b/arch/x86/include/asm/crash.h @@ -10,4 +10,10 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params); void crash_smp_send_stop(void); +#ifdef CONFIG_KEXEC_CORE +void __init crash_reserve_low_1M(void); +#else +static inline void __init crash_reserve_low_1M(void) { } +#endif + #endif /* _ASM_X86_CRASH_H */ -- cgit From 7c321eb2b843bf25946100b7e2de4054f71ec068 Mon Sep 17 00:00:00 2001 From: Lianbo Jiang Date: Fri, 8 Nov 2019 17:00:26 +0800 Subject: x86/kdump: Remove the backup region handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When the crashkernel kernel command line option is specified, the low 1M memory will always be reserved now. Therefore, it's not necessary to create a backup region anymore and also no need to copy the contents of the first 640k to it. Remove all the code related to handling that backup region. [ bp: Massage commit message. ] Signed-off-by: Lianbo Jiang Signed-off-by: Borislav Petkov Cc: bhe@redhat.com Cc: Dave Young Cc: d.hatayama@fujitsu.com Cc: dhowells@redhat.com Cc: ebiederm@xmission.com Cc: horms@verge.net.au Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Jürgen Gross Cc: kexec@lists.infradead.org Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tom Lendacky Cc: vgoyal@redhat.com Cc: x86-ml Link: https://lkml.kernel.org/r/20191108090027.11082-3-lijiang@redhat.com --- arch/x86/include/asm/kexec.h | 10 ---------- arch/x86/include/asm/purgatory.h | 10 ---------- 2 files changed, 20 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 5e7d6b46de97..6802c59e8252 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -66,10 +66,6 @@ struct kimage; # define KEXEC_ARCH KEXEC_ARCH_X86_64 #endif -/* Memory to backup during crash kdump */ -#define KEXEC_BACKUP_SRC_START (0UL) -#define KEXEC_BACKUP_SRC_END (640 * 1024UL - 1) /* 640K */ - /* * This function is responsible for capturing register states if coming * via panic otherwise just fix up the ss and sp if coming via kernel @@ -154,12 +150,6 @@ struct kimage_arch { pud_t *pud; pmd_t *pmd; pte_t *pte; - /* Details of backup region */ - unsigned long backup_src_start; - unsigned long backup_src_sz; - - /* Physical address of backup segment */ - unsigned long backup_load_addr; /* Core ELF header buffer */ void *elf_headers; diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h index 92c34e517da1..5528e9325049 100644 --- a/arch/x86/include/asm/purgatory.h +++ b/arch/x86/include/asm/purgatory.h @@ -6,16 +6,6 @@ #include extern void purgatory(void); -/* - * These forward declarations serve two purposes: - * - * 1) Make sparse happy when checking arch/purgatory - * 2) Document that these are required to be global so the symbol - * lookup in kexec works - */ -extern unsigned long purgatory_backup_dest; -extern unsigned long purgatory_backup_src; -extern unsigned long purgatory_backup_sz; #endif /* __ASSEMBLY__ */ #endif /* _ASM_PURGATORY_H */ -- cgit From 90dc392fc445ee2fc17c2617e306774b269386ac Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 13 Nov 2019 08:18:34 +0100 Subject: x86: Remove the calgary IOMMU driver The calgary IOMMU was only used on high-end IBM systems in the early x86_64 age and has no known users left. Remove it to avoid having to touch it for pending changes to the DMA API. 
Signed-off-by: Christoph Hellwig Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20191113071836.21041-2-hch@lst.de --- arch/x86/include/asm/calgary.h | 57 ------------------------------------------ arch/x86/include/asm/pci_64.h | 14 ----------- arch/x86/include/asm/tce.h | 35 -------------------------- 3 files changed, 106 deletions(-) delete mode 100644 arch/x86/include/asm/calgary.h delete mode 100644 arch/x86/include/asm/tce.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/calgary.h b/arch/x86/include/asm/calgary.h deleted file mode 100644 index facd374a1bf7..000000000000 --- a/arch/x86/include/asm/calgary.h +++ /dev/null @@ -1,57 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Derived from include/asm-powerpc/iommu.h - * - * Copyright IBM Corporation, 2006-2007 - * - * Author: Jon Mason - * Author: Muli Ben-Yehuda - */ - -#ifndef _ASM_X86_CALGARY_H -#define _ASM_X86_CALGARY_H - -#include -#include -#include -#include -#include - -struct iommu_table { - const struct cal_chipset_ops *chip_ops; /* chipset specific funcs */ - unsigned long it_base; /* mapped address of tce table */ - unsigned long it_hint; /* Hint for next alloc */ - unsigned long *it_map; /* A simple allocation bitmap for now */ - void __iomem *bbar; /* Bridge BAR */ - u64 tar_val; /* Table Address Register */ - struct timer_list watchdog_timer; - spinlock_t it_lock; /* Protects it_map */ - unsigned int it_size; /* Size of iommu table in entries */ - unsigned char it_busno; /* Bus number this table belongs to */ -}; - -struct cal_chipset_ops { - void (*handle_quirks)(struct iommu_table *tbl, struct pci_dev *dev); - void (*tce_cache_blast)(struct iommu_table *tbl); - void (*dump_error_regs)(struct iommu_table *tbl); -}; - -#define TCE_TABLE_SIZE_UNSPECIFIED ~0 -#define TCE_TABLE_SIZE_64K 0 -#define TCE_TABLE_SIZE_128K 1 -#define TCE_TABLE_SIZE_256K 2 -#define TCE_TABLE_SIZE_512K 3 -#define TCE_TABLE_SIZE_1M 4 -#define TCE_TABLE_SIZE_2M 5 -#define TCE_TABLE_SIZE_4M 6 -#define TCE_TABLE_SIZE_8M 7 - -extern int use_calgary; - -#ifdef CONFIG_CALGARY_IOMMU -extern int detect_calgary(void); -#else -static inline int detect_calgary(void) { return -ENODEV; } -#endif - -#endif /* _ASM_X86_CALGARY_H */ diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h index f5411de0ae11..4e1aef506aa5 100644 --- a/arch/x86/include/asm/pci_64.h +++ b/arch/x86/include/asm/pci_64.h @@ -4,20 +4,6 @@ #ifdef __KERNEL__ -#ifdef CONFIG_CALGARY_IOMMU -static inline void *pci_iommu(struct pci_bus *bus) -{ - struct pci_sysdata *sd = bus->sysdata; - return sd->iommu; -} - -static inline void set_pci_iommu(struct pci_bus *bus, void *val) -{ - struct pci_sysdata *sd = bus->sysdata; - sd->iommu = val; -} -#endif /* CONFIG_CALGARY_IOMMU */ - extern int (*pci_config_read)(int seg, int bus, int dev, int fn, int reg, int len, u32 *value); extern int (*pci_config_write)(int seg, int bus, int dev, int fn, diff --git a/arch/x86/include/asm/tce.h b/arch/x86/include/asm/tce.h deleted file mode 100644 index 6ed2deacf1d0..000000000000 --- a/arch/x86/include/asm/tce.h +++ /dev/null @@ -1,35 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * This file is derived from asm-powerpc/tce.h. 
- * - * Copyright (C) IBM Corporation, 2006 - * - * Author: Muli Ben-Yehuda - * Author: Jon Mason - */ - -#ifndef _ASM_X86_TCE_H -#define _ASM_X86_TCE_H - -extern unsigned int specified_table_size; -struct iommu_table; - -#define TCE_ENTRY_SIZE 8 /* in bytes */ - -#define TCE_READ_SHIFT 0 -#define TCE_WRITE_SHIFT 1 -#define TCE_HUBID_SHIFT 2 /* unused */ -#define TCE_RSVD_SHIFT 8 /* unused */ -#define TCE_RPN_SHIFT 12 -#define TCE_UNUSED_SHIFT 48 /* unused */ - -#define TCE_RPN_MASK 0x0000fffffffff000ULL - -extern void tce_build(struct iommu_table *tbl, unsigned long index, - unsigned int npages, unsigned long uaddr, int direction); -extern void tce_free(struct iommu_table *tbl, long index, unsigned int npages); -extern void * __init alloc_tce_table(void); -extern void __init free_tce_table(void *tbl); -extern int __init build_tce_table(struct pci_dev *dev, void __iomem *bbar); - -#endif /* _ASM_X86_TCE_H */ -- cgit From 948fdcf94289b15f86ce8206abd92a3f7d12360a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 13 Nov 2019 08:18:35 +0100 Subject: x86/pci: Remove pci_64.h This file only contains external declarations for two non-existing function pointers. Signed-off-by: Christoph Hellwig Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20191113071836.21041-3-hch@lst.de --- arch/x86/include/asm/pci.h | 4 ---- arch/x86/include/asm/pci_64.h | 14 -------------- 2 files changed, 18 deletions(-) delete mode 100644 arch/x86/include/asm/pci_64.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index e662f987dfa2..d9e28aad2738 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -120,10 +120,6 @@ void native_restore_msi_irqs(struct pci_dev *dev); #endif #endif /* __KERNEL__ */ -#ifdef CONFIG_X86_64 -#include -#endif - /* generic pci stuff */ #include diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h deleted file mode 100644 index 4e1aef506aa5..000000000000 --- a/arch/x86/include/asm/pci_64.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_X86_PCI_64_H -#define _ASM_X86_PCI_64_H - -#ifdef __KERNEL__ - -extern int (*pci_config_read)(int seg, int bus, int dev, int fn, - int reg, int len, u32 *value); -extern int (*pci_config_write)(int seg, int bus, int dev, int fn, - int reg, int len, u32 value); - -#endif /* __KERNEL__ */ - -#endif /* _ASM_X86_PCI_64_H */ -- cgit From b52b0c4fc977901c243756e191f3fc686725b7d9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 13 Nov 2019 08:18:36 +0100 Subject: x86/pci: Remove #ifdef __KERNEL__ guard from pci.h is not a UAPI header, so the __KERNEL__ ifdef is rather pointless. 
Signed-off-by: Christoph Hellwig Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20191113071836.21041-4-hch@lst.de --- arch/x86/include/asm/pci.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index d9e28aad2738..90d0731fdcb6 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -12,8 +12,6 @@ #include #include -#ifdef __KERNEL__ - struct pci_sysdata { int domain; /* PCI domain */ int node; /* NUMA node */ @@ -118,7 +116,6 @@ void native_restore_msi_irqs(struct pci_dev *dev); #define native_setup_msi_irqs NULL #define native_teardown_msi_irq NULL #endif -#endif /* __KERNEL__ */ /* generic pci stuff */ #include -- cgit From a6da0d77e98e94fa66187a5ce3cf7e11fbf95503 Mon Sep 17 00:00:00 2001 From: Like Xu Date: Sun, 27 Oct 2019 18:52:42 +0800 Subject: KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is a heavyweight and high-frequency operation, especially when host disables the watchdog (maximum 21000000 ns) which leads to an unacceptable latency of the guest NMI handler. It limits the use of vPMUs in the guest. When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop and release its existing perf_event (if any) every time EVEN in most cases almost the same requested perf_event will be created and configured again. For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl' for fixed) is the same as its current config AND a new sample period based on pmc->counter is accepted by host perf interface, the current event could be reused safely as a new created one does. Otherwise, do release the undesirable perf_event and reprogram a new one as usual. It's light-weight to call pmc_pause_counter (disable, read and reset event) and pmc_resume_counter (recalibrate period and re-enable event) as guest expects instead of release-and-create again on any condition. Compared to use the filterable event->attr or hw.config, a new 'u64 current_config' field is added to save the last original programed config for each vPMC. Based on this implementation, the number of calls to pmc_reprogram_counter is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event. In the usage of multiplexing perf sampling mode, the average latency of the guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up). If host disables watchdog, the minimum latecy of guest NMI handler could be speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average. Suggested-by: Kan Liang Signed-off-by: Like Xu Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6f6b8886a8eb..a87a6c98adee 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -449,6 +449,11 @@ struct kvm_pmc { u64 eventsel; struct perf_event *perf_event; struct kvm_vcpu *vcpu; + /* + * eventsel value for general purpose counters, + * ctrl value for fixed counters. 
+ */ + u64 current_config; }; struct kvm_pmu { -- cgit From b35e5548b41131eb06de041af2f5fb0890d96f96 Mon Sep 17 00:00:00 2001 From: Like Xu Date: Sun, 27 Oct 2019 18:52:43 +0800 Subject: KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, a host perf_event is created for a vPMC functionality emulation. It’s unpredictable to determine if a disabled perf_event will be reused. If they are disabled and are not reused for a considerable period of time, those obsolete perf_events would increase host context switch overhead that could have been avoided. If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu sched time slice, and its independent enable bit of the vPMC isn't set, we can predict that the guest has finished the use of this vPMC, and then do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in. This lazy mechanism delays the event release time to the beginning of the next scheduled time slice if vPMC's MSRs aren't changed during this time slice. If guest comes back to use this vPMC in next time slice, a new perf event would be re-created via perf_event_create_kernel_counter() as usual. Suggested-by: Wei Wang Suggested-by: Paolo Bonzini Signed-off-by: Like Xu Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a87a6c98adee..20bb2fc0883a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -473,6 +473,20 @@ struct kvm_pmu { struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; struct irq_work irq_work; DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX); + DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX); + DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX); + + /* + * The gate to release perf_events not marked in + * pmc_in_use only once in a vcpu time slice. + */ + bool need_cleanup; + + /* + * The total number of programmed perf_events and it helps to avoid + * redundant check before cleanup if guest don't use vPMU at all. + */ + u8 event_count; }; struct kvm_pmu_ops; -- cgit From 7ee30bc132c683d06a6d9e360e39e483e3990708 Mon Sep 17 00:00:00 2001 From: Nitesh Narayan Lal Date: Thu, 7 Nov 2019 07:53:43 -0500 Subject: KVM: x86: deliver KVM IOAPIC scan request to target vCPUs In IOAPIC fixed delivery mode instead of flushing the scan requests to all vCPUs, we should only send the requests to vCPUs specified within the destination field. This patch introduces kvm_get_dest_vcpus_mask() API which retrieves an array of target vCPUs by using kvm_apic_map_get_dest_lapic() and then based on the vcpus_idx, it sets the bit in a bitmap. However, if the above fails kvm_get_dest_vcpus_mask() finds the target vCPUs by traversing all available vCPUs. Followed by setting the bits in the bitmap. If we had different vCPUs in the previous request for the same redirection table entry then bits corresponding to these vCPUs are also set. This to done to keep ioapic_handled_vectors synchronized. This bitmap is then eventually passed on to kvm_make_vcpus_request_mask() to generate a masked request only for the target vCPUs. This would enable us to reduce the latency overhead on isolated vCPUs caused by the IPI to process due to KVM_REQ_IOAPIC_SCAN. 
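A simplified sketch of the selection step described above (illustrative only; vcpu_matches_dest() is a stand-in name for the real APIC destination check, and the hand-off to the masked-request helper is elided):

    /* Collect the destination vCPUs of a redirection table entry
     * into a bitmap instead of kicking every vCPU. */
    DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
    struct kvm_vcpu *vcpu;
    unsigned int i;

    bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
    kvm_for_each_vcpu(i, vcpu, kvm) {
        if (vcpu_matches_dest(vcpu, dest_id, dest_mode))
            __set_bit(i, vcpu_bitmap);
    }
    /* vcpu_bitmap is then handed to the masked-request helper so
     * only these vCPUs see the IOAPIC scan request. */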
Suggested-by: Marcelo Tosatti Signed-off-by: Nitesh Narayan Lal Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 20bb2fc0883a..898ab9eb4dc8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1588,6 +1588,8 @@ bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); void kvm_make_mclock_inprogress_request(struct kvm *kvm); void kvm_make_scan_ioapic_request(struct kvm *kvm); +void kvm_make_scan_ioapic_request_mask(struct kvm *kvm, + unsigned long *vcpu_bitmap); void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); -- cgit From caf5e32d4ea7253820f38dd7c429f8d4a8019c5f Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 4 Nov 2019 21:17:26 +0100 Subject: y2038: ipc: remove __kernel_time_t reference from headers There are two structures based on time_t that conflict between libc and kernel: timeval and timespec. Both are now renamed to __kernel_old_timeval and __kernel_old_timespec. For time_t, the old typedef is still __kernel_time_t. There is nothing wrong with that name, but it would be nice to not use that going forward as this type is used almost only in deprecated interfaces because of the y2038 overflow. In the IPC headers (msgbuf.h, sembuf.h, shmbuf.h), __kernel_time_t is only used for the 64-bit variants, which are not deprecated. Change these to a plain 'long', which is the same type as __kernel_time_t on all 64-bit architectures anyway, to reduce the number of users of the old type. Signed-off-by: Arnd Bergmann --- arch/x86/include/uapi/asm/msgbuf.h | 6 +++--- arch/x86/include/uapi/asm/sembuf.h | 4 ++-- arch/x86/include/uapi/asm/shmbuf.h | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/msgbuf.h b/arch/x86/include/uapi/asm/msgbuf.h index 90ab9a795b49..7c5bb43ed8af 100644 --- a/arch/x86/include/uapi/asm/msgbuf.h +++ b/arch/x86/include/uapi/asm/msgbuf.h @@ -15,9 +15,9 @@ struct msqid64_ds { struct ipc64_perm msg_perm; - __kernel_time_t msg_stime; /* last msgsnd time */ - __kernel_time_t msg_rtime; /* last msgrcv time */ - __kernel_time_t msg_ctime; /* last change time */ + __kernel_long_t msg_stime; /* last msgsnd time */ + __kernel_long_t msg_rtime; /* last msgrcv time */ + __kernel_long_t msg_ctime; /* last change time */ __kernel_ulong_t msg_cbytes; /* current number of bytes on queue */ __kernel_ulong_t msg_qnum; /* number of messages in queue */ __kernel_ulong_t msg_qbytes; /* max number of bytes on queue */ diff --git a/arch/x86/include/uapi/asm/sembuf.h b/arch/x86/include/uapi/asm/sembuf.h index 89de6cd9f0a7..7c1b156695ba 100644 --- a/arch/x86/include/uapi/asm/sembuf.h +++ b/arch/x86/include/uapi/asm/sembuf.h @@ -21,9 +21,9 @@ struct semid64_ds { unsigned long sem_ctime; /* last change time */ unsigned long sem_ctime_high; #else - __kernel_time_t sem_otime; /* last semop time */ + long sem_otime; /* last semop time */ __kernel_ulong_t __unused1; - __kernel_time_t sem_ctime; /* last change time */ + long sem_ctime; /* last change time */ __kernel_ulong_t __unused2; #endif __kernel_ulong_t sem_nsems; /* no. 
of semaphores in array */ diff --git a/arch/x86/include/uapi/asm/shmbuf.h b/arch/x86/include/uapi/asm/shmbuf.h index 644421f3823b..f0305dc660c9 100644 --- a/arch/x86/include/uapi/asm/shmbuf.h +++ b/arch/x86/include/uapi/asm/shmbuf.h @@ -16,9 +16,9 @@ struct shmid64_ds { struct ipc64_perm shm_perm; /* operation perms */ size_t shm_segsz; /* size of segment (bytes) */ - __kernel_time_t shm_atime; /* last attach time */ - __kernel_time_t shm_dtime; /* last detach time */ - __kernel_time_t shm_ctime; /* last change time */ + __kernel_long_t shm_atime; /* last attach time */ + __kernel_long_t shm_dtime; /* last detach time */ + __kernel_long_t shm_ctime; /* last change time */ __kernel_pid_t shm_cpid; /* pid of creator */ __kernel_pid_t shm_lpid; /* pid of last operator */ __kernel_ulong_t shm_nattch; /* no. of current attaches */ -- cgit From db8c33f8b5bea59d00ca12dcd6b65d01b1ea98ef Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 16 Sep 2019 15:39:58 -0700 Subject: x86/cpu: Align the x86_capability array to size of unsigned long The x86_capability array in cpuinfo_x86 is of type u32 and thus is naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit systems). The array pointer is handed into atomic bit operations. If the access is not aligned to unsigned long then the atomic bit operations can end up crossing a cache line boundary, which causes the CPU to do a full bus lock as it can't lock both cache lines at once. The bus lock operation is heavy weight and can cause severe performance degradation. The upcoming #AC split lock detection mechanism will issue warnings for this kind of access. Force the alignment of the array to unsigned long. This avoids the massive code changes which would be required when converting the array data type to unsigned long. [ tglx: Rewrote changelog so it contains information WHY this is required ] Suggested-by: David Laight Suggested-by: Thomas Gleixner Signed-off-by: Fenghua Yu Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com --- arch/x86/include/asm/processor.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6e0a3b43d027..c073534ca485 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -93,7 +93,15 @@ struct cpuinfo_x86 { __u32 extended_cpuid_level; /* Maximum supported CPUID level, -1=no CPUID: */ int cpuid_level; - __u32 x86_capability[NCAPINTS + NBUGINTS]; + /* + * Align to size of unsigned long because the x86_capability array + * is passed to bitops which require the alignment. Use unnamed + * union to enforce the array is aligned to size of unsigned long. 
+ */ + union { + __u32 x86_capability[NCAPINTS + NBUGINTS]; + unsigned long x86_capability_alignment; + }; char x86_vendor_id[16]; char x86_model_id[64]; /* in KB - valid for CPUS which support this call: */ -- cgit From c3d6324f841bab2403be6419986e2b1d1068d423 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 5 Jun 2019 10:48:37 +0200 Subject: x86/alternatives: Teach text_poke_bp() to emulate instructions In preparation for static_call and variable size jump_label support, teach text_poke_bp() to emulate instructions, namely: JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5, INT3 The current text_poke_bp() takes a @handler argument which is used as a jump target when the temporary INT3 is hit by a different CPU. When patching CALL instructions, this doesn't work because we'd miss the PUSH of the return address. Instead, teach poke_int3_handler() to emulate an instruction, typically the instruction we're patching in. This fits almost all text_poke_bp() users, except arch_unoptimize_kprobe() which restores random text, and for that site we have to build an explicit emulate instruction. Tested-by: Alexei Starovoitov Tested-by: Steven Rostedt (VMware) Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Masami Hiramatsu Reviewed-by: Daniel Bristot de Oliveira Acked-by: Alexei Starovoitov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Steven Rostedt Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20191111132457.529086974@infradead.org Signed-off-by: Ingo Molnar (cherry picked from commit 8c7eebc10687af45ac8e40ad1bac0cf7893dba9f) Signed-off-by: Alexei Starovoitov --- arch/x86/include/asm/text-patching.h | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 5e8319bb207a..23c626a742e8 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -26,10 +26,11 @@ static inline void apply_paravirt(struct paravirt_patch_site *start, #define POKE_MAX_OPCODE_SIZE 5 struct text_poke_loc { - void *detour; void *addr; - size_t len; - const char opcode[POKE_MAX_OPCODE_SIZE]; + int len; + s32 rel32; + u8 opcode; + const u8 text[POKE_MAX_OPCODE_SIZE]; }; extern void text_poke_early(void *addr, const void *opcode, size_t len); @@ -51,8 +52,10 @@ extern void text_poke_early(void *addr, const void *opcode, size_t len); extern void *text_poke(void *addr, const void *opcode, size_t len); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern int poke_int3_handler(struct pt_regs *regs); -extern void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); +extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries); +extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr, + const void *opcode, size_t len, const void *emulate); extern int after_bootmem; extern __ro_after_init struct mm_struct *poking_mm; extern __ro_after_init unsigned long poking_addr; @@ -63,8 +66,17 @@ static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip) regs->ip = ip; } -#define INT3_INSN_SIZE 1 -#define CALL_INSN_SIZE 5 +#define INT3_INSN_SIZE 1 +#define INT3_INSN_OPCODE 0xCC + +#define CALL_INSN_SIZE 5 +#define CALL_INSN_OPCODE 0xE8 + +#define JMP32_INSN_SIZE 5 +#define JMP32_INSN_OPCODE 0xE9 + +#define JMP8_INSN_SIZE 2 
+#define JMP8_INSN_OPCODE 0xEB static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val) { -- cgit From 2fff071d28b54f050f62654dad4ec111b8416d8e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:16 +0100 Subject: x86/process: Unify copy_thread_tls() While looking at the TSS io bitmap it turned out that any change in that area would require identical changes to copy_thread_tls(). The 32 and 64 bit variants share sufficient code to consolidate them into a common function to avoid duplication of upcoming modifications. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski --- arch/x86/include/asm/ptrace.h | 6 ++++++ arch/x86/include/asm/switch_to.h | 10 ++++++++++ 2 files changed, 16 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 332eb3525867..5057a8ed100b 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -361,5 +361,11 @@ extern int do_get_thread_area(struct task_struct *p, int idx, extern int do_set_thread_area(struct task_struct *p, int idx, struct user_desc __user *info, int can_allocate); +#ifdef CONFIG_X86_64 +# define do_set_thread_area_64(p, s, t) do_arch_prctl_64(p, s, t) +#else +# define do_set_thread_area_64(p, s, t) (0) +#endif + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_PTRACE_H */ diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index 18a4b6890fa8..0e059b73437b 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -103,7 +103,17 @@ static inline void update_task_stack(struct task_struct *task) if (static_cpu_has(X86_FEATURE_XENPV)) load_sp0(task_top_of_stack(task)); #endif +} +static inline void kthread_frame_init(struct inactive_task_frame *frame, + unsigned long fun, unsigned long arg) +{ + frame->bx = fun; +#ifdef CONFIG_X86_32 + frame->di = arg; +#else + frame->r12 = arg; +#endif } #endif /* _ASM_X86_SWITCH_TO_H */ -- cgit From ecc7e37d4dadd16f6be125ca496feccd05454da4 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:20 +0100 Subject: x86/io: Speedup schedule out of I/O bitmap user There is no requirement to update the TSS I/O bitmap when a thread using it is scheduled out and the incoming thread does not use it. For the permission check based on the TSS I/O bitmap the CPU calculates the memory location of the I/O bitmap by the address of the TSS and the io_bitmap_base member of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the offset to an address outside of the TSS limit. If an I/O instruction is issued from user space the TSS limit causes #GP to be raised in the same was as valid I/O bitmap with all bits set to 1 would do. This removes the extra work when an I/O bitmap using task is scheduled out and puts the burden on the rare I/O bitmap users when they are scheduled in. 
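The invalidation itself is cheap; as a minimal sketch of the idea (not the literal switch-out code, which also has to track the dirty bitmap size), with prev denoting the outgoing task and assuming the IO_BITMAP_OFFSET_INVALID constant introduced below:

    /* On scheduling out an I/O bitmap user, point the bitmap base
     * past the TSS limit; a later IN/OUT from user space then takes
     * #GP just as a fully set bitmap would. */
    struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);

    if (test_tsk_thread_flag(prev, TIF_IO_BITMAP))
        tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;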
Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/processor.h | 38 ++++++++++++++++++++++++++------------ 1 file changed, 26 insertions(+), 12 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6e0a3b43d027..6d0059c21969 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -330,8 +330,23 @@ struct x86_hw_tss { #define IO_BITMAP_BITS 65536 #define IO_BITMAP_BYTES (IO_BITMAP_BITS/8) #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) -#define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) -#define INVALID_IO_BITMAP_OFFSET 0x8000 + +#define IO_BITMAP_OFFSET_VALID \ + (offsetof(struct tss_struct, io_bitmap) - \ + offsetof(struct tss_struct, x86_tss)) + +/* + * sizeof(unsigned long) coming from an extra "long" at the end + * of the iobitmap. + * + * -1? seg base+limit should be pointing to the address of the + * last valid byte + */ +#define __KERNEL_TSS_LIMIT \ + (IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1) + +/* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */ +#define IO_BITMAP_OFFSET_INVALID (__KERNEL_TSS_LIMIT + 1) struct entry_stack { unsigned long words[64]; @@ -349,6 +364,15 @@ struct tss_struct { */ struct x86_hw_tss x86_tss; + /* + * Store the dirty size of the last io bitmap offender. The next + * one will have to do the cleanup as the switch out to a non io + * bitmap user will just set x86_tss.io_bitmap_base to a value + * outside of the TSS limit. So for sane tasks there is no need to + * actually touch the io_bitmap at all. + */ + unsigned int io_bitmap_prev_max; + /* * The extra 1 is there because the CPU will access an * additional byte beyond the end of the IO permission @@ -360,16 +384,6 @@ struct tss_struct { DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); -/* - * sizeof(unsigned long) coming from an extra "long" at the end - * of the iobitmap. - * - * -1? seg base+limit should be pointing to the address of the - * last valid byte - */ -#define __KERNEL_TSS_LIMIT \ - (IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1) - /* Per CPU interrupt stacks */ struct irq_stack { char stack[IRQ_STACK_SIZE]; -- cgit From f5848e5fd2f813c3a8009a642dfbcf635287c199 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 12 Nov 2019 18:45:29 +0100 Subject: x86/tss: Move I/O bitmap data into a seperate struct Move the non hardware portion of I/O bitmap data into a seperate struct for readability sake. 
Originally-by: Ingo Molnar Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/processor.h | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6d0059c21969..cd7cd7d10b81 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -328,11 +328,11 @@ struct x86_hw_tss { * IO-bitmap sizes: */ #define IO_BITMAP_BITS 65536 -#define IO_BITMAP_BYTES (IO_BITMAP_BITS/8) -#define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) +#define IO_BITMAP_BYTES (IO_BITMAP_BITS / BITS_PER_BYTE) +#define IO_BITMAP_LONGS (IO_BITMAP_BYTES / sizeof(long)) -#define IO_BITMAP_OFFSET_VALID \ - (offsetof(struct tss_struct, io_bitmap) - \ +#define IO_BITMAP_OFFSET_VALID \ + (offsetof(struct tss_struct, io_bitmap.bitmap) - \ offsetof(struct tss_struct, x86_tss)) /* @@ -356,14 +356,10 @@ struct entry_stack_page { struct entry_stack stack; } __aligned(PAGE_SIZE); -struct tss_struct { - /* - * The fixed hardware portion. This must not cross a page boundary - * at risk of violating the SDM's advice and potentially triggering - * errata. - */ - struct x86_hw_tss x86_tss; - +/* + * All IO bitmap related data stored in the TSS: + */ +struct x86_io_bitmap { /* * Store the dirty size of the last io bitmap offender. The next * one will have to do the cleanup as the switch out to a non io @@ -371,7 +367,7 @@ struct tss_struct { * outside of the TSS limit. So for sane tasks there is no need to * actually touch the io_bitmap at all. */ - unsigned int io_bitmap_prev_max; + unsigned int prev_max; /* * The extra 1 is there because the CPU will access an @@ -379,7 +375,18 @@ struct tss_struct { * bitmap. The extra byte must be all 1 bits, and must * be within the limit. */ - unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; + unsigned long bitmap[IO_BITMAP_LONGS + 1]; +}; + +struct tss_struct { + /* + * The fixed hardware portion. This must not cross a page boundary + * at risk of violating the SDM's advice and potentially triggering + * errata. + */ + struct x86_hw_tss x86_tss; + + struct x86_io_bitmap io_bitmap; } __aligned(PAGE_SIZE); DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); -- cgit From 577d5cd7e5851d3832066cd0422475fa7db2ee17 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:21 +0100 Subject: x86/ioperm: Move iobitmap data into a struct No point in having all the data in thread_struct, especially as upcoming changes add more. Make the bitmap in the new struct accessible as array of longs and as array of characters via a union, so both the bitmap functions and the update logic can avoid type casts. 
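The dual-view pattern referred to here, shown purely as an illustration with made-up names rather than the exact kernel layout:

    /* One backing store, addressable as longs for the bitmap helpers
     * and as bytes for memcpy-style updates, without casts. */
    union io_bitmap_view {
        unsigned long words[IO_BITMAP_LONGS];
        unsigned char bytes[IO_BITMAP_BYTES];
    };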
Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/io_bitmap.h | 13 +++++++++++++ arch/x86/include/asm/processor.h | 6 ++---- 2 files changed, 15 insertions(+), 4 deletions(-) create mode 100644 arch/x86/include/asm/io_bitmap.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h new file mode 100644 index 000000000000..1a12b9ff5e4e --- /dev/null +++ b/arch/x86/include/asm/io_bitmap.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_IOBITMAP_H +#define _ASM_X86_IOBITMAP_H + +#include + +struct io_bitmap { + /* The maximum number of bytes to copy so all zero bits are covered */ + unsigned int max; + unsigned long bitmap[IO_BITMAP_LONGS]; +}; + +#endif diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index cd7cd7d10b81..c949e0e5cbe6 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -7,6 +7,7 @@ /* Forward declaration, a strange C thing */ struct task_struct; struct mm_struct; +struct io_bitmap; struct vm86; #include @@ -501,10 +502,8 @@ struct thread_struct { struct vm86 *vm86; #endif /* IO permissions: */ - unsigned long *io_bitmap_ptr; + struct io_bitmap *io_bitmap; unsigned long iopl; - /* Max allowed port in the bitmap, in bytes: */ - unsigned io_bitmap_max; mm_segment_t addr_limit; @@ -862,7 +861,6 @@ static inline void spin_lock_prefetch(const void *x) #define INIT_THREAD { \ .sp0 = TOP_OF_INIT_STACK, \ .sysenter_cs = __KERNEL_CS, \ - .io_bitmap_ptr = NULL, \ .addr_limit = KERNEL_DS, \ } -- cgit From 060aa16fdb7c5078a4159a76e5dc87d6a493af9b Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:22 +0100 Subject: x86/ioperm: Add bitmap sequence number Add a globally unique sequence number which is incremented when ioperm() is changing the I/O bitmap of a task. Store the new sequence number in the io_bitmap structure and compare it with the sequence number of the I/O bitmap which was last loaded on a CPU. Only update the bitmap if the sequence is different. That should further reduce the overhead of I/O bitmap scheduling when there are only a few I/O bitmap users on the system. The 64bit sequence counter is sufficient. A wraparound of the sequence counter assuming an ioperm() call every nanosecond would require about 584 years of uptime. Suggested-by: Linus Torvalds Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/io_bitmap.h | 1 + arch/x86/include/asm/processor.h | 3 +++ 2 files changed, 4 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index 1a12b9ff5e4e..d63bd5afe671 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -5,6 +5,7 @@ #include struct io_bitmap { + u64 sequence; /* The maximum number of bytes to copy so all zero bits are covered */ unsigned int max; unsigned long bitmap[IO_BITMAP_LONGS]; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index c949e0e5cbe6..40bb0f7bca3f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -361,6 +361,9 @@ struct entry_stack_page { * All IO bitmap related data stored in the TSS: */ struct x86_io_bitmap { + /* The sequence number of the last active bitmap. */ + u64 prev_sequence; + /* * Store the dirty size of the last io bitmap offender. 
The next * one will have to do the cleanup as the switch out to a non io -- cgit From 22fe5b0439dd53643fd6f4c582c46c6dba0fde53 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:23 +0100 Subject: x86/ioperm: Move TSS bitmap update to exit to user work There is no point to update the TSS bitmap for tasks which use I/O bitmaps on every context switch. It's enough to update it right before exiting to user space. That reduces the context switch bitmap handling to invalidating the io bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP set. The invaldiation is done on purpose when a task with an IO bitmap switches out to prevent any possible leakage of an activated IO bitmap. It also removes the requirement to update the tasks bitmap atomically in ioperm(). Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/io_bitmap.h | 2 ++ arch/x86/include/asm/thread_info.h | 9 +++++---- 2 files changed, 7 insertions(+), 4 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index d63bd5afe671..6d82a37cb17c 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -11,4 +11,6 @@ struct io_bitmap { unsigned long bitmap[IO_BITMAP_LONGS]; }; +void tss_update_io_bitmap(void); + #endif diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index f9453536f9bb..0accf44878a5 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -143,8 +143,8 @@ struct thread_info { _TIF_NOHZ) /* flags to check in __switch_to() */ -#define _TIF_WORK_CTXSW_BASE \ - (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \ +#define _TIF_WORK_CTXSW_BASE \ + (_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP | \ _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE) /* @@ -156,8 +156,9 @@ struct thread_info { # define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE) #endif -#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) -#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) +#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \ + _TIF_IO_BITMAP) +#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) #define STACK_WARN (THREAD_SIZE/8) -- cgit From ea5f1cd7ab494f65f50f338299eabb40ad6a1767 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:24 +0100 Subject: x86/ioperm: Remove bitmap if all permissions dropped If ioperm() results in a bitmap with all bits set (no permissions to any I/O port), then handling that bitmap on context switch and exit to user mode is pointless. Drop it. Move the bitmap exit handling to the ioport code and reuse it for both the thread exit path and dropping it. This allows to reuse this code for the upcoming iopl() emulation. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski --- arch/x86/include/asm/io_bitmap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index 6d82a37cb17c..784a88edb25d 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -11,6 +11,8 @@ struct io_bitmap { unsigned long bitmap[IO_BITMAP_LONGS]; }; +void io_bitmap_exit(void); + void tss_update_io_bitmap(void); #endif -- cgit From 4804e382c117ce213cd5c43512cf4b1d71bb2650 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:25 +0100 Subject: x86/ioperm: Share I/O bitmap if identical The I/O bitmap is duplicated on fork. 
That's wasting memory and slows down fork. There is no point to do so. As long as the bitmap is not modified it can be shared between threads and processes. Add a refcount and just share it on fork. If a task modifies the bitmap then it has to do the duplication if and only if it is shared. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski --- arch/x86/include/asm/io_bitmap.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index 784a88edb25d..b664baadf736 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -2,15 +2,20 @@ #ifndef _ASM_X86_IOBITMAP_H #define _ASM_X86_IOBITMAP_H +#include #include struct io_bitmap { u64 sequence; + refcount_t refcnt; /* The maximum number of bytes to copy so all zero bits are covered */ unsigned int max; unsigned long bitmap[IO_BITMAP_LONGS]; }; +struct task_struct; + +void io_bitmap_share(struct task_struct *tsk); void io_bitmap_exit(void); void tss_update_io_bitmap(void); -- cgit From c8137ace56383688af911fea5934c71ad158135e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:28 +0100 Subject: x86/iopl: Restrict iopl() permission scope The access to the full I/O port range can be also provided by the TSS I/O bitmap, but that would require to copy 8k of data on scheduling in the task. As shown with the sched out optimization TSS.io_bitmap_base can be used to switch the incoming task to a preallocated I/O bitmap which has all bits zero, i.e. allows access to all I/O ports. Implementing this allows to provide an iopl() emulation mode which restricts the IOPL level 3 permissions to I/O port access but removes the STI/CLI permission which is coming with the hardware IOPL mechansim. Provide a config option to switch IOPL to emulation mode, make it the default and while at it also provide an option to disable IOPL completely. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski --- arch/x86/include/asm/pgtable_32_types.h | 2 +- arch/x86/include/asm/processor.h | 28 +++++++++++++++++++++------- 2 files changed, 22 insertions(+), 8 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index b0bc0fff5f1f..0fab4bfb4df2 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -44,7 +44,7 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c * to avoid include recursion hell */ -#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40) +#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 41) #define CPU_ENTRY_AREA_BASE \ ((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 40bb0f7bca3f..b0e02aa3f46a 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -332,19 +332,21 @@ struct x86_hw_tss { #define IO_BITMAP_BYTES (IO_BITMAP_BITS / BITS_PER_BYTE) #define IO_BITMAP_LONGS (IO_BITMAP_BYTES / sizeof(long)) -#define IO_BITMAP_OFFSET_VALID \ +#define IO_BITMAP_OFFSET_VALID_MAP \ (offsetof(struct tss_struct, io_bitmap.bitmap) - \ offsetof(struct tss_struct, x86_tss)) +#define IO_BITMAP_OFFSET_VALID_ALL \ + (offsetof(struct tss_struct, io_bitmap.mapall) - \ + offsetof(struct tss_struct, x86_tss)) + /* - * sizeof(unsigned long) coming from an extra "long" at the end - * of the iobitmap. 
- * - * -1? seg base+limit should be pointing to the address of the - * last valid byte + * sizeof(unsigned long) coming from an extra "long" at the end of the + * iobitmap. The limit is inclusive, i.e. the last valid byte. */ #define __KERNEL_TSS_LIMIT \ - (IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1) + (IO_BITMAP_OFFSET_VALID_ALL + IO_BITMAP_BYTES + \ + sizeof(unsigned long) - 1) /* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */ #define IO_BITMAP_OFFSET_INVALID (__KERNEL_TSS_LIMIT + 1) @@ -380,6 +382,12 @@ struct x86_io_bitmap { * be within the limit. */ unsigned long bitmap[IO_BITMAP_LONGS + 1]; + + /* + * Special I/O bitmap to emulate IOPL(3). All bytes zero, + * except the additional byte at the end. + */ + unsigned long mapall[IO_BITMAP_LONGS + 1]; }; struct tss_struct { @@ -506,7 +514,13 @@ struct thread_struct { #endif /* IO permissions: */ struct io_bitmap *io_bitmap; + + /* + * IOPL. Priviledge level dependent I/O permission which includes + * user space CLI/STI when granted. + */ unsigned long iopl; + unsigned long iopl_emul; mm_segment_t addr_limit; -- cgit From a24ca9976843156eabbc5f2d798954b5674d1b61 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 11 Nov 2019 23:03:29 +0100 Subject: x86/iopl: Remove legacy IOPL option The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy cruft dealing with the (e)flags based IOPL mechanism. Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross (Paravirt and Xen parts) Acked-by: Andy Lutomirski --- arch/x86/include/asm/paravirt.h | 4 ---- arch/x86/include/asm/paravirt_types.h | 2 -- arch/x86/include/asm/processor.h | 26 +++----------------------- arch/x86/include/asm/xen/hypervisor.h | 2 -- 4 files changed, 3 insertions(+), 31 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 69089d46f128..86e7317eb31f 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -294,10 +294,6 @@ static inline void write_idt_entry(gate_desc *dt, int entry, const gate_desc *g) { PVOP_VCALL3(cpu.write_idt_entry, dt, entry, g); } -static inline void set_iopl_mask(unsigned mask) -{ - PVOP_VCALL1(cpu.set_iopl_mask, mask); -} static inline void paravirt_activate_mm(struct mm_struct *prev, struct mm_struct *next) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 70b654f3ffe5..84812964d3dd 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -140,8 +140,6 @@ struct pv_cpu_ops { void (*load_sp0)(unsigned long sp0); - void (*set_iopl_mask)(unsigned mask); - void (*wbinvd)(void); /* cpuid emulation, mostly so that caps bits can be disabled */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index b0e02aa3f46a..1387d31c5e07 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -516,10 +516,10 @@ struct thread_struct { struct io_bitmap *io_bitmap; /* - * IOPL. Priviledge level dependent I/O permission which includes - * user space CLI/STI when granted. + * IOPL. Priviledge level dependent I/O permission which is + * emulated via the I/O bitmap to prevent user space from disabling + * interrupts. 
*/ - unsigned long iopl; unsigned long iopl_emul; mm_segment_t addr_limit; @@ -552,25 +552,6 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset, */ #define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/ -/* - * Set IOPL bits in EFLAGS from given mask - */ -static inline void native_set_iopl_mask(unsigned mask) -{ -#ifdef CONFIG_X86_32 - unsigned int reg; - - asm volatile ("pushfl;" - "popl %0;" - "andl %1, %0;" - "orl %2, %0;" - "pushl %0;" - "popfl" - : "=&r" (reg) - : "i" (~X86_EFLAGS_IOPL), "r" (mask)); -#endif -} - static inline void native_load_sp0(unsigned long sp0) { @@ -610,7 +591,6 @@ static inline void load_sp0(unsigned long sp0) native_load_sp0(sp0); } -#define set_iopl_mask native_set_iopl_mask #endif /* CONFIG_PARAVIRT_XXL */ /* Free all resources held by a thread. */ diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h index 42e1245af0d8..ff4b52e37e60 100644 --- a/arch/x86/include/asm/xen/hypervisor.h +++ b/arch/x86/include/asm/xen/hypervisor.h @@ -62,6 +62,4 @@ void xen_arch_register_cpu(int num); void xen_arch_unregister_cpu(int num); #endif -extern void xen_set_iopl_mask(unsigned mask); - #endif /* _ASM_X86_XEN_HYPERVISOR_H */ -- cgit From 111e7b15cf10f6e973ccf537c70c66a5de539060 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 12 Nov 2019 21:40:33 +0100 Subject: x86/ioperm: Extend IOPL config to control ioperm() as well If iopl() is disabled, then providing ioperm() does not make much sense. Rename the config option and disable/enable both syscalls with it. Guard the code with #ifdefs where appropriate. Suggested-by: Andy Lutomirski Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/io_bitmap.h | 6 ++++++ arch/x86/include/asm/processor.h | 9 ++++++++- arch/x86/include/asm/thread_info.h | 7 ++++++- 3 files changed, 20 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index b664baadf736..02c6ef8f7667 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -15,9 +15,15 @@ struct io_bitmap { struct task_struct; +#ifdef CONFIG_X86_IOPL_IOPERM void io_bitmap_share(struct task_struct *tsk); void io_bitmap_exit(void); void tss_update_io_bitmap(void); +#else +static inline void io_bitmap_share(struct task_struct *tsk) { } +static inline void io_bitmap_exit(void) { } +static inline void tss_update_io_bitmap(void) { } +#endif #endif diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 1387d31c5e07..45f416a2c1f1 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -340,13 +340,18 @@ struct x86_hw_tss { (offsetof(struct tss_struct, io_bitmap.mapall) - \ offsetof(struct tss_struct, x86_tss)) +#ifdef CONFIG_X86_IOPL_IOPERM /* * sizeof(unsigned long) coming from an extra "long" at the end of the * iobitmap. The limit is inclusive, i.e. the last valid byte. 
*/ -#define __KERNEL_TSS_LIMIT \ +# define __KERNEL_TSS_LIMIT \ (IO_BITMAP_OFFSET_VALID_ALL + IO_BITMAP_BYTES + \ sizeof(unsigned long) - 1) +#else +# define __KERNEL_TSS_LIMIT \ + (offsetof(struct tss_struct, x86_tss) + sizeof(struct x86_hw_tss) - 1) +#endif /* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */ #define IO_BITMAP_OFFSET_INVALID (__KERNEL_TSS_LIMIT + 1) @@ -398,7 +403,9 @@ struct tss_struct { */ struct x86_hw_tss x86_tss; +#ifdef CONFIG_X86_IOPL_IOPERM struct x86_io_bitmap io_bitmap; +#endif } __aligned(PAGE_SIZE); DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 0accf44878a5..d779366ce3f8 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -156,8 +156,13 @@ struct thread_info { # define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE) #endif -#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \ +#ifdef CONFIG_X86_IOPL_IOPERM +# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \ _TIF_IO_BITMAP) +#else +# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY) +#endif + #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) #define STACK_WARN (THREAD_SIZE/8) -- cgit From b41d62201b9772c7c750360ab668d2caa502e642 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 18 Nov 2019 10:47:29 +0100 Subject: x86: Remove unused asm/rio.h The removed calgary IOMMU driver was the only user of this header file. Reported-by: Jon Mason Signed-off-by: Thomas Gleixner Cc: Christoph Hellwig --- arch/x86/include/asm/rio.h | 64 ---------------------------------------------- 1 file changed, 64 deletions(-) delete mode 100644 arch/x86/include/asm/rio.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/rio.h b/arch/x86/include/asm/rio.h deleted file mode 100644 index 0a21986d2238..000000000000 --- a/arch/x86/include/asm/rio.h +++ /dev/null @@ -1,64 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Derived from include/asm-x86/mach-summit/mach_mpparse.h - * and include/asm-x86/mach-default/bios_ebda.h - * - * Author: Laurent Vivier - */ - -#ifndef _ASM_X86_RIO_H -#define _ASM_X86_RIO_H - -#define RIO_TABLE_VERSION 3 - -struct rio_table_hdr { - u8 version; /* Version number of this data structure */ - u8 num_scal_dev; /* # of Scalability devices */ - u8 num_rio_dev; /* # of RIO I/O devices */ -} __attribute__((packed)); - -struct scal_detail { - u8 node_id; /* Scalability Node ID */ - u32 CBAR; /* Address of 1MB register space */ - u8 port0node; /* Node ID port connected to: 0xFF=None */ - u8 port0port; /* Port num port connected to: 0,1,2, or */ - /* 0xFF=None */ - u8 port1node; /* Node ID port connected to: 0xFF = None */ - u8 port1port; /* Port num port connected to: 0,1,2, or */ - /* 0xFF=None */ - u8 port2node; /* Node ID port connected to: 0xFF = None */ - u8 port2port; /* Port num port connected to: 0,1,2, or */ - /* 0xFF=None */ - u8 chassis_num; /* 1 based Chassis number (1 = boot node) */ -} __attribute__((packed)); - -struct rio_detail { - u8 node_id; /* RIO Node ID */ - u32 BBAR; /* Address of 1MB register space */ - u8 type; /* Type of device */ - u8 owner_id; /* Node ID of Hurricane that owns this */ - /* node */ - u8 port0node; /* Node ID port connected to: 0xFF=None */ - u8 port0port; /* Port num port connected to: 0,1,2, or */ - /* 0xFF=None */ - u8 port1node; /* Node ID port connected to: 0xFF=None */ - u8 port1port; /* Port num port connected to: 0,1,2, or 
*/ - /* 0xFF=None */ - u8 first_slot; /* Lowest slot number below this Calgary */ - u8 status; /* Bit 0 = 1 : the XAPIC is used */ - /* = 0 : the XAPIC is not used, ie: */ - /* ints fwded to another XAPIC */ - /* Bits1:7 Reserved */ - u8 WP_index; /* instance index - lower ones have */ - /* lower slot numbers/PCI bus numbers */ - u8 chassis_num; /* 1 based Chassis number */ -} __attribute__((packed)); - -enum { - HURR_SCALABILTY = 0, /* Hurricane Scalability info */ - HURR_RIOIB = 2, /* Hurricane RIOIB info */ - COMPAT_CALGARY = 4, /* Compatibility Calgary */ - ALT_CALGARY = 5, /* Second Planar Calgary */ -}; - -#endif /* _ASM_X86_RIO_H */ -- cgit From 81ff2c37f9e5d77593928df0536d86443195fd64 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Mon, 18 Nov 2019 16:21:12 +0100 Subject: x86/stackframe/32: Repair 32-bit Xen PV Once again RPL checks have been introduced which don't account for a 32-bit kernel living in ring 1 when running in a PV Xen domain. The case in FIXUP_FRAME has been preventing boot. Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro on a code path reachable when running in PV mode under Xen; I have to admit that I stopped at a certain point trying to figure out whether there are present ones. Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") Signed-off-by: Jan Beulich Signed-off-by: Thomas Gleixner Cc: Stable Team Link: https://lore.kernel.org/r/0fad341f-b7f5-f859-d55d-f0084ee7087e@suse.com --- arch/x86/include/asm/segment.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index ac3892920419..6669164abadc 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -31,6 +31,18 @@ */ #define SEGMENT_RPL_MASK 0x3 +/* + * When running on Xen PV, the actual privilege level of the kernel is 1, + * not 0. Testing the Requested Privilege Level in a segment selector to + * determine whether the context is user mode or kernel mode with + * SEGMENT_RPL_MASK is wrong because the PV kernel's privilege level + * matches the 0x3 mask. + * + * Testing with USER_SEGMENT_RPL_MASK is valid for both native and Xen PV + * kernels because privilege level 2 is never used. + */ +#define USER_SEGMENT_RPL_MASK 0x2 + /* User mode is privilege level 3: */ #define USER_RPL 0x3 -- cgit From edef5c36b0c7f07ab4926f6c9e50731f3772c79d Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 18 Nov 2019 12:23:00 -0500 Subject: KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID Because KVM always emulates CPUID, the CPUID clear bit (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually" by the hypervisor when performing said emulation. Right now neither kvm-intel.ko nor kvm-amd.ko implement MSR_IA32_TSX_CTRL but this will change in the next patch. 
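A hedged sketch of the effect (not the actual KVM patch; per the SDM, HLE is CPUID.(EAX=7,ECX=0):EBX bit 4, RTM is EBX bit 11, and bit 1 of IA32_TSX_CTRL is the CPUID-clear control; guest_tsx_ctrl and ebx_7_0 are placeholder names):

    /* When the guest has set the CPUID-clear bit in its TSX_CTRL
     * value, hide HLE and RTM from the emulated CPUID leaf 7. */
    if (guest_tsx_ctrl & TSX_CTRL_CPUID_CLEAR)
        ebx_7_0 &= ~(BIT(4) /* HLE */ | BIT(11) /* RTM */);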
Reviewed-by: Jim Mattson Tested-by: Jim Mattson Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4fc61483919a..663d09ac7778 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1357,6 +1357,7 @@ int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu, void kvm_enable_efer_bits(u64); bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer); +int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data); int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data); int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu); -- cgit From 880a98c339961eaa074393e3a2117cbe9125b8bb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 21 Nov 2019 00:40:24 +0100 Subject: x86/cpu_entry_area: Add guard page for entry stack on 32bit The entry stack in the cpu entry area is protected against overflow by the readonly GDT on 64-bit, but on 32-bit the GDT needs to be writeable and therefore does not trigger a fault on stack overflow. Add a guard page. Fixes: c482feefe1ae ("x86/entry/64: Make cpu_entry_area.tss read-only") Signed-off-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Cc: stable@kernel.org --- arch/x86/include/asm/cpu_entry_area.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index 8348f7d69fd5..905d89c80d3f 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -78,8 +78,12 @@ struct cpu_entry_area { /* * The GDT is just below entry_stack and thus serves (on x86_64) as - * a a read-only guard page. + * a read-only guard page. On 32-bit the GDT must be writeable, so + * it needs an extra guard page. */ +#ifdef CONFIG_X86_32 + char guard_entry_stack[PAGE_SIZE]; +#endif struct entry_stack_page entry_stack_page; /* -- cgit From fa36dcdf8b200f4c175d0a00a8c99439ee0df95d Mon Sep 17 00:00:00 2001 From: Himadri Pandya Date: Tue, 30 Jul 2019 09:49:43 +0000 Subject: x86: hv: Add function to allocate zeroed page for Hyper-V Hyper-V assumes page size to be 4K. While this assumption holds true on x86 architecture, it might not be true for ARM64 architecture. Hence define hyper-v specific function to allocate a zeroed page which can have a different implementation on ARM64 architecture to handle the conflict between hyper-v's assumed page size and actual guest page size. 
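On x86, where the guest page size already matches Hyper-V's 4K assumption, a plausible implementation (a sketch, not necessarily the committed code) is simply a zeroed page allocation:

    void *hv_alloc_hyperv_zeroed_page(void)
    {
        /* x86: guest PAGE_SIZE equals the Hyper-V page size, so a
         * plain zeroed page is sufficient. */
        return (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
    }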
Signed-off-by: Himadri Pandya Reviewed-by: Michael Kelley Signed-off-by: Sasha Levin --- arch/x86/include/asm/mshyperv.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index f4138aeb4280..6b79515abb82 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -219,6 +219,7 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu) void __init hyperv_init(void); void hyperv_setup_mmu_ops(void); void *hv_alloc_hyperv_page(void); +void *hv_alloc_hyperv_zeroed_page(void); void hv_free_hyperv_page(unsigned long addr); void hyperv_reenlightenment_intr(struct pt_regs *regs); void set_hv_tscchange_cb(void (*cb)(void)); -- cgit From 05b042a1944322844eaae7ea596d5f154166d68a Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sun, 24 Nov 2019 11:21:44 +0100 Subject: x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When two recent commits that increased the size of the 'struct cpu_entry_area' were merged in -tip, the 32-bit defconfig build started failing on the following build time assert: ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_189’ declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro ‘BUILD_BUG_ON’ In function ‘setup_cpu_entry_area_ptes’, Which corresponds to the following build time assert: BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE); The purpose of this assert is to sanity check the fixed-value definition of CPU_ENTRY_AREA_PAGES arch/x86/include/asm/pgtable_32_types.h: #define CPU_ENTRY_AREA_PAGES (NR_CPUS * 41) The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value we didn't want to define in such a low level header, because it would cause dependency hell. Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES accordingly - and this assert is checking that constraint. But the assert is both imprecise and buggy, primarily because it doesn't include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE (which begins at a PMD boundary). This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined too large upstream (v5.4-rc8): #define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40) While 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two extra pages, which hid the bug. The following commit (not yet upstream) increased the size to 40 pages: x86/iopl: ("Restrict iopl() permission scope") ... but increased CPU_ENTRY_AREA_PAGES only 41 - i.e. shortening the gap to just 1 extra page. Then another not-yet-upstream commit changed the size again: 880a98c33996: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit") Which increased the cpu_entry_area size from 38 to 39 pages, but didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked fine, because we still had a page left from the accidental 'reserve'. But when these two commits were merged into the same tree, the combined size of cpu_entry_area grew from 38 to 40 pages, while CPU_ENTRY_AREA_PAGES finally caught up to 40 as well. 
Which is fine in terms of functionality, but the assert broke: BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE); because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area, which is 1 page larger due to the IDT page. To fix all this, change the assert to two precise asserts: BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE); BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE); This takes the IDT page into account, and also connects the size-based define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based define of CPU_ENTRY_AREA_MAP_SIZE. Also clean up some of the names which made it rather confusing: - 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of the cpu-entry-area, but the per-cpu array size, so rename this to CPU_ENTRY_AREA_ARRAY_SIZE. - Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping size, with the IDT included. - Add comments where '+1' denotes the IDT mapping - it wasn't obvious and took me about 3 hours to decode... Finally, because this particular commit is actually applied after this patch: 880a98c33996: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit") Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages. All future commits that change cpu_entry_area will have to adjust this value precisely. As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES and derive its value directly from the structure, without causing header hell - but that is an adventure for another day! :-) Fixes: 880a98c33996: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit") Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Andy Lutomirski Cc: stable@kernel.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpu_entry_area.h | 12 +++++++----- arch/x86/include/asm/pgtable_32_types.h | 8 ++++---- 2 files changed, 11 insertions(+), 9 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index 905d89c80d3f..ea866c7bf31d 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -98,7 +98,6 @@ struct cpu_entry_area { */ struct cea_exception_stacks estacks; #endif -#ifdef CONFIG_CPU_SUP_INTEL /* * Per CPU debug store for Intel performance monitoring. Wastes a * full page at the moment. @@ -109,11 +108,13 @@ struct cpu_entry_area { * Reserve enough fixmap PTEs. 
*/ struct debug_store_buffers cpu_debug_buffers; -#endif }; -#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area)) -#define CPU_ENTRY_AREA_TOT_SIZE (CPU_ENTRY_AREA_SIZE * NR_CPUS) +#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area)) +#define CPU_ENTRY_AREA_ARRAY_SIZE (CPU_ENTRY_AREA_SIZE * NR_CPUS) + +/* Total size includes the readonly IDT mapping page as well: */ +#define CPU_ENTRY_AREA_TOTAL_SIZE (CPU_ENTRY_AREA_ARRAY_SIZE + PAGE_SIZE) DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks); @@ -121,13 +122,14 @@ DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks); extern void setup_cpu_entry_areas(void); extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags); +/* Single page reserved for the readonly IDT mapping: */ #define CPU_ENTRY_AREA_RO_IDT CPU_ENTRY_AREA_BASE #define CPU_ENTRY_AREA_PER_CPU (CPU_ENTRY_AREA_RO_IDT + PAGE_SIZE) #define CPU_ENTRY_AREA_RO_IDT_VADDR ((void *)CPU_ENTRY_AREA_RO_IDT) #define CPU_ENTRY_AREA_MAP_SIZE \ - (CPU_ENTRY_AREA_PER_CPU + CPU_ENTRY_AREA_TOT_SIZE - CPU_ENTRY_AREA_BASE) + (CPU_ENTRY_AREA_PER_CPU + CPU_ENTRY_AREA_ARRAY_SIZE - CPU_ENTRY_AREA_BASE) extern struct cpu_entry_area *get_cpu_entry_area(int cpu); diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index b0bc0fff5f1f..1636eb8e5a5b 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -44,11 +44,11 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c * to avoid include recursion hell */ -#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40) +#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 39) -#define CPU_ENTRY_AREA_BASE \ - ((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \ - & PMD_MASK) +/* The +1 is for the readonly IDT page: */ +#define CPU_ENTRY_AREA_BASE \ + ((FIXADDR_TOT_START - PAGE_SIZE*(CPU_ENTRY_AREA_PAGES+1)) & PMD_MASK) #define LDT_BASE_ADDR \ ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) -- cgit From fb041bb7c0a918b95c6889fc965cdc4a75b4c0ca Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 21 Nov 2019 11:59:00 +0000 Subject: locking/refcount: Consolidate implementations of refcount_t The generic implementation of refcount_t should be good enough for everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely, leaving the generic implementation enabled unconditionally. 
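For contrast with the exception-based x86 scheme removed below, the core idea of the generic implementation can be sketched roughly as follows; this is a simplified, illustrative sketch only (the real code lives in include/linux/refcount.h and saturates at REFCOUNT_SATURATED rather than at this ad-hoc value):

	/* Sketch: saturation-style protection - instead of trapping on
	 * overflow, pin the counter to a poison value deep in the negative
	 * range so it can never wrap back to zero and enable a use-after-free. */
	static inline void refcount_inc_sketch(refcount_t *r)
	{
		int old = atomic_fetch_add_relaxed(1, &r->refs);

		if (unlikely(old < 0 || old == INT_MAX))
			atomic_set(&r->refs, INT_MIN / 2);	/* saturate */
	}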
Signed-off-by: Will Deacon Reviewed-by: Ard Biesheuvel Acked-by: Kees Cook Tested-by: Hanjun Guo Cc: Ard Biesheuvel Cc: Elena Reshetova Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/asm.h | 6 -- arch/x86/include/asm/refcount.h | 126 ---------------------------------------- 2 files changed, 132 deletions(-) delete mode 100644 arch/x86/include/asm/refcount.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h index 3ff577c0b102..5a0c14ebef70 100644 --- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -139,9 +139,6 @@ # define _ASM_EXTABLE_EX(from, to) \ _ASM_EXTABLE_HANDLE(from, to, ex_handler_ext) -# define _ASM_EXTABLE_REFCOUNT(from, to) \ - _ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount) - # define _ASM_NOKPROBE(entry) \ .pushsection "_kprobe_blacklist","aw" ; \ _ASM_ALIGN ; \ @@ -170,9 +167,6 @@ # define _ASM_EXTABLE_EX(from, to) \ _ASM_EXTABLE_HANDLE(from, to, ex_handler_ext) -# define _ASM_EXTABLE_REFCOUNT(from, to) \ - _ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount) - /* For C file, we already have NOKPROBE_SYMBOL macro */ #endif diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h deleted file mode 100644 index 232f856e0db0..000000000000 --- a/arch/x86/include/asm/refcount.h +++ /dev/null @@ -1,126 +0,0 @@ -#ifndef __ASM_X86_REFCOUNT_H -#define __ASM_X86_REFCOUNT_H -/* - * x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from - * PaX/grsecurity. - */ -#include -#include - -/* - * This is the first portion of the refcount error handling, which lives in - * .text.unlikely, and is jumped to from the CPU flag check (in the - * following macros). This saves the refcount value location into CX for - * the exception handler to use (in mm/extable.c), and then triggers the - * central refcount exception. The fixup address for the exception points - * back to the regular execution flow in .text. - */ -#define _REFCOUNT_EXCEPTION \ - ".pushsection .text..refcount\n" \ - "111:\tlea %[var], %%" _ASM_CX "\n" \ - "112:\t" ASM_UD2 "\n" \ - ASM_UNREACHABLE \ - ".popsection\n" \ - "113:\n" \ - _ASM_EXTABLE_REFCOUNT(112b, 113b) - -/* Trigger refcount exception if refcount result is negative. */ -#define REFCOUNT_CHECK_LT_ZERO \ - "js 111f\n\t" \ - _REFCOUNT_EXCEPTION - -/* Trigger refcount exception if refcount result is zero or negative. */ -#define REFCOUNT_CHECK_LE_ZERO \ - "jz 111f\n\t" \ - REFCOUNT_CHECK_LT_ZERO - -/* Trigger refcount exception unconditionally. 
*/ -#define REFCOUNT_ERROR \ - "jmp 111f\n\t" \ - _REFCOUNT_EXCEPTION - -static __always_inline void refcount_add(unsigned int i, refcount_t *r) -{ - asm volatile(LOCK_PREFIX "addl %1,%0\n\t" - REFCOUNT_CHECK_LT_ZERO - : [var] "+m" (r->refs.counter) - : "ir" (i) - : "cc", "cx"); -} - -static __always_inline void refcount_inc(refcount_t *r) -{ - asm volatile(LOCK_PREFIX "incl %0\n\t" - REFCOUNT_CHECK_LT_ZERO - : [var] "+m" (r->refs.counter) - : : "cc", "cx"); -} - -static __always_inline void refcount_dec(refcount_t *r) -{ - asm volatile(LOCK_PREFIX "decl %0\n\t" - REFCOUNT_CHECK_LE_ZERO - : [var] "+m" (r->refs.counter) - : : "cc", "cx"); -} - -static __always_inline __must_check -bool refcount_sub_and_test(unsigned int i, refcount_t *r) -{ - bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", - REFCOUNT_CHECK_LT_ZERO, - r->refs.counter, e, "er", i, "cx"); - - if (ret) { - smp_acquire__after_ctrl_dep(); - return true; - } - - return false; -} - -static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r) -{ - bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", - REFCOUNT_CHECK_LT_ZERO, - r->refs.counter, e, "cx"); - - if (ret) { - smp_acquire__after_ctrl_dep(); - return true; - } - - return false; -} - -static __always_inline __must_check -bool refcount_add_not_zero(unsigned int i, refcount_t *r) -{ - int c, result; - - c = atomic_read(&(r->refs)); - do { - if (unlikely(c == 0)) - return false; - - result = c + i; - - /* Did we try to increment from/to an undesirable state? */ - if (unlikely(c < 0 || c == INT_MAX || result < c)) { - asm volatile(REFCOUNT_ERROR - : : [var] "m" (r->refs.counter) - : "cc", "cx"); - break; - } - - } while (!atomic_try_cmpxchg(&(r->refs), &c, result)); - - return c != 0; -} - -static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r) -{ - return refcount_add_not_zero(1, r); -} - -#endif -- cgit From af3784689e2b2741918e69f1ce5f0ecb7933b300 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 21 Nov 2019 15:25:04 +0100 Subject: y2038: ipc: fix x32 ABI breakage The correct type on x32 is 64-bit wide, same as for the other struct members around it, so use __kernel_long_t in place of the original __kernel_time_t here, corresponding to the rest of the structure. Fixes: caf5e32d4ea7 ("y2038: ipc: remove __kernel_time_t reference from headers") Reported-by: Ben Hutchings Signed-off-by: Arnd Bergmann --- arch/x86/include/uapi/asm/sembuf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/sembuf.h b/arch/x86/include/uapi/asm/sembuf.h index 7c1b156695ba..93030e97269a 100644 --- a/arch/x86/include/uapi/asm/sembuf.h +++ b/arch/x86/include/uapi/asm/sembuf.h @@ -21,9 +21,9 @@ struct semid64_ds { unsigned long sem_ctime; /* last change time */ unsigned long sem_ctime_high; #else - long sem_otime; /* last semop time */ + __kernel_long_t sem_otime; /* last semop time */ __kernel_ulong_t __unused1; - long sem_ctime; /* last change time */ + __kernel_long_t sem_ctime; /* last change time */ __kernel_ulong_t __unused2; #endif __kernel_ulong_t sem_nsems; /* no. 
of semaphores in array */ -- cgit From 0bcd7762727dd8ba9b9b6f828e5a4cbd5da4f725 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 26 Nov 2019 21:49:04 +0100 Subject: x86/iopl: Make 'struct tss_struct' constant size again MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After the following commit: 05b042a19443: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise") 'struct cpu_entry_area' has to be Kconfig-invariant, so that we always have a matching CPU_ENTRY_AREA_PAGES size. This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct: 111e7b15cf10: ("x86/ioperm: Extend IOPL config to control ioperm() as well") Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of cpu_entry_area by two pages, triggering the assert: ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE Simplify the Kconfig dependencies and make cpu_entry_area a constant size on 32-bit kernels again. Fixes: 05b042a19443: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise") Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Andy Lutomirski Signed-off-by: Ingo Molnar --- arch/x86/include/asm/processor.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index b4e29d8b9e5a..e51afbb0cbfb 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -411,9 +411,7 @@ struct tss_struct { */ struct x86_hw_tss x86_tss; -#ifdef CONFIG_X86_IOPL_IOPERM struct x86_io_bitmap io_bitmap; -#endif } __aligned(PAGE_SIZE); DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); -- cgit From 93efbde2c331004d8053f04b4bf0ca3e630b474a Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Wed, 20 Nov 2019 22:12:38 -0800 Subject: x86/traps: Disentangle the 32-bit and 64-bit doublefault code The 64-bit doublefault handler is much nicer than the 32-bit one. As a first step toward unifying them, make the 64-bit handler self-contained. This should have no functional effect except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n, in which case it will change the logging a bit. This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit kernels. It didn't do anything useful -- CONFIG_DOUBLEFAULT=n didn't actually disable doublefault handling on x86_64. 
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds Signed-off-by: Ingo Molnar --- arch/x86/include/asm/processor.h | 1 - 1 file changed, 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index e51afbb0cbfb..f6c630097d9f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -997,7 +997,6 @@ bool xen_set_default_idle(void); #endif void stop_this_cpu(void *dummy); -void df_debug(struct pt_regs *regs, long error_code); void microcode_check(void); enum l1tf_mitigations { -- cgit From dc4e0021b00b5a4ecba56fae509217776592b0aa Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 26 Nov 2019 18:27:16 +0100 Subject: x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area There are three problems with the current layout of the doublefault stack and TSS. First, the TSS is only cacheline-aligned, which is not enough -- if the hardware portion of the TSS (struct x86_hw_tss) crosses a page boundary, horrible things happen [0]. Second, the stack and TSS are global, so simultaneous double faults on different CPUs will cause massive corruption. Third, the whole mechanism won't work if user CR3 is loaded, resulting in a triple fault [1]. Let the doublefault stack and TSS share a page (which prevents the TSS from spanning a page boundary), make it percpu, and move it into cpu_entry_area. Teach the stack dump code about the doublefault stack. [0] Real hardware will read past the end of the page onto the next *physical* page if a task switch happens. Virtual machines may have any number of bugs, and I would consider it reasonable for a VM to summarily kill the guest if it tries to task-switch to a page-spanning TSS. [1] Real hardware triple faults. At least some VMs seem to hang. I'm not sure what's going on. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpu_entry_area.h | 12 ++++++++++++ arch/x86/include/asm/doublefault.h | 13 +++++++++++++ arch/x86/include/asm/pgtable_32_types.h | 7 ++++--- arch/x86/include/asm/processor.h | 1 - 4 files changed, 29 insertions(+), 4 deletions(-) create mode 100644 arch/x86/include/asm/doublefault.h (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index ea866c7bf31d..804734058c77 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -65,6 +65,13 @@ enum exception_stack_ordering { #endif +#ifdef CONFIG_X86_32 +struct doublefault_stack { + unsigned long stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) / sizeof(unsigned long)]; + struct x86_hw_tss tss; +} __aligned(PAGE_SIZE); +#endif + /* * cpu_entry_area is a percpu region that contains things needed by the CPU * and early entry/exit code. Real types aren't used for all fields here @@ -86,6 +93,11 @@ struct cpu_entry_area { #endif struct entry_stack_page entry_stack_page; +#ifdef CONFIG_X86_32 + char guard_doublefault_stack[PAGE_SIZE]; + struct doublefault_stack doublefault_stack; +#endif + /* * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because * we need task switches to work, and task switches write to the TSS. 
diff --git a/arch/x86/include/asm/doublefault.h b/arch/x86/include/asm/doublefault.h new file mode 100644 index 000000000000..af9a14ac8962 --- /dev/null +++ b/arch/x86/include/asm/doublefault.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_DOUBLEFAULT_H +#define _ASM_X86_DOUBLEFAULT_H + +#if defined(CONFIG_X86_32) && defined(CONFIG_DOUBLEFAULT) +extern void doublefault_init_cpu_tss(void); +#else +static inline void doublefault_init_cpu_tss(void) +{ +} +#endif + +#endif /* _ASM_X86_DOUBLEFAULT_H */ diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index 19f5807260c3..0416d42e5bdd 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -41,10 +41,11 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ #endif /* - * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c - * to avoid include recursion hell + * This is an upper bound on sizeof(struct cpu_entry_area) / PAGE_SIZE. + * Define this here and validate with BUILD_BUG_ON() in cpu_entry_area.c + * to avoid include recursion hell. */ -#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 41) +#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 43) /* The +1 is for the readonly IDT page: */ #define CPU_ENTRY_AREA_BASE \ ((FIXADDR_TOT_START - PAGE_SIZE*(CPU_ENTRY_AREA_PAGES+1)) & PMD_MASK) #define LDT_BASE_ADDR \ ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index f6c630097d9f..0340aad3f2fc 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -166,7 +166,6 @@ enum cpuid_regs_idx { extern struct cpuinfo_x86 boot_cpu_data; extern struct cpuinfo_x86 new_cpu_data; -extern struct x86_hw_tss doublefault_tss; extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS]; -- cgit From 7d8d8cfdee9a7bd6f9682f253fa98efdd8048a9e Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Wed, 20 Nov 2019 23:06:41 -0800 Subject: x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit The old x86_32 doublefault_fn() was crufty, and it did not even try to recover. do_double_fault() is much nicer. Rewrite the 32-bit double fault code to sanitize CPU state and call do_double_fault(). This is mostly an exercise in i386 archaeology. With this patch applied, 32-bit double faults get a real stack trace, just like 64-bit double faults. [ mingo: merged the patch to a later kernel base. 
] Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds Signed-off-by: Ingo Molnar --- arch/x86/include/asm/traps.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index b25e633033c3..ffa0dc8a535e 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -69,6 +69,9 @@ dotraplinkage void do_overflow(struct pt_regs *regs, long error_code); dotraplinkage void do_bounds(struct pt_regs *regs, long error_code); dotraplinkage void do_invalid_op(struct pt_regs *regs, long error_code); dotraplinkage void do_device_not_available(struct pt_regs *regs, long error_code); +#if defined(CONFIG_X86_64) || defined(CONFIG_DOUBLEFAULT) +dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsigned long cr2); +#endif dotraplinkage void do_coprocessor_segment_overrun(struct pt_regs *regs, long error_code); dotraplinkage void do_invalid_TSS(struct pt_regs *regs, long error_code); dotraplinkage void do_segment_not_present(struct pt_regs *regs, long error_code); -- cgit From 405b45376de90b3027aaf8c4de035c6bb721fa7e Mon Sep 17 00:00:00 2001 From: Anthony Steinhauser Date: Sun, 24 Nov 2019 21:48:38 -0800 Subject: perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0 According to the documentation, when you successfully write 0 to /sys/devices/cpu/rdpmc, the RDPMC instruction should be disabled unconditionally and immediately (as soon as you close the sysfs file). Instead, in the current implementation the PMU must be reloaded, which only happens eventually, at some point in the future; only then does the RDPMC instruction become disabled (in ring 3) on the respective core. This change makes the treatment of the 0 value as immediate and as unconditional as the current treatment of the 2 value; the only difference is that the CR4.PCE bit is naturally cleared instead of set. 
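The enforcement idea can be pictured with the sketch below; the helper name and the shape of the sysfs store handler are assumptions for illustration, not the actual arch/x86/events/core.c code, but they show how a write of 0 can take effect on every CPU immediately instead of waiting for the next PMU reload:

	/* Hypothetical per-CPU helper: recompute CR4.PCE on the CPU it runs on. */
	static void rdpmc_update_pce(void *ignored)
	{
		if (rdpmc_currently_allowed())		/* assumed predicate */
			cr4_set_bits(X86_CR4_PCE);
		else
			cr4_clear_bits(X86_CR4_PCE);
	}

	/* In the sysfs store handler (sketch), after flipping the new key: */
	if (val == 0)
		static_branch_inc(&rdpmc_never_available_key);
	on_each_cpu(rdpmc_update_pce, NULL, 1);		/* apply right away */

The new rdpmc_never_available_key static key, visible in the mmu_context.h hunk below, is what lets the CR4.PCE update path honor a 0 setting regardless of the per-mm counter.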
Signed-off-by: Anthony Steinhauser Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: acme@kernel.org Link: https://lkml.kernel.org/r/20191125054838.137615-1-asteinhauser@google.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/mmu_context.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 16ae821483c8..5f33924e200f 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -26,12 +26,14 @@ static inline void paravirt_activate_mm(struct mm_struct *prev, #ifdef CONFIG_PERF_EVENTS +DECLARE_STATIC_KEY_FALSE(rdpmc_never_available_key); DECLARE_STATIC_KEY_FALSE(rdpmc_always_available_key); static inline void load_mm_cr4_irqsoff(struct mm_struct *mm) { if (static_branch_unlikely(&rdpmc_always_available_key) || - atomic_read(&mm->context.perf_rdpmc_allowed)) + (!static_branch_unlikely(&rdpmc_never_available_key) && + atomic_read(&mm->context.perf_rdpmc_allowed))) cr4_set_bits_irqsoff(X86_CR4_PCE); else cr4_clear_bits_irqsoff(X86_CR4_PCE); -- cgit From 59c4bd853abcea95eccc167a7d7fd5f1a5f47b98 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 28 Nov 2019 09:53:06 +0100 Subject: x86/fpu: Don't cache access to fpu_fpregs_owner_ctx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing to the context that is currently loaded. It never changed during the lifetime of a task - it remained stable/constant. Once loading of the FPU registers was deferred until return to userland, the content of fpu_fpregs_owner_ctx may change during preemption and must not be cached. 
This went unnoticed for some time and was now noticed, in particular since gcc 9 is caching that load in copy_fpstate_to_sigframe() and reusing it in the retry loop: copy_fpstate_to_sigframe() load fpu_fpregs_owner_ctx and save on stack fpregs_lock() copy_fpregs_to_sigframe() /* failed */ fpregs_unlock() *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** fault_in_pages_writeable() /* succeed, retry */ fpregs_lock() __fpregs_load_activate() fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ copy_fpregs_to_sigframe() /* succeeds, random FPU content */ This is a comparison of the assembly produced by gcc 9, without vs with this patch: | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size)) | cmpq %rdx, %rax # tmp183, _4 | jb .L190 #, |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |-#APP |-# 512 "arch/x86/include/asm/fpu/internal.h" 1 |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__ |-# 0 "" 2 |-#NO_APP |- movq %rax, -88(%rbp) # pfo_ret__, %sfp … |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |- movq -88(%rbp), %rcx # %sfp, pfo_ret__ |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#APP |+# 512 "arch/x86/include/asm/fpu/internal.h" 1 |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__ |+# 0 "" 2 |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#NO_APP |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of fpu_fpregs_owner_ctx during preemption points. The Fixes: tag points to the commit where deferred FPU loading was added. Since this commit, the compiler is no longer allowed to move the load of fpu_fpregs_owner_ctx somewhere else / outside of the locked section. A task preemption will change its value and stale content will be observed. [ bp: Massage. ] Debugged-by: Austin Clements Debugged-by: David Chase Debugged-by: Ian Lance Taylor Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Borislav Petkov Reviewed-by: Rik van Riel Tested-by: Borislav Petkov Cc: Aubrey Li Cc: Austin Clements Cc: Barret Rhoden Cc: Dave Hansen Cc: David Chase Cc: "H. 
Peter Anvin" Cc: ian@airs.com Cc: Ingo Molnar Cc: Josh Bleecher Snyder Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663 --- arch/x86/include/asm/fpu/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 4c95c365058a..44c48e34d799 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu) static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu) { - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; } /* -- cgit From 9ef0e004181956e158fb7ceb9b43810a193f80cd Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 4 Dec 2019 16:53:00 -0800 Subject: arch: msgbuf.h: make uapi asm/msgbuf.h self-contained Userspace cannot compile due to some missing type definitions. For example, building it for x86 fails as follows: CC usr/include/asm/msgbuf.h.s In file included from usr/include/asm/msgbuf.h:6:0, from :32: usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type struct ipc64_perm msg_perm; ^~~~~~~~ usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t' __kernel_time_t msg_stime; /* last msgsnd time */ ^~~~~~~~~~~~~~~ usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t' __kernel_time_t msg_rtime; /* last msgrcv time */ ^~~~~~~~~~~~~~~ usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t' __kernel_time_t msg_ctime; /* last change time */ ^~~~~~~~~~~~~~~ usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t' __kernel_pid_t msg_lspid; /* pid of last msgsnd */ ^~~~~~~~~~~~~~ usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t' __kernel_pid_t msg_lrpid; /* last receive pid */ ^~~~~~~~~~~~~~ It is just a matter of missing include directive. Include to make it self-contained, and add it to the compile-test coverage. Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/include/uapi/asm/msgbuf.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/msgbuf.h b/arch/x86/include/uapi/asm/msgbuf.h index 7c5bb43ed8af..b3d0664fadc9 100644 --- a/arch/x86/include/uapi/asm/msgbuf.h +++ b/arch/x86/include/uapi/asm/msgbuf.h @@ -5,6 +5,9 @@ #if !defined(__x86_64__) || !defined(__ILP32__) #include #else + +#include + /* * The msqid64_ds structure for x86 architecture with x32 ABI. * -- cgit From 0fb9dc28679a627f84974165c8011e0630529ece Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 4 Dec 2019 16:53:03 -0800 Subject: arch: sembuf.h: make uapi asm/sembuf.h self-contained Userspace cannot compile due to some missing type definitions. For example, building it for x86 fails as follows: CC usr/include/asm/sembuf.h.s In file included from :32:0: usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type struct ipc64_perm sem_perm; /* permissions .. 
see ipc.h */ ^~~~~~~~ usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t' __kernel_time_t sem_otime; /* last semop time */ ^~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused1; ^~~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t' __kernel_time_t sem_ctime; /* last change time */ ^~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused2; ^~~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t sem_nsems; /* no. of semaphores in array */ ^~~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused3; ^~~~~~~~~~~~~~~~ usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused4; ^~~~~~~~~~~~~~~~ It is just a matter of missing include directive. Include to make it self-contained, and add it to the compile-test coverage. Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/include/uapi/asm/sembuf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/uapi/asm/sembuf.h b/arch/x86/include/uapi/asm/sembuf.h index 93030e97269a..71205b02a191 100644 --- a/arch/x86/include/uapi/asm/sembuf.h +++ b/arch/x86/include/uapi/asm/sembuf.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_SEMBUF_H #define _ASM_X86_SEMBUF_H +#include + /* * The semid64_ds structure for x86 architecture. * Note extra padding because this structure is passed back and forth -- cgit
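A quick self-containedness check, once the headers are exported with 'make headers_install', is a single translation unit like the sketch below; this is an illustrative test program, not part of the patch, and it assumes the installed headers are on the compiler's include path:

	/* Sketch: compiles only if <asm/sembuf.h> pulls in everything it needs. */
	#include <asm/sembuf.h>

	int main(void)
	{
		return sizeof(struct semid64_ds) > 0 ? 0 : 1;
	}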