Age | Commit message (Collapse) | Author |
|
CONFIG_MITIGATION_CALL_DEPTH_TRACKING
Step 3/10 of the namespace unification of CPU mitigations related Kconfig options.
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org
|
|
Step 2/10 of the namespace unification of CPU mitigations related Kconfig options.
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-3-leitao@debian.org
|
|
So the CPU mitigations Kconfig entries - there's 10 meanwhile - are named
in a historically idiosyncratic and hence rather inconsistent fashion
and have become hard to relate with each other over the years:
https://lore.kernel.org/lkml/20231011044252.42bplzjsam3qsasz@treble/
When they were introduced we never expected that we'd eventually have
about a dozen of them, and that more organization would be useful,
especially for Linux distributions that want to enable them in an
informed fashion, and want to make sure all mitigations are configured
as expected.
For example, the current CONFIG_SPECULATION_MITIGATIONS namespace is only
halfway populated, where some mitigations have entries in Kconfig, and
they could be modified, while others mitigations do not have Kconfig entries,
and can not be controlled at build time.
Fine-grained control over these Kconfig entries can help in a number of ways:
1) Users can choose and pick only mitigations that are important for
their workloads.
2) Users and developers can choose to disable mitigations that mangle
the assembly code generation, making it hard to read.
3) Separate Kconfigs for just source code readability,
so that we see *which* butt-ugly piece of crap code is for what
reason...
In most cases, if a mitigation is disabled at compilation time, it
can still be enabled at runtime using kernel command line arguments.
This is the first patch of an initial series that renames various
mitigation related Kconfig options, unifying them under a single
CONFIG_MITIGATION_* namespace:
CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE
CONFIG_CPU_IBPB_ENTRY => CONFIG_MITIGATION_IBPB_ENTRY
CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
CONFIG_SLS => CONFIG_MITIGATION_SLS
CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY
CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO
CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
Implement step 1/10 of the namespace unification of CPU mitigations related
Kconfig options and rename CONFIG_GDS_FORCE_MITIGATION to
CONFIG_MITIGATION_GDS_FORCE.
[ mingo: Rewrote changelog for clarity. ]
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-2-leitao@debian.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Quite a lot of kexec work this time around. Many singleton patches in
many places. The notable patch series are:
- nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
conversions for file paths'.
- Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
Folio conversions for directory paths'.
- IA64 remnant removal in Heiko Carstens's 'Remove unused code after
IA-64 removal'.
- Arnd Bergmann has enabled the -Wmissing-prototypes warning
everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
some followup fixes:
- Nathan Chancellor has cleaned up the hexagon build in the series
'hexagon: Fix up instances of -Wmissing-prototypes'.
- Nathan also addressed some s390 warnings in 's390: A couple of
fixes for -Wmissing-prototypes'.
- Arnd Bergmann addresses the same warnings for MIPS in his series
'mips: address -Wmissing-prototypes warnings'.
- Baoquan He has made kexec_file operate in a top-down-fitting manner
similar to kexec_load in the series 'kexec_file: Load kernel at top
of system RAM if required'
- Baoquan He has also added the self-explanatory 'kexec_file: print
out debugging message if required'.
- Some checkstack maintenance work from Tiezhu Yang in the series
'Modify some code about checkstack'.
- Douglas Anderson has disentangled the watchdog code's logging when
multiple reports are occurring simultaneously. The series is
'watchdog: Better handling of concurrent lockups'.
- Yuntao Wang has contributed some maintenance work on the crash code
in 'crash: Some cleanups and fixes'"
* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
crash_core: fix and simplify the logic of crash_exclude_mem_range()
x86/crash: use SZ_1M macro instead of hardcoded value
x86/crash: remove the unused image parameter from prepare_elf_headers()
kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
scripts/decode_stacktrace.sh: strip unexpected CR from lines
watchdog: if panicking and we dumped everything, don't re-enable dumping
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
kexec_core: fix the assignment to kimage->control_page
x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
lib/trace_readwrite.c:: replace asm-generic/io with linux/io
nilfs2: cpfile: fix some kernel-doc warnings
stacktrace: fix kernel-doc typo
scripts/checkstack.pl: fix no space expression between sp and offset
x86/kexec: fix incorrect argument passed to kexec_dprintk()
x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
nilfs2: add missing set_freezable() for freezable kthread
kernel: relay: remove relay_file_splice_read dead code, doesn't work
docs: submit-checklist: remove all of "make namespacecheck"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Add branch stack counters ABI extension to better capture the growing
amount of information the PMU exposes via branch stack sampling.
There's matching tooling support.
- Fix race when creating the nr_addr_filters sysfs file
- Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support
- Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
support
- Misc cleanups & fixes
* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Factor out topology_gidnid_map()
perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
perf/x86/amd: Reject branch stack for IBS events
perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
perf/x86/intel/uncore: Support IIO free-running counters on GNR
perf/x86/intel/uncore: Support Granite Rapids
perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
perf: Fix the nr_addr_filters fix
perf/x86/intel/cstate: Add Grand Ridge support
perf/x86/intel/cstate: Add Sierra Forest support
x86/smp: Export symbol cpu_clustergroup_mask()
perf/x86/intel/cstate: Cleanup duplicate attr_groups
perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
perf/x86/intel: Support branch counters logging
perf/x86/intel: Reorganize attrs and is_visible
perf: Add branch_sample_call_stack
perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
perf: Add branch stack counters
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
- Change global variables to local
- Add missing kernel-doc function parameter descriptions
- Remove unused parameter from a macro
- Remove obsolete Kconfig entry
- Fix comments
- Fix typos, mostly scripted, manually reviewed
and a micro-optimization got misplaced as a cleanup:
- Micro-optimize the asm code in secondary_startup_64_no_verify()
* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arch/x86: Fix typos
x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
x86/docs: Remove reference to syscall trampoline in PTI
x86/Kconfig: Remove obsolete config X86_32_SMP
x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
x86/mtrr: Document missing function parameters in kernel-doc
x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"Replace magic numbers in GDT descriptor definitions & handling:
- Introduce symbolic names via macros for descriptor
types/fields/flags, and then use these symbolic names.
- Clean up definitions a bit, such as GDT_ENTRY_INIT()
- Fix/clean up details that became visibly inconsistent after the
symbol-based code was introduced:
- Unify accessed flag handling
- Set the D/B size flag consistently & according to the HW
specification"
* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Add DB flag to 32-bit percpu GDT entry
x86/asm: Always set A (accessed) flag in GDT descriptors
x86/asm: Replace magic numbers in GDT descriptors, script-generated change
x86/asm: Replace magic numbers in GDT descriptors, preparations
x86/asm: Provide new infrastructure for GDT descriptors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
- Clean up 'struct apic':
- Drop ::delivery_mode
- Drop 'enum apic_delivery_modes'
- Drop 'struct local_apic'
- Fix comments
* tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Remove unfinished sentence from comment
x86/apic: Drop struct local_apic
x86/apic: Drop enum apic_delivery_modes
x86/apic: Drop apic::delivery_mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Convert the hw error storm handling into a finer-grained, per-bank
solution which allows for more timely detection and reporting of
errors
- Start a documentation section which will hold down relevant RAS
features description and how they should be used
- Add new AMD error bank types
- Slim down and remove error type descriptions from the kernel side of
error decoding to rasdaemon which can be used from now on to decode
hw errors on AMD
- Mark pages containing uncorrectable errors as poison so that kdump
can avoid them and thus not cause another panic
- The usual cleanups and fixlets
* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle Intel threshold interrupt storms
x86/mce: Add per-bank CMCI storm mitigation
x86/mce: Remove old CMCI storm mitigation code
Documentation: Begin a RAS section
x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
EDAC/mce_amd: Remove SMCA Extended Error code descriptions
x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
x86/mce/inject: Clear test status value
x86/mce: Remove redundant check from mce_device_create()
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature updates from Borislav Petkov:
- Add synthetic X86_FEATURE flags for the different AMD Zen generations
and use them everywhere instead of ad-hoc family/model checks. Drop
an ancient AMD errata checking facility as a result
- Fix a fragile initcall ordering in intel_epb
- Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
X2APIC MSRs on AMD as it is not needed there
* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Add X86_FEATURE_ZEN1
x86/CPU/AMD: Drop now unused CPU erratum checking function
x86/CPU/AMD: Get rid of amd_erratum_1485[]
x86/CPU/AMD: Get rid of amd_erratum_400[]
x86/CPU/AMD: Get rid of amd_erratum_383[]
x86/CPU/AMD: Get rid of amd_erratum_1054[]
x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
x86/CPU/AMD: Carve out the erratum 1386 fix
x86/CPU/AMD: Add ZenX generations flags
x86/cpu/intel_epb: Don't rely on link order
x86/barrier: Do not serialize MSR accesses on AMD
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Convert the sev-guest plaform ->remove callback to return void
- Move the SEV C-bit verification to the BSP as it needs to happen only
once and not on every AP
* tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt: sev-guest: Convert to platform remove callback returning void
x86/sev: Do the C-bit verification only on the BSP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- Replace the paravirt patching functionality using the alternatives
infrastructure and remove the former
- Misc other improvements
* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: Correct feature bit debug output
x86/paravirt: Remove no longer needed paravirt patching code
x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
x86/alternative: Add indirect call patching
x86/paravirt: Move some functions and defines to alternative.c
x86/paravirt: Introduce ALT_NOT_XEN
x86/paravirt: Make the struct paravirt_patch_site packed
x86/paravirt: Use relative reference for the original instruction offset
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Borislav Petkov:
- Correct minor issues after the microcode revision reporting
sanitization
* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Set new revision only after a successful update
x86/microcode/intel: Remove redundant microcode late updated message
|
|
Use SZ_1M macro instead of hardcoded 1<<20 to make code more readable.
Link: https://lkml.kernel.org/r/20240102144905.110047-3-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "crash: Some cleanups and fixes", v2.
This patchset includes two cleanups and one fix.
This patch (of 3):
The image parameter is no longer in use, remove it. Also, tidy up the
code formatting.
Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com
Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kprobe_emulate_call_indirect
kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land onto the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:
[ 7.308204][ C1] BUG: unable to handle page fault for address: 000000000002b4d8
[ 7.308883][ C1] #PF: supervisor read access in kernel mode
[ 7.309168][ C1] #PF: error_code(0x0000) - not-present page
[ 7.309461][ C1] PGD 0 P4D 0
[ 7.309652][ C1] Oops: 0000 [#1] SMP
[ 7.309929][ C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
[ 7.310397][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 7.311068][ C1] RIP: 0010:__common_interrupt+0x52/0xc0
[ 7.311349][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[ 7.312512][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[ 7.312899][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[ 7.313334][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[ 7.313702][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[ 7.314146][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[ 7.314509][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[ 7.314951][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 7.315396][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.315691][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[ 7.316153][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.316508][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.316948][ C1] Call Trace:
[ 7.317123][ C1] <IRQ>
[ 7.317279][ C1] ? __die_body+0x64/0xb0
[ 7.317482][ C1] ? page_fault_oops+0x248/0x370
[ 7.317712][ C1] ? __wake_up+0x96/0xb0
[ 7.317964][ C1] ? exc_page_fault+0x62/0x130
[ 7.318211][ C1] ? asm_exc_page_fault+0x22/0x30
[ 7.318444][ C1] ? __cfi_native_send_call_func_single_ipi+0x10/0x10
[ 7.318860][ C1] ? default_idle+0xb/0x10
[ 7.319063][ C1] ? __common_interrupt+0x52/0xc0
[ 7.319330][ C1] common_interrupt+0x78/0x90
[ 7.319546][ C1] </IRQ>
[ 7.319679][ C1] <TASK>
[ 7.319854][ C1] asm_common_interrupt+0x22/0x40
[ 7.320082][ C1] RIP: 0010:default_idle+0xb/0x10
[ 7.320309][ C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
[ 7.321449][ C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
[ 7.321808][ C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
[ 7.322227][ C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
[ 7.322656][ C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
[ 7.323083][ C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
[ 7.323530][ C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
[ 7.323948][ C1] ? __cfi_lapic_next_deadline+0x10/0x10
[ 7.324239][ C1] default_idle_call+0x31/0x50
[ 7.324464][ C1] do_idle+0xd3/0x240
[ 7.324690][ C1] cpu_startup_entry+0x25/0x30
[ 7.324983][ C1] start_secondary+0xb4/0xc0
[ 7.325217][ C1] secondary_startup_64_no_verify+0x179/0x17b
[ 7.325498][ C1] </TASK>
[ 7.325641][ C1] Modules linked in:
[ 7.325906][ C1] CR2: 000000000002b4d8
[ 7.326104][ C1] ---[ end trace 0000000000000000 ]---
[ 7.326354][ C1] RIP: 0010:__common_interrupt+0x52/0xc0
[ 7.326614][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[ 7.327570][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[ 7.327910][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[ 7.328273][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[ 7.328632][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[ 7.329223][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[ 7.329780][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[ 7.330193][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 7.330632][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.331050][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[ 7.331454][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.331854][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.332236][ C1] Kernel panic - not syncing: Fatal exception in interrupt
[ 7.332730][ C1] Kernel Offset: disabled
[ 7.333044][ C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
The relevant assembly code is (from objdump, faulting address
highlighted):
ffffffff8102ed9d: 41 ff d3 call *%r11
ffffffff8102eda0: 65 48 <8b> 05 30 c7 ff mov %gs:0x7effc730(%rip),%rax
The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.
Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.
Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/
Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
Cc: stable@vger.kernel.org
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Fix typos, most reported by "codespell arch/x86". Only touches comments,
no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
|
|
In
https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
I wanted to adjust the alternative patching debug output to the new
changes introduced by
da0fe6e68e10 ("x86/alternative: Add indirect call patching")
but removed the '*' which denotes the ->x86_capability word. The correct
output should be, for example:
[ 0.230071] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0
while the incorrect one says "... 1132+15" currently.
Add back the '*'.
Fixes: da0fe6e68e10 ("x86/alternative: Add indirect call patching")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
|
|
kernel_ident_mapping_init() takes an exclusive memory range [pstart, pend)
where pend is not included in the range, while res represents an inclusive
memory range [start, end] where end is considered part of the range.
Passing [start, end] rather than [start, end+1) to
kernel_ident_mapping_init() may result in the identity mapping for the
end address not being set up.
For example, when res->start is equal to res->end,
kernel_ident_mapping_init() will not establish any identity mapping.
Similarly, when the value of res->end is a multiple of 2M and the page
table maps 2M pages, kernel_ident_mapping_init() will also not set up
identity mapping for res->end.
Therefore, passing res->end directly to kernel_ident_mapping_init() is
incorrect, the correct end address should be `res->end + 1`.
Link: https://lkml.kernel.org/r/20231221101702.20956-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kexec_dprintk() expects the last argument to be kbuf.memsz, but the actual
argument being passed is kbuf.bufsz.
Although these two values are currently equal, it is better to pass the
correct one, in case these two values become different in the future.
Link: https://lkml.kernel.org/r/20231220154105.215610-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When detecting an error, the current code uses kexec_dprintk() to output
log message. This is not quite appropriate as kexec_dprintk() is mainly
used for outputting debugging messages, rather than error messages.
Replace kexec_dprintk() with pr_err(). This also makes the output method
for this error log align with the output method for other error logs in
this function.
Additionally, the last return statement in set_page_address() is
unnecessary, remove it.
Link: https://lkml.kernel.org/r/20231220030124.149160-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The expression `mstart + resource_size(res) - 1` is actually equivalent to
`res->end`, simplify the logic of this function to improve readability.
Link: https://lkml.kernel.org/r/20231212150506.31711-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Then when specifying '-d' for kexec_file_load interface, loaded locations
of kernel/initrd/cmdline etc can be printed out to help debug.
Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.
And also print out e820 memmap passed to 2nd kernel just as kexec_load
interface has been doing.
Link: https://lkml.kernel.org/r/20231213055747.61826-4-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The D/B size flag for the 32-bit percpu GDT entry was not set.
The Intel manual (vol 3, section 3.4.5) only specifies the meaning of
this flag for three cases:
1) code segments used for %cs -- doesn't apply here
2) stack segments used for %ss -- doesn't apply
3) expand-down data segments -- but we don't have the expand-down flag
set, so it also doesn't apply here
The flag likely doesn't do anything here, although the manual does also
say: "This flag should always be set to 1 for 32-bit code and data
segments [...]" so we should probably do it anyway.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-6-vegard.nossum@oracle.com
|
|
We have no known use for having the CPU track whether GDT descriptors
have been accessed or not.
Simplify the code by adding the flag to the common flags and removing
it everywhere else.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
|
|
Actually replace the numeric values by the new symbolic values.
I used this to find all the existing users of the GDT_ENTRY*() macros:
$ git grep -P 'GDT_ENTRY(_INIT)?\('
Some of the lines will exceed 80 characters, but some of them will be
shorter again in the next couple of patches.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
|
|
We'd like to replace all the magic numbers in various GDT descriptors
with new, semantically meaningful, symbolic values.
In order to be able to verify that the change doesn't cause any actual
changes to the compiled binary code, I've split the change into two
patches:
- Part 1 (this commit): everything _but_ actually replacing the numbers
- Part 2 (the following commit): _only_ replacing the numbers
The reason we need this split for verification is that including new
headers causes some spurious changes to the object files, mostly line
number changes in the debug info but occasionally other subtle codegen
changes.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
|
|
The recent fix to ignore invalid x2APIC entries inadvertently broke
systems with creative MADT APIC tables. The affected systems have APIC
MADT tables where all entries have invalid APIC IDs (0xFF), which means
they register exactly zero CPUs.
But the condition to ignore the entries of APIC IDs < 255 in the X2APIC
MADT table is solely based on the count of MADT APIC table entries.
As a consequence, the affected machines enumerate no secondary CPUs at
all because the APIC table has entries and therefore the X2APIC table
entries with APIC IDs < 255 are ignored.
Change the condition so that the APIC table preference for APIC IDs <
255 only becomes effective when the APIC table has valid APIC ID
entries.
IOW, an APIC table full of invalid APIC IDs is considered to be empty
which in consequence enables the X2APIC table entries with a APIC ID
< 255 and restores the expected behaviour.
Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries")
Reported-by: John Sperbeck <jsperbeck@google.com>
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/169953729188.3135.6804572126118798018.tip-bot2@tip-bot2
|
|
apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set
special as it optimizes the existing NOPs in place.
Unfortunately, this happens with interrupts enabled and does not provide any
form of core synchronization.
So an interrupt hitting in the middle of the update and using the affected code
path will observe a half updated NOP and crash and burn. The following
3 NOP sequence was observed to expose this crash halfway reliably under QEMU
32bit:
0x90 0x90 0x90
which is replaced by the optimized 3 byte NOP:
0x8d 0x76 0x00
So an interrupt can observe:
1) 0x90 0x90 0x90 nop nop nop
2) 0x8d 0x90 0x90 undefined
3) 0x8d 0x76 0x90 lea -0x70(%esi),%esi
4) 0x8d 0x76 0x00 lea 0x0(%esi),%esi
Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously.
Disable interrupts around this NOP optimization and invoke sync_core()
before re-enabling them.
Fixes: 270a69c4485d ("x86/alternative: Support relocations in alternatives")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
|
|
text_poke_early() does:
local_irq_save(flags);
memcpy(addr, opcode, len);
local_irq_restore(flags);
sync_core();
That's not really correct because the synchronization should happen before
interrupts are re-enabled to ensure that a pending interrupt observes the
complete update of the opcodes.
It's not entirely clear whether the interrupt entry provides enough
serialization already, but moving the sync_core() invocation into interrupt
disabled region does no harm and is obviously correct.
Fixes: 6fffacb30349 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
|
|
Chris reported that a Dell PowerEdge T340 system stopped to boot when upgrading
to a kernel which contains the parallel hotplug changes. Disabling parallel
hotplug on the kernel command line makes it boot again.
It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes up in
X2APIC mode, but the APs come up inconsistently in xAPIC mode.
Parallel hotplug requires that the upcoming CPU reads out its APIC ID from the
local APIC in order to map it to the Linux CPU number.
In this particular case the readout on the APs uses the MMIO mapped registers
because the BIOS failed to enable x2APIC mode. That readout results in a page
fault because the kernel does not have the APIC MMIO space mapped when X2APIC
mode was enabled by the BIOS on the boot CPU and the kernel switched to X2APIC
mode early. That page fault can't be handled on the upcoming CPU that early and
results in a silent boot failure.
If parallel hotplug is disabled the system boots because in that case the APIC
ID read is not required as the Linux CPU number is provided to the AP in the
smpboot control word. When the kernel uses x2APIC mode then the APs are
switched to x2APIC mode too slightly later in the bringup process, but there is
no reason to do it that late.
Cure the BIOS bogosity by checking in the parallel bootup path whether the
kernel uses x2APIC mode and if so switching over the APs to x2APIC mode before
the APIC ID readout.
Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
Reported-by: Chris Lindee <chris.lindee@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Chris Lindee <chris.lindee@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CA%2B2tU59853R49EaU_tyvOZuOTDdcU0RshGyydccp9R1NX9bEeQ@mail.gmail.com
|
|
Add an Intel specific hook into machine_check_poll() to keep track of
per-CPU, per-bank corrected error logs (with a stub for the
CONFIG_MCE_INTEL=n case).
When a storm is observed the rate of interrupts is reduced by setting
a large threshold value for this bank in IA32_MCi_CTL2. This bank is
added to the bitmap of banks for this CPU to poll. The polling rate is
increased to once per second.
When a storm ends reset the threshold in IA32_MCi_CTL2 back to 1, remove
the bank from the bitmap for polling, and change the polling rate back
to the default.
If a CPU with banks in storm mode is taken offline, the new CPU that
inherits ownership of those banks takes over management of storm(s) in
the inherited bank(s).
The cmci_discover() function was already very large. These changes
pushed it well over the top. Refactor with three helper functions to
bring it back under control.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-4-tony.luck@intel.com
|
|
This is the core functionality to track CMCI storms at the machine check
bank granularity. Subsequent patches will add the vendor specific hooks
to supply input to the storm detection and take actions on the start/end
of a storm.
machine_check_poll() is called both by the CMCI interrupt code, and for
periodic polls from a timer. Add a hook in this routine to maintain
a bitmap history for each bank showing whether the bank logged an
corrected error or not each time it is polled.
In normal operation the interval between polls of these banks determines
how far to shift the history. The 64 bit width corresponds to about one
second.
When a storm is observed a CPU vendor specific action is taken to reduce
or stop CMCI from the bank that is the source of the storm. The bank is
added to the bitmap of banks for this CPU to poll. The polling rate is
increased to once per second. During a storm each bit in the history
indicates the status of the bank each time it is polled. Thus the
history covers just over a minute.
Declare a storm for that bank if the number of corrected interrupts seen
in that history is above some threshold (defined as 5 in this series,
could be tuned later if there is data to suggest a better value).
A storm on a bank ends if enough consecutive polls of the bank show no
corrected errors (defined as 30, may also change). That calls the CPU
vendor specific function to revert to normal operational mode, and
changes the polling rate back to the default.
[ bp: Massage. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-3-tony.luck@intel.com
|
|
When a "storm" of corrected machine check interrupts (CMCI) is detected
this code mitigates by disabling CMCI interrupt signalling from all of
the banks owned by the CPU that saw the storm.
There are problems with this approach:
1) It is very coarse grained. In all likelihood only one of the banks
was generating the interrupts, but CMCI is disabled for all. This
means Linux may delay seeing and processing errors logged from other
banks.
2) Although CMCI stands for Corrected Machine Check Interrupt, it is
also used to signal when an uncorrected error is logged. This is
a problem because these errors should be handled in a timely manner.
Delete all this code in preparation for a finer grained solution.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20231115195450.12963-2-tony.luck@intel.com
|
|
There's no need to do it on every AP.
The C-bit value read on the BSP and also verified there, is used
everywhere from now on.
No functional changes - just a bit faster booting APs.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20231130132601.10317-1-bp@alien8.de
|
|
There is no need to use TESTL when checking the least-significant bit
with a TEST instruction. Use TESTB, which is three bytes shorter:
f6 05 00 00 00 00 01 testb $0x1,0x0(%rip)
vs:
f7 05 00 00 00 00 01 testl $0x1,0x0(%rip)
00 00 00
for the same effect.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231109201032.4439-1-ubizjak@gmail.com
|
|
Add a synthetic feature flag specifically for first generation Zen
machines. There's need to have a generic flag for all Zen generations so
make X86_FEATURE_ZEN be that flag.
Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags")
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com
|
|
Now that paravirt is using the alternatives patching infrastructure,
remove the paravirt patching code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-6-jgross@suse.com
|
|
Instead of stacking alternative and paravirt patching, use the new
ALT_FLAG_CALL flag to switch those mixed calls to pure alternative
handling.
Eliminate the need to be careful regarding the sequence of alternative
and paravirt patching.
[ bp: Touch up commit message. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-5-jgross@suse.com
|
|
In order to prepare replacing of paravirt patching with alternative
patching, add the capability to replace an indirect call with a direct
one.
This is done via a new flag ALT_FLAG_CALL as the target of the CALL
instruction needs to be evaluated using the value of the location
addressed by the indirect call.
For convenience, add a macro for a default CALL instruction. In case it
is being used without the new flag being set, it will result in a BUG()
when being executed. As in most cases, the feature used will be
X86_FEATURE_ALWAYS so add another macro ALT_CALL_ALWAYS usable for the
flags parameter of the ALTERNATIVE macros.
For a complete replacement, handle the special cases of calling a nop
function and an indirect call of NULL the same way as paravirt does.
[ bp: Massage commit message, fixup the debug output and clarify flow
more. ]
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231210062138.2417-4-jgross@suse.com
|
|
As a preparation for replacing paravirt patching completely by
alternative patching, move some backend functions and #defines to
the alternatives code and header.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231129133332.31043-3-jgross@suse.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a forgotten CPU vendor check in the AMD microcode post-loading
callback so that the callback runs only on AMD
- Make sure SEV-ES protocol negotiation happens only once and on the
BSP
* tag 'x86_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Check vendor in the AMD microcode callback
x86/sev: Fix kernel crash due to late update to read-only ghcb_version
|
|
There is no real reason to have a separate ASM entry point implementation
for the legacy INT 0x80 syscall emulation on 64-bit.
IDTENTRY provides all the functionality needed with the only difference
that it does not:
- save the syscall number (AX) into pt_regs::orig_ax
- set pt_regs::ax to -ENOSYS
Both can be done safely in the C code of an IDTENTRY before invoking any of
the syscall related functions which depend on this convention.
Aside of ASM code reduction this prepares for detecting and handling a
local APIC injected vector 0x80.
[ kirill.shutemov: More verbose comments ]
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
|
|
This was meant to be done only when early microcode got updated
successfully. Move it into the if-branch.
Also, make sure the current revision is read unconditionally and only
once.
Fixes: 080990aa3344 ("x86/microcode: Rework early revisions reporting")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/ZWjVt5dNRjbcvlzR@a4bf019067fa.jf.intel.com
|
|
Commit in Fixes added an AMD-specific microcode callback. However, it
didn't check the CPU vendor the kernel runs on explicitly.
The only reason the Zenbleed check in it didn't run on other x86 vendors
hardware was pure coincidental luck:
if (!cpu_has_amd_erratum(c, amd_zenbleed))
return;
gives true on other vendors because they don't have those families and
models.
However, with the removal of the cpu_has_amd_erratum() in
05f5f73936fa ("x86/CPU/AMD: Drop now unused CPU erratum checking function")
that coincidental condition is gone, leading to the zenbleed check
getting executed on other vendors too.
Add the explicit vendor check for the whole callback as it should've
been done in the first place.
Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231201184226.16749-1-bp@alien8.de
|
|
After successful update, the late loading routine prints an update
summary similar to:
microcode: load: updated on 128 primary CPUs with 128 siblings
microcode: revision: 0x21000170 -> 0x21000190
Remove the redundant message in the Intel side of the driver.
[ bp: Massage commit message. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ZWjYhedNfhAUmt0k@a4bf019067fa.jf.intel.com
|
|
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to different CPU. setup_ghcb() is called again which
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read_only variable which is initialised during booting.
Trying to write it results in a pagefault:
BUG: unable to handle page fault for address: ffffffffba556e70
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
[ ...]
Call Trace:
<TASK>
? __die_body.cold+0x1a/0x1f
? __die+0x2a/0x35
? page_fault_oops+0x10c/0x270
? setup_ghcb+0x71/0x100
? __x86_return_thunk+0x5/0x6
? search_exception_tables+0x60/0x70
? __x86_return_thunk+0x5/0x6
? fixup_exception+0x27/0x320
? kernelmode_fixup_or_oops+0xa2/0x120
? __bad_area_nosemaphore+0x16a/0x1b0
? kernel_exc_vmm_communication+0x60/0xb0
? bad_area_nosemaphore+0x16/0x20
? do_kern_addr_fault+0x7a/0x90
? exc_page_fault+0xbd/0x160
? asm_exc_page_fault+0x27/0x30
? setup_ghcb+0x71/0x100
? setup_ghcb+0xe/0x100
cpu_init_exception_handling+0x1b9/0x1f0
The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.
[ mingo: Refined the changelog. ]
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/1701254429-18250-1-git-send-email-kashwindayan@vmware.com
|
|
Bye bye.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-14-bp@alien8.de
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-13-bp@alien8.de
|
|
Setting X86_BUG_AMD_E400 in init_amd() is early enough.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-12-bp@alien8.de
|