summaryrefslogtreecommitdiff
path: root/drivers/base/cpu.c
AgeCommit message (Collapse)Author
2024-03-21Merge tag 'driver-core-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core and kernfs changes for 6.9-rc1. Nothing all that crazy here, just some good updates that include: - automatic attribute group hiding from Dan Williams (he fixed up my horrible attempt at doing this.) - kobject lock contention fixes from Eric Dumazet - driver core cleanups from Andy - kernfs rcu work from Tejun - fw_devlink changes to resolve some reported issues - other minor changes, all details in the shortlog All of these have been in linux-next for a long time with no reported issues" * tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits) device: core: Log warning for devices pending deferred probe on timeout driver: core: Use dev_* instead of pr_* so device metadata is added driver: core: Log probe failure as error and with device metadata of: property: fw_devlink: Add support for "post-init-providers" property driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link driver core: Adds flags param to fwnode_link_add() debugfs: fix wait/cancellation handling during remove device property: Don't use "proxy" headers device property: Move enum dev_dma_attr to fwnode.h driver core: Move fw_devlink stuff to where it belongs driver core: Drop unneeded 'extern' keyword in fwnode.h firmware_loader: Suppress warning on FW_OPT_NO_WARN flag sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group. firmware_loader: introduce __free() cleanup hanler platform-msi: Remove usage of the deprecated ida_simple_xx() API sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE() sysfs: Document new "group visible" helpers sysfs: Fix crash on empty group attributes array sysfs: Introduce a mechanism to hide static attribute_groups sysfs: Introduce a mechanism to hide static attribute_groups ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-11x86/rfds: Mitigate Register File Data Sampling (RFDS)Pawan Gupta
RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors. Intel released a microcode update that uses VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT. Add RFDS bug infrastructure and enable the VERW based mitigation by default, that clears the affected buffers just before exiting to userspace. Also add sysfs reporting and cmdline parameter "reg_file_data_sampling" to control the mitigation. For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-02-23crash: split crash dumping code out from kexec_core.cBaoquan He
Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes need be built in to avoid compiling error when building kexec code even though the crash dumping functionality is not enabled. E.g -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_KEXEC=y CONFIG_KEXEC_FILE=y --------------------- After splitting out crashkernel reservation code and vmcoreinfo exporting code, there's only crash related code left in kernel/crash_core.c. Now move crash related codes from kexec_core.c to crash_core.c and only build it in when CONFIG_CRASH_DUMP=y. And also wrap up crash codes inside CONFIG_CRASH_DUMP ifdeffery scope, or replace inappropriate CONFIG_KEXEC_CORE ifdef with CONFIG_CRASH_DUMP ifdef in generic kernel files. With these changes, crash_core codes are abstracted from kexec codes and can be disabled at all if only kexec reboot feature is wanted. Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Michael Kelley <mhklinux@outlook.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-30driver core: cpu: make cpu_subsys constGreg Kroah-Hartman
Now that the driver core can properly handle constant struct bus_type, move the cpu_subsys variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/2024010548-crane-snooze-a871@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-18Merge tag 'driver-core-6.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are the set of driver core and kernfs changes for 6.8-rc1. Nothing major in here this release cycle, just lots of small cleanups and some tweaks on kernfs that in the very end, got reverted and will come back in a safer way next release cycle. Included in here are: - more driver core 'const' cleanups and fixes - fw_devlink=rpm is now the default behavior - kernfs tiny changes to remove some string functions - cpu handling in the driver core is updated to work better on many systems that add topologies and cpus after booting - other minor changes and cleanups All of the cpu handling patches have been acked by the respective maintainers and are coming in here in one series. Everything has been in linux-next for a while with no reported issues" * tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits) Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" kernfs: convert kernfs_idr_lock to an irq safe raw spinlock class: fix use-after-free in class_register() PM: clk: make pm_clk_add_notifier() take a const pointer EDAC: constantify the struct bus_type usage kernfs: fix reference to renamed function driver core: device.h: fix Excess kernel-doc description warning driver core: class: fix Excess kernel-doc description warning driver core: mark remaining local bus_type variables as const driver core: container: make container_subsys const driver core: bus: constantify subsys_register() calls driver core: bus: make bus_sort_breadthfirst() take a const pointer kernfs: d_obtain_alias(NULL) will do the right thing... driver core: Better advertise dev_err_probe() kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy() initramfs: Expose retained initrd as sysfs file fs/kernfs/dir: obey S_ISGID kernel/cgroup: use kernfs_create_dir_ns() ...
2023-12-06drivers/base/cpu: crash data showing should depends on KEXEC_COREBaoquan He
After commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes"), on x86_64, if only below kernel configs related to kdump are set, compiling error are triggered. ---- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y CONFIG_CRASH_HOTPLUG=y ------ ------------------------------------------------------ drivers/base/cpu.c: In function `crash_hotplug_show': drivers/base/cpu.c:309:40: error: implicit declaration of function `crash_hotplug_cpu_support'; did you mean `crash_hotplug_show'? [-Werror=implicit-function-declaration] 309 | return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support()); | ^~~~~~~~~~~~~~~~~~~~~~~~~ | crash_hotplug_show cc1: some warnings being treated as errors ------------------------------------------------------ CONFIG_KEXEC is used to enable kexec_load interface, the crash_notes/crash_notes_size/crash_hotplug showing depends on CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead. Fix it now. Link: https://lkml.kernel.org/r/20231128055248.659808-1-bhe@redhat.com Fixes: 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes") Signed-off-by: Baoquan He <bhe@redhat.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> [compile-time only] Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06drivers: base: Print a warning instead of panic() when register_cpu() failsJames Morse
loongarch, mips, parisc, riscv and sh all print a warning if register_cpu() returns an error. Architectures that use GENERIC_CPU_DEVICES call panic() instead. Errors in this path indicate something is wrong with the firmware description of the platform, but the kernel is able to keep running. Downgrade this to a warning to make it easier to debug this issue. This will allow architectures that switching over to GENERIC_CPU_DEVICES to drop their warning, but keep the existing behaviour. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3W-00CszU-GM@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06drivers: base: add arch_cpu_is_hotpluggable()Russell King (Oracle)
The differences between architecture specific implementations of arch_register_cpu() are down to whether the CPU is hotpluggable or not. Rather than overriding the weak version of arch_register_cpu(), provide a function that can be used to provide this detail instead. Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3M-00CszH-6r@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06drivers: base: Implement weak arch_unregister_cpu()James Morse
Add arch_unregister_cpu() to allow the ACPI machinery to call unregister_cpu(). This is enough for arm64, riscv and loongarch, but needs to be overridden by x86 and ia64 who need to do more work. CC: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3H-00CszC-2n@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overriddenJames Morse
Architectures often have extra per-cpu work that needs doing before a CPU is registered, often to determine if a CPU is hotpluggable. To allow the ACPI architectures to use GENERIC_CPU_DEVICES, move the cpu_register() call into arch_register_cpu(), which is made __weak so architectures with extra work can override it. This aligns with the way x86, ia64 and loongarch register hotplug CPUs when they become present. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3B-00Csz6-Uh@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06drivers: base: Use present CPUs in GENERIC_CPU_DEVICESJames Morse
Three of the five ACPI architectures create sysfs entries using register_cpu() for present CPUs, whereas arm64, riscv and all GENERIC_CPU_DEVICES do this for possible CPUs. Registering a CPU is what causes them to show up in sysfs. It makes very little sense to register all possible CPUs. Registering a CPU is what triggers the udev notifications allowing user-space to react to newly added CPUs. To allow all five ACPI architectures to use GENERIC_CPU_DEVICES, change it to use for_each_present_cpu(). Making the ACPI architectures use GENERIC_CPU_DEVICES is a pre-requisite step to centralise their register_cpu() logic, before moving it into the ACPI processor driver. When we add support for register CPUs from ACPI in a later patch, we will avoid registering CPUs in this path. Of the ACPI architectures that register possible CPUs, arm64 and riscv do not support making possible CPUs present as they use the weak 'always fails' version of arch_register_cpu(). Only two of the eight architectures that use GENERIC_CPU_DEVICES have a distinction between present and possible CPUs. The following architectures use GENERIC_CPU_DEVICES but are not SMP, so possible == present: * m68k * microblaze * nios2 The following architectures use GENERIC_CPU_DEVICES and consider possible == present: * csky: setup_smp() * processor_probe() sets possible for all CPUs and present for all CPUs except the boot cpu, which will have been done by init/main.c::start_kernel(). um appears to be a subarchitecture of x86. The remaining architecture using GENERIC_CPU_DEVICES are: * openrisc and hexagon: where smp_init_cpus() makes all CPUs < NR_CPUS possible, whereas smp_prepare_cpus() only makes CPUs < setup_max_cpus present. After this change, openrisc and hexagon systems that use the max_cpus command line argument would not see the other CPUs present in sysfs. This should not be a problem as these CPUs can't be brought online as _cpu_up() checks cpu_present(). After this change, only CPUs which are present appear in sysfs. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R36-00Csz0-Px@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-01Merge tag 'driver-core-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core updates and additions for 6.6-rc1. Included in here are: - stable kernel documentation updates - class structure const work from Ivan on various subsystems - kernfs tweaks - driver core tests! - kobject sanity cleanups - kobject structure reordering to save space - driver core error code handling fixups - other minor driver core cleanups All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) driver core: Call in reversed order in device_platform_notify_remove() driver core: Return proper error code when dev_set_name() fails kobject: Remove redundant checks for whether ktype is NULL kobject: Add sanity check for kset->kobj.ktype in kset_register() drivers: base: test: Add missing MODULE_* macros to root device tests drivers: base: test: Add missing MODULE_* macros for platform devices tests drivers: base: Free devm resources when unregistering a device drivers: base: Add basic devm tests for platform devices drivers: base: Add basic devm tests for root devices kernfs: fix missing kernfs_iattr_rwsem locking docs: stable-kernel-rules: mention that regressions must be prevented docs: stable-kernel-rules: fine-tune various details docs: stable-kernel-rules: make the examples for option 1 a proper list docs: stable-kernel-rules: move text around to improve flow docs: stable-kernel-rules: improve structure by changing headlines base/node: Remove duplicated include kernfs: attach uuid for every kernfs and report it in fsid kernfs: add stub helper for kernfs_generic_poll() x86/resctrl: make pseudo_lock_class a static const structure x86/MSR: make msr_class a static const structure ...
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-24crash: memory and CPU hotplug sysfs attributesEric DeVolder
Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-11driver core: cpu: Fix the fallback cpu_show_gds() nameBorislav Petkov (AMD)
In 6524c798b727 ("driver core: cpu: Make cpu_show_not_affected() static") I fat-fingered the name of cpu_show_gds(). Usually, I'd rebase but since those are extraordinary embargoed times, the commit above was already pulled into another tree so no no. Therefore, fix it ontop. Fixes: 6524c798b727 ("driver core: cpu: Make cpu_show_not_affected() static") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230811095831.27513-1-bp@alien8.de
2023-08-10driver core: cpu: Make cpu_show_not_affected() staticBorislav Petkov (AMD)
Fix a -Wmissing-prototypes warning and add the gather_data_sampling() stub macro call for real. Fixes: 0fddfe338210 ("driver core: cpu: Unify redundant silly stubs") Closes: https://lore.kernel.org/oe-kbuild-all/202308101956.oRj1ls7s-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/202308101956.oRj1ls7s-lkp@intel.com
2023-08-10driver core: cpu: Unify redundant silly stubsBorislav Petkov (AMD)
Make them all a weak function, aliasing to a single function which issues the "Not affected" string. No functional changes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20230809102700.29449-3-bp@alien8.de
2023-08-07Merge tag 'gds-for-linus-2023-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/gds fixes from Dave Hansen: "Mitigate Gather Data Sampling issue: - Add Base GDS mitigation - Support GDS_NO under KVM - Fix a documentation typo" * tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix backwards on/off logic about YMM support KVM: Add GDS_NO support to KVM x86/speculation: Add Kconfig option for GDS x86/speculation: Add force option to GDS mitigation x86/speculation: Add Gather Data Sampling mitigation
2023-08-05driver/base/cpu: Retry online operation if -EBUSYWaiman Long
Booting the kernel with "maxcpus=1" is a common technique for CPU partitioning and isolation. It delays the CPU bringup process until when the bootup scripts are ready to bring CPUs online by writing 1 to /sys/device/system/cpu/cpu<X>/online. However, it was found that not all the CPUs were online after bootup. The collection of offline CPUs are different after every reboot. Further investigation reveals that some "online" write operations fail with an -EBUSY error. This error is returned when CPU hotplug is temporiarly disabled when cpu_hotplug_disable() is called. During bootup, the main caller of cpu_hotplug_disable() is pci_call_probe() for PCI device initialization. By measuring the times spent with cpu_hotplug_disabled set in a typical 2-socket server, most of them last less than 10ms. However, there are a few that can last hundreds of ms. Note that the cpu_hotplug_disabled period of different devices can overlap leading to longer cpu_hotplug_disabled hold time. Since the CPU hotplug disable condition is transient and it is not that easy to modify all the existing bootup scripts to handle this condition, the kernel can help by retrying the online operation when an -EBUSY error is returned. This patch retries the online operation in cpu_subsys_online() when an -EBUSY error is returned for up to 5 times after an exponentially increasing delay that can last a total of at least 620ms of waiting time by calling msleep(). With this patch in place, booting up the patched kernel with "maxcpus=1" does not leave any CPU in an offline state in 10 reboot attempts. Reported-by: Vishal Agrawal <vagrawal@redhat.com> Signed-off-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20230724143826.3996163-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27x86/srso: Add a Speculative RAS Overflow mitigationBorislav Petkov (AMD)
Add a mitigation for the speculative return address stack overflow vulnerability found on AMD processors. The mitigation works by ensuring all RET instructions speculate to a controlled location, similar to how speculation is controlled in the retpoline sequence. To accomplish this, the __x86_return_thunk forces the CPU to mispredict every function return using a 'safe return' sequence. To ensure the safety of this mitigation, the kernel must ensure that the safe return sequence is itself free from attacker interference. In Zen3 and Zen4, this is accomplished by creating a BTB alias between the untraining function srso_untrain_ret_alias() and the safe return function srso_safe_ret_alias() which results in evicting a potentially poisoned BTB entry and using that safe one for all function returns. In older Zen1 and Zen2, this is accomplished using a reinterpretation technique similar to Retbleed one: srso_untrain_ret() and srso_safe_ret(). Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-19x86/speculation: Add Gather Data Sampling mitigationDaniel Sneddon
Gather Data Sampling (GDS) is a hardware vulnerability which allows unprivileged speculative access to data which was previously stored in vector registers. Intel processors that support AVX2 and AVX512 have gather instructions that fetch non-contiguous data elements from memory. On vulnerable hardware, when a gather instruction is transiently executed and encounters a fault, stale data from architectural or internal vector registers may get transiently stored to the destination vector register allowing an attacker to infer the stale data using typical side channel techniques like cache timing attacks. This mitigation is different from many earlier ones for two reasons. First, it is enabled by default and a bit must be set to *DISABLE* it. This is the opposite of normal mitigation polarity. This means GDS can be mitigated simply by updating microcode and leaving the new control bit alone. Second, GDS has a "lock" bit. This lock bit is there because the mitigation affects the hardware security features KeyLocker and SGX. It needs to be enabled and *STAY* enabled for these features to be mitigated against GDS. The mitigation is enabled in the microcode by default. Disable it by setting gather_data_sampling=off or by disabling all mitigations with mitigations=off. The mitigation status can be checked by reading: /sys/devices/system/cpu/vulnerabilities/gather_data_sampling Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-04-05tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystemJoel Fernandes (Google)
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined. However, cpu_is_hotpluggable() still returns true for those CPUs. This causes torture tests that do offlining to end up trying to offline this CPU causing test failures. Such failure happens on all architectures. Fix the repeated error messages thrown by this (even if the hotplug errors are harmless) by asking the opinion of the nohz subsystem on whether the CPU can be hotplugged. [ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ] For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: rcu <rcu@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-02-11driver core: cpu: don't hand-override the uevent bus_type callback.Greg Kroah-Hartman
Instead of having to change the uevent bus_type callback by hand at runtime, set it at build time based on the build configuration options, making this much simpler to maintain and understand (and allow to make the structure constant.) Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230210102408.1083177-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10driver core: bus: add bus_get_dev_root() functionGreg Kroah-Hartman
Instead of poking around in the struct bus_type directly for the dev_root pointer, provide a function to return it properly reference counted, if it is present in the bus. This will be needed to move the pointer out of struct bus_type in the future. Use the function in the driver core code at the same time it is introduced to verify that it works properly. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230209093556.19132-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-27driver core: make struct bus_type.uevent() take a const *Greg Kroah-Hartman
The uevent() callback in struct bus_type should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27x86/bugs: Report AMD retbleed vulnerabilityAlexandre Chartre
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale DataPawan Gupta
Add the sysfs reporting file for Processor MMIO Stale Data vulnerability. It exposes the vulnerability and mitigation state similar to the existing files for the other hardware vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-02-16sched/isolation: Use single feature type while referring to housekeeping cpumaskFrederic Weisbecker
Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
2021-07-21driver: base: Prefer unsigned int to bare use of unsignedJinchao Wang
Fix checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Signed-off-by: Jinchao Wang <wjc@cdjrlc.com> Link: https://lore.kernel.org/r/20210628171907.63646-2-wjc@cdjrlc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-04drivers/base: Constify static attribute_group structsRikard Falkeborn
These are only used by putting their address in an array of pointers to const struct attribute_group (either directly or via the __ATTRIBUTE_GROUP macro). Make them const to allow the compiler to place them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Link: https://lore.kernel.org/r/20210528213408.20067-1-rikard.falkeborn@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23drivers/base/cpu: remove redundant assignment of variable retvalColin Ian King
The variable retval is being initialized with a value that is never read and it is being updated later with a new value. Clean this up by initializing retval to -ENOMEM and remove the assignment to retval on the !dev failure path. Kudos to Rafael for the improved fix suggestion. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210218202837.516231-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02drivers core: Miscellaneous changes for sysfs_emitJoe Perches
Change additional instances that could use sysfs_emit and sysfs_emit_at that the coccinelle script could not convert. o macros creating show functions with ## concatenation o unbound sprintf uses with buf+len for start of output to sysfs_emit_at o returns with ?: tests and sprintf to sysfs_emit o sysfs output with struct class * not struct device * arguments Miscellanea: o remove unnecessary initializations around these changes o consistently use int len for return length of show functions o use octal permissions and not S_<FOO> o rename a few show function names so DEVICE_ATTR_<FOO> can be used o use DEVICE_ATTR_ADMIN_RO where appropriate o consistently use const char *output for strings o checkpatch/style neatening Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/8bc24444fe2049a9b2de6127389b57edfdfe324d.1600285923.git.joe@perches.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functionsJoe Perches
Convert the various sprintf fmaily calls in sysfs device show functions to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety. Done with: $ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 . And cocci script: $ cat sysfs_emit_dev.cocci @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-20x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigationMark Gross
SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse. While it is present on all affected CPU models, the microcode mitigation is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the cases where TSX is not supported or has been disabled with TSX_CTRL. The mitigation is activated by default on affected processors and it increases latency for RDRAND and RDSEED instructions. Among other effects this will reduce throughput from /dev/urandom. * Enable administrator to configure the mitigation off when desired using either mitigations=off or srbds=off. * Export vulnerability status via sysfs * Rename file-scoped macros to apply for non-whitelist table initializations. [ bp: Massage, - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g, - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in, - flip check in cpu_set_bug_bits() to save an indentation level, - reflow comments. jpoimboe: s/Mitigated/Mitigation/ in user-visible strings tglx: Dropped the fused off magic for now ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-03-30Merge tag 'smp-core-2020-03-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Thomas Gleixner: "CPU (hotplug) updates: - Support for locked CSD objects in smp_call_function_single_async() which allows to simplify callsites in the scheduler core and MIPS - Treewide consolidation of CPU hotplug functions which ensures the consistency between the sysfs interface and kernel state. The low level functions cpu_up/down() are now confined to the core code and not longer accessible from random code" * tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus() cpu/hotplug: Hide cpu_up/down() cpu/hotplug: Move bringup of secondary CPUs out of smp_init() torture: Replace cpu_up/down() with add/remove_cpu() firmware: psci: Replace cpu_up/down() with add/remove_cpu() xen/cpuhotplug: Replace cpu_up/down() with device_online/offline() parisc: Replace cpu_up/down() with add/remove_cpu() sparc: Replace cpu_up/down() with add/remove_cpu() powerpc: Replace cpu_up/down() with add/remove_cpu() x86/smp: Replace cpu_up/down() with add/remove_cpu() arm64: hibernate: Use bringup_hibernate_cpu() cpu/hotplug: Provide bringup_hibernate_cpu() arm64: Use reboot_cpu instead of hardconding it to 0 arm64: Don't use disable_nonboot_cpus() ARM: Use reboot_cpu instead of hardcoding it to 0 ARM: Don't use disable_nonboot_cpus() ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus() cpu/hotplug: Create a new function to shutdown nonboot cpus cpu/hotplug: Add new {add,remove}_cpu() functions sched/core: Remove rq.hrtick_csd_pending ...
2020-03-25cpu/hotplug: Hide cpu_up/down()Qais Yousef
Use separate functions for the device core to bring a CPU up and down. Users outside the device core must use add/remove_cpu() which will take care of extra housekeeping work like keeping sysfs in sync. Make cpu_up/down() static and replace the extra layer of indirection. [ tglx: Removed the extra wrapper functions and adjusted function names ] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
2020-03-11drivers/base/cpu: Simplify s*nprintf() usagesTakashi Iwai
Use the simpler sprintf() instead of snprintf() or scnprintf() in a single-shot sysfs output callbacks where you are very sure that it won't go over PAGE_SIZE buffer limit. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20200311080207.12046-3-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11drivers/base/cpu: Use scnprintf() for avoiding potential buffer overflowTakashi Iwai
Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20200311080207.12046-2-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-04x86/bugs: Add ITLB_MULTIHIT bug infrastructureVineela Tummalapalli
Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-10-28x86/speculation/taa: Add sysfs reporting for TSX Async AbortPawan Gupta
Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-05-14Merge branch 'x86-mds-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MDS mitigations from Thomas Gleixner: "Microarchitectural Data Sampling (MDS) is a hardware vulnerability which allows unprivileged speculative access to data which is available in various CPU internal buffers. This new set of misfeatures has the following CVEs assigned: CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory MDS attacks target microarchitectural buffers which speculatively forward data under certain conditions. Disclosure gadgets can expose this data via cache side channels. Contrary to other speculation based vulnerabilities the MDS vulnerability does not allow the attacker to control the memory target address. As a consequence the attacks are purely sampling based, but as demonstrated with the TLBleed attack samples can be postprocessed successfully. The mitigation is to flush the microarchitectural buffers on return to user space and before entering a VM. It's bolted on the VERW instruction and requires a microcode update. As some of the attacks exploit data structures shared between hyperthreads, full protection requires to disable hyperthreading. The kernel does not do that by default to avoid breaking unattended updates. The mitigation set comes with documentation for administrators and a deeper technical view" * 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/speculation/mds: Fix documentation typo Documentation: Correct the possible MDS sysfs values x86/mds: Add MDSUM variant to the MDS documentation x86/speculation/mds: Add 'mitigations=' support for MDS x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off x86/speculation/mds: Fix comment x86/speculation/mds: Add SMT warning message x86/speculation: Move arch_smt_update() call to after mitigation decisions x86/speculation/mds: Add mds=full,nosmt cmdline option Documentation: Add MDS vulnerability documentation Documentation: Move L1TF to separate directory x86/speculation/mds: Add mitigation mode VMWERV x86/speculation/mds: Add sysfs reporting for MDS x86/speculation/mds: Add mitigation control for MDS x86/speculation/mds: Conditionally clear CPU buffers on idle entry x86/kvm/vmx: Add MDS protection when L1D Flush is not active x86/speculation/mds: Clear CPU buffers on exit to user x86/speculation/mds: Add mds_clear_cpu_buffers() x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests x86/speculation/mds: Add BUG_MSBDS_ONLY ...
2019-03-06Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-03-06x86/speculation/mds: Add sysfs reporting for MDSThomas Gleixner
Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
2019-02-19PM / core: Add support to skip power management in device/driver modelSudeep Holla
All device objects in the driver model contain fields that control the handling of various power management activities. However, it's not always useful. There are few instances where pseudo devices are added to the model just to take advantage of many other features like kobjects, udev events, and so on. One such example is cpu devices and their caches. The sysfs for the cpu caches are managed by adding devices with cpu as the parent in cpu_device_create() when secondary cpu is brought online. Generally when the secondary CPUs are hotplugged back in as part of resume from suspend-to-ram, we call cpu_device_create() from the cpu hotplug state machine while the cpu device associated with that CPU is not yet ready to be resumed as the device_resume() call happens bit later. It's not really needed to set the flag is_prepared for cpu devices as they are mostly pseudo device and hotplug framework deals with state machine and not managed through the cpu device. This often results in annoying warning when resuming: Enabling non-boot CPUs ... CPU1: Booted secondary processor cache: parent cpu1 should not be sleeping CPU1 is up CPU2: Booted secondary processor cache: parent cpu2 should not be sleeping CPU2 is up .... and so on. So in order to fix these kind of errors, we could just completely avoid doing any power management related initialisations and operations if they are not used by these devices. Add no_pm flags to indicate that the device doesn't require any sort of PM activities and all of them can be completely skipped. We can use the same flag to also avoid adding not used *power* sysfs entries for these devices. For now, lets use this for cpu cache devices. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-31drivers: base: Use __printf markup to silence compilerMathieu Malaterre
Silence warnings (triggered at W=1) by adding relevant __printf attributes. drivers/base/cpu.c:432:2: warning: function '__cpu_device_create' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-20x86/speculation/l1tf: Add sysfs reporting for l1tfAndi Kleen
L1TF core kernel workarounds are cheap and normally always enabled, However they still should be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits. - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable. Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-05-03x86/bugs: Expose /sys/../spec_store_bypassKonrad Rzeszutek Wilk
Add the sysfs file for the new vulerability. It does not do much except show the words 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-03-15driver core: cpu: use put_device() if device_register failArvind Yadav
if device_register() returned an error! Always use put_device() to give up the reference initialized. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-01Merge tag 'driver-core-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of "big" driver core patches for 4.16-rc1. The majority of the work here is in the firmware subsystem, with reworks to try to attempt to make the code easier to handle in the long run, but no functional change. There's also some tree-wide sysfs attribute fixups with lots of acks from the various subsystem maintainers, as well as a handful of other normal fixes and changes. And finally, some license cleanups for the driver core and sysfs code. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits) device property: Define type of PROPERTY_ENRTY_*() macros device property: Reuse property_entry_free_data() device property: Move property_entry_free_data() upper firmware: Fix up docs referring to FIRMWARE_IN_KERNEL firmware: Drop FIRMWARE_IN_KERNEL Kconfig option USB: serial: keyspan: Drop firmware Kconfig options sysfs: remove DEBUG defines sysfs: use SPDX identifiers drivers: base: add coredump driver ops sysfs: add attribute specification for /sysfs/devices/.../coredump test_firmware: fix missing unlock on error in config_num_requests_store() test_firmware: make local symbol test_fw_config static sysfs: turn WARN() into pr_warn() firmware: Fix a typo in fallback-mechanisms.rst treewide: Use DEVICE_ATTR_WO treewide: Use DEVICE_ATTR_RO treewide: Use DEVICE_ATTR_RW sysfs.h: Use octal permissions component: add debugfs support bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate ...