summaryrefslogtreecommitdiff
path: root/drivers/base
AgeCommit message (Collapse)Author
2022-04-28drivers/base/node.c: fix compaction sysfs file leakMiaohe Lin
Compaction sysfs file is created via compaction_register_node in register_node. But we forgot to remove it in unregister_node. Thus compaction sysfs file is leaked. Using compaction_unregister_node to fix this issue. Link: https://lkml.kernel.org/r/20220401070905.43679-1-linmiaohe@huawei.com Fixes: ed4a6d7f0676 ("mm: compaction: add /sys trigger for per-node memory compaction") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28device-core: Kill the lockdep_mutexDan Williams
Per Peter [1], the lockdep API has native support for all the use cases lockdep_mutex was attempting to enable. Now that all lockdep_mutex users have been converted to those APIs, drop this lock. Link: https://lore.kernel.org/r/Ylf0dewci8myLvoW@hirez.programming.kicks-ass.net [1] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/165055522548.3745911.14298368286915484086.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-28bus: platform,amba,fsl-mc,PCI: Add device DMA ownership managementLu Baolu
The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28amba: Stop sharing platform_dma_configure()Lu Baolu
Stop sharing platform_dma_configure() helper as they are about to have their own bus dma_configure callbacks. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220418005000.897664-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-28driver core: Add dma_cleanup callback in bus_typeLu Baolu
The bus_type structure defines dma_configure() callback for bus drivers to configure DMA on the devices. This adds the paired dma_cleanup() callback and calls it during driver unbinding so that bus drivers can do some cleanup work. One use case for this paired DMA callbacks is for the bus driver to check for DMA ownership conflicts during driver binding, where multiple devices belonging to a same IOMMU group (the minimum granularity of isolation and protection) may be assigned to kernel drivers or user space respectively. Without this change, for example, the vfio driver has to listen to a bus BOUND_DRIVER event and then BUG_ON() in case of dma ownership conflict. This leads to bad user experience since careless driver binding operation may crash the system if the admin overlooks the group restriction. Aside from bad design, this leads to a security problem as a root user, even with lockdown=integrity, can force the kernel to BUG. With this change, the bus driver could check and set the DMA ownership in driver binding process and fail on ownership conflicts. The DMA ownership should be released during driver unbinding. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220418005000.897664-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-27Revert "firmware_loader: use kernel credentials when reading firmware"Greg Kroah-Hartman
This reverts commit 3677563eb8731e1ad5970e3e57f74e5f9d63502a as it leaks memory :( Reported-by: Qian Cai <quic_qiancai@quicinc.com> Link: https://lore.kernel.org/r/20220427135823.GD71@qian Cc: Thiébaud Weksteen <tweek@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: John Stultz <jstultz@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27driver core: Add sysfs support for physical location of a deviceWon Chung
When ACPI table includes _PLD fields for a device, create a new directory (physical_location) in sysfs to share _PLD fields. Currently without PLD information, when there are multiple of same devices, it is hard to distinguish which device corresponds to which physical device at which location. For example, when there are two Type C connectors, it is hard to find out which connector corresponds to the Type C port on the left panel versus the Type C port on the right panel. With PLD information provided, we can determine which specific device at which location is doing what. _PLD output includes much more fields, but only generic fields are added and exposed to sysfs, so that non-ACPI devices can also support it in the future. The minimal generic fields needed for locating a device are the following. - panel - vertical_position - horizontal_position - dock - lid Signed-off-by: Won Chung <wonchung@google.com> Link: https://lore.kernel.org/r/20220314195458.271430-1-wonchung@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27platform: finally disallow IRQ0 in platform_get_irq() and its ilkSergey Shtylyov
The commit a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") only calls WARN() when IRQ0 is about to be returned, however using IRQ0 is considered invalid (according to Linus) outside the arch/ code where it's used by the i8253 drivers. Many driver subsystems treat 0 specially (e.g. as an indication of the polling mode by libata), so the users of platform_get_irq[_byname]() in them would have to filter out IRQ0 explicitly and this (quite obviously) doesn't scale... Let's finally get this straight and return -EINVAL instead of IRQ0! Fixes: a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/025679e1-1f0a-ae4b-4369-01164f691511@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27drivers/base/node.c: fix compaction sysfs file leakMiaohe Lin
Compaction sysfs file is created via compaction_register_node in register_node. But we forgot to remove it in unregister_node. Thus compaction sysfs file is leaked. Using compaction_unregister_node to fix this issue. Fixes: ed4a6d7f0676 ("mm: compaction: add /sys trigger for per-node memory compaction") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220401070905.43679-1-linmiaohe@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27driver core: Prevent overriding async driver of a device before it probeMark-PK Tsai
When there are 2 matched drivers for a device using async probe mechanism, the dev->p->async_driver might be overridden by the last attached driver. So just skip the later one if the previous matched driver was not handled by async thread yet. Below is my use case which having this problem. Make both driver mmcblk and mmc_test allow async probe, the dev->p->async_driver will be overridden by the later driver mmc_test and bind to the device then claim it for testing. When it happen, mmcblk will never do probe again. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Link: https://lore.kernel.org/r/20220316074328.1801-1-mark-pk.tsai@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26device property: Use multi-connection matchers for single caseBjorn Andersson
The newly introduced helpers for searching for matches in the case of multiple connections can be resused by the single-connection case, so do this to save some duplication. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220422222351.1297276-3-bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26device property: Add helper to match multiple connectionsBjorn Andersson
In some cases multiple connections with the same connection id needs to be resolved from a fwnode graph. One such example is when separate hardware is used for performing muxing and/or orientation switching of the SuperSpeed and SBU lines in a USB Type-C connector. In this case the connector needs to belong to a graph with multiple matching remote endpoints, and the Type-C controller needs to be able to resolve them both. Add a new API that allows this kind of lookup. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220422222351.1297276-2-bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26Documentation: dd: Use ReST lists for return values of ↵Bagas Sanjaya
driver_deferred_probe_check_state() Sphinx reported build warnings mentioning drivers/base/dd.c: </path/to/linux>/Documentation/driver-api/infrastructure:35: ./drivers/base/dd.c:280: WARNING: Unexpected indentation. </path/to/linux>/Documentation/driver-api/infrastructure:35: ./drivers/base/dd.c:281: WARNING: Block quote ends without a blank line; unexpected unindent. The warnings above is due to syntax error in the "Return" section of driver_deferred_probe_check_state() which messed up with desired line breaks. Fix the issue by using ReST lists syntax. Fixes: c8c43cee29f6ca ("driver core: Fix driver_deferred_probe_check_state() logic") Cc: linux-pm@vger.kernel.org Cc: stable@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kevin Hilman <khilman@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Rob Herring <robh@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20220416071137.19512-1-bagasdotme@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26firmware_loader: Add sysfs nodes to monitor fw_uploadRuss Weight
Add additional sysfs nodes to monitor the transfer of firmware upload data to the target device: cancel: Write 1 to cancel the data transfer error: Display error status for a failed firmware upload remaining_size: Display the remaining amount of data to be transferred status: Display the progress of the firmware upload Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tianfei zhang <tianfei.zhang@intel.com> Tested-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220421212204.36052-6-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26firmware_loader: Add firmware-upload supportRuss Weight
Extend the firmware subsystem to support a persistent sysfs interface that userspace may use to initiate a firmware update. For example, FPGA based PCIe cards load firmware and FPGA images from local FLASH when the card boots. The images in FLASH may be updated with new images provided by the user at his/her convenience. A device driver may call firmware_upload_register() to expose persistent "loading" and "data" sysfs files. These files are used in the same way as the fallback sysfs "loading" and "data" files. When 0 is written to "loading" to complete the write of firmware data, the data is transferred to the lower-level driver using pre-registered call-back functions. The data transfer is done in the context of a kernel worker thread. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tianfei zhang <tianfei.zhang@intel.com> Tested-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220421212204.36052-5-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26firmware_loader: Split sysfs support from fallbackRuss Weight
In preparation for sharing the "loading" and "data" sysfs nodes with the new firmware upload support, split out sysfs functionality from fallback.c and fallback.h into sysfs.c and sysfs.h. This includes the firmware class driver code that is associated with the sysfs files and the fw_fallback_config support for the timeout sysfs node. CONFIG_FW_LOADER_SYSFS is created and is selected by CONFIG_FW_LOADER_USER_HELPER in order to include sysfs.o in firmware_class-objs. This is mostly just a code reorganization. There are a few symbols that change in scope, and these can be identified by looking at the header file changes. A few white-space warnings from checkpatch are also addressed in this patch. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tianfei zhang <tianfei.zhang@intel.com> Tested-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220421212204.36052-4-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-25regmap: cache: set max_register with reg_strideJeongtae Park
Current logic does not consider multi-stride cases, the max_register have to calculate with reg_stride because it is a kind of address range. Signed-off-by: Jeongtae Park <jtp.park@samsung.com> Link: https://lore.kernel.org/r/20220425114613.15934-1-jtp.park@samsung.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-04-23topology: Fix up build warning in topology_is_visible()Greg Kroah-Hartman
Commit aa63a74d4535 ("topology/sysfs: Hide PPIN on systems that do not support it.") caused a build warning on some configurations: drivers/base/topology.c: In function 'topology_is_visible': drivers/base/topology.c:158:24: warning: unused variable 'dev' [-Wunused-variable] 158 | struct device *dev = kobj_to_dev(kobj); Fix this up by getting rid of the variable entirely. Fixes: aa63a74d4535 ("topology/sysfs: Hide PPIN on systems that do not support it.") Cc: Tony Luck <tony.luck@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20220422062653.3899972-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22drivers/base/memory: Fix an unlikely reference counting issue in ↵Christophe JAILLET
__add_memory_block() __add_memory_block() calls both put_device() and device_unregister() when storing the memory block into the xarray. This is incorrect because xarray doesn't take an additional reference and device_unregister() already calls put_device(). Triggering the issue looks really unlikely and its only effect should be to log a spurious warning about a ref counted issue. Fixes: 4fb6eabf1037 ("drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/d44c63d78affe844f020dc02ad6af29abc448fc4.1650611702.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22firmware_loader: use kernel credentials when reading firmwareThiébaud Weksteen
Device drivers may decide to not load firmware when probed to avoid slowing down the boot process should the firmware filesystem not be available yet. In this case, the firmware loading request may be done when a device file associated with the driver is first accessed. The credentials of the userspace process accessing the device file may be used to validate access to the firmware files requested by the driver. Ensure that the kernel assumes the responsibility of reading the firmware. This was observed on Android for a graphic driver loading their firmware when the device file (e.g. /dev/mali0) was first opened by userspace (i.e. surfaceflinger). The security context of surfaceflinger was used to validate the access to the firmware file (e.g. /vendor/firmware/mali.bin). Because previous configurations were relying on the userspace fallback mechanism, the security context of the userspace daemon (i.e. ueventd) was consistently used to read firmware files. More devices are found to use the command line argument firmware_class.path which gives the kernel the opportunity to read the firmware directly, hence surfacing this misattribution. Signed-off-by: Thiébaud Weksteen <tweek@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20220422013215.2301793-1-tweek@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22firmware_loader: Check fw_state_is_done in loading_storeRuss Weight
Rename fw_sysfs_done() and fw_sysfs_loading() to fw_state_is_done() and fw_state_is_loading() respectively, and place them along side companion functions in drivers/base/firmware_loader/firmware.h. Use the fw_state_is_done() function to exit early from firmware_loading_store() if the state is already "done". This is being done in preparation for supporting persistent sysfs nodes to allow userspace to upload firmware to a device, potentially reusing the sysfs loading and data files multiple times. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tianfei zhang <tianfei.zhang@intel.com> Tested-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220421212204.36052-3-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22firmware_loader: Clear data and size in fw_free_paged_bufRuss Weight
The fw_free_paged_buf() function resets the paged buffer information in the fw_priv data structure. Additionally, clear the data and size members of fw_priv in order to facilitate the reuse of fw_priv. This is being done in preparation for enabling userspace to initiate multiple firmware uploads using this sysfs interface. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tianfei zhang <tianfei.zhang@intel.com> Tested-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220421212204.36052-2-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22driver: platform: Add helper for safer setting of driver_overrideKrzysztof Kozlowski
Several core drivers and buses expect that driver_override is a dynamically allocated memory thus later they can kfree() it. However such assumption is not documented, there were in the past and there are already users setting it to a string literal. This leads to kfree() of static memory during device release (e.g. in error paths or during unbind): kernel BUG at ../mm/slub.c:3960! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kfree) from [<c058da50>] (platform_device_release+0x88/0xb4) (platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90) (device_release) from [<c0a69050>] (kobject_put+0xec/0x20c) (kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c) (exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4) (platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414) (really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4) (driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8) (bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c) (__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90) (bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c) (device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc) (of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc) (of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc) (of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118) (of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8) (of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404) Provide a helper which clearly documents the usage of driver_override. This will allow later to reuse the helper and reduce the amount of duplicated code. Convert the platform driver to use a new helper and make the driver_override field const char (it is not modified by the core). Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22PM: domains: Move genpd's time-accounting to ktime_get_mono_fast_ns()Ulf Hansson
To move towards a more consistent behaviour between genpd and the runtime PM core, let's start by converting genpd's time-accounting from ktime_get() into ktime_get_mono_fast_ns(). Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-22firmware: Add the support for ZSTD-compressed firmware filesTakashi Iwai
As the growing demand on ZSTD compressions, there have been requests for the support of ZSTD-compressed firmware files, so here it is: this patch extends the firmware loader code to allow loading ZSTD files. The implementation is fairly straightforward, it just adds a ZSTD decompression routine for the file expander. (And the code is even simpler than XZ thanks to the ZSTD API that gives the original decompressed size from the header.) Link: https://lore.kernel.org/all/20210127154939.13288-1-tiwai@suse.de/ Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20220421152908.4718-2-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20arch_topology: Do not set llc_sibling if llc_id is invalidWang Qing
When ACPI is not enabled, cpuid_topo->llc_id = cpu_topo->llc_id = -1, which will set llc_sibling 0xff(...), this is misleading. Don't set llc_sibling(default 0) if we don't know the cache topology. Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Wang Qing <wangqing@vivo.com> Fixes: 37c3ec2d810f ("arm64: topology: divorce MC scheduling domain from core_siblings") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1649644580-54626-1-git-send-email-wangqing@vivo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20topology: make core_mask include at least cluster_siblingsDarren Hart
Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop Control Unit, but have no shared CPU-side last level cache. cpu_coregroup_mask() will return a cpumask with weight 1, while cpu_clustergroup_mask() will return a cpumask with weight 2. As a result, build_sched_domain() will BUG() once per CPU with: BUG: arch topology borken the CLS domain not a subset of the MC domain The MC level cpumask is then extended to that of the CLS child, and is later removed entirely as redundant. This sched domain topology is an improvement over previous topologies, or those built without SCHED_CLUSTER, particularly for certain latency sensitive workloads. With the current scheduler model and heuristics, this is a desirable default topology for Ampere Altra and Altra Max system. Rather than create a custom sched domains topology structure and introduce new logic in arch/arm64 to detect these systems, update the core_mask so coregroup is never a subset of clustergroup, extending it to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER is enabled to avoid also changing the topology (MC) when CONFIG_SCHED_CLUSTER is disabled. This has the added benefit over a custom topology of working for both symmetric and asymmetric topologies. It does not address systems where the CLUSTER topology is above a populated MC topology, but these are not considered today and can be addressed separately if and when they appear. The final sched domain topology for a 2 socket Ampere Altra system is unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided: For CPU0: CONFIG_SCHED_CLUSTER=y CLS [0-1] DIE [0-79] NUMA [0-159] CONFIG_SCHED_CLUSTER is not set DIE [0-79] NUMA [0-159] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: D. Scott Phillips <scott@os.amperecomputing.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: <stable@vger.kernel.org> # 5.16.x Suggested-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Darren Hart <darren@os.amperecomputing.com> Link: https://lore.kernel.org/r/c8fe9fce7c86ed56b4c455b8c902982dc2303868.1649696956.git.darren@os.amperecomputing.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20topology/sysfs: Hide PPIN on systems that do not support it.Tony Luck
Systems that do not support a Protected Processor Identification Number currently report: # cat /sys/devices/system/cpu/cpu0/topology/ppin 0x0 which is confusing/wrong. Add a ".is_visible" function to suppress inclusion of the ppin file. Fixes: ab28e944197f ("topology/sysfs: Add PPIN in sysfs under cpu topology") Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20220406220150.63855-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-19PM: runtime: Allow to call __pm_runtime_set_status() from atomic contextUlf Hansson
The only two users of __pm_runtime_set_status() are pm_runtime_set_active() and pm_runtime_set_suspended(). These are widely used and should be called from non-atomic context to work as expected. However, it would be convenient to allow them be called from atomic context too, as shown from a subsequent change, so let's add support for this. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Maulik Shah <quic_mkshah@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13device property: Drop 'test' prefix in parameters of fwnode_is_ancestor_of()Andy Shevchenko
The part 'is' in the function name implies the test against something. Drop unnecessary 'test' prefix in the fwnode_is_ancestor_of() parameters. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13device property: Introduce fwnode_for_each_parent_node()Andy Shevchenko
In a few cases the functionality of fwnode_for_each_parent_node() is already in use. Introduce a common helper macro for it. It may be used by others as well in the future. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13device property: Allow error pointer to be passed to fwnode APIsAndy Shevchenko
Some of the fwnode APIs might return an error pointer instead of NULL or valid fwnode handle. The result of such API call may be considered optional and hence the test for it is usually done in a form of fwnode = fwnode_find_reference(...); if (IS_ERR(fwnode)) ...error handling... Nevertheless the resulting fwnode may have bumped the reference count and hence caller of the above API is obliged to call fwnode_handle_put(). Since fwnode may be not valid either as NULL or error pointer the check has to be performed there. This approach uglifies the code and adds a point of making a mistake, i.e. forgetting about error point case. To prevent this, allow an error pointer to be passed to the fwnode APIs. Fixes: 83b34afb6b79 ("device property: Introduce fwnode_find_reference()") Reported-by: Nuno Sá <nuno.sa@analog.com> Tested-by: Nuno Sá <nuno.sa@analog.com> Acked-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13PM: runtime: Avoid device usage count underflowsRafael J. Wysocki
A PM-runtime device usage count underflow is potentially critical, because it may cause a device to be suspended when it is expected to be operational. It is also a programming problem that would be good to catch and warn about. For this reason, (1) make rpm_check_suspend_allowed() return an error when the device usage count is negative to prevent devices from being suspended in that case, (2) introduce rpm_drop_usage_count() that will detect device usage count underflows, warn about them and fix them up, and (3) use it to drop the usage count in a few places instead of atomic_dec_and_test(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-04-13PM: domains: Extend dev_pm_domain_detach() docKrzysztof Kozlowski
Mention all domain attach menthods which dev_pm_domain_detach() reverses. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-08net: mdio: don't defer probe forever if PHY IRQ provider is missingVladimir Oltean
When a driver for an interrupt controller is missing, of_irq_get() returns -EPROBE_DEFER ad infinitum, causing fwnode_mdiobus_phy_device_register(), and ultimately, the entire of_mdiobus_register() call, to fail. In turn, any phy_connect() call towards a PHY on this MDIO bus will also fail. This is not what is expected to happen, because the PHY library falls back to poll mode when of_irq_get() returns a hard error code, and the MDIO bus, PHY and attached Ethernet controller work fine, albeit suboptimally, when the PHY library polls for link status. However, -EPROBE_DEFER has special handling given the assumption that at some point probe deferral will stop, and the driver for the supplier will kick in and create the IRQ domain. Reasons for which the interrupt controller may be missing: - It is not yet written. This may happen if a more recent DT blob (with an interrupt-parent for the PHY) is used to boot an old kernel where the driver didn't exist, and that kernel worked with the vintage-correct DT blob using poll mode. - It is compiled out. Behavior is the same as above. - It is compiled as a module. The kernel will wait for a number of seconds specified in the "deferred_probe_timeout" boot parameter for user space to load the required module. The current default is 0, which times out at the end of initcalls. It is possible that this might cause regressions unless users adjust this boot parameter. The proposed solution is to use the driver_deferred_probe_check_state() helper function provided by the driver core, which gives up after some -EPROBE_DEFER attempts, taking "deferred_probe_timeout" into consideration. The return code is changed from -EPROBE_DEFER into -ENODEV or -ETIMEDOUT, depending on whether the kernel is compiled with support for modules or not. Fixes: 66bdede495c7 ("of_mdio: Fix broken PHY IRQ in case of probe deferral") Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220407165538.4084809-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05device property: Add irq_get to fwnode operationSakari Ailus
Add irq_get() fwnode operation to implement fwnode_irq_get() through fwnode operations, moving the code in fwnode_irq_get() to OF and ACPI frameworks. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05device property: Add iomap to fwnode operationsSakari Ailus
Add iomap() fwnode operation to implement fwnode_iomap() through fwnode operations, moving the code in fwnode_iomap() to OF framework. Note that the IS_ENABLED(CONFIG_OF_ADDRESS) && is_of_node(fwnode) check is needed for Sparc that has its own implementation of of_iomap anyway. Let the pre-compiler to handle that check. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05device property: Convert device_{dma_supported,get_dma_attr} to fwnodeSakari Ailus
Make the device_dma_supported and device_get_dma_attr functions to use the fwnode ops, and move the implementation to ACPI and OF frameworks. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-04regmap: Constify static regmap_bus structsRikard Falkeborn
The only usage of these is to pass their address to __regmap_init() or __devm_regmap_init(), both which takes pointers to const struct regmap_bus. Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Link: https://lore.kernel.org/r/20220330214110.36337-1-rikard.falkeborn@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-29Merge tag 'devprop-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties code update from Rafael Wysocki: "This is based on new i2c material for 5.18-rc1 and simply reorganizes the code on top of it so as to group similar functions together (Andy Shevchenko)" * tag 'devprop-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: device property: Don't split fwnode_get_irq*() APIs in the code
2022-03-28Merge tag 'driver-core-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core changes for 5.18-rc1. Not much here, primarily it was a bunch of cleanups and small updates: - kobj_type cleanups for default_groups - documentation updates - firmware loader minor changes - component common helper added and take advantage of it in many drivers (the largest part of this pull request). All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits) Documentation: update stable review cycle documentation drivers/base/dd.c : Remove the initial value of the global variable Documentation: update stable tree link Documentation: add link to stable release candidate tree devres: fix typos in comments Documentation: add note block surrounding security patch note samples/kobject: Use sysfs_emit instead of sprintf base: soc: Make soc_device_match() simpler and easier to read driver core: dd: fix return value of __setup handler driver core: Refactor sysfs and drv/bus remove hooks driver core: Refactor multiple copies of device cleanup scripts: get_abi.pl: Fix typo in help message kernfs: fix typos in comments kernfs: remove unneeded #if 0 guard ALSA: hda/realtek: Make use of the helper component_compare_dev_name video: omapfb: dss: Make use of the helper component_compare_dev power: supply: ab8500: Make use of the helper component_compare_dev ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of iommu/mediatek: Make use of the helper component_compare/release_of drm: of: Make use of the helper component_release_of ...
2022-03-26Merge branch 'i2c/for-mergewindow' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - tracepoints when Linux acts as an I2C client - added support for AMD PSP - whole subsystem now uses generic_handle_irq_safe() - piix4 driver gained MMIO access enabling so far missed controllers with AMD chipsets - a bulk of device driver updates, refactorization, and fixes. * 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (61 commits) i2c: mux: demux-pinctrl: do not deactivate a master that is not active i2c: meson: Fix wrong speed use from probe i2c: add tracepoints for I2C slave events i2c: designware: Remove code duplication i2c: cros-ec-tunnel: Fix syntax errors in comments MAINTAINERS: adjust XLP9XX I2C DRIVER after removing the devicetree binding i2c: designware: Mark dw_i2c_plat_{suspend,resume}() as __maybe_unused i2c: mediatek: Add i2c compatible for Mediatek MT8168 dt-bindings: i2c: update bindings for MT8168 SoC i2c: mt65xx: Simplify with clk-bulk i2c: i801: Drop two outdated comments i2c: xiic: Make bus names unique i2c: i801: Add support for the Process Call command i2c: i801: Drop useless masking in i801_access i2c: tegra: Add SMBus block read function i2c: designware: Use the i2c_mark_adapter_suspended/resumed() helpers i2c: designware: Lock the adapter while setting the suspended flag i2c: mediatek: remove redundant null check i2c: mediatek: modify bus speed calculation formula i2c: designware: Fix improper usage of readl ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22drivers/base/memory: clarify adding and removing of memory blocksDavid Hildenbrand
Let's make it clearer at which places we actually add and remove memory blocks -- streamlining the terminology -- and highlight which memory block start out online and which start out as offline. * rename add_memory_block -> add_boot_memory_block * rename init_memory_block -> add_memory_block * rename unregister_memory -> remove_memory_block * rename register_memory -> __add_memory_block * add add_hotplug_memory_block * mark add_boot_memory_block with __init (suggested by Oscar) __add_memory_block() is a pure helper for add_memory_block(), remove the somewhat obvious comment. Link: https://lkml.kernel.org/r/20220221154531.11382-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/memory: determine and store zone for single-zone memory blocksDavid Hildenbrand
test_pages_in_a_zone() is just another nasty PFN walker that can easily stumble over ZONE_DEVICE memory ranges falling into the same memory block as ordinary system RAM: the memmap of parts of these ranges might possibly be uninitialized. In fact, we observed (on an older kernel) with UBSAN: UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50 index 7 is out of range for type 'zone [5]' CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019 Call Trace: dump_stack+0x9a/0xf0 ubsan_epilogue+0x9/0x7a __ubsan_handle_out_of_bounds+0x13a/0x181 test_pages_in_a_zone+0x3c4/0x500 show_valid_zones+0x1fa/0x380 dev_attr_show+0x43/0xb0 sysfs_kf_seq_show+0x1c5/0x440 seq_read+0x49d/0x1190 vfs_read+0xff/0x300 ksys_read+0xb8/0x170 do_syscall_64+0xa5/0x4b0 entry_SYSCALL_64_after_hwframe+0x6a/0xdf RIP: 0033:0x7f01f4439b52 We seem to stumble over a memmap that contains a garbage zone id. While we could try inserting pfn_to_online_page() calls, it will just make memory offlining slower, because we use test_pages_in_a_zone() to make sure we're offlining pages that all belong to the same zone. Let's just get rid of this PFN walker and determine the single zone of a memory block -- if any -- for early memory blocks during boot. For memory onlining, we know the single zone already. Let's avoid any additional memmap scanning and just rely on the zone information available during boot. For memory hot(un)plug, we only really care about memory blocks that: * span a single zone (and, thereby, a single node) * are completely System RAM (IOW, no holes, no ZONE_DEVICE) If one of these conditions is not met, we reject memory offlining. Hotplugged memory blocks (starting out offline), always meet both conditions. There are three scenarios to handle: (1) Memory hot(un)plug A memory block with zone == NULL cannot be offlined, corresponding to our previous test_pages_in_a_zone() check. After successful memory onlining/offlining, we simply set the zone accordingly. * Memory onlining: set the zone we just used for onlining * Memory offlining: set zone = NULL So a hotplugged memory block starts with zone = NULL. Once memory onlining is done, we set the proper zone. (2) Boot memory with !CONFIG_NUMA We know that there is just a single pgdat, so we simply scan all zones of that pgdat for an intersection with our memory block PFN range when adding the memory block. If more than one zone intersects (e.g., DMA and DMA32 on x86 for the first memory block) we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. (3) Boot memory with CONFIG_NUMA At the point in time we create the memory block devices during boot, we don't know yet which nodes *actually* span a memory block. While we could scan all zones of all nodes for intersections, overlapping nodes complicate the situation and scanning all nodes is possibly expensive. But that problem has already been solved by the code that sets the node of a memory block and creates the link in the sysfs -- do_register_memory_block_under_node(). So, we hook into the code that sets the node id for a memory block. If we already have a different node id set for the memory block, we know that multiple nodes *actually* have PFNs falling into our memory block: we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. If there is no node id set, we do the same as (2) for the given node. Note that the call order in driver_init() is: -> memory_dev_init(): create memory block devices -> node_dev_init(): link memory block devices to the node and set the node id So in summary, we detect if there is a single zone responsible for this memory block and we consequently store the zone in that case in the memory block, updating it during memory onlining/offlining. Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Rafael Parra <rparrazo@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/node: rename link_mem_sections() to ↵David Hildenbrand
register_memory_block_under_node() Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2. I remember talking to Michal in the past about removing test_pages_in_a_zone(), which we use for: * verifying that a memory block we intend to offline is really only managed by a single zone. We don't support offlining of memory blocks that are managed by multiple zones (e.g., multiple nodes, DMA and DMA32) * exposing that zone to user space via /sys/devices/system/memory/memory*/valid_zones Now that I identified some more cases where test_pages_in_a_zone() might go wrong, and we received an UBSAN report (see patch #3), let's get rid of this PFN walker. So instead of detecting the zone at runtime with test_pages_in_a_zone() by scanning the memmap, let's determine and remember for each memory block if it's managed by a single zone. The stored zone can then be used for the above two cases, avoiding a manual lookup using test_pages_in_a_zone(). This avoids eventually stumbling over uninitialized memmaps in corner cases, especially when ZONE_DEVICE ranges partly fall into memory block (that are responsible for managing System RAM). Handling memory onlining is easy, because we online to exactly one zone. Handling boot memory is more tricky, because we want to avoid scanning all zones of all nodes to detect possible zones that overlap with the physical memory region of interest. Fortunately, we already have code that determines the applicable nodes for a memory block, to create sysfs links -- we'll hook into that. Patch #1 is a simple cleanup I had laying around for a longer time. Patch #2 contains the main logic to remove test_pages_in_a_zone() and further details. [1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com [2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com This patch (of 2): Let's adjust the stale terminology, making it match unregister_memory_block_under_nodes() and do_register_memory_block_under_node(). We're dealing with memory block devices, which span 1..X memory sections. Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/node: consolidate node device subsystem initialization in ↵David Hildenbrand
node_dev_init() ... and call node_dev_init() after memory_dev_init() from driver_init(), so before any of the existing arch/subsys calls. All online nodes should be known at that point: early during boot, arch code determines node and zone ranges and sets the relevant nodes online; usually this happens in setup_arch(). This is in line with memory_dev_init(), which initializes the memory device subsystem and creates all memory block devices. Similar to memory_dev_init(), panic() if anything goes wrong, we don't want to continue with such basic initialization errors. The important part is that node_dev_init() gets called after memory_dev_init() and after cpu_dev_init(), but before any of the relevant archs call register_cpu() to register the new cpu device under the node device. The latter should be the case for the current users of topology_init(). Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64) Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/memory: add memory block to memory group after registration ↵David Hildenbrand
succeeded If register_memory() fails, we freed the memory block but already added the memory block to the group list, not good. Let's defer adding the block to the memory group to after registering the memory block device. We do handle it properly during unregister_memory(), but that's not called when the registration fails. Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handlerluofei
When the hwpoison page meets the filter conditions, it should not be regarded as successful memory_failure() processing for mce handler, but should return a distinct value, otherwise mce handler regards the error page has been identified and isolated, which may lead to calling set_mce_nospec() to change page attribute, etc. Here memory_failure() return -EOPNOTSUPP to indicate that the error event is filtered, mce handler should not take any action for this situation and hwpoison injector should treat as correct. Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com Signed-off-by: luofei <luofei@unicloud.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge tag 'sched-core-2022-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder *_api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer * tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) sched/headers: ARM needs asm/paravirt_api_clock.h too sched/numa: Fix boot crash on arm64 systems headers/prep: Fix header to build standalone: <linux/psi.h> sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y cgroup: Fix suspicious rcu_dereference_check() usage warning sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity() sched/deadline,rt: Remove unused functions for !CONFIG_SMP sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy() sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file sched/deadline: Remove unused def_dl_bandwidth sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE sched/tracing: Don't re-read p->state when emitting sched_switch event sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race sched/cpuacct: Remove redundant RCU read lock sched/cpuacct: Optimize away RCU read lock sched/cpuacct: Fix charge percpu cpuusage sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies ...