summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-07-05PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEPBjorn Helgaas
Previously the CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP device_init_wakeup() implementations differed in confusing ways: - The PM_SLEEP version checked for a NULL device pointer and returned -EINVAL, while the !PM_SLEEP version did not and would simply dereference a NULL pointer. - When called with "false", the !PM_SLEEP version cleared "capable" and "enable" in the opposite order of the PM_SLEEP version. That was harmless because for !PM_SLEEP they're simple assignments, but it's unnecessary confusion. Use a simplified version of the PM_SLEEP implementation for both cases. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supportedMario Limonciello
commit 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") added support for claiming to support CPPC in _OSC on non-Intel platforms. This unfortunately caused a regression on a vartiety of AMD platforms in the field because a number of AMD platforms don't set the `_OSC` bit 5 or 6 to indicate CPPC or CPPC v2 support. As these AMD platforms already claim CPPC support via a dedicated MSR from `X86_FEATURE_CPPC`, use this enable this feature rather than requiring the `_OSC` on platforms with a dedicated MSR. If there is additional breakage on the shared memory designs also missing this _OSC, additional follow up changes may be needed. Fixes: 72f2ecb7ece7 ("Set CPPC _OSC bits for all and when CPPC_LIB is supported") Reported-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05ACPI: CPPC: Only probe for _CPC if CPPC v2 is ackedMario Limonciello
Previously the kernel used to ignore whether the firmware masked CPPC or CPPCv2 and would just pretend that it worked. When support for the USB4 bit in _OSC was introduced from commit 9e1f561afb ("ACPI: Execute platform _OSC also with query bit clear") the kernel began to look at the return when the query bit was clear. This caused regressions that were misdiagnosed and attempted to be solved as part of commit 2ca8e6285250 ("Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag""). This caused a different regression where non-Intel systems weren't able to negotiate _OSC properly. This was reverted in commit 2ca8e6285250 ("Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"") and attempted to be fixed by commit c42fa24b4475 ("ACPI: bus: Avoid using CPPC if not supported by firmware") but the regression still returned. These systems with the regression only load support for CPPC from an SSDT dynamically when _OSC reports CPPC v2. Avoid the problem by not letting CPPC satisfy the requirement in `acpi_cppc_processor_probe`. Reported-by: CUI Hao <cuihao.leo@gmail.com> Reported-by: maxim.novozhilov@gmail.com Reported-by: lethe.tree@protonmail.com Reported-by: garystephenwright@gmail.com Reported-by: galaxyking0419@gmail.com Fixes: c42fa24b4475 ("ACPI: bus: Avoid using CPPC if not supported by firmware") Fixes: 2ca8e6285250 ("Revert "ACPI Pass the same capabilities to the _OSC regardless of the query flag"") Link: https://bugzilla.kernel.org/show_bug.cgi?id=213023 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2075387 Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: CUI Hao <cuihao.leo@gmail.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05ACPI: VIOT: Fix ACS setupEric Auger
Currently acpi_viot_init() gets called after the pci device has been scanned and pci_enable_acs() has been called. So pci_request_acs() fails to be taken into account leading to wrong single iommu group topologies when dealing with multi-function root ports for instance. We cannot simply move the acpi_viot_init() earlier, similarly as the IORT init because the VIOT parsing relies on the pci scan. However we can detect VIOT is present earlier and in such a case, request ACS. Introduce a new acpi_viot_early_init() routine that allows to call pci_request_acs() before the scan. While at it, guard the call to pci_request_acs() with #ifdef CONFIG_PCI. Fixes: 3cf485540e7b ("ACPI: Add driver for the VIOT table") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Jin Liu <jinl@redhat.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05drm: Remove linux/i2c.h from drm_crtc.hVille Syrjälä
drm_crtc.h has no need for linux/i2c.h, so don't include it. Avoids useless rebuilds of the entire universe when touching linux/i2c.h. Quite a few placs do currently depend on linux/i2c.h without actually including it directly. All of those need to be fixed up. v2: imx and mcde need linux/io.h for readl()/etc. Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220630195114.17407-5-ville.syrjala@linux.intel.com
2022-07-05drm: Remove linux/media-bus-format.h from drm_crtc.hVille Syrjälä
drm_crtc.h has no need for linux/media-bus-format.h, so don't include it. Avoids useless rebuilds of the entire universe when touching linux/media-bus-format.h. Quite a few placs do currently depend on linux/media-bus-format.h without actually including it directly. All of those need to be fixed up. v2: Deal with ingenic as well v3: Fix up mxsfb and remaining parts of imx Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220630195114.17407-4-ville.syrjala@linux.intel.com
2022-07-05drm: Remove linux/fb.h from drm_crtc.hVille Syrjälä
drm_crtc.h has no need for linux/fb.h, so don't include it. Avoids useless rebuilds of the entire universe when touching linux/fb.h. Quite a few placs do currently depend on linux/fb.h or other headers pulled in by it without actually including any of it directly. All of those need to be fixed up. v2: Split the vmwgfx change out Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220630195114.17407-3-ville.syrjala@linux.intel.com
2022-07-05fscache: Fix invalidation/lookup raceDavid Howells
If an NFS file is opened for writing and closed, fscache_invalidate() will be asked to invalidate the file - however, if the cookie is in the LOOKING_UP state (or the CREATING state), then request to invalidate doesn't get recorded for fscache_cookie_state_machine() to do something with. Fix this by making __fscache_invalidate() set a flag if it sees the cookie is in the LOOKING_UP state to indicate that we need to go to invalidation. Note that this requires a count on the n_accesses counter for the state machine, which that will release when it's done. fscache_cookie_state_machine() then shifts to the INVALIDATING state if it sees the flag. Without this, an nfs file can get corrupted if it gets modified locally and then read locally as the cache contents may not get updated. Fixes: d24af13e2e23 ("fscache: Implement cookie invalidation") Reported-by: Max Kellermann <mk@cm4all.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Max Kellermann <mk@cm4all.com> Link: https://lore.kernel.org/r/YlWWbpW5Foynjllo@rabbit.intern.cm-ag [1]
2022-07-05ASoC: madera: Replace kernel.h with the necessary inclusionsAndy Shevchenko
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://lore.kernel.org/r/20220603170707.48728-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-07-05Merge tag 'renesas-r9a07g043-dt-binding-defs-tag2' into HEADGeert Uytterhoeven
Renesas RZ/Five DT Binding Definitions Clock and reset definitions for the Renesas RZ/Five (R9A07G043) SoC, shared by driver and DT source files.
2022-07-05dt-bindings: clock: r9a07g043-cpg: Add Renesas RZ/Five CPG Clock and Reset ↵Lad Prabhakar
Definitions Renesas RZ/Five SoC has almost the same clock structure compared to the Renesas RZ/G2UL SoC, re-use the r9a07g043-cpg.h header file and just amend the RZ/Five CPG clock and reset definitions. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220622181723.13033-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2022-07-05drm: Add DRM_GEM_FOPSRob Clark
The DEFINE_DRM_GEM_FOPS() helper is a bit limiting if a driver wants to provide additional file ops, like show_fdinfo(). v2: Split out DRM_GEM_FOPS instead of making DEFINE_DRM_GEM_FOPS varardic v3: nits Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Patchwork: https://patchwork.freedesktop.org/patch/488904/ Link: https://lore.kernel.org/r/20220609174213.2265938-1-robdclark@gmail.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2022-07-04dma-mapping: Fix build error unused-valueRen Zhijie
If CONFIG_DMA_DECLARE_COHERENT is not set, make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- will be failed, like this: drivers/remoteproc/remoteproc_core.c: In function ‘rproc_rvdev_release’: ./include/linux/dma-map-ops.h:182:42: error: statement with no effect [-Werror=unused-value] #define dma_release_coherent_memory(dev) (0) ^ drivers/remoteproc/remoteproc_core.c:464:2: note: in expansion of macro ‘dma_release_coherent_memory’ dma_release_coherent_memory(dev); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors The return type of function dma_release_coherent_memory in CONFIG_DMA_DECLARE_COHERENT area is void, so in !CONFIG_DMA_DECLARE_COHERENT area it should neither return any value nor be defined as zero. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e61c451476e6 ("dma-mapping: Add dma_release_coherent_memory to DMA API") Signed-off-by: Ren Zhijie <renzhijie2@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220630123528.251181-1-renzhijie2@huawei.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-07-04ACPI: Remove the unused find_acpi_cpu_cache_topology()Sudeep Holla
The sole user of this find_acpi_cpu_cache_topology() was arm64 topology which is now consolidated into the generic arch_topology without the need of this function. Drop the unused function find_acpi_cpu_cache_topology(). Link: https://lore.kernel.org/r/20220704101605.1318280-22-sudeep.holla@arm.com Cc: Rafael J. Wysocki <rafael@kernel.org> Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04arch_topology: Drop LLC identifier stash from the CPU topologySudeep Holla
Since the cacheinfo LLC information is used directly in arch_topology, there is no need to parse and store the LLC ID information only for ACPI systems in the CPU topology. Remove the redundant LLC ID from the generic CPU arch_topology information. Link: https://lore.kernel.org/r/20220704101605.1318280-13-sudeep.holla@arm.com Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04cacheinfo: Allow early detection and population of cache attributesSudeep Holla
Some architecture/platforms may need to setup cache properties very early in the boot along with other cpu topologies so that all these information can be used to build sched_domains which is used by the scheduler. Allow detect_cache_attributes to be called quite early during the boot. Link: https://lore.kernel.org/r/20220704101605.1318280-7-sudeep.holla@arm.com Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04cacheinfo: Add support to check if last level cache(LLC) is valid or sharedSudeep Holla
It is useful to have helper to check if the given two CPUs share last level cache. We can do that check by comparing fw_token or by comparing the cache ID. Currently we check just for fw_token as the cache ID is optional. This helper can be used to build the llc_sibling during arch specific topology parsing and feeding information to the sched_domains. This also helps to get rid of llc_id in the CPU topology as it is sort of duplicate information. Also add helper to check if the llc information in cacheinfo is valid or not. Link: https://lore.kernel.org/r/20220704101605.1318280-6-sudeep.holla@arm.com Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04mm/tracing: add 'accounted' entry into output of allocation tracepointsVasily Averin
Slab caches marked with SLAB_ACCOUNT force accounting for every allocation from this cache even if __GFP_ACCOUNT flag is not passed. Unfortunately, at the moment this flag is not visible in ftrace output, and this makes it difficult to analyze the accounted allocations. This patch adds boolean "accounted" entry into trace output, and set it to 'true' for calls used __GFP_ACCOUNT flag and for allocations from caches marked with SLAB_ACCOUNT. Set it to 'false' if accounting is disabled in configs. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04include: trace: Add SCMI fast channel tracingCristian Marussi
All the currently defined SCMI events are meant to trace only regular SCMI transfers based on SCMI messages exchanges; SCMI transactions based on fast channels, where used, are completely invisible from the tracing point of view. Add support to trace fast channel transactions; while doing that avoid exposing full shared memory location addresses. Link: https://lore.kernel.org/r/20220704102241.2988447-6-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04firmware: arm_scmi: Add SCMI v3.1 powercap fast channels supportCristian Marussi
Add SCMIv3.1 powercap protocol fast channel support using common helpers provided by the SCMI core with scmi_proto_helpers_ops operations. Link: https://lore.kernel.org/r/20220704102241.2988447-5-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04firmware: arm_scmi: Add SCMI v3.1 powercap protocol basic supportCristian Marussi
Add support for SCMI v3.1 powercap protocol, with the exception of powercap fast channels, exposing all the new related powercap protocol operations as usual in include/linux/scmi_protocol.h. Link: https://lore.kernel.org/r/20220704102241.2988447-3-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04firmware: arm_scmi: Add devm_protocol_acquire helperCristian Marussi
Add a method to get hold of a protocol, causing it to be initialized and its resource accounting updated, without getting access to its operations and handle. Some protocols, like SCMI SystemPower, do not expose any protocol ops to the kernel OSPM agent but still need to be at least initialized. This helper avoids the need to invoke a full devm_get_protocol() only to get the protocol initialized while throwing away unused the protocol ops and handle. Link: https://lore.kernel.org/r/20220704101933.2981635-4-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04firmware: arm_scmi: Add SCMI v3.1 System Power extensionsCristian Marussi
Add support for SCMIv3.1 System Power optional timeout field while dispatching SYSTEM_POWER_STATE_NOTIFIER notification. Link: https://lore.kernel.org/r/20220704101933.2981635-3-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04include: trace: Add SCMI full message tracingCristian Marussi
Add a distinct trace event to dump full SCMI message headers and payloads. Link: https://lore.kernel.org/r/20220630173135.2086631-2-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-07-04interconnect: add device managed bulk APIPeng Fan
Add device managed bulk API to simplify driver. Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20220703091132.1412063-4-peng.fan@oss.nxp.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2022-07-04dt-bindings: interconnect: add fsl,imx8mp.hPeng Fan
Add fsl,imx8mp.h for i.MX8MP Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220703091132.1412063-3-peng.fan@oss.nxp.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2022-07-04Merge branch 'for-linus' into for-nextTakashi Iwai
Back-merge of 5.19-rc branch for the futher development, mainly about USB-audio and HD-audio Cirrus stuff. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-07-04mfd: bcm2835-pm: Add support for BCM2711Stefan Wahren
In BCM2711 the new RPiVid ASB took over V3D. The old ASB is still present with the ISP and H264 bits, and V3D is in the same place in the new ASB as the old one. As per the devicetree bindings, BCM2711 will provide both the old and new ASB resources, so get both of them and pass them into 'bcm2835-power,' which will take care of selecting which one to use accordingly. Since the RPiVid ASB's resources were being provided prior to formalizing the bindings[1], also support the old DT files that didn't use 'reg-names.' [1] See: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Peter Robinson <pbrobinson@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220625113619.15944-8-stefan.wahren@i2se.com
2022-07-04net: phy: broadcom: Add support for BCM53128 internal PHYsKurt Kanzenbach
Add support for BCM53128 internal PHYs. These support interrupts as well as statistics. Therefore, enable the Broadcom PHY driver for them. Tested on BCM53128 switch using the mainline b53 DSA driver. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04xfrm: improve wording of comment above XFRM_OFFLOAD flagsPetr Vaněk
I have noticed a few minor wording issues in a comment recently added above XFRM_OFFLOAD flags in 7c76ecd9c99b ("xfrm: enforce validity of offload input flags"). Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-07-04sched/core: add forced idle accounting for cgroupsJosh Don
4feee7d1260 previously added per-task forced idle accounting. This patch extends this to also include cgroups. rstat is used for cgroup accounting, except for the root, which uses kcpustat in order to bypass the need for doing an rstat flush when reading root stats. Only cgroup v2 is supported. Similar to the task accounting, the cgroup accounting requires that schedstats is enabled. Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20220629211426.3329954-1-joshdon@google.com
2022-07-03mm/khugepaged: try to free transhuge swapcache when possibleMiaohe Lin
Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). It's because release_pte_page() is not called for these pages and thus free_page_and_swap_cache can't grab the page lock. These pages won't be freed from swap cache even if we are the only user until next time reclaim. It shouldn't hurt indeed, but we could try to free these pages to save more memory for system. Link: https://lkml.kernel.org/r/20220625092816.4856-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: hugetlb: kill set_huge_swap_pte_at()Qi Zheng
Commit e5251fd43007 ("mm/hugetlb: introduce set_huge_swap_pte_at() helper") add set_huge_swap_pte_at() to handle swap entries on architectures that support hugepages consisting of contiguous ptes. And currently the set_huge_swap_pte_at() is only overridden by arm64. set_huge_swap_pte_at() provide a sz parameter to help determine the number of entries to be updated. But in fact, all hugetlb swap entries contain pfn information, so we can find the corresponding folio through the pfn recorded in the swap entry, then the folio_size() is the number of entries that need to be updated. And considering that users will easily cause bugs by ignoring the difference between set_huge_swap_pte_at() and set_huge_pte_at(). Let's handle swap entries in set_huge_pte_at() and remove the set_huge_swap_pte_at(), then we can call set_huge_pte_at() anywhere, which simplifies our coding. Link: https://lkml.kernel.org/r/20220626145717.53572-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm, docs: fix comments that mention mem_hotplug_end()Yun-Ze Li
Comments that mention mem_hotplug_end() are confusing as there is no function called mem_hotplug_end(). Fix them by replacing all the occurences of mem_hotplug_end() in the comments with mem_hotplug_done(). [akpm@linux-foundation.org: grammatical fixes] Link: https://lkml.kernel.org/r/20220620071516.1286101-1-p76091292@gs.ncku.edu.tw Signed-off-by: Yun-Ze Li <p76091292@gs.ncku.edu.tw> Cc: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with ↵Muchun Song
memmap_on_memory For now, the feature of hugetlb_free_vmemmap is not compatible with the feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes precedence over memory_hotplug.memmap_on_memory. However, someone wants to make memory_hotplug.memmap_on_memory takes precedence over hugetlb_free_vmemmap since memmap_on_memory makes it more likely to succeed memory hotplug in close-to-OOM situations. So the decision of making hugetlb_free_vmemmap take precedence is not wise and elegant. The proper approach is to have hugetlb_vmemmap.c do the check whether the section which the HugeTLB pages belong to can be optimized. If the section's vmemmap pages are allocated from the added memory block itself, hugetlb_free_vmemmap should refuse to optimize the vmemmap, otherwise, do the optimization. Then both kernel parameters are compatible. So this patch introduces VmemmapSelfHosted to mask any non-optimizable vmemmap pages. The hugetlb_vmemmap can use this flag to detect if a vmemmap page can be optimized. [songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive] Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Co-developed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: memory_hotplug: enumerate all supported section flagsMuchun Song
Patch series "make hugetlb_optimize_vmemmap compatible with memmap_on_memory", v3. This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory. This patch (of 2): We are almost running out of section flags, only one bit is available in the worst case (powerpc with 256k pages). However, there are still some free bits (in ->section_mem_map) on other architectures (e.g. x86_64 has 10 bits available, arm64 has 8 bits available with worst case of 64K pages). We have hard coded those numbers in code, it is inconvenient to use those bits on other architectures except powerpc. So transfer those section flags to enumeration to make it easy to add new section flags in the future. Also, move SECTION_TAINT_ZONE_DEVICE into the scope of CONFIG_ZONE_DEVICE to save a bit on non-zone-device case. [songmuchun@bytedance.com: replace enum with defines per David] Link: https://lkml.kernel.org/r/20220620110616.12056-2-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: convert destroy_compound_page() to destroy_large_folio()Matthew Wilcox (Oracle)
All callers now have a folio, so push the folio->page conversion down to this function. [akpm@linux-foundation.org: uninline destroy_large_folio() to fix build issue] Link: https://lkml.kernel.org/r/20220617175020.717127-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/swap: convert __put_page() to __folio_put()Matthew Wilcox (Oracle)
Saves 11 bytes of text by removing a check of PageTail. Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/swap: make __pagevec_lru_add staticMatthew Wilcox (Oracle)
__pagevec_lru_add has no callers outside swap.c, so make it static, and move it to a more logical position in the file. Link: https://lkml.kernel.org/r/20220617175020.717127-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: add folios_put()Matthew Wilcox (Oracle)
Patch series "Convert the swap code to be more folio-based". There's still more to do with the swap code, but this reaps a lot of the folio benefit. More than 4kB of kernel text saved (with the UEK7 kernel config). I don't know how much that's going to translate into CPU savings, but some of those compound_head() calls are on every page free, so it should be noticable. It might even be noticable just from an I-cache consumption perspective. This patch (of 22): This is just a wrapper around release_pages() for now. Place the prototype in mm.h along with folio_put() and folio_put_refs(). Link: https://lkml.kernel.org/r/20220617175020.717127-1-willy@infradead.org Link: https://lkml.kernel.org/r/20220617175020.717127-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/vmscan: convert reclaim_clean_pages_from_list() to foliosMatthew Wilcox (Oracle)
Patch series "nvert much of vmscan to folios" vmscan always operates on folios since it puts the pages on the LRU list. Switching all of these functions from pages to folios saves 1483 bytes of text from removing all the baggage around calling compound_page() and similar functions. This patch (of 5): This is a straightforward conversion which removes several hidden calls to compound_head, saving 330 bytes of kernel text. Link: https://lkml.kernel.org/r/20220617154248.700416-1-willy@infradead.org Link: https://lkml.kernel.org/r/20220617154248.700416-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/mprotect: try avoiding write faults for exclusive anonymous pages when ↵David Hildenbrand
changing protection Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we can try mapping anonymous pages in a private writable mapping writable if they are exclusive, the PTE is already dirty, and no special handling applies. Mapping the anonymous page writable is essentially the same thing the write fault handler would do in this case. Special handling is required for uffd-wp and softdirty tracking, so take care of that properly. Also, leave PROT_NONE handling alone for now; in the future, we could similarly extend the logic in do_numa_page() or use pte_mk_savedwrite() here. While this improves mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE) performance, it should also be a valuable optimization for uffd-wp, when un-protecting. This has been previously suggested by Peter Collingbourne in [1], relevant in the context of the Scudo memory allocator, before we had PageAnonExclusive. This commit doesn't add the same handling for PMDs (i.e., anonymous THP, anonymous hugetlb); benchmark results from Andrea indicate that there are minor performance gains, so it's might still be valuable to streamline that logic for all anonymous pages in the future. As we now also set MM_CP_DIRTY_ACCT for private mappings, let's rename it to MM_CP_TRY_CHANGE_WRITABLE, to make it clearer what's actually happening. Micro-benchmark courtesy of Andrea: === #define _GNU_SOURCE #include <sys/mman.h> #include <stdlib.h> #include <string.h> #include <stdio.h> #include <unistd.h> #define SIZE (1024*1024*1024) int main(int argc, char *argv[]) { char *p; if (posix_memalign((void **)&p, sysconf(_SC_PAGESIZE)*512, SIZE)) perror("posix_memalign"), exit(1); if (madvise(p, SIZE, argc > 1 ? MADV_HUGEPAGE : MADV_NOHUGEPAGE)) perror("madvise"); explicit_bzero(p, SIZE); for (int loops = 0; loops < 40; loops++) { if (mprotect(p, SIZE, PROT_READ)) perror("mprotect"), exit(1); if (mprotect(p, SIZE, PROT_READ|PROT_WRITE)) perror("mprotect"), exit(1); explicit_bzero(p, SIZE); } } === Results on my Ryzen 9 3900X: Stock 10 runs (lower is better): AVG 6.398s, STDEV 0.043 Patched 10 runs (lower is better): AVG 3.780s, STDEV 0.026 === [1] https://lkml.kernel.org/r/20210429214801.2583336-1-pcc@google.com Link: https://lkml.kernel.org/r/20220614093629.76309-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Peter Collingbourne <pcc@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon/schemes: add 'LRU_DEPRIO' actionSeongJae Park
This commit adds a new DAMON-based operation scheme action called 'LRU_DEPRIO' for physical address space. The action deprioritizes pages in the memory area of the target access pattern on their LRU lists. This is hence supposed to be used for rarely accessed (cold) memory regions so that cold pages could be more likely reclaimed first under memory pressure. Internally, it simply calls 'lru_deactivate()'. Using this with 'LRU_PRIO' action for hot pages, users can proactively sort LRU lists based on the access pattern. That is, it can make the LRU lists somewhat more trustworthy source of access temperature. As a result, efficiency of LRU-lists based mechanisms including the reclamation target selection could be improved. Link: https://lkml.kernel.org/r/20220613192301.8817-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon/schemes: add 'LRU_PRIO' DAMOS actionSeongJae Park
This commit adds a new DAMOS action called 'LRU_PRIO' for the physical address space. The action prioritizes pages in the memory regions of the user-specified target access pattern on their LRU lists. This is hence supposed to be used for frequently accessed (hot) memory regions so that hot pages could be more likely protected under memory pressure. Internally, it simply calls 'mark_page_accessed()'. Link: https://lkml.kernel.org/r/20220613192301.8817-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: shrinkers: provide shrinkers with namesRoman Gushchin
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: shrinkers: introduce debugfs interface for memory shrinkersRoman Gushchin
This commit introduces the /sys/kernel/debug/shrinker debugfs interface which provides an ability to observe the state of individual kernel memory shrinkers. Because the feature adds some memory overhead (which shouldn't be large unless there is a huge amount of registered shrinkers), it's guarded by a config option (enabled by default). This commit introduces the "count" interface for each shrinker registered in the system. The output is in the following format: <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>... <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>... ... To reduce the size of output on machines with many thousands cgroups, if the total number of objects on all nodes is 0, the line is omitted. If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as cgroup inode id. If the shrinker is not numa-aware, 0's are printed for all nodes except the first one. This commit gives debugfs entries simple numeric names, which are not very convenient. The following commit in the series will provide shrinkers with more meaningful names. [akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman] Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino()Roman Gushchin
Patch series "mm: introduce shrinker debugfs interface", v5. The only existing debugging mechanism is a couple of tracepoints in do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't covering everything though: shrinkers which report 0 objects will never show up, there is no support for memcg-aware shrinkers. Shrinkers are identified by their scan function, which is not always enough (e.g. hard to guess which super block's shrinker it is having only "super_cache_scan"). To provide a better visibility and debug options for memory shrinkers this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent similar to /sys/kernel/slab. For each shrinker registered in the system a directory is created. As now, the directory will contain only a "scan" file, which allows to get the number of managed objects for each memory cgroup (for memcg-aware shrinkers) and each numa node (for numa-aware shrinkers on a numa machine). Other interfaces might be added in the future. To make debugging more pleasant, the patchset also names all shrinkers, so that debugfs entries can have meaningful names. This patch (of 5): Shrinker debugfs requires a way to represent memory cgroups without using full paths, both for displaying information and getting input from a user. Cgroup inode number is a perfect way, already used by bpf. This commit adds a couple of helper functions which will be used to handle memcg-aware shrinkers. Link: https://lkml.kernel.org/r/20220601032227.4076670-1-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220601032227.4076670-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: introduce clear_highpage_kasan_taggedAndrey Konovalov
Add a clear_highpage_kasan_tagged() helper that does clear_highpage() on a page potentially tagged by KASAN. This helper is used by the following patch. Link: https://lkml.kernel.org/r/4471979b46b2c487787ddcd08b9dc5fedd1b6ffd.1654798516.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon/{dbgfs,sysfs}: move target_has_pid() from dbgfs to damon.hSeongJae Park
The function for knowing if given monitoring context's targets will have pid or not is defined and used in dbgfs only. However, the logic is also needed for sysfs. This commit moves the code to damon.h and makes both dbgfs and sysfs to use it. Link: https://lkml.kernel.org/r/20220606182310.48781-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/migration: fix potential pte_unmap on an not mapped pteMiaohe Lin
__migration_entry_wait and migration_entry_wait_on_locked assume pte is always mapped from caller. But this is not the case when it's called from migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue. Link: https://lkml.kernel.org/r/20220530113016.16663-5-linmiaohe@huawei.com Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>