summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-03-04mm: use __page_cache_release() in folios_put()Matthew Wilcox (Oracle)
Pass a pointer to the lruvec so we can take advantage of the folio_lruvec_relock_irqsave(). Adjust the calling convention of folio_lruvec_relock_irqsave() to suit and add a page_cache_release() wrapper. Link: https://lkml.kernel.org/r/20240227174254.710559-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04memcg: add mem_cgroup_uncharge_folios()Matthew Wilcox (Oracle)
Almost identical to mem_cgroup_uncharge_list(), except it takes a folio_batch instead of a list_head. Link: https://lkml.kernel.org/r/20240227174254.710559-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: make folios_put() the basis of release_pages()Matthew Wilcox (Oracle)
Patch series "Rearrange batched folio freeing", v3. Other than the obvious "remove calls to compound_head" changes, the fundamental belief here is that iterating a linked list is much slower than iterating an array (5-15x slower in my testing). There's also an associated belief that since we iterate the batch of folios three times, we do better when the array is small (ie 15 entries) than we do with a batch that is hundreds of entries long, which only gives us the opportunity for the first pages to fall out of cache by the time we get to the end. It is possible we should increase the size of folio_batch. Hopefully the bots let us know if this introduces any performance regressions. This patch (of 3): By making release_pages() call folios_put(), we can get rid of the calls to compound_head() for the callers that already know they have folios. We can also get rid of the lock_batch tracking as we know the size of the batch is limited by folio_batch. This does reduce the maximum number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX (32) to PAGEVEC_SIZE (15). I do not expect this to make a significant difference, but if it does, we can increase PAGEVEC_SIZE to 31. Link: https://lkml.kernel.org/r/20240227174254.710559-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240227174254.710559-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: remove total_mapcount()David Hildenbrand
All users of total_mapcount() are gone, let's remove it. Link: https://lkml.kernel.org/r/20240226141324.278526-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: thp: split huge page to any lower order pagesZi Yan
To split a THP to any lower order pages, we need to reform THPs on subpages at given order and add page refcount based on the new page order. Also we need to reinitialize page_deferred_list after removing the page from the split_queue, otherwise a subsequent split will see list corruption when checking the page_deferred_list again. Note: Anonymous order-1 folio is not supported because _deferred_list, which is used by partially mapped folios, is stored in subpage 2 and an order-1 folio only has subpage 0 and 1. File-backed order-1 folios are fine, since they do not use _deferred_list. [ziy@nvidia.com: fixup per discussion with Ryan] Link: https://lkml.kernel.org/r/494F48CD-1F0F-4CAD-884E-6D48F40AF990@nvidia.com Link: https://lkml.kernel.org/r/20240226205534.1603748-8-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: page_owner: add support for splitting to any order in split page_ownerZi Yan
It adds a new_order parameter to set new page order in page owner. It prepares for upcoming changes to support split huge page to any lower order. Link: https://lkml.kernel.org/r/20240226205534.1603748-7-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: memcg: make memcg huge page split support any order splitZi Yan
It sets memcg information for the pages after the split. A new parameter new_order is added to tell the order of subpages in the new page, always 0 for now. It prepares for upcoming changes to support split huge page to any lower order. Link: https://lkml.kernel.org/r/20240226205534.1603748-6-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm/page_owner: use order instead of nr in split_page_owner()Zi Yan
We do not have non power of two pages, using nr is error prone if nr is not power-of-two. Use page order instead. Link: https://lkml.kernel.org/r/20240226205534.1603748-5-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm/memcg: use order instead of nr in split_page_memcg()Zi Yan
We do not have non power of two pages, using nr is error prone if nr is not power-of-two. Use page order instead. Link: https://lkml.kernel.org/r/20240226205534.1603748-4-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: enumerate all gfp flagsSuren Baghdasaryan
Introduce GFP bits enumeration to let compiler track the number of used bits (which depends on the config options) instead of hardcoding them. That simplifies __GFP_BITS_SHIFT calculation. Link: https://lkml.kernel.org/r/20240224015800.2569851-1-surenb@google.com Suggested-by: Petr Tesařík <petr@tesarici.cz> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Petr Tesarik <petr@tesarici.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL ↵Vlastimil Babka
allocations Sven reports an infinite loop in __alloc_pages_slowpath() for costly order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination can happen in a suspend/resume context where a GFP_KERNEL allocation can have __GFP_IO masked out via gfp_allowed_mask. Quoting Sven: 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) with __GFP_RETRY_MAYFAIL set. 2. page alloc's __alloc_pages_slowpath tries to get a page from the freelist. This fails because there is nothing free of that costly order. 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, which bails out because a zone is ready to be compacted; it pretends to have made a single page of progress. 4. page alloc tries to compact, but this always bails out early because __GFP_IO is not set (it's not passed by the snd allocator, and even if it were, we are suspending so the __GFP_IO flag would be cleared anyway). 5. page alloc believes reclaim progress was made (because of the pretense in item 3) and so it checks whether it should retry compaction. The compaction retry logic thinks it should try again, because: a) reclaim is needed because of the early bail-out in item 4 b) a zonelist is suitable for compaction 6. goto 2. indefinite stall. (end quote) The immediate root cause is confusing the COMPACT_SKIPPED returned from __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be indicating a lack of order-0 pages, and in step 5 evaluating that in should_compact_retry() as a reason to retry, before incrementing and limiting the number of retries. There are however other places that wrongly assume that compaction can happen while we lack __GFP_IO. To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO evaluation and switch the open-coded test in try_to_compact_pages() to use it. Also use the new helper in: - compaction_ready(), which will make reclaim not bail out in step 3, so there's at least one attempt to actually reclaim, even if chances are small for a costly order - in_reclaim_compaction() which will make should_continue_reclaim() return false and we don't over-reclaim unnecessarily - in __alloc_pages_slowpath() to set a local variable can_compact, which is then used to avoid retrying reclaim/compaction for costly allocations (step 5) if we can't compact and also to skip the early compaction attempt that we do in some cases Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sven van Ashbrook <svenva@chromium.org> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/ Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Curtis Malainey <cujomalainey@chromium.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04Merge branch 'kvm-arm64/vfio-normal-nc' of ↵Alex Williamson
https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux into v6.9/vfio/next
2024-03-04bpf: struct_ops supports more than one page for trampolines.Kui-Feng Lee
The BPF struct_ops previously only allowed one page of trampolines. Each function pointer of a struct_ops is implemented by a struct_ops bpf program. Each struct_ops bpf program requires a trampoline. The following selftest patch shows each page can hold a little more than 20 trampolines. While one page is more than enough for the tcp-cc usecase, the sched_ext use case shows that one page is not always enough and hits the one page limit. This patch overcomes the one page limit by allocating another page when needed and it is limited to a total of MAX_IMAGE_PAGES (8) pages which is more than enough for reasonable usages. The variable st_map->image has been changed to st_map->image_pages, and its type has been changed to an array of pointers to pages. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04of: make for_each_property_of_node() available to to !OFBartosz Golaszewski
for_each_property_of_node() is a macro and so doesn't have a stub inline function for !OF. Move it out of the relevant #ifdef to make it available to all users. Fixes: 611cad720148 ("dt: add of_alias_scan and of_alias_get_id") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20240303104853.31511-1-brgl@bgdev.pl Signed-off-by: Rob Herring <robh@kernel.org>
2024-03-04Merge tag 'vfs-6.9.rw_hint' of ↵Christian Brauner
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull write hint fix from Christian Brauner: UFS devices are widely used in mobile applications, e.g. in smartphones. UFS vendors need data lifetime information to achieve good performance. Providing data lifetime information to UFS devices can result in up to 40% lower write amplification. Hence this patch series that restores the bi_write_hint member in struct bio. After this patch series has been merged, patches that implement data lifetime support in the SCSI disk (sd) driver will be sent to the Linux kernel SCSI maintainer. The following changes are included in this patch series: - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls. - Move enum rw_hint into a new header file. - Support F_SET_RW_HINT for block devices to make it easy to test data lifetime support. - Restore the bio.bi_write_hint member and restore support in the VFS layer and also in the block layer for data lifetime information. The shell script that has been used to test the patch series combined with the SCSI patches is available at the end of this cover letter. * tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-04Merge tag 'reset-for-v6.9' of git://git.pengutronix.de/pza/linux into soc/lateArnd Bergmann
Reset controller updates for v6.9 Enable support for the Sophgo SG2042 reset controller via reset-simple, add a GPIO-based reset controller criver for shared GPIO resets, extract an of_phandle_args_equal() helper function out of cpufreq, and use it in reset-gpio. Based on v6.8-rc5 because reset-gpio depends on commits in the gpio-driver-h-stubs-for-v6.8-rc5 tag. * tag 'reset-for-v6.9' of git://git.pengutronix.de/pza/linux: reset: Instantiate reset GPIO controller for shared reset-gpios reset: gpio: Add GPIO-based reset controller cpufreq: do not open-code of_phandle_args_equal() of: Add of_phandle_args_equal() helper reset: simple: add support for Sophgo SG2042 dt-bindings: reset: sophgo: support SG2042 Link: https://lore.kernel.org/r/20240301111300.4038207-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'omap-for-v6.9/dt-warnings-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into soc/late Update TI clksel clocks to use reg Updates for TI clksel clocks to use the standard reg property instead of the non-standard ti,bit-shift legacy property. There are still lots of TI composite clock related devicetree warnings for missing bindings, and overlapping reg properties. We have grouped some of the TI composite clocks under the clksel clock node, but did not consider the reg property issue. Let's update the existing users before we continue grouping more of the composite clocks. * tag 'omap-for-v6.9/dt-warnings-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift clk: ti: Improve clksel clock bit parsing for reg property clk: ti: Handle possible address in the node name Link: https://lore.kernel.org/r/pull-1709102378-94138@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04tee: make tee_bus_type constRicardo B. Marliere
Since commit d492cc2573a0 ("driver core: device.h: make struct bus_type a const *"), the driver core can properly handle constant struct bus_type, move the tee_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'samsung-drivers-6.9-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers Samsung SoC driver changes for v6.9, part two 1. Extend Exynos PMU (Power Management Unit) driver being also the syscon to main system controller registers block, to support Google GS101. The Google GS101 has PMU registers protected and writing is available only via SMC. The Exynos PMU will register its own custom regmap for such case of mixed MMIO+SMC. 2. Rework Samsung watchdog driver to get the regmap to PMU block not via syscon API, but from the Exynos PMU driver. This is necessary for the watchdog driver to work on Google GS101. * tag 'samsung-drivers-6.9-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs MAINTAINERS: samsung: gs101: match patches touching Google Tensor SoC Link: https://lore.kernel.org/r/20240227080755.34170-1-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'qcom-drivers-for-6.9' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.9 This introduces the Qualcomm Programmable Boot Sequencer (PBS) driver. The Qualcomm SMEM no longer acquires the hwspinlock during the "get" operation, to improve the system behavior during the recovery of a remoteproc that crashed with the hwspinlock held. The Qualcomm Always On Subsystem (AOSS) message protocol driver gains tracepoints, printf annotation, and a debugfs interface is introduced for tweaking system properties during development and debugging. The Qualcomm socinfo driver gains data for SM8475, QCM8550 and QCS8550 platforms, and the PM2250 is renamed to PM4125. Support for controlling the voltage regulator in SPM/SAW2 is introduced. The gfx.lvl power-domain is dropped for SA8540P, as this resource was incorrectly inherited from SC8280XP. Additionally some code cleanup improvements is introduced across APR, LLCC, SMP2P and SPM. * tag 'qcom-drivers-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits) dt-bindings: soc: qcom: qcom,saw2: add msm8226 l2 compatible soc: qcom: spm: add support for voltage regulator soc: qcom: spm: remove driver-internal structures from the driver API dt-bindings: soc: qcom: qcom,saw2: define optional regulator node dt-bindings: soc: qcom: qcom,saw2: add missing compatible strings dt-bindings: soc: qcom: merge qcom,saw2.txt into qcom,spm.yaml soc: qcom: llcc: Check return value on Broadcast_OR reg read soc: qcom: socinfo: Add Soc IDs for SM8475 family dt-bindings: arm: qcom,ids: Add IDs for SM8475 family soc: qcom: apr: make aprbus const dt-bindings: soc: qcom: qcom,pmic-glink: document X1E80100 compatible soc: qcom: add QCOM PBS driver dt-bindings: soc: qcom: Add qcom,pbs bindings pmdomain: qcom: rpmhpd: Drop SA8540P gfx.lvl soc: qcom: socinfo: rename PM2250 to PM4125 soc: qcom: aoss: Add tracepoints in qmp_send() soc: qcom: socinfo: add SoC Info support for QCM8550 and QCS8550 platform dt-bindings: arm: qcom,ids: add SoC ID for QCM8550 and QCS8550 soc: qcom: aoss: Add debugfs interface for sending messages soc: qcom: smem: remove hwspinlock from item get routine ... Link: https://lore.kernel.org/r/20240225030612.480241-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'tegra-for-6.9-soc' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers soc/tegra: Changes for v6.9-rc1 This set of changes adds ACPI support for the APBMISC driver and cleans up a few things like dependencies and unused code. * tag 'tegra-for-6.9-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: pmc: Add SD wake event for Tegra234 soc/tegra: pmc: Update scratch as an optional aperture soc/tegra: pmc: Update address mapping sequence for PMC apertures bus: tegra-aconnect: Update dependency to ARCH_TEGRA soc/tegra: Fix build failure on Tegra241 soc/tegra: fuse: Fix crash in tegra_fuse_readl() soc/tegra: fuse: Define tegra194_soc_attr_group for Tegra241 soc/tegra: fuse: Add support for Tegra241 soc/tegra: fuse: Add ACPI support for Tegra194 and Tegra234 soc/tegra: fuse: Add function to print SKU info soc/tegra: fuse: Add function to add lookups soc/tegra: fuse: Add tegra_acpi_init_apbmisc() soc/tegra: fuse: Refactor resource mapping soc/tegra: fuse: Use dev_err_probe for probe failures mm/util: Introduce kmemdup_array() soc/tegra: pmc: Remove some old and deprecated functions and constants Link: https://lore.kernel.org/r/20240223174849.1509465-1-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'scmi-updates-6.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SCMI updates for v6.9 Quite a few changes to extend support to SCMI v3.2 specification, to enhance notification handling and other miscellaneous updates. 1. Enhancements to notification handling Until now, trying to register a notifier for an unsuppported notification returned an error genrating unneeded message exchanges with the SCMI platform. This can be avoided by looking up in advance the specific protocol and resources available. With these changes SCMI driver user will fail to register a notifier if the related command or resource is not supported (like before) without the need of exchanging any message. Perf notifications are also extended to provide the pre-calculated frequencies corresponding to the level or index carried by the 2. More SCMI v3.2 related updates One of the main addition includes a centralized support to the SCMI core to handle v3.2 optional protocol version negotiation, so that at protocol initialization time, if the platform advertised version is newer than supported by the kernel and protocol version negotiation is supported, the SCMI core will attempt to negotiate an older protocol version. It also includes the clock get permissions which indicates if any of the clock operations are forbidden by the platform for the OSPM agent. It can be used in the clock driver to avoid unnecessary message exchanges between the kernel and the platform which will always end up with the failure. It also includes other missing bits of clock v3.2 protocol so that the supported protocol version can be bumped to 0x30000 (v3.2). 3. Miscellaneous updates This includes addition of warning if the domain frequency multiplier is 0 or rounded off to indicate the actual frequencies are either wrong ot rounded off, hardening of clock domain info lookups, addition of multiple protocols registration support within a SCMI driver, update to SCMI entry in MAINTAINERS to include HWMON driver and constifying the scmi_bus_type structure. This also includes couple for fixes to minor issues: double free in SMC transport cleanup path and struct kernel-doc warnings in optee transport. * tag 'scmi-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (29 commits) MAINTAINERS: Update SCMI entry with HWMON driver firmware: arm_scmi: Update the supported clock protocol version firmware: arm_scmi: Add standard clock OEM definitions firmware: arm_scmi: Add clock check for extended config support firmware: arm_scmi: Add support for v3.2 NEGOTIATE_PROTOCOL_VERSION firmware: arm_scmi: Fix struct kernel-doc warnings in optee transport firmware: arm_scmi: Report frequencies in the perf notifications firmware: arm_scmi: Use opps_by_lvl to store opps firmware: arm_scmi: Implement is_notify_supported callback in powercap protocol firmware: arm_scmi: Implement is_notify_supported callback in reset protocol firmware: arm_scmi: Implement is_notify_supported callback in sensor protocol firmware: arm_scmi: Implement is_notify_supported callback in clock protocol firmware: arm_scmi: Implement is_notify_supported callback in system power protocol firmware: arm_scmi: Implement is_notify_supported callback in power protocol firmware: arm_scmi: Implement is_notify_supported callback in perf protocol firmware: arm_scmi: Add a common helper to check if a message is supported firmware: arm_scmi: Check for notification support firmware: arm_scmi: Make scmi_bus_type const firmware: arm_scmi: Fix double free in SMC transport cleanup path firmware: arm_scmi: Implement clock get permissions ... Link: https://lore.kernel.org/r/20240223033435.118028-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04i2c: constify the struct device_type usageRicardo B. Marliere
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the i2c_adapter_type and i2c_client_type variables to be constant structures as well, placing it into read-only memory which can not be modified at runtime. Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-03-04smp: Consolidate smp_prepare_boot_cpu()Thomas Gleixner
There is no point in having seven architectures implementing the same empty stub. Provide a weak function in the init code and remove the stubs. This also allows to utilize the function on UP which is required to sanitize the per CPU handling on X86 UP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240304005104.567671691@linutronix.de
2024-03-04net: adopt skb_network_header_len() more broadlyEric Dumazet
(skb_transport_header(skb) - skb_network_header(skb)) can be replaced by skb_network_header_len(skb) Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len() to catch cases were the transport_header was not set. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-03Input: serio - make serio_bus constRicardo B. Marliere
Now that the driver core can properly handle constant struct bus_type, move the serio_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240210-bus_cleanup-input2-v1-2-0daef7e034e0@marliere.net Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-03-03soundwire: constify the struct device_type usageRicardo B. Marliere
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the sdw_master_type and sdw_slave_type variables to be constant structures as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240219-device_cleanup-soundwire-v1-1-9edd51767611@marliere.net Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-03-03of: Reimplement of_machine_is_compatible() using of_machine_compatible_match()Christophe Leroy
of_machine_compatible_match() works with a table of strings. of_machine_is_compatible() is a simplier version with only one string. Re-implement of_machine_is_compatible() by setting a table of strings with a single string then using of_machine_compatible_match(). Suggested-by: Rob Herring <robh+dt@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231214103152.12269-3-mpe@ellerman.id.au
2024-03-03of: Change of_machine_is_compatible() to return boolMichael Ellerman
of_machine_is_compatible() currently returns a positive integer if it finds a match. However none of the callers ever check the value, they all treat it as a true/false. So change of_machine_is_compatible() to return bool, which will allow the implementation to be changed in a subsequent patch. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231214103152.12269-2-mpe@ellerman.id.au
2024-03-03of: Add of_machine_compatible_match()Michael Ellerman
We have of_machine_is_compatible() to check if a machine is compatible with a single compatible string. However some code is able to support multiple compatible boards, and so wants to check for one of many compatible strings. So add of_machine_compatible_match() which takes a NULL terminated array of compatible strings to check against the root node's compatible property. Compared to an open coded match this is slightly more self documenting, and also avoids the caller needing to juggle the root node either directly or via of_find_node_by_path(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231214103152.12269-1-mpe@ellerman.id.au
2024-03-03gpio: nomadik: Finish conversion to use firmware node APIsAndy Shevchenko
Previously driver got a few updates in order to replace OF APIs by respective firmware node, however it was not finished to the logical end, e.g., some APIs that has been used are still require OF node to be passed. Finish that job by converting leftovers to use firmware node APIs. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20240302173401.217830-1-andy.shevchenko@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2024-03-02Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-02nvmet-rdma: set max_queue_size for RDMA transportMax Gurtovoy
A new port configuration was added to set max_queue_size. Clamp user configuration to RDMA transport limits. Increase the maximal queue size of RDMA controllers from 128 to 256 (the default size stays 128 same as before). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02nvme-rdma: introduce NVME_RDMA_MAX_METADATA_QUEUE_SIZE definitionMax Gurtovoy
This definition will be used by controllers that are configured with metadata support. For now, both regular and metadata controllers have the same maximal queue size but later commit will increase the maximal queue size for regular RDMA controllers to 256. We'll keep the maximal queue size for metadata controllers to be 128 since there are more resources that are needed for metadata operations and 128 is the optimal size found for metadata controllers base on testing. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02nvme-rdma: move NVME_RDMA_IP_PORT from common fileMax Gurtovoy
The correct place for this definition is the nvme rdma header file and not the common nvme header file. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02USB: typec: no opencoding FIELD_GETOliver Neukum
We have a macro. It should be used. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20240229132401.3270-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-02Merge tag 'thunderbolt-for-v6.9-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next Mika writes: thunderbolt: Changes for v6.9 merge window This includes following USB4/Thunderbolt changes for the v6.9 merge window: - Reset the topology also for USB4 v1 routers on driver load - DisplayPort tunneling and bandwidth allocation mode improvements - Tracepoint support for the control channel - Couple of minor fixes and cleanups. All these have been in linux-next with no reported issues. * tag 'thunderbolt-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (23 commits) thunderbolt: Constify the struct device_type usage thunderbolt: Add trace events support for the control channel thunderbolt: Keep the domain powered when USB4 port is in redrive mode thunderbolt: Improve DisplayPort tunnel setup process to be more robust thunderbolt: Calculate DisplayPort tunnel bandwidth after DPRX capabilities read thunderbolt: Reserve released DisplayPort bandwidth for a group for 10 seconds thunderbolt: Introduce tb_tunnel_direction_downstream() thunderbolt: Re-order bandwidth group functions thunderbolt: Fail the failed bandwidth request properly thunderbolt: Log an error if DPTX request is not cleared thunderbolt: Handle bandwidth allocation mode disable request thunderbolt: Re-calculate estimated bandwidth when allocation mode is enabled thunderbolt: Use DP_LOCAL_CAP for maximum bandwidth calculation thunderbolt: Correct typo in host_reset parameter thunderbolt: Skip discovery also in USB4 v2 host thunderbolt: Reset only non-USB4 host routers in resume thunderbolt: Remove usage of the deprecated ida_simple_xx() API thunderbolt: Fix rollback in tb_port_lane_bonding_enable() for lane 1 thunderbolt: Fix XDomain rx_lanes_show and tx_lanes_show thunderbolt: Reset topology created by the boot firmware ...
2024-03-02Merge tag 'coresight-next-v6.9' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: hwtracing subsystem updates for v6.9 Changes targeting Linux v6.9 include: - CoreSight: Enable W=1 warnings as default - CoreSight: Clean up sysfs/perf mode handling for tracing - Support for Qualcomm TPDM CMB Dataset - Miscellaneous fixes to the CoreSight subsystem - Fix for hisi_ptt PMU to reject events targeting other PMUs Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (32 commits) coresight-tpda: Change qcom,dsb-element-size to qcom,dsb-elem-bits dt-bindings: arm: qcom,coresight-tpdm: Rename qcom,dsb-element-size hwtracing: hisi_ptt: Move type check to the beginning of hisi_ptt_pmu_event_init() coresight: tpdm: Fix build break due to uninitialised field coresight: etm4x: Set skip_power_up in etm4_init_arch_data function coresight-tpdm: Add msr register support for CMB dt-bindings: arm: qcom,coresight-tpdm: Add support for TPDM CMB MSR register coresight-tpdm: Add timestamp control register support for the CMB coresight-tpdm: Add pattern registers support for CMB coresight-tpdm: Add support to configure CMB coresight-tpda: Add support to configure CMB element coresight-tpdm: Add CMB dataset support dt-bindings: arm: qcom,coresight-tpdm: Add support for CMB element size coresight-tpdm: Optimize the useage of tpdm_has_dsb_dataset coresight-tpdm: Optimize the store function of tpdm simple dataset coresight: Add helper for setting csdev->mode coresight: Add a helper for getting csdev->mode coresight: Add helper for atomically taking the device coresight: Add explicit member initializers to coresight_dev_type coresight: Remove unused stubs ...
2024-03-02Merge tag 'mhi-for-v6.9' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== - Added new MHI_PM_SYS_ERR_FAIL state to the MHI state machine to properly cleanup the channel state if the device fails to respond to the MHI reset during SYS_ERR handling. This issue was discovered with the Qualcomm AIC100 AI accelerator device. - Modified the code that reads and exposes the OEM_PK_HASH registers through sysfs to read them on-demand instead of reading once during boot. Qualcomm AIC100 devices support provisioning the keys dynamically, so this allows the users to know the upto date information. - Added tracepoint support to expose the debug information over tracefs. - Reverted the commit that reads the MHI device revision from the device during boot. This is done because the read info was not used anywhere (dead code) and also it is not possible to read the revision info from all the devices. - Constified the modem config for Telit FN980 modem as required by the MHI core. MHI Endpoint ============ - Replaced kzalloc() with kcalloc() in an effort to avoid integer overflows during multiplication. Even though there is no potential overflow in the endpoint code, this is done for the sake of uniformity and best practice. - Fixed the kmem_cache_create() failure check to use the correct variable. * tag 'mhi-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: constify modem_telit_fn980_hw_v1_config bus: mhi: host: Change the trace string for the userspace tools mapping bus: mhi: ep: check the correct variable in mhi_ep_register_controller() Revert "bus: mhi: core: Add support for reading MHI info from device" bus: mhi: host: Add tracing support bus: mhi: ep: Use kcalloc() instead of kzalloc() bus: mhi: host: Read PK HASH dynamically bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
2024-03-02block: define bvec_iter as __packed __aligned(4)Ming Lei
In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"), what we need is to save the 4byte padding, and avoid `bio` to spread on one extra cache line. It is enough to define it as '__packed __aligned(4)', as '__packed' alone means byte aligned, and can cause compiler to generate horrible code on architectures that don't support unaligned access in case that bvec_iter is embedded in other structures. Cc: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-01net/mlx5: Check capability for fw_resetMoshe Shemesh
Functions which can't access MFRL (Management Firmware Reset Level) register, have no use of fw_reset structures or events. Remove fw_reset structures allocation and registration for fw reset events notifications for these functions. Having the devlink param enable_remote_dev_reset on functions that don't have this capability is misleading as these functions are not allowed to influence the reset flow. Hence, this patch removes this parameter for such functions. In addition, return not supported on devlink reload action fw_activate for these functions. Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01compiler.h: Explain how __is_constexpr() worksKees Cook
The __is_constexpr() macro is dark magic. Shed some light on it with a comment to explain how and why it works. Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240301044428.work.411-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01overflow: Allow non-type arg to type_max() and type_min()Kees Cook
A common use of type_max() is to find the max for the type of a variable. Using the pattern type_max(typeof(var)) is needlessly verbose. Instead, since typeof(type) == type we can just explicitly call typeof() on the argument to type_max() and type_min(). Add wrappers for readability. We can do some replacements right away: $ git grep '\btype_\(min\|max\)(typeof' | wc -l 11 Link: https://lore.kernel.org/r/20240301062221.work.840-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01power: supply: core: make power_supply_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the power_supply_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240301-class_cleanup-power-v1-1-97e0b7bf9c94@marliere.net Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-03-01sched/idle: Conditionally handle tick broadcast in default_idle_call()Thomas Gleixner
The x86 architecture has an idle routine for AMD CPUs which are affected by erratum 400. On the affected CPUs the local APIC timer stops in the C1E halt state. It therefore requires tick broadcasting. The invocation of tick_broadcast_enter()/exit() from this function violates the RCU constraints because it can end up in lockdep or tracing, which rightfully triggers a warning. tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter() and after ct_cpuidle_exit() in default_idle_call(). Add a static branch conditional invocation of tick_broadcast_enter()/exit() into this function to allow X86 to replace the AMD specific idle code. It's guarded by a config switch which will be selected by x86. Otherwise it's a NOOP. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240229142248.266708822@linutronix.de
2024-03-01block: add a queue_limits_stack_bdev helperChristoph Hellwig
Add a small wrapper around blk_stack_limits that allows passing a bdev for the bottom device and prints an error in case of misaligned device. The name fits into the new queue limits API and the intent is to eventually replace disk_stack_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01block: add a queue_limits_set helperChristoph Hellwig
Add a small wrapper around queue_limits_commit_update for stacking drivers that don't want to update existing limits, but set an entirely new set. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01svcrdma: Add Write chunk WRs to the RPC's Send WR chainChuck Lever
Chain RDMA Writes that convey Write chunks onto the local Send chain. This means all WRs for an RPC Reply are now posted with a single ib_post_send() call, and there is a single Send completion when all of these are done. That reduces both the per-transport doorbell rate and completion rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Post WRs for Write chunks in svc_rdma_sendto()Chuck Lever
Refactor to eventually enable svcrdma to post the Write WRs for each RPC response using the same ib_post_send() as the Send WR (ie, as a single WR chain). svc_rdma_result_payload (originally svc_rdma_read_payload) was added so that the upper layer XDR encoder could identify a range of bytes to be possibly conveyed by RDMA (if a Write chunk was provided by the client). The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while XDR encoding replies") was to post as much of the result payload outside of svc_rdma_sendto() as possible because svc_rdma_sendto() used to be called with the xpt_mutex held. However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into socket xpo_sendto methods"), the xpt_mutex is no longer held when calling svc_rdma_sendto(). Thus, that benefit is no longer an issue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01svcrdma: Post the Reply chunk and Send WR togetherChuck Lever
Reduce the doorbell and Send completion rates when sending RPC/RDMA replies that have Reply chunks. NFS READDIR procedures typically return their result in a Reply chunk, for example. Instead of calling ib_post_send() to post the Write WRs for the Reply chunk, and then calling it again to post the Send WR that conveys the transport header, chain the Write WRs to the Send WR and call ib_post_send() only once. Thanks to the Send Queue completion ordering rules, when the Send WR completes, that guarantees that Write WRs posted before it have also completed successfully. Thus all Write WRs for the Reply chunk can remain unsignaled. Instead of handling a Write completion and then a Send completion, only the Send completion is seen, and it handles clean up for both the Writes and the Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>