|
Pass a pointer to the lruvec so we can take advantage of the
folio_lruvec_relock_irqsave(). Adjust the calling convention of
folio_lruvec_relock_irqsave() to suit and add a page_cache_release()
wrapper.
Link: https://lkml.kernel.org/r/20240227174254.710559-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Almost identical to mem_cgroup_uncharge_list(), except it takes a
folio_batch instead of a list_head.
Link: https://lkml.kernel.org/r/20240227174254.710559-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Rearrange batched folio freeing", v3.
Other than the obvious "remove calls to compound_head" changes, the
fundamental belief here is that iterating a linked list is much slower
than iterating an array (5-15x slower in my testing). There's also an
associated belief that since we iterate the batch of folios three times,
we do better when the array is small (ie 15 entries) than we do with a
batch that is hundreds of entries long, which only gives us the
opportunity for the first pages to fall out of cache by the time we get to
the end.
It is possible we should increase the size of folio_batch. Hopefully the
bots let us know if this introduces any performance regressions.
This patch (of 3):
By making release_pages() call folios_put(), we can get rid of the calls
to compound_head() for the callers that already know they have folios. We
can also get rid of the lock_batch tracking as we know the size of the
batch is limited by folio_batch. This does reduce the maximum number of
pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX (32) to
PAGEVEC_SIZE (15). I do not expect this to make a significant difference,
but if it does, we can increase PAGEVEC_SIZE to 31.
Link: https://lkml.kernel.org/r/20240227174254.710559-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20240227174254.710559-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
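The batching idea described above can be sketched in plain userspace C. This is only an illustration, not the kernel code: struct ex_item, struct ex_batch and every ex_* function are hypothetical stand-ins, and only the batch size of 15 is taken from the message.

```c
#include <stddef.h>

#define EX_BATCH_SIZE 15	/* mirrors the PAGEVEC_SIZE (15) quoted above */

/* Hypothetical stand-in for a folio; only the refcount matters here. */
struct ex_item {
	int refcount;
};

/* Hypothetical fixed-size batch: a small array instead of a linked list. */
struct ex_batch {
	unsigned int nr;
	struct ex_item *items[EX_BATCH_SIZE];
};

/* Add one item; returns the number of free slots left (0 means "drain now"). */
static unsigned int ex_batch_add(struct ex_batch *b, struct ex_item *it)
{
	b->items[b->nr++] = it;
	return EX_BATCH_SIZE - b->nr;
}

/* Drop one reference on every batched item; returns how many reached zero. */
static unsigned int ex_batch_put_all(struct ex_batch *b)
{
	unsigned int i, freed = 0;

	for (i = 0; i < b->nr; i++) {
		if (--b->items[i]->refcount == 0)
			freed++;
	}
	b->nr = 0;
	return freed;
}

/* Release an arbitrarily long array by draining it in small batches. */
static unsigned int ex_release_all(struct ex_item **items, size_t n)
{
	struct ex_batch b = { .nr = 0 };
	unsigned int freed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ex_batch_add(&b, items[i]) == 0)
			freed += ex_batch_put_all(&b);
	}
	if (b.nr)
		freed += ex_batch_put_all(&b);
	return freed;
}
```

Because a batch never holds more than EX_BATCH_SIZE entries, no separate counter is needed to bound the work done per drain, which is the property the message relies on when it drops the lock_batch tracking.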
|
|
All users of total_mapcount() are gone, let's remove it.
Link: https://lkml.kernel.org/r/20240226141324.278526-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
To split a THP to any lower order pages, we need to reform THPs on
subpages at given order and add page refcount based on the new page order.
Also we need to reinitialize page_deferred_list after removing the page
from the split_queue, otherwise a subsequent split will see list
corruption when checking the page_deferred_list again.
Note: Anonymous order-1 folio is not supported because _deferred_list,
which is used by partially mapped folios, is stored in subpage 2 and an
order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
fine, since they do not use _deferred_list.
[ziy@nvidia.com: fixup per discussion with Ryan]
Link: https://lkml.kernel.org/r/494F48CD-1F0F-4CAD-884E-6D48F40AF990@nvidia.com
Link: https://lkml.kernel.org/r/20240226205534.1603748-8-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It adds a new_order parameter to set new page order in page owner. It
prepares for upcoming changes to support split huge page to any lower
order.
Link: https://lkml.kernel.org/r/20240226205534.1603748-7-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It sets memcg information for the pages after the split. A new parameter
new_order is added to tell the order of subpages in the new page, always 0
for now. It prepares for upcoming changes to support split huge page to
any lower order.
Link: https://lkml.kernel.org/r/20240226205534.1603748-6-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pages never come in non-power-of-two sizes, so passing nr is error prone:
nothing stops a caller from passing a value that is not a power of two.
Use the page order instead.
Link: https://lkml.kernel.org/r/20240226205534.1603748-5-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pages never come in non-power-of-two sizes, so passing nr is error prone:
nothing stops a caller from passing a value that is not a power of two.
Use the page order instead.
Link: https://lkml.kernel.org/r/20240226205534.1603748-4-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
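The reasoning behind the two commits above can be shown with a small sketch. The helper names are hypothetical and not taken from the patches; the point is only that a count derived from an order is a power of two by construction, while a raw count has to be checked.

```c
#include <stdbool.h>

/* A page count derived from an order is always a power of two. */
static inline unsigned long ex_order_to_nr(unsigned int order)
{
	return 1UL << order;
}

/* A raw count carries no such guarantee and would need a check like this. */
static inline bool ex_is_power_of_two(unsigned long nr)
{
	return nr != 0 && (nr & (nr - 1)) == 0;
}
```

An interface that takes the order therefore cannot be handed a value such as 12, whereas an interface that takes nr can.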
|
|
Introduce GFP bits enumeration to let compiler track the number of used
bits (which depends on the config options) instead of hardcoding them.
That simplifies __GFP_BITS_SHIFT calculation.
Link: https://lkml.kernel.org/r/20240224015800.2569851-1-surenb@google.com
Suggested-by: Petr Tesařík <petr@tesarici.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
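A minimal sketch of the enumeration technique, assuming made-up flag names: EX_HAVE_OPTIONAL_FLAG stands in for a config option and none of the EX_* identifiers exist in the kernel. The last enumerator counts the bits in use, so the shift never has to be hardcoded.

```c
/* Hypothetical switch; in the kernel this role is played by config options. */
#define EX_HAVE_OPTIONAL_FLAG 1

enum {
	EX_FLAG_A_BIT,
	EX_FLAG_B_BIT,
#if EX_HAVE_OPTIONAL_FLAG
	EX_FLAG_OPTIONAL_BIT,	/* only consumes a bit when configured in */
#endif
	EX_FLAG_C_BIT,
	EX_LAST_BIT		/* the compiler keeps this equal to the bit count */
};

#define EX_BIT(nr)		(1u << (nr))
#define EX_FLAG_A		EX_BIT(EX_FLAG_A_BIT)
#define EX_FLAG_B		EX_BIT(EX_FLAG_B_BIT)
#if EX_HAVE_OPTIONAL_FLAG
#define EX_FLAG_OPTIONAL	EX_BIT(EX_FLAG_OPTIONAL_BIT)
#else
#define EX_FLAG_OPTIONAL	0u
#endif
#define EX_FLAG_C		EX_BIT(EX_FLAG_C_BIT)

/* Derived from the enum rather than computed by hand. */
#define EX_BITS_SHIFT		EX_LAST_BIT
#define EX_BITS_MASK		(EX_BIT(EX_BITS_SHIFT) - 1)
```

Setting EX_HAVE_OPTIONAL_FLAG to 0 shrinks EX_BITS_SHIFT from 4 to 3 without touching any other line.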
|
|
allocations
Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.
Quoting Sven:
1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
with __GFP_RETRY_MAYFAIL set.
2. page alloc's __alloc_pages_slowpath tries to get a page from the
freelist. This fails because there is nothing free of that costly
order.
3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
which bails out because a zone is ready to be compacted; it pretends
to have made a single page of progress.
4. page alloc tries to compact, but this always bails out early because
__GFP_IO is not set (it's not passed by the snd allocator, and even
if it were, we are suspending so the __GFP_IO flag would be cleared
anyway).
5. page alloc believes reclaim progress was made (because of the
pretense in item 3) and so it checks whether it should retry
compaction. The compaction retry logic thinks it should try again,
because:
a) reclaim is needed because of the early bail-out in item 4
b) a zonelist is suitable for compaction
6. goto 2. indefinite stall.
(end quote)
The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries. There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.
To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.
Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
there's at least one attempt to actually reclaim, even if chances are
small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
which is then used to avoid retrying reclaim/compaction for costly
allocations (step 5) if we can't compact and also to skip the early
compaction attempt that we do in some cases
Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
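The shape of the fix can be sketched in userspace C. The flag values, the EX_COMPACTION_ENABLED switch and both function names are assumptions made for illustration; the retry helper is a heavy simplification of the slowpath logic and only models the can_compact decision described above.

```c
#include <stdbool.h>

typedef unsigned int ex_gfp_t;

#define EX_GFP_IO		0x1u	/* made-up value, stands in for __GFP_IO */
#define EX_GFP_FS		0x2u	/* made-up value */
#define EX_COMPACTION_ENABLED	1	/* stands in for a build-time config check */

/* One place that answers "may this allocation compact at all?". */
static inline bool ex_compaction_allowed(ex_gfp_t gfp_mask)
{
	return EX_COMPACTION_ENABLED && (gfp_mask & EX_GFP_IO);
}

/*
 * Simplified retry decision: a costly-order allocation that cannot compact
 * must not keep looping through reclaim/compaction retries.
 */
static inline bool ex_may_retry(ex_gfp_t gfp_mask, bool costly_order)
{
	bool can_compact = ex_compaction_allowed(gfp_mask);

	if (costly_order && !can_compact)
		return false;
	return true;
}
```

With the decision centralized, a mask that has lost its I/O bit (as in the suspend case) yields the same answer in every caller instead of being misread as "retry".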
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux into v6.9/vfio/next
|
|
The BPF struct_ops previously only allowed one page of trampolines.
Each function pointer of a struct_ops is implemented by a struct_ops
bpf program. Each struct_ops bpf program requires a trampoline.
The following selftest patch shows each page can hold a little more
than 20 trampolines.
While one page is more than enough for the tcp-cc usecase,
the sched_ext use case shows that one page is not always enough and hits
the one page limit. This patch overcomes the one page limit by allocating
another page when needed and it is limited to a total of
MAX_IMAGE_PAGES (8) pages which is more than enough for
reasonable usages.
The variable st_map->image has been changed to st_map->image_pages, and
its type has been changed to an array of pointers to pages.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
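A userspace sketch of the "allocate another page when needed" scheme described above. Everything here is hypothetical (struct ex_image, ex_image_reserve(), the 4096-byte page and the use of malloc()); only the limit of 8 pages comes from the message, and the real code manages executable trampoline images rather than heap memory.

```c
#include <stdlib.h>

#define EX_PAGE_SIZE		4096u	/* assumed page size */
#define EX_MAX_IMAGE_PAGES	8	/* mirrors MAX_IMAGE_PAGES (8) */

/* Hypothetical image: an array of page pointers instead of a single page. */
struct ex_image {
	void *pages[EX_MAX_IMAGE_PAGES];
	unsigned int nr_pages;
	unsigned int used_in_last;	/* bytes consumed in the newest page */
};

/* Reserve room for one trampoline; NULL once the page limit is reached. */
static void *ex_image_reserve(struct ex_image *img, unsigned int size)
{
	void *p;

	if (size == 0 || size > EX_PAGE_SIZE)
		return NULL;

	/* Start a new page when there is none yet or the current one is full. */
	if (img->nr_pages == 0 || img->used_in_last + size > EX_PAGE_SIZE) {
		if (img->nr_pages == EX_MAX_IMAGE_PAGES)
			return NULL;
		p = malloc(EX_PAGE_SIZE);
		if (!p)
			return NULL;
		img->pages[img->nr_pages++] = p;
		img->used_in_last = 0;
	}

	p = (char *)img->pages[img->nr_pages - 1] + img->used_in_last;
	img->used_in_last += size;
	return p;
}

/* Release every page that was allocated. */
static void ex_image_free(struct ex_image *img)
{
	while (img->nr_pages)
		free(img->pages[--img->nr_pages]);
	img->used_in_last = 0;
}
```

Assuming a trampoline of 190 bytes (a made-up size chosen so that a page holds 21 of them), the 22nd reservation triggers the second page and the 169th fails.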
|
|
for_each_property_of_node() is a macro and so doesn't have a stub inline
function for !OF. Move it out of the relevant #ifdef to make it available
to all users.
Fixes: 611cad720148 ("dt: add of_alias_scan and of_alias_get_id")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20240303104853.31511-1-brgl@bgdev.pl
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull write hint fix from Christian Brauner:
UFS devices are widely used in mobile applications, e.g. in smartphones.
UFS vendors need data lifetime information to achieve good performance.
Providing data lifetime information to UFS devices can result in up to
40% lower write amplification. Hence this patch series that restores the
bi_write_hint member in struct bio. After this patch series has been
merged, patches that implement data lifetime support in the SCSI disk
(sd) driver will be sent to the Linux kernel SCSI maintainer.
The following changes are included in this patch series:
- Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls.
- Move enum rw_hint into a new header file.
- Support F_SET_RW_HINT for block devices to make it easy to test data
lifetime support.
- Restore the bio.bi_write_hint member and restore support in the VFS
layer and also in the block layer for data lifetime information.
The shell script that has been used to test the patch series combined
with the SCSI patches is available at the end of this cover letter.
* tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Reset controller updates for v6.9
Enable support for the Sophgo SG2042 reset controller via reset-simple,
add a GPIO-based reset controller driver for shared GPIO resets, extract
an of_phandle_args_equal() helper function out of cpufreq, and use it in
reset-gpio.
Based on v6.8-rc5 because reset-gpio depends on commits in the
gpio-driver-h-stubs-for-v6.8-rc5 tag.
* tag 'reset-for-v6.9' of git://git.pengutronix.de/pza/linux:
reset: Instantiate reset GPIO controller for shared reset-gpios
reset: gpio: Add GPIO-based reset controller
cpufreq: do not open-code of_phandle_args_equal()
of: Add of_phandle_args_equal() helper
reset: simple: add support for Sophgo SG2042
dt-bindings: reset: sophgo: support SG2042
Link: https://lore.kernel.org/r/20240301111300.4038207-1-p.zabel@pengutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into soc/late
Update TI clksel clocks to use reg
Updates for TI clksel clocks to use the standard reg property instead of
the non-standard ti,bit-shift legacy property.
There are still lots of TI composite clock related devicetree warnings for
missing bindings, and overlapping reg properties. We have grouped some of
the TI composite clocks under the clksel clock node, but did not consider
the reg property issue. Let's update the existing users before we continue
grouping more of the composite clocks.
* tag 'omap-for-v6.9/dt-warnings-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift
ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift
clk: ti: Improve clksel clock bit parsing for reg property
clk: ti: Handle possible address in the node name
Link: https://lore.kernel.org/r/pull-1709102378-94138@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Since commit d492cc2573a0 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type. Move the tee_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers
Samsung SoC driver changes for v6.9, part two
1. Extend Exynos PMU (Power Management Unit) driver being also the
syscon to main system controller registers block, to support Google
GS101. The Google GS101 has PMU registers protected and writing is
available only via SMC. The Exynos PMU will register its own custom
regmap for such case of mixed MMIO+SMC.
2. Rework Samsung watchdog driver to get the regmap to PMU block not
via syscon API, but from the Exynos PMU driver. This is necessary
for the watchdog driver to work on Google GS101.
* tag 'samsung-drivers-6.9-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs
soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs
MAINTAINERS: samsung: gs101: match patches touching Google Tensor SoC
Link: https://lore.kernel.org/r/20240227080755.34170-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers
Qualcomm driver updates for v6.9
This introduces the Qualcomm Programmable Boot Sequencer (PBS) driver.
The Qualcomm SMEM no longer acquires the hwspinlock during the "get"
operation, to improve the system behavior during the recovery of a
remoteproc that crashed with the hwspinlock held.
The Qualcomm Always On Subsystem (AOSS) message protocol driver gains
tracepoints, printf annotation, and a debugfs interface is introduced
for tweaking system properties during development and debugging.
The Qualcomm socinfo driver gains data for SM8475, QCM8550 and
QCS8550 platforms, and the PM2250 is renamed to PM4125.
Support for controlling the voltage regulator in SPM/SAW2 is introduced.
The gfx.lvl power-domain is dropped for SA8540P, as this resource was
incorrectly inherited from SC8280XP.
Additionally, some code cleanup improvements are introduced across APR,
LLCC, SMP2P and SPM.
* tag 'qcom-drivers-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits)
dt-bindings: soc: qcom: qcom,saw2: add msm8226 l2 compatible
soc: qcom: spm: add support for voltage regulator
soc: qcom: spm: remove driver-internal structures from the driver API
dt-bindings: soc: qcom: qcom,saw2: define optional regulator node
dt-bindings: soc: qcom: qcom,saw2: add missing compatible strings
dt-bindings: soc: qcom: merge qcom,saw2.txt into qcom,spm.yaml
soc: qcom: llcc: Check return value on Broadcast_OR reg read
soc: qcom: socinfo: Add Soc IDs for SM8475 family
dt-bindings: arm: qcom,ids: Add IDs for SM8475 family
soc: qcom: apr: make aprbus const
dt-bindings: soc: qcom: qcom,pmic-glink: document X1E80100 compatible
soc: qcom: add QCOM PBS driver
dt-bindings: soc: qcom: Add qcom,pbs bindings
pmdomain: qcom: rpmhpd: Drop SA8540P gfx.lvl
soc: qcom: socinfo: rename PM2250 to PM4125
soc: qcom: aoss: Add tracepoints in qmp_send()
soc: qcom: socinfo: add SoC Info support for QCM8550 and QCS8550 platform
dt-bindings: arm: qcom,ids: add SoC ID for QCM8550 and QCS8550
soc: qcom: aoss: Add debugfs interface for sending messages
soc: qcom: smem: remove hwspinlock from item get routine
...
Link: https://lore.kernel.org/r/20240225030612.480241-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
soc/tegra: Changes for v6.9-rc1
This set of changes adds ACPI support for the APBMISC driver and cleans
up a few things like dependencies and unused code.
* tag 'tegra-for-6.9-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: pmc: Add SD wake event for Tegra234
soc/tegra: pmc: Update scratch as an optional aperture
soc/tegra: pmc: Update address mapping sequence for PMC apertures
bus: tegra-aconnect: Update dependency to ARCH_TEGRA
soc/tegra: Fix build failure on Tegra241
soc/tegra: fuse: Fix crash in tegra_fuse_readl()
soc/tegra: fuse: Define tegra194_soc_attr_group for Tegra241
soc/tegra: fuse: Add support for Tegra241
soc/tegra: fuse: Add ACPI support for Tegra194 and Tegra234
soc/tegra: fuse: Add function to print SKU info
soc/tegra: fuse: Add function to add lookups
soc/tegra: fuse: Add tegra_acpi_init_apbmisc()
soc/tegra: fuse: Refactor resource mapping
soc/tegra: fuse: Use dev_err_probe for probe failures
mm/util: Introduce kmemdup_array()
soc/tegra: pmc: Remove some old and deprecated functions and constants
Link: https://lore.kernel.org/r/20240223174849.1509465-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers
Arm SCMI updates for v6.9
Quite a few changes to extend support to SCMI v3.2 specification,
to enhance notification handling and other miscellaneous updates.
1. Enhancements to notification handling
Until now, trying to register a notifier for an unsupported
notification returned an error, generating unneeded message exchanges
with the SCMI platform. This can be avoided by looking up in advance
the specific protocol and resources available.
With these changes SCMI driver user will fail to register a notifier
if the related command or resource is not supported (like before)
without the need of exchanging any message.
Perf notifications are also extended to provide the pre-calculated
frequencies corresponding to the level or index carried by the
2. More SCMI v3.2 related updates
One of the main additions is centralized support in the SCMI
core to handle v3.2 optional protocol version negotiation, so that
at protocol initialization time, if the platform advertised version
is newer than supported by the kernel and protocol version negotiation
is supported, the SCMI core will attempt to negotiate an older protocol
version.
It also includes the clock get permissions which indicates if any of
the clock operations are forbidden by the platform for the OSPM agent.
It can be used in the clock driver to avoid unnecessary message
exchanges between the kernel and the platform which will always end
up with the failure. It also includes other missing bits of clock
v3.2 protocol so that the supported protocol version can be bumped
to 0x30000 (v3.2).
3. Miscellaneous updates
This includes addition of warning if the domain frequency multiplier
is 0 or rounded off to indicate the actual frequencies are either
wrong or rounded off, hardening of clock domain info lookups, addition
of multiple protocols registration support within a SCMI driver,
update to SCMI entry in MAINTAINERS to include HWMON driver and
constifying the scmi_bus_type structure.
This also includes a couple of fixes for minor issues: double free in
SMC transport cleanup path and struct kernel-doc warnings in optee
transport.
* tag 'scmi-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (29 commits)
MAINTAINERS: Update SCMI entry with HWMON driver
firmware: arm_scmi: Update the supported clock protocol version
firmware: arm_scmi: Add standard clock OEM definitions
firmware: arm_scmi: Add clock check for extended config support
firmware: arm_scmi: Add support for v3.2 NEGOTIATE_PROTOCOL_VERSION
firmware: arm_scmi: Fix struct kernel-doc warnings in optee transport
firmware: arm_scmi: Report frequencies in the perf notifications
firmware: arm_scmi: Use opps_by_lvl to store opps
firmware: arm_scmi: Implement is_notify_supported callback in powercap protocol
firmware: arm_scmi: Implement is_notify_supported callback in reset protocol
firmware: arm_scmi: Implement is_notify_supported callback in sensor protocol
firmware: arm_scmi: Implement is_notify_supported callback in clock protocol
firmware: arm_scmi: Implement is_notify_supported callback in system power protocol
firmware: arm_scmi: Implement is_notify_supported callback in power protocol
firmware: arm_scmi: Implement is_notify_supported callback in perf protocol
firmware: arm_scmi: Add a common helper to check if a message is supported
firmware: arm_scmi: Check for notification support
firmware: arm_scmi: Make scmi_bus_type const
firmware: arm_scmi: Fix double free in SMC transport cleanup path
firmware: arm_scmi: Implement clock get permissions
...
Link: https://lore.kernel.org/r/20240223033435.118028-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the
i2c_adapter_type and i2c_client_type variables to be constant structures as
well, placing them into read-only memory which cannot be modified at
runtime.
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
There is no point in having seven architectures implementing the same empty
stub.
Provide a weak function in the init code and remove the stubs.
This also allows the function to be used on UP, which is required to
sanitize the per CPU handling on X86 UP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240304005104.567671691@linutronix.de
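The weak-default pattern mentioned above can be sketched as follows. The names are hypothetical (the message does not name the function), and __attribute__((weak)) is a GCC/Clang extension, which is an assumption about the toolchain.

```c
/* Counts completed init steps so the effect of the call is observable. */
static int ex_init_steps;

/*
 * Generic default: an empty weak stub. An architecture that needs real
 * work provides a normal (strong) definition with the same name in its
 * own object file, and the linker picks that one instead.
 */
__attribute__((weak)) void ex_arch_prepare_boot_cpu(void)
{
}

/* Common init code calls the hook unconditionally, with no #ifdef. */
static int ex_start_init(void)
{
	ex_arch_prepare_boot_cpu();
	return ++ex_init_steps;
}
```

With a single weak default in common code, per-architecture copies of the same empty function become unnecessary.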
|
|
(skb_transport_header(skb) - skb_network_header(skb))
can be replaced by skb_network_header_len(skb)
Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len()
to catch cases where the transport_header was not set.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
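The equivalence stated above can be checked with a simplified model. struct ex_skb and the ex_* helpers are hypothetical stand-ins for struct sk_buff and its accessors, offsets are assumed to be stored relative to the buffer head, and a one-shot message on stderr takes the place of DEBUG_NET_WARN_ON_ONCE().

```c
#include <stdint.h>
#include <stdio.h>

#define EX_HEADER_UNSET ((uint16_t)~0U)	/* assumed "not set" marker */

/* Hypothetical, heavily reduced packet buffer. */
struct ex_skb {
	unsigned char *head;
	uint16_t network_header;	/* offset from head */
	uint16_t transport_header;	/* offset from head, or EX_HEADER_UNSET */
};

static int ex_warned_once;

static inline unsigned char *ex_network_header(const struct ex_skb *skb)
{
	return skb->head + skb->network_header;
}

static inline unsigned char *ex_transport_header(const struct ex_skb *skb)
{
	return skb->head + skb->transport_header;
}

/* Length of the network header, with a one-time warning on misuse. */
static inline uint32_t ex_network_header_len(const struct ex_skb *skb)
{
	if (skb->transport_header == EX_HEADER_UNSET && !ex_warned_once) {
		ex_warned_once = 1;
		fprintf(stderr, "transport header was not set\n");
	}
	return (uint32_t)(skb->transport_header - skb->network_header);
}
```

For a buffer with the network header at offset 14 and the transport header at offset 34, both the pointer subtraction and the helper give 20.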
|
|
Now that the driver core can properly handle constant struct bus_type,
move the serio_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Link: https://lore.kernel.org/r/20240210-bus_cleanup-input2-v1-2-0daef7e034e0@marliere.net
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the
sdw_master_type and sdw_slave_type variables to be constant structures as
well, placing them into read-only memory which cannot be modified at
runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Link: https://lore.kernel.org/r/20240219-device_cleanup-soundwire-v1-1-9edd51767611@marliere.net
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
of_machine_compatible_match() works with a table of strings.
of_machine_is_compatible() is a simpler version with only one string.
Re-implement of_machine_is_compatible() by setting a table of strings
with a single string then using of_machine_compatible_match().
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-3-mpe@ellerman.id.au
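A sketch of the re-implementation described above, written against a fake root node. The ex_* names and the compatible strings are invented for illustration, and the comparison uses plain strcmp(), which may differ from the comparison the real helpers use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the root node's "compatible" property. */
static const char *const ex_root_compatible[] = {
	"vendor,board-rev2",
	"vendor,board",
	NULL
};

/* Table version: true if any entry of the NULL-terminated table matches. */
static bool ex_machine_compatible_match(const char *const *compats)
{
	const char *const *have;

	for (; *compats; compats++) {
		for (have = ex_root_compatible; *have; have++) {
			if (strcmp(*compats, *have) == 0)
				return true;
		}
	}
	return false;
}

/* Single-string version: wrap the string in a one-entry table. */
static bool ex_machine_is_compatible(const char *compat)
{
	const char *const compats[] = { compat, NULL };

	return ex_machine_compatible_match(compats);
}
```

The single-string helper then carries no matching logic of its own, so any change to how matching works only has to be made in the table version.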
|
|
of_machine_is_compatible() currently returns a positive integer if it
finds a match. However none of the callers ever check the value, they
all treat it as a true/false.
So change of_machine_is_compatible() to return bool, which will allow
the implementation to be changed in a subsequent patch.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-2-mpe@ellerman.id.au
|
|
We have of_machine_is_compatible() to check if a machine is compatible
with a single compatible string. However some code is able to support
multiple compatible boards, and so wants to check for one of many
compatible strings.
So add of_machine_compatible_match() which takes a NULL terminated
array of compatible strings to check against the root node's
compatible property.
Compared to an open coded match this is slightly more self
documenting, and also avoids the caller needing to juggle the root
node either directly or via of_find_node_by_path().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-1-mpe@ellerman.id.au
|
|
Previously the driver got a few updates to replace OF APIs with their
firmware node counterparts, but the conversion was never finished:
some of the APIs in use still require an OF node to be passed. Finish
the job by converting the leftovers to use firmware node APIs.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240302173401.217830-1-andy.shevchenko@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A new port configuration was added to set max_queue_size. Clamp user
configuration to RDMA transport limits.
Increase the maximal queue size of RDMA controllers from 128 to 256
(the default size stays at 128, as before).
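The clamping described above can be sketched roughly as follows. This is a user-space illustration only: the macro and function names are hypothetical, and treating a requested value of 0 as "not configured" is an assumption, not something the commit states.

```c
#include <assert.h>

/* Values taken from the commit message; the names are hypothetical. */
#define SKETCH_RDMA_DEFAULT_QUEUE_SIZE 128
#define SKETCH_RDMA_MAX_QUEUE_SIZE     256

static_assert(SKETCH_RDMA_DEFAULT_QUEUE_SIZE <= SKETCH_RDMA_MAX_QUEUE_SIZE,
	      "the default must fit under the transport limit");

/*
 * Hypothetical helper: return the queue size a port would end up with.
 * Assumption: a requested value of 0 means "not configured", so the
 * default applies. Anything above the transport limit is clamped.
 */
unsigned int sketch_effective_queue_size(unsigned int requested)
{
	if (requested == 0)
		return SKETCH_RDMA_DEFAULT_QUEUE_SIZE;
	if (requested > SKETCH_RDMA_MAX_QUEUE_SIZE)
		return SKETCH_RDMA_MAX_QUEUE_SIZE;
	return requested;
}
```

In this sketch an oversized request is silently reduced rather than rejected, which is one plausible reading of "clamp".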
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
This definition will be used by controllers that are configured with
metadata support. For now, both regular and metadata controllers have
the same maximal queue size, but a later commit will increase the maximal
queue size for regular RDMA controllers to 256.
We'll keep the maximal queue size for metadata controllers at 128,
since metadata operations need more resources
and 128 is the optimal size found for metadata controllers based on
testing.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The correct place for this definition is the nvme rdma header file and
not the common nvme header file.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
We have a macro. It should be used.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/r/20240229132401.3270-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next
Mika writes:
thunderbolt: Changes for v6.9 merge window
This includes the following USB4/Thunderbolt changes for the v6.9 merge
window:
- Reset the topology also for USB4 v1 routers on driver load
- DisplayPort tunneling and bandwidth allocation mode improvements
- Tracepoint support for the control channel
- A couple of minor fixes and cleanups.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (23 commits)
thunderbolt: Constify the struct device_type usage
thunderbolt: Add trace events support for the control channel
thunderbolt: Keep the domain powered when USB4 port is in redrive mode
thunderbolt: Improve DisplayPort tunnel setup process to be more robust
thunderbolt: Calculate DisplayPort tunnel bandwidth after DPRX capabilities read
thunderbolt: Reserve released DisplayPort bandwidth for a group for 10 seconds
thunderbolt: Introduce tb_tunnel_direction_downstream()
thunderbolt: Re-order bandwidth group functions
thunderbolt: Fail the failed bandwidth request properly
thunderbolt: Log an error if DPTX request is not cleared
thunderbolt: Handle bandwidth allocation mode disable request
thunderbolt: Re-calculate estimated bandwidth when allocation mode is enabled
thunderbolt: Use DP_LOCAL_CAP for maximum bandwidth calculation
thunderbolt: Correct typo in host_reset parameter
thunderbolt: Skip discovery also in USB4 v2 host
thunderbolt: Reset only non-USB4 host routers in resume
thunderbolt: Remove usage of the deprecated ida_simple_xx() API
thunderbolt: Fix rollback in tb_port_lane_bonding_enable() for lane 1
thunderbolt: Fix XDomain rx_lanes_show and tx_lanes_show
thunderbolt: Reset topology created by the boot firmware
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
coresight: hwtracing subsystem updates for v6.9
Changes targeting Linux v6.9 include:
- CoreSight: Enable W=1 warnings as default
- CoreSight: Clean up sysfs/perf mode handling for tracing
- Support for Qualcomm TPDM CMB Dataset
- Miscellaneous fixes to the CoreSight subsystem
- Fix for hisi_ptt PMU to reject events targeting other PMUs
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (32 commits)
coresight-tpda: Change qcom,dsb-element-size to qcom,dsb-elem-bits
dt-bindings: arm: qcom,coresight-tpdm: Rename qcom,dsb-element-size
hwtracing: hisi_ptt: Move type check to the beginning of hisi_ptt_pmu_event_init()
coresight: tpdm: Fix build break due to uninitialised field
coresight: etm4x: Set skip_power_up in etm4_init_arch_data function
coresight-tpdm: Add msr register support for CMB
dt-bindings: arm: qcom,coresight-tpdm: Add support for TPDM CMB MSR register
coresight-tpdm: Add timestamp control register support for the CMB
coresight-tpdm: Add pattern registers support for CMB
coresight-tpdm: Add support to configure CMB
coresight-tpda: Add support to configure CMB element
coresight-tpdm: Add CMB dataset support
dt-bindings: arm: qcom,coresight-tpdm: Add support for CMB element size
coresight-tpdm: Optimize the useage of tpdm_has_dsb_dataset
coresight-tpdm: Optimize the store function of tpdm simple dataset
coresight: Add helper for setting csdev->mode
coresight: Add a helper for getting csdev->mode
coresight: Add helper for atomically taking the device
coresight: Add explicit member initializers to coresight_dev_type
coresight: Remove unused stubs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next
Manivannan writes:
MHI Host
========
- Added a new MHI_PM_SYS_ERR_FAIL state to the MHI state machine to properly
clean up the channel state if the device fails to respond to the MHI reset
during SYS_ERR handling. This issue was discovered with the Qualcomm AIC100 AI
accelerator device.
- Modified the code that reads and exposes the OEM_PK_HASH registers through
sysfs to read them on-demand instead of reading once during boot. Qualcomm
AIC100 devices support provisioning the keys dynamically, so this lets
users see up-to-date information.
- Added tracepoint support to expose the debug information over tracefs.
- Reverted the commit that reads the MHI device revision from the device during
boot. This is done because the read info was not used anywhere (dead code) and
also it is not possible to read the revision info from all the devices.
- Constified the modem config for Telit FN980 modem as required by the MHI core.
MHI Endpoint
============
- Replaced kzalloc() with kcalloc() in an effort to avoid integer overflows
during multiplication. Even though there is no potential overflow in the
endpoint code, this is done for the sake of uniformity and best practice.
- Fixed the kmem_cache_create() failure check to use the correct variable.
* tag 'mhi-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: pci_generic: constify modem_telit_fn980_hw_v1_config
bus: mhi: host: Change the trace string for the userspace tools mapping
bus: mhi: ep: check the correct variable in mhi_ep_register_controller()
Revert "bus: mhi: core: Add support for reading MHI info from device"
bus: mhi: host: Add tracing support
bus: mhi: ep: Use kcalloc() instead of kzalloc()
bus: mhi: host: Read PK HASH dynamically
bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
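The kzalloc()-to-kcalloc() point in the MHI Endpoint notes is about the n * size multiplication wrapping around before the allocator ever sees it. A minimal user-space analogue, assuming plain calloc() in place of the kernel allocators and with hypothetical helper names, might look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if n * size cannot be represented in a size_t. */
int sketch_mul_overflows(size_t n, size_t size)
{
	return size != 0 && n > SIZE_MAX / size;
}

/*
 * Array-style zeroed allocation: refuse the request instead of letting
 * n * size wrap to a small number. calloc() already performs this
 * check internally; it is spelled out here to make the point visible.
 */
void *sketch_zalloc_array(size_t n, size_t size)
{
	if (sketch_mul_overflows(n, size))
		return NULL;
	return calloc(n, size);
}

/* Small self-check: an impossible request fails cleanly. */
void sketch_alloc_selfcheck(void)
{
	void *p = sketch_zalloc_array(4, sizeof(uint64_t));

	free(p);	/* free(NULL) is fine if the allocation failed */
	assert(sketch_zalloc_array(SIZE_MAX, 2) == NULL);
}
```

With an open-coded multiplication, SIZE_MAX * 2 would wrap to SIZE_MAX - 1, so the allocator would be asked for a size unrelated to what the caller meant.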
|
|
In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"),
the goal was to save the 4 bytes of padding and keep `bio` from spilling
onto one extra cache line.
It is enough to define it as '__packed __aligned(4)', as '__packed'
alone means byte aligned, and can cause the compiler to generate horrible
code on architectures that don't support unaligned access in case that
bvec_iter is embedded in other structures.
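The effect can be checked in user space with the GCC/Clang attribute spellings. The struct below is only a stand-in with a similar field mix (one 64-bit field followed by three 32-bit fields); its names are hypothetical and it is not the kernel definition.

```c
#include <assert.h>
#include <stdint.h>

/* Natural layout: on most 64-bit ABIs this is padded up to 24 bytes. */
struct sketch_iter_natural {
	uint64_t sector;
	uint32_t size;
	uint32_t idx;
	uint32_t bvec_done;
};

/* packed alone: no padding, but alignment drops to a single byte. */
struct sketch_iter_packed {
	uint64_t sector;
	uint32_t size;
	uint32_t idx;
	uint32_t bvec_done;
} __attribute__((packed));

/* packed + aligned(4): still no tail padding, alignment raised back to 4. */
struct sketch_iter_packed4 {
	uint64_t sector;
	uint32_t size;
	uint32_t idx;
	uint32_t bvec_done;
} __attribute__((packed, aligned(4)));

static_assert(sizeof(struct sketch_iter_packed4) == 20, "padding is saved");
static_assert(_Alignof(struct sketch_iter_packed) == 1, "byte aligned");
static_assert(_Alignof(struct sketch_iter_packed4) == 4, "4-byte aligned");
```

With 4-byte alignment the compiler can use ordinary 32-bit loads and stores for the members, instead of assuming any field may sit at an odd address.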
Cc: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Functions that can't access the MFRL (Management Firmware Reset Level)
register have no use for fw_reset structures or events. Remove the fw_reset
structure allocation and the registration for fw reset event notifications
for these functions.
Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.
In addition, return not supported on devlink reload action fw_activate
for these functions.
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The __is_constexpr() macro is dark magic. Shed some light on it with
a comment to explain how and why it works.
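For readers who have not met the trick, here is a sketch of how it is commonly written. The macro name is hypothetical and the body is reproduced from memory, so the in-tree definition may differ in detail. It relies on the GNU extension that sizeof(void) is 1.

```c
#include <assert.h>

/*
 * If x is an integer constant expression, (void *)((long)(x) * 0l) is a
 * null pointer constant, so the conditional takes the type of its other
 * arm, int *, and the sizeof sees an int. Otherwise the first arm is an
 * ordinary void *, the conditional has type void *, and sizeof(void) is
 * 1 under the GNU extension, which differs from sizeof(int).
 */
#define sketch_is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

static_assert(sketch_is_constexpr(42),
	      "a literal is an integer constant expression");

/* Evaluates to 1: 1 + 2 is an integer constant expression. */
int sketch_probe_literal(void)
{
	return sketch_is_constexpr(1 + 2);
}

/* Evaluates to 0: a function parameter is not a constant expression. */
int sketch_probe_variable(int n)
{
	return sketch_is_constexpr(n);
}
```

Nothing inside the sizeof is evaluated, so the macro has no side effects and is itself usable where a constant is required.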
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240301044428.work.411-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
A common use of type_max() is to find the max for the type of a
variable. Using the pattern type_max(typeof(var)) is needlessly
verbose. Instead, since typeof(type) == type we can just explicitly
call typeof() on the argument to type_max() and type_min(). Add
wrappers for readability.
We can do some replacements right away:
$ git grep '\btype_\(min\|max\)(typeof' | wc -l
11
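A sketch of the idea, with hypothetical sketch_ prefixes and simplified helper macros (the in-tree helpers may be written differently): because __typeof__ accepts both a type name and an expression, the same wrapper serves both kinds of caller.

```c
#include <assert.h>
#include <limits.h>

/* Simplified helpers that operate on a type name T. */
#define sketch_is_signed_type(T)  (((T)(-1)) < (T)1)
#define sketch_type_half_max(T) \
	((T)1 << (8 * sizeof(T) - 1 - sketch_is_signed_type(T)))
#define sketch__type_max(T) \
	((T)((sketch_type_half_max(T) - 1) + sketch_type_half_max(T)))
#define sketch__type_min(T) \
	((T)((T)-sketch__type_max(T) - (T)1))

/* Wrappers: accept either a type or an expression. */
#define sketch_type_max(t)  sketch__type_max(__typeof__(t))
#define sketch_type_min(t)  sketch__type_min(__typeof__(t))

static_assert(sketch_type_max(int) == INT_MAX, "type name as argument");
static_assert(sketch_type_min(int) == INT_MIN, "type name as argument");
static_assert(sketch_type_max((unsigned char)0) == UCHAR_MAX,
	      "expression as argument");
```

A caller holding a variable var can then write sketch_type_max(var) rather than sketch_type_max(typeof(var)).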
Link: https://lore.kernel.org/r/20240301062221.work.840-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the power_supply_class structure to be declared at build
time placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240301-class_cleanup-power-v1-1-97e0b7bf9c94@marliere.net
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The x86 architecture has an idle routine for AMD CPUs which are affected
by erratum 400. On the affected CPUs the local APIC timer stops in the
C1E halt state.
It therefore requires tick broadcasting. The invocation of
tick_broadcast_enter()/exit() from this function violates the RCU
constraints because it can end up in lockdep or tracing, which
rightfully triggers a warning.
tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter()
and after ct_cpuidle_exit() in default_idle_call().
Add a static branch conditional invocation of tick_broadcast_enter()/exit()
into this function to allow X86 to replace the AMD specific idle code. It's
guarded by a config switch which will be selected by x86. Otherwise it's
a NOOP.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240229142248.266708822@linutronix.de
|
|
Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and prints an error in case of a misaligned
device. The name fits into the new queue limits API and the intent is
to eventually replace disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits, but set an
entirely new set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Chain RDMA Writes that convey Write chunks onto the local Send
chain. This means all WRs for an RPC Reply are now posted with a
single ib_post_send() call, and there is a single Send completion
when all of these are done. That reduces both the per-transport
doorbell rate and completion rate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor to eventually enable svcrdma to post the Write WRs for each
RPC response using the same ib_post_send() as the Send WR (ie, as a
single WR chain).
svc_rdma_result_payload (originally svc_rdma_read_payload) was added
so that the upper layer XDR encoder could identify a range of bytes
to be possibly conveyed by RDMA (if a Write chunk was provided by
the client).
The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while
XDR encoding replies") was to post as much of the result payload
outside of svc_rdma_sendto() as possible because svc_rdma_sendto()
used to be called with the xpt_mutex held.
However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into
socket xpo_sendto methods"), the xpt_mutex is no longer held when
calling svc_rdma_sendto(). Thus, that benefit is no longer an issue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Reduce the doorbell and Send completion rates when sending RPC/RDMA
replies that have Reply chunks. NFS READDIR procedures typically
return their result in a Reply chunk, for example.
Instead of calling ib_post_send() to post the Write WRs for the
Reply chunk, and then calling it again to post the Send WR that
conveys the transport header, chain the Write WRs to the Send WR
and call ib_post_send() only once.
Thanks to the Send Queue completion ordering rules, when the Send
WR completes, that guarantees that Write WRs posted before it have
also completed successfully. Thus all Write WRs for the Reply chunk
can remain unsignaled. Instead of handling a Write completion and
then a Send completion, only the Send completion is seen, and it
handles clean up for both the Writes and the Send.
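The chaining can be pictured with a small user-space model. Everything here is hypothetical (the struct, the flag, the function names); it only mimics the shape of a work-request list in which the Write WRs are linked ahead of the Send WR and only the Send asks for a completion.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SIGNALED 0x1u	/* hypothetical "generate a completion" flag */

struct model_wr {
	struct model_wr *next;
	unsigned int flags;
};

/*
 * Link writes[0..nwrites-1] in order, then the Send WR last. Only the
 * Send is signaled; the Writes stay unsignaled. Returns the head of the
 * chain, which would be handed to a single post call.
 */
struct model_wr *model_chain_reply(struct model_wr *writes, size_t nwrites,
				   struct model_wr *send)
{
	size_t i;

	assert(send != NULL);
	for (i = 0; i < nwrites; i++) {
		writes[i].flags = 0;
		writes[i].next = (i + 1 < nwrites) ? &writes[i + 1] : send;
	}
	send->flags = MODEL_SIGNALED;
	send->next = NULL;
	return nwrites ? &writes[0] : send;
}

/* Count WRs in a chain; a zero mask counts every WR. */
size_t model_count(const struct model_wr *head, unsigned int flag_mask)
{
	size_t n = 0;

	for (; head; head = head->next)
		if (!flag_mask || (head->flags & flag_mask))
			n++;
	return n;
}

/* Demo: three Writes plus one Send, built into one chain. */
size_t model_demo(unsigned int flag_mask)
{
	struct model_wr writes[3], send;

	return model_count(model_chain_reply(writes, 3, &send), flag_mask);
}
```

In this model one post covers four WRs and produces one completion, where posting the Writes and the Send separately would take two posts and two completions.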
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|