Age | Commit message (Collapse) | Author |
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-8-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-7-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
This driver is input-only and as such doesn't need to define empty set()
and direction_output() callbacks. Remove them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-6-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-5-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-4-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-3-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-2-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
struct gpio_chip now has callbacks for setting line values that return
an integer, allowing to indicate failures. Convert the driver to using
them.
Link: https://lore.kernel.org/r/20250506-gpiochip-set-rv-gpio-part3-v1-1-0fbdea5a9667@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-6.16/block
Pull MD changes from Yu Kuai:
- Fix that normal IO can be starved by sync IO, found by mkfs on newly
created large raid5, with some clean up patches for bdev inflight
counters.
* tag 'md-6.16-20250513' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md: clean up accounting for issued sync IO
md: fix is_mddev_idle()
md: add a new api sync_io_depth
md: record dm-raid gendisk in mddev
block: export API to get the number of bdev inflight IO
block: clean up blk_mq_in_flight_rw()
block: WARN if bdev inflight counter is negative
block: reuse part_in_flight_rw for part_in_flight
blk-mq: remove blk_mq_in_flight()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into gpio/for-next
Immutable branch between MFD, GPIO and NVMEM due for the v6.16 merge window
|
|
We will extend the firmware methods, so move it to its own module
instead to keep gpu.rs focused.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-7-fcb02749754d@nvidia.com
[ Don't require a bound device, remove pub visibility from Firmware
fields, use FIRMWARE_VERSION consistently. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The layout of NV_PMC_BOOT_0 has two small issues:
- The "chipset" field, while useful to identify a chip, is actually an
aggregate of two distinct fields named "architecture" and
"implementation".
- The "architecture" field is split, with its MSB being at a different
location than the rest of its bits.
Redefine the register layout to match its actual definition as provided
by OpenRM and expose the fully-constructed "architecture" field through
our own "Architecture" type. The "chipset" pseudo-field is also useful
to have, so keep providing it.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-6-fcb02749754d@nvidia.com
[ Use Result from kernel::prelude. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Add the register!() macro, which defines a given register's layout and
provide bit-field accessors with a way to convert them to a given type.
This macro will allow us to make clear definitions of the registers and
manipulate their fields safely.
The long-term goal is to eventually move it to the kernel crate so it
can be used by other drivers as well, but it was agreed to first land it
into nova-core and make it mature there.
To illustrate its usage, use it to define the layout for the Boot0
(renamed to NV_PMC_BOOT_0 to match OpenRM's naming scheme) and take
advantage of its accessors.
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-5-fcb02749754d@nvidia.com
[ Fix typo in commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Describe the support for hybrid processors in intel_pstate, including
the CAS and EAS support, in the admin-guide documentation.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1935040.CQOukoFCf9@rjwysocki.net
|
|
On some hybrid platforms some efficient CPUs (E-cores) are not connected
to the L3 cache, but there are no other differences between them and the
other E-cores that use L3. In that case, it is generally more efficient
to run "light" workloads on the E-cores that do not use L3 and allow all
of the cores using L3, including P-cores, to go into idle states.
For this reason, slightly increase the cost for all CPUs sharing the L3
cache to make EAS prefer CPUs that do not use it to the other CPUs of
the same type (if any).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2032776.usQuhbGJ8B@rjwysocki.net
|
|
Modify intel_pstate to register EM perf domains for CPUs on hybrid
platforms without SMT which causes EAS to be enabled on them when
schedutil is used as the cpufreq governor (which requires intel_pstate
to operate in the passive mode).
This change is targeting platforms (for example, Lunar Lake) where the
"little" CPUs (E-cores) are always more energy-efficient than the "big"
or "performance" CPUs (P-cores) when run at the same HWP performance
level, so it is sufficient to tell EAS that E-cores are always preferred
(so long as there is enough spare capacity on one of them to run the
given task). However, migrating tasks between CPUs of the same type
too often is not desirable because it may hurt both performance and
energy efficiency due to leaving warm caches behind.
For this reason, register a separate perf domain for each CPU and choose
the cost values for them so that the cost mostly depends on the CPU type,
but there is also a small component of it depending on the performance
level (utilization) which helps to balance the load between CPUs of the
same type.
The cost component related to the CPU type is computed with the help of
the observation that the IPC metric value for a given CPU is inversely
proportional to its performance-to-frequency scaling factor and the cost
of running code on it can be assumed to be roughly proportional to that
IPC ratio (in principle, the higher the IPC ratio, the more resources
are utilized when running at a given frequency, so the cost should be
higher).
For all CPUs that are online at the system initialization time, EM perf
domains are registered when the driver starts up, after asymmetric
capacity support has been enabled. For the CPUs that become online
later, EM perf domains are registered after setting the asymmetric
capacity for them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/6057101.MhkbZ0Pkbq@rjwysocki.net
|
|
|
|
Add a function for updating the Energy Model for a CPU after its
capacity has changed, which subsequently will be used by the
intel_pstate driver.
An EM_PERF_DOMAIN_ARTIFICIAL check is added to em_recalc_and_update()
to prevent it from calling em_compute_costs() for an "artificial" perf
domain with a NULL cb parameter which would cause it to crash.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/3637203.iIbC2pHGDl@rjwysocki.net
|
|
Move the check of the CPU capacity currently stored in the energy model
against the arch_scale_cpu_capacity() value to em_adjust_new_capacity()
so it will be done regardless of where the latter is called from.
This will be useful when a new em_adjust_new_capacity() caller is added
subsequently.
While at it, move the pd local variable declaration in
em_check_capacity_update() into the loop in which it is used.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/7810787.EvYhyI6sBW@rjwysocki.net
|
|
Fix the API name to free the allocated table in the example driver code
that modifies the EM. Also fix the passing of correct table when
updating the cost.
Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com>
Link: https://patch.msgid.link/20250511071141.13237-1-atulpant.linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Policy locking was added to cpufreq_policy_is_good_for_eas() by commit
4854649b1fb4 ("cpufreq/sched: Move cpufreq-specific EAS checks to
cpufreq") to address a theoretical race condition, but it turned out to
introduce a circular locking dependency between the policy rwsem and
sched_domains_mutex via cpuset_mutex. This leads to a board lockup on
OdroidN2 that is based on the ARM64 Amlogic Meson SoC.
Drop the policy locking from cpufreq_policy_is_good_for_eas() to address
this issue.
Fixes: 4854649b1fb4 ("cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq")
Closes: https://lore.kernel.org/linux-pm/1bf3df62-0641-459f-99fc-fd511e564b84@samsung.com/
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2806514.mvXUDI8C0e@rjwysocki.net
|
|
We will need to perform things like allocating DMA memory during device
creation, so make sure to take the device context that will allow us to
perform these actions. This also allows us to use Devres::access to
obtain the BAR without holding a RCU lock.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-4-fcb02749754d@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
linux-firmware contains a directory for GA100, and it is a defined
chipset in Nouveau.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-3-fcb02749754d@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
We will commonly need to compare chipset versions, so derive the
ordering traits to make that possible. Also derive Copy and Clone since
passing Chipset by value will be more efficient than by reference.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-2-fcb02749754d@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Introduce pm_suspend_in_progress() to be used for checking if a system-
wide suspend or resume transition is in progress, instead of comparing
pm_suspend_target_state directly to PM_SUSPEND_ON, and use it where
applicable.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/2020901.PYKUYFuaPT@rjwysocki.net
|
|
Commit cdb8c100d8a4 ("include/linux/suspend.h: Only show pm_pr_dbg
messages at suspend/resume") caused PM debug messages to only be
printed during system-wide suspend and resume in progress, but it
forgot about hibernation.
Address this by adding a check for hibernation in progress to
pm_debug_messages_should_print().
Fixes: cdb8c100d8a4 ("include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/4998903.GXAFRqVoOG@rjwysocki.net
|
|
Commit aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending
children") had triggered a suspend issue on Tegra boards because it had
reordered the syspend of devices with async suspend enabled with respect
to some other devices. Specifically, the devices with async suspend
enabled that have no children are now suspended before any other devices
unless there are device links pointing to them as suppliers.
The investigation that followed the failure report uncovered that async
suspend was enabled for the cypd4226 device that was a Type-C controller
with a dependency on USB PHY and it turned out that disabling async
suspend for that device made the issue go away. Since async suspend
takes dependencies between parents and children into account as well
as other dependencies between devices represented by device links, this
means that the cypd4226 has a dependency on another device that is
not represented in any form in the kernel (a "hidden" dependency), in
which case async suspend should not be enabled for it.
Accordingly, make ucsi_ccg_probe() disable async suspend for the
devices handled by, which covers the cypd4226 device on the Tegra
boards as well as other devices likely to have similar "hidden"
dependencies.
Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children")
Closes: https://lore.kernel.org/linux-pm/c6cd714b-b0eb-42fc-b9b5-4f5f396fb4ec@nvidia.com/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/6180608.lOV4Wx5bFT@rjwysocki.net
|
|
After recent changes, pinctrl-amd will not support hibernation when
CONFIG_HIBERNATION is set and CONFIG_SUSPEND isn't because it will not
register amd_gpio_pm_ops then.
Address this by restoring dependencies on CONFIG_PM_SLEEP where
necessary for hibernation support.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/5889368.DvuYhMxLoT@rjwysocki.net
|
|
Raju Rangoju says:
====================
amd-xgbe: add support for AMD Renoir
Add support for a new AMD Ethernet device called "Renoir". It has a new
PCI ID, add this to the current list of supported devices in the
amd-xgbe devices. Also, the BAR1 addresses cannot be used to access the
PCS registers on Renoir platform, use the indirect addressing via SMN
instead.
====================
Link: https://patch.msgid.link/20250509155325.720499-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for new pci device id 0x1641 to register
Renoir device with PCIe.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-6-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A new version of XPCS access routines have been introduced, add the
support to xgbe_pci_probe() to use these routines.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-5-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add the necessary support to enable Renoir ethernet device. Since the
BAR1 address cannot be used to access the XPCS registers on Renoir, use
the smn functions.
Some of the ethernet add-in-cards have dual PHY but share a single MDIO
line (between the ports). In such cases, link inconsistencies are
noticed during the heavy traffic and during reboot stress tests. Using
smn calls helps avoid such race conditions.
Suggested-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-4-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Reorganize the xgbe_pci_probe() code path to convert if/else statements
to switch case to help add future code. This helps code look cleaner.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-3-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The xgbe_{read/write}_mmd_regs_v* functions have common code which can
be moved to helper functions. Add new helper functions to calculate the
mmd_address for v1/v2 of xpcs access.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250509155325.720499-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Jakub Kicinski says:
====================
tools: ynl-gen: support sub-types for binary attributes
Binary attributes have sub-type annotations which either indicate
that the binary object should be interpreted as a raw / C array of
a simple type (e.g. u32), or that it's a struct.
Use this information in the C codegen instead of outputting void *
for all binary attrs. It doesn't make a huge difference in the genl
families, but in classic Netlink there is a lot more structs.
v1: https://lore.kernel.org/20250508022839.1256059-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250509154213.1747885-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Support using a struct pointer for binary attrs. Len field is maintained
because the structs may grow with newer kernel versions. Or, which matters
more, be shorter if the binary is built against newer uAPI than kernel
against which it's executed. Since we are storing a pointer to a struct
type - always allocate at least the amount of memory needed by the struct
per current uAPI headers (unused mem is zeroed). Technically users should
check the length field but per modern ASAN checks storing a short object
under a pointer seems like a bad idea.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We auto-indent if statements (increase the indent of the subsequent
line by 1), do the same thing for else branches without a block.
There hasn't been any else branches before but we're about to add one.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250509154213.1747885-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Sub-type annotation on binary attributes may indicate that the attribute
carries an array of simple types (also referred to as "C array" in docs).
Support rendering them as such in the C user code. For example for u32,
instead of:
struct {
u32 arr;
} _len;
void *arr;
render:
struct {
u32 arr;
} _count;
__u32 *arr;
Note that count is the number of elements while len was the length in bytes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250509154213.1747885-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the caller wrote more characters, count is truncated to the max
available space in "simple_write_to_buffer". Check that the input
size does not exceed the buffer size. Write a zero termination
afterwards.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505091754.285hHbr2-lkp@intel.com/
Signed-off-by: Markus Burri <markus.burri@mt.com>
Link: https://lore.kernel.org/r/20250509150459.115489-1-markus.burri@mt.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
If an input changes state during wake-up and is used as an interrupt
source, the IRQ handler reads the volatile input register to clear the
interrupt mask and deassert the IRQ line. However, the IRQ handler is
triggered before access to the register is granted, causing the read
operation to fail.
As a result, the IRQ handler enters a loop, repeatedly printing the
"failed reading register" message, until `pca953x_resume()` is eventually
called, which restores the driver context and enables access to
registers.
Fix by disabling the IRQ line before entering suspend mode, and
re-enabling it after the driver context is restored in `pca953x_resume()`.
An IRQ can be disabled with disable_irq() and still wake the system as
long as the IRQ has wake enabled, so the wake-up functionality is
preserved.
Fixes: b76574300504 ("gpio: pca953x: Restore registers after suspend/resume cycle")
Cc: stable@vger.kernel.org
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20250512095441.31645-1-francesco@dolcini.it
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
We don't need the CORES() match nor jacket (which really doesn't
even make sense to match to the RF anyway), and since the subdevice
masks we care about are contiguous, we can encode them as highest
and lowest bit set (automatically.) By encoding whether to match or
not as separate flags and taking advantage of the limited range of
the RF type, step and ID we can reduce the amount of memory needed
for the table, while also making the logic (apart perhaps from the
subdevice mask) easier to understand.
This reduces the size of the module by about 1.5KiB on x86-64.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250511195137.38a805a7c96f.Ieece00476cea6054b0827cd075eb8ba5943373df@changeid
|
|
During probe the regulator supply drivers may not yet be available.
Use dev_err_probe to provide just the pertinent log.
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://patch.msgid.link/20250512185739.2907466-1-nm@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Mina Almasry says:
====================
Device memory TCP TX
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.com/
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#m066dd407fbed108828e2c40ae50e3f4376ef57fd
Cc: sdf@fomichev.me
Cc: asml.silence@gmail.com
Cc: dw@davidwei.uk
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Victor Nogueira <victor@mojatatu.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
v14: https://lore.kernel.org/netdev/20250429032645.363766-1-almasrymina@google.com/
v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.com/
v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/
v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.com/
v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.com/
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.com/
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.com/
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.com/
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.com/
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.com/
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.com/
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
====================
Link: https://patch.msgid.link/20250508004830.4100853-1-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for devmem TX in ncdevmem.
This is a combination of the ncdevmem from the devmem TCP series RFCv1
which included the TX path, and work by Stan to include the netlink API
and refactored on top of his generic memory_provider support.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-10-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We should not enable netmem TX for drivers that don't declare support.
Check for driver netmem TX support during devmem TX binding and fail if
the driver does not have the functionality.
Check for driver support in validate_xmit_skb as well.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-9-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use netmem_dma_*() helpers in gve_tx_dqo.c DQO-RDA paths to
enable netmem TX support in that mode.
Declare support for netmem TX in GVE DQO-RDA mode.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-8-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Drivers need to make sure not to pass netmem dma-addrs to the
dma-mapping API in order to support netmem TX.
Add helpers and netmem_dma_*() helpers that enables special handling of
netmem dma-addrs that drivers can use.
Document in netmem.rst what drivers need to do to support netmem TX.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add documentation outlining the usage and details of the devmem TCP TX
API.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-6-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Augment dmabuf binding to be able to handle TX. Additional to all the RX
binding, we also create tx_vec needed for the TX path.
Provide API for sendmsg to be able to send dmabufs bound to this device:
- Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
- MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf.
Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY
implementation, while disabling instances where MSG_ZEROCOPY falls back
to copying.
We additionally pipe the binding down to the new
zerocopy_fill_skb_from_devmem which fills a TX skb with net_iov netmems
instead of the traditional page netmems.
We also special case skb_frag_dma_map to return the dma-address of these
dmabuf net_iovs instead of attempting to map pages.
The TX path may release the dmabuf in a context where we cannot wait.
This happens when the user unbinds a TX dmabuf while there are still
references to its netmems in the TX path. In that case, the netmems will
be put_netmem'd from a context where we can't unmap the dmabuf, Resolve
this by making __net_devmem_dmabuf_binding_free schedule_work'd.
Based on work by Stanislav Fomichev <sdf@fomichev.me>. A lot of the meat
of the implementation came from devmem TCP RFC v1[1], which included the
TX path, but Stan did all the rebasing on top of netmem/net_iov.
Cc: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-5-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|