Age | Commit message (Collapse) | Author |
|
Easwar Hariharan says:
====================
Converge on using secs_to_jiffies() part two
This is the second series (part 1*) that converts users of msecs_to_jiffies() that
either use the multiply pattern of either of:
- msecs_to_jiffies(N*1000) or
- msecs_to_jiffies(N*MSEC_PER_SEC)
where N is a constant or an expression, to avoid the multiplication.
The conversion is made with Coccinelle with the secs_to_jiffies() script
in scripts/coccinelle/misc. Attention is paid to what the best change
can be rather than restricting to what the tool provides.
v1: https://lore.kernel.org/20250219-netdev-secs-to-jiffies-part-2-v1-0-c484cc63611b@linux.microsoft.com
====================
Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-0-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
@depends on patch@
expression E;
@@
-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)
-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)
While here, manually convert a couple timeouts denominated in seconds
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-2-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
@depends on patch@
expression E;
@@
-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)
-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-1-b7817036342f@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All memory in GVE is currently allocated without regard for the NUMA
node of the device. Because access to NUMA-local memory access is
significantly cheaper than access to a remote node, this change attempts
to ensure that page frags used in the RX path, including page pool
frags, are allocated on the NUMA node local to the gVNIC device. Note
that this attempt is best-effort. If necessary, the driver will still
allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor
ring allocations are not updated, as dma_alloc_coherent handles that.
This change also modifies the IRQ affinity setting to only select CPUs
from the node local to the device, preserving the behavior that TX and
RX queues of the same index share CPU affinity.
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250707210107.2742029-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ivan Vecera says:
====================
Add Microchip ZL3073x support (part 1)
Add support for Microchip Azurite DPLL/PTP/SyncE chip family that
provides DPLL and PTP functionality. This series bring first part
that adds the core functionality and basic DPLL support.
The next part of the series will bring additional DPLL functionality
like eSync support, phase offset and frequency offset reporting and
phase adjustments.
Testing was done by myself and by Prathosh Satish on Microchip EDS2
development board with ZL30732 DPLL chip connected over I2C bus.
====================
Link: https://patch.msgid.link/20250704182202.1641943-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support to get/set frequency on pins. The frequency for input
pins (references) is computed in the device according this formula:
freq = base_freq * multiplier * (nominator / denominator)
where the base_freq comes from the list of supported base frequencies
and other parameters are arbitrary numbers. All these parameters are
16-bit unsigned integers.
The frequency for output pin is determined by the frequency of
synthesizer the output pin is connected to and divisor of the output
to which is the given pin belongs. The resulting frequency of the
P-pin and the N-pin from this output pair depends on the signal
format of this output pair.
The device supports so-called N-divided signal formats where for the
N-pin there is an additional divisor. The frequencies for both pins
from such output pair are computed:
P-pin-freq = synth_freq / output_div
N-pin-freq = synth_freq / output_div / n_div
For other signal-format types both P and N pin have the same frequency
based only synth frequency and output divisor.
Implement output pin callbacks to get and set frequency. The frequency
setting for the output non-N-divided signal format is simple as we have
to compute just new output divisor. For N-divided formats it is more
complex because by changing of output divisor we change frequency for
both P and N pins. In this case if we are changing frequency for P-pin
we have to compute also new N-divisor for N-pin to keep its current
frequency. From this and the above it follows that the frequency of
the N-pin cannot be higher than the frequency of the P-pin and the
callback must take this limitation into account.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-13-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement input pin state setting when the DPLL is running in automatic
mode. Unlike manual mode, the DPLL mode switching is not used here and
the implementation uses special priority value (15) to make the given
pin non-selectable.
When the user sets state of the pin as disconnected the driver
internally sets its priority in HW to 15 that prevents the DPLL to
choose this input pin. Conversely, if the pin status is set to
selectable, the driver sets the pin priority in HW to the original saved
value.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-12-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for getting and setting input pin priority. Implement
required callbacks and set appropriate capability for input pins.
Although the pin priority make sense only if the DPLL is running in
automatic mode we have to expose this capability unconditionally because
input pins (references) are shared between all DPLLs where one of them
can run in automatic mode while the other one not.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-11-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement input pin state setting if the DPLL is running in manual mode.
The driver indicates manual mode if the DPLL mode is one of ref-lock,
forced-holdover, freerun.
Use these modes to implement input pin state change between connected
and disconnected states. When the user set the particular pin as
connected the driver marks this input pin as forced reference and
switches the DPLL mode to ref-lock. When the use set the pin as
disconnected the driver switches the DPLL to freerun or forced holdover
mode. The switch to holdover mode is done if the DPLL has holdover
capability (e.g is currently locked with holdover acquired).
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-10-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enumerate all available DPLL channels and registers a DPLL device for
each of them. Check all input references and outputs and register
DPLL pins for them.
Number of registered DPLL pins depends on configuration of references
and outputs. If the reference or output is configured as differential
one then only one DPLL pin is registered. Both references and outputs
can be also disabled from firmware configuration and in this case
no DPLL pins are registered.
All registrable references are registered to all available DPLL devices
with exception of DPLLs that are configured in NCO (numerically
controlled oscillator) mode. In this mode DPLL channel acts as PHC and
cannot be locked to any reference.
Device outputs are connected to one of synthesizers and each synthesizer
is driven by some DPLL channel. So output pins belonging to given output
are registered to DPLL device that drives associated synthesizer.
Finally add kworker task to monitor async changes on all DPLL channels
and input pins and to notify about them DPLL core. Output pins are not
monitored as their parameters are not changed asynchronously by the
device.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-9-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for reading of DPLL types and optional pin properties from
the system firmware (DT, ACPI...).
The DPLL types are stored in property 'dpll-types' as string array and
possible values 'pps' and 'eec' are mapped to DPLL enums DPLL_TYPE_PPS
and DPLL_TYPE_EEC.
The pin properties are stored under 'input-pins' and 'output-pins'
sub-nodes and the following ones are supported:
* reg
integer that specifies pin index
* label
string that is used by driver as board label
* connection-type
string that indicates pin connection type
* supported-frequencies-hz
array of u64 values what frequencies are supported / allowed for
given pin with respect to hardware wiring
Do not blindly trust system firmware and filter out frequencies that
cannot be configured/represented in device (input frequencies have to
be factorized by one of the base frequencies and output frequencies have
to divide configured synthesizer frequency).
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-8-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Several configuration parameters will remain constant at runtime,
so we can load them during probe to avoid excessive reads from
the hardware.
Read the following parameters from the device during probe and store
them for later use:
* enablement status and frequencies of the synthesizers and their
associated DPLL channels
* enablement status and type (single-ended or differential) of input pins
* associated synthesizers, signal format, and enablement status of
outputs
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-7-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Microchip Azurite ZL3073x represents chip family providing DPLL
and optionally PHC (PTP) functionality. The chips can be connected
be connected over I2C or SPI bus.
They have the following characteristics:
* up to 5 separate DPLL units (channels)
* 5 synthesizers
* 10 input pins (references)
* 10 outputs
* 20 output pins (output pin pair shares one output)
* Each reference and output can operate in either differential or
single-ended mode (differential mode uses 2 pins)
* Each output is connected to one of the synthesizers
* Each synthesizer is driven by one of the DPLL unit
The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-,
32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit
addressing, the range is organized into pages of 128 bytes, with each
page containing a page selector register at address 0x7F.
For reading/writing multi-byte registers, the device supports bulk
transfers.
Add basic functionality to access device registers, probe functionality
both I2C and SPI cases and add devlink support to provide info and
to set clock ID parameter.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new device generic parameter to specify clock ID that should
be used by the device for registering DPLL devices and pins.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only 8, 16 and 32-bit integers are supported for numeric devlink
parameters. The subsequent patch adds support for DPLL clock ID
that is defined as 64-bit number. Add support for u64 parameter
type.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add DT bindings for Microchip Azurite DPLL chip family. These chips
provide up to 5 independent DPLL channels, 10 differential or
single-ended inputs and 10 differential or 20 single-ended outputs.
They can be connected via I2C or SPI busses.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a common DT schema for DPLL device and its associated pins.
The DPLL (device phase-locked loop) is a device used for precise clock
synchronization in networking and telecom hardware.
The device includes one or more DPLLs (channels) and one or more
physical input/output pins.
Each DPLL channel is used either to provide a pulse-per-clock signal or
to drive an Ethernet equipment clock.
The input and output pins have the following properties:
* label: specifies board label
* connection type: specifies its usage depending on wiring
* list of supported or allowed frequencies: depending on how the pin
is connected and where)
* embedded sync capability: indicates whether the pin supports this
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit does not do any functional changes. It moves xdp->data
adjustment for buffer other than first buffer to buf_to_xdp() helper so
that the xdp_buff adjustment does not scatter over different functions.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mengyuan Lou says:
====================
Add vf drivers for wangxun virtual functions
Introduces basic support for Wangxun’s virtual function (VF) network
drivers, specifically txgbevf and ngbevf. These drivers provide SR-IOV
VF functionality for Wangxun 10/25/40G network devices.
The first three patches add common APIs for Wangxun VF drivers, including
mailbox communication and shared initialization logic.These abstractions
are placed in libwx to reduce duplication across VF drivers.
Patches 4–8 introduce the txgbevf driver, including:
PCI device initialization, Hardware reset, Interrupt setup, Rx/Tx datapath
implementation and link status changeing flow.
Patches 9–12 implement the ngbevf driver, mirroring the functionality
added in txgbevf.
v2: https://lore.kernel.org/20250625102058.19898-1-mengyuanlou@net-swift.com
v1: https://lore.kernel.org/20250611083559.14175-1-mengyuanlou@net-swift.com
====================
Link: https://patch.msgid.link/20250704094923.652-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add link update flow to wangxun 1G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-13-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add specific parameters for irq alloc, then use
wx_init_interrupt_scheme to initialize interrupt
allocation in probe.
Add .ndo_start_xmit support and start all queues.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-12-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do sw init and reset hw for ngbevf virtual functions, then
register netdev.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-11-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add doc build infrastructure for ngbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 1G virtual
functions for Wangxun.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-10-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add link update flow to wangxun 10/25/40G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-9-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Improve the configuration of Rx and Tx ring.
Setup and alloc resources.
Configure Rx and Tx unit on hardware.
Add .ndo_start_xmit support and start all queues.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-8-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add irq alloc flow functions for vf.
Alloc pcie msix irqs for drivers and request_irq for tx/rx rings
and misc other events.
If the application is successful, config vertors for interrupts.
Enable interrupts mask in wxvf_irq_enable.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-7-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add sw init and reset hw for txgbevf virtual functions
which initialize basic parameters, and then register netdev.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-6-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add doc build infrastructure for txgbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 10/25/40G virtual
functions for Wangxun.
Ioremap the space of bar0 and bar4 which will be used.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-5-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add common wx_configure_vf and wx_set_mac_vf for
ngbevf and txgbevf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-4-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement mbox_write_and_read_ack functions which are
used to set basic functions like set_mac, get_link.etc
for vf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-3-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implements the mailbox interfaces for Wangxun vf drivers which
will be used in txgbevf and ngbevf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-2-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The GMAC IP on CV1800 series SoC is a standard Synopsys
DesignWare MAC (version 3.70a).
Add necessary compatible string for this device.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250703021220.124195-1-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use DEFINE_MUTEX() to initialize udp_tunnel_gro_type_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250707091634.311974-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing 'minItems: 1' to iommus property of the Altera SOCFPGA SoC
implementation of the Synopsys DWMAC.
Fixes: 6d359cf464f4 ("dt-bindings: net: Convert socfpga-dwmac bindings to yaml")
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250707154409.15527-1-matthew.gerlach@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This ethernet controller is using fixed links for DSA switches
in two already existing device trees, so make sure the checker
does not complain like this:
intel-ixp42x-linksys-wrv54g.dtb: ethernet@c8009000 (intel,ixp4xx-ethernet):
'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$'
from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml#
intel-ixp42x-usrobotics-usr8200.dtb: ethernet@c800a000 (intel,ixp4xx-ethernet):
'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$'
from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml#
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507040609.K9KytWBA-lkp@intel.com/
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250704-ixp4xx-ethernet-binding-fix-v1-1-8ac360d5bc9b@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kuniyuki Iwashima says:
====================
ipv6: Drop RTNL from mcast.c and anycast.c
This is a prep series for RCU conversion of RTM_NEWNEIGH, which needs
RTNL during neigh_table.{pconstructor,pdestructor}() touching IPv6
multicast code.
Currently, IPv6 multicast code is protected by lock_sock() and
inet6_dev->mc_lock, and RTNL is not actually needed.
In addition, anycast code is also in the same situation and does not
need RTNL at all.
This series removes RTNL from net/ipv6/{mcast.c,anycast.c} and finally
removes setsockopt_needs_rtnl() from do_ipv6_setsockopt().
v2: https://lore.kernel.org/20250624202616.526600-1-kuni1840@gmail.com
v1: https://lore.kernel.org/20250616233417.1153427-1-kuni1840@gmail.com
====================
Link: https://patch.msgid.link/20250702230210.3115355-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We no longer need to hold RTNL for IPv6 socket options.
Let's remove setsockopt_needs_rtnl().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-16-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet6_sk(sk)->ipv6_ac_list is protected by lock_sock().
In ipv6_sock_ac_join(), only __dev_get_by_index(), __dev_get_by_flags(),
and __in6_dev_get() require RTNL.
__dev_get_by_flags() is only used by ipv6_sock_ac_join() and can be
converted to RCU version.
Let's replace RCU version helper and drop RTNL from IPV6_JOIN_ANYCAST.
setsockopt_needs_rtnl() will be removed in the next patch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-15-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The next patch will replace __dev_get_by_index() and __dev_get_by_flags()
to RCU + refcount version.
Then, we will need to call dev_put() in some error paths.
Let's unify two error paths to make the next patch cleaner.
Note that we add READ_ONCE() for net->ipv6.devconf_all->forwarding
and idev->conf.forwarding as we will drop RTNL that protects them.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-14-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet6_sk(sk)->ipv6_ac_list is protected by lock_sock().
In ipv6_sock_ac_drop() and ipv6_sock_ac_close(),
only __dev_get_by_index() and __in6_dev_get() requrie RTNL.
Let's replace them with dev_get_by_index() and in6_dev_get()
and drop RTNL from IPV6_LEAVE_ANYCAST and IPV6_ADDRFORM.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-13-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet6_dev->ac_list is protected by inet6_dev->lock, so rtnl_dereference()
is a bit rough annotation.
As done in mcast.c, we can use ac_dereference() that checks if
inet6_dev->lock is held.
Let's replace rtnl_dereference() with a new helper ac_dereference().
Note that now addrconf_join_solict() / addrconf_leave_solict() in
__ipv6_dev_ac_inc() / __ipv6_dev_ac_dec() does not need RTNL, so we
can remove ASSERT_RTNL() there.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-12-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now, RTNL is not needed for mcast code, and what's commented in
ip6_mc_msfget() is apparent by for_each_pmc_socklock(), which has
lockdep annotation for lock_sock().
Let's remove the comment and ASSERT_RTNL() in ipv6_mc_rejoin_groups().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-11-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In ip6_mc_source() and ip6_mc_msfilter(), per-socket mld data is
protected by lock_sock() and inet6_dev->mc_lock is also held for
some per-interface functions.
ip6_mc_find_dev_rtnl() only depends on RTNL. If we want to remove
it, we need to check inet6_dev->dead under mc_lock to close the race
with addrconf_ifdown(), as mentioned earlier.
Let's do that and drop RTNL for the rest of MCAST_ socket options.
Note that ip6_mc_msfilter() has unnecessary lock dances and they
are integrated into one to avoid the last-minute error and simplify
the error handling.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-10-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In __ipv6_sock_mc_close(), per-socket mld data is protected by lock_sock(),
and only __dev_get_by_index() and __in6_dev_get() require RTNL.
Let's call __ipv6_sock_mc_drop() and drop RTNL in ipv6_sock_mc_close().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-9-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(),
and only __dev_get_by_index() and __in6_dev_get() require RTNL.
Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for
IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.
Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-8-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In __ipv6_sock_mc_join(), per-socket mld data is protected by lock_sock(),
and only __dev_get_by_index() requires RTNL.
Let's use dev_get_by_index() and drop RTNL for IPV6_ADD_MEMBERSHIP and
MCAST_JOIN_GROUP.
Note that we must call rt6_lookup() and dev_hold() under RCU.
If rt6_lookup() returns an entry from the exception table, dst_dev_put()
could change rt->dev.dst to loopback concurrently, and the original device
could lose the refcount before dev_hold() and unblock device registration.
dst_dev_put() is called from NETDEV_UNREGISTER and synchronize_net() follows
it, so as long as rt6_lookup() and dev_hold() are called within the same
RCU critical section, the dev is alive.
Even if the race happens, they are synchronised by idev->dead and mcast
addresses are cleaned up.
For the racy access to rt->dst.dev, we use dst_dev().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-7-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As well as __ipv6_dev_mc_inc(), all code in __ipv6_dev_mc_dec() are
protected by inet6_dev->mc_lock, and RTNL is not needed.
Let's use in6_dev_get() in ipv6_dev_mc_dec() and remove ASSERT_RTNL()
in __ipv6_dev_mc_dec().
Now, we can remove the RTNL comment above addrconf_leave_solict() too.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-6-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 63ed8de4be81 ("mld: add mc_lock for protecting per-interface
mld data"), the newly allocated struct ifmcaddr6 cannot be removed until
inet6_dev->mc_lock is released, so mca_get() and mc_put() are unnecessary.
Let's remove the extra refcounting.
Note that mca_get() was only used in __ipv6_dev_mc_inc().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-5-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 63ed8de4be81 ("mld: add mc_lock for protecting
per-interface mld data"), every multicast resource is protected
by inet6_dev->mc_lock.
RTNL is unnecessary in terms of protection but still needed for
synchronisation between addrconf_ifdown() and __ipv6_dev_mc_inc().
Once we removed RTNL, there would be a race below, where we could
add a multicast address to a dead inet6_dev.
CPU1 CPU2
==== ====
addrconf_ifdown() __ipv6_dev_mc_inc()
if (idev->dead) <-- false
dead = true return -ENODEV;
ipv6_mc_destroy_dev() / ipv6_mc_down()
mutex_lock(&idev->mc_lock)
...
mutex_unlock(&idev->mc_lock)
mutex_lock(&idev->mc_lock)
...
mutex_unlock(&idev->mc_lock)
The race window can be easily closed by checking inet6_dev->dead
under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown()
will acquire it after marking inet6_dev dead.
Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc().
Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and
we can remove ASSERT_RTNL() there and the RTNL comment above
addrconf_join_solict().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-4-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 63ed8de4be81 ("mld: add mc_lock for protecting per-interface
mld data") added the same comments regarding locking to many functions.
Let's replace the comments with lockdep annotation, which is more helpful.
Note that we just remove the comment for mld_clear_zeros() and
mld_send_cr(), where mc_dereference() is used in the entry of the
function.
While at it, a comment for __ipv6_sock_mc_join() is moved back to the
correct place.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-3-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|