Age | Commit message (Collapse) | Author |
|
All memory in GVE is currently allocated without regard for the NUMA
node of the device. Because access to NUMA-local memory access is
significantly cheaper than access to a remote node, this change attempts
to ensure that page frags used in the RX path, including page pool
frags, are allocated on the NUMA node local to the gVNIC device. Note
that this attempt is best-effort. If necessary, the driver will still
allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor
ring allocations are not updated, as dma_alloc_coherent handles that.
This change also modifies the IRQ affinity setting to only select CPUs
from the node local to the device, preserving the behavior that TX and
RX queues of the same index share CPU affinity.
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250707210107.2742029-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb_shared_info
While transitioning from netdev_alloc_ip_align() to build_skb(), memory
for the "skb_shared_info" member of an "skb" was not allocated. Fix this
by allocating "PAGE_SIZE" as the skb length, accounting for the packet
length, headroom and tailroom, thereby including the required memory space
for skb_shared_info.
Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://patch.msgid.link/20250707085201.1898818-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current logic in nicvf_change_mtu() writes the new MTU to
netdev->mtu using WRITE_ONCE() before verifying if the hardware
update succeeds. However on hardware update failure, it attempts
to revert to the original MTU using a direct assignment
(netdev->mtu = orig_mtu)
which violates the intended of WRITE_ONCE protection introduced in
commit 1eb2cded45b3 ("net: annotate writes on dev->mtu from
ndo_change_mtu()")
Additionally, WRITE_ONCE(netdev->mtu, new_mtu) is unnecessarily
performed even when the device is not running.
Fix this by:
Only writing netdev->mtu after successfully updating the hardware.
Skipping hardware update when the device is down, and setting MTU
directly. Remove unused variable orig_mtu.
This ensures that all writes to netdev->mtu are consistent with
WRITE_ONCE expectations and avoids unintended state corruption
on failure paths.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250706194327.1369390-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DRR/NETEM/BLACKHOLE chain
Create a tdc test for the UAF scenario with DRR/NETEM/BLACKHOLE chain
shared by Lion on his report [1].
[1] https://lore.kernel.org/netdev/45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com/
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250705203638.246350-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
atmarpd_dev_ops does not implement the send method, which may cause crash
as bellow.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: Oops: 0010 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5324 Comm: syz.0.0 Not tainted 6.15.0-rc6-syzkaller-00346-g5723cc3450bc #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc9000d3cf778 EFLAGS: 00010246
RAX: 1ffffffff1910dd1 RBX: 00000000000000c0 RCX: dffffc0000000000
RDX: ffffc9000dc82000 RSI: ffff88803e4c4640 RDI: ffff888052cd0000
RBP: ffffc9000d3cf8d0 R08: ffff888052c9143f R09: 1ffff1100a592287
R10: dffffc0000000000 R11: 0000000000000000 R12: 1ffff92001a79f00
R13: ffff888052cd0000 R14: ffff88803e4c4640 R15: ffffffff8c886e88
FS: 00007fbc762566c0(0000) GS:ffff88808d6c2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000041f1b000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vcc_sendmsg+0xa10/0xc50 net/atm/common.c:644
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x219/0x270 net/socket.c:727
____sys_sendmsg+0x52d/0x830 net/socket.c:2566
___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
__sys_sendmmsg+0x227/0x430 net/socket.c:2709
__do_sys_sendmmsg net/socket.c:2736 [inline]
__se_sys_sendmmsg net/socket.c:2733 [inline]
__x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+e34e5e6b5eddb0014def@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/682f82d5.a70a0220.1765ec.0143.GAE@google.com/T
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250705085228.329202-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ivan Vecera says:
====================
Add Microchip ZL3073x support (part 1)
Add support for Microchip Azurite DPLL/PTP/SyncE chip family that
provides DPLL and PTP functionality. This series bring first part
that adds the core functionality and basic DPLL support.
The next part of the series will bring additional DPLL functionality
like eSync support, phase offset and frequency offset reporting and
phase adjustments.
Testing was done by myself and by Prathosh Satish on Microchip EDS2
development board with ZL30732 DPLL chip connected over I2C bus.
====================
Link: https://patch.msgid.link/20250704182202.1641943-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support to get/set frequency on pins. The frequency for input
pins (references) is computed in the device according this formula:
freq = base_freq * multiplier * (nominator / denominator)
where the base_freq comes from the list of supported base frequencies
and other parameters are arbitrary numbers. All these parameters are
16-bit unsigned integers.
The frequency for output pin is determined by the frequency of
synthesizer the output pin is connected to and divisor of the output
to which is the given pin belongs. The resulting frequency of the
P-pin and the N-pin from this output pair depends on the signal
format of this output pair.
The device supports so-called N-divided signal formats where for the
N-pin there is an additional divisor. The frequencies for both pins
from such output pair are computed:
P-pin-freq = synth_freq / output_div
N-pin-freq = synth_freq / output_div / n_div
For other signal-format types both P and N pin have the same frequency
based only synth frequency and output divisor.
Implement output pin callbacks to get and set frequency. The frequency
setting for the output non-N-divided signal format is simple as we have
to compute just new output divisor. For N-divided formats it is more
complex because by changing of output divisor we change frequency for
both P and N pins. In this case if we are changing frequency for P-pin
we have to compute also new N-divisor for N-pin to keep its current
frequency. From this and the above it follows that the frequency of
the N-pin cannot be higher than the frequency of the P-pin and the
callback must take this limitation into account.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-13-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement input pin state setting when the DPLL is running in automatic
mode. Unlike manual mode, the DPLL mode switching is not used here and
the implementation uses special priority value (15) to make the given
pin non-selectable.
When the user sets state of the pin as disconnected the driver
internally sets its priority in HW to 15 that prevents the DPLL to
choose this input pin. Conversely, if the pin status is set to
selectable, the driver sets the pin priority in HW to the original saved
value.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-12-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for getting and setting input pin priority. Implement
required callbacks and set appropriate capability for input pins.
Although the pin priority make sense only if the DPLL is running in
automatic mode we have to expose this capability unconditionally because
input pins (references) are shared between all DPLLs where one of them
can run in automatic mode while the other one not.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-11-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement input pin state setting if the DPLL is running in manual mode.
The driver indicates manual mode if the DPLL mode is one of ref-lock,
forced-holdover, freerun.
Use these modes to implement input pin state change between connected
and disconnected states. When the user set the particular pin as
connected the driver marks this input pin as forced reference and
switches the DPLL mode to ref-lock. When the use set the pin as
disconnected the driver switches the DPLL to freerun or forced holdover
mode. The switch to holdover mode is done if the DPLL has holdover
capability (e.g is currently locked with holdover acquired).
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-10-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enumerate all available DPLL channels and registers a DPLL device for
each of them. Check all input references and outputs and register
DPLL pins for them.
Number of registered DPLL pins depends on configuration of references
and outputs. If the reference or output is configured as differential
one then only one DPLL pin is registered. Both references and outputs
can be also disabled from firmware configuration and in this case
no DPLL pins are registered.
All registrable references are registered to all available DPLL devices
with exception of DPLLs that are configured in NCO (numerically
controlled oscillator) mode. In this mode DPLL channel acts as PHC and
cannot be locked to any reference.
Device outputs are connected to one of synthesizers and each synthesizer
is driven by some DPLL channel. So output pins belonging to given output
are registered to DPLL device that drives associated synthesizer.
Finally add kworker task to monitor async changes on all DPLL channels
and input pins and to notify about them DPLL core. Output pins are not
monitored as their parameters are not changed asynchronously by the
device.
Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-9-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for reading of DPLL types and optional pin properties from
the system firmware (DT, ACPI...).
The DPLL types are stored in property 'dpll-types' as string array and
possible values 'pps' and 'eec' are mapped to DPLL enums DPLL_TYPE_PPS
and DPLL_TYPE_EEC.
The pin properties are stored under 'input-pins' and 'output-pins'
sub-nodes and the following ones are supported:
* reg
integer that specifies pin index
* label
string that is used by driver as board label
* connection-type
string that indicates pin connection type
* supported-frequencies-hz
array of u64 values what frequencies are supported / allowed for
given pin with respect to hardware wiring
Do not blindly trust system firmware and filter out frequencies that
cannot be configured/represented in device (input frequencies have to
be factorized by one of the base frequencies and output frequencies have
to divide configured synthesizer frequency).
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-8-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Several configuration parameters will remain constant at runtime,
so we can load them during probe to avoid excessive reads from
the hardware.
Read the following parameters from the device during probe and store
them for later use:
* enablement status and frequencies of the synthesizers and their
associated DPLL channels
* enablement status and type (single-ended or differential) of input pins
* associated synthesizers, signal format, and enablement status of
outputs
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-7-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Microchip Azurite ZL3073x represents chip family providing DPLL
and optionally PHC (PTP) functionality. The chips can be connected
be connected over I2C or SPI bus.
They have the following characteristics:
* up to 5 separate DPLL units (channels)
* 5 synthesizers
* 10 input pins (references)
* 10 outputs
* 20 output pins (output pin pair shares one output)
* Each reference and output can operate in either differential or
single-ended mode (differential mode uses 2 pins)
* Each output is connected to one of the synthesizers
* Each synthesizer is driven by one of the DPLL unit
The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-,
32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit
addressing, the range is organized into pages of 128 bytes, with each
page containing a page selector register at address 0x7F.
For reading/writing multi-byte registers, the device supports bulk
transfers.
Add basic functionality to access device registers, probe functionality
both I2C and SPI cases and add devlink support to provide info and
to set clock ID parameter.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new device generic parameter to specify clock ID that should
be used by the device for registering DPLL devices and pins.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only 8, 16 and 32-bit integers are supported for numeric devlink
parameters. The subsequent patch adds support for DPLL clock ID
that is defined as 64-bit number. Add support for u64 parameter
type.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add DT bindings for Microchip Azurite DPLL chip family. These chips
provide up to 5 independent DPLL channels, 10 differential or
single-ended inputs and 10 differential or 20 single-ended outputs.
They can be connected via I2C or SPI busses.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a common DT schema for DPLL device and its associated pins.
The DPLL (device phase-locked loop) is a device used for precise clock
synchronization in networking and telecom hardware.
The device includes one or more DPLLs (channels) and one or more
physical input/output pins.
Each DPLL channel is used either to provide a pulse-per-clock signal or
to drive an Ethernet equipment clock.
The input and output pins have the following properties:
* label: specifies board label
* connection type: specifies its usage depending on wiring
* list of supported or allowed frequencies: depending on how the pin
is connected and where)
* embedded sync capability: indicates whether the pin supports this
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250704182202.1641943-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit does not do any functional changes. It moves xdp->data
adjustment for buffer other than first buffer to buf_to_xdp() helper so
that the xdp_buff adjustment does not scatter over different functions.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mengyuan Lou says:
====================
Add vf drivers for wangxun virtual functions
Introduces basic support for Wangxun’s virtual function (VF) network
drivers, specifically txgbevf and ngbevf. These drivers provide SR-IOV
VF functionality for Wangxun 10/25/40G network devices.
The first three patches add common APIs for Wangxun VF drivers, including
mailbox communication and shared initialization logic.These abstractions
are placed in libwx to reduce duplication across VF drivers.
Patches 4–8 introduce the txgbevf driver, including:
PCI device initialization, Hardware reset, Interrupt setup, Rx/Tx datapath
implementation and link status changeing flow.
Patches 9–12 implement the ngbevf driver, mirroring the functionality
added in txgbevf.
v2: https://lore.kernel.org/20250625102058.19898-1-mengyuanlou@net-swift.com
v1: https://lore.kernel.org/20250611083559.14175-1-mengyuanlou@net-swift.com
====================
Link: https://patch.msgid.link/20250704094923.652-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add link update flow to wangxun 1G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-13-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add specific parameters for irq alloc, then use
wx_init_interrupt_scheme to initialize interrupt
allocation in probe.
Add .ndo_start_xmit support and start all queues.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-12-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do sw init and reset hw for ngbevf virtual functions, then
register netdev.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-11-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add doc build infrastructure for ngbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 1G virtual
functions for Wangxun.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-10-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add link update flow to wangxun 10/25/40G virtual functions.
Get link status from pf in mbox, and if it is failed then
check the vx_status, because vx_status switching is too slow.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-9-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Improve the configuration of Rx and Tx ring.
Setup and alloc resources.
Configure Rx and Tx unit on hardware.
Add .ndo_start_xmit support and start all queues.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-8-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add irq alloc flow functions for vf.
Alloc pcie msix irqs for drivers and request_irq for tx/rx rings
and misc other events.
If the application is successful, config vertors for interrupts.
Enable interrupts mask in wxvf_irq_enable.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-7-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add sw init and reset hw for txgbevf virtual functions
which initialize basic parameters, and then register netdev.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-6-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add doc build infrastructure for txgbevf driver.
Implement the basic PCI driver loading and unloading interface.
Initialize the id_table which support 10/25/40G virtual
functions for Wangxun.
Ioremap the space of bar0 and bar4 which will be used.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-5-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add common wx_configure_vf and wx_set_mac_vf for
ngbevf and txgbevf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-4-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement mbox_write_and_read_ack functions which are
used to set basic functions like set_mac, get_link.etc
for vf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-3-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implements the mailbox interfaces for Wangxun vf drivers which
will be used in txgbevf and ngbevf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/20250704094923.652-2-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's needed to check the return value of lockdep_commit_lock_is_held(),
otherwise there's no point in this assertion as it doesn't print any
debug information on itself.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: b04df3da1b5c ("netfilter: nf_tables: do not defer rule destruction via call_rcu")
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Restore commit 28339b21a365 ("netfilter: nf_tables: do not send complete
notification of deletions") and fix it:
- Avoid upfront modification of 'event' variable so the conditionals
become effective.
- Always include NFTA_OBJ_TYPE attribute in object notifications, user
space requires it for proper deserialisation.
- Catch DESTROY events, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This practically reverts commit 28339b21a365 ("netfilter: nf_tables: do
not send complete notification of deletions"): The feature was never
effective, due to prior modification of 'event' variable the conditional
early return never happened.
User space also relies upon the current behaviour, so better reintroduce
the shortened deletion notifications once it is fixed.
Fixes: 28339b21a365 ("netfilter: nf_tables: do not send complete notification of deletions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
'atm-clip-fix-infinite-recursion-potential-null-ptr-deref-and-memleak'
Kuniyuki Iwashima says:
====================
atm: clip: Fix infinite recursion, potential null-ptr-deref, and memleak.
Patch 1 fixes racy access to atmarpd found while checking RTNL usage
in clip.c.
Patch 2 fixes memory leak by ioctl(ATMARP_MKIP) and ioctl(ATMARPD_CTRL).
Patch 3 fixes infinite recursive call of clip_vcc->old_push(), which
was reported by syzbot.
v1: https://lore.kernel.org/20250702020437.703698-1-kuniyu@google.com
====================
Link: https://patch.msgid.link/20250704062416.1613927-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported the splat below. [0]
This happens if we call ioctl(ATMARP_MKIP) more than once.
During the first call, clip_mkip() sets clip_push() to vcc->push(),
and the second call copies it to clip_vcc->old_push().
Later, when the socket is close()d, vcc_destroy_socket() passes
NULL skb to clip_push(), which calls clip_vcc->old_push(),
triggering the infinite recursion.
Let's prevent the second ioctl(ATMARP_MKIP) by checking
vcc->user_back, which is allocated by the first call as clip_vcc.
Note also that we use lock_sock() to prevent racy calls.
[0]:
BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000)
Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191
Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00
RSP: 0018:ffffc9000d670000 EFLAGS: 00010246
RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000
RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e
R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300
R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578
FS: 000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0
Call Trace:
<TASK>
clip_push+0x6dc/0x720 net/atm/clip.c:200
clip_push+0x6dc/0x720 net/atm/clip.c:200
clip_push+0x6dc/0x720 net/atm/clip.c:200
...
clip_push+0x6dc/0x720 net/atm/clip.c:200
clip_push+0x6dc/0x720 net/atm/clip.c:200
clip_push+0x6dc/0x720 net/atm/clip.c:200
vcc_destroy_socket net/atm/common.c:183 [inline]
vcc_release+0x157/0x460 net/atm/common.c:205
__sock_release net/socket.c:647 [inline]
sock_close+0xc0/0x240 net/socket.c:1391
__fput+0x449/0xa70 fs/file_table.c:465
task_work_run+0x1d1/0x260 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114
exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline]
do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff31c98e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f
R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c
R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090
</TASK>
Modules linked in:
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ioctl(ATMARP_MKIP) allocates struct clip_vcc and set it to
vcc->user_back.
The code assumes that vcc_destroy_socket() passes NULL skb
to vcc->push() when the socket is close()d, and then clip_push()
frees clip_vcc.
However, ioctl(ATMARPD_CTRL) sets NULL to vcc->push() in
atm_init_atmarp(), resulting in memory leak.
Let's serialise two ioctl() by lock_sock() and check vcc->push()
in atm_init_atmarp() to prevent memleak.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
atmarpd is protected by RTNL since commit f3a0592b37b8 ("[ATM]: clip
causes unregister hang").
However, it is not enough because to_atmarpd() is called without RTNL,
especially clip_neigh_solicit() / neigh_ops->solicit() is unsleepable.
Also, there is no RTNL dependency around atmarpd.
Let's use a private mutex and RCU to protect access to atmarpd in
to_atmarpd().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250704062416.1613927-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The GMAC IP on CV1800 series SoC is a standard Synopsys
DesignWare MAC (version 3.70a).
Add necessary compatible string for this device.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250703021220.124195-1-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 12ffc3b1513e ("PM: Restrict swap use to later in the suspend
sequence") changed two pm_restore_gfp_mask() calls in enter_state()
and hibernation_restore() into one pm_restore_gfp_mask() call in
dpm_resume_end(), but it put that call before the dpm_resume()
invocation which is too early (some swap-backing devices may not be
ready at that point).
Moreover, this code ordering change was not even mentioned in the
changelog of the commit mentioned above.
Address this by moving that call after the dpm_resume() one.
Fixes: 12ffc3b1513e ("PM: Restrict swap use to later in the suspend sequence")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/2797018.mvXUDI8C0e@rjwysocki.net
|
|
In function kvm_guest_time_update(), __scale_tsc() is used to calculate
a TSC *frequency* rather than a TSC value. With low-enough ratios,
a TSC value that is less than 1 would underflow to 0 and to an infinite
while loop in kvm_get_time_scale():
kvm_guest_time_update(struct kvm_vcpu *v)
if (kvm_caps.has_tsc_control)
tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz,
v->arch.l1_tsc_scaling_ratio);
__scale_tsc(u64 ratio, u64 tsc)
ratio=122380531, tsc=2299998, N=48
ratio*tsc >> N = 0.999... -> 0
Later in the function:
Call Trace:
<TASK>
kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline]
kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268
vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678
vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126
kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352
kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x78/0xe2
This can really happen only when fuzzing, since the TSC frequency
would have to be nonsensically low.
Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function")
Reported-by: Yuntao Liu <liuyuntao12@huawei.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Jann Horn points out that epoll is decrementing the ep refcount and then
doing a
mutex_unlock(&ep->mtx);
afterwards. That's very wrong, because it can lead to a use-after-free.
That pattern is actually fine for the very last reference, because the
code in question will delay the actual call to "ep_free(ep)" until after
it has unlocked the mutex.
But it's wrong for the much subtler "next to last" case when somebody
*else* may also be dropping their reference and free the ep while we're
still using the mutex.
Note that this is true even if that other user is also using the same ep
mutex: mutexes, unlike spinlocks, can not be used for object ownership,
even if they guarantee mutual exclusion.
A mutex "unlock" operation is not atomic, and as one user is still
accessing the mutex as part of unlocking it, another user can come in
and get the now released mutex and free the data structure while the
first user is still cleaning up.
See our mutex documentation in Documentation/locking/mutex-design.rst,
in particular the section [1] about semantics:
"mutex_unlock() may access the mutex structure even after it has
internally released the lock already - so it's not safe for
another context to acquire the mutex and assume that the
mutex_unlock() context is not using the structure anymore"
So if we drop our ep ref before the mutex unlock, but we weren't the
last one, we may then unlock the mutex, another user comes in, drops
_their_ reference and releases the 'ep' as it now has no users - all
while the mutex_unlock() is still accessing it.
Fix this by simply moving the ep refcount dropping to outside the mutex:
the refcount itself is atomic, and doesn't need mutex protection (that's
the whole _point_ of refcounts: unlike mutexes, they are inherently
about object lifetimes).
Reported-by: Jann Horn <jannh@google.com>
Link: https://docs.kernel.org/locking/mutex-design.html#semantics [1]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix bogus KASAN splat on EFI runtime stack
- Select JUMP_LABEL unconditionally to avoid boot failure with pKVM and
the legacy implementation of static keys
- Avoid touching GCS registers when 'arm64.nogcs' has been passed on
the command-line
- Move a 'cpumask_t' off the stack in smp_send_stop()
- Don't advertise SME-related hwcaps to userspace when ID_AA64PFR1_EL1
indicates that SME is not implemented
- Always check the VMA when handling an Overlay fault
- Avoid corrupting TCR2_EL1 during boot
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/mm: Drop wrong writes into TCR2_EL1
arm64: poe: Handle spurious Overlay faults
arm64: Filter out SME hwcaps when FEAT_SME isn't implemented
arm64: move smp_send_stop() cpu mask off stack
arm64/gcs: Don't try to access GCS registers if arm64.nogcs is enabled
arm64: Unconditionally select CONFIG_JUMP_LABEL
arm64: efi: Fix KASAN false positive for EFI runtime stack
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Mark som pins as invalid for IRQ use in the Qualcomm driver
- Fix up the use of device properties on the MA35DX Nuvoton, apparently
something went sidewise
- Clear the GPIO debounce settings when going down for suspend in the
AMD driver. Very good for some AMD laptops that now wake up from
suspend again!
- Add the compulsory .can_sleep bool flag in the AW9523 driver, should
have been there from the beginning, now there are users finding the
bug
- Drop some bouncing email address from MAINTAINERS
* tag 'pinctrl-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aw9523: fix can_sleep flag for GPIO chip
pinctrl: amd: Clear GPIO debounce for suspend
pinctrl: nuvoton: Fix boot on ma35dx platforms
MAINTAINERS: drop bouncing Lakshmi Sowjanya D
pinctrl: qcom: msm: mark certain pins as invalid for interrupts
|
|
of_flags is passed down to GPIO chip's xlate function, so ensure this one
is properly initialized as - if the xlate callback does nothing with it
- we may end up with various configuration errors like:
gpio-720 (enable): multiple pull-up, pull-down or pull-disable enabled, invalid configuration
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20250708083829.658051-1-alexander.stein@ew.tq-group.com
[Bartosz: tweaked the commit message]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
When SAE commit is sent and received in response, there's no
ordering for the SAE confirm messages. As such, don't call
drivers to stop listening on the channel when the confirm
message is still expected.
This fixes an issue if the local confirm is transmitted later
than the AP's confirm, for iwlwifi (and possibly mt76) the
AP's confirm would then get lost since the device isn't on
the channel at the time the AP transmit the confirm.
For iwlwifi at least, this also improves the overall timing
of the authentication handshake (by about 15ms according to
the report), likely since the session protection won't be
aborted and rescheduled.
Note that even before this, mgd_complete_tx() wasn't always
called for each call to mgd_prepare_tx() (e.g. in the case
of WEP key shared authentication), and the current drivers
that have the complete callback don't seem to mind. Document
this as well though.
Reported-by: Jan Hendrik Farr <kernel@jfarr.cc>
Closes: https://lore.kernel.org/all/aB30Ea2kRG24LINR@archlinux/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213232.12691580e140.I3f1d3127acabcd58348a110ab11044213cf147d3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Implement dot11ExtendedRegInfoSupport to advertise non-AP station
regulatory power capability as part of regulatory connectivity
element in (Re)Association request frames so that AP can achieve
maximum client connectivity. Control field which was interpreted
using value of 3-bits B5 to B3, now uses value of 4-bits B6 to B3 to
interpret the type of AP. Hence update IEEE80211_HE_6GHZ_OPER_CTRL_REG_INFO
to parse 4-bits control field. If older AP still updates only 3-bits
value of control field, station can still interpret the value as per
section E.2.7 of IEEE 802.11 REVme D7.0 and support the appropriate
AP type.
Also update IEEE80211_6GHZ_CTRL_REG_INDOOR_SP_AP as the value of
standard power AP is changed to 8 instead of 4 so that AP can support both
LPI AP and SP AP to maximize the connectivity with stations. For backward
compatibility, keeping value 4 as old AP by limiting it to SP AP only.
Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213232.90cdef116aad.I85da390fbee59355e3855691933e6a5e55c47ac4@changeid
[fix kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently the code only sends extended MLD capa/ops in
strict mode, but if the AP has it then it should also
be able to parse it. There could be cases where the AP
doesn't have it but we would want to advertise it (e.g.
if the AP supports nothing but we want to have BTM.),
but given the broken deployed APs out there right now
this is the best we can do.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213232.c9b8b3a6ca77.I1153d4283d1fbb9e5db60e7b939cc133a6345db5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
cfg80211 now reports whether this is the first part of a scan. Copy
that information into the driver request.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.63f6078bd7be.Ia6e5cee945e6d9617c2f427552d89d23c92eee83@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|