summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-09dpll: zl3073x: Add support to get/set priority on input pinsIvan Vecera
Add support for getting and setting input pin priority. Implement required callbacks and set appropriate capability for input pins. Although the pin priority make sense only if the DPLL is running in automatic mode we have to expose this capability unconditionally because input pins (references) are shared between all DPLLs where one of them can run in automatic mode while the other one not. Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-11-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dpll: zl3073x: Implement input pin selection in manual modeIvan Vecera
Implement input pin state setting if the DPLL is running in manual mode. The driver indicates manual mode if the DPLL mode is one of ref-lock, forced-holdover, freerun. Use these modes to implement input pin state change between connected and disconnected states. When the user set the particular pin as connected the driver marks this input pin as forced reference and switches the DPLL mode to ref-lock. When the use set the pin as disconnected the driver switches the DPLL to freerun or forced holdover mode. The switch to holdover mode is done if the DPLL has holdover capability (e.g is currently locked with holdover acquired). Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-10-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dpll: zl3073x: Register DPLL devices and pinsIvan Vecera
Enumerate all available DPLL channels and registers a DPLL device for each of them. Check all input references and outputs and register DPLL pins for them. Number of registered DPLL pins depends on configuration of references and outputs. If the reference or output is configured as differential one then only one DPLL pin is registered. Both references and outputs can be also disabled from firmware configuration and in this case no DPLL pins are registered. All registrable references are registered to all available DPLL devices with exception of DPLLs that are configured in NCO (numerically controlled oscillator) mode. In this mode DPLL channel acts as PHC and cannot be locked to any reference. Device outputs are connected to one of synthesizers and each synthesizer is driven by some DPLL channel. So output pins belonging to given output are registered to DPLL device that drives associated synthesizer. Finally add kworker task to monitor async changes on all DPLL channels and input pins and to notify about them DPLL core. Output pins are not monitored as their parameters are not changed asynchronously by the device. Co-developed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-9-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dpll: zl3073x: Read DPLL types and pin properties from system firmwareIvan Vecera
Add support for reading of DPLL types and optional pin properties from the system firmware (DT, ACPI...). The DPLL types are stored in property 'dpll-types' as string array and possible values 'pps' and 'eec' are mapped to DPLL enums DPLL_TYPE_PPS and DPLL_TYPE_EEC. The pin properties are stored under 'input-pins' and 'output-pins' sub-nodes and the following ones are supported: * reg integer that specifies pin index * label string that is used by driver as board label * connection-type string that indicates pin connection type * supported-frequencies-hz array of u64 values what frequencies are supported / allowed for given pin with respect to hardware wiring Do not blindly trust system firmware and filter out frequencies that cannot be configured/represented in device (input frequencies have to be factorized by one of the base frequencies and output frequencies have to divide configured synthesizer frequency). Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-8-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dpll: zl3073x: Fetch invariants during probeIvan Vecera
Several configuration parameters will remain constant at runtime, so we can load them during probe to avoid excessive reads from the hardware. Read the following parameters from the device during probe and store them for later use: * enablement status and frequencies of the synthesizers and their associated DPLL channels * enablement status and type (single-ended or differential) of input pins * associated synthesizers, signal format, and enablement status of outputs Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-7-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dpll: Add basic Microchip ZL3073x supportIvan Vecera
Microchip Azurite ZL3073x represents chip family providing DPLL and optionally PHC (PTP) functionality. The chips can be connected be connected over I2C or SPI bus. They have the following characteristics: * up to 5 separate DPLL units (channels) * 5 synthesizers * 10 input pins (references) * 10 outputs * 20 output pins (output pin pair shares one output) * Each reference and output can operate in either differential or single-ended mode (differential mode uses 2 pins) * Each output is connected to one of the synthesizers * Each synthesizer is driven by one of the DPLL unit The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-, 32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit addressing, the range is organized into pages of 128 bytes, with each page containing a page selector register at address 0x7F. For reading/writing multi-byte registers, the device supports bulk transfers. Add basic functionality to access device registers, probe functionality both I2C and SPI cases and add devlink support to provide info and to set clock ID parameter. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09devlink: Add new "clock_id" generic device paramIvan Vecera
Add a new device generic parameter to specify clock ID that should be used by the device for registering DPLL devices and pins. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09devlink: Add support for u64 parametersIvan Vecera
Only 8, 16 and 32-bit integers are supported for numeric devlink parameters. The subsequent patch adds support for DPLL clock ID that is defined as 64-bit number. Add support for u64 parameter type. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dt-bindings: dpll: Add support for Microchip Azurite chip familyIvan Vecera
Add DT bindings for Microchip Azurite DPLL chip family. These chips provide up to 5 independent DPLL channels, 10 differential or single-ended inputs and 10 differential or 20 single-ended outputs. They can be connected via I2C or SPI busses. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dt-bindings: dpll: Add DPLL device and pinIvan Vecera
Add a common DT schema for DPLL device and its associated pins. The DPLL (device phase-locked loop) is a device used for precise clock synchronization in networking and telecom hardware. The device includes one or more DPLLs (channels) and one or more physical input/output pins. Each DPLL channel is used either to provide a pulse-per-clock signal or to drive an Ethernet equipment clock. The input and output pins have the following properties: * label: specifies board label * connection type: specifies its usage depending on wiring * list of supported or allowed frequencies: depending on how the pin is connected and where) * embedded sync capability: indicates whether the pin supports this Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09virtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp()Bui Quang Minh
This commit does not do any functional changes. It moves xdp->data adjustment for buffer other than first buffer to buf_to_xdp() helper so that the xdp_buff adjustment does not scatter over different functions. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09Merge branch 'add-vf-drivers-for-wangxun-virtual-functions'Jakub Kicinski
Mengyuan Lou says: ==================== Add vf drivers for wangxun virtual functions Introduces basic support for Wangxun’s virtual function (VF) network drivers, specifically txgbevf and ngbevf. These drivers provide SR-IOV VF functionality for Wangxun 10/25/40G network devices. The first three patches add common APIs for Wangxun VF drivers, including mailbox communication and shared initialization logic.These abstractions are placed in libwx to reduce duplication across VF drivers. Patches 4–8 introduce the txgbevf driver, including: PCI device initialization, Hardware reset, Interrupt setup, Rx/Tx datapath implementation and link status changeing flow. Patches 9–12 implement the ngbevf driver, mirroring the functionality added in txgbevf. v2: https://lore.kernel.org/20250625102058.19898-1-mengyuanlou@net-swift.com v1: https://lore.kernel.org/20250611083559.14175-1-mengyuanlou@net-swift.com ==================== Link: https://patch.msgid.link/20250704094923.652-1-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: ngbevf: add link update flowMengyuan Lou
Add link update flow to wangxun 1G virtual functions. Get link status from pf in mbox, and if it is failed then check the vx_status, because vx_status switching is too slow. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-13-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: ngbevf: init interrupts and request irqsMengyuan Lou
Add specific parameters for irq alloc, then use wx_init_interrupt_scheme to initialize interrupt allocation in probe. Add .ndo_start_xmit support and start all queues. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-12-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: ngbevf: add sw init pci info and reset hardwareMengyuan Lou
Do sw init and reset hw for ngbevf virtual functions, then register netdev. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-11-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: wangxun: add ngbevf buildMengyuan Lou
Add doc build infrastructure for ngbevf driver. Implement the basic PCI driver loading and unloading interface. Initialize the id_table which support 1G virtual functions for Wangxun. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-10-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: txgbevf: add link update flowMengyuan Lou
Add link update flow to wangxun 10/25/40G virtual functions. Get link status from pf in mbox, and if it is failed then check the vx_status, because vx_status switching is too slow. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-9-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: txgbevf: Support Rx and Tx process pathMengyuan Lou
Improve the configuration of Rx and Tx ring. Setup and alloc resources. Configure Rx and Tx unit on hardware. Add .ndo_start_xmit support and start all queues. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-8-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: txgbevf: init interrupts and request irqsMengyuan Lou
Add irq alloc flow functions for vf. Alloc pcie msix irqs for drivers and request_irq for tx/rx rings and misc other events. If the application is successful, config vertors for interrupts. Enable interrupts mask in wxvf_irq_enable. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-7-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: txgbevf: add sw init pci info and reset hardwareMengyuan Lou
Add sw init and reset hw for txgbevf virtual functions which initialize basic parameters, and then register netdev. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-6-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: wangxun: add txgbevf buildMengyuan Lou
Add doc build infrastructure for txgbevf driver. Implement the basic PCI driver loading and unloading interface. Initialize the id_table which support 10/25/40G virtual functions for Wangxun. Ioremap the space of bar0 and bar4 which will be used. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-5-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: libwx: add wangxun vf common apiMengyuan Lou
Add common wx_configure_vf and wx_set_mac_vf for ngbevf and txgbevf. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-4-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: libwx: add base vf api for vf driversMengyuan Lou
Implement mbox_write_and_read_ack functions which are used to set basic functions like set_mac, get_link.etc for vf. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-3-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09net: libwx: add mailbox api for wangxun vf driversMengyuan Lou
Implements the mailbox interfaces for Wangxun vf drivers which will be used in txgbevf and ngbevf. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/20250704094923.652-2-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09Merge branch ↵Jakub Kicinski
'atm-clip-fix-infinite-recursion-potential-null-ptr-deref-and-memleak' Kuniyuki Iwashima says: ==================== atm: clip: Fix infinite recursion, potential null-ptr-deref, and memleak. Patch 1 fixes racy access to atmarpd found while checking RTNL usage in clip.c. Patch 2 fixes memory leak by ioctl(ATMARP_MKIP) and ioctl(ATMARPD_CTRL). Patch 3 fixes infinite recursive call of clip_vcc->old_push(), which was reported by syzbot. v1: https://lore.kernel.org/20250702020437.703698-1-kuniyu@google.com ==================== Link: https://patch.msgid.link/20250704062416.1613927-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix infinite recursive call of clip_push().Kuniyuki Iwashima
syzbot reported the splat below. [0] This happens if we call ioctl(ATMARP_MKIP) more than once. During the first call, clip_mkip() sets clip_push() to vcc->push(), and the second call copies it to clip_vcc->old_push(). Later, when the socket is close()d, vcc_destroy_socket() passes NULL skb to clip_push(), which calls clip_vcc->old_push(), triggering the infinite recursion. Let's prevent the second ioctl(ATMARP_MKIP) by checking vcc->user_back, which is allocated by the first call as clip_vcc. Note also that we use lock_sock() to prevent racy calls. [0]: BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000) Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191 Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00 RSP: 0018:ffffc9000d670000 EFLAGS: 00010246 RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000 RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300 R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578 FS: 000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0 Call Trace: <TASK> clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 ... clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 vcc_destroy_socket net/atm/common.c:183 [inline] vcc_release+0x157/0x460 net/atm/common.c:205 __sock_release net/socket.c:647 [inline] sock_close+0xc0/0x240 net/socket.c:1391 __fput+0x449/0xa70 fs/file_table.c:465 task_work_run+0x1d1/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff31c98e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090 </TASK> Modules linked in: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix memory leak of struct clip_vcc.Kuniyuki Iwashima
ioctl(ATMARP_MKIP) allocates struct clip_vcc and set it to vcc->user_back. The code assumes that vcc_destroy_socket() passes NULL skb to vcc->push() when the socket is close()d, and then clip_push() frees clip_vcc. However, ioctl(ATMARPD_CTRL) sets NULL to vcc->push() in atm_init_atmarp(), resulting in memory leak. Let's serialise two ioctl() by lock_sock() and check vcc->push() in atm_init_atmarp() to prevent memleak. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09atm: clip: Fix potential null-ptr-deref in to_atmarpd().Kuniyuki Iwashima
atmarpd is protected by RTNL since commit f3a0592b37b8 ("[ATM]: clip causes unregister hang"). However, it is not enough because to_atmarpd() is called without RTNL, especially clip_neigh_solicit() / neigh_ops->solicit() is unsleepable. Also, there is no RTNL dependency around atmarpd. Let's use a private mutex and RCU to protect access to atmarpd in to_atmarpd(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09dt-bindings: net: Add support for Sophgo CV1800 dwmacInochi Amaoto
The GMAC IP on CV1800 series SoC is a standard Synopsys DesignWare MAC (version 3.70a). Add necessary compatible string for this device. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250703021220.124195-1-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09KVM: x86: avoid underflow when scaling TSC frequencyPaolo Bonzini
In function kvm_guest_time_update(), __scale_tsc() is used to calculate a TSC *frequency* rather than a TSC value. With low-enough ratios, a TSC value that is less than 1 would underflow to 0 and to an infinite while loop in kvm_get_time_scale(): kvm_guest_time_update(struct kvm_vcpu *v) if (kvm_caps.has_tsc_control) tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz, v->arch.l1_tsc_scaling_ratio); __scale_tsc(u64 ratio, u64 tsc) ratio=122380531, tsc=2299998, N=48 ratio*tsc >> N = 0.999... -> 0 Later in the function: Call Trace: <TASK> kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline] kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268 vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678 vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126 kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352 kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x78/0xe2 This can really happen only when fuzzing, since the TSC frequency would have to be nonsensically low. Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function") Reported-by: Yuntao Liu <liuyuntao12@huawei.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-09eventpoll: don't decrement ep refcount while still holding the ep mutexLinus Torvalds
Jann Horn points out that epoll is decrementing the ep refcount and then doing a mutex_unlock(&ep->mtx); afterwards. That's very wrong, because it can lead to a use-after-free. That pattern is actually fine for the very last reference, because the code in question will delay the actual call to "ep_free(ep)" until after it has unlocked the mutex. But it's wrong for the much subtler "next to last" case when somebody *else* may also be dropping their reference and free the ep while we're still using the mutex. Note that this is true even if that other user is also using the same ep mutex: mutexes, unlike spinlocks, can not be used for object ownership, even if they guarantee mutual exclusion. A mutex "unlock" operation is not atomic, and as one user is still accessing the mutex as part of unlocking it, another user can come in and get the now released mutex and free the data structure while the first user is still cleaning up. See our mutex documentation in Documentation/locking/mutex-design.rst, in particular the section [1] about semantics: "mutex_unlock() may access the mutex structure even after it has internally released the lock already - so it's not safe for another context to acquire the mutex and assume that the mutex_unlock() context is not using the structure anymore" So if we drop our ep ref before the mutex unlock, but we weren't the last one, we may then unlock the mutex, another user comes in, drops _their_ reference and releases the 'ep' as it now has no users - all while the mutex_unlock() is still accessing it. Fix this by simply moving the ep refcount dropping to outside the mutex: the refcount itself is atomic, and doesn't need mutex protection (that's the whole _point_ of refcounts: unlike mutexes, they are inherently about object lifetimes). Reported-by: Jann Horn <jannh@google.com> Link: https://docs.kernel.org/locking/mutex-design.html#semantics [1] Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-07-09Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix bogus KASAN splat on EFI runtime stack - Select JUMP_LABEL unconditionally to avoid boot failure with pKVM and the legacy implementation of static keys - Avoid touching GCS registers when 'arm64.nogcs' has been passed on the command-line - Move a 'cpumask_t' off the stack in smp_send_stop() - Don't advertise SME-related hwcaps to userspace when ID_AA64PFR1_EL1 indicates that SME is not implemented - Always check the VMA when handling an Overlay fault - Avoid corrupting TCR2_EL1 during boot * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/mm: Drop wrong writes into TCR2_EL1 arm64: poe: Handle spurious Overlay faults arm64: Filter out SME hwcaps when FEAT_SME isn't implemented arm64: move smp_send_stop() cpu mask off stack arm64/gcs: Don't try to access GCS registers if arm64.nogcs is enabled arm64: Unconditionally select CONFIG_JUMP_LABEL arm64: efi: Fix KASAN false positive for EFI runtime stack
2025-07-09Merge tag 'pinctrl-v6.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Mark som pins as invalid for IRQ use in the Qualcomm driver - Fix up the use of device properties on the MA35DX Nuvoton, apparently something went sidewise - Clear the GPIO debounce settings when going down for suspend in the AMD driver. Very good for some AMD laptops that now wake up from suspend again! - Add the compulsory .can_sleep bool flag in the AW9523 driver, should have been there from the beginning, now there are users finding the bug - Drop some bouncing email address from MAINTAINERS * tag 'pinctrl-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: aw9523: fix can_sleep flag for GPIO chip pinctrl: amd: Clear GPIO debounce for suspend pinctrl: nuvoton: Fix boot on ma35dx platforms MAINTAINERS: drop bouncing Lakshmi Sowjanya D pinctrl: qcom: msm: mark certain pins as invalid for interrupts
2025-07-08udp: remove udp_tunnel_gro_init()Eric Dumazet
Use DEFINE_MUTEX() to initialize udp_tunnel_gro_type_lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707091634.311974-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08dt-bindings: net: altr,socfpga-stmmac.yaml: add minItems to iommusMatthew Gerlach
Add missing 'minItems: 1' to iommus property of the Altera SOCFPGA SoC implementation of the Synopsys DWMAC. Fixes: 6d359cf464f4 ("dt-bindings: net: Convert socfpga-dwmac bindings to yaml") Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250707154409.15527-1-matthew.gerlach@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: dt-bindings: ixp4xx-ethernet: Support fixed linksLinus Walleij
This ethernet controller is using fixed links for DSA switches in two already existing device trees, so make sure the checker does not complain like this: intel-ixp42x-linksys-wrv54g.dtb: ethernet@c8009000 (intel,ixp4xx-ethernet): 'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$' from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml# intel-ixp42x-usrobotics-usr8200.dtb: ethernet@c800a000 (intel,ixp4xx-ethernet): 'fixed-link' does not match any of the regexes: '^pinctrl-[0-9]+$' from schema $id: http://devicetree.org/schemas/net/intel,ixp4xx-ethernet.yaml# Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507040609.K9KytWBA-lkp@intel.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250704-ixp4xx-ethernet-binding-fix-v1-1-8ac360d5bc9b@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'ipv6-drop-rtnl-from-mcast-c-and-anycast-c'Jakub Kicinski
Kuniyuki Iwashima says: ==================== ipv6: Drop RTNL from mcast.c and anycast.c This is a prep series for RCU conversion of RTM_NEWNEIGH, which needs RTNL during neigh_table.{pconstructor,pdestructor}() touching IPv6 multicast code. Currently, IPv6 multicast code is protected by lock_sock() and inet6_dev->mc_lock, and RTNL is not actually needed. In addition, anycast code is also in the same situation and does not need RTNL at all. This series removes RTNL from net/ipv6/{mcast.c,anycast.c} and finally removes setsockopt_needs_rtnl() from do_ipv6_setsockopt(). v2: https://lore.kernel.org/20250624202616.526600-1-kuni1840@gmail.com v1: https://lore.kernel.org/20250616233417.1153427-1-kuni1840@gmail.com ==================== Link: https://patch.msgid.link/20250702230210.3115355-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: Remove setsockopt_needs_rtnl().Kuniyuki Iwashima
We no longer need to hold RTNL for IPv6 socket options. Let's remove setsockopt_needs_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-16-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: anycast: Don't hold RTNL for IPV6_JOIN_ANYCAST.Kuniyuki Iwashima
inet6_sk(sk)->ipv6_ac_list is protected by lock_sock(). In ipv6_sock_ac_join(), only __dev_get_by_index(), __dev_get_by_flags(), and __in6_dev_get() require RTNL. __dev_get_by_flags() is only used by ipv6_sock_ac_join() and can be converted to RCU version. Let's replace RCU version helper and drop RTNL from IPV6_JOIN_ANYCAST. setsockopt_needs_rtnl() will be removed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-15-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: anycast: Unify two error paths in ipv6_sock_ac_join().Kuniyuki Iwashima
The next patch will replace __dev_get_by_index() and __dev_get_by_flags() to RCU + refcount version. Then, we will need to call dev_put() in some error paths. Let's unify two error paths to make the next patch cleaner. Note that we add READ_ONCE() for net->ipv6.devconf_all->forwarding and idev->conf.forwarding as we will drop RTNL that protects them. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-14-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: anycast: Don't hold RTNL for IPV6_LEAVE_ANYCAST and IPV6_ADDRFORM.Kuniyuki Iwashima
inet6_sk(sk)->ipv6_ac_list is protected by lock_sock(). In ipv6_sock_ac_drop() and ipv6_sock_ac_close(), only __dev_get_by_index() and __in6_dev_get() requrie RTNL. Let's replace them with dev_get_by_index() and in6_dev_get() and drop RTNL from IPV6_LEAVE_ANYCAST and IPV6_ADDRFORM. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-13-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: anycast: Don't use rtnl_dereference().Kuniyuki Iwashima
inet6_dev->ac_list is protected by inet6_dev->lock, so rtnl_dereference() is a bit rough annotation. As done in mcast.c, we can use ac_dereference() that checks if inet6_dev->lock is held. Let's replace rtnl_dereference() with a new helper ac_dereference(). Note that now addrconf_join_solict() / addrconf_leave_solict() in __ipv6_dev_ac_inc() / __ipv6_dev_ac_dec() does not need RTNL, so we can remove ASSERT_RTNL() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-12-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Remove unnecessary ASSERT_RTNL and comment.Kuniyuki Iwashima
Now, RTNL is not needed for mcast code, and what's commented in ip6_mc_msfget() is apparent by for_each_pmc_socklock(), which has lockdep annotation for lock_sock(). Let's remove the comment and ASSERT_RTNL() in ipv6_mc_rejoin_groups(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-11-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Don't hold RTNL for MCAST_ socket options.Kuniyuki Iwashima
In ip6_mc_source() and ip6_mc_msfilter(), per-socket mld data is protected by lock_sock() and inet6_dev->mc_lock is also held for some per-interface functions. ip6_mc_find_dev_rtnl() only depends on RTNL. If we want to remove it, we need to check inet6_dev->dead under mc_lock to close the race with addrconf_ifdown(), as mentioned earlier. Let's do that and drop RTNL for the rest of MCAST_ socket options. Note that ip6_mc_msfilter() has unnecessary lock dances and they are integrated into one to avoid the last-minute error and simplify the error handling. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-10-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Don't hold RTNL in ipv6_sock_mc_close().Kuniyuki Iwashima
In __ipv6_sock_mc_close(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() and __in6_dev_get() require RTNL. Let's call __ipv6_sock_mc_drop() and drop RTNL in ipv6_sock_mc_close(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-9-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Don't hold RTNL for IPV6_DROP_MEMBERSHIP and MCAST_LEAVE_GROUP.Kuniyuki Iwashima
In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() and __in6_dev_get() require RTNL. Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP. Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-8-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Don't hold RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.Kuniyuki Iwashima
In __ipv6_sock_mc_join(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() requires RTNL. Let's use dev_get_by_index() and drop RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP. Note that we must call rt6_lookup() and dev_hold() under RCU. If rt6_lookup() returns an entry from the exception table, dst_dev_put() could change rt->dev.dst to loopback concurrently, and the original device could lose the refcount before dev_hold() and unblock device registration. dst_dev_put() is called from NETDEV_UNREGISTER and synchronize_net() follows it, so as long as rt6_lookup() and dev_hold() are called within the same RCU critical section, the dev is alive. Even if the race happens, they are synchronised by idev->dead and mcast addresses are cleaned up. For the racy access to rt->dst.dev, we use dst_dev(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-7-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Use in6_dev_get() in ipv6_dev_mc_dec().Kuniyuki Iwashima
As well as __ipv6_dev_mc_inc(), all code in __ipv6_dev_mc_dec() are protected by inet6_dev->mc_lock, and RTNL is not needed. Let's use in6_dev_get() in ipv6_dev_mc_dec() and remove ASSERT_RTNL() in __ipv6_dev_mc_dec(). Now, we can remove the RTNL comment above addrconf_leave_solict() too. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-6-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Remove mca_get().Kuniyuki Iwashima
Since commit 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data"), the newly allocated struct ifmcaddr6 cannot be removed until inet6_dev->mc_lock is released, so mca_get() and mc_put() are unnecessary. Let's remove the extra refcounting. Note that mca_get() was only used in __ipv6_dev_mc_inc(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-5-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08ipv6: mcast: Check inet6_dev->dead under idev->mc_lock in __ipv6_dev_mc_inc().Kuniyuki Iwashima
Since commit 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data"), every multicast resource is protected by inet6_dev->mc_lock. RTNL is unnecessary in terms of protection but still needed for synchronisation between addrconf_ifdown() and __ipv6_dev_mc_inc(). Once we removed RTNL, there would be a race below, where we could add a multicast address to a dead inet6_dev. CPU1 CPU2 ==== ==== addrconf_ifdown() __ipv6_dev_mc_inc() if (idev->dead) <-- false dead = true return -ENODEV; ipv6_mc_destroy_dev() / ipv6_mc_down() mutex_lock(&idev->mc_lock) ... mutex_unlock(&idev->mc_lock) mutex_lock(&idev->mc_lock) ... mutex_unlock(&idev->mc_lock) The race window can be easily closed by checking inet6_dev->dead under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown() will acquire it after marking inet6_dev dead. Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc(). Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and we can remove ASSERT_RTNL() there and the RTNL comment above addrconf_join_solict(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-4-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>