diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-30 08:58:55 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-30 08:58:55 -0700 |
commit | 8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf (patch) | |
tree | fec3039a08284cd87f4ec9c3bea5b5a439f1859f /lib/ref_tracker.c | |
parent | 4b290aae788e06561754b28c6842e4080957d3f7 (diff) | |
parent | fa582ca7e187a15e772e6a72fe035f649b387a60 (diff) |
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
Diffstat (limited to 'lib/ref_tracker.c')
-rw-r--r-- | lib/ref_tracker.c | 289 |
1 files changed, 274 insertions, 15 deletions
diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c index cf5609b1ca79..a9e6ffcff04b 100644 --- a/lib/ref_tracker.c +++ b/lib/ref_tracker.c @@ -8,6 +8,7 @@ #include <linux/slab.h> #include <linux/stacktrace.h> #include <linux/stackdepot.h> +#include <linux/seq_file.h> #define REF_TRACKER_STACK_ENTRIES 16 #define STACK_BUF_SIZE 1024 @@ -28,6 +29,45 @@ struct ref_tracker_dir_stats { } stacks[]; }; +#ifdef CONFIG_DEBUG_FS +#include <linux/xarray.h> + +/* + * ref_tracker_dir_init() is usually called in allocation-safe contexts, but + * the same is not true of ref_tracker_dir_exit() which can be called from + * anywhere an object is freed. Removing debugfs dentries is a blocking + * operation, so we defer that work to the debugfs_reap_worker. + * + * Each dentry is tracked in the appropriate xarray. When + * ref_tracker_dir_exit() is called, its entries in the xarrays are marked and + * the workqueue job is scheduled. The worker then runs and deletes any marked + * dentries asynchronously. + */ +static struct xarray debugfs_dentries; +static struct xarray debugfs_symlinks; +static struct work_struct debugfs_reap_worker; + +#define REF_TRACKER_DIR_DEAD XA_MARK_0 +static inline void ref_tracker_debugfs_mark(struct ref_tracker_dir *dir) +{ + unsigned long flags; + + xa_lock_irqsave(&debugfs_dentries, flags); + __xa_set_mark(&debugfs_dentries, (unsigned long)dir, REF_TRACKER_DIR_DEAD); + xa_unlock_irqrestore(&debugfs_dentries, flags); + + xa_lock_irqsave(&debugfs_symlinks, flags); + __xa_set_mark(&debugfs_symlinks, (unsigned long)dir, REF_TRACKER_DIR_DEAD); + xa_unlock_irqrestore(&debugfs_symlinks, flags); + + schedule_work(&debugfs_reap_worker); +} +#else +static inline void ref_tracker_debugfs_mark(struct ref_tracker_dir *dir) +{ +} +#endif + static struct ref_tracker_dir_stats * ref_tracker_get_stats(struct ref_tracker_dir *dir, unsigned int limit) { @@ -63,21 +103,39 @@ ref_tracker_get_stats(struct ref_tracker_dir *dir, unsigned int limit) } struct ostream { + void __ostream_printf (*func)(struct ostream *stream, char *fmt, ...); + char *prefix; char *buf; + struct seq_file *seq; int size, used; }; +static void __ostream_printf pr_ostream_log(struct ostream *stream, char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + vprintk(fmt, args); + va_end(args); +} + +static void __ostream_printf pr_ostream_buf(struct ostream *stream, char *fmt, ...) +{ + int ret, len = stream->size - stream->used; + va_list args; + + va_start(args, fmt); + ret = vsnprintf(stream->buf + stream->used, len, fmt, args); + va_end(args); + if (ret > 0) + stream->used += min(ret, len); +} + #define pr_ostream(stream, fmt, args...) \ ({ \ struct ostream *_s = (stream); \ \ - if (!_s->buf) { \ - pr_err(fmt, ##args); \ - } else { \ - int ret, len = _s->size - _s->used; \ - ret = snprintf(_s->buf + _s->used, len, pr_fmt(fmt), ##args); \ - _s->used += min(ret, len); \ - } \ + _s->func(_s, fmt, ##args); \ }) static void @@ -96,8 +154,8 @@ __ref_tracker_dir_pr_ostream(struct ref_tracker_dir *dir, stats = ref_tracker_get_stats(dir, display_limit); if (IS_ERR(stats)) { - pr_ostream(s, "%s@%pK: couldn't get stats, error %pe\n", - dir->name, dir, stats); + pr_ostream(s, "%s%s@%p: couldn't get stats, error %pe\n", + s->prefix, dir->class, dir, stats); return; } @@ -107,14 +165,15 @@ __ref_tracker_dir_pr_ostream(struct ref_tracker_dir *dir, stack = stats->stacks[i].stack_handle; if (sbuf && !stack_depot_snprint(stack, sbuf, STACK_BUF_SIZE, 4)) sbuf[0] = 0; - pr_ostream(s, "%s@%pK has %d/%d users at\n%s\n", dir->name, dir, - stats->stacks[i].count, stats->total, sbuf); + pr_ostream(s, "%s%s@%p has %d/%d users at\n%s\n", s->prefix, + dir->class, dir, stats->stacks[i].count, + stats->total, sbuf); skipped -= stats->stacks[i].count; } if (skipped) - pr_ostream(s, "%s@%pK skipped reports about %d/%d users.\n", - dir->name, dir, skipped, stats->total); + pr_ostream(s, "%s%s@%p skipped reports about %d/%d users.\n", + s->prefix, dir->class, dir, skipped, stats->total); kfree(sbuf); @@ -124,7 +183,8 @@ __ref_tracker_dir_pr_ostream(struct ref_tracker_dir *dir, void ref_tracker_dir_print_locked(struct ref_tracker_dir *dir, unsigned int display_limit) { - struct ostream os = {}; + struct ostream os = { .func = pr_ostream_log, + .prefix = "ref_tracker: " }; __ref_tracker_dir_pr_ostream(dir, display_limit, &os); } @@ -143,7 +203,10 @@ EXPORT_SYMBOL(ref_tracker_dir_print); int ref_tracker_dir_snprint(struct ref_tracker_dir *dir, char *buf, size_t size) { - struct ostream os = { .buf = buf, .size = size }; + struct ostream os = { .func = pr_ostream_buf, + .prefix = "ref_tracker: ", + .buf = buf, + .size = size }; unsigned long flags; spin_lock_irqsave(&dir->lock, flags); @@ -161,6 +224,11 @@ void ref_tracker_dir_exit(struct ref_tracker_dir *dir) bool leak = false; dir->dead = true; + /* + * The xarray entries must be marked before the dir->lock is taken to + * protect simultaneous debugfs readers. + */ + ref_tracker_debugfs_mark(dir); spin_lock_irqsave(&dir->lock, flags); list_for_each_entry_safe(tracker, n, &dir->quarantine, head) { list_del(&tracker->head); @@ -273,3 +341,194 @@ int ref_tracker_free(struct ref_tracker_dir *dir, return 0; } EXPORT_SYMBOL_GPL(ref_tracker_free); + +#ifdef CONFIG_DEBUG_FS +#include <linux/debugfs.h> + +static struct dentry *ref_tracker_debug_dir = (struct dentry *)-ENOENT; + +static void __ostream_printf pr_ostream_seq(struct ostream *stream, char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + seq_vprintf(stream->seq, fmt, args); + va_end(args); +} + +static int ref_tracker_dir_seq_print(struct ref_tracker_dir *dir, struct seq_file *seq) +{ + struct ostream os = { .func = pr_ostream_seq, + .prefix = "", + .seq = seq }; + + __ref_tracker_dir_pr_ostream(dir, 16, &os); + + return os.used; +} + +static int ref_tracker_debugfs_show(struct seq_file *f, void *v) +{ + struct ref_tracker_dir *dir = f->private; + unsigned long index = (unsigned long)dir; + unsigned long flags; + int ret; + + /* + * "dir" may not exist at this point if ref_tracker_dir_exit() has + * already been called. Take care not to dereference it until its + * legitimacy is established. + * + * The xa_lock is necessary to ensure that "dir" doesn't disappear + * before its lock can be taken. If it's in the hash and not marked + * dead, then it's safe to take dir->lock which prevents + * ref_tracker_dir_exit() from completing. Once the dir->lock is + * acquired, the xa_lock can be released. All of this must be IRQ-safe. + */ + xa_lock_irqsave(&debugfs_dentries, flags); + if (!xa_load(&debugfs_dentries, index) || + xa_get_mark(&debugfs_dentries, index, REF_TRACKER_DIR_DEAD)) { + xa_unlock_irqrestore(&debugfs_dentries, flags); + return -ENODATA; + } + + spin_lock(&dir->lock); + xa_unlock(&debugfs_dentries); + ret = ref_tracker_dir_seq_print(dir, f); + spin_unlock_irqrestore(&dir->lock, flags); + return ret; +} + +static int ref_tracker_debugfs_open(struct inode *inode, struct file *filp) +{ + struct ref_tracker_dir *dir = inode->i_private; + + return single_open(filp, ref_tracker_debugfs_show, dir); +} + +static const struct file_operations ref_tracker_debugfs_fops = { + .owner = THIS_MODULE, + .open = ref_tracker_debugfs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/** + * ref_tracker_dir_debugfs - create debugfs file for ref_tracker_dir + * @dir: ref_tracker_dir to be associated with debugfs file + * + * In most cases, a debugfs file will be created automatically for every + * ref_tracker_dir. If the object was created before debugfs is brought up + * then that may fail. In those cases, it is safe to call this at a later + * time to create the file. + */ +void ref_tracker_dir_debugfs(struct ref_tracker_dir *dir) +{ + char name[NAME_MAX + 1]; + struct dentry *dentry; + int ret; + + /* No-op if already created */ + dentry = xa_load(&debugfs_dentries, (unsigned long)dir); + if (dentry && !xa_is_err(dentry)) + return; + + ret = snprintf(name, sizeof(name), "%s@%px", dir->class, dir); + name[sizeof(name) - 1] = '\0'; + + if (ret < sizeof(name)) { + dentry = debugfs_create_file(name, S_IFREG | 0400, + ref_tracker_debug_dir, dir, + &ref_tracker_debugfs_fops); + if (!IS_ERR(dentry)) { + void *old; + + old = xa_store_irq(&debugfs_dentries, (unsigned long)dir, + dentry, GFP_KERNEL); + + if (xa_is_err(old)) + debugfs_remove(dentry); + else + WARN_ON_ONCE(old); + } + } +} +EXPORT_SYMBOL(ref_tracker_dir_debugfs); + +void __ostream_printf ref_tracker_dir_symlink(struct ref_tracker_dir *dir, const char *fmt, ...) +{ + char name[NAME_MAX + 1]; + struct dentry *symlink, *dentry; + va_list args; + int ret; + + symlink = xa_load(&debugfs_symlinks, (unsigned long)dir); + dentry = xa_load(&debugfs_dentries, (unsigned long)dir); + + /* Already created?*/ + if (symlink && !xa_is_err(symlink)) + return; + + if (!dentry || xa_is_err(dentry)) + return; + + va_start(args, fmt); + ret = vsnprintf(name, sizeof(name), fmt, args); + va_end(args); + name[sizeof(name) - 1] = '\0'; + + if (ret < sizeof(name)) { + symlink = debugfs_create_symlink(name, ref_tracker_debug_dir, + dentry->d_name.name); + if (!IS_ERR(symlink)) { + void *old; + + old = xa_store_irq(&debugfs_symlinks, (unsigned long)dir, + symlink, GFP_KERNEL); + if (xa_is_err(old)) + debugfs_remove(symlink); + else + WARN_ON_ONCE(old); + } + } +} +EXPORT_SYMBOL(ref_tracker_dir_symlink); + +static void debugfs_reap_work(struct work_struct *work) +{ + struct dentry *dentry; + unsigned long index; + bool reaped; + + do { + reaped = false; + xa_for_each_marked(&debugfs_symlinks, index, dentry, REF_TRACKER_DIR_DEAD) { + xa_erase_irq(&debugfs_symlinks, index); + debugfs_remove(dentry); + reaped = true; + } + xa_for_each_marked(&debugfs_dentries, index, dentry, REF_TRACKER_DIR_DEAD) { + xa_erase_irq(&debugfs_dentries, index); + debugfs_remove(dentry); + reaped = true; + } + } while (reaped); +} + +static int __init ref_tracker_debugfs_postcore_init(void) +{ + INIT_WORK(&debugfs_reap_worker, debugfs_reap_work); + xa_init_flags(&debugfs_dentries, XA_FLAGS_LOCK_IRQ); + xa_init_flags(&debugfs_symlinks, XA_FLAGS_LOCK_IRQ); + return 0; +} +postcore_initcall(ref_tracker_debugfs_postcore_init); + +static int __init ref_tracker_debugfs_late_init(void) +{ + ref_tracker_debug_dir = debugfs_create_dir("ref_tracker", NULL); + return 0; +} +late_initcall(ref_tracker_debugfs_late_init); +#endif /* CONFIG_DEBUG_FS */ |