summaryrefslogtreecommitdiff
path: root/include/trace/events
AgeCommit message (Collapse)Author
2025-03-27Merge tag 'trace-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add option traceoff_after_boot In order to debug kernel boot, it sometimes is helpful to enable tracing via the kernel command line. Unfortunately, by the time the login prompt appears, the trace is overwritten by the init process and other user space start up applications. Adding a "traceoff_after_boot" will disable tracing when the kernel passes control to init which will allow developers to be able to see the traces that occurred during boot. - Clean up the mmflags macros that display the GFP flags in trace events The macros to print the GFP flags for trace events had a bit of duplication. The code was restructured to remove duplication and in the process it also adds some flags that were missed before. - Removed some dead code and scripts/draw_functrace.py draw_functrace.py hasn't worked in years and as nobody complained about it, remove it. - Constify struct event_trigger_ops The event_trigger_ops is just a structure that has function pointers that are assigned when the variables are created. These variables should all be constants. - Other minor clean ups and fixes * tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Replace strncpy with memcpy for fixed-length substring copy tracing: Fix synth event printk format for str fields tracing: Do not use PERF enums when perf is not defined tracing: Ensure module defining synth event cannot be unloaded while tracing tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER tracing/osnoise: Fix possible recursive locking for cpus_read_lock() tracing: Align synth event print fmt tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format tracepoint: Print the function symbol when tracepoint_debug is set tracing: Constify struct event_trigger_ops scripts/tracing: Remove scripts/tracing/draw_functrace.py tracing: Update MAINTAINERS file to include tracepoint.c tracing/user_events: Slightly simplify user_seq_show() tracing/user_events: Don't use %pK through printk tracing: gfp: Remove duplication of recording GFP flags tracing: Remove orphaned event_trace_printk ring-buffer: Fix typo in comment about header page pointer tracing: Add traceoff_after_boot option
2025-03-27Merge tag 'trace-latency-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull latency tracing updates from Steven Rostedt: - Add some trace events to osnoise and timerlat sample generation This adds more information to the osnoise and timerlat tracers as well as allows BPF programs to be attached to these locations to extract even more data. - Fix to DECLARE_TRACE_CONDITION() macro It wasn't used but now will be and it happened to be broken causing the build to fail. - Add scheduler specification monitors to runtime verifier (RV) This is a continuation of Daniel Bristot's work. RV allows monitors to run and react concurrently. Running the cumulative model is equivalent to running single components using the same reactors, with the advantage that it's easier to point out which specification failed in case of error. This update introduces nested monitors to RV, in short, the sysfs monitor folder will contain a monitor named sched, which is nothing but an empty container for other monitors. Controlling the sched monitor (enable, disable, set reactors) controls all nested monitors. The following scheduling monitors are added: - sco: scheduling context operations Monitor to ensure sched_set_state happens only in thread context - tss: task switch while scheduling Monitor to ensure sched_switch happens only in scheduling context - snroc: set non runnable on its own context Monitor to ensure set_state happens only in the respective task's context - scpd: schedule called with preemption disabled Monitor to ensure schedule is called with preemption disabled - snep: schedule does not enable preempt Monitor to ensure schedule does not enable preempt - sncid: schedule not called with interrupt disabled Monitor to ensure schedule is not called with interrupt disabled * tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tools/rv: Allow rv list to filter for container Documentation/rv: Add docs for the sched monitors verification/dot2k: Add support for nested monitors tools/rv: Add support for nested monitors rv: Add scpd, snep and sncid per-cpu monitors rv: Add snroc per-task monitor rv: Add sco and tss per-cpu monitors rv: Add option for nested monitors and include sched sched: Add sched tracepoints for RV task model rv: Add license identifiers to monitor files tracing: Fix DECLARE_TRACE_CONDITION trace/osnoise: Add trace events for samples
2025-03-27Merge tag 'erofs-for-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, EROFS 48-bit block addressing is available to support massive datasets for model training and other large data archive use cases. In addition, byte-oriented encoded extents have been supported to reduce metadata sizes when using large configurations as well as to improve Zstd compression speed. There are some bugfixes and cleanups as usual. Summary: - Support 48-bit block addressing for large images - Introduce encoded extents to reduce metadata on larger pclusters - Enable unaligned compressed data to improve Zstd compression speed - Allow 16-byte volume names again - Minor cleanups" * tag 'erofs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: enable 48-bit layout support erofs: support unaligned encoded data erofs: implement encoded extent metadata erofs: add encoded extent on-disk definition erofs: initialize decompression early erofs: support dot-omitted directories erofs: implement 48-bit block addressing for unencoded inodes erofs: add 48-bit block addressing on-disk support erofs: simplify erofs_{read,fill}_inode() erofs: get rid of erofs_map_blocks_flatmode() erofs: move {in,out}pages into struct z_erofs_decompress_req erofs: clean up header parsing for ztailpacking and fragments erofs: simplify tail inline pcluster handling erofs: allow 16-byte volume name again erofs: get rid of erofs_kmap_type erofs: use Z_EROFS_LCLUSTER_TYPE_MAX to simplify switches
2025-03-26Merge tag 'net-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Continue Netlink conversions to per-namespace RTNL lock (IPv4 routing, routing rules, routing next hops, ARP ioctls) - Continue extending the use of netdev instance locks. As a driver opt-in protect queue operations and (in due course) ethtool operations with the instance lock and not RTNL lock. - Support collecting TCP timestamps (data submitted, sent, acked) in BPF, allowing for transparent (to the application) and lower overhead tracking of TCP RPC performance. - Tweak existing networking Rx zero-copy infra to support zero-copy Rx via io_uring. - Optimize MPTCP performance in single subflow mode by 29%. - Enable GRO on packets which went thru XDP CPU redirect (were queued for processing on a different CPU). Improving TCP stream performance up to 2x. - Improve performance of contended connect() by 200% by searching for an available 4-tuple under RCU rather than a spin lock. Bring an additional 229% improvement by tweaking hash distribution. - Avoid unconditionally touching sk_tsflags on RX, improving performance under UDP flood by as much as 10%. - Avoid skb_clone() dance in ping_rcv() to improve performance under ping flood. - Avoid FIB lookup in netfilter if socket is available, 20% perf win. - Rework network device creation (in-kernel) API to more clearly identify network namespaces and their roles. There are up to 4 namespace roles but we used to have just 2 netns pointer arguments, interpreted differently based on context. - Use sysfs_break_active_protection() instead of trylock to avoid deadlocks between unregistering objects and sysfs access. - Add a new sysctl and sockopt for capping max retransmit timeout in TCP. - Support masking port and DSCP in routing rule matches. - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST. - Support specifying at what time packet should be sent on AF_XDP sockets. - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users. - Add Netlink YAML spec for WiFi (nl80211) and conntrack. - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols which only need to be exported when IPv6 support is built as a module. - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to normal bridging. - Allow users to specify source port range for GENEVE tunnels. - netconsole: allow attaching kernel release, CPU ID and task name to messages as metadata Driver API: - Continue rework / fixing of Energy Efficient Ethernet (EEE) across the SW layers. Delegate the responsibilities to phylink where possible. Improve its handling in phylib. - Support symmetric OR-XOR RSS hashing algorithm. - Support tracking and preserving IRQ affinity by NAPI itself. - Support loopback mode speed selection for interface selftests. Device drivers: - Remove the IBM LCS driver for s390 - Remove the sb1000 cable modem driver - Add support for SFP module access over SMBus - Add MCTP transport driver for MCTP-over-USB - Enable XDP metadata support in multiple drivers - Ethernet high-speed NICs: - Broadcom (bnxt): - add PCIe TLP Processing Hints (TPH) support for new AMD platforms - support dumping RoCE queue state for debug - opt into instance locking - Intel (100G, ice, idpf): - ice: rework MSI-X IRQ management and distribution - ice: support for E830 devices - iavf: add support for Rx timestamping - iavf: opt into instance locking - nVidia/Mellanox: - mlx4: use page pool memory allocator for Rx - mlx5: support for one PTP device per hardware clock - mlx5: support for 200Gbps per-lane link modes - mlx5: move IPSec policy check after decryption - AMD/Solarflare: - support FW flashing via devlink - Cisco (enic): - use page pool memory allocator for Rx - enable 32, 64 byte CQEs - get max rx/tx ring size from the device - Meta (fbnic): - support flow steering and RSS configuration - report queue stats - support TCP segmentation - support IRQ coalescing - support ring size configuration - Marvell/Cavium: - support AF_XDP - Wangxun: - support for PTP clock and timestamping - Huawei (hibmcge): - checksum offload - add more statistics - Ethernet virtual: - VirtIO net: - aggressively suppress Tx completions, improve perf by 96% with 1 CPU and 55% with 2 CPUs - expose NAPI to IRQ mapping and persist NAPI settings - Google (gve): - support XDP in DQO RDA Queue Format - opt into instance locking - Microsoft vNIC: - support BIG TCP - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - cleanup Tx and Tx clock setting and other link-focused cleanups - enable SGMII and 2500BASEX mode switching for Intel platforms - support Sophgo SG2044 - Broadcom switches (b53): - support for BCM53101 - TI: - iep: add perout configuration support - icssg: support XDP - Cadence (macb): - implement BQL - Xilinx (axinet): - support dynamic IRQ moderation and changing coalescing at runtime - implement BQL - report standard stats - MediaTek: - support phylink managed EEE - Intel: - igc: don't restart the interface on every XDP program change - RealTek (r8169): - support reading registers of internal PHYs directly - increase max jumbo packet size on RTL8125/RTL8126 - Airoha: - support for RISC-V NPU packet processing unit - enable scatter-gather and support MTU up to 9kB - Tehuti (tn40xx): - support cards with TN4010 MAC and an Aquantia AQR105 PHY - Ethernet PHYs: - support for TJA1102S, TJA1121 - dp83tg720: add randomized polling intervals for link detection - dp83822: support changing the transmit amplitude voltage - support for LEDs on 88q2xxx - CAN: - canxl: support Remote Request Substitution bit access - flexcan: add S32G2/S32G3 SoC - WiFi: - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support - batman-adv: add support for jumbo frames - WiFi drivers: - RealTek (rtw88): - support RTL8814AE and RTL8814AU - RealTek (rtw89): - switch using wiphy_lock and wiphy_work - add BB context to manipulate two PHY as preparation of MLO - improve BT-coexistence mechanism to play A2DP smoothly - Intel (iwlwifi): - add new iwlmld sub-driver for latest HW/FW combinations - MediaTek (mt76): - preparation for mt7996 Multi-Link Operation (MLO) support - Qualcomm/Atheros (ath12k): - continued work on MLO - Silabs (wfx): - Wake-on-WLAN support - Bluetooth: - add support for skb TX SND/COMPLETION timestamping - hci_core: enable buffer flow control for SCO/eSCO - coredump: log devcd dumps into the monitor - Bluetooth drivers: - intel: add support to configure TX power - nxp: handle bootloader error during cmd5 and cmd7" * tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits) unix: fix up for "apparmor: add fine grained af_unix mediation" mctp: Fix incorrect tx flow invalidation condition in mctp-i2c net: usb: asix: ax88772: Increase phy_name size net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string net: libwx: fix Tx L4 checksum net: libwx: fix Tx descriptor content for some tunnel packets atm: Fix NULL pointer dereference net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus net: phy: aquantia: add essential functions to aqr105 driver net: phy: aquantia: search for firmware-name in fwnode net: phy: aquantia: add probe function to aqr105 for firmware loading net: phy: Add swnode support to mdiobus_scan gve: add XDP DROP and PASS support for DQ gve: update XDP allocation path support RX buffer posting gve: merge packet buffer size fields gve: update GQ RX to use buf_size gve: introduce config-based allocation for XDP gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics ...
2025-03-26Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (scsi_debug, ufs, lpfc, st, fnic, mpi3mr, mpt3sas) and the removal of cxlflash. The only non-trivial core change is an addition to unit attention handling to recognize UAs for power on/reset and new media so the tape driver can use it" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (107 commits) scsi: st: Tighten the page format heuristics with MODE SELECT scsi: st: ERASE does not change tape location scsi: st: Fix array overflow in st_setup() scsi: target: tcm_loop: Fix wrong abort tag scsi: lpfc: Restore clearing of NLP_UNREG_INP in ndlp->nlp_flag scsi: hisi_sas: Fixed failure to issue vendor specific commands scsi: fnic: Remove unnecessary NUL-terminations scsi: fnic: Remove redundant flush_workqueue() calls scsi: core: Use a switch statement when attaching VPD pages scsi: ufs: renesas: Add initialization code for R-Car S4-8 ES1.2 scsi: ufs: renesas: Add reusable functions scsi: ufs: renesas: Refactor 0x10ad/0x10af PHY settings scsi: ufs: renesas: Remove register control helper function scsi: ufs: renesas: Add register read to remove save/set/restore scsi: ufs: renesas: Replace init data by init code scsi: ufs: dt-bindings: renesas,ufs: Add calibration data scsi: mpi3mr: Task Abort EH Support scsi: storvsc: Don't report the host packet status as the hv status scsi: isci: Make most module parameters static scsi: megaraid_sas: Make most module parameters static ...
2025-03-25Merge tag 'pmdomain-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain updates from Ulf Hansson: "pmdomain core: - Add dev_pm_genpd_rpm_always_on() to support more fine-grained PM pmdomain providers: - arm: Remove redundant state verification for the SCMI PM domain - bcm: Add system-wakeup support for bcm2835 via GENPD_FLAG_ACTIVE_WAKEUP - rockchip: Add support for regulators - rockchip: Use SMC call to properly inform firmware - sunxi: Add V853 ppu support - thead: Add support for RISC-V TH1520 power-domains firmware: - Add support for the AON firmware protocol for RISC-V THEAD cpuidle-psci: - Update section in MAINTAINERS for cpuidle-psci - Add trace support for PSCI domain-idlestates" * tag 'pmdomain-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (29 commits) firmware: thead: add CONFIG_MAILBOX dependency firmware: thead,th1520-aon: Fix use after free in th1520_aon_init() pmdomain: arm: scmi_pm_domain: Remove redundant state verification pmdomain: thead: fix TH1520_AON_PROTOCOL dependency pmdomain: thead: Add power-domain driver for TH1520 dt-bindings: power: Add TH1520 SoC power domains firmware: thead: Add AON firmware protocol driver dt-bindings: firmware: thead,th1520: Add support for firmware node pmdomain: rockchip: add regulator dependency pmdomain: rockchip: add regulator support pmdomain: rockchip: fix rockchip_pd_power error handling pmdomain: rockchip: reduce indentation in rockchip_pd_power pmdomain: rockchip: forward rockchip_do_pmu_set_power_domain errors pmdomain: rockchip: cleanup mutex handling in rockchip_pd_power dt-bindings: power: rockchip: add regulator support pmdomain: rockchip: Fix build error pmdomain: imx: gpcv2: use proper helper for property detection MAINTAINERS: Update section for cpuidle-psci pmdomain: rockchip: Check if SMC could be handled by TA cpuidle: psci: Add trace for PSCI domain idle ...
2025-03-24Merge tag 'sched-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)" * tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling() sched/debug: Remove CONFIG_SCHED_DEBUG sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional sched/debug: Make 'const_debug' tunables unconditional __read_mostly sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() rseq/selftests: Fix namespace collision with rseq UAPI header include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h sched/topology: Stop exposing partition_sched_domains_locked cgroup/cpuset: Remove partition_and_rebuild_sched_domains sched/topology: Remove redundant dl_clear_root_domain call sched/deadline: Rebuild root domain accounting after every update sched/deadline: Generalize unique visiting of root domains sched/topology: Wrappers for sched_domains_mutex sched/deadline: Ignore special tasks when rebuilding domains tracing: Use preempt_model_str() xtensa: Rely on generic printing of preemption model x86: Rely on generic printing of preemption model s390: Rely on generic printing of preemption model ...
2025-03-24Merge tag 'rcu-next-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: "Documentation: - Add broken-timing possibility to stallwarn.rst - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr() - Document self-propagating callbacks - Point call_srcu() to call_rcu() for detailed memory ordering - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text - Remove references to old grace-period-wait primitives srcu: - Introduce srcu_read_{un,}lock_fast(), which is similar to srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the cost of calling synchronize_rcu() in synchronize_srcu() Moreover, by returning the percpu offset of the counter at srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid extra pointer dereferencing, which makes it faster than srcu_read_{un,}lock_lite() srcu_read_{un,}lock_fast() are intended to replace rcu_read_{un,}lock_trace() if possible RCU torture: - Add get_torture_init_jiffies() to return the start time of the test - Add a test_boost_holdoff module parameter to allow delaying boosting tests when building rcutorture as built-in - Add grace period sequence number logging at the beginning and end of failure/close-call results - Switch to hexadecimal for the expedited grace period sequence number in the rcu_exp_grace_period trace point - Make cur_ops->format_gp_seqs take buffer length - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool - Complain when invalid SRCU reader_flavor is specified - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU uses atomics even when percpu ops are NMI safe, and use the Kconfig for SRCU lockdep testing Misc: - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes() - Fix get_state_synchronize_rcu_full() GP-start detection - Move RCU Tasks self-tests to core_initcall() - Print segment lengths in show_rcu_nocb_gp_state() - Make RCU watch ct_kernel_exit_state() warning - Flush console log from kernel_power_off() - rcutorture: Allow a negative value for nfakewriters - rcu: Update TREE05.boot to test normal synchronize_rcu() - rcu: Use _full() API to debug synchronize_rcu() Make RCU handle PREEMPT_LAZY better: - Fix header guard for rcu_all_qs() - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY - Update __cond_resched comment about RCU quiescent states - Handle unstable rdp in rcu_read_unlock_strict() - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y - osnoise: Provide quiescent states - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y combination - Limit PREEMPT_RCU configurations - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y" * tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits) rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y rcu: limit PREEMPT_RCU configurations rcutorture: Update ->extendables check for lazy preemption rcutorture: Update rcutorture_one_extend_check() for lazy preemption osnoise: provide quiescent states rcu: Use _full() API to debug synchronize_rcu() rcu: Update TREE05.boot to test normal synchronize_rcu() rcutorture: Allow a negative value for nfakewriters Flush console log from kernel_power_off() context_tracking: Make RCU watch ct_kernel_exit_state() warning rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state() rcu-tasks: Move RCU Tasks self-tests to core_initcall() rcu: Fix get_state_synchronize_rcu_full() GP-start detection torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe() srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing rcutorture: Complain when invalid SRCU reader_flavor is specified rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool rcutorture: Make cur_ops->format_gp_seqs take buffer length rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output ...
2025-03-24Merge tag 'sched_ext-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add mechanism to count and report internal events. This significantly improves visibility on subtle corner conditions. - The default idle CPU selection logic is revamped and improved in multiple ways including being made topology aware. - sched_ext was disabling ttwu_queue for simplicity, which can be costly when hardware topology is more complex. Implement SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively enable ttwu_queue. - tools/sched_ext updates to improve compatibility among others. - Other misc updates and fixes. - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite fixes and resolve conflicts. * tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits) sched_ext: idle: Refactor scx_select_cpu_dfl() sched_ext: idle: Honor idle flags in the built-in idle selection policy sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local() sched_ext: Add trace point to track sched_ext core events sched_ext: Change the event type from u64 to s64 sched_ext: Documentation: add task lifecycle summary tools/sched_ext: Provide a compatible helper for scx_bpf_events() selftests/sched_ext: Add NUMA-aware scheduler test tools/sched_ext: Provide consistent access to scx flags sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior sched_ext: idle: Introduce scx_bpf_nr_node_ids() sched_ext: idle: Introduce node-aware idle cpu kfunc helpers sched_ext: idle: Per-node idle cpumasks sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE sched_ext: idle: Make idle static keys private sched/topology: Introduce for_each_node_numadist() iterator mm/numa: Introduce nearest_node_nodemask() nodemask: numa: reorganize inclusion path nodemask: add nodes_copy() tools/sched_ext: Sync with scx repo ...
2025-03-24Merge tag 'slab-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Move the TINY_RCU kvfree_rcu() implementation from RCU to SLAB subsystem and cleanup its integration (Vlastimil Babka) Following the move of the TREE_RCU batching kvfree_rcu() implementation in 6.14, move also the simpler TINY_RCU variant. Refactor the #ifdef guards so that the simple implementation is also used with SLUB_TINY. Remove the need for RCU to recognize fake callback function pointers (__is_kvfree_rcu_offset()) when handling call_rcu() by implementing a callback that calculates the object's address from the embedded rcu_head address without knowing its offset. - Improve kmalloc cache randomization in kvmalloc (GONG Ruiqi) Due to an extra layer of function call, all kvmalloc() allocations used the same set of random caches. Thanks to moving the kvmalloc() implementation to slub.c, this is improved and randomization now works for kvmalloc. - Various improvements to debugging, testing and other cleanups (Hyesoo Yu, Lilith Gkini, Uladzislau Rezki, Matthew Wilcox, Kevin Brodsky, Ye Bin) * tag 'slab-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slub: Handle freelist cycle in on_freelist() mm/slab: call kmalloc_noprof() unconditionally in kmalloc_array_noprof() slab: Mark large folios for debugging purposes kunit, slub: Add test_kfree_rcu_wq_destroy use case mm, slab: cleanup slab_bug() parameters mm: slub: call WARN() when detecting a slab corruption mm: slub: Print the broken data before restoring them slab: Achieve better kmalloc caches randomization in kvmalloc slab: Adjust placement of __kvmalloc_node_noprof mm/slab: simplify SLAB_* flag handling slab: don't batch kvfree_rcu() with SLUB_TINY rcu, slab: use a regular callback function for kvfree_rcu rcu: remove trace_rcu_kvfree_callback slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
2025-03-24sched: Add sched tracepoints for RV task modelGabriele Monaco
Add the following tracepoints: * sched_entry(bool preempt, ip) Called while entering __schedule * sched_exit(bool is_switch, ip) Called while exiting __schedule * sched_set_state(task, curr_state, state) Called when a task changes its state (to and from running) These tracepoints are useful to describe the Linux task model and are adapted from the patches by Daniel Bristot de Oliveira (https://bristot.me/linux-task-model/). Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-22tracing: gfp: vsprintf: Do not print "none" when using %pGg printf formatPetr Mladek
The commit ca29a0bf122145 ("tracing: gfp: Remove duplication of recording GFP flags") caused the following regression in printf_test selftest: [ 46.208199] test_printf: kvasprintf(..., "%pGg", ...) returned 'none|0xfc000000', expected '0xfc000000' [ 46.208209] test_printf: kvasprintf(..., "%pGg", ...) returned '__GFP_HIGH|none|0xfc000000', expected '__GFP_HIGH|0xfc000000' The problem is the new '{ 0, "none" }' entry in __def_gfpflag_names macro and the following code: char *format_flags(char *buf, char *end, unsigned long flags, const struct trace_print_flags *names) { [...] if ((flags & mask) != mask) continue; [...] } The purpose of the code is to print the name of a mask instead of bits, for example, printk "GFP_ZONEMASK", instead of "__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE". Unfortunately, the mask "0" pass this check and "none" is always printed. A solution would be to move TRACE_GFP_FLAGS up so that it is not the last entry. But it breaks the rule that named masks must be defined before names of single bytes. Otherwise, it would print the names of the bytes instead of the mask. Instead, replace '{ 0, "none" }' with '{ 0, NULL }'. It works because __def_gfpflag_names defines a standalone array and this is the standard trailing entry. The code processing these arrays always ends the cycle when flag->name == NULL. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Tamir Duberstein <tamird@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/Z9Q5d11ZbA3CNMZm@pathway.suse.cz Fixes: ca29a0bf122145 ("tracing: gfp: Remove duplication of recording GFP flags") Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-21NFS: Treat ENETUNREACH errors as fatal in containersTrond Myklebust
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic NFS client. If the flag is set, the client will receive ENETDOWN and ENETUNREACH errors from the RPC layer, and is expected to treat them as being fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-19sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditionalIngo Molnar
All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, but with system administration and user-space software development as well. Reflect this reality and enable this functionality unconditionally. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
2025-03-17mm/page_alloc: add trace event for totalreserve_pages calculationMartin Liu
This commit introduces a new trace event, `mm_calculate_totalreserve_pages`, which reports the new reserve value at the exact time when it takes effect. The `totalreserve_pages` value represents the total amount of memory reserved across all zones and nodes in the system. This reserved memory is crucial for ensuring that critical kernel operations have access to sufficient memory, even under memory pressure. By tracing the `totalreserve_pages` value, developers can gain insights that how the total reserved memory changes over time. Link: https://lkml.kernel.org/r/20250308034606.2036033-4-liumartin@google.com Signed-off-by: Martin Liu <liumartin@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/page_alloc: add trace event for per-zone lowmem reserve setupMartin Liu
This commit introduces the `mm_setup_per_zone_lowmem_reserve` trace event,which provides detailed insights into the kernel's per-zone lowmem reserve configuration. The trace event provides precise timestamps, allowing developers to 1. Correlate lowmem reserve changes with specific kernel events and able to diagnose unexpected kswapd or direct reclaim behavior triggered by dynamic changes in lowmem reserve. 2. Know memory allocation failures that occur due to insufficient lowmem reserve, by precisely correlating allocation attempts with reserve adjustments. Link: https://lkml.kernel.org/r/20250308034606.2036033-3-liumartin@google.com Signed-off-by: Martin Liu <liumartin@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/page_alloc: add trace event for per-zone watermark setupMartin Liu
Patch series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages", v2. This patchset introduces tracepoints to track changes in the lowmem reserves, watermarks and totalreserve_pages. This helps to track the exact timing of such changes and understand their relation to reclaim activities. The tracepoints added are: mm_setup_per_zone_lowmem_reserve mm_setup_per_zone_wmarks mm_calculate_totalreserve_pagesi This patch (of 3): This commit introduces the `mm_setup_per_zone_wmarks` trace event, which provides detailed insights into the kernel's per-zone watermark configuration, offering precise timing and the ability to correlate watermark changes with specific kernel events. While `/proc/zoneinfo` provides some information about zone watermarks, this trace event offers: 1. The ability to link watermark changes to specific kernel events and logic. 2. The ability to capture rapid or short-lived changes in watermarks that may be missed by user-space polling 3. Diagnosing unexpected kswapd activity or excessive direct reclaim triggered by rapidly changing watermarks. Link: https://lkml.kernel.org/r/20250308034606.2036033-1-liumartin@google.com Link: https://lkml.kernel.org/r/20250308034606.2036033-2-liumartin@google.com Signed-off-by: Martin Liu <liumartin@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17writeback: fix calculations in trace_balance_dirty_pages() for cgwbTang Yizhou
In the commit dcc25ae76eb7 ("writeback: move global_dirty_limit into wb_domain") of the cgroup writeback backpressure propagation patchset, Tejun made some adaptations to trace_balance_dirty_pages() for cgroup writeback. However, this adaptation was incomplete and Tejun missed further adaptation in the subsequent patches. In the cgroup writeback scenario, if sdtc in balance_dirty_pages() is assigned to mdtc, then upon entering trace_balance_dirty_pages(), __entry->limit should be assigned based on the dirty_limit of the corresponding memcg's wb_domain, rather than global_wb_domain. To address this issue and simplify the implementation, introduce a 'limit' field in struct dirty_throttle_control to store the hard_limit value computed in wb_position_ratio() by calling hard_dirty_limit(). This field will then be used in trace_balance_dirty_pages() to assign the value to __entry->limit. Link: https://lkml.kernel.org/r/20250304110318.159567-4-yizhou.tang@shopee.com Fixes: dcc25ae76eb7 ("writeback: move global_dirty_limit into wb_domain") Signed-off-by: Tang Yizhou <yizhou.tang@shopee.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17writeback: rename variables in trace_balance_dirty_pages()Tang Yizhou
Rename bdi_setpoint and bdi_dirty in the tracepoint to wb_setpoint and wb_dirty, respectively. These changes were omitted by Tejun in the cgroup writeback patchset. Link: https://lkml.kernel.org/r/20250304110318.159567-3-yizhou.tang@shopee.com Signed-off-by: Tang Yizhou <yizhou.tang@shopee.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17writeback: let trace_balance_dirty_pages() take struct dtc as parameterTang Yizhou
Patch series "Fix calculations in trace_balance_dirty_pages() for cgwb", v2. In my experiment, I found that the output of trace_balance_dirty_pages() in the cgroup writeback scenario was strange because trace_balance_dirty_pages() always uses global_wb_domain.dirty_limit for related calculations instead of the dirty_limit of the corresponding memcg's wb_domain. The basic idea of the fix is to store the hard dirty limit value computed in wb_position_ratio() into struct dirty_throttle_control and use it for calculations in trace_balance_dirty_pages(). This patch (of 3): Currently, trace_balance_dirty_pages() already has 12 parameters. In the patch #3, I initially attempted to introduce an additional parameter. However, in include/linux/trace_events.h, bpf_trace_run12() only supports up to 12 parameters and bpf_trace_run13() does not exist. To reduce the number of parameters in trace_balance_dirty_pages(), we can make it accept a pointer to struct dirty_throttle_control as a parameter. To achieve this, we need to move the definition of struct dirty_throttle_control from mm/page-writeback.c to include/linux/writeback.h. Link: https://lkml.kernel.org/r/20250304110318.159567-1-yizhou.tang@shopee.com Link: https://lkml.kernel.org/r/20250304110318.159567-2-yizhou.tang@shopee.com Signed-off-by: Tang Yizhou <yizhou.tang@shopee.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Tang Yizhou <yizhou.tang@shopee.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17erofs: implement 48-bit block addressing for unencoded inodesGao Xiang
It adapts the on-disk changes from the previous commit. It also supports EROFS_NULL_ADDR (all 1's) for EROFS_INODE_FLAT_PLAIN inodes to indicate 0-filled inodes, as it's common for composefs use cases. As a result, EROFS_INODE_CHUNK_BASED is no longer needed. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20250310095459.2620647-5-hsiangkao@linux.alibaba.com
2025-03-10afs: Simplify cell record handlingDavid Howells
Simplify afs_cell record handling to avoid very occasional races that cause module removal to hang (it waits for all cell records to be removed). There are two things that particularly contribute to the difficulty: firstly, the code tries to pass a ref on the cell to the cell's maintenance work item (which gets awkward if the work item is already queued); and, secondly, there's an overall cell manager that tries to use just one timer for the entire cell collection (to avoid having loads of timers). However, both of these are probably unnecessarily restrictive. To simplify this, the following changes are made: (1) The cell record collection manager is removed. Each cell record manages itself individually. (2) Each afs_cell is given a second work item (cell->destroyer) that is queued when its refcount reaches zero. This is not done in the context of the putting thread as it might be in an inconvenient place to sleep. (3) Each afs_cell is given its own timer. The timer is used to expire the cell record after a period of unuse if not otherwise pinned and can also be used for other maintenance tasks if necessary (of which there are currently none as DNS refresh is triggered by filesystem operations). (4) The afs_cell manager work item (cell->manager) is no longer given a ref on the cell when queued; rather, the manager must be deleted. This does away with the need to deal with the consequences of losing a race to queue cell->manager. Clean up of extra queuing is deferred to the destroyer. (5) The cell destroyer work item makes sure the cell timer is removed and that the normal cell work is cancelled before farming the actual destruction off to RCU. (6) When a network namespace is destroyed or the kafs module is unloaded, it's now a simple matter of marking the namespace as dead then just waking up all the cell work items. They will then remove and destroy themselves once all remaining activity counts and/or a ref counts are dropped. This makes sure that all server records are dropped first. (7) The cell record state set is reduced to just four states: SETTING_UP, ACTIVE, REMOVING and DEAD. The record persists in the active state even when it's not being used until the time comes to remove it rather than downgrading it to an inactive state from whence it can be restored. This means that the cell still appears in /proc and /afs when not in use until it switches to the REMOVING state - at which point it is removed. Note that the REMOVING state is included so that someone wanting to resurrect the cell record is forced to wait whilst the cell is torn down in that state. Once it's in the DEAD state, it has been removed from net->cells tree and is no longer findable and can be replaced. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-03-10afs: Fix afs_server ref accountingDavid Howells
The current way that afs_server refs are accounted and cleaned up sometimes cause rmmod to hang when it is waiting for cell records to be removed. The problem is that the cell cleanup might occasionally happen before the server cleanup and then there's nothing that causes the cell to garbage-collect the remaining servers as they become inactive. Partially fix this by: (1) Give each afs_server record its own management timer that rather than relying on the cell manager's central timer to drive each individual cell's maintenance work item to garbage collect servers. This timer is set when afs_unuse_server() reduces a server's activity count to zero and will schedule the server's destroyer work item upon firing. (2) Give each afs_server record its own destroyer work item that removes the record from the cell's database, shuts down the timer, cancels any pending work for itself, sends an RPC to the server to cancel outstanding callbacks. This change, in combination with the timer, obviates the need to try and coordinate so closely between the cell record and a bunch of other server records to try and tear everything down in a coordinated fashion. With this, the cell record is pinned until the server RCU is complete and namespace/module removal will wait until all the cell records are removed. (3) Now that incoming calls are mapped to servers (and thus cells) using data attached to an rxrpc_peer, the UUID-to-server mapping tree is moved from the namespace to the cell (cell->fs_servers). This means there can no longer be duplicates therein - and that allows the mapping tree to be simpler as there doesn't need to be a chain of same-UUID servers that are in different cells. (4) The lock protecting the UUID mapping tree is switched to an rw_semaphore on the cell rather than a seqlock on the namespace as it's now only used during mounting in contexts in which we're allowed to sleep. (5) When it comes time for a cell that is being removed to purge its set of servers, it just needs to iterate over them and wake them up. Once a server becomes inactive, its destroyer work item will observe the state of the cell and immediately remove that record. (6) When a server record is removed, it is marked AFS_SERVER_FL_EXPIRED to prevent reattempts at removal. The record will be dispatched to RCU for destruction once its refcount reaches 0. (7) The AFS_SERVER_FL_UNCREATED/CREATING flags are used to synchronise simultaneous creation attempts. If one attempt fails, it will abandon the attempt and allow another to try again. Note that the record can't just be abandoned when dead as it's bound into a server list attached to a volume and only subject to replacement if the server list obtained for the volume from the VLDB changes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-15-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-11-dhowells@redhat.com/ # v4
2025-03-10afs: Use the per-peer app data provided by rxrpcDavid Howells
Make use of the per-peer application data that rxrpc now allows the application to store on the rxrpc_peer struct to hold a back pointer to the afs_server record that peer represents an endpoint for. Then, when a call comes in to the AFS cache manager, this can be used to map it to the correct server record rather than having to use a UUID-to-server mapping table and having to do an additional lookup. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-14-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-10-dhowells@redhat.com/ # v4
2025-03-10afs: Drop the net parameter from afs_unuse_cell()David Howells
Remove the redundant net parameter to afs_unuse_cell() as cell->net can be used instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-12-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-8-dhowells@redhat.com/ # v4
2025-03-10afs: Make afs_lookup_cell() take a trace noteDavid Howells
Pass a note to be added to the afs_cell tracepoint to afs_lookup_cell() so that different callers can be distinguished. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-11-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-7-dhowells@redhat.com/ # v4
2025-03-10afs: Improve server refcount/active count tracingDavid Howells
Improve server refcount/active count tracing to distinguish between simply getting/putting a ref and using/unusing the server record (which changes the activity count as well as the refcount). This makes it a bit easier to work out what's going on. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-10-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-6-dhowells@redhat.com/ # v4
2025-03-10afs: Improve afs_volume tracing to display a debug IDDavid Howells
Improve the tracing of afs_volume objects to include displaying a debug ID so that different instances of volumes with the same "vid" can be distinguished. Also be consistent about displaying the volume's refcount (and not the cell's). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-9-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-5-dhowells@redhat.com/ # v4
2025-03-10afs: Change dynroot to create contents on demandDavid Howells
Change the AFS dynamic root to do things differently: (1) Rather than having the creation of cell records create inodes and dentries for cell mountpoints, create them on demand during lookup. This simplifies cell management and locking as we no longer have to create these objects in advance *and* on speculative lookup by the user for a cell that isn't precreated. (2) Rather than using the libfs dentry-based readdir (the dentries now no longer exist until accessed from (1)), have readdir generate the contents by reading the list of cells. The @cell symlinks get pushed in positions 2 and 3 if rootcell has been configured. (3) Make the @cell symlink dentries persist for the life of the superblock or until reclaimed, but make cell mountpoints disappear immediately if unused. It's not perfect as someone doing an "ls -l /afs" may create a whole bunch of dentries which will be garbage collected immediately. But any dentry that gets automounted will be pinned by the mount, so it shouldn't be too bad. (4) Allocate the inode numbers for the cell mountpoints from an IDR to prevent duplicates appearing in the event it cycles round. The number allocated from the IDR is doubled to provide two inode numbers - one for the normal cell name (RO) and one for the dotted cell name (RW). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-8-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-4-dhowells@redhat.com/ # v4
2025-03-06tracing: gfp: Remove duplication of recording GFP flagsSteven Rostedt
The gfp_flags when recorded in the trace require being converted from their numbers to values. Various macros are used to help facilitate this, but there's two sets of macros that need to keep track of the same GFP flags to stay in sync. Commit 60295b944ff68 ("tracing: gfp: Fix the GFP enum values shown for user space tracing tools") added a TRACE_GFP_FLAGS macro that holds the enum ___GFP_*_BIT defined bits, and creates the TRACE_DEFINE_ENUM() wrapper around them. The __def_gfpflag_names() macro creates the mapping of various flags or multiple flags to give them human readable names via the __print_flags() tracing helper macro. As the TRACE_GFP_FLAGS is a subset of the __def_gfpflags_names(), it can be used to cover the individual bit names, by redefining the internal macro TRACE_GFP_EM(): #undef TRACE_GFP_EM #define TRACE_GFP_EM(a) gfpflag_string(__GFP_##a), This will remove the bits that are duplicate between the two macros. If a new bit is created, only the TRACE_GFP_FLAGS needs to be updated and that will also update the __def_gfpflags_names() macro. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250225135611.1942b65c@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04sched_ext: Add trace point to track sched_ext core eventsChangwoo Min
Add tracing support to track sched_ext core events (/sched_ext/sched_ext_event). This may be useful for debugging sched_ext schedulers that trigger a particular event. The trace point can be used as other trace points, so it can be used in, for example, `perf trace` and BPF programs, as follows: ====== $> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"' ====== ====== struct tp_sched_ext_event { struct trace_entry ent; u32 __data_loc_name; s64 delta; }; SEC("tracepoint/sched_ext/sched_ext_event") int rtp_add_event(struct tp_sched_ext_event *ctx) { char event_name[128]; unsigned short offset = ctx->__data_loc_name & 0xFFFF; bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset); bpf_printk("name %s delta %lld", event_name, ctx->delta); return 0; } ====== Signed-off-by: Changwoo Min <changwoo@igalia.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-28pmdomain: Merge tag 'v6.14-rc4' from Linus into nextUlf Hansson
Linux 6.14-rc4
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-26trace/osnoise: Add trace events for samplesTomas Glozar
Add trace events that fire at osnoise and timerlat sample generation, in addition to the already existing noise and threshold events. This allows processing the samples directly in the kernel, either with ftrace triggers or with BPF. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250203090418.1458923-1-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-26Merge tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - O_DIRECT writes should adjust file length Other Bugfixes: - Adjust delegated timestamps for O_DIRECT reads and writes - Prevent looping due to rpc_signal_task() races - Fix a deadlock when recovering state on a sillyrenamed file - Properly handle -ETIMEDOUT errors from tlshd - Suppress build warnings for unused procfs functions - Fix memory leak of lsm_contexts" * tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: lsm,nfs: fix memory leak of lsm_context sunrpc: suppress warnings for unused procfs functions SUNRPC: Handle -ETIMEDOUT return from tlshd NFSv4: Fix a deadlock when recovering state on a sillyrenamed file SUNRPC: Prevent looping due to rpc_signal_task() races NFS: Adjust delegated timestamps for O_DIRECT reads and writes NFS: O_DIRECT writes must check and adjust the file length
2025-02-24Merge patch series "Initial support for RK3576 UFS controller"Martin K. Petersen
Shawn Lin <shawn.lin@rock-chips.com> says: This patchset adds initial UFS controller supprt for RK3576 SoC. Patch 1 is the dt-bindings. Patch 2-4 deal with rpm and spm support in advanced suggested by Ulf. Patch 5 exports two new APIs for host driver. Patch 6 and 7 are the host driver and dtsi support. Link: https://lore.kernel.org/r/1738736156-119203-1-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-21afs: Give an afs_server object a ref on the afs_cell object it points toDavid Howells
Give an afs_server object a ref on the afs_cell object it points to so that the cell doesn't get deleted before the server record. Whilst this is circular (cell -> vol -> server_list -> server -> cell), the ref only pins the memory, not the lifetime as that's controlled by the activity counter. When the volume's activity counter reaches 0, it detaches from the cell and discards its server list; when a cell's activity counter reaches 0, it discards its root volume. At that point, the circularity is cut. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc4). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19SUNRPC: Prevent looping due to rpc_signal_task() racesTrond Myklebust
If rpc_signal_task() is called while a task is in an rpc_call_done() callback function, and the latter calls rpc_restart_call(), the task can end up looping due to the RPC_TASK_SIGNALLED flag being set without the tk_rpc_status being set. Removing the redundant mechanism for signalling the task fixes the looping behaviour. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-18trace: tcp: Add tracepoint for tcp_cwnd_reduction()Breno Leitao
Add a lightweight tracepoint to monitor TCP congestion window adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking of: - TCP window size fluctuations - Active socket behavior - Congestion window reduction events Meta has been using BPF programs to monitor this function for years. Adding a proper tracepoint provides a stable API for all users who need to monitor TCP congestion window behavior. Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event infrastructure and exporting to tracefs, keeping the implementation minimal. (Thanks Steven Rostedt) Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as: rawtracepoint:tcp_cwnd_reduction_tp { .... } Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250214-cwnd_tracepoint-v2-1-ef8d15162d95@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-17cpuidle: psci: Add trace for PSCI domain idleKeita Morisaki
The trace event cpu_idle provides insufficient information for debugging PSCI requests due to lacking access to determined PSCI domain idle states. The cpu_idle usually only shows -1, 0, or 1 regardless how many idle states the power domain has. Add new trace events namely psci_domain_idle_enter and psci_domain_idle_exit to trace enter and exit events with a determined idle state. These new trace events will help developers debug CPUidle issues on ARM systems using PSCI by providing more detailed information about the requested idle states. Signed-off-by: Keita Morisaki <keyz@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20250210055828.1875372-1-keyz@google.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-13netfs: Fix a number of read-retry hangsDavid Howells
Fix a number of hangs in the netfslib read-retry code, including: (1) netfs_reissue_read() doubles up the getting of references on subrequests, thereby leaking the subrequest and causing inode eviction to wait indefinitely. This can lead to the kernel reporting a hang in the filesystem's evict_inode(). Fix this by removing the get from netfs_reissue_read() and adding one to netfs_retry_read_subrequests() to deal with the one place that didn't double up. (2) The loop in netfs_retry_read_subrequests() that retries a sequence of failed subrequests doesn't record whether or not it retried the one that the "subreq" pointer points to when it leaves the loop. It may not if renegotiation/repreparation of the subrequests means that fewer subrequests are needed to span the cumulative range of the sequence. Because it doesn't record this, the piece of code that discards now-superfluous subrequests doesn't know whether it should discard the one "subreq" points to - and so it doesn't. Fix this by noting whether the last subreq it examines is superfluous and if it is, then getting rid of it and all subsequent subrequests. If that one one wasn't superfluous, then we would have tried to go round the previous loop again and so there can be no further unretried subrequests in the sequence. (3) netfs_retry_read_subrequests() gets yet an extra ref on any additional subrequests it has to get because it ran out of ones it could reuse to to renegotiation/repreparation shrinking the subrequests. Fix this by removing that extra ref. (4) In netfs_retry_reads(), it was using wait_on_bit() to wait for NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the sequence - but netfs_read_subreq_terminated() is now using a wait queue on the request instead and so this wait will never finish. Fix this by waiting on the wait queue instead. To make this work, a new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell the wake-up code to wake up the wait queue rather than requeuing the request's work item. Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is no longer used. (5) Whilst not strictly anything to do with the hang, netfs_retry_read_subrequests() was also doubly incrementing the subreq_counter and re-setting the debug index, leaving a gap in the trace. This is also fixed. One of these hangs was observed with 9p and with cifs. Others were forced by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read() was created to provide an changing pattern of maximum subrequest sizes: static int afs_prepare_read(struct netfs_io_subrequest *subreq) { struct netfs_io_request *rreq = subreq->rreq; if (!S_ISREG(subreq->rreq->inode->i_mode)) return 0; if (subreq->retry_count < 20) rreq->io_streams[0].sreq_max_len = umax(200, 2222 - subreq->retry_count * 40); else rreq->io_streams[0].sreq_max_len = 3333; return 0; } and pointed to by afs_req_ops. Then the following: struct netfs_io_subrequest *subreq = op->fetch.subreq; if (subreq->error == 0 && S_ISREG(subreq->rreq->inode->i_mode) && subreq->retry_count < 20) { subreq->transferred = subreq->already_done; __clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags); __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); afs_fetch_data_notify(op); return; } was inserted into afs_fetch_data_success() at the beginning and struct netfs_io_subrequest given an extra field, "already_done" that was set to the value in "subreq->transferred" by netfs_reissue_read(). When reading a 4K file, the subrequests would get gradually smaller, a new subrequest would be allocated around the 3rd retry and then eventually be rendered superfluous when the 20th retry was hit and the limit on the first subrequest was eased. Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250212222402.3618494-2-dhowells@redhat.com Tested-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Steve French <stfrench@microsoft.com> cc: Ihor Solodrai <ihor.solodrai@pm.me> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: v9fs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12scsi: usb: Rename the RESERVE and RELEASE constantsBart Van Assche
The names RESERVE and RELEASE are not only used in <scsi/scsi_proto.h> but also elsewhere in the kernel: $ git grep -nHE 'define[[:blank:]]*(RESERVE|RELEASE)[[:blank:]]' drivers/input/joystick/walkera0701.c:13:#define RESERVE 20000 drivers/s390/char/tape_std.h:56:#define RELEASE 0xD4 /* 3420 NOP, 3480 REJECT */ drivers/s390/char/tape_std.h:58:#define RESERVE 0xF4 /* 3420 NOP, 3480 REJECT */ Additionally, while the names of the symbolic constants RESERVE_10 and RELEASE_10 include the command length, the command length is not included in the RESERVE and RELEASE names. Address both issues by renaming the RESERVE and RELEASE constants into RESERVE_6 and RELEASE_6 respectively. Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250210205031.2970833-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-05rcu: Trace expedited grace-period numbers in hexadecimalPaul E. McKenney
This commit reformats the expedited grace-period numbers into hexadecimal for easier decoding and comparison. The normal grace-period numbers remain in decimal for the time being. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05rcu: remove trace_rcu_kvfree_callbackVlastimil Babka
Tree RCU does not handle kvfree_rcu() by queueing individual objects by call_rcu() anymore, thus the tracepoint and associated __is_kvfree_rcu_offset() check is dead code now. Remove it. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-02-04rxrpc: Fix the rxrpc_connection attend queue handlingDavid Howells
The rxrpc_connection attend queue is never used because conn::attend_link is never initialised and so is always NULL'd out and thus always appears to be busy. This requires the following fix: (1) Fix this the attend queue problem by initialising conn::attend_link. And, consequently, two further fixes for things masked by the above bug: (2) Fix rxrpc_input_conn_event() to handle being invoked with a NULL sk_buff pointer - something that can now happen with the above change. (3) Fix the RXRPC_SKB_MARK_SERVICE_CONN_SECURED message to carry a pointer to the connection and a ref on it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jakub Kicinski <kuba@kernel.org> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Paolo Abeni <pabeni@redhat.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Fixes: f2cce89a074e ("rxrpc: Implement a mechanism to send an event notification to a connection") Link: https://patch.msgid.link/20250203110307.7265-3-dhowells@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-27Merge tag 'f2fs-for-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, there are several major improvements such as folio conversion by Matthew, speed-up of block truncation, and caching more dentry pages. In addition, we implemented a linear dentry search to address recent unicode regression, and figured out some false alarms that we could get rid of. Enhancements: - foilio conversion in various IO paths - optimize f2fs_truncate_data_blocks_range() - cache more dentry pages - remove unnecessary blk_finish_plug - procfs: show mtime in segment_bits Bug fixes: - introduce linear search for dentries - don't call block truncation for aliased file - fix using wrong 'submitted' value in f2fs_write_cache_pages - fix to do sanity check correctly on i_inline_xattr_size - avoid trying to get invalid block address - fix inconsistent dirty state of atomic file" * tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: fix inconsistent dirty state of atomic file f2fs: fix to avoid changing 'check only' behaior of recovery f2fs: Clean up the loop outside of f2fs_invalidate_blocks() f2fs: procfs: show mtime in segment_bits f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime() f2fs: Fix format specifier in sanity_check_inode() f2fs: avoid trying to get invalid block address f2fs: fix to do sanity check correctly on i_inline_xattr_size f2fs: remove blk_finish_plug f2fs: Optimize f2fs_truncate_data_blocks_range() f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages f2fs: add parameter @len to f2fs_invalidate_blocks() f2fs: update_sit_entry_for_release() supports consecutive blocks. f2fs: introduce update_sit_entry_for_release/alloc() f2fs: don't call block truncation for aliased file f2fs: Introduce linear search for dentries f2fs: add parameter @len to f2fs_invalidate_internal_cache() f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range() f2fs: ensure that node info flags are always initialized f2fs: The GC triggered by ioctl also needs to mark the segno as victim ...
2025-01-26Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-26Merge tag 'trace-tools-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rv and tools/rtla updates from Steven Rostedt: - Add a test suite to test the tool Add a small test suite that can be used to test rtla's basic features to at least have something to test when applying changes. - Automate manual steps in monitor creation While creating a new monitor in RV, besides generating code from dot2k, there are a few manual steps which can be tedious and error prone, like adding the tracepoints, makefile lines and kconfig, or selecting events that start the monitor in the initial state. Updates were made to try and automate as much as possible among those steps to make creating a new RV monitor much quicker. It is still requires to select proper tracepoints, this step is harder to automate in a general way and, in several cases, would still need user intervention. - Have rtla timerlat hist and top set OSNOISE_WORKLOAD flag Have both rtla-timerlat-hist and rtla-timerlat-top set OSNOISE_WORKLOAD to the proper value ("on" when running with -k, "off" when running with -u) every time the option is available instead of setting it only when running with -u. This prevents rtla timerlat -k from giving no results when NO_OSNOISE_WORKLOAD is set, either manually or by an abnormally exited earlier run of rtla timerlat -u. - Stop rtla timerlat on signal properly when overloaded There is an issue where if rtla is run on machines with a high number of CPUs (100+), timerlat can generate more samples than rtla is able to process via tracefs_iterate_raw_events. This is especially common when the interval is set to 100us (rteval and cyclictest default) as opposed to the rtla default of 1000us, but also happens with the rtla default. Currently, this leads to rtla hanging and having to be terminated with SIGTERM. SIGINT setting stop_tracing is not enough, since more and more events are coming and tracefs_iterate_raw_events never exits. To fix this: Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are generated when rtla is supposed to exit. Also on receiving SIGINT/SIGALRM twice, abort iteration immediately with tracefs_iterate_stop, making rtla exit right away instead of waiting for all events to be processed. - Account for missed events Due to tracefs buffer overflow, it can happen that rtla misses events, making the tracing results inaccurate. Count both the number of missed events and the total number of processed events, and display missed events as well as their percentage. The numbers are displayed for both osnoise and timerlat, even though for the earlier, missed events are generally not expected. For hist, the number is displayed at the end of the run; for top, it is displayed on each printing of the top table. - Changes to make osnoise more robust There was a dependency in the code that the first field of the osnoise_tool structure was the trace field. If that that ever changed, then the code work break. Change the code to encapsulate this dependency where the code that uses the structure does not have this dependency. * tag 'trace-tools-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits) rtla: Report missed event count rtla: Add function to report missed events rtla: Count all processed events rtla: Count missed trace events tools/rtla: Add osnoise_trace_is_off() rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads rtla/osnoise: Distinguish missing workload option rtla/timerlat_top: Abort event processing on second signal rtla/timerlat_hist: Abort event processing on second signal rtla/timerlat_top: Stop timerlat tracer on signal rtla/timerlat_hist: Stop timerlat tracer on signal rtla: Add trace_instance_stop tools/rtla: Add basic test suite verification/dot2k: Implement event type detection verification/dot2k: Auto patch current kernel source verification/dot2k: Simplify manual steps in monitor creation rv: Simplify manual steps in monitor creation verification/dot2k: Add support for name and description options verification/dot2k: More robust template variables ...