summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-24rv: Add sco and tss per-cpu monitorsGabriele Monaco
Add 2 per-cpu monitors as part of the sched model: * sco: scheduling context operations Monitor to ensure sched_set_state happens only in thread context * tss: task switch while scheduling Monitor to ensure sched_switch happens only in scheduling context To: Ingo Molnar <mingo@redhat.com> To: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-4-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24rv: Add option for nested monitors and include schedGabriele Monaco
Monitors describing complex systems, such as the scheduler, can easily grow to the point where they are just hard to understand because of the many possible state transitions. Often it is possible to break such descriptions into smaller monitors, sharing some or all events. Enabling those smaller monitors concurrently is, in fact, testing the system as if we had one single larger monitor. Splitting models into multiple specification is not only easier to understand, but gives some more clues when we see errors. Add the possibility to create container monitors, whose only purpose is to host other nested monitors. Enabling a container monitor enables all nested ones, but it's still possible to enable nested monitors independently. Add the sched monitor as first container, for now empty. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24sched: Add sched tracepoints for RV task modelGabriele Monaco
Add the following tracepoints: * sched_entry(bool preempt, ip) Called while entering __schedule * sched_exit(bool is_switch, ip) Called while exiting __schedule * sched_set_state(task, curr_state, state) Called when a task changes its state (to and from running) These tracepoints are useful to describe the Linux task model and are adapted from the patches by Daniel Bristot de Oliveira (https://bristot.me/linux-task-model/). Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/20250305140406.350227-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24Merge branch 'bnxt_en-fix-max_skb_frags-30'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Fix MAX_SKB_FRAGS > 30 The driver was written with the assumption that MAX_SKB_FRAGS could not exceed what the NIC can support. About 2 years ago, CONFIG_MAX_SKB_FRAGS was added. The value can exceed what the NIC can support and it may cause TX timeout. These 2 patches will fix the issue. ==================== Link: https://patch.msgid.link/20250321211639.3812992-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Linearize TX SKB if the fragments exceed the maxMichael Chan
If skb_shinfo(skb)->nr_frags excceds what the chip can support, linearize the SKB and warn once to let the user know. net.core.max_skb_frags can be lowered, for example, to avoid the issue. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Mask the bd_cnt field in the TX BD properlyMichael Chan
The bd_cnt field in the TX BD specifies the total number of BDs for the TX packet. The bd_cnt field has 5 bits and the maximum number supported is 32 with the value 0. CONFIG_MAX_SKB_FRAGS can be modified and the total number of SKB fragments can approach or exceed the maximum supported by the chip. Add a macro to properly mask the bd_cnt field so that the value 32 will be properly masked and set to 0 in the bd_cnd field. Without this patch, the out-of-range bd_cnt value will corrupt the TX BD and may cause TX timeout. The next patch will check for values exceeding 32. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tracing: Do not use PERF enums when perf is not definedSteven Rostedt
An update was made to up the module ref count when a synthetic event is registered for both trace and perf events. But if perf is not configured in, the perf enums used will cause the kernel to fail to build. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Link: https://lore.kernel.org/20250323152151.528b5ced@batman.local.home Fixes: 21581dd4e7ff ("tracing: Ensure module defining synth event cannot be unloaded while tracing") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503232230.TeREVy8R-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24net: introduce per netns packet chainsPaolo Abeni
Currently network taps unbound to any interface are linked in the global ptype_all list, affecting the performance in all the network namespaces. Add per netns ptypes chains, so that in the mentioned case only the netns owning the packet socket(s) is affected. While at that drop the global ptype_all list: no in kernel user registers a tap on "any" type without specifying either the target device or the target namespace (and IMHO doing that would not make any sense). Note that this adds a conditional in the fast path (to check for per netns ptype_specific list) and increases the dataset size by a cacheline (owing the per netns lists). Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumaze@google.com> Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tty: caif: removed unused function debugfs_tx()Simon Horman
Remove debugfs_tx() which was added when the caif driver was added in commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)") but it has never been used. Flagged by LLVM 19.1.7 W=1 builds. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: ethernet: Drop unused of_gpio.hPeng Fan
of_gpio.h is deprecated. Since there is no of_gpio_x API, drop unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if no user in driver. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phy: fixed_phy: transition to the faux device interfaceSudeep Holla
The net fixed phy driver does not require the creation of a platform device. Originally, this approach was chosen for simplicity when the driver was first implemented. With the introduction of the lightweight faux device interface, we now have a more appropriate alternative. Migrate the device to utilize the faux bus, given that the platform device it previously created was not a real one anyway. This will get rid of the fake platform device. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'mlx5-cleanups-2025-03-19'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 cleanups 2025-03-19 This series contains small cleanups to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Always select CONFIG_PAGE_POOL_STATSTariq Toukan
Always set PAGE_POOL_STATS in mlx5 Eth driver. Cleanup the corresponding #ifdefs. Page pool stats are essential to monitor and analyze RX performance. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Use right API to free bitmap memoryMark Zhang
Use bitmap_free() to free memory allocated with bitmap_zalloc_node(). This fixes memtrack error: mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: Remove NULL check before dev_{put, hold}Gal Pressman
Fix coccinelle warnings: WARNING: NULL check before dev_{put, hold} functions is not needed. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phylink: Remove unused function pointer from phylink structureAlexander Duyck
From what I can tell the .get_fixed_state pointer in the phylink structure hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate phylink_fixed_state_cb()") . Since I can't find any users for it we might as well just drop the pointer. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24netpoll: Eliminate redundant assignmentBreno Leitao
The assignment of zero to udph->check is unnecessary as it is immediately overwritten in the subsequent line. Remove the redundant assignment. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge tag 'kernel-6.15-rc1.tasklist_lock' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull tasklist_lock optimizations from Christian Brauner: "According to the performance testbots this brings a 23% performance increase when creating new processes: - Reduce tasklist_lock hold time on exit: - Perform add_device_randomness() without tasklist_lock - Perform free_pid() calls outside of tasklist_lock - Drop irq disablement around pidmap_lock - Add some tasklist_lock asserts - Call flush_sigqueue() lockless by changing release_task() - Don't pointlessly clear TIF_SIGPENDING in __exit_signal() -> clear_tsk_thread_flag()" * tag 'kernel-6.15-rc1.tasklist_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pid: drop irq disablement around pidmap_lock pid: perform free_pid() calls outside of tasklist_lock pid: sprinkle tasklist_lock asserts exit: hoist get_pid() in release_task() outside of tasklist_lock exit: perform add_device_randomness() without tasklist_lock exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING) exit: change the release_task() paths to call flush_sigqueue() lockless
2025-03-24net/mlx5e: Fix ethtool -N flow-type ip4 to RSS contextMaxim Mikityanskiy
There commands can be used to add an RSS context and steer some traffic into it: # ethtool -X eth0 context new New RSS context is 1 # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 Added rule with ID 1023 However, the second command fails with EINVAL on mlx5e: # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 rmgr: Cannot insert RX class rule: Invalid argument Cannot insert classification rule It happens when flow_get_tirn calls flow_type_to_traffic_type with flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases which are common for hash and spec, IPv4 and IPv6 defines different contants for hash and for spec: #define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */ #define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */ ... #define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */ #define IP_USER_FLOW IPV4_USER_FLOW #define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */ #define IPV4_FLOW 0x10 /* hash only */ #define IPV6_FLOW 0x11 /* hash only */ Extend the switch in flow_type_to_traffic_type to support both, which fixes the failing ethtool -N command with flow-type ip4 or ip6. Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts") Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24ipe: policy_fs: fix kernel-doc warningsRandy Dunlap
Use the "struct" keyword in kernel-doc when describing struct ipefs_file. Add kernel-doc for the struct members also. Don't use kernel-doc notation for 'policy_subdir'. kernel-doc does not support documentation comments for data definitions. This eliminates multiple kernel-doc warnings: security/ipe/policy_fs.c:21: warning: cannot understand function prototype: 'struct ipefs_file ' security/ipe/policy_fs.c:407: warning: cannot understand function prototype: 'const struct ipefs_file policy_subdir[] = ' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Fan Wu <wufan@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Signed-off-by: Fan Wu <wufan@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.rust' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs rust updates from Christian Brauner: "This contains minor fixes and improvements to rust file bindings: - Optimize rust symbol generation for FileDescriptorReservation - Optimize rust symbol generation for SeqFile" * tag 'vfs-6.15-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: optimize rust symbol generation for SeqFile rust: file: optimize rust symbol generation for FileDescriptorReservation
2025-03-24net: stmmac: Call xpcs_config_eee_mult_fact() only when xpcs is presentMaxime Chevallier
Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs. Don't call xpcs_config_eee_mult_fact() in this case, as this causes a crash at init : Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write [...] Call trace: xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244 stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc socfpga_dwmac_probe from platform_probe+0x5c/0xb0 Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs file handling updates from Christian Brauner: "This contains performance improvements for struct file's new refcount mechanism and various other performance work: - The stock kernel transitioning the file to no refs held penalizes the caller with an extra atomic to block any increments. For cases where the file is highly likely to be going away this is easily avoidable. Add file_ref_put_close() to better handle the common case where closing a file descriptor also operates on the last reference and build fput_close_sync() and fput_close() on top of it. This brings about 1% performance improvement by eliding one atomic in the common case. - Predict no error in close() since the vast majority of the time system call returns 0. - Reduce the work done in fdget_pos() by predicting that the file was found and by explicitly comparing the reference count to one and ignoring the dead zone" * tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce work in fdget_pos() fs: use fput_close() in path_openat() fs: use fput_close() in filp_close() fs: use fput_close_sync() in close() file: add fput and file_ref_put routines optimized for use when closing a fd fs: predict no error in close()
2025-03-24Merge tag 'vfs-6.15-rc1.orangefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs orangefs updates from Christian Brauner: "This contains the work to remove orangefs_writepage() and partially convert it to folios. A few regular bugfixes are included as well" * tag 'vfs-6.15-rc1.orangefs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: orangefs: Convert orangefs_writepages to contain an array of folios orangefs: Simplify bvec setup in orangefs_writepages_work() orangefs: Unify error & success paths in orangefs_writepages_work() orangefs: Pass mapping to orangefs_writepages_work() orangefs: Convert orangefs_writepage_locked() to take a folio orangefs: Remove orangefs_writepage() orangefs: make open_for_read and open_for_write boolean orangefs: Move s_kmod_keyword_mask_map to orangefs-debugfs.c orangefs: Do not truncate file size
2025-03-24Merge tag 'vfs-6.15-rc1.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs afs updates from Christian Brauner: "This contains the work for afs for this cycle: - Fix an occasional hang that's only really encountered when rmmod'ing the kafs module - Remove the "-o autocell" mount option. This is obsolete with the dynamic root and removing it makes the next patch slightly easier - Change how the dynamic root mount is constructed. Currently, the root directory is (de)populated when it is (un)mounted if there are cells already configured and, further, pairs of automount points have to be created/removed each time a cell is added/deleted This is changed so that readdir on the root dir lists all the known cell automount pairs plus the @cell symlinks and the inodes and dentries are constructed by lookup on demand. This simplifies the cell management code - A few improvements to the afs_volume and afs_server tracepoints - Pass trace info into the afs_lookup_cell() function to allow the trace log to indicate the purpose of the lookup - Remove the 'net' parameter from afs_unuse_cell() as it's superfluous - In rxrpc, allow a kernel app (such as kafs) to store a word of information on rxrpc_peer records - Use the information stored on the rxrpc_peer record to point to the afs_server record. This allows the server address lookup to be done away with - Simplify the afs_server ref/activity accounting to make each one self-contained and not garbage collected from the cell management work item - Simplify the afs_cell ref/activity accounting to make each one of these also self-contained and not driven by a central management work item The current code was intended to make it such that a single timer for the namespace and one work item per cell could do all the work required to maintain these records. This, however, made for some sequencing problems when cleaning up these records. Further, the attempt to pass refs along with timers and work items made getting it right rather tricky when the timer or work item already had a ref attached and now a ref had to be got rid of" * tag 'vfs-6.15-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Simplify cell record handling afs: Fix afs_server ref accounting afs: Use the per-peer app data provided by rxrpc rxrpc: Allow the app to store private data on peer structs afs: Drop the net parameter from afs_unuse_cell() afs: Make afs_lookup_cell() take a trace note afs: Improve server refcount/active count tracing afs: Improve afs_volume tracing to display a debug ID afs: Change dynroot to create contents on demand afs: Remove the "autocell" mount option
2025-03-24dt-bindings: PCI: Add common schema for devices accessible through PCI BARsAndrea della Porta
Common YAML schema for devices that exports internal peripherals through PCI BARs. The BARs are exposed as simple-buses through which the peripherals can be accessed. This is not intended to be used as a standalone binding, but should be included by device specific bindings. Signed-off-by: Andrea della Porta <andrea.porta@suse.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/096ab7addb39e498e28ac2526c07157cc9327c42.1742418429.git.andrea.porta@suse.com
2025-03-24PCI: intel-gw: Remove intel_pcie_cpu_addr()Frank Li
Remove intel_pcie_cpu_addr(), the .cpu_addr_fixup() method, because the dwc core driver already handles address translation based on the devicetree description. [bhelgaas: this does require a minor dts change, but maintainer Lei Chuan Hua <lchuanhua@maxlinear.com> confirms that the driver is only used internally to Maxlinear and internal users will update dts: https://lore.kernel.org/r/BY3PR19MB507667CE7531D863E1E5F8AEBDD82@BY3PR19MB5076.namprd19.prod.outlook.com] Link: https://lore.kernel.org/r/20250305-intel-v1-1-40db3a685490@nxp.com Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24PCI: imx6: Remove imx_pcie_cpu_addr_fixup()Frank Li
Remove imx_pcie_cpu_addr_fixup, the .cpu_addr_fixup() method, because the dwc core driver already handles address translation based on the devicetree description. Link: https://lore.kernel.org/r/20250315201548.858189-14-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
2025-03-24PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()Frank Li
We know the parent_bus_offset, either computed from a DT reg property (the offset is the CPU physical addr - the 'config'/'addr_space' address on the parent bus) or from a .cpu_addr_fixup() (which may have used a host bridge window offset). Apply that parent_bus_offset instead of calling .cpu_addr_fixup() when programming the ATU. This assumes all intermediate addresses are at the same offset from the CPU physical addresses. [bhelgaas: commit log] Link: https://lore.kernel.org/r/20250315201548.858189-13-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24PCI: dwc: ep: Ensure proper iteration over outbound map windowsFrank Li
Most systems' PCIe outbound map windows have non-zero physical addresses, but the possibility of encountering zero increased after following commit ("PCI: dwc: Use parent_bus_offset"). 'ep->outbound_addr[n]', representing 'parent_bus_address', might be 0 on some hardware, which trims high address bits through bus fabric before sending to the PCIe controller. Replace the iteration logic with 'for_each_set_bit()' to ensure only allocated map windows are iterated when determining the ATU index from a given address. Link: https://lore.kernel.org/r/20250315201548.858189-12-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offsetFrank Li
Endpoint ┌───────────────────────────────────────────────┐ │ pcie-ep@5f010000 │ │ ┌────────────────┐│ │ │ Endpoint ││ │ │ PCIe ││ │ │ Controller ││ │ bus@5f000000 │ ┌────────► │ ┌──────────┐ │ │ ││dynamically │ │ │ Outbound Transfer │ ││allocated │┌─────┐ │ Bus ┼─────►│ ATU ───────┘ ││PCI Addr ││ │ │ Fabric │Bus │ ││ ││ CPU ├───►│ │Addr │ ││ ││ │CPU │ │0x8000_0000 ││ │└─────┘Addr└──────────┘ │ ││ │ 0x7000_0000 └────────────────┘│ └───────────────────────────────────────────────┘ bus@5f000000 { compatible = "simple-bus"; ranges = <0x80000000 0x0 0x70000000 0x10000000>; pcie-ep@5f010000 { reg = <0x80000000 0x10000000>; reg-names ="addr_space"; ... }; ... }; In the diagram above, CPU writes data to outbound window address 0x7000_0000, and the bus fabric maps it to 0x8000_0000. The ATU uses bus address 0x8000_0000 as input address and maps to some PCI address dynamically allocated by a PCI device driver on the host side. The pcie-ep@5f010000 'reg[addr_space]' is the parent bus address, which is the input of PCIe controller, including the ATU. Set parent_bus_offset, the offset from the CPU address to the PCIe controller input address using dw_pcie_init_parent_bus_offset(). The parent_bus_offset is not used yet, so no functional change intended. [bhelgaas: commit log] Link: https://lore.kernel.org/r/20250315201548.858189-11-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()Bjorn Helgaas
Consolidate devicetree resource handling in dw_pcie_ep_get_resources(). No functional change intended. Link: https://lore.kernel.org/r/20250315201548.858189-10-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-24PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()Bjorn Helgaas
Move devm_pci_epc_create() to the beginning of dw_pcie_ep_init(). devm_pci_epc_create() is generic code that doesn't depend on any DWC resource, so moving it earlier keeps all the subsequent devicetree-related code together. Link: https://lore.kernel.org/r/20250315201548.858189-9-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-24PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offsetFrank Li
The 'ranges' property of a PCI controller's parent can indicate address translation information. Most system use 1:1 map between CPU physical and PCI controller input addresses. But some hardware, like i.MX8QXP, doesn't use 1:1 map. See below diagram: ┌─────────┐ ┌────────────┐ ┌─────┐ │ │ IA: 0x8ff8_0000 │ │ │ CPU ├───►│ ┌────►├─────────────────┐ │ PCI │ └─────┘ │ │ │ IA: 0x8ff0_0000 │ │ │ CPU Addr │ │ ┌─►├─────────────┐ │ │ Controller │ 0x7ff8_0000─┼───┘ │ │ │ │ │ │ │ │ │ │ │ │ │ PCI Addr 0x7ff0_0000─┼──────┘ │ │ └──► IOSpace ─┼────────────► │ │ │ │ │ 0 0x7000_0000─┼────────►├─────────┐ │ │ │ └─────────┘ │ └──────► CfgSpace ─┼────────────► Bus Fabric │ │ │ 0 │ │ │ └──────────► MemSpace ─┼────────────► IA: 0x8000_0000 │ │ 0x8000_0000 └────────────┘ bus@5f000000 { compatible = "simple-bus"; #address-cells = <1>; #size-cells = <1>; ranges = <0x80000000 0x0 0x70000000 0x10000000>; pcie@5f010000 { compatible = "fsl,imx8q-pcie"; reg = <0x5f010000 0x10000>, <0x8ff00000 0x80000>; reg-names = "dbi", "config"; ... }; }; Intermediate address (IA) here means the PCIe controller input address. The pcie@5f010000 'reg[config]' address is the parent bus (PCIe controller input) address of CfgSpace. The ATU in MemSpace is not explicitly described via devicetree, so we assume the offset from CPU address to intermediate MemSpace address is the same as that for CfgSpace. We could use bus@5f000000 'ranges' for the same purpose. Set parent_bus_offset using dw_pcie_init_parent_bus_offset(). The parent_bus_offset is not used yet, so no functional change intended. Link: https://lore.kernel.org/r/20250315201548.858189-8-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debugFrank Li
dw_pcie_parent_bus_offset() looks up the parent bus address of a PCI controller 'reg' property in devicetree. If implemented, .cpu_addr_fixup() is a hard-coded way to get the parent bus address corresponding to a CPU physical address. Add debug code to compare the address from .cpu_addr_fixup() with the address from devicetree. If they match, warn that .cpu_addr_fixup() is redundant and should be removed; if they differ, warn that something is wrong with the devicetree. If .cpu_addr_fixup() is not implemented, the parent bus address should be identical to the CPU physical address because we previously ignored the parent bus address from devicetree. If the devicetree has a different parent bus address, warn about it being broken. [bhelgaas: split debug to separate patch for easier future revert, commit log] Link: https://lore.kernel.org/r/20250315201548.858189-7-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> [bhelgaas: squash Ioana Ciornei <ioana.ciornei@nxp.com> fix for NULL pointer deref when driver doesn't supply dw_pcie_ops, e.g., layerscape-pcie https://lore.kernel.org/r/20250319134339.3114817-1-ioana.ciornei@nxp.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24Merge tag 'vfs-6.15-rc1.initramfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs initramfs updates from Christian Brauner: "This adds basic kunit test coverage for initramfs unpacking and cleans up some buffer handling issues and inefficiencies" * tag 'vfs-6.15-rc1.initramfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: append initramfs files to the VFS section initramfs: avoid static buffer for error message initramfs: fix hardlink hash leak without TRAILER initramfs: reuse name_len for dir mtime tracking initramfs: allocate heap buffers together initramfs: avoid memcpy for hex header fields vsprintf: add simple_strntoul initramfs_test: kunit tests for initramfs unpacking init: add initramfs_internal.h
2025-03-24selftests: drv-net: rss_ctx: Don't assume indirection table is presentGal Pressman
The test_rss_context_dump() test assumes the indirection table is always supported, which is not true for all drivers, e.g., virtio_net when VIRTIO_NET_F_RSS is disabled. Skip the check if 'indir' is not present. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24docs/kcm: Fix typo "BFP"Ryohei Kinugawa
'BFP' should be 'BPF'. Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com> Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'r8169-enable-more-devices-aspm-support'Jakub Kicinski
ChunHao Lin says: ==================== r8169: enable more devices ASPM support This series of patches will enable more devices ASPM support. It also fix a RTL8126 cannot enter L1 substate issue when ASPM is enabled. ==================== Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24r8169: disable RTL8126 ZRX-DC timeoutChunHao Lin
Disable it due to it dose not meet ZRX-DC specification. If it is enabled, device will exit L1 substate every 100ms. Disable it for saving more power in L1 substate. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24r8169: enable RTL8168H/RTL8168EP/RTL8168FP ASPM supportChunHao Lin
This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on the platforms that have tested with ASPM enabled. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.ceph' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs ceph updates from Christian Brauner: "This contains the work to remove access to page->index from ceph and fixes the test failure observed for ceph with generic/421 by refactoring ceph_writepages_start()" * tag 'vfs-6.15-rc1.ceph' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio ceph: Fix error handling in fill_readdir_cache() fs: Remove page_mkwrite_check_truncate() ceph: Pass a folio to ceph_allocate_page_array() ceph: Convert ceph_move_dirty_page_in_page_array() to move_dirty_folio_in_page_array() ceph: Remove uses of page from ceph_process_folio_batch() ceph: Convert ceph_check_page_before_write() to use a folio ceph: Convert writepage_nounlock() to write_folio_nounlock() ceph: Convert ceph_readdir_cache_control to store a folio ceph: Convert ceph_find_incompatible() to take a folio ceph: Use a folio in ceph_page_mkwrite() ceph: Remove ceph_writepage() ceph: fix generic/421 test failure ceph: introduce ceph_submit_write() method ceph: introduce ceph_process_folio_batch() method ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring
2025-03-24docs: networking: strparser: Fix a typoWangYuli
The context indicates that 'than' is the correct word instead of 'then', as a comparison is being performed. Given that 'then' is also a valid English word, checkpatch.pl wouldn't have picked up on this spelling error. This typo was caught by AI during code review. Suggested-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24docs: fix the path of example code and example commands for device memory TCPYui Washizu
This updates the old path and fixes the description of unavailable options. Signed-off-by: Yui Washizu <yui.washidu@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tcp/dccp: Remove inet_connection_sock_af_ops.addr2sockaddr().Kuniyuki Iwashima
inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all in the git era. $ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1) Let's remove it. Note that there was a 4 bytes hole after sockaddr_len and now it's 6 bytes, so the binary layout is not changed. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24gve: unlink old napi only if page pool existsHarshitha Ramamurthy
Commit de70981f295e ("gve: unlink old napi when stopping a queue using queue API") unlinks the old napi when stopping a queue. But this breaks QPL mode of the driver which does not use page pool. Fix this by checking that there's a page pool associated with the ring. Cc: stable@vger.kernel.org Fixes: de70981f295e ("gve: unlink old napi when stopping a queue using queue API") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317214141.286854-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24selftest: net: update proc_net_pktgen (add more imix_weights test cases)Peter Seiderer
Add more imix_weights test cases (for incomplete input). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: pktgen: add strict buffer parsing index checkPeter Seiderer
Add strict buffer parsing index check to avoid the following Smatch warning: net/core/pktgen.c:877 get_imix_entries() warn: check that incremented offset 'i' is capped Checking the buffer index i after every get_user/i++ step and returning with error code immediately avoids the current indirect (but correct) error handling. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/netdev/36cf3ee2-38b1-47e5-a42a-363efeb0ace3@stanley.mountain/ Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317090401.1240704-1-ps.report@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.pagesize' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pagesize updates from Christian Brauner: "This enables block sizes greater than the page size for block devices. With this we can start supporting block devices with logical block sizes larger than 4k. It also allows to lift the device cache sector size support to 64k. This allows filesystems which can use larger sector sizes up to 64k to ensure that the filesystem will not generate writes that are smaller than the specified sector size" * tag 'vfs-6.15-rc1.pagesize' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: bdev: add back PAGE_SIZE block size validation for sb_set_blocksize() bdev: use bdev_io_min() for statx block size block/bdev: lift block size restrictions to 64k block/bdev: enable large folio support for large logical block sizes fs/buffer fs/mpage: remove large folio restriction fs/mpage: use blocks_per_folio instead of blocks_per_page fs/mpage: avoid negative shift for large blocksize fs/buffer: remove batching from async read fs/buffer: simplify block_read_full_folio() with bh_offset()
2025-03-24Merge tag 'vfs-6.15-rc1.mount.namespace' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount namespace updates from Christian Brauner: "This expands the ability of anonymous mount namespaces: - Creating detached mounts from detached mounts Currently, detached mounts can only be created from attached mounts. This limitaton prevents various use-cases. For example, the ability to mount a subdirectory without ever having to make the whole filesystem visible first. The current permission modelis: (1) Check that the caller is privileged over the owning user namespace of it's current mount namespace. (2) Check that the caller is located in the mount namespace of the mount it wants to create a detached copy of. While it is not strictly necessary to do it this way it is consistently applied in the new mount api. This model will also be used when allowing the creation of detached mount from another detached mount. The (1) requirement can simply be met by performing the same check as for the non-detached case, i.e., verify that the caller is privileged over its current mount namespace. To meet the (2) requirement it must be possible to infer the origin mount namespace that the anonymous mount namespace of the detached mount was created from. The origin mount namespace of an anonymous mount is the mount namespace that the mounts that were copied into the anonymous mount namespace originate from. In order to check the origin mount namespace of an anonymous mount namespace the sequence number of the original mount namespace is recorded in the anonymous mount namespace. With this in place it is possible to perform an equivalent check (2') to (2). The origin mount namespace of the anonymous mount namespace must be the same as the caller's mount namespace. To establish this the sequence number of the caller's mount namespace and the origin sequence number of the anonymous mount namespace are compared. The caller is always located in a non-anonymous mount namespace since anonymous mount namespaces cannot be setns()ed into. The caller's mount namespace will thus always have a valid sequence number. The owning namespace of any mount namespace, anonymous or non-anonymous, can never change. A mount attached to a non-anonymous mount namespace can never change mount namespace. If the sequence number of the non-anonymous mount namespace and the origin sequence number of the anonymous mount namespace match, the owning namespaces must match as well. Hence, the capability check on the owning namespace of the caller's mount namespace ensures that the caller has the ability to copy the mount tree. - Allow mount detached mounts on detached mounts Currently, detached mounts can only be mounted onto attached mounts. This limitation makes it impossible to assemble a new private rootfs and move it into place. Instead, a detached tree must be created, attached, then mounted open and then either moved or detached again. Lift this restriction. In order to allow mounting detached mounts onto other detached mounts the same permission model used for creating detached mounts from detached mounts can be used (cf. above). Allowing to mount detached mounts onto detached mounts leaves three cases to consider: (1) The source mount is an attached mount and the target mount is a detached mount. This would be equivalent to moving a mount between different mount namespaces. A caller could move an attached mount to a detached mount. The detached mount can now be freely attached to any mount namespace. This changes the current delegatioh model significantly for no good reason. So this will fail. (2) Anonymous mount namespaces are always attached fully, i.e., it is not possible to only attach a subtree of an anoymous mount namespace. This simplifies the implementation and reasoning. Consequently, if the anonymous mount namespace of the source detached mount and the target detached mount are the identical the mount request will fail. (3) The source mount's anonymous mount namespace is different from the target mount's anonymous mount namespace. In this case the source anonymous mount namespace of the source mount tree must be freed after its mounts have been moved to the target anonymous mount namespace. The source anonymous mount namespace must be empty afterwards. By allowing to mount detached mounts onto detached mounts a caller may do the following: fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE) fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE) fd_tree1 and fd_tree2 refer to two different detached mount trees that belong to two different anonymous mount namespace. It is important to note that fd_tree1 and fd_tree2 both refer to the root of their respective anonymous mount namespaces. By allowing to mount detached mounts onto detached mounts the caller may now do: move_mount(fd_tree1, "", fd_tree2, "", MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH) This will cause the detached mount referred to by fd_tree1 to be mounted on top of the detached mount referred to by fd_tree2. Thus, the detached mount fd_tree1 is moved from its separate anonymous mount namespace into fd_tree2's anonymous mount namespace. It also means that while fd_tree2 continues to refer to the root of its respective anonymous mount namespace fd_tree1 doesn't anymore. This has the consequence that only fd_tree2 can be moved to another anonymous or non-anonymous mount namespace. Moving fd_tree1 will now fail as fd_tree1 doesn't refer to the root of an anoymous mount namespace anymore. Now fd_tree1 and fd_tree2 refer to separate detached mount trees referring to the same anonymous mount namespace. This is conceptually fine. The new mount api does allow for this to happen already via: mount -t tmpfs tmpfs /mnt mkdir -p /mnt/A mount -t tmpfs tmpfs /mnt/A fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE) fd_tree4 = open_tree(-EBADF, "/mnt/A", 0) Both fd_tree3 and fd_tree4 refer to two different detached mount trees but both detached mount trees refer to the same anonymous mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3 may be moved another mount namespace as fd_tree3 refers to the root of the anonymous mount namespace just while fd_tree4 doesn't. However, there's an important difference between the fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example. Closing fd_tree4 and releasing the respective struct file will have no further effect on fd_tree3's detached mount tree. However, closing fd_tree3 will cause the mount tree and the respective anonymous mount namespace to be destroyed causing the detached mount tree of fd_tree4 to be invalid for further mounting. By allowing to mount detached mounts on detached mounts as in the fd_tree1/fd_tree2 example both struct files will affect each other. Both fd_tree1 and fd_tree2 refer to struct files that have FMODE_NEED_UNMOUNT set. To handle this we use the fact that @fd_tree1 will have a parent mount once it has been attached to @fd_tree2. When dissolve_on_fput() is called the mount that has been passed in will refer to the root of the anonymous mount namespace. If it doesn't it would mean that mounts are leaked. So before allowing to mount detached mounts onto detached mounts this would be a bug. Now that detached mounts can be mounted onto detached mounts it just means that the mount has been attached to another anonymous mount namespace and thus dissolve_on_fput() must not unmount the mount tree or free the anonymous mount namespace as the file referring to the root of the namespace hasn't been closed yet. If it had been closed yet it would be obvious because the mount namespace would be NULL, i.e., the @fd_tree1 would have already been unmounted. If @fd_tree1 hasn't been unmounted yet and has a parent mount it is safe to skip any cleanup as closing @fd_tree2 will take care of all cleanup operations. - Allow mount propagation for detached mount trees In commit ee2e3f50629f ("mount: fix mounting of detached mounts onto targets that reside on shared mounts") I fixed a bug where propagating the source mount tree of an anonymous mount namespace into a target mount tree of a non-anonymous mount namespace could be used to trigger an integer overflow in the non-anonymous mount namespace causing any new mounts to fail. The cause of this was that the propagation algorithm was unable to recognize mounts from the source mount tree that were already propagated into the target mount tree and then reappeared as propagation targets when walking the destination propagation mount tree. When fixing this I disabled mount propagation into anonymous mount namespaces. Make it possible for anonymous mount namespace to receive mount propagation events correctly. This is now also a correctness issue now that we allow mounting detached mount trees onto detached mount trees. Mark the source anonymous mount namespace with MNTNS_PROPAGATING indicating that all mounts belonging to this mount namespace are currently in the process of being propagated and make the propagation algorithm discard those if they appear as propagation targets" * tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) selftests: test subdirectory mounting selftests: add test for detached mount tree propagation fs: namespace: fix uninitialized variable use mount: handle mount propagation for detached mount trees fs: allow creating detached mounts from fsmount() file descriptors selftests: seventh test for mounting detached mounts onto detached mounts selftests: sixth test for mounting detached mounts onto detached mounts selftests: fifth test for mounting detached mounts onto detached mounts selftests: fourth test for mounting detached mounts onto detached mounts selftests: third test for mounting detached mounts onto detached mounts selftests: second test for mounting detached mounts onto detached mounts selftests: first test for mounting detached mounts onto detached mounts fs: mount detached mounts onto detached mounts fs: support getname_maybe_null() in move_mount() selftests: create detached mounts from detached mounts fs: create detached mounts from detached mounts fs: add may_copy_tree() fs: add fastpath for dissolve_on_fput() fs: add assert for move_mount() fs: add mnt_ns_empty() helper ...