Age | Commit message (Collapse) | Author |
|
There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.
This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.
Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().
No functional changes intended.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bridge
mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the
bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a
VLAN, but at this point no VLANs are configured on the VxLAN device. This
means that we can call the APIs, but there is no point to do that, as they
do not configure anything in such cases.
Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.
Align the call to mlxsw_sp_bridge_vxlan_leave() to be called like
mlxsw_sp_bridge_vxlan_join(), only in case that the VXLAN device is up,
so move the check to be done before calling
mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing
behavior, as there is a similar check inside mlxsw_sp_bridge_vxlan_leave().
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software when
they were already forwarded by hardware.
ARP packets are not flooded by hardware to VXLAN, so software should handle
such flooding. When hardware domain of VXLAN device will be changed, ARP
packets which are trapped and marked with offload_fwd_mark will not be
flooded to VXLAN also in software, which will break VXLAN traffic.
To prevent such breaking, trap ARP packets at layer 2 and don't mark them
as L2-forwarded in hardware, then flooding ARP packets will be done only
in software, and VXLAN will send ARP packets.
Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are
trapped when they enter the device.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Marek Behún says:
====================
Fixes for mv88e6xxx (mainly 6320 family)
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20250313134146.27087-1-kabel@kernel.org/
====================
Link: https://patch.msgid.link/20250317173250.28780-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the workaround for erratum
3.3 RGMII timing may be out of spec when transmit delay is enabled
for the 6320 family, which says:
When transmit delay is enabled via Port register 1 bit 14 = 1, duty
cycle may be out of spec. Under very rare conditions this may cause
the attached device receive CRC errors.
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 5.4.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-8-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix internal PHYs definition for the 6320 family, which has only 2
internal PHYs (on ports 3 and 4).
Fixes: bc3931557d1d ("net: dsa: mv88e6xxx: Add number of internal PHYs")
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 6.6.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-7-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all
supported chips") introduced STU methods, but did not add them to the
6320 family. Fix it.
Fixes: c050f5e91b47 ("net: dsa: mv88e6xxx: Fill in STU support for all supported chips")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-6-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
did not add the .port_set_policy() method for the 6320 family. Fix it.
Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-5-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit f36456522168 ("net: dsa: mv88e6xxx: move PVT description in
info") did not enable PVT for 6321 switch. Fix it.
Fixes: f36456522168 ("net: dsa: mv88e6xxx: move PVT description in info")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-4-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The atu_move_port_mask for 6341 family (Topaz) is 0xf, not 0x1f. The
PortVec field is 8 bits wide, not 11 as in 6390 family. Fix this.
Fixes: e606ca36bbf2 ("net: dsa: mv88e6xxx: rework ATU Remove")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-3-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The VTU registers of the 6320 family use the 6352 semantics, not 6185.
Fix it.
Fixes: b8fee9571063 ("net: dsa: mv88e6xxx: add VLAN Get Next support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: <stable@vger.kernel.org> # 5.15.x
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250317173250.28780-2-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- elf: Define and use note name macros (Akihiko Odaki)
- elf: add remaining SHF_ flag macros (Timur Tabi)
- binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)
- binfmt_elf_fdpic: fix variable set but not used warning (sunliming)
* tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix variable set but not used warning
elf: add remaining SHF_ flag macros
binfmt: Remove loader from linux_binprm struct
crash: Remove KEXEC_CORE_NOTE_NAME
s390/crash: Use note name macros
crash: Use note name macros
powerpc/crash: Use note name macros
binfmt_elf: Use note name macros
elf: Define note name macros
|
|
A change to the clk driver broke configurations that enable DA830
but not DA850:
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o: in function `__da850_pll0_of_clk_init_declare':
pll.c:(.init.text+0x30): undefined reference to `of_da850_pll0_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_id_table+0x14): undefined reference to `da850_pll0_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_id_table+0x2c): undefined reference to `da850_pll1_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/pll.o:(.rodata.davinci_pll_of_match+0xc0): undefined reference to `of_da850_pll1_init'
arm-linux-gnueabi-ld: drivers/clk/davinci/psc.o:(.rodata.davinci_psc_id_table+0x14): undefined reference to `da850_psc0_init_data'
arm-linux-gnueabi-ld: drivers/clk/davinci/psc.o:(.rodata.davinci_psc_id_table+0x2c): undefined reference to `da850_psc1_init_data'
Select ARCH_DAVINCI_DA850 unconditionally to ensure the driver can still
build.
Fixes: a31b4dcf188c ("clk: davinci: remove support for da830")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Michael Chan says:
====================
bnxt_en: Fix MAX_SKB_FRAGS > 30
The driver was written with the assumption that MAX_SKB_FRAGS could
not exceed what the NIC can support. About 2 years ago,
CONFIG_MAX_SKB_FRAGS was added. The value can exceed what the NIC
can support and it may cause TX timeout. These 2 patches will fix
the issue.
====================
Link: https://patch.msgid.link/20250321211639.3812992-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If skb_shinfo(skb)->nr_frags excceds what the chip can support,
linearize the SKB and warn once to let the user know.
net.core.max_skb_frags can be lowered, for example, to avoid the
issue.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321211639.3812992-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bd_cnt field in the TX BD specifies the total number of BDs for
the TX packet. The bd_cnt field has 5 bits and the maximum number
supported is 32 with the value 0.
CONFIG_MAX_SKB_FRAGS can be modified and the total number of SKB
fragments can approach or exceed the maximum supported by the chip.
Add a macro to properly mask the bd_cnt field so that the value 32
will be properly masked and set to 0 in the bd_cnd field.
Without this patch, the out-of-range bd_cnt value will corrupt the
TX BD and may cause TX timeout.
The next patch will check for values exceeding 32.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321211639.3812992-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.
Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.
While at that drop the global ptype_all list: no in kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).
Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing the per netns lists).
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove debugfs_tx() which was added when the caif driver was added in
commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
but it has never been used.
Flagged by LLVM 19.1.7 W=1 builds.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
of_gpio.h is deprecated. Since there is no of_gpio_x API, drop
unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if
no user in driver.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The net fixed phy driver does not require the creation of a platform
device. Originally, this approach was chosen for simplicity when the
driver was first implemented.
With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the device to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will get rid of the fake platform device.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tariq Toukan says:
====================
mlx5 cleanups 2025-03-19
This series contains small cleanups to the mlx5 core and Eth drivers.
====================
Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Always set PAGE_POOL_STATS in mlx5 Eth driver.
Cleanup the corresponding #ifdefs.
Page pool stats are essential to monitor and analyze RX performance.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
From what I can tell the .get_fixed_state pointer in the phylink structure
hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate
phylink_fixed_state_cb()") . Since I can't find any users for it we might
as well just drop the pointer.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The assignment of zero to udph->check is unnecessary as it is
immediately overwritten in the subsequent line. Remove the redundant
assignment.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull tasklist_lock optimizations from Christian Brauner:
"According to the performance testbots this brings a 23% performance
increase when creating new processes:
- Reduce tasklist_lock hold time on exit:
- Perform add_device_randomness() without tasklist_lock
- Perform free_pid() calls outside of tasklist_lock
- Drop irq disablement around pidmap_lock
- Add some tasklist_lock asserts
- Call flush_sigqueue() lockless by changing release_task()
- Don't pointlessly clear TIF_SIGPENDING in __exit_signal() ->
clear_tsk_thread_flag()"
* tag 'kernel-6.15-rc1.tasklist_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pid: drop irq disablement around pidmap_lock
pid: perform free_pid() calls outside of tasklist_lock
pid: sprinkle tasklist_lock asserts
exit: hoist get_pid() in release_task() outside of tasklist_lock
exit: perform add_device_randomness() without tasklist_lock
exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING)
exit: change the release_task() paths to call flush_sigqueue() lockless
|
|
There commands can be used to add an RSS context and steer some traffic
into it:
# ethtool -X eth0 context new
New RSS context is 1
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
Added rule with ID 1023
However, the second command fails with EINVAL on mlx5e:
# ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
It happens when flow_get_tirn calls flow_type_to_traffic_type with
flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles
IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases which are
common for hash and spec, IPv4 and IPv6 defines different contants for
hash and for spec:
#define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
#define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */
...
#define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */
#define IP_USER_FLOW IPV4_USER_FLOW
#define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */
#define IPV4_FLOW 0x10 /* hash only */
#define IPV6_FLOW 0x11 /* hash only */
Extend the switch in flow_type_to_traffic_type to support both, which
fixes the failing ethtool -N command with flow-type ip4 or ip6.
Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rust updates from Christian Brauner:
"This contains minor fixes and improvements to rust file bindings:
- Optimize rust symbol generation for FileDescriptorReservation
- Optimize rust symbol generation for SeqFile"
* tag 'vfs-6.15-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: optimize rust symbol generation for SeqFile
rust: file: optimize rust symbol generation for FileDescriptorReservation
|
|
Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs.
Don't call xpcs_config_eee_mult_fact() in this case, as this causes a
crash at init :
Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write
[...]
Call trace:
xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c
stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244
stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc
socfpga_dwmac_probe from platform_probe+0x5c/0xb0
Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file handling updates from Christian Brauner:
"This contains performance improvements for struct file's new refcount
mechanism and various other performance work:
- The stock kernel transitioning the file to no refs held penalizes
the caller with an extra atomic to block any increments. For cases
where the file is highly likely to be going away this is easily
avoidable.
Add file_ref_put_close() to better handle the common case where
closing a file descriptor also operates on the last reference and
build fput_close_sync() and fput_close() on top of it. This brings
about 1% performance improvement by eliding one atomic in the
common case.
- Predict no error in close() since the vast majority of the time
system call returns 0.
- Reduce the work done in fdget_pos() by predicting that the file was
found and by explicitly comparing the reference count to one and
ignoring the dead zone"
* tag 'vfs-6.15-rc1.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reduce work in fdget_pos()
fs: use fput_close() in path_openat()
fs: use fput_close() in filp_close()
fs: use fput_close_sync() in close()
file: add fput and file_ref_put routines optimized for use when closing a fd
fs: predict no error in close()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs orangefs updates from Christian Brauner:
"This contains the work to remove orangefs_writepage() and partially
convert it to folios.
A few regular bugfixes are included as well"
* tag 'vfs-6.15-rc1.orangefs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
orangefs: Convert orangefs_writepages to contain an array of folios
orangefs: Simplify bvec setup in orangefs_writepages_work()
orangefs: Unify error & success paths in orangefs_writepages_work()
orangefs: Pass mapping to orangefs_writepages_work()
orangefs: Convert orangefs_writepage_locked() to take a folio
orangefs: Remove orangefs_writepage()
orangefs: make open_for_read and open_for_write boolean
orangefs: Move s_kmod_keyword_mask_map to orangefs-debugfs.c
orangefs: Do not truncate file size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs afs updates from Christian Brauner:
"This contains the work for afs for this cycle:
- Fix an occasional hang that's only really encountered when
rmmod'ing the kafs module
- Remove the "-o autocell" mount option. This is obsolete with the
dynamic root and removing it makes the next patch slightly easier
- Change how the dynamic root mount is constructed. Currently, the
root directory is (de)populated when it is (un)mounted if there are
cells already configured and, further, pairs of automount points
have to be created/removed each time a cell is added/deleted
This is changed so that readdir on the root dir lists all the known
cell automount pairs plus the @cell symlinks and the inodes and
dentries are constructed by lookup on demand. This simplifies the
cell management code
- A few improvements to the afs_volume and afs_server tracepoints
- Pass trace info into the afs_lookup_cell() function to allow the
trace log to indicate the purpose of the lookup
- Remove the 'net' parameter from afs_unuse_cell() as it's
superfluous
- In rxrpc, allow a kernel app (such as kafs) to store a word of
information on rxrpc_peer records
- Use the information stored on the rxrpc_peer record to point to the
afs_server record. This allows the server address lookup to be done
away with
- Simplify the afs_server ref/activity accounting to make each one
self-contained and not garbage collected from the cell management
work item
- Simplify the afs_cell ref/activity accounting to make each one of
these also self-contained and not driven by a central management
work item
The current code was intended to make it such that a single timer
for the namespace and one work item per cell could do all the work
required to maintain these records. This, however, made for some
sequencing problems when cleaning up these records. Further, the
attempt to pass refs along with timers and work items made getting
it right rather tricky when the timer or work item already had a
ref attached and now a ref had to be got rid of"
* tag 'vfs-6.15-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Simplify cell record handling
afs: Fix afs_server ref accounting
afs: Use the per-peer app data provided by rxrpc
rxrpc: Allow the app to store private data on peer structs
afs: Drop the net parameter from afs_unuse_cell()
afs: Make afs_lookup_cell() take a trace note
afs: Improve server refcount/active count tracing
afs: Improve afs_volume tracing to display a debug ID
afs: Change dynroot to create contents on demand
afs: Remove the "autocell" mount option
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initramfs updates from Christian Brauner:
"This adds basic kunit test coverage for initramfs unpacking and cleans
up some buffer handling issues and inefficiencies"
* tag 'vfs-6.15-rc1.initramfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: append initramfs files to the VFS section
initramfs: avoid static buffer for error message
initramfs: fix hardlink hash leak without TRAILER
initramfs: reuse name_len for dir mtime tracking
initramfs: allocate heap buffers together
initramfs: avoid memcpy for hex header fields
vsprintf: add simple_strntoul
initramfs_test: kunit tests for initramfs unpacking
init: add initramfs_internal.h
|
|
The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.
Skip the check if 'indir' is not present.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'BFP' should be 'BPF'.
Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ChunHao Lin says:
====================
r8169: enable more devices ASPM support
This series of patches will enable more devices ASPM support.
It also fix a RTL8126 cannot enter L1 substate issue when ASPM is
enabled.
====================
Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Disable it due to it dose not meet ZRX-DC specification. If it is enabled,
device will exit L1 substate every 100ms. Disable it for saving more power
in L1 substate.
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on
the platforms that have tested with ASPM enabled.
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs ceph updates from Christian Brauner:
"This contains the work to remove access to page->index from ceph
and fixes the test failure observed for ceph with generic/421 by
refactoring ceph_writepages_start()"
* tag 'vfs-6.15-rc1.ceph' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fscrypt: Change fscrypt_encrypt_pagecache_blocks() to take a folio
ceph: Fix error handling in fill_readdir_cache()
fs: Remove page_mkwrite_check_truncate()
ceph: Pass a folio to ceph_allocate_page_array()
ceph: Convert ceph_move_dirty_page_in_page_array() to move_dirty_folio_in_page_array()
ceph: Remove uses of page from ceph_process_folio_batch()
ceph: Convert ceph_check_page_before_write() to use a folio
ceph: Convert writepage_nounlock() to write_folio_nounlock()
ceph: Convert ceph_readdir_cache_control to store a folio
ceph: Convert ceph_find_incompatible() to take a folio
ceph: Use a folio in ceph_page_mkwrite()
ceph: Remove ceph_writepage()
ceph: fix generic/421 test failure
ceph: introduce ceph_submit_write() method
ceph: introduce ceph_process_folio_batch() method
ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring
|
|
The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.
Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.
This typo was caught by AI during code review.
Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This updates the old path and fixes the description of unavailable options.
Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all
in the git era.
$ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1)
Let's remove it.
Note that there was a 4 bytes hole after sockaddr_len and now it's
6 bytes, so the binary layout is not changed.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit de70981f295e ("gve: unlink old napi when stopping a queue using
queue API") unlinks the old napi when stopping a queue. But this breaks
QPL mode of the driver which does not use page pool. Fix this by checking
that there's a page pool associated with the ring.
Cc: stable@vger.kernel.org
Fixes: de70981f295e ("gve: unlink old napi when stopping a queue using queue API")
Reviewed-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317214141.286854-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add more imix_weights test cases (for incomplete input).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add strict buffer parsing index check to avoid the following Smatch
warning:
net/core/pktgen.c:877 get_imix_entries()
warn: check that incremented offset 'i' is capped
Checking the buffer index i after every get_user/i++ step and returning
with error code immediately avoids the current indirect (but correct)
error handling.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netdev/36cf3ee2-38b1-47e5-a42a-363efeb0ace3@stanley.mountain/
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-1-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pagesize updates from Christian Brauner:
"This enables block sizes greater than the page size for block devices.
With this we can start supporting block devices with logical block
sizes larger than 4k.
It also allows to lift the device cache sector size support to 64k.
This allows filesystems which can use larger sector sizes up to 64k to
ensure that the filesystem will not generate writes that are smaller
than the specified sector size"
* tag 'vfs-6.15-rc1.pagesize' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()
bdev: use bdev_io_min() for statx block size
block/bdev: lift block size restrictions to 64k
block/bdev: enable large folio support for large logical block sizes
fs/buffer fs/mpage: remove large folio restriction
fs/mpage: use blocks_per_folio instead of blocks_per_page
fs/mpage: avoid negative shift for large blocksize
fs/buffer: remove batching from async read
fs/buffer: simplify block_read_full_folio() with bh_offset()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount namespace updates from Christian Brauner:
"This expands the ability of anonymous mount namespaces:
- Creating detached mounts from detached mounts
Currently, detached mounts can only be created from attached
mounts. This limitaton prevents various use-cases. For example, the
ability to mount a subdirectory without ever having to make the
whole filesystem visible first.
The current permission modelis:
(1) Check that the caller is privileged over the owning user
namespace of it's current mount namespace.
(2) Check that the caller is located in the mount namespace of the
mount it wants to create a detached copy of.
While it is not strictly necessary to do it this way it is
consistently applied in the new mount api. This model will also be
used when allowing the creation of detached mount from another
detached mount.
The (1) requirement can simply be met by performing the same check
as for the non-detached case, i.e., verify that the caller is
privileged over its current mount namespace.
To meet the (2) requirement it must be possible to infer the origin
mount namespace that the anonymous mount namespace of the detached
mount was created from.
The origin mount namespace of an anonymous mount is the mount
namespace that the mounts that were copied into the anonymous mount
namespace originate from.
In order to check the origin mount namespace of an anonymous mount
namespace the sequence number of the original mount namespace is
recorded in the anonymous mount namespace.
With this in place it is possible to perform an equivalent check
(2') to (2). The origin mount namespace of the anonymous mount
namespace must be the same as the caller's mount namespace. To
establish this the sequence number of the caller's mount namespace
and the origin sequence number of the anonymous mount namespace are
compared.
The caller is always located in a non-anonymous mount namespace
since anonymous mount namespaces cannot be setns()ed into. The
caller's mount namespace will thus always have a valid sequence
number.
The owning namespace of any mount namespace, anonymous or
non-anonymous, can never change. A mount attached to a
non-anonymous mount namespace can never change mount namespace.
If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.
Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to copy the
mount tree.
- Allow mount detached mounts on detached mounts
Currently, detached mounts can only be mounted onto attached
mounts. This limitation makes it impossible to assemble a new
private rootfs and move it into place. Instead, a detached tree
must be created, attached, then mounted open and then either moved
or detached again. Lift this restriction.
In order to allow mounting detached mounts onto other detached
mounts the same permission model used for creating detached mounts
from detached mounts can be used (cf. above).
Allowing to mount detached mounts onto detached mounts leaves three
cases to consider:
(1) The source mount is an attached mount and the target mount is
a detached mount. This would be equivalent to moving a mount
between different mount namespaces. A caller could move an
attached mount to a detached mount. The detached mount can now
be freely attached to any mount namespace. This changes the
current delegatioh model significantly for no good reason. So
this will fail.
(2) Anonymous mount namespaces are always attached fully, i.e., it
is not possible to only attach a subtree of an anoymous mount
namespace. This simplifies the implementation and reasoning.
Consequently, if the anonymous mount namespace of the source
detached mount and the target detached mount are the identical
the mount request will fail.
(3) The source mount's anonymous mount namespace is different from
the target mount's anonymous mount namespace.
In this case the source anonymous mount namespace of the
source mount tree must be freed after its mounts have been
moved to the target anonymous mount namespace. The source
anonymous mount namespace must be empty afterwards.
By allowing to mount detached mounts onto detached mounts a caller
may do the following:
fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)
fd_tree1 and fd_tree2 refer to two different detached mount trees
that belong to two different anonymous mount namespace.
It is important to note that fd_tree1 and fd_tree2 both refer to
the root of their respective anonymous mount namespaces.
By allowing to mount detached mounts onto detached mounts the
caller may now do:
move_mount(fd_tree1, "", fd_tree2, "",
MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)
This will cause the detached mount referred to by fd_tree1 to be
mounted on top of the detached mount referred to by fd_tree2.
Thus, the detached mount fd_tree1 is moved from its separate
anonymous mount namespace into fd_tree2's anonymous mount
namespace.
It also means that while fd_tree2 continues to refer to the root of
its respective anonymous mount namespace fd_tree1 doesn't anymore.
This has the consequence that only fd_tree2 can be moved to another
anonymous or non-anonymous mount namespace. Moving fd_tree1 will
now fail as fd_tree1 doesn't refer to the root of an anoymous mount
namespace anymore.
Now fd_tree1 and fd_tree2 refer to separate detached mount trees
referring to the same anonymous mount namespace.
This is conceptually fine. The new mount api does allow for this to
happen already via:
mount -t tmpfs tmpfs /mnt
mkdir -p /mnt/A
mount -t tmpfs tmpfs /mnt/A
fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)
Both fd_tree3 and fd_tree4 refer to two different detached mount
trees but both detached mount trees refer to the same anonymous
mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3
may be moved another mount namespace as fd_tree3 refers to the root
of the anonymous mount namespace just while fd_tree4 doesn't.
However, there's an important difference between the
fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example.
Closing fd_tree4 and releasing the respective struct file will have
no further effect on fd_tree3's detached mount tree.
However, closing fd_tree3 will cause the mount tree and the
respective anonymous mount namespace to be destroyed causing the
detached mount tree of fd_tree4 to be invalid for further mounting.
By allowing to mount detached mounts on detached mounts as in the
fd_tree1/fd_tree2 example both struct files will affect each other.
Both fd_tree1 and fd_tree2 refer to struct files that have
FMODE_NEED_UNMOUNT set.
To handle this we use the fact that @fd_tree1 will have a parent
mount once it has been attached to @fd_tree2.
When dissolve_on_fput() is called the mount that has been passed in
will refer to the root of the anonymous mount namespace. If it
doesn't it would mean that mounts are leaked. So before allowing to
mount detached mounts onto detached mounts this would be a bug.
Now that detached mounts can be mounted onto detached mounts it
just means that the mount has been attached to another anonymous
mount namespace and thus dissolve_on_fput() must not unmount the
mount tree or free the anonymous mount namespace as the file
referring to the root of the namespace hasn't been closed yet.
If it had been closed yet it would be obvious because the mount
namespace would be NULL, i.e., the @fd_tree1 would have already
been unmounted. If @fd_tree1 hasn't been unmounted yet and has a
parent mount it is safe to skip any cleanup as closing @fd_tree2
will take care of all cleanup operations.
- Allow mount propagation for detached mount trees
In commit ee2e3f50629f ("mount: fix mounting of detached mounts
onto targets that reside on shared mounts") I fixed a bug where
propagating the source mount tree of an anonymous mount namespace
into a target mount tree of a non-anonymous mount namespace could
be used to trigger an integer overflow in the non-anonymous mount
namespace causing any new mounts to fail.
The cause of this was that the propagation algorithm was unable to
recognize mounts from the source mount tree that were already
propagated into the target mount tree and then reappeared as
propagation targets when walking the destination propagation mount
tree.
When fixing this I disabled mount propagation into anonymous mount
namespaces. Make it possible for anonymous mount namespace to
receive mount propagation events correctly. This is now also a
correctness issue now that we allow mounting detached mount trees
onto detached mount trees.
Mark the source anonymous mount namespace with MNTNS_PROPAGATING
indicating that all mounts belonging to this mount namespace are
currently in the process of being propagated and make the
propagation algorithm discard those if they appear as propagation
targets"
* tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
selftests: test subdirectory mounting
selftests: add test for detached mount tree propagation
fs: namespace: fix uninitialized variable use
mount: handle mount propagation for detached mount trees
fs: allow creating detached mounts from fsmount() file descriptors
selftests: seventh test for mounting detached mounts onto detached mounts
selftests: sixth test for mounting detached mounts onto detached mounts
selftests: fifth test for mounting detached mounts onto detached mounts
selftests: fourth test for mounting detached mounts onto detached mounts
selftests: third test for mounting detached mounts onto detached mounts
selftests: second test for mounting detached mounts onto detached mounts
selftests: first test for mounting detached mounts onto detached mounts
fs: mount detached mounts onto detached mounts
fs: support getname_maybe_null() in move_mount()
selftests: create detached mounts from detached mounts
fs: create detached mounts from detached mounts
fs: add may_copy_tree()
fs: add fastpath for dissolve_on_fput()
fs: add assert for move_mount()
fs: add mnt_ns_empty() helper
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nsfs updates from Christian Brauner:
"This contains non-urgent fixes for nsfs to validate ioctls before
performing any relevant operations.
We alredy did this for a few other filesystems last cycle"
* tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/nsfs: add ioctl validation tests
nsfs: validate ioctls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs sysv removal from Christian Brauner:
"This removes the sysv filesystem. We've discussed this various times.
It's time to try"
* tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
sysv: Remove the filesystem
|