Age | Commit message (Collapse) | Author |
|
Support large folios for symlinks and change the name from
fuse_getlink_page() to fuse_getlink_folio().
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add support for folios larger than one page size for folio reads into
the page cache.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add support for folios larger than one page size for writethrough
writes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Refactor the logic in fuse_fill_write_pages() for copying out write
data. This will make the future change for supporting large folios for
writes easier. No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add support for folios larger than one page size for retrieves.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Currently, all folios associated with fuse are one page size. As part of
the work to enable large folios, this commit adds support for copying
to/from folios larger than one page size.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Implement the Device Memory TCP transmit path, allowing zero-copy
data transmission on top of TCP from e.g. GPU memory to the wire.
- Move all the IPv6 routing tables management outside the RTNL scope,
under its own lock and RCU. The route control path is now 3x times
faster.
- Convert queue related netlink ops to instance lock, reducing again
the scope of the RTNL lock. This improves the control plane
scalability.
- Refactor the software crc32c implementation, removing unneeded
abstraction layers and improving significantly the related
micro-benchmarks.
- Optimize the GRO engine for UDP-tunneled traffic, for a 10%
performance improvement in related stream tests.
- Cover more per-CPU storage with local nested BH locking; this is a
prep work to remove the current per-CPU lock in local_bh_disable()
on PREMPT_RT.
- Introduce and use nlmsg_payload helper, combining buffer bounds
verification with accessing payload carried by netlink messages.
Netfilter:
- Rewrite the procfs conntrack table implementation, improving
considerably the dump performance. A lot of user-space tools still
use this interface.
- Implement support for wildcard netdevice in netdev basechain and
flowtables.
- Integrate conntrack information into nft trace infrastructure.
- Export set count and backend name to userspace, for better
introspection.
BPF:
- BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
programs and can be controlled in similar way to traditional qdiscs
using the "tc qdisc" command.
- Refactor the UDP socket iterator, addressing long standing issues
WRT duplicate hits or missed sockets.
Protocols:
- Improve TCP receive buffer auto-tuning and increase the default
upper bound for the receive buffer; overall this improves the
single flow maximum thoughput on 200Gbs link by over 60%.
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport
security for connections to the AFS fileserver and VL server.
- Improve TCP multipath routing, so that the sources address always
matches the nexthop device.
- Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
and thus preventing DoS caused by passing around problematic FDs.
- Retire DCCP socket. DCCP only receives updates for bugs, and major
distros disable it by default. Its removal allows for better
organisation of TCP fields to reduce the number of cache lines hit
in the fast path.
- Extend TCP drop-reason support to cover PAWS checks.
Driver API:
- Reorganize PTP ioctl flag support to require an explicit opt-in for
the drivers, avoiding the problem of drivers not rejecting new
unsupported flags.
- Converted several device drivers to timestamping APIs.
- Introduce per-PHY ethtool dump helpers, improving the support for
dump operations targeting PHYs.
Tests and tooling:
- Add support for classic netlink in user space C codegen, so that
ynl-c can now read, create and modify links, routes addresses and
qdisc layer configuration.
- Add ynl sub-types for binary attributes, allowing ynl-c to output
known struct instead of raw binary data, clarifying the classic
netlink output.
- Extend MPTCP selftests to improve the code-coverage.
- Add tests for XDP tail adjustment in AF_XDP.
New hardware / drivers:
- OpenVPN virtual driver: offload OpenVPN data channels processing to
the kernel-space, increasing the data transfer throughput WRT the
user-space implementation.
- Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
- Broadcom asp-v3.0 ethernet driver.
- AMD Renoir ethernet device.
- ReakTek MT9888 2.5G ethernet PHY driver.
- Aeonsemi 10G C45 PHYs driver.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- refactor the steering table handling to significantly
reduce the amount of memory used
- add support for complex matches in H/W flow steering
- improve flow streeing error handling
- convert to netdev instance locking
- Intel (100G, ice, igb, ixgbe, idpf):
- ice: add switchdev support for LLDP traffic over VF
- ixgbe: add firmware manipulation and regions devlink support
- igb: introduce support for frame transmission premption
- igb: adds persistent NAPI configuration
- idpf: introduce RDMA support
- idpf: add initial PTP support
- Meta (fbnic):
- extend hardware stats coverage
- add devlink dev flash support
- Broadcom (bnxt):
- add support for RX-side device memory TCP
- Wangxun (txgbe):
- implement support for udp tunnel offload
- complete PTP and SRIOV support for AML 25G/10G devices
- Ethernet NICs embedded and virtual:
- Google (gve):
- add device memory TCP TX support
- Amazon (ena):
- support persistent per-NAPI config
- Airoha:
- add H/W support for L2 traffic offload
- add per flow stats for flow offloading
- RealTek (rtl8211): add support for WoL magic packet
- Synopsys (stmmac):
- dwmac-socfpga 1000BaseX support
- add Loongson-2K3000 support
- introduce support for hardware-accelerated VLAN stripping
- Broadcom (bcmgenet):
- expose more H/W stats
- Freescale (enetc, dpaa2-eth):
- enetc: add MAC filter, VLAN filter RSS and loopback support
- dpaa2-eth: convert to H/W timestamping APIs
- vxlan: convert FDB table to rhashtable, for better scalabilty
- veth: apply qdisc backpressure on full ring to reduce TX drops
- Ethernet switches:
- Microchip (kzZ88x3): add ETS scheduler support
- Ethernet PHYs:
- RealTek (rtl8211):
- add support for WoL magic packet
- add support for PHY LEDs
- CAN:
- Adds RZ/G3E CANFD support to the rcar_canfd driver.
- Preparatory work for CAN-XL support.
- Add self-tests framework with support for CAN physical interfaces.
- WiFi:
- mac80211:
- scan improvements with multi-link operation (MLO)
- Qualcomm (ath12k):
- enable AHB support for IPQ5332
- add monitor interface support to QCN9274
- add multi-link operation support to WCN7850
- add 802.11d scan offload support to WCN7850
- monitor mode for WCN7850, better 6 GHz regulatory
- Qualcomm (ath11k):
- restore hibernation support
- MediaTek (mt76):
- WiFi-7 improvements
- implement support for mt7990
- Intel (iwlwifi):
- enhanced multi-link single-radio (EMLSR) support on 5 GHz links
- rework device configuration
- RealTek (rtw88):
- improve throughput for RTL8814AU
- RealTek (rtw89):
- add multi-link operation support
- STA/P2P concurrency improvements
- support different SAR configs by antenna
- Bluetooth:
- introduce HCI Driver protocol
- btintel_pcie: do not generate coredump for diagnostic events
- btusb: add HCI Drv commands for configuring altsetting
- btusb: add RTL8851BE device 0x0bda:0xb850
- btusb: add new VID/PID 13d3/3584 for MT7922
- btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
- btnxpuart: implement host-wakeup feature"
* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
selftests/bpf: Fix bpf selftest build warning
selftests: netfilter: Fix skip of wildcard interface test
net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
net: openvswitch: Fix the dead loop of MPLS parse
calipso: Don't call calipso functions for AF_INET sk.
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
octeontx2-pf: QOS: Perform cache sync on send queue teardown
net: mana: Add support for Multi Vports on Bare metal
net: devmem: ncdevmem: remove unused variable
net: devmem: ksft: upgrade rx test to send 1K data
net: devmem: ksft: add 5 tuple FS support
net: devmem: ksft: add exit_wait to make rx test pass
net: devmem: ksft: add ipv4 support
net: devmem: preserve sockc_err
page_pool: fix ugly page_pool formatting
net: devmem: move list_add to net_devmem_bind_dmabuf.
selftests: netfilter: nft_queue.sh: include file transfer duration in log message
net: phy: mscc: Fix memory leak when using one step timestamping
...
|
|
On NFS4ERR_DELAY nfs slient updates its stats, but misses for
flexfiles v4.1 DSes.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Instead of calling xchg() and unrcu_pointer() before
nfsd_file_put_local(), we now pass pointer to the __rcu pointer and call
xchg() and unrcu_pointer() inside that function.
Where unrcu_pointer() is currently called the internals of "struct
nfsd_file" are not known and that causes older compilers such as gcc-8
to complain.
In some cases we have a __kernel (aka normal) pointer not an __rcu
pointer so we need to cast it to __rcu first. This is strictly a
weakening so no information is lost. Somewhat surprisingly, this cast
is accepted by gcc-8.
This has the pleasing result that the cmpxchg() which sets ro_file and
rw_file, and also the xchg() which clears them, are both now in the nfsd
code.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
nfs_uuid_put() and nfs_close_local_fh() can race if a "struct
nfs_file_localio" is released at the same time that nfsd calls
nfs_localio_invalidate_clients().
It is important that neither of these functions completes after the
other has started looking at a given nfs_file_localio and before it
finishes.
If nfs_uuid_put() exits while nfs_close_local_fh() is closing ro_file
and rw_file it could return to __nfd_file_cache_purge() while some files
are still referenced so the purge may not succeed.
If nfs_close_local_fh() exits while nfsd_uuid_put() is still closing the
files then the "struct nfs_file_localio" could be freed while
nfsd_uuid_put() is still looking at it. This side is currently handled
by copying the pointers out of ro_file and rw_file before deleting from
the list in nfsd_uuid. We need to preserve this while ensuring that
nfsd_uuid_put() does wait for nfs_close_local_fh().
This patch use nfl->uuid and nfl->list to provide the required
interlock.
nfs_uuid_put() removes the nfs_file_localio from the list, then drops
locks and puts the two files, then reclaims the spinlock and sets
->nfs_uuid to NULL.
nfs_close_local_fh() operates in the reverse order, setting ->nfs_uuid
to NULL, then closing the files, then unlinking from the list.
If nfs_uuid_put() finds that ->nfs_uuid is already NULL, it waits for
the nfs_file_localio to be removed from the list. If
nfs_close_local_fh() find that it has already been unlinked it waits for
->nfs_uuid to become NULL. This ensure that one of the two tries to
close the files, but they each waits for the other.
As nfs_uuid_put() is making the list empty, change from a
list_for_each_safe loop to a while that always takes the first entry.
This makes the intent more clear.
Also don't move the list to a temporary local list as this would defeat
the guarantees required for the interlock.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
nfs_close_local_fh() is called from two different places for quite
different use case.
It is called from nfs_uuid_put() when the nfs_uuid is being detached -
possibly because the nfs server is not longer serving that filesystem.
In this case there will always be an nfs_uuid and so rcu_read_lock() is
not needed.
It is also called when the nfs_file_localio is no longer needed. In
this case there may not be an active nfs_uuid.
These two can race, and handling the race properly while avoiding
excessive locking will require different handling on each side.
This patch prepares the way by opencoding nfs_close_local_fh() into
nfs_uuid_put(), then simplifying the code there as befits the context.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
The nfsd_localio_operations structure contains nfsd_file_get() to get a
reference to an nfsd_file. This is only used in one place, where
nfsd_open_local_fh() is also used.
This patch combines the two, calling nfsd_open_local_fh() passing a
pointer to where the nfsd_file pointer might be stored. If there is a
pointer there an nfsd_file_get() can get a reference, that reference is
returned. If not a new nfsd_file is acquired, stored at the pointer,
and returned. When we store a reference we also increase the refcount
on the net, as that refcount is decrements when we clear the stored
pointer.
We now get an extra reference *before* storing the new nfsd_file at the
given location. This avoids possible races with the nfsd_file being
freed before the final reference can be taken.
This patch moves the rcu_dereference() needed after fetching from
ro_file or rw_file into the nfsd code where the 'struct nfs_file' is
fully defined. This avoids an error reported by older versions of gcc
such as gcc-8 which complain about rcu_dereference() use in contexts
where the structure (which will supposedly be accessed) is not fully
defined.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Having separate nfsd_file_put and nfsd_file_put_local in struct
nfsd_localio_operations doesn't make much sense. The difference is that
nfsd_file_put doesn't drop a reference to the nfs_net which is what
keeps nfsd from shutting down.
Currently, if nfsd tries to shutdown it will invalidate the files stored
in the list from the nfs_uuid and this will drop all references to the
nfsd net that the client holds. But the client could still hold some
references to nfsd_files for active IO. So nfsd might think is has
completely shut down local IO, but hasn't and has no way to wait for
those active IO requests to complete.
So this patch changes nfsd_file_get to nfsd_file_get_local and has it
increase the ref count on the nfsd net and it replaces all calls to
->nfsd_put_file to ->nfsd_put_file_local.
It also changes ->nfsd_open_local_fh to return with the refcount on the
net elevated precisely when a valid nfsd_file is returned.
This means that whenever the client holds a valid nfsd_file, there will
be an associated count on the nfsd net, and so the count can only reach
zero when all nfsd_files have been returned.
nfs_local_file_put() is changed to call nfs_to_nfsd_file_put_local()
instead of replacing calls to one with calls to the other because this
will help a later patch which changes nfs_to_nfsd_file_put_local() to
take an __rcu pointer while nfs_local_file_put() doesn't.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Rather than using nfs_uuid.lock to protect installing
a new ro_file or rw_file, change to use cmpxchg().
Removing the file already uses xchg() so this improves symmetry
and also makes the code a little simpler.
Also remove the optimisation of not taking the lock, and not removing
the nfs_file_localio from the linked list, when both ->ro_file and
->rw_file are already NULL. Given that ->nfs_uuid was not NULL, it is
extremely unlikely that neither ->ro_file or ->rw_file is NULL so
this optimisation can be of little value and it complicates
understanding of the code - why can the list_del_init() be skipped?
Finally, move the assignment of NULL to ->nfs_uuid until after
the last action on the nfs_file_localio (the list_del_init). As soon as
this is NULL a racing nfs_close_local_fh() can bypass all the locking
and go on to free the nfs_file_localio, so we must be certain to be
finished with it first.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
A recent commit introduced nfs4_do_mkdir() which reports an error from
nfs4_call_sync() by returning it with ERR_PTR().
This is a problem as nfs4_call_sync() can return negative NFS-specific
errors with values larger than MAX_ERRNO (4095). One example is
NFS4ERR_DELAY which has value 10008.
This "pointer" gets to PTR_ERR_OR_ZERO() in nfs4_proc_mkdir() which
chooses ZERO because it isn't in the range of value errors. Ultimately
the pointer is dereferenced.
This patch changes nfs4_do_mkdir() to report the dentry pointer and
status separately - pointer as a return value, status in an "int *"
parameter.
The same separation is used for _nfs4_proc_mkdir() and the two are
combined only in nfs4_proc_mkdir() after the status has passed through
nfs4_handle_exception(), which ensures the error code does not exceed
MAX_ERRNO.
It also fixes a problem in the even when nfs4_handle_exception() updated
the error value, the original 'alias' was still returned.
Reported-by: Anna Schumaker <anna@kernel.org>
Fixes: 8376583b84a1 ("nfs: change mkdir inode_operation to return alternate dentry if needed.")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
In some scenarios, when mounting NFS, more than one superblock may be
created. The final superblock used is the last one created, but only the
first superblock carries the ro flag passed from user space. If a ro flag
is added to the superblock via remount, it will trigger the issue
described in Link[1].
Link[2] attempted to address this by marking the superblock as ro during
the initial mount. However, this introduced a new problem in scenarios
where multiple mount points share the same superblock:
[root@a ~]# mount /dev/sdb /mnt/sdb
[root@a ~]# echo "/mnt/sdb *(rw,no_root_squash)" > /etc/exports
[root@a ~]# echo "/mnt/sdb/test_dir2 *(ro,no_root_squash)" >> /etc/exports
[root@a ~]# systemctl restart nfs-server
[root@a ~]# mount -t nfs -o rw 127.0.0.1:/mnt/sdb/test_dir1 /mnt/test_mp1
[root@a ~]# mount | grep nfs4
127.0.0.1:/mnt/sdb/test_dir1 on /mnt/test_mp1 type nfs4 (rw,relatime,...
[root@a ~]# mount -t nfs -o ro 127.0.0.1:/mnt/sdb/test_dir2 /mnt/test_mp2
[root@a ~]# mount | grep nfs4
127.0.0.1:/mnt/sdb/test_dir1 on /mnt/test_mp1 type nfs4 (ro,relatime,...
127.0.0.1:/mnt/sdb/test_dir2 on /mnt/test_mp2 type nfs4 (ro,relatime,...
[root@a ~]#
When mounting the second NFS, the shared superblock is marked as ro,
causing the previous NFS mount to become read-only.
To resolve both issues, the ro flag is no longer applied to the superblock
during remount. Instead, the ro flag on the mount is used to control
whether the mount point is read-only.
Fixes: 281cad46b34d ("NFS: Create a submount rpc_op")
Link[1]: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/
Link[2]: https://lore.kernel.org/all/20241130035818.1459775-1-lilingfeng3@huawei.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
As described in the link, commit 52cb7f8f1778 ("nfs: ignore SB_RDONLY when
mounting nfs") removed the check for the ro flag when determining whether
to share the superblock, which caused issues when mounting different
subdirectories under the same export directory via NFSv3. However, this
change did not affect NFSv4.
For NFSv3:
1) A single superblock is created for the initial mount.
2) When mounted read-only, this superblock carries the SB_RDONLY flag.
3) Before commit 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs"):
Subsequent rw mounts would not share the existing ro superblock due to
flag mismatch, creating a new superblock without SB_RDONLY.
After the commit:
The SB_RDONLY flag is ignored during superblock comparison, and this leads
to sharing the existing superblock even for rw mounts.
Ultimately results in write operations being rejected at the VFS layer.
For NFSv4:
1) Multiple superblocks are created and the last one will be kept.
2) The actually used superblock for ro mounts doesn't carry SB_RDONLY flag.
Therefore, commit 52cb7f8f1778 doesn't affect NFSv4 mounts.
Clear SB_RDONLY before getting superblock when NFS_MOUNT_UNSHARED is not
set to fix it.
Fixes: 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs")
Closes: https://lore.kernel.org/all/12d7ea53-1202-4e21-a7ef-431c94758ce5@app.fastmail.com/T/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
It was reported that NFS client mounts of AWS Elastic File System
(EFS) volumes is slow, this is because the AWS firewall disallows
LOCALIO (because it doesn't consider the use of NFS_LOCALIO_PROGRAM
valid), see: https://bugzilla.redhat.com/show_bug.cgi?id=2335129
Switch to performing the LOCALIO probe asynchronously to address the
potential for the NFS LOCALIO protocol being disallowed and/or slowed
by the remote server's response.
While at it, fix nfs_local_probe_async() to always take/put a
reference on the nfs_client that is using the LOCALIO protocol.
Also, unexport the nfs_local_probe() symbol and make it private to
fs/nfs/localio.c
This change has the side-effect of initially issuing reads, writes and
commits over the wire via SUNRPC until the LOCALIO probe completes.
Suggested-by: Jeff Layton <jlayton@kernel.org> # to always probe async
Fixes: 76d4cb6345da ("nfs: probe for LOCALIO when v4 client reconnects to server")
Cc: stable@vger.kernel.org # 6.14+
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Implementation follows bones of the pattern that was established in
commit a35518cae4b325 ("NFSv4.1/pnfs: fix NFS with TLS in pnfs").
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
The Linux NFS client and server added support for LOCALIO in Linux
v6.12. It is useful to know if a client and server negotiated LOCALIO
be used, so expose it through the 'localio' attribute.
Suggested-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Stop using write_cache_pages and use writeback_iter directly. This
removes an indirect call per written folio and makes the code easier
to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Use early returns wherever possible to simplify the code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
nfs_do_writepage is a successful return that requires the caller to
unlock the folio. Using it here requires special casing both in
nfs_do_writepage and nfs_writepages_callback and leaves a land mine in
nfs_wb_folio in case it ever set the flag. Remove it and just
unconditionally unlock in nfs_writepages_callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Fold nfs_page_async_flush into its only caller to clean up the code a
bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
fattr4_numlinks is a recommended attribute, so the client should emulate
it even if the server doesn't support it. In decode_attr_nlink function
in nfs4xdr.c, nlink is initialized to 1. However, this default value
isn't set to the inode due to the check in nfs_fhget.
So if the server doesn't support numlinks, inode's nlink will be zero,
the mount will fail with error "Stale file handle". Set the nlink to 1
if the server doesn't support it.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
The NFS client's list of delegations can grow quite large (well beyond the
delegation watermark) if the server is revoking or there are repeated
events that expire state. Once this happens, the revoked delegations can
cause a performance problem for subsequent walks of the
servers->delegations list when the client tries to test and free state.
If we can determine that the FREE_STATEID operation has completed without
error, we can prune the delegation from the list.
Since the NFS client combines TEST_STATEID with FREE_STATEID in its minor
version operations, there isn't an easy way to communicate success of
FREE_STATEID. Rather than re-arrange quite a number of calling paths to
break out the separate procedures, let's signal the success of FREE_STATEID
by setting the stateid's type.
Set NFS4_FREED_STATEID_TYPE for stateids that have been successfully
discarded from the server, and use that type to signal that the delegation
can be cleaned up.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
fattr4_open_arguments is a v4.2 recommended attribute, so we shouldn't
be sending it to v4.1 servers.
Fixes: cb78f9b7d0c0 ("nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Currently, when NFS is queried for all the labels present on the
file via a command example "getfattr -d -m . /mnt/testfile", it
does not return the security label. Yet when asked specifically for
the label (getfattr -n security.selinux) it will be returned.
Include the security label when all attributes are queried.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
delegated
nfs_setattr will flush all pending writes before updating a file time
attributes. However when the client holds delegated timestamps, it can
update its timestamps locally as it is the authority for the file
times attributes. The client will later set the file attributes by
adding a setattr to the delegreturn compound updating the server time
attributes.
Fix nfs_setattr to avoid flushing pending writes when the file time
attributes are delegated and the mtime/atime are set to a fixed
timestamp (ATTR_[MODIFY|ACCESS]_SET. Also, when sending the setattr
procedure over the wire, we need to clear the correct attribute bits
from the bitmask.
I was able to measure a noticable speedup when measuring untar performance.
Test: $ time tar xzf ~/dir.tgz
Baseline: 1m13.072s
Patched: 0m49.038s
Which is more than 30% latency improvement.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
This implements a suggestion from Trond that we can mimic
FALLOC_FL_ZERO_RANGE by sending a compound that first does a DEALLOCATE
to punch a hole in a file, and then an ALLOCATE to fill the hole with
zeroes. There might technically be a race here, but once the DEALLOCATE
finishes any reads from the region would return zeroes anyway, so I
don't expect it to cause problems.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Sometimes, when a file was read while it was being truncated by
another NFS client, the kernel could deadlock because folio_unlock()
was called twice, and the second call would XOR back the `PG_locked`
flag.
Most of the time (depending on the timing of the truncation), nobody
notices the problem because folio_unlock() gets called three times,
which flips `PG_locked` back off:
1. vfs_read, nfs_read_folio, ... nfs_read_add_folio,
nfs_return_empty_folio
2. vfs_read, nfs_read_folio, ... netfs_read_collection,
netfs_unlock_abandoned_read_pages
3. vfs_read, ... nfs_do_read_folio, nfs_read_add_folio,
nfs_return_empty_folio
The problem is that nfs_read_add_folio() is not supposed to unlock the
folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is
missing in nfs_return_empty_folio().
Rarely this leads to a warning in netfs_read_collection():
------------[ cut here ]------------
R=0000031c: folio 10 is not locked
WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00
[...]
Workqueue: events_unbound netfs_read_collection_worker
RIP: 0010:netfs_read_collection+0x7c0/0xf00
[...]
Call Trace:
<TASK>
netfs_read_collection_worker+0x67/0x80
process_one_work+0x12e/0x2c0
worker_thread+0x295/0x3a0
Most of the time, however, processes just get stuck forever in
folio_wait_bit_common(), waiting for `PG_locked` to disappear, which
never happens because nobody is really holding the folio lock.
Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Pull jfs updates from David Kleikamp:
"A few small fixes for jfs"
* tag 'jfs-6.16' of github.com:kleikamp/linux-shaggy:
jfs: fix array-index-out-of-bounds read in add_missing_indices
jfs: Fix null-ptr-deref in jfs_ioc_trim
jfs: validate AG parameters in dbMount() to prevent crashes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This fixes delays when shutting down SCTP connections, and updates dlm
Kconfig for SCTP"
* tag 'dlm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: drop SCTP Kconfig dependency
dlm: reject SCTP configuration if not enabled
dlm: use SHUT_RDWR for SCTP shutdown
dlm: mask sk_shutdown value
|
|
Pull nfsd updates from Chuck Lever:
"The marquee feature for this release is that the limit on the maximum
rsize and wsize has been raised to 4MB. The default remains at 1MB,
but risk-seeking administrators now have the ability to try larger I/O
sizes with NFS clients that support them. Eventually the default
setting will be increased when we have confidence that this change
will not have negative impact.
With v6.16, NFSD now has its own debugfs file system where we can add
experimental features and make them available outside of our
development community without impacting production deployments. The
first experimental setting added is one that makes all NFS READ
operations use vfs_iter_read() instead of the NFSD splice actor. The
plan is to eventually retire the splice actor, as that will enable a
number of new capabilities such as the use of struct bio_vec from the
top to the bottom of the NFSD stack.
Jeff Layton contributed a number of observability improvements. The
use of dprintk() in a number of high-traffic code paths has been
replaced with static trace points.
This release sees the continuation of efforts to harden the NFSv4.2
COPY operation. Soon, the restriction on async COPY operations can be
lifted.
Many thanks to the contributors, reviewers, testers, and bug reporters
who participated during the v6.16 development cycle"
* tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits)
xdrgen: Fix code generated for counted arrays
SUNRPC: Bump the maximum payload size for the server
NFSD: Add a "default" block size
NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro
NFSD: Remove NFSD_BUFSIZE
sunrpc: Remove the RPCSVC_MAXPAGES macro
svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
sunrpc: Adjust size of socket's receive page array dynamically
SUNRPC: Remove svc_rqst :: rq_vec
SUNRPC: Remove svc_fill_write_vector()
NFSD: Use rqstp->rq_bvec in nfsd_iter_write()
SUNRPC: Export xdr_buf_to_bvec()
NFSD: De-duplicate the svc_fill_write_vector() call sites
NFSD: Use rqstp->rq_bvec in nfsd_iter_read()
sunrpc: Replace the rq_bvec array with dynamically-allocated memory
sunrpc: Replace the rq_pages array with dynamically-allocated memory
sunrpc: Remove backchannel check in svc_init_buffer()
sunrpc: Add a helper to derive maxpages from sv_max_mesg
svcrdma: Reduce the number of rdma_rw contexts per-QP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"New ext4 features and performance improvements:
- Fast commit performance improvements
- Multi-fsblock atomic write support for bigalloc file systems
- Large folio support for regular files
This last can result in really stupendous performance for the right
workloads. For example, see [1] where the Kernel Test Robot reported
over 37% improvement on a large sequential I/O workload.
There are also the usual bug fixes and cleanups. Of note are cleanups
of the extent status tree to fix potential races that could result in
the extent status tree getting corrupted under heavy simultaneous
allocation and deallocation to a single file"
Link: https://lore.kernel.org/all/202505161418.ec0d753f-lkp@intel.com/ [1]
* tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead
ext4: Simplify flags in ext4_map_query_blocks()
ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER
ext4: Simplify last in leaf check in ext4_map_query_blocks
ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE
ext4: only dirty folios when data journaling regular files
ext4: Add atomic block write documentation
ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc
ext4: Add multi-fsblock atomic write support with bigalloc
ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS
ext4: Make ext4_meta_trans_blocks() non-static for later use
ext4: Check if inode uses extents in ext4_inode_can_atomic_write()
ext4: Document an edge case for overwrites
jbd2: remove journal_t argument from jbd2_superblock_csum()
jbd2: remove journal_t argument from jbd2_chksum()
ext4: remove sb argument from ext4_superblock_csum()
ext4: remove sbi argument from ext4_chksum()
ext4: enable large folio for regular file
ext4: make online defragmentation support large folios
ext4: make the writeback path support large folios
...
|
|
https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs updates from Konstantin Komarov:
"Added:
- missing direct_IO in ntfs_aops_cmpr
- handling of hdr_first_de() return value
Fixed:
- handling of InitializeFileRecordSegment operation.
Removed:
- ability to change compression on mounted volume
- redundant NULL check"
* tag 'ntfs3_for_6.16' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: remove ability to change compression on mounted volume
fs/ntfs3: Fix handling of InitializeFileRecordSegment
fs/ntfs3: Add missing direct_IO in ntfs_aops_cmpr
fs/ntfs3: handle hdr_first_de() return value
fs/ntfs3: Drop redundant NULL check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
"Convert to use the new mount API.
Code from Eric Sandeen at redhat that converts orangefs over to the
new mount API"
* tag 'for-linus-6.16-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Convert to use the new mount API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Fix xfstests generic/482 test failure
- Fix double free in delayed_free
* tag 'exfat-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: do not clear volume dirty flag during sync
exfat: fix double free in delayed_free
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A fixup to the xarray conversion sent in the main 6.16 batch. It was
not included because it would cause rebase/refresh of like 80 patches,
right before sending the early pull request last week.
It's fixing a bug when zoned mode is enabled on btrfs so it's not
affecting most people"
* tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails
|
|
SMB2_QFS_info() has been unused since 2018's
commit 730928c8f4be ("cifs: update smb2_queryfs() to use compounding")
sign_CIFS_PDUs has been unused since 2009's
commit 2edd6c5b0517 ("[CIFS] NTLMSSP support moving into new file, old dead
code removed")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Should be "old_dir" here.
Fixes: 5c57132eaf52 ("f2fs: support project quota")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
no logic changes.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Syzbot reports a f2fs bug as below:
INFO: task syz-executor328:5856 blocked for more than 144 seconds.
Not tainted 6.15.0-rc6-syzkaller-00208-g3c21441eeffc #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor328 state:D stack:24392 pid:5856 tgid:5832 ppid:5826 task_flags:0x400040 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x168f/0x4c70 kernel/sched/core.c:6767
__schedule_loop kernel/sched/core.c:6845 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6860
io_schedule+0x81/0xe0 kernel/sched/core.c:7742
f2fs_balance_fs+0x4b4/0x780 fs/f2fs/segment.c:444
f2fs_map_blocks+0x3af1/0x43b0 fs/f2fs/data.c:1791
f2fs_expand_inode_data+0x653/0xaf0 fs/f2fs/file.c:1872
f2fs_fallocate+0x4f5/0x990 fs/f2fs/file.c:1975
vfs_fallocate+0x6a0/0x830 fs/open.c:338
ioctl_preallocate fs/ioctl.c:290 [inline]
file_ioctl fs/ioctl.c:-1 [inline]
do_vfs_ioctl+0x1b8f/0x1eb0 fs/ioctl.c:885
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The root cause is after commit 84b5bb8bf0f6 ("f2fs: modify
f2fs_is_checkpoint_ready logic to allow more data to be written with the
CP disable"), we will get chance to allow f2fs_is_checkpoint_ready() to
return true once below conditions are all true:
1. checkpoint is disabled
2. there are not enough free segments
3. there are enough free blocks
Then it will cause f2fs_balance_fs() to trigger foreground GC.
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
...
if (!f2fs_is_checkpoint_ready(sbi))
return;
And the testcase mounts f2fs image w/ gc_merge,checkpoint=disable, so deadloop
will happen through below race condition:
- f2fs_do_shutdown - vfs_fallocate - gc_thread_func
- file_start_write
- __sb_start_write(SB_FREEZE_WRITE)
- f2fs_fallocate
- f2fs_expand_inode_data
- f2fs_map_blocks
- f2fs_balance_fs
- prepare_to_wait
- wake_up(gc_wait_queue_head)
- io_schedule
- bdev_freeze
- freeze_super
- sb->s_writers.frozen = SB_FREEZE_WRITE;
- sb_wait_write(sb, SB_FREEZE_WRITE);
- if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) continue;
: cause deadloop
This patch fix to add check condition in f2fs_balance_fs(), so that if
checkpoint is disabled, we will just skip trigger foreground GC to
avoid such deadloop issue.
Meanwhile let's remove f2fs_is_checkpoint_ready() check condition in
f2fs_balance_fs(), since it's redundant, due to the main logic in the
function is to check:
a) whether checkpoint is disabled
b) there is enough free segments
f2fs_balance_fs() still has all logics after f2fs_is_checkpoint_ready()'s
removal.
Reported-by: syzbot+aa5bb5f6860e08a60450@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/682d743a.a00a0220.29bc26.0289.GAE@google.com
Fixes: 84b5bb8bf0f6 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable")
Cc: Qi Han <hanqi@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Check bi_status w/ BLK_STS_OK instead of 0 for cleanup.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
when performing buffered writes in a large section,
overhead is incurred due to the iteration through
ckpt_valid_blocks within the section.
when SEGS_PER_SEC is 128, this overhead accounts for 20% within
the f2fs_write_single_data_page routine.
as the size of the section increases, the overhead also grows.
to handle this problem ckpt_valid_blocks is
added within the section entries.
Test
insmod null_blk.ko nr_devices=1 completion_nsec=1 submit_queues=8
hw_queue_depth=64 max_sectors=512 bs=4096 memory_backed=1
make_f2fs /dev/block/nullb0
make_f2fs -s 128 /dev/block/nullb0
fio --bs=512k --size=1536M --rw=write --name=1
--filename=/mnt/test_dir/seq_write
--ioengine=io_uring --iodepth=64 --end_fsync=1
before
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2145MiB/s
after
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2556MiB/s
Signed-off-by: yohan.joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
segment in LFS mode.
In LFS mode, the previous segment cannot use invalid blocks,
so the remaining blocks from the next_blkoff of the current segment
to the end of the section are calculated.
Signed-off-by: yohan.joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
DONTCACHE I/O must have the completion punted to a workqueue, just like
what is done for unwritten extents, as the completion needs task context
to perform the invalidation of the folio(s). However, if writeback is
started off filemap_fdatawrite_range() off generic_sync() and it's an
overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
don't look at the folio being added and no further state is passed down
to help it know that this is a dropbehind/DONTCACHE write.
Check if the folio being added is marked as dropbehind, and set
IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
the decision making of completion context in xfs_submit_ioend().
Additionally include this ioend flag in the NOMERGE flags, to avoid
mixing it with unrelated IO.
Since this is the 3rd flag that will cause XFS to punt the completion to
a workqueue, add a helper so that each one of them can get appropriately
commented.
This fixes extra page cache being instantiated when the write performed
is an overwrite, rather than newly instantiated blocks.
Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/5153f6e8-274d-4546-bf55-30a5018e0d03@kernel.dk
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The commit 93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage
to BIO") removed caching of compressed blocks in SquashFS, causing fio
performance regression in workloads with repeated file reads. Without
caching, every read triggers disk I/O, severely impacting performance in
tools like fio.
This patch introduces a new CONFIG_SQUASHFS_COMP_CACHE_FULL Kconfig option
to enable caching of all compressed blocks, restoring performance to
pre-BIO migration levels. When enabled, all pages in a BIO are cached in
the page cache, reducing disk I/O for repeated reads. The fio test
results with this patch confirm the performance restoration:
For example, fio tests (iodepth=1, numjobs=1,
ioengine=psync) show a notable performance restoration:
Disable CONFIG_SQUASHFS_COMP_CACHE_FULL:
IOPS=815, BW=102MiB/s (107MB/s)(6113MiB/60001msec)
Enable CONFIG_SQUASHFS_COMP_CACHE_FULL:
IOPS=2223, BW=278MiB/s (291MB/s)(16.3GiB/59999msec)
The tradeoff is increased memory usage due to caching all compressed
blocks. The CONFIG_SQUASHFS_COMP_CACHE_FULL option allows users to enable
this feature selectively, balancing performance and memory usage for
workloads with frequent repeated reads.
Link: https://lkml.kernel.org/r/20250521072559.2389-1-chanho.min@lge.com
Signed-off-by: Chanho Min <chanho.min@lge.com>
Reviewed-by Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Configfs can be configured as a loadable module, which causes a link-time
failure for dm-crypt crash dump support:
crash_dump_dm_crypt.c:(.text+0x3a4): undefined reference to `config_item_init_type_name'
aarch64-linux-ld: kernel/crash_dump_dm_crypt.o: in function `configfs_dmcrypt_keys_init':
crash_dump_dm_crypt.c:(.init.text+0x90): undefined reference to `config_group_init'
aarch64-linux-ld: crash_dump_dm_crypt.c:(.init.text+0xb4): undefined reference to `configfs_register_subsystem'
aarch64-linux-ld: crash_dump_dm_crypt.c:(.init.text+0xd8): undefined reference to `configfs_unregister_subsystem'
This could be avoided with a dependency on CONFIGFS_FS=y, but the
dependency has an additional problem of causing Kconfig dependency loops
since most other uses select the symbol.
Using a simple 'select CONFIGFS_FS' here in turn fails with
CONFIG_DM_CRYPT=m, because that still only causes configfs to be a
loadable module.
The only version I found that fixes this reliably uses an additional
Kconfig symbol to ensure the 'select' actually turns on configfs as
builtin, with two additional changes to avoid dependency loops with nvme
and sysfs.
There is no compile-time dependency between configfs and sysfs, so
selecting configfs from a driver with sysfs disabled does not cause link
failures, only the default /sys/kernel/config mount point will not be
created.
Link: https://lkml.kernel.org/r/20250521160359.2132363-1-arnd@kernel.org
Fixes: 6b23858fd63b ("crash_dump: make dm crypt keys persist for the kdump kernel")
Fixes: 1fb470408497 ("nvme-loop: add configfs dependency")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|