Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into xfs-6.15-merge
XFS code for 6.15 depends on patches within iomap. Merge them before
pulling in XFS code.
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
We have a central definition for this function since 2023, used by
a number of different parts of the kernel.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
K Prateek Nayak <kprateek.nayak@amd.com> says:
Based on the suggestion on the RFC, the treewide conversion of
references to pipe->{head,tail} from unsigned int to pipe_index_t has
been dropped for now. The series contains trivial cleanup suggested to
limit the nr_slots in pipe_resize_ring() to be covered between
pipe_index_t limits of pipe->{head,tail} and using pipe_buf() to remove
the open-coded usage of masks to access pipe buffer building on Linus'
cleanup of fs/fuse/dev.c in commit ebb0f38bb47f ("fs/pipe: fix pipe
buffer index use in FUSE")
* patches from https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com:
fs/splice: Use pipe_buf() helper to retrieve pipe buffer
fs/pipe: Use pipe_buf() helper to retrieve pipe buffer
kernel/watch_queue: Use pipe_buf() to retrieve the pipe buffer
fs/pipe: Limit the slots in pipe_resize_ring()
Link: https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use pipe_buf() helper to retrieve the pipe buffer throughout the file
replacing the open-coded the logic.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250307052919.34542-5-kprateek.nayak@amd.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use pipe_buf() helper to retrieve the pipe buffer throughout the file
replacing the open-coded the logic.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250307052919.34542-4-kprateek.nayak@amd.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Limit the number of slots in pipe_resize_ring() to the maximum value
representable by pipe->{head,tail}. Values beyond the max limit can
lead to incorrect pipe occupancy related calculations where the pipe
will never appear full.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250307052919.34542-2-kprateek.nayak@amd.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Mainline now contains various changes to pipes that are relevant for
other pipe work this cycle. So merge them into the respective VFS tree.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"33 hotfixes. 24 are cc:stable and the remainder address post-6.13
issues or aren't considered necessary for -stable kernels.
26 are for MM and 7 are for non-MM.
- "mm: memory_failure: unmap poisoned folio during migrate properly"
from Ma Wupeng fixes a couple of two year old bugs involving the
migration of hwpoisoned folios.
- "selftests/damon: three fixes for false results" from SeongJae Park
fixes three one year old bugs in the SAMON selftest code.
The remainder are singletons and doubletons. Please see the individual
changelogs for details"
* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
mm/page_alloc: fix uninitialized variable
rapidio: add check for rio_add_net() in rio_scan_alloc_net()
rapidio: fix an API misues when rio_add_net() fails
MAINTAINERS: .mailmap: update Sumit Garg's email address
Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
mm: fix finish_fault() handling for large folios
mm: don't skip arch_sync_kernel_mappings() in error paths
mm: shmem: remove unnecessary warning in shmem_writepage()
userfaultfd: fix PTE unmapping stack-allocated PTE copies
userfaultfd: do not block on locking a large folio with raised refcount
mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
mm: shmem: fix potential data corruption during shmem swapin
mm: fix kernel BUG when userfaultfd_move encounters swapcache
selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
selftests/damon/damos_quota: make real expectation of quota exceeds
include/linux/log2.h: mark is_power_of_2() with __always_inline
NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
mm, swap: avoid BUG_ON in relocate_cluster()
mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
...
|
|
Introduce a new mount option "nat_bits" to control nat_bits feature,
by default nat_bits feature is disabled.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The function can be replaced by evict_inodes. The only difference is
that evict_inodes() skips the inodes with positive refcount without
touching ->i_lock, but they are equivalent as evict_inodes() repeats the
refcount check after having grabbed ->i_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix below kernel warning:
fs/binfmt_elf_fdpic.c:1024:52: warning: variable 'excess1' set but not
used [-Wunused-but-set-variable]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250308022754.75013-1-sunliming@linux.dev
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull core dumping fix from Kees Cook:
- Only sort VMAs when core_sort_vma sysctl is set
* tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
coredump: Only sort VMAs when core_sort_vma sysctl is set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix leaked extent map after error when reading chunks
- replace use of deprecated strncpy
- in zoned mode, fixed range when ulocking extent range, causing a hang
* tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix a leaked chunk map issue in read_one_chunk()
btrfs: replace deprecated strncpy() with strscpy()
btrfs: zoned: fix extent range end unlock in cow_file_range()
|
|
The commit titled "block/bdev: lift block size restrictions to 64k"
lifted the block layer's max supported block size to 64k inside the
helper blk_validate_block_size() now that we support large folios.
However in lifting the block size we also removed the silly use
cases many filesystems have to use sb_set_blocksize() to *verify*
that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was
used to check the block size <= PAGE_SIZE since historically we've
always supported userspace to create for example 64k block size
filesystems even on 4k page size systems, but what we didn't allow
was mounting them. Older filesystems have been using the check with
sb_set_blocksize() for years.
While, we could argue that such checks should be filesystem specific,
there are much more users of sb_set_blocksize() than LBS enabled
filesystem on upstream, so just do the easier thing and bring back
the PAGE_SIZE check for sb_set_blocksize() users and only skip it
for LBS enabled filesystems.
This will ensure that tests such as generic/466 when run in a loop
against say, ext4, won't try to try to actually mount a filesystem with
a block size larger than your filesystem supports given your PAGE_SIZE
and in the worst case crash.
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As agreed with the fwupd/LVFS maintainer, this reverts commit
fc20737d8b85691ecabab3739ed7d06c9b7bc00f again for the v6.15 cycle,
leaving them sufficient time to roll out a fix for the issue that the
reverted commit works around.
Link: https://lore.kernel.org/all/63837c36eceaf8cf2af7933dccca54ff4dd9f30d.camel@HansenPartnership.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The fix to atomically read the pipe head and tail state when not holding
the pipe mutex has caused a number of headaches due to the size change
of the involved types.
It turns out that we don't have _that_ many places that access these
fields directly and were affected, but we have more than we strictly
should have, because our low-level helper functions have been designed
to have intimate knowledge of how the pipes work.
And as a result, that random noise of direct 'pipe->head' and
'pipe->tail' accesses makes it harder to pinpoint any actual potential
problem spots remaining.
For example, we didn't have a "is the pipe full" helper function, but
instead had a "given these pipe buffer indexes and this pipe size, is
the pipe full". That's because some low-level pipe code does actually
want that much more complicated interface.
But most other places literally just want a "is the pipe full" helper,
and not having it meant that those places ended up being unnecessarily
much too aware of this all.
It would have been much better if only the very core pipe code that
cared had been the one aware of this all.
So let's fix it - better late than never. This just introduces the
trivial wrappers for "is this pipe full or empty" and to get how many
pipe buffers are used, so that instead of writing
if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
the places that literally just want to know if a pipe is full can just
say
if (pipe_is_full(pipe))
instead. The existing trivial cases were converted with a 'sed' script.
This cuts down on the places that access pipe->head and pipe->tail
directly outside of the pipe code (and core splice code) quite a lot.
The splice code in particular still revels in doing the direct low-level
accesses, and the fuse fuse_dev_splice_write() code also seems a bit
unnecessarily eager to go very low-level, but it's at least a bit better
than it used to be.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Fix a compatibility issue: we shouldn't be setting incompat feature
bits unless explicitly requested
- Fix another bug where the journal alloc/resize path could spuriously
fail with -BCH_ERR_open_buckets_empty
- Copygc shouldn't run on read-only devices: fragmentation isn't an
issue if we're not currently writing to a given device, and it may
not have anywhere to move the data to
* tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs:
bcachefs: copygc now skips non-rw devices
bcachefs: Fix bch2_dev_journal_alloc() spuriously failing
bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requested
|
|
There's no point in doing copygc on non-rw devices: the fragmentation
doesn't matter if we're not writing to them, and we may not have
anywhere to put the data on our other devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we fixed journal resize spuriousl failing with
-BCH_ERR_open_buckets_empty, but initial journal allocation was missed
because it didn't invoke the "block on allocator" loop at all.
Factor out the "loop on allocator" code to fix that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
net/ethtool/cabletest.c
2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock")
637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device")
No Adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull smb fixes from Steve French:
"Five SMB server fixes, two related client fixes, and minor MAINTAINERS
update:
- Two SMB3 lock fixes fixes (including use after free and bug on fix)
- Fix to race condition that can happen in processing IPC responses
- Four ACL related fixes: one related to endianness of num_aces, and
two related fixes to the checks for num_aces (for both client and
server), and one fixing missing check for num_subauths which can
cause memory corruption
- And minor update to email addresses in MAINTAINERS file"
* tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd:
cifs: fix incorrect validation for num_aces field of smb_acl
ksmbd: fix incorrect validation for num_aces field of smb_acl
smb: common: change the data type of num_aces to le16
ksmbd: fix bug on trap in smb2_lock
ksmbd: fix use-after-free in smb2_lock
ksmbd: fix type confusion via race condition when using ipc_msg_send_request
ksmbd: fix out-of-bounds in parse_sec_desc()
MAINTAINERS: update email address in cifs and ksmbd entry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Optimize new cluster allocation by correctly find empty entry slot
- Add a check to prevent excessive bitmap clearing due to invalid
data size of file/dir entry
- Fix incorrect error return for zero-byte writes
* tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: add a check for invalid data size
exfat: short-circuit zero-byte writes in exfat_file_write_iter
exfat: fix soft lockup in exfat_clear_bitmap
exfat: fix just enough dentries but allocate a new cluster to dir
|
|
This was another case that Rasmus pointed out where the direct access to
the pipe head and tail pointers broke on 32-bit configurations due to
the type changes.
As with the pipe FIONREAD case, fix it by using the appropriate helper
functions that deal with the right pipe index sizing.
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Link: https://lore.kernel.org/all/878qpi5wz4.fsf@prevas.dk/
Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg >
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rasmus points out that we do indeed have other cases of breakage from
the type changes that were introduced on 32-bit targets in order to read
the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe:
Read pipe->{head,tail} atomically outside pipe->mutex").
Fix it up by using the proper helper functions that now deal with the
pipe buffer index types properly. This makes the code simpler and more
obvious.
The compiler does the CSE and loop hoisting of the pipe ring size
masking that we used to do manually, so open-coding this was never a
good idea.
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Link: https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/
Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ntfs_flush_inodes() was added in 2021 by
commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block")
but has remained unused.
Remove it, and it's helper function.
Note: There is a commented out call to ntfs_flush_inodes in
ntfs_truncate() - I've left that in place.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
ntfs_sb_read() was added in 2021 by
commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
ni_load_attr() was added in 2021 by
commit 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
The "de_off" and "used" variables come from the disk so they both need to
check. The problem is that on 32bit systems if they're both greater than
UINT_MAX - 16 then the check does work as intended because of an integer
overflow.
Fixes: 60ce8dfde035 ("fs/ntfs3: Fix wrong if in hdr_first_de")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
On 32bit systems the "off + sizeof(struct NTFS_DE)" addition can
have an integer wrapping issue. Fix it by using size_add().
Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
Add btrfs_free_chunk_map() to free the memory allocated
by btrfs_alloc_chunk_map() if btrfs_add_chunk_map() fails.
Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps")
CC: stable@vger.kernel.org
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Filesystems like ext4 can submit writes in multiples of blocksizes.
But we still can't allow the writes to be split. Hence let's check if
the iomap_length() is same as iter->len or not.
It is the role of the FS to ensure that a single mapping may be created
for an atomic write. The FS will also continue to check size and alignment
legality.
Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
jpg: Tweak commit message
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-7-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently atomic write support requires dedicated HW support. This imposes
a restriction on the filesystem that disk blocks need to be aligned and
contiguously mapped to FS blocks to issue atomic writes.
XFS has no method to guarantee FS block alignment for regular,
non-RT files. As such, atomic writes are currently limited to 1x FS block
there.
To deal with the scenario that we are issuing an atomic write over
misaligned or discontiguous data blocks - and raise the atomic write size
limit - support a SW-based software emulated atomic write mode. For XFS,
this SW-based atomic writes would use CoW support to issue emulated untorn
writes.
It is the responsibility of the FS to detect discontiguous atomic writes
and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed,
SW-based atomic writes could be used always when the mounted bdev does
not support HW offload, but this strategy is not initially expected to be
used.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In future xfs will support a SW-based atomic write, so rename
IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used.
Also relocate setting of IOMAP_ATOMIC_HW to the write path in
__iomap_dio_rw(), to be clear that this flag is only relevant to writes.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Bring in iomap changes that xfs relies on.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The pages being passed in are always folios (since they come from the
page cache). This eliminates several hidden calls to compound_head(),
and uses of legacy APIs.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-10-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This produces a bvec which is slightly different as the last page is added
in its entirety rather than only the portion which is being written back.
However we don't use this information anywhere; the iovec has its own
length parameter.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-9-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Both arms of this conditional now have the same loop, so sink it out
of the conditional.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-8-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Remove two accesses to page->mapping by passing the mapping from
orangefs_writepages() to orangefs_writepages_callback() and then
orangefs_writepages_work(). That makes it obvious that all folios come
from the same mapping, so we can hoist the call to mapping_set_error()
outside the loop. While I'm here, switch from write_cache_pages()
to writeback_iter() which removes an indirect function call.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-7-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Both callers have a folio, pass it in and use it inside
orangefs_writepage_locked(). Removes a few hidden calls to
compound_head() and accesses to page->mapping.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-6-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If we add a migrate_folio operation, we can remove orangefs_writepage
(as there is already a writepages operation). filemap_migrate_folio()
will do fine as struct orangefs_write_range does not need to be adjusted
when the folio is migrated.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-5-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
sparse currently warns:
fs/orangefs/file.c:119:32: warning: incorrect type in assignment (different base types)
fs/orangefs/file.c:119:32: expected int open_for_write
fs/orangefs/file.c:119:32: got restricted fmode_t
Turning open_for_write and open_for_read into booleans (which is how
they're used) removes this warning.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-4-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Attempting to build orangefs with W=1 currently reports errors like:
In file included from ../fs/orangefs/protocol.h:287,
from ../fs/orangefs/waitqueue.c:16:
../fs/orangefs/orangefs-debug.h:86:18: error: ‘num_kmod_keyword_mask_map’ defined but not used [-Werror=unused-const-variable=]
Move num_kmod_keyword_mask_map, s_kmod_keyword_mask_map and
struct __keyword_mask_s to orangefs-debugfs.c which is the only file
they're used in.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-3-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
'len' is used to store the result of i_size_read(), so making 'len'
a size_t results in truncation to 4GiB on 32-bit systems.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250305204734.1475264-2-willy@infradead.org
Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so
nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback which
recurses back to NFS (which triggers writeback to NFSD via NFS loopback
mount on the same host, NFSD blocks waiting for XFS's call to
__filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org
Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Protect ext4_release_dquot against freezing so that we
don't try to start a transaction when FS is frozen, leading
to warnings.
Further, avoid taking the freeze protection if a transaction
is already running so that we don't need end up in a deadlock
as described in
46e294efc355 ext4: fix deadlock with fs freezing and EA inodes
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241121123855.645335-3-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This bumps failing open rate by 1.7% on Sapphire Rapids by avoiding one
atomic.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250305123644.554845-5-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When tracing a kernel build over refcounts seen this is a wash:
@[kprobe:filp_close]:
[0] 32195 |@@@@@@@@@@ |
[1] 164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
I verified vast majority of the skew comes from do_close_on_exec() which
could be changed to use a different variant instead.
Even without changing that, the 19.5% of calls which got here still can
save the extra atomic. Calls here are borderline non-existent compared
to fput (over 3.2 mln!), so they should not negatively affect
scalability.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250305123644.554845-4-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This bumps open+close rate by 1% on Sapphire Rapids by eliding one
atomic.
It would be higher if it was not for several other slowdowns of the same
nature.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250305123644.554845-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Vast majority of the time closing a file descriptor also operates on the
last reference, where a regular fput usage will result in 2 atomics.
This can be changed to only suffer 1.
See commentary above file_ref_put_close() for more information.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250305123644.554845-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Vast majority of the time the system call returns 0.
Letting the compiler know shortens the routine (119 -> 116) and the fast
path.
Disasm starting at the call to __fput_sync():
before:
<+55>: call 0xffffffff816b0da0 <__fput_sync>
<+60>: lea 0x201(%rbx),%eax
<+66>: cmp $0x1,%eax
<+69>: jbe 0xffffffff816ab707 <__x64_sys_close+103>
<+71>: mov %ebx,%edx
<+73>: movslq %ebx,%rax
<+76>: and $0xfffffffd,%edx
<+79>: cmp $0xfffffdfc,%edx
<+85>: mov $0xfffffffffffffffc,%rdx
<+92>: cmove %rdx,%rax
<+96>: pop %rbx
<+97>: pop %rbp
<+98>: jmp 0xffffffff82242fa0 <__x86_return_thunk>
<+103>: mov $0xfffffffffffffffc,%rax
<+110>: jmp 0xffffffff816ab700 <__x64_sys_close+96>
<+112>: mov $0xfffffffffffffff7,%rax
<+119>: jmp 0xffffffff816ab700 <__x64_sys_close+96>
after:
<+56>: call 0xffffffff816b0da0 <__fput_sync>
<+61>: xor %eax,%eax
<+63>: test %ebp,%ebp
<+65>: jne 0xffffffff816ab6ea <__x64_sys_close+74>
<+67>: pop %rbx
<+68>: pop %rbp
<+69>: jmp 0xffffffff82242fa0 <__x86_return_thunk> # the jmp out
<+74>: lea 0x201(%rbp),%edx
<+80>: mov $0xfffffffffffffffc,%rax
<+87>: cmp $0x1,%edx
<+90>: jbe 0xffffffff816ab6e3 <__x64_sys_close+67>
<+92>: mov %ebp,%edx
<+94>: and $0xfffffffd,%edx
<+97>: cmp $0xfffffdfc,%edx
<+103>: cmovne %rbp,%rax
<+107>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67>
<+109>: mov $0xfffffffffffffff7,%rax
<+116>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250301104356.246031-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|