Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs overlayfs updates from Christian Brauner:
"Currently overlayfs uses the mounter's credentials for its
override_creds() calls. That provides a consistent permission model.
This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will
be mounted in. This provides a consistent and simple security model.
With this it is possible to e.g., mount an overlayfs instance where
the mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
of custom fs{g,u}id different from the callers and other tweaks"
* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/ovl: add third selftest for "override_creds"
selftests/ovl: add second selftest for "override_creds"
selftests/filesystems: add utils.{c,h}
selftests/ovl: add first selftest for "override_creds"
ovl: allow to specify override credentials
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs iomap updates from Christian Brauner:
- Allow the filesystem to submit the writeback bios.
- Allow the filsystem to track completions on a per-bio bases
instead of the entire I/O.
- Change writeback_ops so that ->submit_bio can be done by the
filesystem.
- A new ANON_WRITE flag for writes that don't have a block number
assigned to them at the iomap level leaving the filesystem to do
that work in the submission handler.
- Incremental iterator advance
The folio_batch support for zero range where the filesystem provides
a batch of folios to process that might not be logically continguous
requires more flexibility than the current offset based iteration
currently offers.
Update all iomap operations to advance the iterator within the
operation and thus remove the need to advance from the core iomap
iterator.
- Make buffered writes work with RWF_DONTCACHE
If RWF_DONTCACHE is set for a write, mark the folios being written as
uncached. On writeback completion the pages will be dropped.
- Introduce infrastructure for large atomic writes
This will eventually be used by xfs and ext4.
* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
iomap: rework IOMAP atomic flags
iomap: comment on atomic write checks in iomap_dio_bio_iter()
iomap: inline iomap_dio_bio_opflags()
iomap: fix inline data on buffered read
iomap: Lift blocksize restriction on atomic writes
iomap: Support SW-based atomic writes
iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
xfs: flag as supporting FOP_DONTCACHE
iomap: make buffered writes work with RWF_DONTCACHE
iomap: introduce a full map advance helper
iomap: rename iomap_iter processed field to status
iomap: remove unnecessary advance from iomap_iter()
dax: advance the iomap_iter on pte and pmd faults
dax: advance the iomap_iter on dedupe range
dax: advance the iomap_iter on unshare range
dax: advance the iomap_iter on zero range
dax: push advance down into dax_iomap_iter() for read and write
dax: advance the iomap_iter in the read/write path
iomap: convert misc simple ops to incremental advance
iomap: advance the iter on direct I/O
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Add CONFIG_DEBUG_VFS infrastucture:
- Catch invalid modes in open
- Use the new debug macros in inode_set_cached_link()
- Use debug-only asserts around fd allocation and install
- Place f_ref to 3rd cache line in struct file to resolve false
sharing
Cleanups:
- Start using anon_inode_getfile_fmode() helper in various places
- Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
f_pos_lock
- Add unlikely() to kcmp()
- Remove legacy ->remount_fs method from ecryptfs after port to the
new mount api
- Remove invalidate_inodes() in favour of evict_inodes()
- Simplify ep_busy_loopER by removing unused argument
- Avoid mmap sem relocks when coredumping with many missing pages
- Inline getname()
- Inline new_inode_pseudo() and de-staticize alloc_inode()
- Dodge an atomic in putname if ref == 1
- Consistently deref the files table with rcu_dereference_raw()
- Dedup handling of struct filename init and refcounts bumps
- Use wq_has_sleeper() in end_dir_add()
- Drop the lock trip around I_NEW wake up in evict()
- Load the ->i_sb pointer once in inode_sb_list_{add,del}
- Predict not reaching the limit in alloc_empty_file()
- Tidy up do_sys_openat2() with likely/unlikely
- Call inode_sb_list_add() outside of inode hash lock
- Sort out fd allocation vs dup2 race commentary
- Turn page_offset() into a wrapper around folio_pos()
- Remove locking in exportfs around ->get_parent() call
- try_lookup_one_len() does not need any locks in autofs
- Fix return type of several functions from long to int in open
- Fix return type of several functions from long to int in ioctls
Fixes:
- Fix watch queue accounting mismatch"
* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
fs: sort out fd allocation vs dup2 race commentary, take 2
fs: call inode_sb_list_add() outside of inode hash lock
fs: tidy up do_sys_openat2() with likely/unlikely
fs: predict not reaching the limit in alloc_empty_file()
fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
fs: drop the lock trip around I_NEW wake up in evict()
fs: use wq_has_sleeper() in end_dir_add()
VFS/autofs: try_lookup_one_len() does not need any locks
fs: dedup handling of struct filename init and refcounts bumps
fs: consistently deref the files table with rcu_dereference_raw()
exportfs: remove locking around ->get_parent() call.
fs: use debug-only asserts around fd allocation and install
fs: dodge an atomic in putname if ref == 1
vfs: Remove invalidate_inodes()
ecryptfs: remove NULL remount_fs from super_operations
watch_queue: fix pipe accounting mismatch
fs: place f_ref to 3rd cache line in struct file to resolve false sharing
epoll: simplify ep_busy_loop by removing always 0 argument
fs: Turn page_offset() into a wrapper around folio_pos()
kcmp: improve performance adding an unlikely hint to task comparisons
...
|
|
In b529c06f9dc7 (Update the documentation referencing Plan 9 from User
Space., 2020-04-26), another instance of the link was left unfixed.
Fix that as well.
Signed-off-by: Tuomas Ahola <taahol@utu.fi>
Message-ID: <20250322153639.4917-1-taahol@utu.fi>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
Commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") added a
entry to meminfo but did not document it in the proc.rst file.
This counter tracks the amount of "Unaccepted" guest memory for some
Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP.
Add the missing entry in the documentation.
Link: https://lkml.kernel.org/r/20250317230403.79632-1-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "track memory used by balloon drivers", v2.
This series introduces a way to track memory used by balloon drivers.
Add a NR_BALLOON_PAGES counter to track how many pages are reclaimed by
the balloon drivers. First add the accounting, then updates the balloon
drivers (virtio, Hyper-V, VMware, Pseries-cmm, and Xen) to maintain this
counter. The virtio, Vmware, and pseries-cmm balloon drivers utilize the
balloon_compaction interface to allocate and free balloon pages. Other
balloon drivers will have to maintain this counter manually.
This makes the information visible in memory reporting interfaces like
/proc/meminfo, show_mem, and OOM reporting.
This provides admins visibility into their VM balloon sizes without
requiring different virtualization tooling. Furthermore, this information
is helpful when debugging an OOM inside a VM.
This patch (of 4):
Add NR_BALLOON_PAGES counter to track memory used by balloon drivers and
expose it through /proc/meminfo and other memory reporting interfaces.
[npache@redhat.com: document Balloon Meminfo entry]
Link: https://lkml.kernel.org/r/a0315ccf-f244-460e-8643-fd7388724fe5@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-1-npache@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-2-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juegren Gross <jgross@suse.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
is that the FS ->iomap_begin callback could check if this flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can set
which ->iomap_begin callback it wants when deciding to do a FS-based
atomic write.
Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
the block driver can use SW-methods to emulate an atomic write. So change
back to IOMAP_ATOMIC.
The ->iomap_begin callback needs though to indicate to iomap core that
REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.
These changes were suggested by Christoph Hellwig and Dave Chinner.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
(CONFIG_NO_PAGE_MAPCOUNT)
Everything is in place to stop using the per-page mapcounts in large
folios: the mapcount of tail pages will always be logically 0 (-1 value),
just like it currently is for hugetlb folios already, and the page
mapcount of the head page is either 0 (-1 value) or contains a page type
(e.g., hugetlb).
Maintaining _nr_pages_mapped without per-page mapcounts is impossible, so
that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.
There are two remaining implications:
(1) Per-node, per-cgroup and per-lruvec stats of "NR_ANON_MAPPED"
("mapped anonymous memory") and "NR_FILE_MAPPED"
("mapped file memory"):
As soon as any page of the folio is mapped -- folio_mapped() -- we
now account the complete folio as mapped. Once the last page is
unmapped -- !folio_mapped() -- we account the complete folio as
unmapped.
This implies that ...
* "AnonPages" and "Mapped" in /proc/meminfo and
/sys/devices/system/node/*/meminfo
* cgroup v2: "anon" and "file_mapped" in "memory.stat" and
"memory.numa_stat"
* cgroup v1: "rss" and "mapped_file" in "memory.stat" and
"memory.numa_stat
... can now appear higher than before. But note that these folios do
consume that memory, simply not all pages are actually currently
mapped.
It's worth nothing that other accounting in the kernel (esp. cgroup
charging on allocation) is not affected by this change.
[why oh why is "anon" called "rss" in cgroup v1]
(2) Detecting partial mappings
Detecting whether anon THPs are partially mapped gets a bit more
unreliable. As long as a single MM maps such a large folio
("exclusively mapped"), we can reliably detect it. Especially before
fork() / after a short-lived child process quit, we will detect
partial mappings reliably, which is the common case.
In essence, if the average per-page mapcount in an anon THP is < 1,
we know for sure that we have a partial mapping.
However, as soon as multiple MMs are involved, we might miss detecting
partial mappings: this might be relevant with long-lived child
processes. If we have a fully-mapped anon folio before fork(), once
our child processes and our parent all unmap (zap/COW) the same pages
(but not the complete folio), we might not detect the partial mapping.
However, once the child processes quit we would detect the partial
mapping.
How relevant this case is in practice remains to be seen.
Swapout/migration will likely mitigate this.
In the future, RMAP walkers could check for that for that case
(e.g., when collecting access bits during reclaim) and simply flag
them for deferred-splitting.
Link: https://lkml.kernel.org/r/20250303163014.1128035-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
(CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.
When computing the output for smaps / smaps_rollups, in particular when
calculating the USS (Unique Set Size) and the PSS (Proportional Set Size),
we still rely on per-page mapcounts.
To determine private vs. shared, we'll use folio_likely_mapped_shared(),
similar to how we handle PM_MMAP_EXCLUSIVE. Similarly, we might now
under-estimate the USS and count pages towards "shared" that are actually
"private" ("exclusively mapped").
When calculating the PSS, we'll now also use the average per-page mapcount
for large folios: this can result in both, an over-estimation and an
under-estimation of the PSS. The difference is not expected to matter
much in practice, but we'll have to learn as we go.
We can now provide folio_precise_page_mapcount() only with
CONFIG_PAGE_MAPCOUNT, and remove one of the last users of per-page
mapcounts when CONFIG_NO_PAGE_MAPCOUNT is enabled.
Document the new behavior.
Link: https://lkml.kernel.org/r/20250303163014.1128035-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
(CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.
For calculating "mapmax", we now use the average per-page mapcount in a
large folio instead of the per-page mapcount.
For hugetlb folios and folios that are not partially mapped into MMs,
there is no change.
Likely, this change will not matter much in practice, and an alternative
might be to simple remove this stat with CONFIG_NO_PAGE_MAPCOUNT.
However, there might be value to it, so let's keep it like that and
document the behavior.
Link: https://lkml.kernel.org/r/20250303163014.1128035-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The dcssblk driver has long needed special case supoprt to enable limited
dax operation, so called CONFIG_FS_DAX_LIMITED. This mode works around
the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of
dax-mapped pages to support GUP.
Now, pending cleanups to fsdax that fix its reference counting [1] depend
on the ability of all dax drivers to supply ZONE_DEVICE pages.
To allow that work to move forward, dax support needs to be paused for
dcssblk until ZONE_DEVICE support arrives. That work has been known for a
few years [2], and the removal of "pte_devmap" requirements [3] makes the
conversion easier.
For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL
(dcssblk was the only user).
Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1]
Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2]
Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3]
Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a paragraph explaining what sort of capabilities a process would need
to read procfs data for some other process. Also mention that reading
data for its own process doesn't require any extra permissions.
Link: https://lkml.kernel.org/r/20250129001747.759990-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Footnotes list are outputted in htmldocs simply as long-running
paragraph instead. Use reST numbered footnotes syntax for the job.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
SubmttingPatches.rst has 4 section headings, all under the same heading
levels. In absence of title headings, these section headings are all
ended up as title headings in the docs output, which also affect
the index toctree (increasing titles to 6 from the original 2)
due to :numbered: option.
Demote second-to-last section headings, making "Submitting patches
to bcachefs" as title heading.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bcachefs subsystem currently has 4 docs: two are development notes and
the rest are actual filesystem docs. These two groups are clearly
distinct and can be organized.
Split the toctree into two, one for each docs group. While at it, also
reduce :maxdepth: so that only title headings are listed in the
toctrees.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Sphinx reports htmldocs toctree warning:
Documentation/filesystems/bcachefs/casefolding.rst: WARNING: document isn't included in any toctree
Fix the warning by adding casefolding documentation entry to bcachefs
toctree.
Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161728.32739f85@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The doc lists dirent structure for both regular and casefolded names,
yet it is written (and rendered) as long paragraph instead.
Write the structure list as bullet list.
Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Sphinx reports htmldocs warnings on dentry/dcache section:
Documentation/filesystems/bcachefs/casefolding.rst:75: WARNING: Title underline too short.
dentry/dcache considerations
--------- [docutils]
Documentation/filesystems/bcachefs/casefolding.rst:84: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Fix the section by:
* Extending the section underline to match the section title length;
* Separating problem list from surrounding paragraphs.
Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161911.2d16138b@canb.auug.org.au/
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Sphinx reports htmldocs warning:
Documentation/filesystems/bcachefs/casefolding.rst:36: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]
That's because NUL word is italicized but it is written in plural form
instead (`NUL`s). Sphinx, however, doesn't tip over when the italicized
word in this fashion is followed by punctuation instead.
Do not italicize the word to keep Sphinx happy.
Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This patch implements support for case-insensitive file name lookups
in bcachefs.
The implementation uses the same UTF-8 lowering and normalization that
ext4 and f2fs is using.
More information is provided in Documentation/bcachefs/casefolding.rst
Compatibility notes:
This uses the new versioning scheme for incompatible features where an
incompatible feature is tied to a version number: the superblock says
"we may use incompat features up to x" and "incompat features up to x
are in use", disallowing mounting by previous versions.
Additionally, and old style incompat feature bit is used, so that
kernels without utf8 casefolding support know if casefolding
specifically is in use and they're allowed to mount.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
To simulate inconsistent node footer error.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We do not and cannot support file locking with NFS reexport over
NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport
server reboot cannot allow clients to recover locks because the source
NFS server has not rebooted, and so it is not in grace. Since the
source NFS server is not in grace, it cannot offer any guarantees that
the file won't have been changed between the locks getting lost and
any attempt to recover/reclaim them. The same applies to delegations
and any associated locks, so disallow them too.
Clients are no longer allowed to get file locks or delegations from a
reexport server, any attempts will fail with operation not supported.
Update the "Reboot recovery" section accordingly in
Documentation/filesystems/nfs/reexport.rst
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Introduce a new mount option "nat_bits" to control nat_bits feature,
by default nat_bits feature is disabled.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The function can be replaced by evict_inodes. The only difference is
that evict_inodes() skips the inodes with positive refcount without
touching ->i_lock, but they are equivalent as evict_inodes() repeats the
refcount check after having grabbed ->i_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently atomic write support requires dedicated HW support. This imposes
a restriction on the filesystem that disk blocks need to be aligned and
contiguously mapped to FS blocks to issue atomic writes.
XFS has no method to guarantee FS block alignment for regular,
non-RT files. As such, atomic writes are currently limited to 1x FS block
there.
To deal with the scenario that we are issuing an atomic write over
misaligned or discontiguous data blocks - and raise the atomic write size
limit - support a SW-based software emulated atomic write mode. For XFS,
this SW-based atomic writes would use CoW support to issue emulated untorn
writes.
It is the responsibility of the FS to detect discontiguous atomic writes
and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed,
SW-based atomic writes could be used always when the mounted bdev does
not support HW offload, but this strategy is not initially expected to be
used.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In future xfs will support a SW-based atomic write, so rename
IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used.
Also relocate setting of IOMAP_ATOMIC_HW to the write path in
__iomap_dio_rw(), to be clear that this flag is only relevant to writes.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Bring in iomap changes that xfs relies on.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add the 'k' prefix to id 21000. And id `u1000` in the third
idmapping should be mapped to `k31000`, not `u31000`.
Signed-off-by: Aiden Ma <jiaheng.ma@foxmail.com>
Link: https://lore.kernel.org/r/tencent_4E7B1F143E8051530C21FCADF4E014DCBB06@qq.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix a warning spotted by linux-next build (htmldocs):
Documentation/filesystems/porting.rst:1186: WARNING: Inline emphasis start-string without end-string. [docutils]
Introduced by commit
88d5baf69082 ("Change inode_operations.mkdir to return struct dentry *")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 88d5baf69082 ("Change inode_operations.mkdir to return struct dentry *")
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Page poisoning is an older debug option. The modern way to initialize
memory on free for security reasons is to set init_on_free=1.
Link: https://lore.kernel.org/r/20250304210156.14912-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Since the crypto kconfig options are being fixed to enable optimized
ChaCha20 automatically
(https://lore.kernel.org/r/Z8AY16EIqAYpfmRI@gondor.apana.org.au/), it is
no longer necessary to give a recommendation to enable it.
Link: https://lore.kernel.org/r/20250304205501.13797-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g. on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns. For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.
This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes. Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.
This lookup-after-mkdir requires that the directory remains locked to
avoid races. Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.
To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
NULL - the directory was created and no other dentry was used
ERR_PTR() - an error occurred
non-NULL - this other dentry was spliced in
This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations. Subsequent patches will make
further changes to some file-systems to return a correct dentry.
Not all filesystems reliably result in a positive hashed dentry:
- NFS, cifs, hostfs will sometimes need to perform a lookup of
the name to get inode information. Races could result in this
returning something different. Note that this lookup is
non-atomic which is what we are trying to avoid. Placing the
lookup in filesystem code means it only happens when the filesystem
has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
operation ensures that lookup will be called to correctly populate
the dentry. This could be fixed but I don't think it is important
to any of the users of vfs_mkdir() which look at the dentry.
The recommendation to use
d_drop();d_splice_alias()
is ugly but fits with current practice. A planned future patch will
change this.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is
set for a write, mark the folios being written as uncached. Then
writeback completion will drop the pages. The write_iter handler simply
kicks off writeback for the pages, and writeback completion will take
care of the rest.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
NeilBrown <neilb@suse.de> says:
These two patches are cleanup are dependencies for my mkdir changes and
subsequence directory locking changes.
* patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits)
nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
nfs/vfs: discard d_exact_alias()
Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since 2002 (change "Replace BKL for chain locking with sysvfs-private
rwlock") the sysv filesystem was doing IO under a rwlock in its
get_block() function (yes, a non-sleepable lock hold over a function
used to read inode metadata for all reads and writes). Nobody noticed
until syzbot in 2023 [1]. This shows nobody is using the filesystem.
Just drop it.
[1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Currently overlayfs uses the mounter's credentials for it's
override_creds() calls. That provides a consistent permission model.
This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will be
mounted in. This provides a consistent and simple security model.
With this it is possible to e.g., mount an overlayfs instance where the
mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of
custom fs{g,u}id different from the callers and other tweaks.
Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-1-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Callers of lookup_one_qstr_excl() often check if the result is negative or
positive.
These changes can easily be moved into lookup_one_qstr_excl() by checking the
lookup flags:
LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
LOOKUP_EXCL means it IS an error if the name DOES exist.
This patch adds these checks, then removes error checks from callers,
and ensures that appropriate flags are passed.
This subtly changes the meaning of LOOKUP_EXCL. Previously it could
only accompany LOOKUP_CREATE. Now it can accompany LOOKUP_RENAME_TARGET
as well. A couple of small changes are needed to accommodate this. The
NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
exactly what the name says.
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250217003020.3170652-3-neilb@suse.de
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
negative dentry
No callers of kern_path_locked() or user_path_locked_at() want a
negative dentry. So change them to return -ENOENT instead. This
simplifies callers.
This results in a subtle change to bcachefs in that an ioctl will now
return -ENOENT in preference to -EXDEV. I believe this restores the
behaviour to what it was prior to
Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250217003020.3170652-2-neilb@suse.de
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Mention another potential usecase for FS_IOC_READ_VERITY_METADATA:
creating filesystem images which contain fs-verity-enabled files,
without having to redo all of the work in userspace.
Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Link: https://lore.kernel.org/r/20241126084833.70538-1-allison.karlitskaya@redhat.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Remove the repeated word "to" docs.
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250207073433.23604-1-hanchunchao@inspur.com
|
|
Corrected the following spelling mistakes,
based on the suggestions by codespell:
1. Optionaly -> Optionally
2. prefereable -> preferable
3. peformance -> performance
4. ontext -> context
5. failuer -> failure
6. poiners -> pointers
7. realtively -> relatively
8. uptream -> upstream
Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250210043937.30952-1-ritvikfoss@gmail.com
|
|
After we remove ext4 journal callback, transaction->t_private_list is
not used anymore. Just remove unused transaction->t_private_list.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20241218145414.1422946-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add an (initial?) patch submission checklist, focusing mainly on
testing.
Yes, all patches must be tested, and that starts (but does not end) with
the patch author.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not
have a target block assigned to it yet at iomap time and the file system
will do that in the bio submission handler, splitting the I/O as needed.
This is used to implement Zone Append based I/O for zoned XFS, where
splitting writes to the hardware limits and assigning a zone to them
happens just before sending the I/O off to the block layer, but could
also be useful for other things like compressed I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-4-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Change ->prepare_ioend to ->submit_ioend and require file systems that
implement it to submit the bio. This is needed for file systems that
do their own work on the bios before submitting them to the block layer
like btrfs or zoned xfs. To make this easier also pass the writeback
context to the method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"Add support for io-uring communication between kernel and userspace
using IORING_OP_URING_CMD (Bernd Schubert). Following features enable
gains in performance compared to the regular interface:
- Allow processing multiple requests with less syscall overhead
- Combine commit of old and fetch of new fuse request
- CPU/NUMA affinity of queues
Patches were reviewed by several people, including Pavel Begunkov,
io-uring co-maintainer"
* tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: prevent disabling io-uring on active connections
fuse: enable fuse-over-io-uring
fuse: block request allocation until io-uring init is complete
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: Allow to queue bg requests through io-uring
fuse: Allow to queue fg requests through io-uring
fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
fuse: {io-uring} Handle teardown of ring entries
fuse: Add io-uring sqe commit and fetch support
fuse: {io-uring} Make hash-list req unique finding functions non-static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: Make fuse_copy non static
fuse: {io-uring} Handle SQEs - register commands
fuse: make args->in_args[0] to be always the header
fuse: Add fuse-io-uring design documentation
fuse: Move request bits
fuse: Move fuse_get_dev to header file
fuse: rename to fuse_dev_end_requests and make non-static
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable using direct IO with localio
- Added localio related tracepoints
Bugfixes:
- Sunrpc fixes for working with a very large cl_tasks list
- Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
- Fixes for handling reconnections with localio
- Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
- Fix COPY_NOTIFY xdr_buf size calculations
- pNFS/Flexfiles fix for retrying requesting a layout segment for
reads
- Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
expired
Cleanups:
- Various other nfs & nfsd localio cleanups
- Prepratory patches for async copy improvements that are under
development
- Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
xprts
- Add netns inum and srcaddr to debugfs rpc_xprt info"
* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
pnfs/flexfiles: retry getting layout segment for reads
NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
NFS: Rename struct nfs4_offloadcancel_data
NFS: Fix typo in OFFLOAD_CANCEL comment
NFS: CB_OFFLOAD can return NFS4ERR_DELAY
nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
nfs: fix incorrect error handling in LOCALIO
nfs: probe for LOCALIO when v3 client reconnects to server
nfs: probe for LOCALIO when v4 client reconnects to server
nfs/localio: remove redundant code and simplify LOCALIO enablement
nfs_common: add nfs_localio trace events
nfs_common: track all open nfsd_files per LOCALIO nfs_client
nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
nfsd: update percpu_ref to manage references on nfsd_net
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the big set of driver core and debugfs updates for 6.14-rc1.
Included in here is a bunch of driver core, PCI, OF, and platform rust
bindings (all acked by the different subsystem maintainers), hence the
merge conflict with the rust tree, and some driver core api updates to
mark things as const, which will also require some fixups due to new
stuff coming in through other trees in this merge window.
There are also a bunch of debugfs updates from Al, and there is at
least one user that does have a regression with these, but Al is
working on tracking down the fix for it. In my use (and everyone
else's linux-next use), it does not seem like a big issue at the
moment.
Here's a short list of the things in here:
- driver core rust bindings for PCI, platform, OF, and some i/o
functions.
We are almost at the "write a real driver in rust" stage now,
depending on what you want to do.
- misc device rust bindings and a sample driver to show how to use
them
- debugfs cleanups in the fs as well as the users of the fs api for
places where drivers got it wrong or were unnecessarily doing
things in complex ways.
- driver core const work, making more of the api take const * for
different parameters to make the rust bindings easier overall.
- other small fixes and updates
All of these have been in linux-next with all of the aforementioned
merge conflicts, and the one debugfs issue, which looks to be resolved
"soon""
* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
rust: device: Use as_char_ptr() to avoid explicit cast
rust: device: Replace CString with CStr in property_present()
devcoredump: Constify 'struct bin_attribute'
devcoredump: Define 'struct bin_attribute' through macro
rust: device: Add property_present()
saner replacement for debugfs_rename()
orangefs-debugfs: don't mess with ->d_name
octeontx2: don't mess with ->d_parent or ->d_parent->d_name
arm_scmi: don't mess with ->d_parent->d_name
slub: don't mess with ->d_name
sof-client-ipc-flood-test: don't mess with ->d_name
qat: don't mess with ->d_name
xhci: don't mess with ->d_iname
mtu3: don't mess wiht ->d_iname
greybus/camera - stop messing with ->d_iname
mediatek: stop messing with ->d_iname
netdevsim: don't embed file_operations into your structs
b43legacy: make use of debugfs_get_aux()
b43: stop embedding struct file_operations into their objects
carl9170: stop embedding file_operations into their objects
...
|
|
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.
->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain. Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.
Solution:
* have d_move() done in ->rename() rather than in its caller
* maintain a 9p-private rwsem (per-filesystem)
* hold it exclusive over the relevant part of ->rename()
* hold it shared over the places where we want the path.
That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control. However, there's
also __d_unalias(), which isn't covered by any of that.
If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.
Add a couple of optional methods, so that __d_unalias() would do
if alias->d_op->d_unalias_trylock != NULL
if (!alias->d_op->d_unalias_trylock(alias))
fail (resulting in -ESTALE from lookup)
__d_move(...)
if alias->d_op->d_unalias_unlock != NULL
alias->d_unalias_unlock(alias)
where it currently does __d_move(). 9p instances do down_write_trylock()
and up_write() of ->rename_mutex.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|