summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-30vfs: elide smp_mb in iversion handling in the common caseMateusz Guzik
According to bpftrace on these routines most calls result in cmpxchg, which already provides the same guarantee. In inode_maybe_inc_iversion elision is possible because even if the wrong value was read due to now missing smp_mb fence, the issue is going to correct itself after cmpxchg. If it appears cmpxchg wont be issued, the fence + reload are there bringing back previous behavior. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240815083310.3865-1-mjguzik@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30autofs: add per dentry expire timeoutIan Kent
Add ability to set per-dentry mount expire timeout to autofs. There are two fairly well known automounter map formats, the autofs format and the amd format (more or less System V and Berkley). Some time ago Linux autofs added an amd map format parser that implemented a fair amount of the amd functionality. This was done within the autofs infrastructure and some functionality wasn't implemented because it either didn't make sense or required extra kernel changes. The idea was to restrict changes to be within the existing autofs functionality as much as possible and leave changes with a wider scope to be considered later. One of these changes is implementing the amd options: 1) "unmount", expire this mount according to a timeout (same as the current autofs default). 2) "nounmount", don't expire this mount (same as setting the autofs timeout to 0 except only for this specific mount) . 3) "utimeout=<seconds>", expire this mount using the specified timeout (again same as setting the autofs timeout but only for this mount). To implement these options per-dentry expire timeouts need to be implemented for autofs indirect mounts. This is because all map keys (mounts) for autofs indirect mounts use an expire timeout stored in the autofs mount super block info. structure and all indirect mounts use the same expire timeout. Now I have a request to add the "nounmount" option so I need to add the per-dentry expire handling to the kernel implementation to do this. The implementation uses the trailing path component to identify the mount (and is also used as the autofs map key) which is passed in the autofs_dev_ioctl structure path field. The expire timeout is passed in autofs_dev_ioctl timeout field (well, of the timeout union). If the passed in timeout is equal to -1 the per-dentry timeout and flag are cleared providing for the "unmount" option. If the timeout is greater than or equal to 0 the timeout is set to the value and the flag is also set. If the dentry timeout is 0 the dentry will not expire by timeout which enables the implementation of the "nounmount" option for the specific mount. When the dentry timeout is greater than zero it allows for the implementation of the "utimeout=<seconds>" option. Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20240814090231.963520-1-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: use RCU in ilookupMateusz Guzik
A soft lockup in ilookup was reported when stress-testing a 512-way system [1] (see [2] for full context) and it was verified that not taking the lock shifts issues back to mm. [1] https://lore.kernel.org/linux-mm/56865e57-c250-44da-9713-cf1404595bcc@amd.com/ [2] https://lore.kernel.org/linux-mm/d2841226-e27b-4d3d-a578-63587a3aa4f3@amd.com/ Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240715071324.265879-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: move FMODE_UNSIGNED_OFFSET to fop_flagsChristian Brauner
This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: only read fops once in fops_get/putMateusz Guzik
In do_dentry_open() the usage is: f->f_op = fops_get(inode->i_fop); In generated asm the compiler emits 2 reads from inode->i_fop instead of just one. This popped up due to false-sharing where loads from that offset end up bouncing a cacheline during parallel open. While this is going to be fixed, the spurious load does not need to be there. This makes do_dentry_open() go down from 1177 to 1154 bytes. fops_put() is patched to maintain some consistency. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240810064753.1211441-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs/select: Annotate struct poll_list with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Link: https://lore.kernel.org/r/20240808150023.72578-2-thorsten.blum@toblux.com Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: rearrange general fastpath check now that O_CREAT uses itChristian Brauner
If we find a positive dentry we can now simply try and open it. All prelimiary checks are already done with or without O_CREAT. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: remove audit dummy context checkChristian Brauner
Now that we audit later during lookup_open() we can remove the audit dummy context check. This simplifies things a lot. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: pull up trailing slashes check for O_CREATChristian Brauner
Perform the check for trailing slashes right in the fastpath check and don't bother with any additional work. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: move audit parent inodeChristian Brauner
During O_CREAT we unconditionally audit the parent inode. This makes it difficult to support a fastpath for O_CREAT when the file already exists because we have to drop out of RCU lookup needlessly. We worked around this by checking whether audit was actually active but that's also suboptimal. Instead, move the audit of the parent inode down into lookup_open() at a point where it's mostly certain that the file needs to be created. This also reduced the inconsistency that currently exists: while audit on the parent is done independent of whether or no the file already existed an audit on the file is only performed if it has been created. By moving the audit down a bit we emit the audit a little later but it will allow us to simplify the fastpath for O_CREAT significantly. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: try an opportunistic lookup for O_CREAT opens tooJeff Layton
Today, when opening a file we'll typically do a fast lookup, but if O_CREAT is set, the kernel always takes the exclusive inode lock. I assume this was done with the expectation that O_CREAT means that we always expect to do the create, but that's often not the case. Many programs set O_CREAT even in scenarios where the file already exists. This patch rearranges the pathwalk-for-open code to also attempt a fast_lookup in certain O_CREAT cases. If a positive dentry is found, the inode_lock can be avoided altogether, and if auditing isn't enabled, it can stay in rcuwalk mode for the last step_into. One notable exception that is hopefully temporary: if we're doing an rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the dentry in that case is more expensive than taking the i_rwsem for now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240807-openfast-v3-1-040d132d2559@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30eventpoll: Annotate data-race of busy_poll_usecsMartin Karsten
A struct eventpoll's busy_poll_usecs field can be modified via a user ioctl at any time. All reads of this field should be annotated with READ_ONCE. Fixes: 85455c795c07 ("eventpoll: support busy poll per epoll instance") Cc: stable@vger.kernel.org Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://lore.kernel.org/r/20240806123301.167557-1-jdamato@fastly.com Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30eventpoll: Don't re-zero eventpoll fieldsJoe Damato
Remove redundant and unnecessary code. ep_alloc uses kzalloc to create struct eventpoll, so there is no need to set fields to defaults of 0. This was accidentally introduced in commit 85455c795c07 ("eventpoll: support busy poll per epoll instance") and expanded on in follow-up commits. Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://lore.kernel.org/r/20240807105231.179158-1-jdamato@fastly.com Reviewed-by: Martin Karsten <mkarsten@uwaterloo.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30Fix spelling and gramatical errorsXiaxi Shen
Fixed 3 typos in design.rst Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: dodge smp_mb in break_lease and break_deleg in the common caseMateusz Guzik
These inlines show up in the fast path (e.g., in do_dentry_open()) and induce said full barrier regarding i_flctx access when in most cases the pointer is NULL. The pointer can be safely checked before issuing the barrier, dodging it in most cases as a result. It is plausible the consume fence would be sufficient, but I don't want to go audit all callers regarding what they before calling here. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240806172846.886570-1-mjguzik@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30file: remove outdated comment after close_fd()Joel Savitz
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org The comment on EXPORT_SYMBOL(close_fd) was added in commit 2ca2a09d6215 ("fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()"), before commit 8760c909f54a ("file: Rename __close_fd to close_fd and remove the files parameter") gave the function its current name, however commit 1572bfdf21d4 ("file: Replace ksys_close with close_fd") removes the referenced caller entirely, obsoleting this comment. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/r/20240803025455.239276-1-jsavitz@redhat.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs/namespace.c: Fix typo in commentYuesong Li
replace 'permanetly' with 'permanently' in the comment & replace 'propogated' with 'propagated' in the comment Signed-off-by: Yuesong Li <liyuesong@vivo.com> Link: https://lore.kernel.org/r/20240806034710.2807788-1-liyuesong@vivo.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30exec: don't WARN for racy path_noexec checkMateusz Guzik
Both i_mode and noexec checks wrapped in WARN_ON stem from an artifact of the previous implementation. They used to legitimately check for the condition, but that got moved up in two commits: 633fb6ac3980 ("exec: move S_ISREG() check earlier") 0fd338b2d2cd ("exec: move path_noexec() check earlier") Instead of being removed said checks are WARN_ON'ed instead, which has some debug value. However, the spurious path_noexec check is racy, resulting in unwarranted warnings should someone race with setting the noexec flag. One can note there is more to perm-checking whether execve is allowed and none of the conditions are guaranteed to still hold after they were tested for. Additionally this does not validate whether the code path did any perm checking to begin with -- it will pass if the inode happens to be regular. Keep the redundant path_noexec() check even though it's mindless nonsense checking for guarantee that isn't given so drop the WARN. Reword the commentary and do small tidy ups while here. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240805131721.765484-1-mjguzik@gmail.com [brauner: keep redundant path_noexec() check] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: add a kerneldoc header over lookup_fastJeff Layton
The lookup_fast helper in fs/namei.c has some subtlety in how dentries are returned. Document them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240802-openfast-v1-2-a1cff2a33063@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: remove comment about d_rcu_to_refcountJeff Layton
This function no longer exists. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240802-openfast-v1-1-a1cff2a33063@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: mounts: Remove unused declaration mnt_cursor_del()Yue Haibing
Commit 2eea9ce4310d ("mounts: keep list of mounts in an rbtree") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240803115000.589872-1-yuehaibing@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19percpu-rwsem: remove the unused parameter 'read'Wang Long
In the function percpu_rwsem_release, the parameter `read` is unused, so remove it. Signed-off-by: Wang Long <w@laoqinren.net> Link: https://lore.kernel.org/r/20240802091901.2546797-1-w@laoqinren.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19coda: use param->file for FSCONFIG_SET_FDAleksa Sarai
While the old code did support FSCONFIG_SET_FD, there's no need to re-get the file the fs_context infrastructure already grabbed for us. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20240731-fsconfig-fsparam_fd-fixes-v2-2-e7c472224417@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19autofs: fix missing fput for FSCONFIG_SET_FDAleksa Sarai
If you pass an fd using FSCONFIG_SET_FD, autofs_parse_fd() "steals" the param->file and so the fs_context infrastructure will not do fput() for us. Fixes: e6ec453bd0f0 ("autofs: convert autofs to use the new mount api") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20240731-fsconfig-fsparam_fd-fixes-v2-1-e7c472224417@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19fs/aio: Fix __percpu annotation of *cpu pointer in struct kioctxUros Bizjak
__percpu annotation of *cpu pointer in struct kioctx is put at the wrong place, resulting in several sparse warnings: aio.c:623:24: warning: incorrect type in argument 1 (different address spaces) aio.c:623:24: expected void [noderef] __percpu *__pdata aio.c:623:24: got struct kioctx_cpu *cpu aio.c:788:18: warning: incorrect type in assignment (different address spaces) aio.c:788:18: expected struct kioctx_cpu *cpu aio.c:788:18: got struct kioctx_cpu [noderef] __percpu * aio.c:835:24: warning: incorrect type in argument 1 (different address spaces) aio.c:835:24: expected void [noderef] __percpu *__pdata aio.c:835:24: got struct kioctx_cpu *cpu aio.c:940:16: warning: incorrect type in initializer (different address spaces) aio.c:940:16: expected void const [noderef] __percpu *__vpp_verify aio.c:940:16: got struct kioctx_cpu * aio.c:958:16: warning: incorrect type in initializer (different address spaces) aio.c:958:16: expected void const [noderef] __percpu *__vpp_verify aio.c:958:16: got struct kioctx_cpu * Put __percpu annotation at the right place to fix these warnings. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20240730121915.4514-1-ubizjak@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19mount: handle OOM on mnt_warn_timestamp_expiryOlaf Hering
If no page could be allocated, an error pointer was used as format string in pr_warn. Rearrange the code to return early in case of OOM. Also add a check for the return value of d_path. Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry") Signed-off-by: Olaf Hering <olaf@aepfle.de> Link: https://lore.kernel.org/r/20240730085856.32385-1-olaf@aepfle.de [brauner: rewrite commit and commit message] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19Fixed: fs: file_table_c: Missing blank line warnings and struct declaration ↵Mohit0404
improved Fixed- WARNING: Missing a blank line after declarations WARNING: Missing a blank line after declarations Declaration format: improved struct file declaration format Signed-off-by: Mohit0404 <mohitpawar@mitaoe.ac.in> Link: https://lore.kernel.org/r/20240727072134.130962-2-mohitpawar@mitaoe.ac.in Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19fs/direct-io: Remove linux/prefetch.h includeYouling Tang
After commit c22198e78d52 ("direct-io: remove random prefetches"), Nothing in this file needs anything from `linux/prefetch.h`. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Link: https://lore.kernel.org/r/20240603014834.45294-1-youling.tang@linux.dev Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19fs: don't flush in-flight wb switches for superblocks without cgroup writebackHaifeng Xu
When deactivating any type of superblock, it had to wait for the in-flight wb switches to be completed. wb switches are executed in inode_switch_wbs_work_fn() which needs to acquire the wb_switch_rwsem and races against sync_inodes_sb(). If there are too much dirty data in the superblock, the waiting time may increase significantly. For superblocks without cgroup writeback such as tmpfs, they have nothing to do with the wb swithes, so the flushing can be avoided. Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Link: https://lore.kernel.org/r/20240726030525.180330-1-haifeng.xu@shopee.com Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19Merge patch series "Add an fcntl() to check file creation"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Systemd has a helper called openat_report_new() that returns whether a file was created anew or it already existed before for cases where O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't something that's specific to systemd but it's where I noticed it. The current logic is that it first attempts to open the file without O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink the first openat() without O_CREAT | O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT | O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW which seems really ugly. The caller could try and use fanotify() to register to listen for creation events in the directory before calling openat(). The caller could then compare the returned tid to its own tid to ensure that even in threaded environments it actually created the file. That might work but is a lot of work for something that should be fairly simple and I'm uncertain about it's reliability. The caller could use a bpf lsm hook to hook into security_file_open() to figure out whether they created the file. That also seems a bit wild. So let's add F_CREATED_QUERY which allows the caller to check whether they actually did create the file. That has caveats of course but I don't think they are problematic: * In multi-threaded environments a thread can only be sure that it did create the file if it calls openat() with O_CREAT. In other words, it's obviously not enough to just go through it's fdtable and check these fds because another thread could've created the file. * If there's any codepaths where an openat() with O_CREAT would yield the same struct file as that of another thread it would obviously cause wrong results. I'm not aware of any such codepaths from openat() itself. Imho, that would be a bug. * Related to the previous point, calling the new fcntl() on files created and opened via special-purpose system calls or ioctl()s would cause wrong results only if the affected subsystem a) raises FMODE_CREATED and b) may return the same struct file for two different calls. I'm not seeing anything outside of regular VFS code that raises FMODE_CREATED. There is code for b) in e.g., the drm layer where the same struct file is resurfaced but again FMODE_CREATED isn't used and it would be very misleading if it did. [1]: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 * patches from https://lore.kernel.org/r/20240724-work-fcntl-v1-0-e8153a2f1991@kernel.org: selftests: add F_CREATED_QUERY tests fcntl: add F_CREATED_QUERY Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19selftests: add F_CREATED_QUERY testsChristian Brauner
Add simple selftests for fcntl(fd, F_CREATED_QUERY, 0). Link: https://lore.kernel.org/r/20240724-work-fcntl-v1-2-e8153a2f1991@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19fcntl: add F_CREATED_QUERYChristian Brauner
Systemd has a helper called openat_report_new() that returns whether a file was created anew or it already existed before for cases where O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't something that's specific to systemd but it's where I noticed it. The current logic is that it first attempts to open the file without O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink the first openat() without O_CREAT | O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT | O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW which seems really ugly. The caller could try and use fanotify() to register to listen for creation events in the directory before calling openat(). The caller could then compare the returned tid to its own tid to ensure that even in threaded environments it actually created the file. That might work but is a lot of work for something that should be fairly simple and I'm uncertain about it's reliability. The caller could use a bpf lsm hook to hook into security_file_open() to figure out whether they created the file. That also seems a bit wild. So let's add F_CREATED_QUERY which allows the caller to check whether they actually did create the file. That has caveats of course but I don't think they are problematic: * In multi-threaded environments a thread can only be sure that it did create the file if it calls openat() with O_CREAT. In other words, it's obviously not enough to just go through it's fdtable and check these fds because another thread could've created the file. * If there's any codepaths where an openat() with O_CREAT would yield the same struct file as that of another thread it would obviously cause wrong results. I'm not aware of any such codepaths from openat() itself. Imho, that would be a bug. * Related to the previous point, calling the new fcntl() on files created and opened via special-purpose system calls or ioctl()s would cause wrong results only if the affected subsystem a) raises FMODE_CREATED and b) may return the same struct file for two different calls. I'm not seeing anything outside of regular VFS code that raises FMODE_CREATED. There is code for b) in e.g., the drm layer where the same struct file is resurfaced but again FMODE_CREATED isn't used and it would be very misleading if it did. Link: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 [1] Link: https://lore.kernel.org/r/20240724-work-fcntl-v1-1-e8153a2f1991@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-18Linux 6.11-rc4v6.11-rc4Linus Torvalds
2024-08-18Merge tag 'driver-core-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two driver fixes for regressions from 6.11-rc1 due to the driver core change making a structure in a driver core callback const. These were missed by all testing EXCEPT for what Bart happened to be running, so I appreciate the fixes provided here for some odd/not-often-used driver subsystems that nothing else happened to catch. Both of these fixes have been in linux-next all week with no reported issues" * tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: mips: sgi-ip22: Fix the build ARM: riscpc: ecard: Fix the build
2024-08-18Merge tag 'char-misc-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc fixes from Greg KH: "Here are some small char/misc fixes for 6.11-rc4 to resolve reported problems. Included in here are: - fastrpc revert of a change that broke userspace - xillybus fixes for reported issues Half of these have been in linux-next this week with no reported problems, I don't know if the last bit of xillybus driver changes made it in, but they are 'obviously correct' so will be safe :)" * tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: char: xillybus: Check USB endpoints when probing device char: xillybus: Refine workqueue handling Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD" char: xillybus: Don't destroy workqueue from work item running on it
2024-08-18Merge tag 'tty-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.11-rc4 to resolve some reported problems. Included in here are: - conmakehash.c userspace build issues - fsl_lpuart driver fix - 8250_omap revert for reported regression - atmel_serial rts flag fix All of these have been in linux-next this week with no reported issues" * tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "serial: 8250_omap: Set the console genpd always on if no console suspend" tty: atmel_serial: use the correct RTS flag. tty: vt: conmakehash: remove non-portable code printing comment header tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
2024-08-18Merge tag 'usb-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt driver fixes from Greg KH: "Here are some small USB and Thunderbolt driver fixes for 6.11-rc4 to resolve some reported issues. Included in here are: - thunderbolt driver fixes for reported problems - typec driver fixes - xhci fixes - new device id for ljca usb driver All of these have been in linux-next this week with no reported issues" * tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[] Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET" usb: typec: ucsi: Fix the return value of ucsi_run_command() usb: xhci: fix duplicate stall handling in handle_tx_event() usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup() thunderbolt: Mark XDomain as unplugged when router is removed thunderbolt: Fix memory leaks in {port|retimer}_sb_regs_write()
2024-08-18Merge tag 'for-6.11-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs fixes from David Sterba: "A more fixes. We got reports that shrinker added in 6.10 still causes latency spikes and the fixes don't handle all corner cases. Due to summer holidays we're taking a shortcut to disable it for release builds and will fix it in the near future. - only enable extent map shrinker for DEBUG builds, temporary quick fix to avoid latency spikes for regular builds - update target inode's ctime on unlink, mandated by POSIX - properly take lock to read/update block group's zoned variables - add counted_by() annotations" * tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: only enable extent map shrinker for DEBUG builds btrfs: zoned: properly take lock to read/update block group's zoned variables btrfs: tree-checker: add dev extent item checks btrfs: update target inode's ctime on unlink btrfs: send: annotate struct name_cache_entry with __counted_by()
2024-08-18fuse: Initialize beyond-EOF page contents before setting uptodateJann Horn
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page zeroing (because it can be used to change partial page contents). So fuse_notify_store() must be more careful to fully initialize page contents (including parts of the page that are beyond end-of-file) before marking the page uptodate. The current code can leave beyond-EOF page contents uninitialized, which makes these uninitialized page contents visible to userspace via mmap(). This is an information leak, but only affects systems which do not enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the corresponding kernel command line parameter). Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574 Cc: stable@kernel.org Fixes: a1d75f258230 ("fuse: add store request") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-17Merge tag 'mm-hotfixes-stable-2024-08-17-19-34' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. All except one are for MM. 10 of these are cc:stable and the others pertain to post-6.10 issues. As usual with these merges, singletons and doubletons all over the place, no identifiable-by-me theme. Please see the lovingly curated changelogs to get the skinny" * tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/migrate: fix deadlock in migrate_pages_batch() on large folios alloc_tag: mark pages reserved during CMA activation as not tagged alloc_tag: introduce clear_page_tag_ref() helper function crash: fix riscv64 crash memory reserve dead loop selftests: memfd_secret: don't build memfd_secret test on unsupported arches mm: fix endless reclaim on machines with unaccepted memory selftests/mm: compaction_test: fix off by one in check_compaction() mm/numa: no task_numa_fault() call if PMD is changed mm/numa: no task_numa_fault() call if PTE is changed mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0 mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu mm: don't account memmap per-node mm: add system wide stats items category mm: don't account memmap on failure mm/hugetlb: fix hugetlb vs. core-mm PT locking mseal: fix is_madv_discard()
2024-08-17Merge tag 'powerpc-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes on 85xx with some configs since the recent hugepd rework. - Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some platforms. - Don't enable offline cores when changing SMT modes, to match existing userspace behaviour. Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler. * tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/topology: Check if a core is online cpu/SMT: Enable SMT only if a core is online powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL powerpc/mm: Fix size of allocated PGDIR soc: fsl: qbman: remove unused struct 'cgr_comp'
2024-08-17Merge tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix for clang warning - additional null check - fix for cached write with posix locks - flexible structure fix * tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: smb2pdu.h: Use static_assert() to check struct sizes smb3: fix lock breakage for cached writes smb/client: avoid possible NULL dereference in cifs_free_subrequest()
2024-08-17Merge tag 'i2c-for-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C core fix replacing IS_ENABLED() with IS_REACHABLE() For host drivers, there are two fixes: - Tegra I2C Controller: Addresses a potential double-locking issue during probe. ACPI devices are not IRQ-safe when invoking runtime suspend and resume functions, so the irq_safe flag should not be set. - Qualcomm GENI I2C Controller: Fixes an oversight in the exit path of the runtime_resume() function, which was missed in the previous release" * tag 'i2c-for-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: tegra: Do not mark ACPI devices as irq safe i2c: Use IS_REACHABLE() for substituting empty ACPI functions i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
2024-08-17Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes to the mpi3mr driver. One to avoid oversize allocations in tracing and the other to fix an uninitialized spinlock in the user to driver feature request code (used to trigger dumps and the like)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocations scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lock
2024-08-17Merge tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Chandan Babu: - Check for presence of only 'attr' feature before scrubbing an inode's attribute fork. - Restore the behaviour of setting AIL thread to TASK_INTERRUPTIBLE for long (i.e. 50ms) sleep durations to prevent high load averages. - Do not allow users to change the realtime flag of a file unless the datadev and rtdev both support fsdax access modes. * tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set xfs: revert AIL TASK_KILLABLE threshold xfs: attr forks require attr, not attr2
2024-08-17Merge tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent OverstreetL - New on disk format version, bcachefs_metadata_version_disk_accounting_inum This adds one more disk accounting counter, which counts disk usage and number of extents per inode number. This lets us track fragmentation, for implementing defragmentation later, and it also counts disk usage per inode in all snapshots, which will be a useful thing to expose to users. - One performance issue we've observed is threads spinning when they should be waiting for dirty keys in the key cache to be flushed by journal reclaim, so we now have hysteresis for the waiting thread, as well as improving the tracepoint and a new time_stat, for tracking time blocked waiting on key cache flushing. ... and various assorted smaller fixes. * tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs: bcachefs: Fix locking in __bch2_trans_mark_dev_sb() bcachefs: fix incorrect i_state usage bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru bcachefs: Fix forgetting to pass trans to fsck_err() bcachefs: Increase size of cuckoo hash table on too many rehashes bcachefs: bcachefs_metadata_version_disk_accounting_inum bcachefs: Kill __bch2_accounting_mem_mod() bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in bcachefs: Add a time_stat for blocked on key cache flush bcachefs: Improve trans_blocked_journal_reclaim tracepoint bcachefs: Add hysteresis to waiting on btree key cache flush lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc() bcachefs: Convert for_each_btree_node() to lockrestart_do() bcachefs: Add missing downgrade table entry bcachefs: disk accounting: ignore unknown types bcachefs: bch2_accounting_invalid() fixup bcachefs: Fix bch2_trigger_alloc when upgrading from old versions bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
2024-08-16bcachefs: Fix locking in __bch2_trans_mark_dev_sb()Kent Overstreet
We run this in full RW mode now, so we have to guard against the superblock buffer being reallocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull memcg-v1 fix from Al Viro: "memcg_write_event_control() oops fix" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: memcg_write_event_control(): fix a user-triggerable oops
2024-08-16Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS() macro instead of the *_ERR() one in order to avoid writing -EFAULT to the value register in case of a fault - Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE. Prior to this fix, only the first element was initialised - Move the KASAN random tag seed initialisation after the per-CPU areas have been initialised (prng_state is __percpu) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix KASAN random tag seed initialization arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE arm64: uaccess: correct thinko in __get_mem_asm()
2024-08-16Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One fix for the new T-Head TH1520 clk driver that marks a bus clk critical so that it isn't turned off during late init which breaks emmc-sdio" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: thead: fix dependency on clk_ignore_unused