path: root/fs
Age | Commit message | Author
2022-09-05 | debugfs: add debugfs_lookup_and_remove() | Greg Kroah-Hartman
There is a very common pattern of using debugfs_remove(debugfs_lookup(..)) which results in a dentry leak of the dentry that was looked up. Instead of having to open-code the correct pattern of calling dput() on the dentry, create debugfs_lookup_and_remove() to handle this pattern automatically and properly without any memory leaks. Cc: stable <stable@kernel.org> Reported-by: Kuyo Chang <kuyo.chang@mediatek.com> Tested-by: Kuyo Chang <kuyo.chang@mediatek.com> Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
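The leak described in this commit can be illustrated with a small userspace model. This is only a sketch: `toy_dentry`, `toy_lookup()`, `toy_put()` and `toy_remove()` are hypothetical stand-ins for the kernel's dentry, debugfs_lookup(), dput() and debugfs_remove(), and the reference counting is simplified to a single integer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dentry: a reference count and a "removed" flag. */
struct toy_dentry {
    int refcount;
    int removed;
};

/* Stand-in for debugfs_lookup(): returns the entry with an extra reference. */
static struct toy_dentry *toy_lookup(struct toy_dentry *d)
{
    if (d == NULL || d->removed)
        return NULL;
    d->refcount++;
    return d;
}

/* Stand-in for dput(): drops one reference. */
static void toy_put(struct toy_dentry *d)
{
    assert(d->refcount > 0);
    d->refcount--;
}

/* Stand-in for debugfs_remove(): unlinks the entry and drops the
 * filesystem's own reference, but not the caller's lookup reference. */
static void toy_remove(struct toy_dentry *d)
{
    if (d == NULL || d->removed)
        return;
    d->removed = 1;
    toy_put(d);
}

/* The leaky pattern: nobody drops the reference taken by the lookup. */
static void toy_remove_leaky(struct toy_dentry *d)
{
    toy_remove(toy_lookup(d));
}

/* The combined helper: remove, then drop the lookup reference. */
static void toy_lookup_and_remove(struct toy_dentry *d)
{
    struct toy_dentry *found = toy_lookup(d);

    if (found == NULL)
        return;
    toy_remove(found);
    toy_put(found);
}
```

In this model an entry that starts with one reference still holds one after `toy_remove_leaky()`, while `toy_lookup_and_remove()` brings the count down to zero.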
2022-09-04 | exfat: fix overflow for large capacity partition | Yuezhang Mo
Using the int type for the sector index causes an overflow on a large-capacity partition. For example, with a sector size of 512 bytes and a partition capacity larger than 2TB, the sector index overflows. Fixes: 1b6138385499 ("exfat: reduce block requests when zeroing a cluster") Cc: stable@vger.kernel.org # v5.19+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
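The arithmetic behind this overflow can be checked with a short sketch. The helper names below are hypothetical and the code is plain userspace C, not the exfat implementation; it only shows that a 2 TiB partition with 512-byte sectors has 2^32 sectors, which cannot be indexed by a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of sectors in a partition of `bytes` bytes. */
static uint64_t sector_count(uint64_t bytes, uint32_t sector_size)
{
    assert(sector_size != 0);
    return bytes / sector_size;
}

/* Returns 1 if the highest sector index (sectors - 1) fits in an int. */
static int fits_in_int(uint64_t sectors)
{
    return sectors == 0 || sectors - 1 <= (uint64_t)INT_MAX;
}
```

With these helpers, `sector_count((uint64_t)2 << 40, 512)` is 4294967296, and `fits_in_int()` rejects it, which is why a 64-bit sector type is needed for such partitions.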
2022-09-02 | Merge tag '6.0-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds
Pull cifs fixes from Steve French: "Five fixes, all also marked for stable:
 - fixes for collapse range and insert range (also fixes xfstest generic/031)
 - memory leak fix"
* tag '6.0-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix small mempool leak in SMB2_negotiate()
  smb3: use filemap_write_and_wait_range instead of filemap_write_and_wait
  smb3: fix temporary data corruption in insert range
  smb3: fix temporary data corruption in collapse range
  smb3: Move the flush out of smb2_copychunk_range() into its callers
2022-09-02 | Merge tag 'rxrpc-fixes-20220901' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs | David S. Miller
David Howells says:
====================
rxrpc fixes
Here are some fixes for AF_RXRPC:
 (1) Fix the handling of ICMP/ICMP6 packets. This is a problem due to rxrpc being switched to acting as a UDP tunnel, thereby allowing it to steal the packets before they go through the UDP Rx queue. UDP tunnels can't get ICMP/ICMP6 packets, however. This patch adds an additional encap hook so that they can.
 (2) Fix the encryption routines in rxkad to handle packets that have more than three parts correctly. The problem is that ->nr_frags doesn't count the initial fragment, so the sglist ends up too short.
 (3) Fix a problem with destruction of the local endpoint potentially getting repeated.
 (4) Fix the calculation of the time at which to resend. jiffies_to_usecs() gives microseconds, not nanoseconds.
 (5) Fix AFS to work out when callback promises and locks expire based on the time an op was issued rather than the time the first reply packet arrives. We don't know how long the server took between calculating the expiry interval and transmitting the reply.
 (6) Given (5), rxrpc_get_reply_time() is no longer used, so remove it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
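Item (4) above is a unit mix-up: a microsecond value was treated as nanoseconds. A minimal sketch of the difference follows. The function names are hypothetical and this is not the rxrpc code, only the unit conversion it needs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_USEC 1000ULL

/* Correct: convert the microsecond timeout to nanoseconds before adding. */
static uint64_t resend_at_ns(uint64_t now_ns, uint64_t timeout_us)
{
    assert(timeout_us <= (UINT64_MAX - now_ns) / TOY_NSEC_PER_USEC);
    return now_ns + timeout_us * TOY_NSEC_PER_USEC;
}

/* Buggy: adds microseconds to a nanosecond timestamp, so the resend
 * deadline lands a factor of 1000 too early. */
static uint64_t resend_at_ns_buggy(uint64_t now_ns, uint64_t timeout_us)
{
    return now_ns + timeout_us;
}
```

For a 5 microsecond timeout starting at 1000 ns, the correct deadline is 6000 ns while the buggy variant yields 1005 ns.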
2022-09-01 | orangefs: use ->f_mapping | Al Viro
... and don't check for impossible conditions - file_inode() is never NULL in anything seen by ->release() and neither is its ->i_mapping. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | _nfs42_proc_copy(): use ->f_mapping instead of file_inode()->i_mapping | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | nfs_finish_open(): don't open-code file_inode() | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | bprm_fill_uid(): don't open-code file_inode() | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | exfat_iterate(): don't open-code file_inode(file) | Al Viro
and it's file, not filp... Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | ecryptfs: constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | nd_jump_link(): constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | do_proc_readlink(): constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | overlayfs: constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | fs/notify: constify path | Al Viro
Reviewed-by: Matthew Bobrowski <repnop@google.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | may_linkat(): constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | do_sys_name_to_handle(): constify path | Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | ->getprocattr(): attribute name is const char *, TYVM... | Al Viro
cast of ->d_name.name to char * is completely wrong - nothing is allowed to modify its contents. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | fs: fix UAF/GPF bug in nilfs_mdt_destroy | Dongliang Mu
In alloc_inode, inode_init_always() could return -ENOMEM if security_inode_alloc() fails, which leaves inode->i_private uninitialized. Then nilfs_is_metadata_file_inode() returns true and nilfs_free_inode() wrongly calls nilfs_mdt_destroy(), which frees the uninitialized inode->i_private and leads to crashes (e.g., UAF/GPF). Fix this by moving security_inode_alloc just prior to this_cpu_inc(nr_inodes). Link: https://lkml.kernel.org/r/CAFcO6XOcf1Jj2SeGt=jJV59wmhESeSKpfR0omdFRq+J9nD1vfQ@mail.gmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: Jiacheng Xu <stitch@zju.edu.cn> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
tools/testing/selftests/net/.gitignore sort the net-next version and use it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01 | kernfs: Implement kernfs_show() | Tejun Heo
Currently, kernfs nodes can be created hidden and activated later by calling kernfs_activate() to allow creation of multiple nodes to succeed or fail as a unit. This is a one-way, one-time-only transition. This patch introduces kernfs_show() which can toggle visibility dynamically. As the currently proposed use - toggling the cgroup pressure files - only requires operating on leaf nodes, for the sake of simplicity, restrict it as such for now. Hiding uses the same mechanism as deactivation and likewise guarantees that there are no in-flight operations on completion. KERNFS_ACTIVATED and KERNFS_HIDDEN are used to manage the interactions between activations and show/hide operations. A node is visible iff both activated & !hidden. Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-9-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
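The visibility rule stated in this commit (visible iff activated and not hidden) can be sketched with two flag bits. The flag values, struct and function names below are hypothetical and do not reproduce the kernfs data structures; draining of in-flight operations is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for KERNFS_ACTIVATED / KERNFS_HIDDEN. */
#define TOY_ACTIVATED 0x1u
#define TOY_HIDDEN    0x2u

struct toy_node {
    unsigned int flags;
    bool is_leaf;
};

/* A node is visible iff it is activated and not hidden. */
static bool toy_visible(const struct toy_node *n)
{
    return (n->flags & TOY_ACTIVATED) && !(n->flags & TOY_HIDDEN);
}

/* One-way transition: once set, the activated bit is never cleared here. */
static void toy_activate(struct toy_node *n)
{
    n->flags |= TOY_ACTIVATED;
}

/* Toggle visibility; restricted to leaf nodes as in the commit text. */
static void toy_show(struct toy_node *n, bool show)
{
    assert(n->is_leaf);
    if (show)
        n->flags &= ~TOY_HIDDEN;
    else
        n->flags |= TOY_HIDDEN;
}
```

Hiding a node before activating it keeps it invisible after activation, and a later `toy_show(n, true)` makes it visible without touching the activated bit.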
2022-09-01 | kernfs: Factor out kernfs_activate_one() | Tejun Heo
Factor out kernfs_activate_one() from kernfs_activate() and reorder operations so that KERNFS_ACTIVATED now simply indicates whether activation was attempted on the node ignoring whether activation took place. As the flag doesn't have a reader, the refactoring and reordering shouldn't cause any behavior difference. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-8-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | kernfs: Add KERNFS_REMOVING flags | Tejun Heo
KERNFS_ACTIVATED tracks whether a given node has ever been activated. As a node was only deactivated on removal, this was used for:
 1. Drain optimization (removed by the previous patch).
 2. To hide !activated nodes.
 3. To avoid double activations.
 4. Reject adding children to a node being removed.
 5. Skip activating a node which is being removed.
We want to decouple deactivation from removal so that nodes can be deactivated and hidden dynamically, which makes KERNFS_ACTIVATED useless for all of the above purposes. #1 is already gone. #2 and #3 can instead test whether the node is currently active. A new flag KERNFS_REMOVING is added to explicitly mark nodes which are being removed for #4 and #5. While this leaves KERNFS_ACTIVATED with no users, leave it be as it will be used in a following patch. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-7-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | kernfs: Improve kernfs_drain() and always call on removal | Tejun Heo
__kernfs_remove() was skipping draining based on KERNFS_ACTIVATED - whether the node has ever been activated since creation. Instead, update it to always call kernfs_drain() which now drains or skips based on the precise drain conditions. This ensures that the nodes will be deactivated and drained regardless of their states. This doesn't make meaningful difference now but will enable deactivating and draining nodes dynamically by making removals safe when racing those operations. While at it, drop / update comments. v2: Fix the inverted test on kernfs_should_drain_open_files() noted by Chengming. This was fixed by the next unrelated patch in the previous posting. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-6-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | kernfs: Skip kernfs_drain_open_files() more aggressively | Tejun Heo
Track the number of mmapped files and files that need to be released and skip kernfs_drain_open_files() if both are zero, which are the precise conditions which require draining open_files. The early exit test is factored into kernfs_should_drain_open_files() which is now tested by kernfs_drain_open_files()'s caller - kernfs_drain(). This isn't a meaningful optimization on its own but will enable future stand-alone kernfs_deactivate() implementation. v2: Chengming noticed that on->nr_to_release was leaking after ->open() failure. Fix it by telling kernfs_unlink_open_file() that it's called from the ->open() fail path and should dec the counter. Use kzalloc() to allocate kernfs_open_node so that the tracking fields are correctly initialized. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-5-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
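The early-exit test described here reduces to checking two counters. The sketch below is a simplified userspace model: `nr_to_release` is named in the commit text, while `nr_mmapped`, the struct and the function name are assumptions made for illustration, and the kernel's atomic accesses are replaced by plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-node open-file bookkeeping. */
struct toy_open_node {
    unsigned int nr_mmapped;     /* open files with an active mmap (assumed name) */
    unsigned int nr_to_release;  /* open files that still need ->release() */
};

/* Draining is only required if some file is mmapped or awaits release. */
static bool toy_should_drain_open_files(const struct toy_open_node *on)
{
    if (on == NULL)
        return false;            /* nothing was ever opened */
    return on->nr_mmapped != 0 || on->nr_to_release != 0;
}
```

A zero-initialized node (as kzalloc() would give) needs no draining, which is why correct initialization of the tracking fields matters.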
2022-09-01 | kernfs: Refactor kernfs_get_open_node() | Tejun Heo
Factor out the common part. This is cleaner and should help with future changes. No functional changes. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | kernfs: Drop unnecessary "mutex" local variable initialization | Tejun Heo
These are unnecessary and unconventional. Remove them. Also move variable declaration into the block that it's used. No functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-3-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() | Tejun Heo
kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU dereference is currently only used in kernfs_notify(). Everywhere else, either we're holding the lock which protects it or know that the kernfs_open_node is pinned because we have a pointer to a kernfs_open_file which is hanging off of it. kernfs_deref_open_node() is used for the latter case - accessing kernfs_open_node from kernfs_open_file. The lifetime and visibility rules are simple and clear here. To someone who can access a kernfs_open_file, its kernfs_open_node is pinned and visible through of->kn->attr.open. Replace kernfs_deref_open_node() with the simpler of_on(). The former takes both @kn and @of and RCU derefs @kn->attr.open while sanity checking with @of. The latter takes @of and uses protected deref on of->kn->attr.open. As the return value can't be NULL, remove the error handling in the callers too. This shouldn't cause any functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-2-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01 | NFSv4: Turn off open-by-filehandle and NFS re-export for NFSv4.0 | Trond Myklebust
The NFSv4.0 protocol only supports open() by name. It cannot therefore be used with open_by_handle() and friends, nor can it be re-exported by knfsd. Reported-by: Chuck Lever III <chuck.lever@oracle.com> Fixes: 20fa19027286 ("nfs: add export operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-09-01 | afs: Use the operation issue time instead of the reply time for callbacks | David Howells
rxrpc and kafs between them try to use the receive timestamp on the first data packet (ie. the one with sequence number 1) as a base from which to calculate the time at which callback promise and lock expiration occurs. However, we don't know how long it took for the server to send us the reply from it having completed the basic part of the operation - it might then, for instance, have to send a bunch of callback breaks, depending on the particular operation. Fix this by using the time at which the operation is issued on the client as a base instead. That should never be longer than the server's idea of the expiry time. Fixes: 781070551c26 ("afs: Fix calculation of callback expiry time") Fixes: 2070a3e44962 ("rxrpc: Allow the reply time to be obtained on a client call") Suggested-by: Jeffrey E Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
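The reasoning in this commit can be made concrete with a small worked example. The function names and the sample times are illustrative assumptions, with all times in seconds on one shared clock; this is not the kafs code.

```c
#include <assert.h>
#include <stdint.h>

/* Expiry computed from the time the client issued the operation. */
static int64_t expiry_from_issue(int64_t issue_time, int64_t interval)
{
    assert(interval >= 0);
    return issue_time + interval;
}

/* Expiry computed from the time the first reply packet arrived. */
static int64_t expiry_from_reply(int64_t reply_time, int64_t interval)
{
    assert(interval >= 0);
    return reply_time + interval;
}
```

Suppose the client issues the operation at t=100, the server computes a 60 second promise at t=103 (so it expires at 163 from the server's point of view), and the reply reaches the client at t=110. The issue-based expiry is 160, which is never later than the server's 163, while the reply-based expiry of 170 would let the client trust the promise after the server has dropped it.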
2022-08-31 | cachefiles: make on-demand request distribution fairer | Xin Yin
Currently, enqueuing and dequeuing of on-demand requests both start from index 0, which makes request distribution unfair. Under heavy concurrent I/O, requests stored at higher indexes can starve. Searching for requests cyclically in cachefiles_ondemand_daemon_read() makes the distribution fairer. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Reported-by: Yongqing Li <liyongqing@bytedance.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/ # v1 Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/ # v2
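A cyclic search of the kind this commit describes can be sketched as follows. This is a toy fixed-size array model with hypothetical names (`toy_queue`, `toy_dequeue`); the real code operates on a different data structure, so treat it only as an illustration of resuming the scan after the last served slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

struct toy_queue {
    bool pending[TOY_SLOTS];  /* true if a request waits in this slot */
    size_t next;              /* slot at which the next search starts */
};

/* Returns the index of the next pending request, or -1 if none.
 * The scan starts just after the previously served slot and wraps. */
static int toy_dequeue(struct toy_queue *q)
{
    assert(q->next < TOY_SLOTS);
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        size_t idx = (q->next + i) % TOY_SLOTS;

        if (q->pending[idx]) {
            q->pending[idx] = false;
            q->next = (idx + 1) % TOY_SLOTS;
            return (int)idx;
        }
    }
    return -1;
}
```

With requests in slots 1 and 5, slot 1 is served first. If slot 1 is refilled immediately, the next call still serves slot 5, because the scan resumes at slot 2 instead of restarting at slot 0.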
2022-08-31 | cachefiles: fix error return code in cachefiles_ondemand_copen() | Sun Ke
The cache_size field of copen is specified by the user daemon. If cache_size < 0, then the OPEN request is expected to fail, while copen itself shall succeed. However, returning 0 is indeed unexpected when cache_size is an invalid error code. Fix this by returning error when cache_size is an invalid error code. Changes ======= v4: update the code suggested by Dan v3: update the commit log suggested by Jingbo. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Sun Ke <sunke32@huawei.com> Suggested-by: Jeffle Xu <jefflexu@linux.alibaba.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220818111935.1683062-1-sunke32@huawei.com/ # v2 Link: https://lore.kernel.org/r/20220818125038.2247720-1-sunke32@huawei.com/ # v3 Link: https://lore.kernel.org/r/20220826023515.3437469-1-sunke32@huawei.com/ # v4
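The distinction drawn in this commit, between a negative cache_size that is a valid error code and one that is not, can be sketched as below. The struct and function are hypothetical; the bound of 4095 mirrors the kernel's MAX_ERRNO, and returning -EINVAL for an out-of-range value is an assumption, since the commit text only says that an error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_ERRNO 4095L  /* mirrors the kernel's MAX_ERRNO */

struct toy_req {
    long error;  /* outcome reported to whoever issued the OPEN request */
};

/* Returns the status of the copen command itself and records the
 * fate of the OPEN request in req->error. */
static int toy_copen(struct toy_req *req, long cache_size)
{
    assert(req != NULL);
    if (cache_size >= 0) {
        req->error = 0;          /* OPEN succeeds */
        return 0;
    }
    if (cache_size < -TOY_MAX_ERRNO) {
        req->error = -EINVAL;    /* not a valid error code (assumed errno) */
        return -EINVAL;          /* so copen itself fails too */
    }
    req->error = cache_size;     /* OPEN fails with the daemon's error */
    return 0;                    /* but copen succeeded */
}
```

A daemon passing `-ENOENT` makes the OPEN request fail while copen returns 0; a value such as -100000 is rejected outright.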
2022-08-31 | xattr: constify value argument in vfs_setxattr() | Christian Brauner
Now that we don't perform translations directly in vfs_setxattr() anymore we can constify the @value argument in vfs_setxattr(). This also allows us to remove the hack to cast from a const in ovl_do_setxattr(). Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-31 | ovl: use vfs_set_acl_prepare() | Christian Brauner
The posix_acl_from_xattr() helper should mainly be used in i_op->get_acl() handlers. It translates from the uapi struct into the kernel internal POSIX ACL representation and doesn't care about mount idmappings. Use the vfs_set_acl_prepare() helper to generate a kernel internal POSIX ACL representation in struct posix_acl format taking care to map from the mount idmapping into the filesystem's idmapping. The returned struct posix_acl is in the correct format to be cached by the VFS or passed to the filesystem's i_op->set_acl() method to write to the backing store. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-31 | acl: move idmapping handling into posix_acl_xattr_set() | Christian Brauner
The uapi POSIX ACL struct passed through the value argument during setxattr() contains {g,u}id values encoded via ACL_{GROUP,USER} entries that should actually be stored in the form of k{g,u}id_t (See [1] for a long explanation of the issue.). In 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") we took the mount's idmapping into account in order to let overlayfs handle POSIX ACLs on idmapped layers correctly. The fixup is currently performed directly in vfs_setxattr() which piles on top of the earlier hackiness by handling the mount's idmapping and stuffing the vfs{g,u}id_t values into the uapi struct as well. While that is all correct and works fine it's just ugly. Now that we have introduced vfs_make_posix_acl() earlier move handling idmapped mounts out of vfs_setxattr() and into the POSIX ACL handler where it belongs. Note that we also need to call vfs_make_posix_acl() for EVM which interprets POSIX ACLs during security_inode_setxattr(). Leave them a longer comment for future reference. All filesystems that support idmapped mounts via FS_ALLOW_IDMAP use the standard POSIX ACL xattr handlers and are covered by this change. This includes overlayfs which simply calls vfs_{g,s}etxattr(). The following filesystems use custom POSIX ACL xattr handlers: 9p, cifs, ecryptfs, and ntfs3 (and overlayfs but we've covered that in the paragraph above) and none of them support idmapped mounts yet. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ [1] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-31 | acl: add vfs_set_acl_prepare() | Christian Brauner
Various filesystems store POSIX ACLs on the backing store in their uapi format. Such filesystems need to translate from the uapi POSIX ACL format into the VFS format during i_op->get_acl(). The VFS provides the posix_acl_from_xattr() helper for this task. But the usage of posix_acl_from_xattr() is currently ambiguous. It is intended to transform from a uapi POSIX ACL to the VFS representation. For example, when retrieving POSIX ACLs for permission checking during lookup or when calling getxattr() to retrieve system.posix_acl_{access,default}. Calling posix_acl_from_xattr() during i_op->get_acl() will map the raw {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL format into k{g,u}id_t in the filesystem's idmapping and return a struct posix_acl ready to be returned to the VFS for caching and to perform permission checks on. However, posix_acl_from_xattr() is also called during setxattr() for all filesystems that rely on the VFS-provided posix_acl_{access,default}_xattr_handler. The posix_acl_xattr_set() handler which is used for the ->set() method of posix_acl_{access,default}_xattr_handler uses posix_acl_from_xattr() to translate from the uapi POSIX ACL format to the VFS format so that it can be passed to the i_op->set_acl() handler of the filesystem or for direct caching in case no i_op->set_acl() handler is defined. During setxattr() the {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL format aren't raw {g,u}id values that need to be mapped according to the filesystem's idmapping. Instead they are {g,u}id values in the caller's idmapping which have been generated during posix_acl_fix_xattr_from_user(). In other words, they are k{g,u}id_t which are passed as raw {g,u}id values abusing the uapi POSIX ACL format (Please note that this type safety violation has existed since the introduction of k{g,u}id_t. Please see [1] for more details.).
So when posix_acl_from_xattr() is called in posix_acl_xattr_set() the filesystem idmapping is completely irrelevant. Instead, we abuse the initial idmapping to recover the k{g,u}id_t based on the value stored in raw {g,u}id as ACL_{GROUP,USER} in the uapi POSIX ACL format. We need to clearly distinguish between these two operations as it is really easy to confuse for filesystems as can be seen in ntfs3. In order to do this we factor out make_posix_acl() which takes callbacks allowing callers to pass dedicated methods to generate the correct k{g,u}id_t. This is just an internal static helper which is not exposed to any filesystems but it neatly encapsulates the basic logic of walking through a uapi POSIX ACL and returning an allocated VFS POSIX ACL with the correct k{g,u}id_t values. The posix_acl_from_xattr() helper can then be implemented as a simple call to make_posix_acl() with callbacks that generate the correct k{g,u}id_t from the raw {g,u}id values in ACL_{GROUP,USER} entries in the uapi POSIX ACL format as read from the backing store. For setxattr() we add a new helper vfs_set_acl_prepare() which has callbacks to map the POSIX ACLs from the uapi format with the k{g,u}id_t values stored in raw {g,u}id format in ACL_{GROUP,USER} entries into the correct k{g,u}id_t values in the filesystem idmapping. In contrast to posix_acl_from_xattr() the vfs_set_acl_prepare() helper needs to take the mount idmapping into account. The differences are explained in more detail in the kernel doc for the new functions. In follow up patches we will remove all abuses of posix_acl_from_xattr() for setxattr() operations and replace it with calls to vfs_set_acl_prepare(). The new vfs_set_acl_prepare() helper allows us to deal with the ambiguity in how the POSIX ACL uapi struct stores {g,u}id values depending on whether this is a getxattr() or setxattr() operation.
This also allows us to remove the posix_acl_setxattr_idmapped_mnt() helper reducing the abuse of the POSIX ACL uapi format to pass values that should be distinct types in {g,u}id values stored as ACL_{GROUP,USER} entries. The removal of posix_acl_setxattr_idmapped_mnt() in turn allows us to re-constify the value parameter of vfs_setxattr() which in turn allows us to avoid the nasty cast from a const void pointer to a non-const void pointer on ovl_do_setxattr(). Ultimately, the plan is to get rid of the type violations completely and never pass the values from k{g,u}id_t as raw {g,u}id in ACL_{GROUP,USER} entries in uapi POSIX ACL format. But that's a longer way to go and this is a preparatory step. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Co-Developed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-08-31 | acl: return EOPNOTSUPP in posix_acl_fix_xattr_common() | Christian Brauner
Return EOPNOTSUPP when the POSIX ACL version doesn't match and zero if there are no entries. This will allow us to reuse the helper in posix_acl_from_xattr(). This change will have no user visible effects. Fixes: 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-31 | ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers | Christian Brauner
The xattr code in ntfs3 is currently a bit confused. For example, it defines a POSIX ACL i_op->set_acl() method but instead of relying on the generic POSIX ACL VFS helpers it defines its own set of xattr helpers with the consequence that i_op->set_acl() is currently dead code. Switch ntfs3 to rely on the VFS POSIX ACL xattr handlers. Also remove i_op->{g,s}et_acl() methods from symlink inode operations. Symlinks don't support xattrs. This is a preliminary change for the following patches which move handling idmapped mounts directly in posix_acl_xattr_set(). This survives POSIX ACL xfstests. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
2022-08-30 | cifs: fix small mempool leak in SMB2_negotiate() | Enzo Matsumiya
In some cases of failure (dialect mismatches) in SMB2_negotiate(), after the request is sent, the checks would return -EIO when they should rather be setting rc = -EIO and jumping to neg_exit to free the response buffer from mempool. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Cc: stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
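The shape of this fix, replacing an early return with a jump to a common cleanup label, can be shown with a userspace sketch. The buffer helpers and function names are hypothetical, and a live-buffer counter stands in for the mempool; only the label name neg_exit and the -EIO code come from the commit text.

```c
#include <assert.h>
#include <errno.h>

static int toy_live_buffers;   /* buffers handed out and not yet freed */
static char toy_slot[64];      /* backing storage for the toy "pool" */

static void *toy_buf_alloc(void)
{
    toy_live_buffers++;
    return toy_slot;
}

static void toy_buf_free(void *p)
{
    (void)p;
    assert(toy_live_buffers > 0);
    toy_live_buffers--;
}

/* Leaky shape: the early return skips the free. */
static int toy_negotiate_leaky(int dialect_ok)
{
    void *rsp = toy_buf_alloc();

    if (!dialect_ok)
        return -EIO;           /* response buffer is never released */
    toy_buf_free(rsp);
    return 0;
}

/* Fixed shape: set rc and jump to the common exit path. */
static int toy_negotiate_fixed(int dialect_ok)
{
    int rc = 0;
    void *rsp = toy_buf_alloc();

    if (!dialect_ok) {
        rc = -EIO;
        goto neg_exit;
    }
    /* ... further processing of the response would go here ... */
neg_exit:
    toy_buf_free(rsp);
    return rc;
}
```

Both variants report -EIO on a dialect mismatch, but only the fixed one leaves the live-buffer count at zero.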
2022-08-30 | smb3: use filemap_write_and_wait_range instead of filemap_write_and_wait | Steve French
When doing insert range and collapse range we should be writing out the cached pages for the ranges affected but not the whole file. Fixes: c3a72bb21320 ("smb3: Move the flush out of smb2_copychunk_range() into its callers") Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-30 | userfaultfd: open userfaultfds with O_RDONLY | Ondrej Mosnacek
Since userfaultfd doesn't implement a write operation, it is more appropriate to open it read-only. When userfaultfds are opened read-write like it is now, and such fd is passed from one process to another, SELinux will check both read and write permissions for the target process, even though it can't actually do any write operation on the fd later. Inspired by the following bug report, which has hit the SELinux scenario described above: https://bugzilla.redhat.com/show_bug.cgi?id=1974559 Reported-by: Robert O'Callahan <roc@ocallahan.org> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-29 | f2fs: remove gc_urgent_high_limited for cleanup | Chao Yu
Remove redundant sbi->gc_urgent_high_limited. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-29 | f2fs: iostat: support accounting compressed IO | Chao Yu
Previously, only FS_CDATA_READ_IO type IO was accounted. This patch adds accounting for more IO types for compressed files:
 - APP_BUFFERED_CDATA_IO
 - APP_MAPPED_CDATA_IO
 - FS_CDATA_IO
 - APP_BUFFERED_CDATA_READ_IO
 - APP_MAPPED_CDATA_READ_IO
Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-29 | f2fs: use memcpy_{to,from}_page() where possible | Eric Biggers
This is simpler, and as a side effect it replaces several uses of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-29 | f2fs: fix wrong continue condition in GC | Jaegeuk Kim
We should decrease the frozen counter. Cc: stable@vger.kernel.org Fixes: 325163e9892b ("f2fs: add gc_urgent_high_remaining sysfs node") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-29 | f2fs: LFS mode does not support ATGC | Jaegeuk Kim
ATGC uses SSR, which violates the LFS mode used by zoned devices. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-29 | genetlink: start to validate reserved header bytes | Jakub Kicinski
We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
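The check this commit introduces amounts to rejecting a non-zero reserved field unless the family has opted out. The sketch below uses a struct with the same field layout as the uapi genlmsghdr (8-bit cmd, 8-bit version, 16-bit reserved); the function name, the `strict` parameter that models the opt-out, and the choice of -EINVAL are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the uapi struct genlmsghdr. */
struct toy_genlmsghdr {
    uint8_t cmd;        /* only 8 bits, so at most 256 commands */
    uint8_t version;
    uint16_t reserved;  /* must be zero so it can gain a meaning later */
};

/* `strict` models a family that has not opted out of validation. */
static int toy_check_header(const struct toy_genlmsghdr *hdr, bool strict)
{
    assert(hdr != NULL);
    if (strict && hdr->reserved != 0)
        return -EINVAL;  /* assumed error code */
    return 0;
}
```

A new family would be validated by default (`strict` true), while existing families that have always tolerated garbage in the reserved bytes would pass `strict` as false.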
2022-08-28 | smb3: fix temporary data corruption in insert range | David Howells
insert range doesn't discard the affected cached region so can risk temporarily corrupting file data. Also includes some minor cleanup (avoiding rereading inode size repeatedly unnecessarily) to make it clearer. Cc: stable@vger.kernel.org Fixes: 7fe6fe95b936 ("cifs: add FALLOC_FL_INSERT_RANGE support") Signed-off-by: David Howells <dhowells@redhat.com> cc: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-28 | smb3: fix temporary data corruption in collapse range | Steve French
collapse range doesn't discard the affected cached region so can risk temporarily corrupting the file data. This fixes xfstest generic/031. I also decided to merge a minor cleanup to this into the same patch (avoiding rereading inode size repeatedly unnecessarily) to make it clearer. Cc: stable@vger.kernel.org Fixes: 5476b5dd82c8b ("cifs: add support for FALLOC_FL_COLLAPSE_RANGE") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> cc: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-28 | smb3: Move the flush out of smb2_copychunk_range() into its callers | David Howells
Move the flush out of smb2_copychunk_range() into its callers. This will allow the pagecache to be invalidated between the flush and the operation in smb3_collapse_range() and smb3_insert_range(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-28 | Merge tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull more hotfixes from Andrew Morton: "Seventeen hotfixes. Mostly memory management things. Ten patches are cc:stable, addressing pre-6.0 issues"
* tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  .mailmap: update Luca Ceresoli's e-mail address
  mm/mprotect: only reference swap pfn page if type match
  squashfs: don't call kmalloc in decompressors
  mm/damon/dbgfs: avoid duplicate context directory creation
  mailmap: update email address for Colin King
  asm-generic: sections: refactor memory_intersects
  bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem
  ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
  Revert "memcg: cleanup racy sum avoidance code"
  mm/zsmalloc: do not attempt to free IS_ERR handle
  binder_alloc: add missing mmap_lock calls when using the VMA
  mm: re-allow pinning of zero pfns (again)
  vmcoreinfo: add kallsyms_num_syms symbol
  mailmap: update Guilherme G. Piccoli's email addresses
  writeback: avoid use-after-free after removing device
  shmem: update folio if shmem_replace_page() updates the page
  mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte