summaryrefslogtreecommitdiff
path: root/fs/cifs/dir.c
AgeCommit message (Collapse)Author
2023-05-24smb: move client and server files to common directory fs/smbSteve French
Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko and helper modules) to new fs/smb subdirectory: fs/cifs --> fs/smb/client fs/ksmbd --> fs/smb/server fs/smbfs_common --> fs/smb/common Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-22Merge tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs client updates from Steve French: "The largest subset of this is from David Howells et al: making the cifs/smb3 driver pass iov_iters down to the lowest layers, directly to the network transport rather than passing lists of pages around, helping multiple areas: - Pin user pages, thereby fixing the race between concurrent DIO read and fork, where the pages containing the DIO read buffer may end up belonging to the child process and not the parent - with the result that the parent might not see the retrieved data. - cifs shouldn't take refs on pages extracted from non-user-backed iterators (eg. KVEC). With these changes, cifs will apply the appropriate cleanup. - Making it easier to transition to using folios in cifs rather than pages by dealing with them through BVEC and XARRAY iterators. - Allowing cifs to use the new splice function The remainder are: - fixes for stable, including various fixes for uninitialized memory, wrong length field causing mount issue to very old servers, important directory lease fixes and reconnect fixes - cleanups (unused code removal, change one element array usage, and a change form strtobool to kstrtobool, and Kconfig cleanups) - SMBDIRECT (RDMA) fixes including iov_iter integration and UAF fixes - reconnect fixes - multichannel fixes, including improving channel allocation (to least used channel) - remove the last use of lock_page_killable by moving to folio_lock_killable" * tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (46 commits) update internal module version number for cifs.ko cifs: update ip_addr for ses only for primary chan setup cifs: use tcon allocation functions even for dummy tcon cifs: use the least loaded channel for sending requests cifs: DIO to/from KVEC-type iterators should now work cifs: Remove unused code cifs: Build the RDMA SGE list directly from an iterator cifs: Change the I/O paths to use an iterator rather than a page list cifs: Add a function to read into an iter from a socket cifs: Add some helper functions cifs: Add a function to Hash the contents of an iterator cifs: Add a function to build an RDMA SGE list from an iterator netfs: Add a function to extract an iterator into a scatterlist netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE splice: Export filemap/direct_splice_read() iov_iter: Add a function to extract a page list from an iterator iov_iter: Define flags to qualify page extraction. splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE splice: Add a func to do a splice from a buffered file without ITER_PIPE ...
2023-02-20cifs: Fix uninitialized memory reads for oparms.modeVolker Lendecke
Use a struct assignment with implicit member initialization Signed-off-by: Volker Lendecke <vl@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-19fs: port ->mknod() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-12-19cifs: use origin fullpath for automountsPaulo Alcantara
Use TCP_Server_Info::origin_fullpath instead of cifs_tcon::tree_name when building source paths for automounts as it will be useful for domain-based DFS referrals where the connections and referrals would get either re-used from the cache or re-created when chasing the dfs link. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-18cifs: Fix xid leak in cifs_create()Zhang Xiaoxu
If the cifs already shutdown, we should free the xid before return, otherwise, the xid will be leaked. Fixes: 087f757b0129 ("cifs: add shutdown support") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15cifs: lease key is uninitialized in smb1 pathsSteve French
It is cleaner to set lease key to zero in the places where leases are not supported (smb1 can not return lease keys so the field was uninitialized). Addresses-Coverity: 1513994 ("Uninitialized scalar variable") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-13cifs: improve symlink handling for smb2+Paulo Alcantara
When creating inode for symlink, the client used to send below requests to fill it in: * create+query_info+close (STATUS_STOPPED_ON_SYMLINK) * create(+reparse_flag)+query_info+close (set file attrs) * create+ioctl(get_reparse)+close (query reparse tag) and then for every access to the symlink dentry, the ->link() method would send another: * create+ioctl(get_reparse)+close (parse symlink) So, in order to improve: (i) Get rid of unnecessary roundtrips and then resolve symlinks as follows: * create+query_info+close (STATUS_STOPPED_ON_SYMLINK + parse symlink + get reparse tag) * create(+reparse_flag)+query_info+close (set file attrs) (ii) Set the resolved symlink target directly in inode->i_link and use simple_get_link() for ->link() to simply return it. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-05smb3: add dynamic trace points for tree disconnectSteve French
Needed this for debugging a failing xfstest. Also change camel case for "treeName" to "tree_name" in tcon struct. Example trace output (from "trace-cmd record -e smb3_tdis*"): umount-9718 [006] ..... 5909.780244: smb3_tdis_enter: xid=206 sid=0xcf38894e tid=0x3d0b8cf8 path=\\localhost\test umount-9718 [007] ..... 5909.780878: smb3_tdis_done: xid=206 sid=0xcf38894e tid=0x3d0b8cf8 Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-05cifs: when insecure legacy is disabled shrink amount of SMB1 codeSteve French
Currently much of the smb1 code is built even when CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled. Move cifssmb.c to only be compiled when insecure legacy is disabled, and move various SMB1/CIFS helper functions to that ifdef. Some functions that were not SMB1/CIFS specific needed to be moved out of cifssmb.c This shrinks cifs.ko by more than 10% which is good - but also will help with the eventual movement of the legacy code to a distinct module. Follow on patches can shrink the number of ifdefs by code restructuring where smb1 code is wedged in functions that should be calling dialect specific helper functions instead, and also by moving some functions from file.c/dir.c/inode.c into smb1 specific c files. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19cifs: Support fscache indexing rewriteDavid Howells
Change the cifs filesystem to take account of the changes to fscache's indexing rewrite and reenable caching in cifs. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. (2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For cifs, I've made it render the volume name string as: "cifs,<ipaddress>,<sharename>" where the sharename has '/' characters replaced with ';'. This probably needs rethinking a bit as the total name could exceed the maximum filename component length. Further, the coherency data is currently just set to 0. It needs something else doing with it - I wonder if it would suffice simply to sum the resource_id, vol_create_time and vol_serial_number or maybe hash them. (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before. (4) The functions to set/reset cookies are removed and fscache_use_cookie() and fscache_unuse_cookie() are used instead. fscache_use_cookie() is passed a flag to indicate if the cookie is opened for writing. fscache_unuse_cookie() is passed updates for the metadata if we changed it (ie. if the file was opened for writing). These are called when the file is opened or closed. (5) cifs_setattr_*() are made to call fscache_resize() to change the size of the cache object. (6) The functions to read and write data are stubbed out pending a conversion to use netfslib. Changes ======= ver #8: - Abstract cache invalidation into a helper function. - Fix some checkpatch warnings[3]. ver #7: - Removed the accidentally added-back call to get the super cookie in cifs_root_iget(). - Fixed the right call to cifs_fscache_get_super_cookie() to take account of the "-o fsc" mount flag. ver #6: - Moved the change of gfpflags_allow_blocking() to current_is_kswapd() for cifs here. - Fixed one of the error paths in cifs_atomic_open() to jump around the call to use the cookie. - Fixed an additional successful return in the middle of cifs_open() to use the cookie on the way out. - Only get a volume cookie (and thus inode cookies) when "-o fsc" is supplied to mount. ver #5: - Fixed a couple of bits of cookie handling[2]: - The cookie should be released in cifs_evict_inode(), not cifsFileInfo_put_final(). The cookie needs to persist beyond file closure so that writepages will be able to write to it. - fscache_use_cookie() needs to be called in cifs_atomic_open() as it is for cifs_open(). ver #4: - Fixed the use of sizeof with memset. - tcon->vol_create_time is __le64 so doesn't need cpu_to_le64(). ver #3: - Canonicalise the cifs coherency data to make the cache portable. - Set volume coherency data. ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - Upgraded to -rc4 to allow for upstream changes[1]. - fscache_acquire_volume() now returns errors. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: Steve French <smfrench@gmail.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1] Link: https://lore.kernel.org/r/3419813.1641592362@warthog.procyon.org.uk/ [2] Link: https://lore.kernel.org/r/CAH2r5muTanw9pJqzAHd01d9A8keeChkzGsCEH6=0rHutVLAF-A@mail.gmail.com/ [3] Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/3462849.1641593783@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/1318953.1642024578@warthog.procyon.org.uk/ # v6 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-13cifs: remove pathname for file from SPDX headerSteve French
checkpatch complains about source files with filenames (e.g. in these cases just below the SPDX header in comments at the top of various files in fs/cifs). It also is helpful to change this now so will be less confusing when the parent directory is renamed e.g. from fs/cifs to fs/smb_client (or fs/smbfs) Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-10cifs: use the correct max-length for dentry_path_raw()Ronnie Sahlberg
RHBZ: 1972502 PATH_MAX is 4096 but PAGE_SIZE can be >4096 on some architectures such as ppc and would thus write beyond the end of the actual object. Cc: <stable@vger.kernel.org> Reported-by: Xiaoli Feng <xifeng@redhat.com> Suggested-by: Brian foster <bfoster@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-23cifs: missing null check for newinode pointerSteve French
in cifs_do_create we check if newinode is valid before referencing it but are missing the check in one place in fs/cifs/dir.c Addresses-Coverity: 1357292 ("Dereference after null check") Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-20cifs: use SPDX-Licence-IdentifierSteve French
Add SPDX license identifier and replace license boilerplate. Corrects various checkpatch errors with the older format for noting the LGPL license. Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-20cifs: retry lookup and readdir when EAGAIN is returned.Thiago Rafael Becker
According to the investigation performed by Jacob Shivers at Red Hat, cifs_lookup and cifs_readdir leak EAGAIN when the user session is deleted on the server. Fix this issue by implementing a retry with limits, as is implemented in cifs_revalidate_dentry_attr. Reproducer based on the work by Jacob Shivers: ~~~ $ cat readdir-cifs-test.sh #!/bin/bash # Install and configure powershell and sshd on the windows # server as descibed in # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview # This script uses expect(1) USER=dude SERVER=192.168.0.2 RPATH=root PASS='password' function debug_funcs { for line in $@ ; do echo "func $line +p" > /sys/kernel/debug/dynamic_debug/control done } function setup { echo 1 > /proc/fs/cifs/cifsFYI debug_funcs wait_for_compound_request \ smb2_query_dir_first cifs_readdir \ compound_send_recv cifs_reconnect_tcon \ generic_ip_connect cifs_reconnect \ smb2_reconnect_server smb2_reconnect \ cifs_readv_from_socket cifs_readv_receive tcpdump -i eth0 -w cifs.pcap host 192.168.2.182 & sleep 5 dmesg -C } function test_call { if [[ $1 == 1 ]] ; then tracer="strace -tt -f -s 4096 -o trace-$(date -Iseconds).txt" fi # Change the command here to anything appropriate $tracer ls $2 > /dev/null res=$? if [[ $1 == 1 ]] ; then if [[ $res == 0 ]] ; then 1>&2 echo success else 1>&2 echo "failure ($res)" fi fi } mountpoint /mnt > /dev/null || mount -t cifs -o username=$USER,pass=$PASS //$SERVER/$RPATH /mnt test_call 0 /mnt/ /usr/bin/expect << EOF set timeout 60 spawn ssh $USER@$SERVER expect "yes/no" { send "yes\r" expect "*?assword" { send "$PASS\r" } } "*?assword" { send "$PASS\r" } expect ">" { send "powershell close-smbsession -force\r" } expect ">" { send "exit\r" } expect eof EOF sysctl -w vm.drop_caches=2 > /dev/null sysctl -w vm.drop_caches=2 > /dev/null setup test_call 1 /mnt/ ~~~ Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-05Merge tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "Ten CIFS/SMB3 changes - including two marked for stable - including some important multichannel fixes, as well as support for handle leases (deferred close) and shutdown support: - some important multichannel fixes - support for handle leases (deferred close) - shutdown support (which is also helpful since it enables multiple xfstests) - enable negotiating stronger encryption by default (GCM256) - improve wireshark debugging by allowing more options for root to dump decryption keys SambaXP and the SMB3 Plugfest test event are going on now so I am expecting more patches over the next few days due to extra testing (including more multichannel fixes)" * tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6: fs/cifs: Fix resource leak Cifs: Fix kernel oops caused by deferred close for files. cifs: fix regression when mounting shares with prefix paths cifs: use echo_interval even when connection not ready. cifs: detect dead connections only when echoes are enabled. smb3.1.1: allow dumping keys for multiuser mounts smb3.1.1: allow dumping GCM256 keys to improve debugging of encrypted shares cifs: add shutdown support cifs: Deferred close for files smb3.1.1: enable negotiating stronger encryption by default
2021-05-03cifs: add shutdown supportSteve French
Various filesystem support the shutdown ioctl which is used by various xfstests. The shutdown ioctl sets a flag on the superblock which prevents open, unlink, symlink, hardlink, rmdir, create etc. on the file system until unmount and remounted. The two flags supported in this patch are: FSOP_GOING_FLAGS_LOGFLUSH and FSOP_GOING_FLAGS_NOLOGFLUSH which require very little other than blocking new operations (since we do not cache writes to metadata on the client with cifs.ko). FSOP_GOING_FLAGS_DEFAULT is not supported yet, but could be added in the future but would need to call syncfs or equivalent to write out pending data on the mount. With this patch various xfstests now work including tests 043 through 046 for example. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2021-04-27Merge branch 'work.inode-type-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs inode type handling updates from Al Viro: "We should never change the type bits of ->i_mode or the method tables (->i_op and ->i_fop) of a live inode. Unfortunately, not all filesystems took care to prevent that" * 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: spufs: fix bogosity in S_ISGID handling 9p: missing chunk of "fs/9p: Don't update file type when updating file attributes" openpromfs: don't do unlock_new_inode() until the new inode is set up hostfs_mknod(): don't bother with init_special_inode() cifs: have cifs_fattr_to_inode() refuse to change type on live inode cifs: have ->mkdir() handle race with another client sanely do_cifs_create(): don't set ->i_mode of something we had not created gfs2: be careful with inode refresh ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap... vboxsf: don't allow to change the inode type afs: Fix updating of i_mode due to 3rd party change ceph: don't allow type or device number to change on non-I_NEW inodes ceph: fix up error handling with snapdirs new helper: inode_wrong_type()
2021-04-25cifs: switch build_path_from_dentry() to using dentry_path_raw()Al Viro
The cost is that we might need to flip '/' to '\\' in more than just the prefix. Needs profiling, but I suspect that we won't get slowdown on that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-25cifs: allocate buffer in the caller of build_path_from_dentry()Al Viro
build_path_from_dentry() open-codes dentry_path_raw(). The reason we can't use dentry_path_raw() in there (and postprocess the result as needed) is that the callers of build_path_from_dentry() expect that the object to be freed on cleanup and the string to be used are at the same address. That's painful, since the path is naturally built end-to-beginning - we start at the leaf and go through the ancestors, accumulating the pathname. Life would be easier if we left the buffer allocation to callers. It wouldn't be exact-sized buffer, but none of the callers keep the result for long - it's always freed before the caller returns. So there's no need to do exact-sized allocation; better use __getname()/__putname(), same as we do for pathname arguments of syscalls. What's more, there's no need to do allocation under spinlocks, so GFP_ATOMIC is not needed. Next patch will replace the open-coded dentry_path_raw() (in build_path_from_dentry_optional_prefix()) with calling the real thing. This patch only introduces wrappers for allocating/freeing the buffers and switches to new calling conventions: build_path_from_dentry(dentry, buf) expects buf to be address of a page-sized object or NULL, return value is a pathname built inside that buffer on success, ERR_PTR(-ENOMEM) if buf is NULL and ERR_PTR(-ENAMETOOLONG) if the pathname won't fit into page. Note that we don't need to check for failure when allocating the buffer in the caller - build_path_from_dentry() will do the right thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-25cifs: make build_path_from_dentry() return const char *Al Viro
... and adjust the callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-12do_cifs_create(): don't set ->i_mode of something we had not createdAl Viro
If the file had existed before we'd called ->atomic_open() (without O_EXCL, that is), we have no more business setting ->i_mode than we would setting ->i_uid or ->i_gid. We also have no business doing either if another client has managed to get unlink+mkdir between ->open() and cifs_inode_get_info(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-05cifs: report error instead of invalid when revalidating a dentry failsAurelien Aptel
Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Suggested-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> CC: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-24fs: make helpers idmap mount awareChristian Brauner
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-12-13cifs: rename smb_vol as smb3_fs_context and move it to fs_context.hRonnie Sahlberg
Harmonize and change all such variables to 'ctx', where possible. No changes to actual logic. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-12smb311: add support for using info level for posix extensions querySteve French
Adds calls to the newer info level for query info using SMB3.1.1 posix extensions. The remaining two places that call the older query info (non-SMB3.1.1 POSIX) require passing in the fid and can be updated in a later patch. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12smb311: Add support for lookup with posix extensions query infoSteve French
Improve support for lookup when using SMB3.1.1 posix mounts. Use new info level 100 (posix query info) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-03-12cifs_atomic_open(): fix double-put on late allocation failureAl Viro
several iterations of ->atomic_open() calling conventions ago, we used to need fput() if ->atomic_open() failed at some point after successful finish_open(). Now (since 2016) it's not needed - struct file carries enough state to make fput() work regardless of the point in struct file lifecycle and discarding it on failure exits in open() got unified. Unfortunately, I'd missed the fact that we had an instance of ->atomic_open() (cifs one) that used to need that fput(), as well as the stale comment in finish_open() demanding such late failure handling. Trivially fixed... Fixes: fe9ec8291fca "do_last(): take fput() on error after opening to out:" Cc: stable@kernel.org # v4.7+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-03SMB3: Backup intent flag missing from some more opsAmir Goldstein
When "backup intent" is requested on the mount (e.g. backupuid or backupgid mount options), the corresponding flag was missing from some of the operations. Change all operations to use the macro cifs_create_options() to set the backup intent flag if needed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Return directly after a failed build_path_from_dentry() in ↵Markus Elfring
cifs_do_create() Return directly after a call of the function "build_path_from_dentry" failed at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-09CIFS: Force reval dentry if LOOKUP_REVAL flag is setPavel Shilovsky
Mark inode for force revalidation if LOOKUP_REVAL flag is set. This tells the client to actually send a QueryInfo request to the server to obtain the latest metadata in case a directory or a file were changed remotely. Only do that if the client doesn't have a lease for the file to avoid unneeded round trips to the server. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16cifs: create a helper to find a writeable handle by path nameRonnie Sahlberg
rename() takes a path for old_file and in SMB2 we used to just create a compound for create(old_path)/rename/close(). If we already have a writable handle we can avoid the create() and close() altogether and just use the existing handle. For this situation, as we avoid doing the create() we also avoid triggering an oplock break for the existing handle. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27cifs: replace various strncpy with strscpy and similarRonnie Sahlberg
Using strscpy is cleaner, and avoids some problems with handling maximum length strings. Linus noticed the original problem and Aurelien pointed out some additional problems. Fortunately most of this is SMB1 code (and in particular the ASCII string handling older, which is less common). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-14CIFS: make mknod() an smb_version_opAurelien Aptel
This cleanup removes cifs specific code from SMB2/SMB3 code paths which is cleaner and easier to maintain as the code to handle special files is improved. Below is an example creating special files using 'sfu' mount option over SMB3 to Windows (with this patch) (Note that to Samba server, support for saving dos attributes has to be enabled for the SFU mount option to work). In the future this will also make implementation of creating special files as reparse points easier (as Windows NFS server does for example). root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/char character special file root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/block block special file Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-12-06cifs: Fix separator when building path from dentryPaulo Alcantara
Make sure to use the CIFS_DIR_SEP(cifs_sb) as path separator for prefixpath too. Fixes a bug with smb1 UNIX extensions. Fixes: a6b5058fafdf ("fs/cifs: make share unaccessible at root level mountable") Signed-off-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-07-12get rid of 'opened' argument of ->atomic_open() - part 3Al Viro
now it can be done... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12getting rid of 'opened' argument of ->atomic_open() - part 1Al Viro
'opened' argument of finish_open() is unused. Kill it. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2018-07-12introduce FMODE_CREATED and switch to itAl Viro
Parallel to FILE_CREATED, goes into ->f_mode instead of *opened. NFS is a bit of a wart here - it doesn't have file at the point where FILE_CREATED used to be set, so we need to propagate it there (for now). IMA is another one (here and everywhere)... Note that this needs do_dentry_open() to leave old bits in ->f_mode alone - we want it to preserve FMODE_CREATED if it had been already set (no other bit can be there). Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-04Merge tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: - smb3 fixes for stable - addition of ftrace hooks for cifs.ko - improvements in compounding and smbdirect (rdma) * tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (38 commits) CIFS: Add support for direct pages in wdata CIFS: Use offset when reading pages CIFS: Add support for direct pages in rdata cifs: update multiplex loop to handle compounded responses cifs: remove header_preamble_size where it is always 0 cifs: remove struct smb2_hdr CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiry, status STATUS_NETWORK_SESSION_EXPIRED, however the server can also respond with STATUS_USER_SESSION_DELETED in cases where the session has been idle for some time and the server reaps the session to recover resources. cifs: change smb2_get_data_area_len to take a smb2_sync_hdr as argument cifs: update smb2_calc_size to use smb2_sync_hdr instead of smb2_hdr cifs: remove struct smb2_oplock_break_rsp cifs: remove rfc1002 header from all SMB2 response structures smb3: on reconnect set PreviousSessionId field smb3: Add posix create context for smb3.11 posix mounts smb3: add tracepoints for smb2/smb3 open cifs: add debug output to show nocase mount option smb3: add define for id for posix create context and corresponding struct cifs: update smb2_check_message to handle PDUs without a 4 byte length header smb3: allow "posix" mount option to enable new SMB311 protocol extensions smb3: add support for posix negotiate context cifs: allow disabling less secure legacy dialects ...
2018-06-04Merge branch 'work.lookup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache lookup cleanups from Al Viro: "Cleaning ->lookup() instances up - mostly d_splice_alias() conversions" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits) switch the rest of procfs lookups to d_splice_alias() procfs: switch instantiate_t to d_splice_alias() don't bother with tid_fd_revalidate() in lookups proc_lookupfd_common(): don't bother with instantiate unless the file is open procfs: get rid of ancient BS in pid_revalidate() uses cifs_lookup(): switch to d_splice_alias() cifs_lookup(): cifs_get_inode_...() never returns 0 with *inode left NULL 9p: unify paths in v9fs_vfs_lookup() ncp_lookup(): use d_splice_alias() hfsplus: switch to d_splice_alias() hfs: don't allow mounting over .../rsrc hfs: use d_splice_alias() omfs_lookup(): report IO errors, use d_splice_alias() orangefs_lookup: simplify openpromfs: switch to d_splice_alias() xfs_vn_lookup: simplify a bit adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias() adfs_lookup_byname: .. *is* taken care of in fs/namei.c romfs_lookup: switch to d_splice_alias() qnx6_lookup: switch to d_splice_alias() ...
2018-05-31smb3: Add posix create context for smb3.11 posix mountsSteve French
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-22cifs_lookup(): switch to d_splice_alias()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22cifs_lookup(): cifs_get_inode_...() never returns 0 with *inode left NULLAl Viro
not since 2004... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-20cifs: do not allow creating sockets except with SMB1 posix exensionsSteve French
RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org
2017-10-30cifs: check MaxPathNameComponentLength != 0 before using itRonnie Sahlberg
And fix tcon leak in error path. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: David Disseldorp <ddiss@samba.org>
2017-08-30CIFS: remove endian related sparse warningSteve French
Recent patch had an endian warning ie cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() Signed-off-by: Steve French <smfrench@gmail.com> CC: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2017-08-23cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()Ronnie Sahlberg
Add checking for the path component length and verify it is <= the maximum that the server advertizes via FileFsAttributeInformation. With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT when users to access an overlong path. To test this, try to cd into a (non-existing) directory on a CIFS share that has a too long name: cd /mnt/aaaaaaaaaaaaaaa... and it now should show a good error message from the shell: bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long rh bz 1153996 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org>