summaryrefslogtreecommitdiff
path: root/fs/ioctl.c
AgeCommit message (Collapse)Author
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-19fs: Fix kernel-doc warningsMatthew Wilcox (Oracle)
These have a variety of causes and a corresponding variety of solutions. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.2720007-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-17fs: distinguish between user initiated freeze and kernel initiated freezeDarrick J. Wong
Userspace can freeze a filesystem using the FIFREEZE ioctl or by suspending the block device; this state persists until userspace thaws the filesystem with the FITHAW ioctl or resuming the block device. Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for the fsfreeze ioctl") we only allow the first freeze command to succeed. The kernel may decide that it is necessary to freeze a filesystem for its own internal purposes, such as suspends in progress, filesystem fsck activities, or quiescing a device prior to removal. Userspace thaw commands must never break a kernel freeze, and kernel thaw commands shouldn't undo userspace's freeze command. Introduce a couple of freeze holder flags and wire it into the sb_writers state. One kernel and one userspace freeze are allowed to coexist at the same time; the filesystem will not thaw until both are lifted. I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing behaviors. Cc: mcgrof@kernel.org Cc: jack@suse.cz Cc: hch@infradead.org Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2023-01-19fs: port inode_owner_or_capable() to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->fileattr_set() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-04-01Merge tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull vfs fix from Darrick Wong: "The erofs developers felt that FIEMAP should handle ranged requests starting at s_maxbytes by returning EFBIG instead of passing the filesystem implementation a nonsense 0-byte request. Not sure why they keep tagging this 'iomap', but the VFS shouldn't be asking for information about ranges of a file that the filesystem already declared that it does not support. - Fix a potential infinite loop in FIEMAP by fixing an off by one error when comparing the requested range against s_maxbytes" * tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: fix an infinite loop in iomap_fiemap
2022-03-30fs: fix an infinite loop in iomap_fiemapGuo Xuenan
when get fiemap starting from MAX_LFS_FILESIZE, (maxbytes - *len) < start will always true , then *len set zero. because of start offset is beyond file size, for erofs filesystem it will always return iomap.length with zero,iomap iterate will enter infinite loop. it is necessary cover this corner case to avoid this situation. ------------[ cut here ]------------ WARNING: CPU: 7 PID: 905 at fs/iomap/iter.c:35 iomap_iter+0x97f/0xc70 Modules linked in: xfs erofs CPU: 7 PID: 905 Comm: iomap Tainted: G W 5.17.0-rc8 #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:iomap_iter+0x97f/0xc70 Code: 85 a1 fc ff ff e8 71 be 9c ff 0f 1f 44 00 00 e9 92 fc ff ff e8 62 be 9c ff 0f 0b b8 fb ff ff ff e9 fc f8 ff ff e8 51 be 9c ff <0f> 0b e9 2b fc ff ff e8 45 be 9c ff 0f 0b e9 e1 fb ff ff e8 39 be RSP: 0018:ffff888060a37ab0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888060a37bb0 RCX: 0000000000000000 RDX: ffff88807e19a900 RSI: ffffffff81a7da7f RDI: ffff888060a37be0 RBP: 7fffffffffffffff R08: 0000000000000000 R09: ffff888060a37c20 R10: ffff888060a37c67 R11: ffffed100c146f8c R12: 7fffffffffffffff R13: 0000000000000000 R14: ffff888060a37bd8 R15: ffff888060a37c20 FS: 00007fd3cca01540(0000) GS:ffff888108780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020010820 CR3: 0000000054b92000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> iomap_fiemap+0x1c9/0x2f0 erofs_fiemap+0x64/0x90 [erofs] do_vfs_ioctl+0x40d/0x12e0 __x64_sys_ioctl+0xaa/0x1c0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]--- watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [iomap:905] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [djwong: fix some typos] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-14fs: allow cross-vfsmount reflink/dedupeJosef Bacik
Currently we disallow reflink and dedupe if the two files aren't on the same vfsmount. However we really only need to disallow it if they're not on the same super block. It is very common for btrfs to have a main subvolume that is mounted and then different subvolumes mounted at different locations. It's allowed to reflink between these volumes, but the vfsmount check disallows this. Instead fix dedupe to check for the same superblock, and simply remove the vfsmount check for reflink as it already does the superblock check. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-15fs/ioctl: remove unnecessary __user annotationAmit Daniel Kachhap
__user annotations are used by the checker (e.g sparse) to mark user pointers. However here __user is applied to a struct directly, without a pointer being directly involved. Although the presence of __user does not cause sparse to emit a warning, __user should be removed for consistency with other uses of offsetof(). Note: No functional changes intended. Link: https://lkml.kernel.org/r/20211122101256.7875-1-amit.kachhap@arm.com Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-31Merge tag 'vfs-5.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull project quota update from Darrick Wong: "A single VFS patch that prevents userspace from setting project quota ids on files that the VFS considers invalid" * tag 'vfs-5.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: forbid invalid project ID
2021-08-03fs: forbid invalid project IDWang Shilong
fileattr_set_prepare() should check if project ID is valid, otherwise dqget() will return NULL for such project ID quota. Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-27fs: remove generic_block_fiemapChristoph Hellwig
Remove the now unused generic_block_fiemap helper. Link: https://lore.kernel.org/r/20210720133341.405438-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2021-04-12vfs: add fileattr opsMiklos Szeredi
There's a substantial amount of boilerplate in filesystems handling FS_IOC_[GS]ETFLAGS/ FS_IOC_FS[GS]ETXATTR ioctls. Also due to userspace buffers being involved in the ioctl API this is difficult to stack, as shown by overlayfs issues related to these ioctls. Introduce a new internal API named "fileattr" (fsxattr can be confused with xattr, xflags is inappropriate, since this is more than just flags). There's significant overlap between flags and xflags and this API handles the conversions automatically, so filesystems may choose which one to use. In ->fileattr_get() a hint is provided to the filesystem whether flags or xattr are being requested by userspace, but in this series this hint is ignored by all filesystems, since generating all the attributes is cheap. If a filesystem doesn't implemement the fileattr API, just fall back to f_op->ioctl(). When all filesystems are converted, the fallback can be removed. 32bit compat ioctls are now handled by the generic code as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-31fs: remove ksys_ioctlChristoph Hellwig
Fold it into the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03fs: remove the access_ok() check in ioctl_fiemapChristoph Hellwig
access_ok just checks we are fed a proper user pointer. We also do that in copy_to_user itself, so no need to do this early. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-9-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03fs: handle FIEMAP_FLAG_SYNC in fiemap_prepChristoph Hellwig
By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is handled once instead of duplicated, but can still be done under fs locks, like xfs/iomap intended with its duplicate handling. Also make sure the error value of filemap_write_and_wait is propagated to user space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03fs: move fiemap range validation into the file systems instancesChristoph Hellwig
Replace fiemap_check_flags with a fiemap_prep helper that also takes the inode and mapped range, and performs the sanity check and truncation previously done in fiemap_check_range. This way the validation is inside the file system itself and thus properly works for the stacked overlayfs case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03fs: move the fiemap definitions out of fs.hChristoph Hellwig
No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03fs: mark __generic_block_fiemap staticChristoph Hellwig
There is no caller left outside of ioctl.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-4-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-04-30fibmap: Warn and return an error in case of block > INT_MAXRitesh Harjani
We better warn the fibmap user and not return a truncated and therefore an incorrect block map address if the bmap() returned block address is greater than INT_MAX (since user supplied integer pointer). It's better to pr_warn() all user of ioctl_fibmap() and return a proper error code rather than silently letting a FS corruption happen if the user tries to fiddle around with the returned block map address. We fix this by returning an error code of -ERANGE and returning 0 as the block mapping address in case if it is > INT_MAX. Now iomap_bmap() could be called from either of these two paths. Either when a user is calling an ioctl_fibmap() interface to get the block mapping address or by some filesystem via use of bmap() internal kernel API. bmap() kernel API is well equipped with handling of u64 addresses. WARN condition in iomap_bmap_actor() was mainly added to warn all the fibmap users. But now that we have directly added this warning for all fibmap users and also made sure to return 0 as block map address in case if addr > INT_MAX. So we can now remove this logic from iomap_bmap_actor(). Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-02-08Merge tag 'compat-ioctl-fix' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull compat-ioctl fix from Arnd Bergmann: "One patch in the compat-ioctl series broke 32-bit rootfs for multiple people testing on 64-bit kernels. Let's fix it in -rc1 before others run into the same issue" * tag 'compat-ioctl-fix' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: compat_ioctl: fix FIONREAD on devices
2020-02-08Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: - bmap series from cmaiolino - getting rid of convolutions in copy_mount_options() (use a couple of copy_from_user() instead of the __get_user() crap) * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: saner copy_mount_options() fibmap: Reject negative block numbers fibmap: Use bmap instead of ->bmap method in ioctl_fibmap ecryptfs: drop direct calls to ->bmap cachefiles: drop direct usage of ->bmap method. fs: Enable bmap() function to properly return errors
2020-02-08compat_ioctl: fix FIONREAD on devicesArnd Bergmann
My final cleanup patch for sys_compat_ioctl() introduced a regression on the FIONREAD ioctl command, which is used for both regular and special files, but only works on regular files after my patch, as I had missed the warning that Al Viro put into a comment right above it. Change it back so it can work on any file again by moving the implementation to do_vfs_ioctl() instead. Fixes: 77b9040195de ("compat_ioctl: simplify the implementation") Reported-and-tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Reported-and-tested-by: youling257 <youling257@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-02-03fibmap: Reject negative block numbersCarlos Maiolino
FIBMAP receives an integer from userspace which is then implicitly converted into sector_t to be passed to bmap(). No check is made to ensure userspace didn't send a negative block number, which can end up in an underflow, and returning to userspace a corrupted block address. As a side-effect, the underflow caused by a negative block here, will trigger the WARN() in iomap_bmap_actor(), which is how this issue was first discovered. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-03fibmap: Use bmap instead of ->bmap method in ioctl_fibmapCarlos Maiolino
Now we have the possibility of proper error return in bmap, use bmap() function in ioctl_fibmap() instead of calling ->bmap method directly. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-03compat_ioctl: simplify the implementationArnd Bergmann
Now that both native and compat ioctl syscalls are in the same file, a couple of simplifications can be made, bringing the implementation closer together: - do_vfs_ioctl(), ioctl_preallocate(), and compat_ioctl_preallocate() can become static, allowing the compiler to optimize better - slightly update the coding style for consistency between the functions. - rather than listing each command in two switch statements for the compat case, just call a single function that has all the common commands. As a side-effect, FS_IOC_RESVSP/FS_IOC_RESVSP64 are now available to x86 compat tasks, along with FS_IOC_RESVSP_32/FS_IOC_RESVSP64_32. This is harmless for i386 emulation, and can be considered a bugfix for x32 emulation, which never supported these in the past. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-01-03compat_ioctl: move sys_compat_ioctl() to ioctl.cArnd Bergmann
The rest of the fs/compat_ioctl.c file is no longer useful now, so move the actual syscall as planned. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-02Merge tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS updates from Darrick Wong: "For this release, we changed quite a few things. Highlights: - Fixed some long tail latency problems in the block allocator - Removed some long deprecated (and for the past several years no-op) mount options and ioctls - Strengthened the extended attribute and directory verifiers - Audited and fixed all the places where we could return EFSCORRUPTED without logging anything - Refactored the old SGI space allocation ioctls to make the equivalent fallocate calls - Fixed a race between fallocate and directio - Fixed an integer overflow when files have more than a few billion(!) extents - Fixed a longstanding bug where quota accounting could be incorrect when performing unwritten extent conversion on a freshly mounted fs - Fixed various complaints in scrub about soft lockups and unresponsiveness to signals - De-vtable'd the directory handling code, which should make it faster - Converted to the new mount api, for better or for worse - Cleaned up some memory leaks and quite a lot of other smaller fixes and cleanups. A more detailed summary: - Fill out the build string - Prevent inode fork extent count overflows - Refactor the allocator to reduce long tail latency - Rework incore log locking a little to reduce spinning - Break up the xfs_iomap_begin functions into smaller more cohesive parts - Fix allocation alignment being dropped too early when the allocation request is for more blocks than an AG is large - Other small cleanups - Clean up file buftarg retrieval helpers - Hoist the resvsp and unresvsp ioctls to the vfs - Remove the undocumented biosize mount option, since it has never been mentioned as existing or supported on linux - Clean up some of the mount option printing and parsing - Enhance attr leaf verifier to check block structure - Check dirent and attr names for invalid characters before passing them to the vfs - Refactor open-coded bmbt walking - Fix a few places where we return EIO instead of EFSCORRUPTED after failing metadata sanity checks - Fix a synchronization problem between fallocate and aio dio corrupting the file length - Clean up various loose ends in the iomap and bmap code - Convert to the new mount api - Make sure we always log something when returning EFSCORRUPTED - Fix some problems where long running scrub loops could trigger soft lockup warnings and/or fail to exit due to fatal signals pending - Fix various Coverity complaints - Remove most of the function pointers from the directory code to reduce indirection penalties - Ensure that dquots are attached to the inode when performing unwritten extent conversion after io - Deuglify incore projid and crtime types - Fix another AGI/AGF locking order deadlock when renaming - Clean up some quota typedefs - Remove the FSSETDM ioctls which haven't done anything in 20 years - Fix some memory leaks when mounting the log fails - Fix an underflow when updating an xattr leaf freemap - Remove some trivial wrappers - Report metadata corruption as an error, not a (potentially) fatal assertion - Clean up the dir/attr buffer mapping code - Allow fatal signals to kill scrub during parent pointer checks" * tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (198 commits) xfs: allow parent directory scans to be interrupted with fatal signals xfs: remove the mappedbno argument to xfs_da_get_buf xfs: remove the mappedbno argument to xfs_da_read_buf xfs: split xfs_da3_node_read xfs: remove the mappedbno argument to xfs_dir3_leafn_read xfs: remove the mappedbno argument to xfs_dir3_leaf_read xfs: remove the mappedbno argument to xfs_attr3_leaf_read xfs: remove the mappedbno argument to xfs_da_reada_buf xfs: improve the xfs_dabuf_map calling conventions xfs: refactor xfs_dabuf_map xfs: simplify mappedbno handling in xfs_da_{get,read}_buf xfs: report corruption only as a regular error xfs: Remove kmem_zone_free() wrapper xfs: Remove kmem_zone_destroy() wrapper xfs: Remove slab init wrappers xfs: fix attr leaf header freemap.size underflow xfs: fix some memory leaks in log recovery xfs: fix another missing include xfs: remove XFS_IOC_FSSETDM and XFS_IOC_FSSETDM_BY_HANDLE xfs: remove duplicated include from xfs_dir2_data.c ...
2019-10-28fs: add generic UNRESVSP and ZERO_RANGE ioctl handlersChristoph Hellwig
These use the same scheme as the pre-existing mapping of the XFS RESVP ioctls to ->falloc, so just extend it and remove the XFS implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: fix compile error on s390] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-23compat: move FS_IOC_RESVSP_32 handling to fs/ioctl.cAl Viro
... and lose the ridiculous games with compat_alloc_user_space() there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23do_vfs_ioctl(): use saner typesAl Viro
casting to pointer to int, only to pass that to function that takes pointer to void and uses it as pointer to structure is really asking for trouble. "Some pointer, I'm not sure what to" is spelled "void *", not "int *"; use that. And declare the functions we are passing that pointer to as taking the pointer to what they really want to access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23compat_ioctl: add compat_ptr_ioctl()Arnd Bergmann
Many drivers have ioctl() handlers that are completely compatible between 32-bit and 64-bit architectures, except for the argument that is passed down from user space and may have to be passed through compat_ptr() in order to become a valid 64-bit pointer. Using ".compat_ptr = compat_ptr_ioctl" in file operations should let us simplify a lot of those drivers to avoid #ifdef checks, and convert additional drivers that don't have proper compat handling yet. On most architectures, the compat_ptr_ioctl() just passes all arguments to the corresponding ->ioctl handler. The exception is arch/s390, where compat_ptr() clears the top bit of a 32-bit pointer value, so user space pointers to the second 2GB alias the first 2GB, as is the case for native 32-bit s390 user space. The compat_ptr_ioctl() function must therefore be used only with ioctl functions that either ignore the argument or pass a pointer to a compatible data type. If any ioctl command handled by fops->unlocked_ioctl passes a plain integer instead of a pointer, or any of the passed data types is incompatible between 32-bit and 64-bit architectures, a proper handler is required instead of compat_ptr_ioctl. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- v3: add a better description v2: use compat_ptr_ioctl instead of generic_compat_ioctl_ptrarg, as suggested by Al Viro
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-02Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull vfs dedup fixes from Dave Chinner: "This reworks the vfs data cloning infrastructure. We discovered many issues with these interfaces late in the 4.19 cycle - the worst of them (data corruption, setuid stripping) were fixed for XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all the problems was needed. That rework is the contents of this pull request. Rework the vfs_clone_file_range and vfs_dedupe_file_range infrastructure to use a common .remap_file_range method and supply generic bounds and sanity checking functions that are shared with the data write path. The current VFS infrastructure has problems with rlimit, LFS file sizes, file time stamps, maximum filesystem file sizes, stripping setuid bits, etc and so they are addressed in these commits. We also introduce the ability for the ->remap_file_range methods to return short clones so that clones for vfs_copy_file_range() don't get rejected if the entire range can't be cloned. It also allows filesystems to sliently skip deduplication of partial EOF blocks if they are not capable of doing so without requiring errors to be thrown to userspace. Existing filesystems are converted to user the new remap_file_range method, and both XFS and ocfs2 are modified to make use of the new generic checking infrastructure" * tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits) xfs: remove [cm]time update from reflink calls xfs: remove xfs_reflink_remap_range xfs: remove redundant remap partial EOF block checks xfs: support returning partial reflink results xfs: clean up xfs_reflink_remap_blocks call site xfs: fix pagecache truncation prior to reflink ocfs2: remove ocfs2_reflink_remap_range ocfs2: support partial clone range and dedupe range ocfs2: fix pagecache truncation prior to reflink ocfs2: truncate page cache for clone destination file before remapping vfs: clean up generic_remap_file_range_prep return value vfs: hide file range comparison function vfs: enable remap callers that can handle short operations vfs: plumb remap flags through the vfs dedupe functions vfs: plumb remap flags through the vfs clone functions vfs: make remap_file_range functions take and return bytes completed vfs: remap helper should update destination inode metadata vfs: pass remap flags to generic_remap_checks vfs: pass remap flags to generic_remap_file_range_prep vfs: combine the clone and dedupe into a single remap_file_range ...
2018-10-30vfs: plumb remap flags through the vfs clone functionsDarrick J. Wong
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: make remap_file_range functions take and return bytes completedDarrick J. Wong
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner. A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative. Neither clone ioctl can take advantage of this, alas. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-26vfs: fix FIGETBSZ ioctl on an overlayfs fileAmir Goldstein
Some anon_bdev filesystems (e.g. overlayfs, ceph) don't have s_blocksize set. Returning zero from FIGETBSZ ioctl results in a Floating point exception from the e2fsprogs utility filefrag, which divides the size of the file with the value returned by FIGETBSZ. Fix the interface by returning -EINVAL for these filesystems. Fixes: d1d04ef8572b ("ovl: stack file ops") Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-24vfs: swap names of {do,vfs}_clone_file_range()Amir Goldstein
Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze protection") created a wrapper do_clone_file_range() around vfs_clone_file_range() moving the freeze protection to former, so overlayfs could call the latter. The more common vfs practice is to call do_xxx helpers from vfs_xxx helpers, where freeze protecction is taken in the vfs_xxx helper, so this anomality could be a source of confusion. It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup support") may have fallen a victim to this confusion - ovl_clone_file_range() calls the vfs_clone_file_range() helper in the hope of getting freeze protection on upper fs, but in fact results in overlayfs allowing to bypass upper fs freeze protection. Swap the names of the two helpers to conform to common vfs practice and call the correct helpers from overlayfs and nfsd. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18vfs: export vfs_ioctl() to modulesMiklos Szeredi
This is needed by the stacked ioctl implementation in overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-24fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystemsSeth Forshee
The user in control of a super block should be allowed to freeze and thaw it. Relax the restrictions on the FIFREEZE and FITHAW ioctls to require CAP_SYS_ADMIN in s_user_ns. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-04-02fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_ioctl() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_ioctl(). After careful review, at least some of these calls could be converted to do_vfs_ioctl() in future. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-02sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API ↵Ingo Molnar
dependency Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-16vfs: call vfs_clone_file_range() under freeze protectionAmir Goldstein
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the ioctl handler. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16vfs: allow vfs_clone_file_range() across mount pointsAmir Goldstein
FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest files are not on the same mount point. Practically, clone only requires that src and dest files are on the same file system. Move the check for same mount point to ioctl handler and keep only the check for same super block in the vfs helper. A following patch is going to use the vfs_clone_file_range() helper in overlayfs to copy up between lower and upper mount points on the same file system. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-15vfs: cap dedupe request structure size at PAGE_SIZEDarrick J. Wong
Kirill A Shutemov reports that the kernel doesn't try to cap dest_count in any way, and uses the number to allocate kernel memory. This causes high order allocation warnings in the kernel log if someone passes in a big enough value. We should clamp the allocation at PAGE_SIZE to avoid stressing the VM. The two existing users of the dedupe ioctl never send more than 120 requests, so we can safely clamp dest_range at PAGE_SIZE, because with 4k pages we can handle up to 127 dedupe candidates. Given the max extent length of 16MB, we can end up doing 2GB of IO which is plenty. [ Note: the "offsetof()" can't overflow, because 'count' is just a 16-bit integer. That's not obvious in the limited context of the patch, so I'm noting it here because it made me go look. - Linus ] Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-15vfs: fix return type of ioctl_file_dedupe_rangeDarrick J. Wong
All the VFS functions in the dedupe ioctl path return int status, so the ioctl handler ought to as well. Found by Coverity, CID 1350952. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28vfs: ioctl: prevent double-fetch in dedupe ioctlScott Bauer
This prevents a double-fetch from user space that can lead to to an undersized allocation and heap overflow. Fixes: 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs") Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-12Merge branch 'work.copy_file_range' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper