summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-01-09fs: move guard_bio_eod() after bio_set_op_attrsMing Lei
Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod") adds bio_truncate() for handling bio EOD. However, bio_truncate() doesn't use the passed 'op' parameter from guard_bio_eod's callers. So bio_trunacate() may retrieve wrong 'op', and zering pages may not be done for READ bio. Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs() in submit_bh_wbc() so that bio_truncate() can always retrieve correct op info. Meantime remove the 'op' parameter from guard_bio_eod() because it isn't used any more. Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod") Signed-off-by: Ming Lei <ming.lei@redhat.com> Fold in kerneldoc and bio_op() change. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-28block: add bio_truncate to fix guard_bio_eodMing Lei
Some filesystem, such as vfat, may send bio which crosses device boundary, and the worse thing is that the IO request starting within device boundaries can contain more than one segment past EOD. Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") tries to fix this issue by returning -EIO for this situation. However, this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb() may hang for ever. Also the current truncating on last segment is dangerous by updating the last bvec, given bvec table becomes not immutable any more, and fs bio users may not retrieve the truncated pages via bio_for_each_segment_all() in its .end_io callback. Fixes this issue by supporting multi-segment truncating. And the approach is simpler: - just update bio size since block layer can make correct bvec with the updated bio size. Then bvec table becomes really immutable. - zero all truncated segments for read bio Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-16Merge tag 'linux-kselftest-5.5-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - ftrace and safesetid test fixes from Masami Hiramatsu - Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim - Kselftest framework fixes from SeongJae Park and Michael Ellerman * tag 'linux-kselftest-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kselftest: Support old perl versions kselftest/runner: Print new line in print of timeout log selftests: Fix dangling documentation references to kselftest_module.sh Documentation: kunit: add documentation for kunit_tool Documentation: kunit: fix typos and gramatical errors kunit: testing kunit: Bug fix in test_run_timeout function fs/ext4/inode-test: Fix inode test on 32 bit platforms. selftests: safesetid: Fix Makefile to set correct test program selftests: safesetid: Check the return value of setuid/setgid selftests: safesetid: Move link library to LDLIBS selftests/ftrace: Fix multiple kprobe testcase selftests/ftrace: Do not to use absolute debugfs path selftests/ftrace: Fix ftrace test cases to check unsupported selftests/ftrace: Fix to check the existence of set_ftrace_filter
2019-12-15Merge branch 'remove-ksys-mount-dup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull ksys_mount() and ksys_dup() removal from Dominik Brodowski: "This small series replaces all in-kernel calls to the userspace-focused ksys_mount() and ksys_dup() with calls to kernel-centric functions: For each replacement of ksys_mount() with do_mount(), one needs to verify that the first and third parameter (char *dev_name, char *type) are strings allocated in kernelspace and that the fifth parameter (void *data) is either NULL or refers to a full page (only occurence in init/do_mounts.c::do_mount_root()). The second and fourth parameters (char *dir_name, unsigned long flags) are passed by ksys_mount() to do_mount() unchanged, and therefore do not require particular care. Moreover, instead of pretending to be userspace, the opening of /dev/console as stdin/stdout/stderr can be implemented using in-kernel functions as well. Thereby, ksys_dup() can be removed for good" [ This doesn't get rid of the special "kernel init runs with KERNEL_DS" case, but it at least removes _some_ of the users of "treat kernel pointers as user pointers for our magical init sequence". One day we'll hopefully be rid of it all, and can initialize our init_thread addr_limit to USER_DS. - Linus ] * 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: fs: remove ksys_dup() init: unify opening /dev/console as stdin/stdout/stderr init: use do_mount() instead of ksys_mount() initrd: use do_mount() instead of ksys_mount() devtmpfs: use do_mount() instead of ksys_mount()
2019-12-14Merge tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cfis fixes from Steve French: "Three small smb3 fixes: this addresses two recent issues reported in additional testing during rc1, a refcount underflow and a problem with an intermittent crash in SMB2_open_init" * tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Close cached root handle only if it has a lease SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding path smb3: fix refcount underflow warning on unmount when no directory leases
2019-12-14Merge tag 'ovl-fixes-5.5-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix some bugs and documentation" * tag 'ovl-fixes-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: docs: filesystems: overlayfs: Fix restview warnings docs: filesystems: overlayfs: Rename overlayfs.txt to .rst ovl: relax WARN_ON() on rename to self ovl: fix corner case of non-unique st_dev;st_ino ovl: don't use a temp buf for encoding real fh ovl: make sure that real fid is 32bit aligned in memory ovl: fix lookup failure on multi lower squashfs
2019-12-13Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that don't sever if the result is < 0. This is mostly for linked timeouts, where if we ask for a pure timeout we always get -ETIME. This makes links useless for that case, hence allow a case where it works. - Five minor optimizations to fix and improve cases that regressed since v5.4. - An SQTHREAD locking fix. - A sendmsg/recvmsg iov assignment fix. - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and subsequently ensuring that works for io_uring. - Fix a case where for an invalid opcode we might return -EBADF instead of -EINVAL, if the ->fd of that sqe was set to an invalid fd value. * tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block: io_uring: ensure we return -EINVAL on unknown opcode io_uring: add sockets to list of files that support non-blocking issue net: make socket read/write_iter() honor IOCB_NOWAIT io_uring: only hash regular files for async work execution io_uring: run next sqe inline if possible io_uring: don't dynamically allocate poll data io_uring: deferred send/recvmsg should assign iov io_uring: sqthread should grab ctx->uring_lock for submissions io-wq: briefly spin for new work after finishing work io-wq: remove worker->wait waitqueue io_uring: allow unbreakable links
2019-12-13Merge tag 'sizeof_field-v5.5-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull FIELD_SIZEOF conversion from Kees Cook: "A mostly mechanical treewide conversion from FIELD_SIZEOF() to sizeof_field(). This avoids the redundancy of having 2 macros (actually 3) doing the same thing, and consolidates on sizeof_field(). While "field" is not an accurate name, it is the common name used in the kernel, and doesn't result in any unintended innuendo. As there are still users of FIELD_SIZEOF() in -next, I will clean up those during this coming development cycle and send the final old macro removal patch at that time" * tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Use sizeof_field() macro MIPS: OCTEON: Replace SIZEOF_FIELD() macro
2019-12-13CIFS: Close cached root handle only if it has a leasePavel Shilovsky
SMB2_tdis() checks if a root handle is valid in order to decide whether it needs to close the handle or not. However if another thread has reference for the handle, it may end up with putting the reference twice. The extra reference that we want to put during the tree disconnect is the reference that has a directory lease. So, track the fact that we have a directory lease and close the handle only in that case. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-13SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding pathSteve French
Ran into an intermittent crash in SMB2_open_init+0x2f6/0x970 due to oparms.cifs_sb not being initialized when called from: smb2_compound_op+0x45d/0x1690 Zero the whole oparms struct in the compounding path before setting up the oparms so we don't risk any uninitialized fields. Fixes: fdef665ba44a ("smb3: fix mode passed in on create for modetosid mount option") Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-12Merge tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A fix to avoid a corner case when scheduling cap reclaim in batches from Xiubo, a patch to add some observability into cap waiters from Jeff and a couple of cleanups" * tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client: ceph: add more debug info when decoding mdsmap ceph: switch to global cap helper ceph: trigger the reclaim work once there has enough pending caps ceph: show tasks waiting on caps in debugfs caps file ceph: convert int fields in ceph_mount_options to unsigned int
2019-12-12fs: remove ksys_dup()Dominik Brodowski
ksys_dup() is used only at one place in the kernel, namely to duplicate fd 0 of /dev/console to stdout and stderr. The same functionality can be achieved by using functions already available within the kernel namespace. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-12-12init: use do_mount() instead of ksys_mount()Dominik Brodowski
In prepare_namespace(), do_mount() can be used instead of ksys_mount() as the first and third argument are const strings in the kernel, the second and fourth argument are passed through anyway, and the fifth argument is NULL. In do_mount_root(), ksys_mount() is called with the first and third argument being already kernelspace strings, which do not need to be copied over from userspace to kernelspace (again). The second and fourth arguments are passed through to do_mount() anyway. The fifth argument, while already residing in kernelspace, needs to be put into a page of its own. Then, do_mount() can be used instead of ksys_mount(). Once this is done, there are no in-kernel users to ksys_mount() left, which can therefore be removed. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-12-11Merge tag 'afs-fixes-20191211' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fixes for AFS plus one patch to make debugging easier: - Fix how addresses are matched to server records. This is currently incorrect which means cache invalidation callbacks from the server don't necessarily get delivered correctly. This causes stale data and metadata to be seen under some circumstances. - Make the dynamic root superblock R/W so that rpm/dnf can reapply the SELinux label to it when upgrading the Fedora filesystem-afs package. If the filesystem is R/O, this fails and the upgrade fails. It might be better in future to allow setxattr from an LSM to bypass the R/O protections, if only for pseudo-filesystems. - Fix the parsing of mountpoint strings. The mountpoint object has to have a terminal dot, whereas the source/device string passed to mount should not. This confuses type-forcing suffix detection leading to the wrong volume variant being mounted. - Make lookups in the dynamic root superblock for creation events (such as mkdir) fail with EOPNOTSUPP rather than something like EEXIST. The dynamic root only allows implicit creation by the ->lookup() method - and only if the target cell exists. - Fix the looking up of an AFS superblock to include the cell in the matching key - otherwise all volumes with the same ID number are treated as the same thing, irrespective of which cell they're in. - Show the volume name of each volume in the volume records displayed in /proc/net/afs/<cell>/volumes. This proved useful in debugging as it provides a way to map the volume IDs to names, where the names are what appear in /proc/mounts" * tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Show volume name in /proc/net/afs/<cell>/volumes afs: Fix missing cell comparison in afs_test_super() afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP afs: Fix mountpoint parsing afs: Fix SELinux setting security label on /afs afs: Fix afs_find_server lookups for ipv4 peers
2019-12-11io_uring: ensure we return -EINVAL on unknown opcodeJens Axboe
If we submit an unknown opcode and have fd == -1, io_op_needs_file() will return true as we default to needing a file. Then when we go and assign the file, we find the 'fd' invalid and return -EBADF. We really should be returning -EINVAL for that case, as we normally do for unsupported opcodes. Change io_op_needs_file() to have the following return values: 0 - does not need a file 1 - does need a file < 0 - error value and use this to pass back the right value for this invalid case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-11Merge tag 'erofs-for-5.5-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Mainly address a regression reported by David recently observed together with overlayfs due to the improper return value of listxattr() without xattr. Update outdated expressions in document as well. Summary: - Fix improper return value of listxattr() with no xattr - Keep up documentation with latest code" * tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: update documentation erofs: zero out when listxattr is called with no xattr
2019-12-11pipe: simplify signal handling in pipe_read() and add commentsLinus Torvalds
There's no need to separately check for signals while inside the locked region, since we're going to do "wait_event_interruptible()" right afterwards anyway, and the error handling is much simpler there. The check for whether we had already read anything was also redundant, since we no longer do the odd merging of reads when there are pending writers. But perhaps more importantly, this adds commentary about why we still need to wake up possible writers even though we didn't read any data, and why we can skip all the finishing touches now if we get a signal (or had a signal pending) while waiting for more data. [ This is a split-out cleanup from my "make pipe IO use exclusive wait queues" thing, which I can't apply because it triggers a nasty bug in the GNU make jobserver - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-11afs: Show volume name in /proc/net/afs/<cell>/volumesDavid Howells
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it easier to work out the name corresponding to a volume ID. This makes it easier to work out which mounts in /proc/mounts correspond to which volume ID. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
2019-12-11afs: Fix missing cell comparison in afs_test_super()David Howells
Fix missing cell comparison in afs_test_super(). Without this, any pair volumes that have the same volume ID will share a superblock, no matter the cell, unless they're in different network namespaces. Normally, most users will only deal with a single cell and so they won't see this. Even if they do look into a second cell, they won't see a problem unless they happen to hit a volume with the same ID as one they've already got mounted. Before the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0 After the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ admin/ common/ install/ OldFiles/ service/ system/ bakrestores/ home/ misc/ pkg/ src/ wsadmin/ # cat /proc/mounts | grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0 Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Carsten Jacobi <jacobi@de.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org> cc: Todd DeSantis <atd@us.ibm.com>
2019-12-11afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPPDavid Howells
Fix the lookup method on the dynamic root directory such that creation calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP rather than failing with some odd error (such as EEXIST). lookup() itself tries to create automount directories when it is invoked. These are cached locally in RAM and not committed to storage. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11afs: Fix mountpoint parsingDavid Howells
Each AFS mountpoint has strings that define the target to be mounted. This is required to end in a dot that is supposed to be stripped off. The string can include suffixes of ".readonly" or ".backup" - which are supposed to come before the terminal dot. To add to the confusion, the "fs lsmount" afs utility does not show the terminal dot when displaying the string. The kernel mount source string parser, however, assumes that the terminal dot marks the suffix and that the suffix is always "" and is thus ignored. In most cases, there is no suffix and this is not a problem - but if there is a suffix, it is lost and this affects the ability to mount the correct volume. The command line mount command, on the other hand, is expected not to include a terminal dot - so the problem doesn't arise there. Fix this by making sure that the dot exists and then stripping it when passing the string to the mount configuration. Fixes: bec5eb614130 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-10io_uring: add sockets to list of files that support non-blocking issueJens Axboe
In chasing a performance issue between using IORING_OP_RECVMSG and IORING_OP_READV on sockets, tracing showed that we always punt the socket reads to async offload. This is due to io_file_supports_async() not checking for S_ISSOCK on the inode. Since sockets supports the O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list of file types that we can do a non-blocking issue to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: only hash regular files for async work executionJens Axboe
We hash regular files to avoid having multiple threads hammer on the inode mutex, but it should not be needed on other types of files (like sockets). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: run next sqe inline if possibleJens Axboe
One major use case of linked commands is the ability to run the next link inline, if at all possible. This is done correctly for async offload, but somewhere along the line we lost the ability to do so when we were able to complete a request without having to punt it. Ensure that we do so correctly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: don't dynamically allocate poll dataJens Axboe
This essentially reverts commit e944475e6984. For high poll ops workloads, like TAO, the dynamic allocation of the wait_queue entry for IORING_OP_POLL_ADD adds considerable extra overhead. Go back to embedding the wait_queue_entry, but keep the usage of wait->private for the pointer stashing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: deferred send/recvmsg should assign iovJens Axboe
Don't just assign it from the main call path, that can miss the case when we're called from issue deferral. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: sqthread should grab ctx->uring_lock for submissionsJens Axboe
We use the mutex to guard against registered file updates, for instance. Ensure we're safe in accessing that state against concurrent updates. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: briefly spin for new work after finishing workJens Axboe
To avoid going to sleep only to get woken shortly thereafter, spin briefly for new work upon completion of work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io-wq: remove worker->wait waitqueueJens Axboe
We only have one cases of using the waitqueue to wake the worker, the rest are using wake_up_process(). Since we can save some cycles not fiddling with the waitqueue io_wqe_worker(), switch the work activation to task wakeup and get rid of the now unused wait_queue_head_t in struct io_worker. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10io_uring: allow unbreakable linksJens Axboe
Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set, they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10ovl: relax WARN_ON() on rename to selfAmir Goldstein
In ovl_rename(), if new upper is hardlinked to old upper underneath overlayfs before upper dirs are locked, user will get an ESTALE error and a WARN_ON will be printed. Changes to underlying layers while overlayfs is mounted may result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON(). Reported-by: syzbot+bb1836a212e69f8e201a@syzkaller.appspotmail.com Fixes: 804032fabb3b ("ovl: don't check rename to self") Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix corner case of non-unique st_dev;st_inoAmir Goldstein
On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs and pure upper inodes use the real upper st_dev. It is fine for an overlay pure upper inode to use the same st_dev;st_ino values as the real upper inode, because the content of those two different filesystem objects is always the same. In this case, however: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the same st_dev;st_ino values as the real lower inode. This may result with a false positive results of 'diff' between the real lower and copied up overlay inode. Fix this by using the upper st_dev;st_ino values in this case. This breaks the property of constant st_dev;st_ino across copy up of this case. This breakage will be fixed by a later patch. Fixes: 5148626b806a ("ovl: allocate anon bdev per unique lower fs") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: don't use a temp buf for encoding real fhAmir Goldstein
We can allocate maximum fh size and encode into it directly. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: make sure that real fid is 32bit aligned in memoryAmir Goldstein
Seprate on-disk encoding from in-memory and on-wire resresentation of overlay file handle. In-memory and on-wire we only ever pass around pointers to struct ovl_fh, which encapsulates at offset 3 the on-disk format struct ovl_fb. struct ovl_fb encapsulates at offset 21 the real file handle. That makes sure that the real file handle is always 32bit aligned in-memory when passed down to the underlying filesystem. On-disk format remains the same and store/load are done into correctly aligned buffer. New nfs exported file handles are exported with aligned real fid. Old nfs file handles are copied to an aligned buffer before being decoded. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-09smb3: fix refcount underflow warning on unmount when no directory leasesSteve French
Fix refcount underflow warning when unmounting to servers which didn't grant directory leases. [ 301.680095] refcount_t: underflow; use-after-free. [ 301.680192] WARNING: CPU: 1 PID: 3569 at lib/refcount.c:28 refcount_warn_saturate+0xb4/0xf3 ... [ 301.682139] Call Trace: [ 301.682240] close_shroot+0x97/0xda [cifs] [ 301.682351] SMB2_tdis+0x7c/0x176 [cifs] [ 301.682456] ? _get_xid+0x58/0x91 [cifs] [ 301.682563] cifs_put_tcon.part.0+0x99/0x202 [cifs] [ 301.682637] ? ida_free+0x99/0x10a [ 301.682727] ? cifs_umount+0x3d/0x9d [cifs] [ 301.682829] cifs_put_tlink+0x3a/0x50 [cifs] [ 301.682929] cifs_umount+0x44/0x9d [cifs] Fixes: 72e73c78c446 ("cifs: close the shared root handle on tree disconnect") Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
2019-12-09ceph: add more debug info when decoding mdsmapXiubo Li
Show the laggy state. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: switch to global cap helperXiubo Li
__ceph_is_any_caps is a duplicate helper. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: trigger the reclaim work once there has enough pending capsXiubo Li
The nr in ceph_reclaim_caps_nr() is very possibly larger than 1, so we may miss it and the reclaim work couldn't triggered as expected. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: show tasks waiting on caps in debugfs caps fileJeff Layton
Add some visibility of tasks that are waiting for caps to the "caps" debugfs file. Display the tgid of the waiting task, inode number, and the caps the task needs and wants. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09ceph: convert int fields in ceph_mount_options to unsigned intJeff Layton
Most of these values should never be negative, so convert them to unsigned values. Add some sanity checking to the parsed values, and clean up some unneeded casts. Note that while caps_max should never be negative, this patch leaves it signed, since this value ends up later being compared to a signed counter. Just ensure that userland never passes in a negative value for caps_max. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09fs/ext4/inode-test: Fix inode test on 32 bit platforms.Iurii Zaikin
Fixes the issue caused by the fact that in C in the expression of the form -1234L only 1234L is the actual literal, the unary minus is an operation applied to the literal. Which means that to express the lower bound for the type one has to negate the upper bound and subtract 1. Original error: Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == -2147483648 timestamp.tv_sec == 2147483648 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits: msb:1 lower_bound:1 extra_bits: 0 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 2147483648 timestamp.tv_sec == 6442450944 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on: msb:1 lower_bound:1 extra_bits: 1 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 6442450944 timestamp.tv_sec == 10737418240 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on: msb:1 lower_bound:1 extra_bits: 2 not ok 1 - inode_test_xtimestamp_decoding not ok 1 - ext4_inode_test Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Iurii Zaikin <yzaikin@google.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-09btrfs: add Kconfig dependency for BLAKE2BDavid Sterba
Because the BLAKE2B code went through a different tree, it was not available at the time the btrfs part was merged. Now that the Kconfig symbol exists, add it to the list. Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-09afs: Fix SELinux setting security label on /afsDavid Howells
Make the AFS dynamic root superblock R/W so that SELinux can set the security label on it. Without this, upgrades to, say, the Fedora filesystem-afs RPM fail if afs is mounted on it because the SELinux label can't be (re-)applied. It might be better to make it possible to bypass the R/O check for LSM label application through setxattr. Fixes: 4d673da14533 ("afs: Support the AFS dynamic root") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: selinux@vger.kernel.org cc: linux-security-module@vger.kernel.org
2019-12-09afs: Fix afs_find_server lookups for ipv4 peersMarc Dionne
afs_find_server tries to find a server that has an address that matches the transport address of an rxrpc peer. The code assumes that the transport address is always ipv6, with ipv4 represented as ipv4 mapped addresses, but that's not the case. If the transport family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will be beyond the actual ipv4 address and will always be 0, and all ipv4 addresses will be seen as matching. As a result, the first ipv4 address seen on any server will be considered a match, and the server returned may be the wrong one. One of the consequences is that callbacks received over ipv4 will only be correctly applied for the server that happens to have the first ipv4 address on the fs_addresses4 list. Callbacks over ipv4 from all other servers are dropped, causing the client to serve stale data. This is fixed by looking at the transport family, and comparing ipv4 addresses based on a sockaddr_in structure rather than a sockaddr_in6. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-12-08Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Nine cifs/smb3 fixes: - one fix for stable (oops during oplock break) - two timestamp fixes including important one for updating mtime at close to avoid stale metadata caching issue on dirty files (also improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the wire) - two fixes for "modefromsid" mount option for file create (now allows mode bits to be set more atomically and accurately on create by adding "sd_context" on create when modefromsid specified on mount) - two fixes for multichannel found in testing this week against different servers - two small cleanup patches" * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: improve check for when we send the security descriptor context on create smb3: fix mode passed in on create for modetosid mount option cifs: fix possible uninitialized access and race on iface_list cifs: Fix lookup of SMB connections on multichannel smb3: query attributes on file close smb3: remove unused flag passed into close functions cifs: remove redundant assignment to pointer pneg_ctxt fs: cifs: Fix atime update check vs mtime CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
2019-12-08Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs cleanups from Al Viro: "No common topic, just three cleanups". * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make __d_alloc() static fs/namespace: add __user to open_tree and move_mount syscalls fs/fnctl: fix missing __user in fcntl_rw_hint()
2019-12-07Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap fixes from Darrick Wong: "Fix a race condition and a use-after-free error: - Fix a UAF when reporting writeback errors - Fix a race condition when handling page uptodate on fragmented file with blocksize < pagesize" * tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: stop using ioend after it's been freed in iomap_finish_ioend() iomap: fix sub-page uptodate handling
2019-12-07Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Fix a couple of resource management errors and a hang: - fix a crash in the log setup code when log mounting fails - fix a hang when allocating space on the realtime device - fix a block leak when freeing space on the realtime device" * tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix mount failure crash on invalid iclog memory access xfs: don't check for AG deadlock for realtime files in bunmapi xfs: fix realtime file data space leak