summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-07-19Merge tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A number of small fixes for -rc1 Luminous changes plus a readdir race fix, marked for stable" * tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client: libceph: potential NULL dereference in ceph_msg_data_create() ceph: fix race in concurrent readdir libceph: don't call encode_request_finish() on MOSDBackoff messages libceph: use alloc_pg_mapping() in __decode_pg_upmap_items() libceph: set -EINVAL in one place in crush_decode() libceph: NULL deref on osdmap_apply_incremental() error path libceph: fix old style declaration warnings
2017-07-18jfs: preserve i_mode if __jfs_set_acl() failsErnesto A. Fernández
When changing a file's acl mask, __jfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2017-07-18jfs: Don't clear SGID when inheriting ACLsJan Kara
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __jfs_set_acl() into jfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org CC: jfs-discussion@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2017-07-18Merge tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix from Bruce Fields: "One fix for a problem introduced in the most recent merge window and found by Dave Jones and KASAN" * tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux: nfsd: Fix a memory scribble in the callback channel
2017-07-18hfsplus: Don't clear SGID when inheriting ACLsJan Kara
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __hfsplus_set_posix_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-18isofs: Fix off-by-one in 'session' mount option parsingJan Kara
According to ECMA-130 standard maximum valid track number is 99. Since 'session' mount option starts indexing at 0 (and we add 1 to the passed number), we should refuse value 99. Also the condition in isofs_get_last_session() unnecessarily repeats the check - remove it. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-18reiserfs: preserve i_mode if __reiserfs_set_acl() failsErnesto A. Fernández
When changing a file's acl mask, reiserfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-18ext2: preserve i_mode if ext2_set_acl() failsErnesto A. Fernández
When changing a file's acl mask, ext2_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. [JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"] Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-17f2fs: avoid cpu lockupJaegeuk Kim
Before retrying to flush data or dentry pages, we need to release cpu in order to prevent watchdog. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-17f2fs: include seq_file.h for sysfs.cJaegeuk Kim
This patch includes seq_file.h to avoid compile error. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-17gfs2: Lock holder cleanup (fixup)Andreas Gruenbacher
Function gfs2_holder_initialized should be used in do_flock as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-17nfsd: Fix a memory scribble in the callback channelTrond Myklebust
The offset of the entry in struct rpc_version has to match the version number. Reported-by: Dave Jones <davej@codemonkey.org.uk> Fixes: 1c5876ddbdb4 ("sunrpc: move p_count out of struct rpc_procinfo") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-17block: order /proc/devices by major numberLogan Gunthorpe
Presently, the order of the block devices listed in /proc/devices is not entirely sequential. If a block device has a major number greater than BLKDEV_MAJOR_HASH_SIZE (255), it will be ordered as if its major were module 255. For example, 511 appears after 1. This patch cleans that up and prints each major number in the correct order, regardless of where they are stored in the hash table. In order to do this, we introduce BLKDEV_MAJOR_MAX as an artificial limit (chosen to be 512). It will then print all devices in major order number from 0 to the maximum. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17GFS2: Prevent double brelse in gfs2_meta_indirect_bufferBob Peterson
Before this patch, problems reading in indirect buffers would send an IO error back to the caller, and release the buffer_head with brelse() in function gfs2_meta_indirect_buffer, however, it would still return the address of the buffer_head it released. After the error was discovered, function gfs2_block_map would call function release_metapath to free all buffers. That checked: if (mp->mp_bh[i] == NULL) but since the value was set after the error, it was non-zero, so brelse was called a second time. This resulted in the following error: kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G W -- ------------ ) kernel: Hardware name: RHEV Hypervisor kernel: VFS: brelse: Trying to free free buffer This patch changes gfs2_meta_indirect_buffer so it only sets the buffer_head pointer in cases where it isn't released. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2017-07-17char_dev: order /proc/devices by major numberLogan Gunthorpe
Presently, the order of the char devices listed in /proc/devices is not entirely sequential. If a char device has a major number greater than CHRDEV_MAJOR_HASH_SIZE (255), it will be ordered as if its major were module 255. For example, 511 appears after 1. This patch cleans that up and prints each major number in the correct order, regardless of where they are stored in the hash table. In order to do this, we introduce CHRDEV_MAJOR_MAX as an artificial limit (chosen to be 511). It will then print all devices in major order number from 0 to the maximum. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17char_dev: extend dynamic allocation of majors into a higher rangeLogan Gunthorpe
We've run into problems with running out of dynamicly assign char device majors particullarly on automated test systems with all-yes-configs. Roughly 40 dynamic assignments can be made with such kernels at this time while space is reserved for only 20. Currently, the kernel only prints a warning when dynamic allocation overflows the reserved region. And when this happens drivers that have fixed assignments can randomly fail depending on the order of initialization of other drivers. Thus, adding a new char device can cause unexpected failures in completely unrelated parts of the kernel. This patch solves the problem by extending dynamic major number allocations down from 511 once the 234-254 region fills up. Fixed majors already exist above 255 so the infrastructure to support high number majors is already in place. The patch reserves an additional 128 major numbers which should hopefully last us a while. Kernels that don't require more than 20 dynamic majors assigned (which is pretty typical) should not be affected by this change. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Walleij <linus.walleij@linaro.org> Link: https://lkml.org/lkml/2017/6/4/107 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17ceph: fix race in concurrent readdirYan, Zheng
For a large directory, program needs to issue multiple readdir syscalls to get all dentries. When there are multiple programs read the directory concurrently. Following sequence of events can happen. - program calls readdir with pos = 2. ceph sends readdir request to mds. The reply contains N1 entries. ceph adds these N1 entries to readdir cache. - program calls readdir with pos = N1+2. The readdir is satisfied by the readdir cache, N2 entries are returned. (Other program calls readdir in the middle, which fills the cache) - program calls readdir with pos = N1+N2+2. ceph sends readdir request to mds. The reply contains N3 entries and it reaches directory end. ceph adds these N3 entries to the readdir cache and marks directory complete. The second readdir call does not update fi->readdir_cache_idx. ceph add the last N3 entries to wrong places. Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-17ext2: Don't clear SGID when inheriting ACLsJan Kara
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __ext2_set_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-17reiserfs: Don't clear SGID when inheriting ACLsJan Kara
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __reiserfs_set_acl() into reiserfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org CC: reiserfs-devel@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2017-07-17VFS: Differentiate mount flags (MS_*) from internal superblock flagsDavid Howells
Differentiate the MS_* flags passed to mount(2) from the internal flags set in the super_block's s_flags. s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. In this patch, just the headers are altered and some kernel code where blind automated conversion isn't necessarily correct. Note that this shows up some interesting issues: (1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV -> MNT_NODEV) without passing this on to the filesystem, but some filesystems set such flags anyway. (2) The ->remount_fs() methods of some filesystems adjust the *flags argument by setting MS_* flags in it, such as MS_NOATIME - but these flags are then scrubbed by do_remount_sb() (only the occupants of MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK, MS_I_VERSION and MS_LAZYTIME) I'm not sure what's the best way to solve all these cases. Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)David Howells
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-16binfmt_flat: Use %u to format u32Geert Uytterhoeven
Several variables had their types changed from unsigned long to u32, but the printk()-style format to print them wasn't updated, leading to: fs/binfmt_flat.c: In function ‘load_flat_file’: fs/binfmt_flat.c:577: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘u32’ Fixes: 468138d78510688f ("binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-16fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locksBenjamin Coddington
Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be atomic with the stateid update", NFSv4 has been inserting locks in rpciod worker context. The result is that the file_lock's fl_nspid is the kworker's pid instead of the original userspace pid. The fl_nspid is only used to represent the namespaced virtual pid number when displaying locks or returning from F_GETLK. There's no reason to set it for every inserted lock, since we can usually just look it up from fl_pid. So, instead of looking up and holding struct pid for every lock, let's just look up the virtual pid number from fl_pid when it is needed. That means we can remove fl_nspid entirely. The translaton and presentation of fl_pid should handle the following four cases: 1 - F_GETLK on a remote file with a remote lock: In this case, the filesystem should determine the l_pid to return here. Filesystems should indicate that the fl_pid represents a non-local pid value that should not be translated by returning an fl_pid <= 0. 2 - F_GETLK on a local file with a remote lock: This should be the l_pid of the lock manager process, and translated. 3 - F_GETLK on a remote file with a local lock, and 4 - F_GETLK on a local file with a local lock: These should be the translated l_pid of the local locking process. Fuse was already doing the correct thing by translating the pid into the caller's namespace. With this change we must update fuse to translate to init's pid namespace, so that the locks API can then translate from init's pid namespace into the pid namespace of the caller. With this change, the locks API will expect that if a filesystem returns a remote pid as opposed to a local pid for F_GETLK, that remote pid will be <= 0. This signifies that the pid is remote, and the locks API will forego translating that pid into the pid namespace of the local calling process. Finally, we convert remote filesystems to present remote pids using negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote pid returned for F_GETLK lock requests. Since local pids will never be larger than PID_MAX_LIMIT (which is currently defined as <= 4 million), but pid_t is an unsigned int, we should have plenty of room to represent remote pids with negative numbers if we assume that remote pid numbers are similarly limited. If this is not the case, then we run the risk of having a remote pid returned for which there is also a corresponding local pid. This is a problem we have now, but this patch should reduce the chances of that occurring, while also returning those remote pid numbers, for whatever that may be worth. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-16fs/locks: Use allocation rather than the stack in fcntl_getlk()Benjamin Coddington
Struct file_lock is fairly large, so let's save some space on the stack by using an allocation for struct file_lock in fcntl_getlk(), just as we do for fcntl_setlk(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-15f2fs: Don't clear SGID when inheriting ACLsJaegeuk Kim
This patch copies commit b7f8a09f80: "btrfs: Don't clear SGID when inheriting ACLs" written by Jan. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-15f2fs: remove extra inode_unlock() in error pathLuis Henriques
This commit removes an extra inode_unlock() that is being done in function f2fs_ioc_setflags error path. While there, get rid of a useless 'out' label as well. Fixes: 0abd675e97e6 ("f2fs: support plain user/group quota") Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-15Merge tag 'random_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random updates from Ted Ts'o: "Add wait_for_random_bytes() and get_random_*_wait() functions so that callers can more safely get random bytes if they can block until the CRNG is initialized. Also print a warning if get_random_*() is called before the CRNG is initialized. By default, only one single-line warning will be printed per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a warning will be printed for each function which tries to get random bytes before the CRNG is initialized. This can get spammy for certain architecture types, so it is not enabled by default" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: reorder READ_ONCE() in get_random_uXX random: suppress spammy warnings about unseeded randomness random: warn when kernel uses unseeded randomness net/route: use get_random_int for random counter net/neighbor: use get_random_u32 for 32-bit hash random rhashtable: use get_random_u32 for hash_rnd ceph: ensure RNG is seeded before using iscsi: ensure RNG is seeded before use cifs: use get_random_u32 for 32-bit lock random random: add get_random_{bytes,u32,u64,int,long,once}_wait family random: add wait_for_random_bytes() API
2017-07-15Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ->s_options removal from Al Viro: "Preparations for fsmount/fsopen stuff (coming next cycle). Everything gets moved to explicit ->show_options(), killing ->s_options off + some cosmetic bits around fs/namespace.c and friends. Basically, the stuff needed to work with fsmount series with minimum of conflicts with other work. It's not strictly required for this merge window, but it would reduce the PITA during the coming cycle, so it would be nice to have those bits and pieces out of the way" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: isofs: Fix isofs_show_options() VFS: Kill off s_options and helpers orangefs: Implement show_options 9p: Implement show_options isofs: Implement show_options afs: Implement show_options affs: Implement show_options befs: Implement show_options spufs: Implement show_options bpf: Implement show_options ramfs: Implement show_options pstore: Implement show_options omfs: Implement show_options hugetlbfs: Implement show_options VFS: Don't use save/replace_mount_options if not using generic_show_options VFS: Provide empty name qstr VFS: Make get_filesystem() return the affected filesystem VFS: Clean up whitespace in fs/namespace.c and fs/super.c Provide a function to create a NUL-terminated string from unterminated data
2017-07-15Merge branch 'work.uaccess-unaligned' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uacess-unaligned removal from Al Viro: "That stuff had just one user, and an exotic one, at that - binfmt_flat on arm and m68k" * 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill {__,}{get,put}_user_unaligned() binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
2017-07-15Merge tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBIFS updates from Richard Weinberger: - Updates and fixes for the file encryption mode - Minor improvements - Random fixes * tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs: ubifs: Set double hash cookie also for RENAME_EXCHANGE ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs ubifs: Don't leak kernel memory to the MTD ubifs: Change gfp flags in page allocation for bulk read ubifs: Fix oops when remounting with no_bulk_read. ubifs: Fail commit if TNC is obviously inconsistent ubifs: allow userspace to map mounts to volumes ubifs: Wire-up statx() support ubifs: Remove dead code from ubifs_get_link() ubifs: Massage debug prints wrt. fscrypt ubifs: Add assert to dent_key_init() ubifs: Fix unlink code wrt. double hash lookups ubifs: Fix data node size for truncating uncompressed nodes ubifs: Don't encrypt special files on creation ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename ubifs: Fix inode data budget in ubifs_mknod ubifs: Correctly evict xattr inodes ubifs: Unexport ubifs_inode_slab ubifs: don't bother checking for encryption key in ->mmap() ubifs: require key for truncate(2) of encrypted file
2017-07-14Merge tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS fixes from Darrick Wong: "Largely debugging and regression fixes. - Add some locking assertions for the _ilock helpers. - Revert the XFS_QMOPT_NOLOCK patch; after discussion with hch the online fsck patch that would have needed it has been redesigned and no longer needs it. - Fix behavioral regression of SEEK_HOLE/DATA with negative offsets to match 4.12-era XFS behavior" * tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsets Revert "xfs: grab dquots without taking the ilock" xfs: assert locking precondition in xfs_readlink_bmap_ilocked xfs: assert locking precondіtion in xfs_attr_list_int_ilocked xfs: fixup xfs_attr_get_ilocked
2017-07-14Merge branch 'for-4.13-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've identified and fixed a silent corruption (introduced by code in the first pull), a fixup after the blk_status_t merge and two fixes to incremental send that Filipe has been hunting for some time" * 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unexpected return value of bio_readpage_error btrfs: btrfs_create_repair_bio never fails, skip error handling btrfs: cloned bios must not be iterated by bio_for_each_segment_all Btrfs: fix write corruption due to bio cloning on raid5/6 Btrfs: incremental send, fix invalid memory access Btrfs: incremental send, fix invalid path for link commands
2017-07-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge even more updates from Andrew Morton: - a few leftovers - fault-injector rework - add a module loader test driver * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kmod: throttle kmod thread limit kmod: add test driver to stress test the module loader MAINTAINERS: give kmod some maintainer love xtensa: use generic fb.h fault-inject: add /proc/<pid>/fail-nth fault-inject: simplify access check for fail-nth fault-inject: make fail-nth read/write interface symmetric fault-inject: parse as natural 1-based value for fail-nth write interface fault-inject: automatically detect the number base for fail-nth write interface kernel/watchdog.c: use better pr_fmt prefix MAINTAINERS: move the befs tree to kernel.org lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int mm: fix overflow check in expand_upwards()
2017-07-14fault-inject: add /proc/<pid>/fail-nthAkinobu Mita
fail-nth interface is only created in /proc/self/task/<current-tid>/. This change also adds it in /proc/<pid>/. This makes shell based tool a bit simpler. $ bash -c "builtin echo 100 > /proc/self/fail-nth && exec ls /" Link: http://lkml.kernel.org/r/1491490561-10485-6-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14fault-inject: simplify access check for fail-nthAkinobu Mita
The fail-nth file is created with 0666 and the access is permitted if and only if the task is current. This file is owned by the currnet user. So we can create it with 0644 and allow the owner to write it. This enables to watch the status of task->fail_nth from another processes. [akinobu.mita@gmail.com: don't convert unsigned type value as signed int] Link: http://lkml.kernel.org/r/1492444483-9239-1-git-send-email-akinobu.mita@gmail.com [akinobu.mita@gmail.com: avoid unwanted data race to task->fail_nth] Link: http://lkml.kernel.org/r/1499962492-8931-1-git-send-email-akinobu.mita@gmail.com Link: http://lkml.kernel.org/r/1491490561-10485-5-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14fault-inject: make fail-nth read/write interface symmetricAkinobu Mita
The read interface for fail-nth looks a bit odd. Read from this file returns "NYYYY..." or "YYYYY..." (this makes me surprise when cat this file). Because there is no EOF condition. The first character indicates current->fail_nth is zero or not, and then current->fail_nth is reset to zero. Just returning task->fail_nth value is more natural to understand. Link: http://lkml.kernel.org/r/1491490561-10485-4-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14fault-inject: parse as natural 1-based value for fail-nth write interfaceAkinobu Mita
The value written to fail-nth file is parsed as 0-based. Parsing as one-based is more natural to understand and it enables to cancel the previous setup by simply writing '0'. This change also converts task->fail_nth from signed to unsigned int. Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14fault-inject: automatically detect the number base for fail-nth write interfaceAkinobu Mita
Automatically detect the number base to use when writing to fail-nth file instead of always parsing as a decimal number. Link: http://lkml.kernel.org/r/1491490561-10485-2-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14ubifs: Set double hash cookie also for RENAME_EXCHANGERichard Weinberger
We developed RENAME_EXCHANGE and UBIFS_FLG_DOUBLE_HASH more or less in parallel and this case was forgotten. :-( Cc: stable@vger.kernel.org Fixes: d63d61c16972 ("ubifs: Implement UBIFS_FLG_DOUBLE_HASH") Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrsXiaolei Li
The inode is not locked in init_xattrs when creating a new inode. Without this patch, there will occurs assert when booting or creating a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled. Log likes: UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156) CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S 4.12.0-rc1-207440-g1e70b02 #2 Hardware name: MediaTek MT2712 evaluation board (DT) Call trace: [<ffff000008088538>] dump_backtrace+0x0/0x238 [<ffff000008088834>] show_stack+0x14/0x20 [<ffff0000083d98d4>] dump_stack+0x9c/0xc0 [<ffff00000835d524>] ubifs_xattr_set+0x374/0x5e0 [<ffff00000835d7ec>] init_xattrs+0x5c/0xb8 [<ffff000008385788>] security_inode_init_security+0x110/0x190 [<ffff00000835e058>] ubifs_init_security+0x30/0x68 [<ffff00000833ada0>] ubifs_mkdir+0x100/0x200 [<ffff00000820669c>] vfs_mkdir+0x11c/0x1b8 [<ffff00000820b73c>] SyS_mkdirat+0x74/0xd0 [<ffff000008082f8c>] __sys_trace_return+0x0/0x4 Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Don't leak kernel memory to the MTDRichard Weinberger
When UBIFS prepares data structures which will be written to the MTD it ensues that their lengths are multiple of 8. Since it uses kmalloc() the padded bytes are left uninitialized and we leak a few bytes of kernel memory to the MTD. To make sure that all bytes are initialized, let's switch to kzalloc(). Kzalloc() is fine in this case because the buffers are not huge and in the IO path the performance bottleneck is anyway the MTD. Cc: stable@vger.kernel.org Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Change gfp flags in page allocation for bulk readHyunchul Lee
In low memory situations, page allocations for bulk read can kill applications for reclaiming memory, and print an failure message when allocations are failed. Because bulk read is just an optimization, we don't have to do these and can stop page allocations. Though this siutation happens rarely, add __GFP_NORETRY to prevent from excessive memory reclaim and killing applications, and __GFP_WARN to suppress this failure message. For this, Use readahead_gfp_mask for gfp flags when allocating pages. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Fix oops when remounting with no_bulk_read.karam.lee
When remounting with the no_bulk_read option, there is a problem accessing the "bulk_read buffer(bu.buf)" which has already been freed. If the bulk_read option is enabled, ubifs_tnc_bulk_read uses the pre-allocated bu.buf. While bu.buf is being used by ubifs_tnc_bulk_read, remounting with no_bulk_read frees bu.buf. So I added code to check the use of "bu.buf" to avoid this situation. ------ I tested as follows(kernel v3.18) : Use the script to repeat "no_bulk_read <-> bulk_read" remount.sh #!/bin/sh while true do; mount -o remount,no_bulk_read ${MOUNT_POINT}; sleep 1; mount -o remount,bulk_read ${MOUNT_POINT}; sleep 1; done Perform read operation cat ${MOUNT_POINT}/* > /dev/null The problem is reproduced immediately. [ 234.256845][kernel.0]Internal error: Oops: 17 [#1] PREEMPT ARM [ 234.258557][kernel.0]CPU: 0 PID: 2752 Comm: cat Tainted: G W O 3.18.31+ #51 [ 234.259531][kernel.0]task: cbff8580 ti: cbd66000 task.ti: cbd66000 [ 234.260306][kernel.0]PC is at validate_data_node+0x10/0x264 [ 234.260994][kernel.0]LR is at ubifs_tnc_bulk_read+0x388/0x3ec [ 234.261712][kernel.0]pc : [<c01d98fc>] lr : [<c01dc300>] psr: 80000013 [ 234.261712][kernel.0]sp : cbd67ba0 ip : 00000001 fp : 00000000 [ 234.263337][kernel.0]r10: cd3e0260 r9 : c0df2008 r8 : 00000000 [ 234.264087][kernel.0]r7 : cd3e0000 r6 : 00000000 r5 : cd3e0278 r4 : cd3e0000 [ 234.264999][kernel.0]r3 : 00000003 r2 : cd3e0280 r1 : 00000000 r0 : cd3e0000 [ 234.265910][kernel.0]Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 234.266896][kernel.0]Control: 10c53c7d Table: 8c40c059 DAC: 00000015 [ 234.267711][kernel.0]Process cat (pid: 2752, stack limit = 0xcbd66400) [ 234.268525][kernel.0]Stack: (0xcbd67ba0 to 0xcbd68000) [ 234.269169][kernel.0]7ba0: cd7c3940 c03d8650 0001bfe0 00002ab2 00000000 cbd67c5c cbd67c58 0001bfe0 [ 234.270287][kernel.0]7bc0: cd3e0000 00002ab2 0001bfe0 00000014 cbd66000 cd3e0260 00000000 c01d6660 [ 234.271403][kernel.0]7be0: 00002ab2 00000000 c82a5800 ffffffff cd3e0298 cd3e0278 00000000 cd3e0000 [ 234.272520][kernel.0]7c00: 00000000 00000000 cd3e0260 c01dc300 00002ab2 00000000 60000013 d663affa [ 234.273639][kernel.0]7c20: cd3e01f0 cd3e01f0 60000013 c09397ec 00000000 cd3e0278 00002ab2 00000000 [ 234.274755][kernel.0]7c40: cd3e0000 c01dbf48 00000014 00000003 00000160 00000015 00000004 d663affa [ 234.275874][kernel.0]7c60: ccdaa978 cd3e0278 cd3e0000 cf32a5f4 ccdaa820 00000044 cbd66000 cd3e0260 [ 234.276992][kernel.0]7c80: 00000003 c01cec84 ccdaa8dc cbd67cc4 cbd67ec0 00000010 ccdaa978 00000000 [ 234.278108][kernel.0]7ca0: 0000015e ccdaa8dc 00000000 00000000 cf32a5d0 00000000 0000015f ccdaa8dc [ 234.279228][kernel.0]7cc0: 00000000 c8488300 0009e5a4 0000000e cbd66000 0000015e cf32a5f4 c0113c04 [ 234.280346][kernel.0]7ce0: 0000009f 0000003c c00098c4 ffffffff 00001000 00000000 000000ad 00000010 [ 234.281463][kernel.0]7d00: 00000038 cd68f580 00000150 c8488360 00000000 cbd67d30 cbd67d70 0000000e [ 234.282579][kernel.0]7d20: 00000010 00000000 c0951874 c0112a9c cf379b60 cf379b84 cf379890 cf3798b4 [ 234.283699][kernel.0]7d40: cf379578 cf37959c cf379380 cf3793a4 cf3790b0 cf3790d4 cf378fd8 cf378ffc [ 234.284814][kernel.0]7d60: cf378f48 cf378f6c cf32a5f4 cf32a5d0 00000000 00001000 00000018 00000000 [ 234.285932][kernel.0]7d80: 00001000 c0050da4 00000000 00001000 cec04c00 00000000 00001000 c0e11328 [ 234.287049][kernel.0]7da0: 00000000 00001000 cbd66000 00000000 00001000 c0012a60 00000000 00001000 [ 234.288166][kernel.0]7dc0: cbd67dd4 00000000 00001000 80000013 00000000 00001000 cd68f580 00000000 [ 234.289285][kernel.0]7de0: 00001000 c915d600 00000000 00001000 cbd67e48 00000000 00001000 00000018 [ 234.290402][kernel.0]7e00: 00000000 00001000 00000000 00000000 00001000 c915d768 c915d768 c0113550 [ 234.291522][kernel.0]7e20: cd68f580 cbd67e48 cd68f580 cb6713c0 00010000 000ac5a4 00000000 001fc5a4 [ 234.292637][kernel.0]7e40: 00000000 c8488300 cbd67ec0 00eb0000 cd68f580 c0113ee4 00000000 cbd67ec0 [ 234.293754][kernel.0]7e60: cd68f580 c8488300 cbd67ec0 00eb0000 cd68f580 00150000 c8488300 00eb0000 [ 234.294874][kernel.0]7e80: 00010000 c0112fd0 00000000 cbd67ec0 cd68f580 00150000 00000000 cd68f580 [ 234.295991][kernel.0]7ea0: cbd67ef0 c011308c 00000000 00000002 cd768850 00010000 00000000 c01133fc [ 234.297110][kernel.0]7ec0: 00150000 00000000 cbd67f50 00000000 00000000 cb6713c0 01000000 cbd67f48 [ 234.298226][kernel.0]7ee0: cbd67f50 c8488300 00000000 c0113204 00010000 01000000 00000000 cb6713c0 [ 234.299342][kernel.0]7f00: 00150000 00000000 cbd67f50 00000000 00000000 00000000 00000000 00000000 [ 234.300462][kernel.0]7f20: cbd67f50 01000000 01000000 cb6713c0 c8488300 c00ebba8 01000000 00000000 [ 234.301577][kernel.0]7f40: c8488300 cb6713c0 00000000 00000000 00000000 00000000 ccdaa820 00000000 [ 234.302697][kernel.0]7f60: 00000000 01000000 00000003 00000001 cbd66000 00000000 00000001 c00ec678 [ 234.303813][kernel.0]7f80: 00000000 00000200 00000000 01000000 01000000 00000000 00000000 000000ef [ 234.304933][kernel.0]7fa0: c000e904 c000e780 01000000 00000000 00000001 00000003 00000000 01000000 [ 234.306049][kernel.0]7fc0: 01000000 00000000 00000000 000000ef 00000001 00000003 01000000 00000001 [ 234.307165][kernel.0]7fe0: 00000000 beafb78c 0000ad08 00128d1c 60000010 00000001 00000000 00000000 [ 234.308292][kernel.0][<c01d98fc>] (validate_data_node) from [<c01dc300>] (ubifs_tnc_bulk_read+0x388/0x3ec) [ 234.309493][kernel.0][<c01dc300>] (ubifs_tnc_bulk_read) from [<c01cec84>] (ubifs_readpage+0x1dc/0x46c) [ 234.310656][kernel.0][<c01cec84>] (ubifs_readpage) from [<c0113c04>] (__generic_file_splice_read+0x29c/0x4cc) [ 234.311890][kernel.0][<c0113c04>] (__generic_file_splice_read) from [<c0113ee4>] (generic_file_splice_read+0xb0/0xf4) [ 234.313214][kernel.0][<c0113ee4>] (generic_file_splice_read) from [<c0112fd0>] (do_splice_to+0x68/0x7c) [ 234.314386][kernel.0][<c0112fd0>] (do_splice_to) from [<c011308c>] (splice_direct_to_actor+0xa8/0x190) [ 234.315544][kernel.0][<c011308c>] (splice_direct_to_actor) from [<c0113204>] (do_splice_direct+0x90/0xb8) [ 234.316741][kernel.0][<c0113204>] (do_splice_direct) from [<c00ebba8>] (do_sendfile+0x17c/0x2b8) [ 234.317838][kernel.0][<c00ebba8>] (do_sendfile) from [<c00ec678>] (SyS_sendfile64+0xc4/0xcc) [ 234.318890][kernel.0][<c00ec678>] (SyS_sendfile64) from [<c000e780>] (ret_fast_syscall+0x0/0x38) [ 234.319983][kernel.0]Code: e92d47f0 e24dd050 e59f9228 e1a04000 (e5d18014) Signed-off-by: karam.lee <karam.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Fail commit if TNC is obviously inconsistentRichard Weinberger
A reference to LEB 0 or with length 0 in the TNC is never correct and could be caused by a memory corruption. Don't write such a bad index node to the MTD. Instead fail the commit which will turn UBIFS into read-only mode. This is less painful than having the bad reference on the MTD from where UBFIS has no chance to recover. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: allow userspace to map mounts to volumesRabin Vincent
There currently appears to be no way for userspace to find out the underlying volume number for a mounted ubifs file system, since ubifs uses anonymous block devices. The volume name is present in /proc/mounts but UBI volumes can be renamed after the volume has been mounted. To remedy this, show the UBI number and UBI volume number as part of the options visible under /proc/mounts. Also, accept and ignore the ubi= vol= options if they are used mounting (patch from Richard Weinberger). # mount -t ubifs ubi:baz x # mount ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2) # ubirename /dev/ubi0 baz bazz # mount ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2) # ubinfo -d 0 -n 2 Volume ID: 2 (on ubi0) Type: dynamic Alignment: 1 Size: 67 LEBs (1063424 bytes, 1.0 MiB) State: OK Name: bazz Character device major/minor: 254:3 Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Wire-up statx() supportRichard Weinberger
statx() can report what flags a file has, expose flags that UBIFS supports. Especially STATX_ATTR_COMPRESSED and STATX_ATTR_ENCRYPTED can be interesting for userspace. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Remove dead code from ubifs_get_link()Richard Weinberger
We check the length already, no need to check later again for an empty string. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Massage debug prints wrt. fscryptRichard Weinberger
If file names are encrypted we can no longer print them. That's why we have to change these prints or remove them completely. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Add assert to dent_key_init()Richard Weinberger
...to make sure that we don't use it for double hashed lookups instead of dent_key_init_hash(). Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-14ubifs: Fix unlink code wrt. double hash lookupsRichard Weinberger
When removing an encrypted file with a long name and without having the key we have to be able to locate and remove the directory entry via a double hash. This corner case was simply forgotten. Fixes: 528e3d178f25 ("ubifs: Add full hash lookup support") Reported-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>