summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-04-24ext4: fix data exposure after a crashJan Kara
Huang has reported that in his powerfail testing he is seeing stale block contents in some of recently allocated blocks although he mounts ext4 in data=ordered mode. After some investigation I have found out that indeed when delayed allocation is used, we don't add inode to transaction's list of inodes needing flushing before commit. Originally we were doing that but commit f3b59291a69d removed the logic with a flawed argument that it is not needed. The problem is that although for delayed allocated blocks we write their contents immediately after allocating them, there is no guarantee that the IO scheduler or device doesn't reorder things and thus transaction allocating blocks and attaching them to inode can reach stable storage before actual block contents. Actually whenever we attach freshly allocated blocks to inode using a written extent, we should add inode to transaction's ordered inode list to make sure we properly wait for block contents to be written before committing the transaction. So that is what we do in this patch. This also handles other cases where stale data exposure was possible - like filling hole via mmap in data=ordered,nodelalloc mode. The only exception to the above rule are extending direct IO writes where blkdev_direct_IO() waits for IO to complete before increasing i_size and thus stale data exposure is not possible. For now we don't complicate the code with optimizing this special case since the overhead is pretty low. In case this is observed to be a performance problem we can always handle it using a special flag to ext4_map_blocks(). CC: stable@vger.kernel.org Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-23ext4: allow readdir()'s of large empty directories to be interruptedTheodore Ts'o
If a directory has a large number of empty blocks, iterating over all of them can take a long time, leading to scheduler warnings and users getting irritated when they can't kill a process in the middle of one of these long-running readdir operations. Fix this by adding checks to ext4_readdir() and ext4_htree_fill_tree(). This was reverted earlier due to a typo in the original commit where I experimented with using signal_pending() instead of fatal_signal_pending(). The test was in the wrong place if we were going to return signal_pending() since we would end up returning duplicant entries. See 9f2394c9be47 for a more detailed explanation. Added fix as suggested by Linus to check for signal_pending() in in the filldir() functions. Reported-by: Benjamin LaHaise <bcrl@kvack.org> Google-Bug-Id: 27880676 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23ceph: kill __ceph_removexattr()Yan, Zheng
when removing a xattr, generic_removexattr() calls __ceph_setxattr() with NULL value and XATTR_REPLACE flag. __ceph_removexattr() is not used any more. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23ceph: Switch to generic xattr handlersAndreas Gruenbacher
Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check for valid attribute names there, and remove those checks from __ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be handled by the catch-all handler anymore. The set xattr handler is called with a NULL value to indicate that the attribute should be removed; __ceph_setxattr already handles that case correctly (ceph_set_acl could already calling __ceph_setxattr with a NULL value). Move the check for snapshots from ceph_{set,remove}xattr into __ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be replaced with the generic iops. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23ceph: Get rid of d_find_alias in ceph_set_aclAndreas Gruenbacher
Create a variant of ceph_setattr that takes an inode instead of a dentry. Change __ceph_setxattr (and also __ceph_removexattr) to take an inode instead of a dentry. Use those in ceph_set_acl so that we no longer need a dentry there. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Switch to generic xattr handlersAndreas Gruenbacher
Use xattr handlers for resolving attribute names. The amount of setup code required on cifs is nontrivial, so use the same get and set functions for all handlers, with switch statements for the different types of attributes in them. The set_EA handler can handle NULL values, so we don't need a separate removexattr function anymore. Remove the cifs_dbg statements related to xattr name resolution; they don't add much. Don't build xattr.o when CONFIG_CIFS_XATTR is not defined. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Fix removexattr for os2.* xattrsAndreas Gruenbacher
If cifs_removexattr finds a "user." or "os2." xattr name prefix, it skips 5 bytes, one byte too many for "os2.". Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULTAndreas Gruenbacher
The two values ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT are meant to be enumerations, not bits in a bit mask. Use '==' instead of '&' to check for these values. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Fix xattr name checksAndreas Gruenbacher
Use strcmp(str, name) instead of strncmp(str, name, strlen(name)) for checking if str and name are the same (as opposed to name being a prefix of str) in the gexattr and setxattr inode operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-20eCryptfs: Do not allocate hash tfm in NORECLAIM contextHerbert Xu
You cannot allocate crypto tfm objects in NORECLAIM or NOFS contexts. The ecryptfs code currently does exactly that for the MD5 tfm. This patch fixes it by preallocating the MD5 tfm in a safe context. The MD5 tfm is also reentrant so this patch removes the superfluous cs_hash_tfm_mutex. Reported-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-19GFS2: Add calls to gfs2_holder_uninit in two error handlersDaniel DeFreez
This patch fixes two locations that do not call gfs2_holder_uninit if gfs2_glock_nq returns an error. Signed-off-by: Daniel DeFreez <dcdefreez@ucdavis.edu> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-19Merge branch 'ptmx-cleanup'Linus Torvalds
Merge the ptmx internal interface cleanup branch. This doesn't change semantics, but it should be a sane basis for eventually getting the multi-instance devpts code into some sane shape where we can get rid of the kernel config option. Which we can hopefully get done next merge window.. * ptmx-cleanup: devpts: clean up interface to pty drivers
2016-04-18devpts: clean up interface to pty driversLinus Torvalds
This gets rid of the horrible notion of having that struct inode *ptmx_inode be the linchpin of the interface between the pty code and devpts. By de-emphasizing the ptmx inode, a lot of things actually get cleaner, and we will have a much saner way forward. In particular, this will allow us to associate with any particular devpts instance at open-time, and not be artificially tied to one particular ptmx inode. The patch itself is actually fairly straightforward, and apart from some locking and return path cleanups it's pretty mechanical: - the interfaces that devpts exposes all take "struct pts_fs_info *" instead of "struct inode *ptmx_inode" now. NOTE! The "struct pts_fs_info" thing is a completely opaque structure as far as the pty driver is concerned: it's still declared entirely internally to devpts. So the pty code can't actually access it in any way, just pass it as a "cookie" to the devpts code. - the "look up the pts fs info" is now a single clear operation, that also does the reference count increment on the pts superblock. So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get ref" operation (devpts_get_ref(inode)), along with a "put ref" op (devpts_put_ref()). - the pty master "tty->driver_data" field now contains the pts_fs_info, not the ptmx inode. - because we don't care about the ptmx inode any more as some kind of base index, the ref counting can now drop the inode games - it just gets the ref on the superblock. - the pts_fs_info now has a back-pointer to the super_block. That's so that we can easily look up the information we actually need. Although quite often, the pts fs info was actually all we wanted, and not having to look it up based on some magical inode makes things more straightforward. In particular, now that "devpts_get_ref(inode)" operation should really be the *only* place we need to look up what devpts instance we're associated with, and we do it exactly once, at ptmx_open() time. The other side of this is that one ptmx node could now be associated with multiple different devpts instances - you could have a single /dev/ptmx node, and then have multiple mount namespaces with their own instances of devpts mounted on /dev/pts/. And that's all perfectly sane in a model where we just look up the pts instance at open time. This will eventually allow us to get rid of our odd single-vs-multiple pts instance model, but this patch in itself changes no semantics, only an internal binding model. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-19Merge 4.6-rc4 into driver-core-nextGreg Kroah-Hartman
We want those fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-18treewide: Fix typos in printkMasanari Iida
This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18Merge branch 'master' into for-nextJiri Kosina
Sync with Linus' tree so that patches against newer codebase can be applied. Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18Doc: treewide : Fix typos in DocBook/filesystem.xmlMasanari Iida
This patch fix spelling typos found in DocBook/filesystem.xml. It is because the file was generated from comments in code, I have to fix the comments in codes, instead of xml file. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-16Merge tag 'driver-core-4.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull misc fixes from Greg KH: "Here are three small fixes for 4.6-rc4. Two fix up some lz4 issues with big endian systems, and the remaining one resolves a minor debugfs issue that was reported. All have been in linux-next with no reported issues" * tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: lib: lz4: cleanup unaligned access efficiency detection lib: lz4: fixed zram with lz4 on big endian machines debugfs: Make automount point inodes permanently empty
2016-04-15f2fs: flush dirty pages before starting atomic writesJaegeuk Kim
If somebody wrote some data before atomic writes, we should flush them in order to handle atomic data in a right period. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: don't invalidate atomic page if successfulJaegeuk Kim
If we committed atomic write successfully, we don't need to invalidate pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give -E2BIG for no space in xattrJaegeuk Kim
This patch returns -E2BIG if there is no space to add an xattr entry. This should fix generic/026 in xfstests as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: remove redundant condition checkJaegeuk Kim
This patch resolves the redundant condition check reported by David. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: unset atomic/volatile flag in f2fs_release_fileJaegeuk Kim
The atomic/volatile operation should be done in pair of start and commit ioctl. For example, if a killed process remains open-ended atomic operation, we should drop its flag as well as its atomic data. Otherwise, if sqlite initiates another operation which doesn't require atomic writes, it will lose every data, since f2fs still treats with them as atomic writes; nobody will trigger its commit. Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: fix dropping inmemory pages in a wrong timeJaegeuk Kim
When one reader closes its file while the other writer is doing atomic writes, f2fs_release_file drops atomic data resulting in an empty commit. This patch fixes this wrong commit problem by checking openess of the file. Process0 Process1 open file start atomic write write data read data close file f2fs_release_file() clear atomic data commit atomic write Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: add BUG_ON to avoid unnecessary flowJaegeuk Kim
This patch adds BUG_ON instead of retrying loop. In the case of node pages, we already got this inode page, but unlocked it. By the fact that we don't truncate any node pages in operations, the page's mapping should be unchangeable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: use PGP_LOCK to check its truncationJaegeuk Kim
Previously, after trylock_page is succeeded, it doesn't check its mapping. In order to fix that, we can just give PGP_LOCK to pagecache_get_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: fix to convert inline directory correctlyChao Yu
With below serials, we will lose parts of dirents: 1) mount f2fs with inline_dentry option 2) echo 1 > /sys/fs/f2fs/sdX/dir_level 3) mkdir dir 4) touch 180 files named [1-180] in dir 5) touch 181 in dir 6) echo 3 > /proc/sys/vm/drop_caches 7) ll dir ls: cannot access 2: No such file or directory ls: cannot access 4: No such file or directory ls: cannot access 5: No such file or directory ls: cannot access 6: No such file or directory ls: cannot access 8: No such file or directory ls: cannot access 9: No such file or directory ... total 360 drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./ drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../ -rw-r--r-- 1 root root 0 Feb 19 15:12 1 -rw-r--r-- 1 root root 0 Feb 19 15:12 10 -rw-r--r-- 1 root root 0 Feb 19 15:12 100 -????????? ? ? ? ? ? 101 -????????? ? ? ? ? ? 102 -????????? ? ? ? ? ? 103 ... The reason is: when doing the inline dir conversion, we didn't consider that directory has hierarchical hash structure which can be configured through sysfs interface 'dir_level'. By default, dir_level of directory inode is 0, it means we have one bucket in hash table located in first level, all dirents will be hashed in this bucket, so it has no problem for us to do the duplication simply between inline dentry page and converted normal dentry page. However, if we configured dir_level with the value N (greater than 0), it will expand the bucket number of first level hash table by 2^N - 1, it hashs dirents into different buckets according their hash value, if we still move all dirents to first bucket, it makes incorrent locating for inline dirents, the result is, although we can iterate all dirents through ->readdir, we can't stat some of them in ->lookup which based on hash table searching. This patch fixes this issue by rehashing dirents into correct position when converting inline directory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: show current mount statusJaegeuk Kim
This patch remains the current mount status to f2fs status info. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: treat as a normal umount when remounting roJaegeuk Kim
When user remounts f2fs as read-only, we can mark the checkpoint as umount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give -EINVAL for norecovery and rw mountJaegeuk Kim
Once detecting something to recover, f2fs should stop mounting, given norecovery and rw mount options. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: recover superblock at RW remountsJaegeuk Kim
This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to recover superblock when (re)mounting as RW. This is set only when f2fs is mounted as RO. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15f2fs: give RO message when recovering superblockJaegeuk Kim
When one of superblocks is missing, f2fs recovers it with the valid one. But, even if f2fs is mounted as RO, we'd better notify that too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-14Merge tag 'for-linus-4.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs/fscrypto fixes from Jaegeuk Kim: "In addition to f2fs/fscrypto fixes, I've added one patch which prevents RCU mode lookup in d_revalidate, as Al mentioned. These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4 crypto, which have not yet been fully propagated as follows. - use of dget_parent and file_dentry to avoid crashes - disallow RCU-mode lookup in d_invalidate - disallow -ENOMEM in the core data encryption path" * tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: ext4/fscrypto: avoid RCU lookup in d_revalidate fscrypto: don't let data integrity writebacks fail with ENOMEM f2fs: use dget_parent and file_dentry in f2fs_file_open fscrypto: use dget_parent() in fscrypt_d_revalidate()
2016-04-14Make file credentials available to the seqfile interfacesLinus Torvalds
A lot of seqfile users seem to be using things like %pK that uses the credentials of the current process, but that is actually completely wrong for filesystem interfaces. The unix semantics for permission checking files is to check permissions at _open_ time, not at read or write time, and that is not just a small detail: passing off stdin/stdout/stderr to a suid application and making the actual IO happen in privileged context is a classic exploit technique. So if we want to be able to look at permissions at read time, we need to use the file open credentials, not the current ones. Normal file accesses can just use "f_cred" (or any of the helper functions that do that, like file_ns_capable()), but the seqfile interfaces do not have any such options. It turns out that seq_file _does_ save away the user_ns information of the file, though. Since user_ns is just part of the full credential information, replace that special case with saving off the cred pointer instead, and suddenly seq_file has all the permission information it needs. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14GFS2: Don't dereference inode in gfs2_inode_lookup until it's validBob Peterson
Function gfs2_inode_lookup was dereferencing the inode, and after, it checks for the value being NULL. We need to check that first. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-13sock: tigthen lockdep checks for sock_owned_by_userHannes Frederic Sowa
sock_owned_by_user should not be used without socket lock held. It seems to be a common practice to check .owned before lock reclassification, so provide a little help to abstract this check away. Cc: linux-cifs@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-12ext4/fscrypto: avoid RCU lookup in d_revalidateJaegeuk Kim
As Al pointed, d_revalidate should return RCU lookup before using d_inode. This was originally introduced by: commit 34286d666230 ("fs: rcu-walk aware d_revalidate method"). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable <stable@vger.kernel.org>
2016-04-12debugfs: Make automount point inodes permanently emptySeth Forshee
Starting with 4.1 the tracing subsystem has its own filesystem which is automounted in the tracing subdirectory of debugfs. Prior to this debugfs could be bind mounted in a cloned mount namespace, but if tracefs has been mounted under debugfs this now fails because there is a locked child mount. This creates a regression for container software which bind mounts debugfs to satisfy the assumption of some userspace software. In other pseudo filesystems such as proc and sysfs we're already creating mountpoints like this in such a way that no dirents can be created in the directories, allowing them to be exceptions to some MNT_LOCKED tests. In fact we're already do this for the tracefs mountpoint in sysfs. Do the same in debugfs_create_automount(), since the intention here is clearly to create a mountpoint. This fixes the regression, as locked child mounts on permanently empty directories do not cause a bind mount to fail. Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: unproxify files created through debugfs_create_u32_array()Nicolai Stange
The struct file_operations u32_array_fops associated with files created through debugfs_create_u32_array() has been lifetime aware already: everything needed for subsequent operation is copied to a ->f_private buffer at file opening time in u32_array_open(). Now, ->open() is always protected against file removal issues by the debugfs core. There is no need for the debugfs core to wrap the u32_array_fops with a file lifetime managing proxy. Make debugfs_create_u32_array() create its files in non-proxying operation mode by means of debugfs_create_file_unsafe(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: unproxify files created through debugfs_create_blob()Nicolai Stange
Currently, the struct file_operations fops_blob associated with files created through the debugfs_create_blob() helpers are not file lifetime aware. Thus, a lifetime managing proxy is created around fops_blob each time such a file is opened which is an unnecessary waste of resources. Implement file lifetime management for the fops_bool file_operations. Namely, make read_file_blob() safe gainst file removals by means of debugfs_use_file_start() and debugfs_use_file_finish(). Make debugfs_create_blob() create its files in non-proxying operation mode by means of debugfs_create_file_unsafe(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: unproxify files created through debugfs_create_bool()Nicolai Stange
Currently, the struct file_operations fops_bool associated with files created through the debugfs_create_bool() helpers are not file lifetime aware. Thus, a lifetime managing proxy is created around fops_bool each time such a file is opened which is an unnecessary waste of resources. Implement file lifetime management for the fops_bool file_operations. Namely, make debugfs_read_file_bool() and debugfs_write_file_bool() safe against file removals by means of debugfs_use_file_start() and debugfs_use_file_finish(). Make debugfs_create_bool() create its files in non-proxying operation mode through debugfs_create_mode_unsafe(). Finally, purge debugfs_create_mode() as debugfs_create_bool() had been its last user. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: unproxify integer attribute filesNicolai Stange
Currently, the struct file_operations associated with the integer attribute style files created through the debugfs_create_*() helpers are not file lifetime aware as they are defined by means of DEFINE_SIMPLE_ATTRIBUTE(). Thus, a lifetime managing proxy is created around the original fops each time such a file is opened which is an unnecessary waste of resources. Migrate all usages of DEFINE_SIMPLE_ATTRIBUTE() within debugfs itself to DEFINE_DEBUGFS_ATTRIBUTE() in order to implement file lifetime managing within the struct file_operations thus defined. Introduce the debugfs_create_mode_unsafe() helper, analogous to debugfs_create_mode(), but distinct in that it creates the files in non-proxying operation mode through debugfs_create_file_unsafe(). Feed all struct file_operations migrated to DEFINE_DEBUGFS_ATTRIBUTE() into debugfs_create_mode_unsafe() instead of former debugfs_create_mode(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: add support for self-protecting attribute file fopsNicolai Stange
In order to protect them against file removal issues, debugfs_create_file() creates a lifetime managing proxy around each struct file_operations handed in. In cases where this struct file_operations is able to manage file lifetime by itself already, the proxy created by debugfs is a waste of resources. The most common class of struct file_operations given to debugfs are those defined by means of the DEFINE_SIMPLE_ATTRIBUTE() macro. Introduce a DEFINE_DEBUGFS_ATTRIBUTE() macro to allow any struct file_operations of this class to be easily made file lifetime aware and thus, to be operated unproxied. Specifically, introduce debugfs_attr_read() and debugfs_attr_write() which wrap simple_attr_read() and simple_attr_write() under the protection of a debugfs_use_file_start()/debugfs_use_file_finish() pair. Make DEFINE_DEBUGFS_ATTRIBUTE() set the defined struct file_operations' ->read() and ->write() members to these wrappers. Export debugfs_create_file_unsafe() in order to allow debugfs users to create their files in non-proxying operation mode. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: prevent access to removed files' private dataNicolai Stange
Upon return of debugfs_remove()/debugfs_remove_recursive(), it might still be attempted to access associated private file data through previously opened struct file objects. If that data has been freed by the caller of debugfs_remove*() in the meanwhile, the reading/writing process would either encounter a fault or, if the memory address in question has been reassigned again, unrelated data structures could get overwritten. However, since debugfs files are seldomly removed, usually from module exit handlers only, the impact is very low. Currently, there are ~1000 call sites of debugfs_create_file() spread throughout the whole tree and touching all of those struct file_operations in order to make them file removal aware by means of checking the result of debugfs_use_file_start() from within their methods is unfeasible. Instead, wrap the struct file_operations by a lifetime managing proxy at file open: - In debugfs_create_file(), the original fops handed in has got stashed away in ->d_fsdata already. - In debugfs_create_file(), install a proxy file_operations factory, debugfs_full_proxy_file_operations, at ->i_fop. This proxy factory has got an ->open() method only. It carries out some lifetime checks and if successful, dynamically allocates and sets up a new struct file_operations proxy at ->f_op. Afterwards, it forwards to the ->open() of the original struct file_operations in ->d_fsdata, if any. The dynamically set up proxy at ->f_op has got a lifetime managing wrapper set for each of the methods defined in the original struct file_operations in ->d_fsdata. Its ->release()er frees the proxy again and forwards to the original ->release(), if any. In order not to mislead the VFS layer, it is strictly necessary to leave those fields blank in the proxy that have been NULL in the original struct file_operations also, i.e. aren't supported. This is why there is a need for dynamically allocated proxies. The choice made not to allocate a proxy instance for every dentry at file creation, but for every struct file object instantiated thereof is justified by the expected usage pattern of debugfs, namely that in general very few files get opened more than once at a time. The wrapper methods set in the struct file_operations implement lifetime managing by means of the SRCU protection facilities already in place for debugfs: They set up a SRCU read side critical section and check whether the dentry is still alive by means of debugfs_use_file_start(). If so, they forward the call to the original struct file_operation stored in ->d_fsdata, still under the protection of the SRCU read side critical section. This SRCU read side critical section prevents any pending debugfs_remove() and friends to return to their callers. Since a file's private data must only be freed after the return of debugfs_remove(), the ongoing proxied call is guarded against any file removal race. If, on the other hand, the initial call to debugfs_use_file_start() detects that the dentry is dead, the wrapper simply returns -EIO and does not forward the call. Note that the ->poll() wrapper is special in that its signature does not allow for the return of arbitrary -EXXX values and thus, POLLHUP is returned here. In order not to pollute debugfs with wrapper definitions that aren't ever needed, I chose not to define a wrapper for every struct file_operations method possible. Instead, a wrapper is defined only for the subset of methods which are actually set by any debugfs users. Currently, these are: ->llseek() ->read() ->write() ->unlocked_ioctl() ->poll() The ->release() wrapper is special in that it does not protect the original ->release() in any way from dead files in order not to leak resources. Thus, any ->release() handed to debugfs must implement file lifetime management manually, if needed. For only 33 out of a total of 434 releasers handed in to debugfs, it could not be verified immediately whether they access data structures that might have been freed upon a debugfs_remove() return in the meanwhile. Export debugfs_use_file_start() and debugfs_use_file_finish() in order to allow any ->release() to manually implement file lifetime management. For a set of common cases of struct file_operations implemented by the debugfs_core itself, future patches will incorporate file lifetime management directly within those in order to allow for their unproxied operation. Rename the original, non-proxying "debugfs_create_file()" to "debugfs_create_file_unsafe()" and keep it for future internal use by debugfs itself. Factor out code common to both into the new __debugfs_create_file(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12debugfs: prevent access to possibly dead file_operations at file openNicolai Stange
Nothing prevents a dentry found by path lookup before a return of __debugfs_remove() to actually get opened after that return. Now, after the return of __debugfs_remove(), there are no guarantees whatsoever regarding the memory the corresponding inode's file_operations object had been kept in. Since __debugfs_remove() is seldomly invoked, usually from module exit handlers only, the race is hard to trigger and the impact is very low. A discussion of the problem outlined above as well as a suggested solution can be found in the (sub-)thread rooted at http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk ("Yet another pipe related oops.") Basically, Greg KH suggests to introduce an intermediate fops and Al Viro points out that a pointer to the original ones may be stored in ->d_fsdata. Follow this line of reasoning: - Add SRCU as a reverse dependency of DEBUG_FS. - Introduce a srcu_struct object for the debugfs subsystem. - In debugfs_create_file(), store a pointer to the original file_operations object in ->d_fsdata. - Make debugfs_remove() and debugfs_remove_recursive() wait for a SRCU grace period after the dentry has been delete()'d and before they return to their callers. - Introduce an intermediate file_operations object named "debugfs_open_proxy_file_operations". It's ->open() functions checks, under the protection of a SRCU read lock, whether the dentry is still alive, i.e. has not been d_delete()'d and if so, tries to acquire a reference on the owning module. On success, it sets the file object's ->f_op to the original file_operations and forwards the ongoing open() call to the original ->open(). - For clarity, rename the former debugfs_file_operations to debugfs_noop_file_operations -- they are in no way canonical. The choice of SRCU over "normal" RCU is justified by the fact, that the former may also be used to protect ->i_private data from going away during the execution of a file's readers and writers which may (and do) sleep. Finally, introduce the fs/debugfs/internal.h header containing some declarations internal to the debugfs implementation. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12fscrypto: don't let data integrity writebacks fail with ENOMEMJaegeuk Kim
This patch fixes the issue introduced by the ext4 crypto fix in a same manner. For F2FS, however, we flush the pending IOs and wait for a while to acquire free memory. Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12f2fs: use dget_parent and file_dentry in f2fs_file_openJaegeuk Kim
This patch synced with the below two ext4 crypto fixes together. In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes overlayfs. To fix, now we need to use file_dentry() to access that field. Fixes: c0a37d487884 ("ext4: use file_dentry()") Fixes: 9dd78d8c9a7b ("ext4: use dget_parent() in ext4_file_open()") Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12fscrypto: use dget_parent() in fscrypt_d_revalidate()Jaegeuk Kim
This patch updates fscrypto along with the below ext4 crypto change. Fixes: 3d43bcfef5f0 ("ext4 crypto: use dget_parent() in ext4_d_revalidate()") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytesDenys Vlasenko
This function compiles to 522 bytes of machine code. Error paths are not very time critical. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>