summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-08-17xfs: (re-)implement FIEMAP_FLAG_XATTRChristoph Hellwig
Use a special read-only iomap_ops implementation to support fiemap on the attr fork. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: simplify xfs_file_iomap_beginChristoph Hellwig
We'll never get nimap == 0 for a successful return from xfs_bmapi_read, so don't try to handle it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: mark ->iomap_end as optionalChristoph Hellwig
No need to implement it for read-only mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: prepare iomap_fiemap for attribute mappingsDave Chinner
By bassing through an -ENOENT, similar to the old XFS implementation of FIEMAP_FLAG_XATTR. Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: fiemap should honor the FIEMAP_FLAG_SYNC flagDave Chinner
The flag is checked as supported, but then we do an unconditional sync of the file, regardless of whether the flag is set or not. Make the sync conditional on having the FIEMAP_FLAG_SYNC flag set. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: remove superflous pagefault_disable from iomap_write_actorChristoph Hellwig
iov_iter_copy_from_user_atomic disables page faults internally, no need to do it around the call. This also brings the iomap code in line with the original filemap version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: remove superflous mark_page_accessed from iomap_write_actorChristoph Hellwig
This catches up with commit 2457ae ("mm: non-atomically mark page accessed during page cache allocation where possible"), which moved the initial access marking into the pagecache allocator. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: store rmapbt block count in the AGFDarrick J. Wong
Track the number of blocks used for the rmapbt in the AGF. When we get to the AG reservation code we need this counter to quickly make our reservation during mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: don't invalidate whole file on DAX read/writeDave Chinner
When we do DAX IO, we try to invalidate the entire page cache held on the file. This is incorrect as it will trash the entire mapping tree that now tracks dirty state in exceptional entries in the radix tree slots. What we are trying to do is remove cached pages (e.g from reads into holes) that sit in the radix tree over the range we are about to write to. Hence we should just limit the invalidation to the range we are about to overwrite. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: fix bogus space reservation in xfs_iomap_write_allocateChristoph Hellwig
The space reservations was without an explaination in commit "Add error reporting calls in error paths that return EFSCORRUPTED" back in 2003. There is no reason to reserve disk blocks in the transaction when allocating blocks for delalloc space as we already reserved the space when creating the delalloc extent. With this fix we stop running out of the reserved pool in generic/229, which has happened for long time with small blocksize file systems, and has increased in severity with the new buffered write path. [ dchinner: we still need to pass the block reservation into xfs_bmapi_write() to ensure we don't deadlock during AG selection. See commit dbd5c8c ("xfs: pass total block res. as total xfs_bmapi_write() parameter") for more details on why this is necessary. ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: don't assert fail on non-async buffers on ioacct decrementBrian Foster
The buffer I/O accounting mechanism tracks async buffers under I/O. As an optimization, the buffer I/O count is incremented only once on the first async I/O for a given hold cycle of a buffer and decremented once the buffer is released to the LRU (or freed). xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but we have one or two corner cases where a buffer can be submitted for I/O multiple times via different methods in a single hold cycle. If an async I/O occurs first, the I/O count is incremented. If a sync I/O occurs before the hold count drops, XBF_ASYNC is cleared by the time the I/O count is decremented. Remove the async assert check from xfs_buf_ioacct_dec() as this is a perfectly valid scenario. For the purposes of I/O accounting, we really only care about the buffer async state at I/O submission time. Discovered-and-analyzed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-16orangefs: rename most remaining global variablesMartin Brandenburg
Only op_timeout_secs, slot_timeout_secs, and hash_table_size are left because they are exposed as module parameters. All other global variables have the orangefs_ prefix. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-16pNFS/flexfiles: Set reasonable default retrans values for the data channelTrond Myklebust
Prior to this patch, the retrans value was set at 5, meaning that we could see a maximum retransmission timeout value of more than 6 minutes. That's a tad high for NFSv3 where the protocol does allow the server to drop requests at any time. Since this is a data channel, let's just set retrans to 0, and the default timeout to 60s. The user can continue to adjust these defaults using the dataserver_retrans and dataserver_timeo module parameters. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-16NFS: Allow the mount option retrans=0Trond Myklebust
We should allow retrans=0 as just meaning that every timeout is a major timeout, and that there is no increment in the timeout value. For instance, this means that we would allow TCP users to specify a flat timeout value of 60s, by specifying "timeo=600,retrans=0" in their mount option string. Siged-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-15orangefs: g_orangefs_stats -> orangefs_stats for consistencyMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: make devreq_mutex staticMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: describe organization of sysfsMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: remove duplicated sysfs_ops structuresMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: consolidate sysfs show and store functionsMartin Brandenburg
Remove a good bit of obfuscated and duplicated code. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: reorganize duplicated sysfs attribute structsMartin Brandenburg
We had a separate struct type for each type of attribute, but they all did the exact same thing. Consolidate them into one struct orangefs_attribute type. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15orangefs: remove dead code in sysfsMartin Brandenburg
We had a pageful of structures containing kobjects and variables to store sysfs entries. However only the kobjects were in use. Replace them with kobjects. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-15quota: fill in Q_XGETQSTAT inode information for inactive quotasEric Sandeen
The manpage for quotactl says that the Q_XGETQSTAT command is "useful in finding out how much space is spent to store quota information," but the current implementation does not report this info if the inode is allocated, but its quota type is not enabled. This is a change from the earlier XFS implementation, which reported information about allocated quota inodes even if their quota type was not currently active. Change quota_getstate() and quota_getstatev() to copy out the inode information if the filesystem has provided it, even if the quota type for that inode is not currently active. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-08-15orangefs: clean up debugfs globalsMartin Brandenburg
Mostly this is moving code into orangefs-debugfs.c so that globals turn into static globals. Then gossip_debug_mask is renamed orangefs_gossip_debug_mask but keeps global visibility, so it can be used from a macro. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-14net: make net namespace sysctls belong to container's ownerDmitry Torokhov
If net namespace is attached to a user namespace let's make container's root owner of sysctls affecting said network namespace instead of global root. This also allows us to clean up net_ctl_permissions() because we do not need to fudge permissions anymore for the container's owner since it now owns the objects in question. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14proc: make proc entries inherit ownership from parentDmitry Torokhov
There are certain parameters that belong to net namespace and that are exported in /proc. They should be controllable by the container's owner, but are currently owned by global root and thus not available. Let's change proc code to inherit ownership of parent entry, and when create per-ns "net" proc entry set it up as owned by container's owner. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14pNFS/flexfiles: Fix layoutstat periodic reportingTrond Myklebust
Putting the periodicity timer in the mirror instances is causing non-scalable reporting behaviour and missed reporting intervals. When you recall layouts and/or implement client side mirroring, it leads to consecutive reports with only a few ms between RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: d0379a5d066a9 ("pNFS/flexfiles: Support server-supplied...")
2016-08-13Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - an NVMe fix from Gabriel, fixing a suspend/resume issue on some setups - addition of a few missing entries in the block queue sysfs documentation, from Joe - a fix for a sparse shadow warning for the bvec iterator, from Johannes - a writeback deadlock involving raid issuing barriers, and not flushing the plug when we wakeup the flusher threads. From Konstantin - a set of patches for the NVMe target/loop/rdma code, from Roland and Sagi * 'for-linus' of git://git.kernel.dk/linux-block: bvec: avoid variable shadowing warning doc: update block/queue-sysfs.txt entries nvme: Suspend all queues before deletion mm, writeback: flush plugged IO in wakeup_flusher_threads() nvme-rdma: Remove unused includes nvme-rdma: start async event handler after reconnecting to a controller nvmet: Fix controller serial number inconsistency nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads nvmet-rdma: Correctly handle RDMA device hot removal nvme-rdma: Make sure to shutdown the controller if we can nvme-loop: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Free the I/O tags when we delete the controller nvme-rdma: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Fix device removal handling nvme-rdma: Queue ns scanning after a sucessful reconnection nvme-rdma: Don't leak uninitialized memory in connect request private data
2016-08-12Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for races in the LOCK code which appear to go back to the big nfsd state lock removal from 3.17" * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux: nfsd: don't return an unhashed lock stateid after taking mutex nfsd: Fix race between FREE_STATEID and LOCK nfsd: fix dentry refcounting on create
2016-08-12orangefs: do not allow client readahead cache without feature bitMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-12nfsd: don't return an unhashed lock stateid after taking mutexJeff Layton
nfsd4_lock will take the st_mutex before working with the stateid it gets, but between the time when we drop the cl_lock and take the mutex, the stateid could become unhashed (a'la FREE_STATEID). If that happens the lock stateid returned to the client will be forgotten. Fix this by first moving the st_mutex acquisition into lookup_or_create_lock_state. Then, have it check to see if the lock stateid is still hashed after taking the mutex. If it's not, then put the stateid and try the find/create again. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-12Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures
2016-08-12orangefs: add features opMartin Brandenburg
This is a new userspace operation, which will be done if the client-core version is greater than or equal to 2.9.6. This will provide a way to implement optional features and to determine which features are supported by the client-core. If the client-core version is older than 2.9.6, no optional features are supported and the op will not be done. The intent is to allow protocol extensions without relying on the client-core's current behavior of ignoring what it doesn't understand. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-12ARM: 8594/1: enable binfmt_flat on systems with an MMUNicolas Pitre
Now that the generic changes are in place, this can be enabled on ARM with the use of proper user space accessors in the flat_get_addr_from_rp() and flat_put_addr_at_rp() handlers as rp actually holds a user space address. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-08-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "7 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats mm, oom: fix uninitialized ret in task_will_free_mem() kasan: remove the unnecessary WARN_ONCE from quarantine.c mm: memcontrol: fix memcg id ref counter on swap charge move mm: memcontrol: fix swap counter leak on swapout from offline cgroup proc, meminfo: use correct helpers for calculating LRU sizes in meminfo mm/hugetlb: fix incorrect hugepages count during mem hotplug
2016-08-11proc, meminfo: use correct helpers for calculating LRU sizes in meminfoMel Gorman
meminfo_proc_show() and si_mem_available() are using the wrong helpers for calculating the size of the LRUs. The user-visible impact is that there appears to be an abnormally high number of unevictable pages. Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful of static checker fixes" * tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client: ceph: initialize pathbase in the !dentry case in encode_caps_cb() rbd: nuke the 32-bit pool id check rbd: destroy header_oloc in rbd_dev_release() ceph: fix null pointer dereference in ceph_flush_snaps() libceph: using kfree_rcu() to simplify the code libceph: make cancel_generic_request() static libceph: fix return value check in alloc_msg_with_page_vector()
2016-08-11nfsd: Fix race between FREE_STATEID and LOCKChuck Lever
When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is: Frame 324 R OPEN stateid [2,O] Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64 Frame 115008 R LOCK stateid [1,L] Frame 115012 C WRITE stateid [0,L] offset 672000 len 64 Frame 115016 R WRITE NFS4_OK Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64 Frame 115022 R LOCKU NFS4_OK Frame 115025 C FREE_STATEID stateid [2,L] Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64 Frame 115029 R FREE_STATEID NFS4_OK Frame 115030 R LOCK stateid [3,L] Frame 115034 C WRITE stateid [0,L] offset 672128 len 64 Frame 115038 R WRITE NFS4ERR_BAD_STATEID In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail. To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD. Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-11ext4: avoid deadlock when expanding inode sizeJan Kara
When we need to move xattrs into external xattr block, we call ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end up calling ext4_mark_inode_dirty() again which will recurse back into the inode expansion code leading to deadlocks. Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move its management into ext4_expand_extra_isize_ea() since its manipulation is safe there (due to xattr_sem) from possible races with ext4_xattr_set_handle() which plays with it as well. CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-08-11ext4: properly align shifted xattrs when expanding inodesJan Kara
We did not count with the padding of xattr value when computing desired shift of xattrs in the inode when expanding i_extra_isize. As a result we could create unaligned start of inline xattrs. Account for alignment properly. CC: stable@vger.kernel.org # 4.4.x- Signed-off-by: Jan Kara <jack@suse.cz>
2016-08-11ext4: fix xattr shifting when expanding inodes part 2Jan Kara
When multiple xattrs need to be moved out of inode, we did not properly recompute total size of xattr headers in the inode and the new header position. Thus when moving the second and further xattr we asked ext4_xattr_shift_entries() to move too much and from the wrong place, resulting in possible xattr value corruption or general memory corruption. CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-08-11ext4: fix xattr shifting when expanding inodesJan Kara
The code in ext4_expand_extra_isize_ea() treated new_extra_isize argument sometimes as the desired target i_extra_isize and sometimes as the amount by which we need to grow current i_extra_isize. These happen to coincide when i_extra_isize is 0 which used to be the common case and so nobody noticed this until recently when we added i_projid to the inode and so i_extra_isize now needs to grow from 28 to 32 bytes. The result of these bugs was that we sometimes unnecessarily decided to move xattrs out of inode even if there was enough space and we often ended up corrupting in-inode xattrs because arguments to ext4_xattr_shift_entries() were just wrong. This could demonstrate itself as BUG_ON in ext4_xattr_shift_entries() triggering. Fix the problem by introducing new isize_diff variable and use it where appropriate. CC: stable@vger.kernel.org # 4.4.x Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-08-11nfsd: fix dentry refcounting on createJosef Bacik
b44061d0b9 introduced a dentry ref counting bug. Previously we were grabbing one ref to dchild in nfsd_create(), but with the creation of nfsd_create_locked() we have a ref for dchild from the lookup in nfsd_create(), and then another ref in nfsd_create_locked(). The ref from the lookup in nfsd_create() is never dropped and results in dentries still in use at unmount. Signed-off-by: Josef Bacik <jbacik@fb.com> Fixes: b44061d0b9 "nfsd: reorganize nfsd_create" Reported-by: kernel test robot <xiaolong.ye@intel.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-10Merge branch 'for-linus-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Some fixes for btrfs send/recv and fsync from Filipe and Robbie Ko. Bonus points to Filipe for already having xfstests in place for many of these" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() Btrfs: improve performance on fsync against new inode after rename/unlink Btrfs: be more precise on errors when getting an inode from disk Btrfs: send, don't bug on inconsistent snapshots Btrfs: send, avoid incorrect leaf accesses when sending utimes operations Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations Btrfs: send, fix warning due to late freeing of orphan_dir_info structures Btrfs: incremental send, fix premature rmdir operations Btrfs: incremental send, fix invalid paths for rename operations Btrfs: send, add missing error check for calls to path_loop() Btrfs: send, fix failure to move directories with the same name around Btrfs: add missing check for writeback errors on fsync
2016-08-10kernfs: remove kernfs_path_len()Tejun Heo
It doesn't have any in-kernel user and the same result can be obtained from kernfs_path(@kn, NULL, 0). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-08-10kernfs: make kernfs_path*() behave in the style of strlcpy()Tejun Heo
kernfs_path*() functions always return the length of the full path but the path content is undefined if the length is larger than the provided buffer. This makes its behavior different from strlcpy() and requires error handling in all its users even when they don't care about truncation. In addition, the implementation can actully be simplified by making it behave properly in strlcpy() style. * Update kernfs_path_from_node_locked() to always fill up the buffer with path. If the buffer is not large enough, the output is truncated and terminated. * kernfs_path() no longer needs error handling. Make it a simple inline wrapper around kernfs_path_from_node(). * sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling. Updated accordingly. * cgroup_path()'s use of kernfs_path() updated to retain the old behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-08-09mm, writeback: flush plugged IO in wakeup_flusher_threads()Konstantin Khlebnikov
I've found funny live-lock between raid10 barriers during resync and memory controller hard limits. Inside mpage_readpages() task holds on to its plug bio which blocks the barrier in raid10. Its memory cgroup have no free memory thus the task goes into reclaimer but all reclaimable pages are dirty and cannot be written because raid10 is rebuilding and stuck on the barrier. Common flush of such IO in schedule() never happens, because the caller doesn't go to sleep. Lock is 'live' because changing memory limit or killing tasks which holds that stuck bio unblock whole progress. That was what happened in 3.18.x but I see no difference in upstream logic. Theoretically this might happen even without memory cgroup. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-09orangefs: record userspace version for feature compatbilityMartin Brandenburg
The client reports its version to the kernel on startup. We already test that it is above the minimum version. Now we record it in a global variable so code elsewhere can consult it before making a request the client may not understand. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-08-09mm: memcontrol: only mark charged pages with PageKmemcgVladimir Davydov
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg, which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated with __GFP_ACCOUNT, including those that aren't actually charged to any cgroup, i.e. allocated from the root cgroup context. To avoid overhead in case cgroups are not used, we only do that if memcg_kmem_enabled() is true. The latter is set iff there are kmem-enabled memory cgroups (online or offline). The root cgroup is not considered kmem-enabled. As a result, if a page is allocated with __GFP_ACCOUNT for the root cgroup when there are kmem-enabled memory cgroups and is freed after all kmem-enabled memory cgroups were removed, e.g. # no memory cgroups has been created yet, create one mkdir /sys/fs/cgroup/memory/test # run something allocating pages with __GFP_ACCOUNT, e.g. # a program using pipe dmesg | tail # remove the memory cgroup rmdir /sys/fs/cgroup/memory/test we'll get bad page state bug complaining about page->_mapcount != -1: BUG: Bad page state in process swapper/0 pfn:1fd945c page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0 flags: 0x1000000000000000() To avoid that, let's mark with PageKmemcg only those pages that are actually charged to and hence pin a non-root memory cgroup. Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths") Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09ceph: initialize pathbase in the !dentry case in encode_caps_cb()Ilya Dryomov
pathbase is the base inode; set it to 0 if we've got no path. Coverity-id: 146348 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-09ext2: Check return value from ext2_get_group_desc()Jan Kara
ext2_get_group_desc() can return NULL if there is some error. This usually means there is some programming error in the ext2 driver itself but let's be defensive and handle that case. Coverity-id: 115628 Signed-off-by: Jan Kara <jack@suse.cz>