summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-06-04fs/hugetlbfs/inode.c: use static const for dentry_operationsFabian Frederick
...like other filesystems. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/hugetlbfs/inode.c: add static to hugetlbfs_i_mmap_mutex_keyFabian Frederick
hugetlbfs_i_mmap_mutex_key is only used in inode.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: non-atomically mark page accessed during page cache allocation where ↵Mel Gorman
possible aops->write_begin may allocate a new page and make it visible only to have mark_page_accessed called almost immediately after. Once the page is visible the atomic operations are necessary which is noticable overhead when writing to an in-memory filesystem like tmpfs but should also be noticable with fast storage. The objective of the patch is to initialse the accessed information with non-atomic operations before the page is visible. The bulk of filesystems directly or indirectly use grab_cache_page_write_begin or find_or_create_page for the initial allocation of a page cache page. This patch adds an init_page_accessed() helper which behaves like the first call to mark_page_accessed() but may called before the page is visible and can be done non-atomically. The primary APIs of concern in this care are the following and are used by most filesystems. find_get_page find_lock_page find_or_create_page grab_cache_page_nowait grab_cache_page_write_begin All of them are very similar in detail to the patch creates a core helper pagecache_get_page() which takes a flags parameter that affects its behavior such as whether the page should be marked accessed or not. Then old API is preserved but is basically a thin wrapper around this core function. Each of the filesystems are then updated to avoid calling mark_page_accessed when it is known that the VM interfaces have already done the job. There is a slight snag in that the timing of the mark_page_accessed() has now changed so in rare cases it's possible a page gets to the end of the LRU as PageReferenced where as previously it might have been repromoted. This is expected to be rare but it's worth the filesystem people thinking about it in case they see a problem with the timing change. It is also the case that some filesystems may be marking pages accessed that previously did not but it makes sense that filesystems have consistent behaviour in this regard. The test case used to evaulate this is a simple dd of a large file done multiple times with the file deleted on each iterations. The size of the file is 1/10th physical memory to avoid dirty page balancing. In the async case it will be possible that the workload completes without even hitting the disk and will have variable results but highlight the impact of mark_page_accessed for async IO. The sync results are expected to be more stable. The exception is tmpfs where the normal case is for the "IO" to not hit the disk. The test machine was single socket and UMA to avoid any scheduling or NUMA artifacts. Throughput and wall times are presented for sync IO, only wall times are shown for async as the granularity reported by dd and the variability is unsuitable for comparison. As async results were variable do to writback timings, I'm only reporting the maximum figures. The sync results were stable enough to make the mean and stddev uninteresting. The performance results are reported based on a run with no profiling. Profile data is based on a separate run with oprofile running. async dd 3.15.0-rc3 3.15.0-rc3 vanilla accessed-v2 ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%) tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%) btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%) ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%) xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%) The XFS figure is a bit strange as it managed to avoid a worst case by sheer luck but the average figures looked reasonable. samples percentage ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs: buffer: do not use unnecessary atomic operations when discarding buffersMel Gorman
Discarding buffers uses a bunch of atomic operations when discarding buffers because ...... I can't think of a reason. Use a cmpxchg loop to clear all the necessary flags. In most (all?) cases this will be a single atomic operations. [akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: page_alloc: convert hot/cold parameter and immediate callers to boolMel Gorman
cold is a bool, make it one. Make the likely case the "if" part of the block instead of the else as according to the optimisation manual this is preferred. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/block_dev.c: add bdev_read_page() and bdev_write_page()Matthew Wilcox
A block device driver may choose to provide a rw_page operation. These will be called when the filesystem is attempting to do page sized I/O to page cache pages (ie not for direct I/O). This does preclude I/Os that are larger than page size, so this may only be a performance gain for some devices. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/mpage.c: factor page_endio() out of mpage_end_io()Matthew Wilcox
page_endio() takes care of updating all the appropriate page flags once I/O has finished to a page. Switch to using mapping_set_error() instead of setting AS_EIO directly; this will handle thin-provisioned devices correctly. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/mpage.c: factor clean_buffers() out of __mpage_writepage()Matthew Wilcox
__mpage_writepage() is over 200 lines long, has 20 local variables, four goto labels and could desperately use simplification. Splitting clean_buffers() into a helper function improves matters a little, removing 20+ lines from it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/buffer.c: remove block_write_full_page_endio()Matthew Wilcox
The last in-tree caller of block_write_full_page_endio() was removed in January 2013. It's time to remove the EXPORT_SYMBOL, which leaves block_write_full_page() as the only caller of block_write_full_page_endio(), so inline block_write_full_page_endio() into block_write_full_page(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/hugetlbfs/inode.c: complete conversion to pr_foo()Andrew Morton
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm: softdirty: clear VM_SOFTDIRTY flag inside clear_refs_write() instead of ↵Cyrill Gorcunov
clear_soft_dirty() clear_refs_write() is called earlier than clear_soft_dirty() and it is more natural to clear VM_SOFTDIRTY (which belongs to VMA entry but not PTEs) that early instead of clearing it a way deeper inside call chain. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/libfs.c: add generic data flush to fsyncFabian Frederick
Description by Jan Kara: "A lot of older filesystems don't properly flush volatile disk caches on fsync(2) which can lead to loss of fsynced data after power failure. This patch makes generic_file_fsync() issue proper cache flush to fix the problem. Sysadmin can use /sys/devices/.../cache_type to tell the system it should not send the cache flush." [akpm@linux-foundation.org: nuke ifdef] [akpm@linux-foundation.org: fix warning] Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/9p: kerneldoc fixesFabian Frederick
Function parameters comment fixing. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/9p/v9fs.c: add __init to v9fs_sysfs_initFabian Frederick
v9fs_sysfs_init is only called by __init init_v9fs Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: remove some unused codeXue jiufei
dlm_recovery_ctxt.received is unused. ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error handling code in ocfs2_super_lock() is unneeded. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: fix incorrect i_size of global bitmap inode after resizeJoseph Qi
Ocfs2 cluster size may be 1MB, which has 20 bits. When resize, the input new clusters is mostly the number of clusters in a group descriptor(32256). Since the input clusters is defined as type int, so it will overflow when shift left 20 bits and then lead to incorrect global bitmap i_size. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: cleanup unused paramters in ocfs2_calc_new_backup_superJoseph Qi
Parameters new_clusters and first_new_cluster are not used in ocfs2_update_last_group_and_inode, so remove them. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2/dlm: disallow node joining when recovery is on goingXue jiufei
We found a race situation when dlm recovery and node joining occurs simultaneously if the network state is bad. N1 N4 start joining dlm and send query join to all live nodes set joining node to N1, return OK send query join to other live nodes and it may take a while call dlm_send_join_assert() to send assert join message when N2 is down, so keep trying to send message to N2 until find N2 is down send assert join message to N3, but connection is down with N3, so it may take a while become the recovery master for N2 and send begin reco message to other nodes in domain map but no N1 connection with N3 is rebuild, then send assert join to N4 call dlm_assert_joined_handler(), add N1 to domain_map dlm recovery done, send finalize message to nodes in domain map, including N1 receiving finalize message, trigger the BUG() because recovery master mismatch. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: fix umount hang while shutting down truncate logXue jiufei
Revert commit 75f82eaa502c ("ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously") because it may cause a umount hang while shutting down the truncate log. fix NULL pointer dereference when dismount and ocfs2rec simultaneously The situation is as followes: ocfs2_dismout_volume -> ocfs2_recovery_exit -> free osb->recovery_map -> ocfs2_truncate_shutdown -> lock global bitmap inode -> ocfs2_wait_for_recovery -> check whether osb->recovery_map->rm_used is zero Because osb->recovery_map is already freed, rm_used can be any other values, so it may yield umount hang. To prevent NULL pointer dereference while getting sys_root_inode, we use a osb_tl_disable flag to disable schedule osb_truncate_log_wq after truncate log shutdown. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/ocfs2/ioctl.c: add static to local functionsFabian Frederick
ocfs_info_foo() and ocfs2_get_request_ptr functions are only used in ioctl.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2/dlm: fix possible convert=sion deadlockXue jiufei
We found there is a conversion deadlock when the owner of lockres happened to crash before send DLM_PROXY_AST_MSG for a downconverting lock. The situation is as follows: Node1 Node2 Node3 the owner of lockresA lock_1 granted at EX mode and call ocfs2_cluster_unlock to decrease ex_holders. converting lock_3 from NL to EX send DLM_PROXY_AST_MSG to Node1, asking Node 1 to downconvert. receiving DLM_PROXY_AST_MSG, thread ocfs2dc send DLM_CONVERT_LOCK_MSG to Node2 to downconvert lock_1(EX->NL). lock_1 can be granted and put it into pending_asts list, return DLM_NORMAL. then something happened and Node2 crashed. received DLM_NORMAL, waiting for DLM_PROXY_AST_MSG. selected as the recovery master, receving migrate lock from Node1, queue lock_1 to the tail of converting list. After dlm recovery, converting list in the master of lockresA(Node3) will be: converting list head <-> lock_3(NL->EX) <->lock_1(EX<->NL). Requested mode of lock_3 is not compatible with the granted mode of lock_1, so it can not be granted. and lock_1 can not downconvert because covnerting queue is strictly FIFO. So a deadlock is created. We think function dlm_process_recovery_data() should queue_ast for lock_1 or alter the order of lock_1 and lock_3, so dlm_thread can process lock_1 first. And if there are multiple downconverting locks, they must convert form PR to NL, so no need to sort them. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: limit printk when journal is abortedJoseph Qi
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in ocfs2_commit_thread. Then it will get into a loop with mass logs. This will meaninglessly consume a larger number of resource and may lead to the system hanging. So limit printk in this case. [akpm@linux-foundation.org: document the msleep] Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: remove some redundant castingGeorge Spelvin
There are two standard techniques for dereferencing structures pointed to by void *: cast to the right type each time they're used, or assign to local variables of the right type. But there's no need to do *both*. Signed-off-by: George Spelvin <linux@horizon.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/ocfs2/super.c: use OCFS2_MAX_VOL_LABEL_LEN and strlcpyFabian Frederick
Replace strncpy(size 63) by defined value. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: remove NULL assignments on staticFabian Frederick
Static values are automatically initialized to NULL. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/configfs: use pr_fmtFabian Frederick
Add pr_fmt based on module name. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/configfs: convert printk to pr_foo()Fabian Frederick
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/configs/item.c: kernel-doc fixes + clean-upFabian Frederick
Fix function parameter documentation EXPORT_SYMBOLS moved after corresponding functions Small coding style and checkpatch warning fixes Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/squashfs/squashfs.h: replace pr_warning by pr_warnFabian Frederick
Update the last pr_warning callsite in fs branch Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ntfs: remove NULL value assignmentsFabian Frederick
Static values are automatically initialized to NULL. Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fanotify: check file flags passed in fanotify_initHeinrich Schuchardt
Without this patch fanotify_init does not validate the value passed in event_f_flags. When a fanotify event is read from the fanotify file descriptor a new file descriptor is created where file.f_flags = event_f_flags. Internal and external open flags are stored together in field f_flags of struct file. Hence, an application might create file descriptors with internal flags like FMODE_EXEC, FMODE_NOCMTIME set. Jan Kara and Eric Paris both aggreed that this is a bug and the value of event_f_flags should be checked: https://lkml.org/lkml/2014/4/29/522 https://lkml.org/lkml/2014/4/29/539 This updated patch version considers the comments by Michael Kerrisk in https://lkml.org/lkml/2014/5/4/10 With the patch the value of event_f_flags is checked. When specifying an invalid value error EINVAL is returned. Internal flags are disallowed. File creation flags are disallowed: O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT. Flags which do not make sense with fanotify are disallowed: __O_TMPFILE, O_PATH, FASYNC, and O_DIRECT. This leaves us with the following allowed values: O_RDONLY, O_WRONLY, O_RDWR are basic functionality. The are stored in the bits given by O_ACCMODE. O_APPEND is working as expected. The value might be useful in a logging application which appends the current status each time the log is opened. O_LARGEFILE is needed for files exceeding 4GB on 32bit systems. O_NONBLOCK may be useful when monitoring slow devices like tapes. O_NDELAY is equal to O_NONBLOCK except for platform parisc. To avoid code breaking on parisc either both flags should be allowed or none. The patch allows both. __O_SYNC and O_DSYNC may be used to avoid data loss on power disruption. O_NOATIME may be useful to reduce disk activity. O_CLOEXEC may be useful, if separate processes shall be used to scan files. Once this patch is accepted, the fanotify_init.2 manpage has to be updated. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/notify/fanotify/fanotify_user.c: fix FAN_MARK_FLUSH flag checkingHeinrich Schuchardt
If fanotify_mark is called with illegal value of arguments flags and marks it usually returns EINVAL. When fanotify_mark is called with FAN_MARK_FLUSH the argument flags is not checked for irrelevant flags like FAN_MARK_IGNORED_MASK. The patch removes this inconsistency. If an irrelevant flag is set error EINVAL is returned. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/notify/mark.c: trivial cleanupDavid Cohen
Do not initialize private_destroy_list twice. list_replace_init() already takes care of initializing private_destroy_list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fanotify: create FAN_ACCESS event for readdirHeinrich Schuchardt
Before the patch, read creates FAN_ACCESS_PERM and FAN_ACCESS events, readdir creates only FAN_ACCESS_PERM events. This is inconsistent. After the patch, readdir creates FAN_ACCESS_PERM and FAN_ACCESS events. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fanotify: FAN_MARK_FLUSH: avoid having to provide a fake/invalid fd and pathHeinrich Schuchardt
Originally from Tvrtko Ursulin (https://lkml.org/lkml/2011/1/12/112) Avoid having to provide a fake/invalid fd and path when flushing marks Currently for a group to flush marks it has set it needs to provide a fake or invalid (but resolvable) file descriptor and path when calling fanotify_mark. This patch pulls the flush handling a bit up so file descriptor and path are completely ignored when flushing. I reworked the patch to be applicable again (the signature of fanotify_mark has changed since Tvrtko's work). Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/fscache: replace seq_printf by seq_putsFabian Frederick
Replace seq_printf where possible + coalesce formats from 2 existing seq_puts Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04fs/fscache: convert printk to pr_foo()Fabian Frederick
All printk converted to pr_foo() except internal.h: printk(KERN_DEBUG Coalesce formats. Add pr_fmt Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04nfsd4: hash deleg stateid only on successful nfs4_set_delegationBenny Halevy
We don't want the stateid to be found in the hash table before the delegation is granted. Currently this is protected by the client_mutex, but we want to break that up and this is a necessary step toward that goal. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04nfsd4: rename recall_lock to state_lockBenny Halevy
...as the name is a bit more descriptive and we've started using it for other purposes. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04nfsd: remove unneeded zeroing of fields in nfsd4_proc_compoundJeff Layton
The memset of resp in svc_process_common should ensure that these are already zeroed by the time they get here. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_openJeff Layton
In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed if the nfs4_check_open_reclaim check succeeds. In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV cases, I see no point in declaring the openowner confirmed when the operation is going to fail anyway, and doing so might allow the client to game things such that it wouldn't need to confirm a subsequent open with the same owner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04nfsd4: use recall_lock for delegation hashingBenny Halevy
This fixes a bug in the handling of the fi_delegations list. nfs4_setlease does not hold the recall_lock when adding to it. The client_mutex is held, which prevents against concurrent list changes, but nfsd_break_deleg_cb does not hold while walking it. New delegations could theoretically creep onto the list while we're walking it there. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04Merge tag 'jfs-3.16' of git://github.com/kleikamp/linux-shaggy into nextLinus Torvalds
Pull jfs changes from Dave Kleikamp. * tag 'jfs-3.16' of git://github.com/kleikamp/linux-shaggy: fs/jfs/super.c: convert simple_str to kstr fs/jfs/jfs_dmap.c: replace min/casting by min_t fs/jfs/super.c: remove 0 assignment to static + code clean-up fs/jfs/jfs_logmgr.c: remove NULL assignment on static JFS: Check for NULL before calling posix_acl_equiv_mode() fs/jfs/jfs_inode.c: atomically set inode->i_flags
2014-06-04Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw into next Pull gfs2 updates from Steven Whitehouse: "This must be about the smallest merge window patch set ever for GFS2. It is probably also the first one without a single patch from me. That is down to a combination of factors, and I have some things in the works that are not quite ready yet, that I hope to put in next time around. Returning to what is here this time... we have 3 patches which fix various warnings. Two are bug fixes (for quotas and also a rare recovery race condition). The final patch, from Ben Marzinski, is an important change in the freeze code which has been in progress for some time. This removes the need to take and drop the transaction lock for every single transaction, when the only time it was used, was at file system freeze time. Ben's patch integrates the freeze operation into the journal flush code as an alternative with lower overheads and also lands up resolving some difficult to fix races at the same time" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Prevent recovery before the local journal is set GFS2: fs/gfs2/file.c: kernel-doc warning fixes GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes GFS2: remove transaction glock GFS2: lops.c: replace 0 by NULL for pointers GFS2: quotas not being refreshed in gfs2_adjust_quota
2014-06-04Merge tag 'locks-v3.16' of git://git.samba.org/jlayton/linux into nextLinus Torvalds
Pull file locking changes from Jeff Layton: "Pretty quiet on the file-locking related front this cycle. Just some small cleanups and the addition of some tracepoints in the lease handling code" * tag 'locks-v3.16' of git://git.samba.org/jlayton/linux: locks: add some tracepoints in the lease handling code fs/locks.c: replace seq_printf by seq_puts locks: ensure that fl_owner is always initialized properly in flock and lease codepaths
2014-06-04f2fs: fix to recover data written by dioJaegeuk Kim
If data are overwritten through dio, previous f2fs doesn't remain the fsync mark due to no additional node writes. Note that this patch should resolve the xfstests:311. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04f2fs: large volume supportChangman Lee
f2fs's cp has one page which consists of struct f2fs_checkpoint and version bitmap of sit and nat. To support lots of segments, we need more blocks for sit bitmap. So let's arrange sit bitmap as following: +-----------------+------------+ | f2fs_checkpoint | sit bitmap | | + nat bitmap | | +-----------------+------------+ 0 4k N blocks Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: simple code change for readability] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pagesChao Yu
Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [<f1031630>] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [<f10377bb>] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [<f103c5da>] restore_node_summary+0x13a/0x280 [f2fs] [<f103e22d>] build_curseg+0x2bd/0x620 [f2fs] [<f104043b>] build_segment_manager+0x1cb/0x920 [f2fs] [<f1032c85>] f2fs_fill_super+0x535/0x8e0 [f2fs] [<c115b66a>] mount_bdev+0x16a/0x1a0 [<f102f63f>] f2fs_mount+0x1f/0x30 [f2fs] [<c115c096>] mount_fs+0x36/0x170 [<c1173635>] vfs_kern_mount+0x55/0xe0 [<c1175388>] do_mount+0x1e8/0x900 [<c1175d72>] SyS_mount+0x82/0xc0 [<c16059cc>] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04f2fs: avoid overflow when large directory feathure is enabledChao Yu
When large directory feathure is enable, We have one case which could cause overflow in dir_buckets() as following: special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2. Here we define MAX_DIR_BUCKETS to limit the return value when the condition could trigger potential overflow. Changes from V1 o modify description of calculation in f2fs.txt suggested by Changman Lee. Suggested-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull scheduler updates from Ingo Molnar: "The main scheduling related changes in this cycle were: - various sched/numa updates, for better performance - tree wide cleanup of open coded nice levels - nohz fix related to rq->nr_running use - cpuidle changes and continued consolidation to improve the kernel/sched/idle.c high level idle scheduling logic. As part of this effort I pulled cpuidle driver changes from Rafael as well. - standardized idle polling amongst architectures - continued work on preparing better power/energy aware scheduling - sched/rt updates - misc fixlets and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) sched/numa: Decay ->wakee_flips instead of zeroing sched/numa: Update migrate_improves/degrades_locality() sched/numa: Allow task switch if load imbalance improves sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice() sched: Initialize rq->age_stamp on processor start sched, nohz: Change rq->nr_running to always use wrappers sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance() sched: Use clamp() and clamp_val() to make sys_nice() more readable sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups() sched/numa: Fix initialization of sched_domain_topology for NUMA sched: Call select_idle_sibling() when not affine_sd sched: Simplify return logic in sched_read_attr() sched: Simplify return logic in sched_copy_attr() sched: Fix exec_start/task_hot on migrated tasks arm64: Remove TIF_POLLING_NRFLAG metag: Remove TIF_POLLING_NRFLAG sched/idle: Make cpuidle_idle_call() void sched/idle: Reflow cpuidle_idle_call() sched/idle: Delay clearing the polling bit ...