summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-04-30fs/dcache.c: add cond_resched() to shrink_dcache_parent()Greg Thelen
Call cond_resched() in shrink_dcache_parent() to maintain interactivity. Before this patch: void shrink_dcache_parent(struct dentry * parent) { while ((found = select_parent(parent, &dispose)) != 0) shrink_dentry_list(&dispose); } select_parent() populates the dispose list with dentries which shrink_dentry_list() then deletes. select_parent() carefully uses need_resched() to avoid doing too much work at once. But neither shrink_dcache_parent() nor its called functions call cond_resched(). So once need_resched() is set select_parent() will return single dentry dispose list which is then deleted by shrink_dentry_list(). This is inefficient when there are a lot of dentry to process. This can cause softlockup and hurts interactivity on non preemptable kernels. This change adds cond_resched() in shrink_dcache_parent(). The benefit of this is that need_resched() is quickly cleared so that future calls to select_parent() are able to efficiently return a big batch of dentry. These additional cond_resched() do not seem to impact performance, at least for the workload below. Here is a program which can cause soft lockup if other system activity sets need_resched(). int main() { struct rlimit rlim; int i; int f[100000]; char buf[20]; struct timeval t1, t2; double diff; /* cleanup past run */ system("rm -rf x"); /* boost nfile rlimit */ rlim.rlim_cur = 200000; rlim.rlim_max = 200000; if (setrlimit(RLIMIT_NOFILE, &rlim)) err(1, "setrlimit"); /* make directory for files */ if (mkdir("x", 0700)) err(1, "mkdir"); if (gettimeofday(&t1, NULL)) err(1, "gettimeofday"); /* populate directory with open files */ for (i = 0; i < 100000; i++) { snprintf(buf, sizeof(buf), "x/%d", i); f[i] = open(buf, O_CREAT); if (f[i] == -1) err(1, "open"); } /* close some of the files */ for (i = 0; i < 85000; i++) close(f[i]); /* unlink all files, even open ones */ system("rm -rf x"); if (gettimeofday(&t2, NULL)) err(1, "gettimeofday"); diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) - ((double)t1.tv_sec * 1000000 + t1.tv_usec)); printf("done: %g elapsed\n", diff/1e6); return 0; } Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30fs/block_dev.c: no need to check inode->i_bdev in bd_forget()Yan Hong
Its only caller evict() has promised a non-NULL inode->i_bdev. Signed-off-by: Yan Hong <clouds.yan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30inotify: invalid mask should return a error number but not set itZhao Hongjiang
When we run the crackerjack testsuite, the inotify_add_watch test is stalled. This is caused by the invalid mask 0 - the task is waiting for the event but it never comes. inotify_add_watch() should return -EINVAL as it did before commit 676a0675cf92 ("inotify: remove broken mask checks causing unmount to be EINVAL"). That commit removes the invalid mask check, but that check is needed. Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call. If none are set, just return -EINVAL. Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger the problem that above commit fixed. [akpm@linux-foundation.org: fix build] Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jim Somerville <Jim.Somerville@windriver.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem update from James Morris: "Just some minor updates across the subsystem" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: eliminate passing d_name.name to process_measurement() TPM: Retry SaveState command in suspend path tpm/tpm_i2c_infineon: Add small comment about return value of __i2c_transfer tpm/tpm_i2c_infineon.c: Add OF attributes type and name to the of_device_id table entries tpm_i2c_stm_st33: Remove duplicate inclusion of header files tpm: Add support for new Infineon I2C TPM (SLB 9645 TT 1.2 I2C) char/tpm: Convert struct i2c_msg initialization to C99 format drivers/char/tpm/tpm_ppi: use strlcpy instead of strncpy tpm/tpm_i2c_stm_st33: formatting and white space changes Smack: include magic.h in smackfs.c selinux: make security_sb_clone_mnt_opts return an error on context mismatch seccomp: allow BPF_XOR based ALU instructions. Fix NULL pointer dereference in smack_inode_unlink() and smack_inode_rmdir() Smack: add support for modification of existing rules smack: SMACK_MAGIC to include/uapi/linux/magic.h Smack: add missing support for transmute bit in smack_str_from_perm() Smack: prevent revoke-subject from failing when unseen label is written to it tomoyo: use DEFINE_SRCU() to define tomoyo_ss tomoyo: use DEFINE_SRCU() to define tomoyo_ss
2013-04-30NFSD: SECINFO doesn't handle unsupported pseudoflavors correctlyChuck Lever
If nfsd4_do_encode_secinfo() can't find GSS info that matches an export security flavor, it assumes the flavor is not a GSS pseudoflavor, and simply puts it on the wire. However, if this XDR encoding logic is given a legitimate GSS pseudoflavor but the RPC layer says it does not support that pseudoflavor for some reason, then the server leaks GSS pseudoflavor numbers onto the wire. I confirmed this happens by blacklisting rpcsec_gss_krb5, then attempted a client transition from the pseudo-fs to a Kerberos-only share. The client received a flavor list containing the Kerberos pseudoflavor numbers, rather than GSS tuples. The encoder logic can check that each pseudoflavor in flavs[] is less than MAXFLAVOR before writing it into the buffer, to prevent this. But after "nflavs" is written into the XDR buffer, the encoder can't skip writing flavor information into the buffer when it discovers the RPC layer doesn't support that flavor. So count the number of valid flavors as they are written into the XDR buffer, then write that count into a placeholder in the XDR buffer when all recognized flavors have been encoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30nfsd: make symbol nfsd_reply_cache_shrinker staticWei Yongjun
symbol 'nfsd_reply_cache_shrinker' only used within this file. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30nfsd4: don't remap EISDIR errors in renameJ. Bruce Fields
We're going out of our way here to remap an error to make rfc 3530 happy--but the rfc itself (nor rfc 1813, which has similar language) gives no justification. And disagrees with local filesystem behavior, with Linux and posix man pages, and knfsd's implemented behavior for v2 and v3. And the documented behavior seems better, in that it gives a little more information--you could implement the 3530 behavior using the posix behavior, but not the other way around. Also, the Linux client makes no attempt to remap this error in the v4 case, so it can end up just returning EEXIST to the application in a case where it should return EISDIR. So honestly I think the rfc's are just buggy here--or in any case it doesn't see worth the trouble to remap this error. Reported-by: Frank S Filz <ffilz@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30Merge tag 'dlm-3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm update from David Teigland: "This includes a single patch to avoid fully processing a posix unlock from close when no posix locks exist on the file" * tag 'dlm-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: avoid unnecessary posix unlock
2013-04-30Merge tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes and cleanups from Trond Myklebust: - NLM: stable fix for NFSv2/v3 blocking locks - NFSv4.x: stable fixes for the delegation recall error handling code - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck Lever - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck - NFSv4.x assorted state management and reboot recovery bugfixes - NFSv4.1: In cases where we have already looked up a file, and hold a valid filehandle, use the new open-by-filehandle operation instead of opening by name. - Allow the NFSv4.1 callback thread to freeze - NFSv4.x: ensure that file unlock waits for readahead to complete - NFSv4.1: ensure that the RPC layer doesn't override the NFS session table size negotiation by limiting the number of slots. - NFSv4.x: Fix SETATTR spec compatibility issues * tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) NFSv4: Warn once about servers that incorrectly apply open mode to setattr NFSv4: Servers should only check SETATTR stateid open mode on size change NFSv4: Don't recheck permissions on open in case of recovery cached open NFSv4.1: Don't do a delegated open for NFS4_OPEN_CLAIM_DELEG_CUR_FH modes NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot NFSv4: Ensure the LOCK call cannot use the delegation stateid NFSv4: Use the open stateid if the delegation has the wrong mode nfs: Send atime and mtime as a 64bit value NFSv4: Record the OPEN create mode used in the nfs4_opendata structure NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports SUNRPC: Allow rpc_create() to request that TCP slots be unlimited SUNRPC: Fix a livelock problem in the xprt->backlog queue NFSv4: Fix handling of revoked delegations by setattr NFSv4 release the sequence id in the return on close case nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks NFS: Ensure that NFS file unlock waits for readahead to complete NFS: Add functionality to allow waiting on all outstanding reads to complete ...
2013-04-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds
Pull GFS2 updates from Steven Whitehouse: "There is not a whole lot of change this time - there are some further changes which are in the works, but those will be held over until next time. Here there are some clean ups to inode creation, the addition of an origin (local or remote) indicator to glock demote requests, removal of one of the remaining GFP_NOFAIL allocations during log flushes, one minor clean up, and a one liner bug fix." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Flush work queue before clearing glock hash tables GFS2: Add origin indicator to glock demote tracing GFS2: Add origin indicator to glock callbacks GFS2: replace gfs2_ail structure with gfs2_trans GFS2: Remove vestigial parameter ip from function rs_deltree GFS2: Use gfs2_dinode_out() in the inode create path GFS2: Remove gfs2_refresh_inode from inode creation path GFS2: Clean up inode creation path
2013-04-30xfs: Teach dquot recovery about CONFIG_XFS_QUOTADave Chinner
Fix a build error when CONFIG_XFS_QUOTA=n: fs/built-in.o: In function `xlog_recovery_validate_buf_type': /home/dave/src/build/x86-64/xfsdev/fs/xfs/xfs_log_recover.c:1948: undefined reference to `xfs_dquot_buf_ops' Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...
2013-04-30Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Ingo Molnar: "The main changes in this cycle's merge are: - Implement shadow timekeeper to shorten in kernel reader side blocking, by Thomas Gleixner. - Posix timers enhancements by Pavel Emelyanov: - allocate timer ID per process, so that exact timer ID allocations can be re-created be checkpoint/restore code. - debuggability and tooling (/proc/PID/timers, etc.) improvements. - suspend/resume enhancements by Feng Tang: on certain new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, so the TSC value won't be reset to 0 after resume. This can be taken advantage of by the generic via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to recover/approximate sleep time, the main (and precise) clocksource can be used. - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many CPUs the file goes beyond 4MB of size and thus the current simplistic seqfile approach fails. Convert /proc/timer_list to a proper seq_file with its own iterator. - Cleanups and refactorings of the core timekeeping code by John Stultz. - International Atomic Clock time is managed by the NTP code internally currently but not exposed externally. Separate the TAI code out and add CLOCK_TAI support and TAI support to the hrtimer and posix-timer code, by John Stultz. - Add deep idle support enhacement to the broadcast clockevents core timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by future clockevents driver updates), which allows the use of IRQ affinities to avoid spurious wakeups of idle CPUs - the right CPU with an expiring timer will be woken. - Add new ARM bcm281xx clocksource driver, by Christian Daudt - ... various other fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) clockevents: Set dummy handler on CPU_DEAD shutdown timekeeping: Update tk->cycle_last in resume posix-timers: Remove unused variable clockevents: Switch into oneshot mode even if broadcast registered late timer_list: Convert timer list to be a proper seq_file timer_list: Split timer_list_show_tickdevices posix-timers: Show sigevent info in proc file posix-timers: Introduce /proc/PID/timers file posix timers: Allocate timer id per process (v2) timekeeping: Make sure to notify hrtimers when TAI offset changes hrtimer: Fix ktime_add_ns() overflow on 32bit architectures hrtimer: Add expiry time overflow check in hrtimer_interrupt timekeeping: Shorten seq_count region timekeeping: Implement a shadow timekeeper timekeeping: Delay update of clock->cycle_last timekeeping: Store cycle_last value in timekeeper struct as well ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state timekeeping: Simplify tai updating from do_adjtimex timekeeping: Hold timekeepering locks in do_adjtimex and hardpps timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() ...
2013-04-30Merge tag 'v3.9' into efi-for-tip2Matt Fleming
Resolve conflicts for Ingo. Conflicts: drivers/firmware/Kconfig drivers/firmware/efivars.c Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-30f2fs: modify the number of issued pages to merge IOsJaegeuk Kim
When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO were issued by f2fs_write_node_pages. This means that there were some mishandling flows which degrades performance. Previous f2fs_write_node_pages determines the number of pages to be written, nr_to_write, as follows. 1. The bio_get_nr_vecs returns 129 pages. 2. The bio_alloc makes a room for 128 pages. 3. The initial 128 pages go into one bio. 4. The existing bio is submitted, and a new bio is prepared for the last 1 page. 5. Finally, sync_node_pages submits the last 1 page bio. The problem is from the use of bio_get_nr_vecs, so this patch replace it with max_hw_blocks using queue_max_sectors. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-30f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as ↵Haicheng Li
debug entry. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-30f2fs: fix inconsistent using of NM_WOUT_THRESHOLDHaicheng Li
try_to_free_nats() is usually called with parameter nr_shrink as "nm_i->nat_cnt - NM_WOUT_THRESHOLD" by flush_nat_entries() during checkpointing process. However, this is inconsistent with the actual threshold check as "if (nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD)" , which will ignore the free_nats requests when NM_WOUT_THRESHOLD < nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD So fix the threshold check condition. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-29inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29nfsd: convert nfs4_alloc_stid() to use idr_alloc_cyclic()Jeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat (exportfs): rebuild directory-inode if fat_dget()Namjae Jeon
This patch enables rebuilding of directory inodes which are not present in the cache.This is done by traversing the disk clusters to find the directory entry of the parent directory and using its i_pos to build the inode. The traversal is done by fat_scan_logstart() which is similar to fat_scan() but matches i_pos values instead of names.fat_scan_logstart() needs an inode parameter to work, for which a dummy inode is created by it's caller fat_rebuild_parent(). This dummy inode is destroyed after the traversal completes. All this is done only if the nostale_ro nfs mount option is specified. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat (exportfs): rebuild inode if ilookup() failsNamjae Jeon
If the cache lookups fail,use the i_pos value to find the directory entry of the inode and rebuild the inode.Since this involves accessing the FAT media, do this only if the nostale_ro nfs mount option is specified. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat: restructure export_operationsNamjae Jeon
Define two nfs export_operation structures,one for 'stale_rw' mounts and the other for 'nostale_ro'. The latter uses i_pos as a basis for encoding and decoding file handles. Also, assign i_pos to kstat->ino. The logic for rebuilding the inode is added in the subsequent patches. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat: introduce a helper fat_get_blknr_offset()Namjae Jeon
Introduce helper function to get the block number and offset for a given i_pos value. Use it in __fat_write_inode() now and later on in nfs.c Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat: move fat_i_pos_read to fat.hNamjae Jeon
Move fat_i_pos_read to fat.h so that it can be called from nfs.c in the subsequent patches to encode the file handle. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fat: introduce 2 new values for the -o nfs mount optionNamjae Jeon
This patchset eliminates the client side ESTALE errors when a FAT partition exported over NFS has its dentries evicted from the cache. The idea is to find the on-disk location_'i_pos' of the dirent of the inode that has been evicted and use it to rebuild the inode. This patch: Provide two possible values 'stale_rw' and 'nostale_ro' for the -o nfs mount option.The first one allows all file operations but does not reduce ESTALE errors on memory constrained systems. The second one eliminates ESTALE errors but mounts the filesystem as read-only. Not specifying a value defaults to 'stale_rw'. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs/buffer.c: remove unnecessary init operation after allocating buffer_head.majianpeng
bh allocation uses kmem_cache_zalloc() so we needn't call 'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs/proc/kcore.c: use register_hotmemory_notifier()Andrew Morton
Saves an ifdef, no code size changes Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: allow arch code to control the user page table ceilingHugh Dickins
On architectures where a pgd entry may be shared between user and kernel (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0. This patch introduces a generic USER_PGTABLES_CEILING that arch code can override. It is the responsibility of the arch code setting the ceiling to ensure the complete freeing of the page tables (usually in pgd_free()). [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, vmalloc: move get_vmalloc_info() to vmalloc.cJoonsoo Kim
Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this code must be here and it's implementation needs vmlist_lock and it iterate a vmlist which may be internal data structure for vmalloc. It is preferable that vmlist_lock and vmlist is only used in vmalloc.c for maintainability. So move the code to vmalloc.c Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: make snapshotting pages for stable writes a per-bio operationDarrick J. Wong
Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback. Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal. Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it. [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration] [akpm@linux-foundation.org: teeny cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs: don't compile in drop_caches.c when CONFIG_SYSCTL=nJosh Triplett
drop_caches.c provides code only invokable via sysctl, so don't compile it in when CONFIG_SYSCTL=n. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29direct-io: submit bio after boundary buffer is added to itJan Kara
Currently, dio_send_cur_page() submits bio before current page and cached sdio->cur_page is added to the bio if sdio->boundary is set. This is actually wrong because sdio->boundary means the current buffer is the last one before metadata needs to be read. So we should rather submit the bio after the current page is added to it. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29direct-io: fix boundary block handlingJan Kara
When we read/write a file sequentially, we will read/write not only the data blocks but also the indirect blocks that may not be physically adjacent to the data blocks. So filesystems set the BH_Boundary flag to submit the previous I/O before reading/writing an indirect block. However the generic direct IO code mishandles buffer_boundary(), setting sdio->boundary before each submit_page_section() call which results in sending only one page bios as underlying code thinks this page is the last in the contiguous extent. So fix the problem by setting sdio->boundary only if the current page is really the last one in the mapped extent. With this patch and "direct-io: submit bio after boundary buffer is added to it" I've measured about 10% throughput improvement of direct IO reads on ext3 with SATA harddrive (from 90 MB/s to 100 MB/s). With ramdisk, the improvement was about 3-fold (from 350 MB/s to 1.2 GB/s). For other filesystems (such as ext4), the improvements won't be as visible because the frequency of BH_Boundary flag being set is much smaller. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs/read_write.c: fix generic_file_llseek() commentMing Lei
Commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") has removed i_mutex from generic_file_llseek, so update the comment accordingly. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2/dlm: remove redundant null pointer checkSachin Kamat
kfree on a NULL pointer is a no-op. Remove the redundant null pointer check. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2: fix NULL dereference for moving extentsDan Carpenter
We can't dereference "bg" before it has been assigned. GCC should have warned about this but "bg" was initialized to NULL. I've fixed that as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2: fix error handling in ocfs2_ioctl_move_extents()Dan Carpenter
Smatch complains that if we hit an error (for example if the file is immutable) then "range" has uninitialized stack data and we copy it to the user. I've re-written the error handling to avoid this problem and make it a little cleaner as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2: fix error return code in ocfs2_info_handle_freefrag()Wei Yongjun
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2: delay inode update transactions after verifying the input flagsJeff Liu
There is no need to start the inode update transactions before/while verifying the input flags. As a refinement, this patch delay the transactions utill the pre-check up is ok. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs/fscache/stats.c: fix memory leakAnurup m
There is a kernel memory leak observed when the proc file /proc/fs/fscache/stats is read. The reason is that in fscache_stats_open, single_open is called and the respective release function is not called during release. Hence fix with correct release function - single_release(). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101 Signed-off-by: Anurup m <anurup.m@huawei.com> Cc: shyju pv <shyju.pv@huawei.com> Cc: Sanil kumar <sanil.kumar@huawei.com> Cc: Nataraj m <nataraj.m@huawei.com> Cc: Li Zefan <lizefan@huawei.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29Merge branch 'nfs-for-next' of git://linux-nfs.org/~trondmy/nfs-2.6 into ↵J. Bruce Fields
for-3.10 Note conflict: Chuck's patches modified (and made static) gss_mech_get_by_OID, which is still needed by gss-proxy patches. The conflict resolution is a bit minimal; we may want some more cleanup.
2013-04-29proc: Split kcore bits from linux/procfs.h into linux/kcore.hDavid Howells
Split kcore bits from linux/procfs.h into linux/kcore.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org cc: sparclinux@vger.kernel.org cc: x86@kernel.org cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29Include missing linux/magic.h inclusionsDavid Howells
Include missing linux/magic.h inclusions where the source file is currently expecting to get magic numbers through linux/proc_fs.h. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-efi@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29Include missing linux/slab.h inclusionsDavid Howells
Include missing linux/slab.h inclusions where the source file is currently expecting to get kmalloc() and co. through linux/proc_fs.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-s390@vger.kernel.org cc: sparclinux@vger.kernel.org cc: linux-efi@vger.kernel.org cc: linux-mtd@lists.infradead.org cc: devel@driverdev.osuosl.org cc: x86@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29proc: Delete create_proc_read_entry()David Howells
Delete create_proc_read_entry() as it no longer has any users. Also delete read_proc_t, write_proc_t, the read_proc member of the proc_dir_entry struct and the support functions that use them. This saves a pointer for every PDE allocated. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29fanotify: don't wank with FASYNC on ->release()Al Viro
... it's done already by __fput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29hppfs: get rid of ->fsync()Al Viro
it has grown by accident - directories there do *not* use page cache, so there's nothing to write. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29hppfs: fix the leaks on close()Al Viro
we need to close the underlying procfs file and free ->private_data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29new helper: read_code()Al Viro
switch binfmts that use ->read() to that (and to kernel_read() in several cases in binfmt_flat - sure, it's nommu, but still, doing ->read() into kmalloc'ed buffer...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>