summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-05-18ocfs2: Reset xattr value size after xa_cleanup_value_truncate().Tao Ma
In ocfs2_prepare_xattr_entry, if we fail to grow an existing value, xa_cleanup_value_truncate() will leave the old entry in place. Thus, we reset its value size. However, if we were allocating a new value, we must not reset the value size or we will BUG(). This resolves oss.oracle.com bug 1247. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18Merge branch 'discontig-bg' of git://oss.oracle.com/git/tma/linux-2.6 into ↵Joel Becker
ocfs2-merge-window
2010-05-18Revert "nfsd4: distinguish expired from stale stateids"J. Bruce Fields
This reverts commit 78155ed75f470710f2aecb3e75e3d97107ba8374. We're depending here on the boot time that we use to generate the stateid being monotonic, but get_seconds() is not necessarily. We still depend at least on boot_time being different every time, but that is a safer bet. We have a few reports of errors that might be explained by this problem, though we haven't been able to confirm any of them. But the minor gain of distinguishing expired from stale errors seems not worth the risk. Conflicts: fs/nfsd/nfs4state.c Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18fs/ocfs2/dlm: Use kstrdupJulia Lawall
Use kstrdup when the goal of an allocation is copy a string into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to; expression flag,E1,E2; statement S; @@ - to = kmalloc(strlen(from) + 1,flag); + to = kstrdup(from, flag); ... when != \(from = E1 \| to = E1 \) if (to==NULL || ...) S ... when != \(from = E2 \| to = E2 \) - strcpy(to, from); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18fs/ocfs2/dlm: Drop memory allocation castJulia Lawall
Drop cast on the result of kmalloc and similar functions. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; @@ - (T *) (\(kmalloc\|kzalloc\|kcalloc\|kmem_cache_alloc\|kmem_cache_zalloc\| kmem_cache_alloc_node\|kmalloc_node\|kzalloc_node\)(...)) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18Ocfs2: Optimize punching-hole code.Tristan Ye
This patch simplifies the logic of handling existing holes and skipping extent blocks and removes some confusing comments. The patch survived the fill_verify_holes testcase in ocfs2-test. It also passed my manual sanity check and stress tests with enormous extent records. Currently punching a hole on a file with 3+ extent tree depth was really a performance disaster. It can even take several hours, though we may not hit this in real life with such a huge extent number. One simple way to improve the performance is quite straightforward. From the logic of truncate, we can punch the hole from hole_end to hole_start, which reduces the overhead of btree operations in a significant way, such as tree rotation and moving. Following is the testing result when punching hole from 0 to file end in bytes, on a 1G file, 1G file consists of 256k extent records, each record cover 4k data(just one cluster, clustersize is 4k): =========================================================================== * Original punching-hole mechanism: =========================================================================== I waited 1 hour for its completion, unfortunately it's still ongoing. =========================================================================== * Patched punching-hode mechanism: =========================================================================== real 0m2.518s user 0m0.000s sys 0m2.445s That means we've gained up to 1000 times improvement on performance in this case, whee! It's fairly cool. and it looks like that performance gain will be raising when extent records grow. The patch was based on my former 2 patches, which were about truncating codes optimization and fixup to handle CoW on punching hole. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.Tristan Ye
The original idea to pull ocfs2_find_cpos_for_left_leaf() out of alloc.c is to benefit punching-holes optimization patch, it however, can also be referred by other funcs in the future who want to do the same job. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.Tristan Ye
Based on the previous patch of optimizing truncate, the bugfix for refcount trees when punching holes can be fairly easy and straightforward since most of work we should take into account for refcounting have been completed already in ocfs2_remove_btree_range(). This patch performs CoW for refcounted extents when a hole being punched whose start or end offset were in the middle of a cluster, which means partial zeroing of the cluster will be performed soon. The patch has been tested fixing the following bug: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1216 Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.Tristan Ye
Truncate is just a special case of punching holes(from new i_size to end), we therefore could take advantage of the existing ocfs2_remove_btree_range() to reduce the comlexity and redundancy in alloc.c. The goal here is to make truncate more generic and straightforward. Several functions only used by ocfs2_commit_truncate() will smiply be removed. ocfs2_remove_btree_range() was originally used by the hole punching code, which didn't take refcount trees into account (definitely a bug). We therefore need to change that func a bit to handle refcount trees. It must take the refcount lock, calculate and reserve blocks for refcount tree changes, and decrease refcounts at the end. We replace ocfs2_lock_allocators() here by adding a new func ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to reserve. This will not hurt any other code using ocfs2_remove_btree_range() (such as dir truncate and hole punching). I merged the following steps into one patch since they may be logically doing one thing, though I know it looks a little bit fat to review. 1). Remove redundant code used by ocfs2_commit_truncate(), since we're moving to ocfs2_remove_btree_range anyway. 2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of accepting some extra blocks to reserve. 3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our needs. It's safe to do this since it's only being called by truncate. 4). Change ocfs2_remove_btree_range() a bit to take refcount case into account. 5). Finally, we change ocfs2_commit_truncate() to call ocfs2_remove_btree_range() in a proper way. The patch has been tested normally for sanity check, stress tests with heavier workload will be expected. Based on this patch, fixing the punching holes bug will be fairly easy. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18nfsd: safer initialization order in find_file()Pavel Emelyanov
The alloc_init_file() first adds a file to the hash and then initializes its fi_inode, fi_id and fi_had_conflict. The uninitialized fi_inode could thus be erroneously checked by the find_file(), so move the hash insertion lower. The client_mutex should prevent this race in practice; however, we eventually hope to make less use of the client_mutex, so the ordering here is an accident waiting to happen. I didn't find whether the same can be true for two other fields, but the common sense tells me it's better to initialize an object before putting it into a global hash table :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18nfs4: minor callback code simplification, commentJ. Bruce Fields
Note the position in the version array doesn't have to match the actual rpc version number--to me it seems clearer to maintain the distinction. Also document choice of rpc callback version number, as discussed in e.g. http://www.ietf.org/mail-archive/web/nfsv4/current/msg07985.html and followups. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback() sched, wait: Use wrapper functions sched: Remove a stale comment ondemand: Make the iowait-is-busy time a sysfs tunable ondemand: Solve a big performance issue by counting IOWAIT time as busy sched: Intoduce get_cpu_iowait_time_us() sched: Eliminate the ts->idle_lastupdate field sched: Fold updating of the last_update_time_info into update_ts_time_stats() sched: Update the idle statistics in get_cpu_idle_time_us() sched: Introduce a function to update the idle statistics sched: Add a comment to get_cpu_idle_time_us() cpu_stop: add dummy implementation for UP sched: Remove rq argument to the tracepoints rcu: need barrier() in UP synchronize_sched_expedited() sched: correctly place paranioa memory barriers in synchronize_sched_expedited() sched: kill paranoia check in synchronize_sched_expedited() sched: replace migration_thread with cpu_stop stop_machine: reimplement using cpu_stop cpu_stop: implement stop_cpu[s]() sched: Fix select_idle_sibling() logic in select_task_rq_fair() ...
2010-05-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (23 commits) cifs: fix noserverino handling when unix extensions are enabled cifs: don't update uniqueid in cifs_fattr_to_inode cifs: always revalidate hardlinked inodes when using noserverino [CIFS] drop quota operation stubs cifs: propagate cifs_new_fileinfo() error back to the caller cifs: add comments explaining cifs_new_fileinfo behavior cifs: remove unused parameter from cifs_posix_open_inode_helper() [CIFS] Remove unused cifs_oplock_cachep cifs: have decode_negTokenInit set flags in server struct cifs: break negotiate protocol calls out of cifs_setup_session cifs: eliminate "first_time" parm to CIFS_SessSetup [CIFS] Fix lease break for writes cifs: save the dialect chosen by server cifs: change && to || cifs: rename "extended_security" to "global_secflags" cifs: move tcon find/create into separate function cifs: move SMB session creation code into separate function cifs: track local_nls in volume info [CIFS] Allow null nd (as nfs server uses) on create [CIFS] Fix losing locks during fork() ...
2010-05-18Merge branch 'next' into for-linusJames Morris
2010-05-17ceph: all allocation functions should get gfp_maskYehuda Sadeh
This is essential, as for the rados block device we'll need to run in different contexts that would need flags that are other than GFP_NOFS. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: specify max_bytes on readdir repliesSage Weil
Specify max bytes in request to bound size of reply. Add associated mount option with default value of 512 KB. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: cleanup pool op stringsSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: Use kzallocJulia Lawall
Use kzalloc rather than the combination of kmalloc and memset. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,size,flags; statement S; @@ -x = kmalloc(size,flags); +x = kzalloc(size,flags); if (x == NULL) S -memset(x, 0, size); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: use common helper for aborted dir request invalidationSage Weil
We invalidate I_COMPLETE and dentry leases in two places: on aborted mds request and on request replay. Use common helper to avoid duplicate code. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: cope with out of order (unsafe after safe) mds replySage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: save peer feature bits in connection structureSage Weil
These are used for adjusting behavior, such as conditionally encoding a newer message format. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: resync headers with userlandSage Weil
Notable changes include pool op defines and types, FLOCK feature bit, and new CMPXATTR osd ops. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: use ceph. prefix for virtual xattrsSage Weil
Drop the 'user.' prefix and use just 'ceph.' for fs virtual xattrs. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: throw out dirty caps metadata, data on session teardownSage Weil
The remove_session_caps() helper is called when an MDS closes out our session (either normally, or as a result of a failed reconnect), and when we tear down state for umount. If we remove the last cap, and there are no cap migrations in progress, then there is little hope of us flushing out that data to the mds (without heroic efforts to reconnect and flush). So, to avoid leaving inodes pinned (due to dirty state) and crashing after umount, throw out dirty caps state and unpin the inodes. Print a warning to the console so we know something was lost. NOTE: Although we drop wrbuffer refs, we don't actually mark pages clean; maybe a truncate should be queued? Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: attempt mds reconnect if mds closes our sessionSage Weil
Currently, if our session is closed (due to a timeout, or explicit close, or whatever), we just sit there doing nothing unless/until the MDS restarts, at which point we try to reconnect. Change client to attempt an immediate reconnect if our session is closed. Note that currently the MDS doesn't support this, and our attempt will fail. We'll get a session CLOSE, our caps and dirty cap state will be dropped, and the client will be free to attempt to reconnect. That's clearly not as nice as a successful reconnect, but it at least allows us to try to carry on, and in the future the MDS will support a reconnect and we will fare better. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: clean up send_mds_reconnect interfaceSage Weil
Pass a ceph_mds_session, since the caller has it. Remove the dead code for sending empty reconnects. It used to be used when the MDS contacted _us_ to solicit a reconnect, and we could reply saying "go away, I have no session." Now we only send reconnects based on the mds map, and only when we do in fact have an open session. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: wait for mds OPEN reply to indicate reconnect successSage Weil
We used to infer reconnect success by watching the MDS state, essentially assuming that hearing nothing meant things were ok. That wasn't particularly reliable. Instead, the MDS replies with an explicit OPEN message to indicate success. Strictly speaking, this is a protocol change, but it is a backwards compatible one that does not break new clients + old servers or old clients + new servers. At least not yet. Drop unused @all argument from kick_requests while we're at it. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: only send cap releases when mds is OPEN|HUNGSage Weil
On OPENING we shouldn't have any caps (or releases). On CLOSING, we should wait until we succeed (and throw it all out), or don't (and are OPEN again). On RECONNECTING we can wait until we are OPEN. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: dicard cap releases on mds restartSage Weil
If the MDS restarts, the expire caps state is no longer shared, and can be thrown out. Caps state will be rebuilt on the MDS during the reconnect process that follows. Zero out any release messages and adjust the release counter accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: make mon client statfs handling more genericYehuda Sadeh
This is being done so that we could reuse the statfs infrastructure with other requests that return values. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: drop src address(es) from message header [new protocol feature]Sage Weil
The CEPH_FEATURE_NOSRCADDR protocol feature avoids putting the full source address in each message header (twice). This patch switches the client to the new scheme, and _requires_ this feature on the server. The server will support both the old and new schemes. That means an old client will work with a new server, but a new client will not work with an old server. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: cleanup: remove unused assignementDan Carpenter
We don't ever use "dirty" so we can remove it. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: clean up cap release loop vs spinlockSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: name bdi ceph-%d instead of major:minorSage Weil
The bdi_setup_and_register() helper doesn't help us since we bdi_init() in create_client() and bdi_register() only when sget() succeeds. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: skip mds sync on forced unmountSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: adjust masked struct_v variable namesSage Weil
Reported-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: clean up mount options, ->show_options()Sage Weil
Ensure all options are included in /proc/mounts. Some cleanup. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: set dn offset when splicedSage Weil
We want to assign an offset when the dentry goes from null to linked, which is always done by splice_dentry(). Notably, we should NOT assign an offset when a dentry is first created and is still null. BUG if we try to splice a non-null dentry (we shouldn't). Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: don't clobber i_max_offset on already complete dirSage Weil
This can screw up offsets assigned to new dentries and break dcache readdir results. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: skip set_dentry_offset work if directory not I_COMPLETESage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: set next_offset on readdir finishSage Weil
Set next_offset to 2 (always 2!), not 0, on readdir finish. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: listxattr should compare version by >=Henry C Chang
If the version hasn't changed, don't rebuild the index. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: fix xattr dangling pointer / double freeSage Weil
If we use the xattr_blob, clear the pointer so we don't release the memory at the bottom of the fuction. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: close messenger raceSage Weil
Simplify messenger locking, and close race between ceph_con_close() setting the CLOSED bit and con_work() checking the bit, then taking the mutex. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: name msgpools; useful error messagesSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: fix memory leak due to possible dentry init raceSage Weil
Free dentry_info in error path. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: include auth method in error messagesSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: osdtimeout=0 for now timeoutSage Weil
Allow the osd reset timeout to be disabled. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: d_obtain_alias() returns ERR_PTR()Dan Carpenter
d_obtain_alias() doesn't return NULL, it returns an ERR_PTR(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: wake up mount thread when getting osdmapYehuda Sadeh
Now that the mount thread waits for the osdmap, it needs to be awaken. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>