path: root/fs
Age | Commit message | Author
2010-05-10 | autofs4-2.6.34-rc1 - fix link_count usage | Ian Kent
After commit 1f36f774b2 ("Switch !O_CREAT case to use of do_last()") in 2.6.34-rc1, autofs direct mounts stopped working. This is caused by current->link_count being 0 when ->follow_link() is called from do_filp_open(). I can't work out why this hasn't been seen before Al's patch series. This patch removes the autofs dependence on current->link_count. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-10 | cifs: remove unused parameter from cifs_posix_open_inode_helper() | Suresh Jayaraman
A leftover from commit 3321b791b2e8897323f8c044a0c77ff25781381c. Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-05-10 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 | David Woodhouse
Conflicts: drivers/mtd/mtdcore.c. Pull in the bdi fixes and ARM platform changes that other outstanding patches depend on.
2010-05-10 | GFS2: Fix writing to non-page aligned gfs2_quota structures | Abhijith Das
This is the upstream fix for this bug. This patch differs from the RHEL5 fix (Red Hat bz #555754), which simply writes to the 8-byte value field of the quota. In upstream quota code, we're required to write the entire quota (88 bytes), which can be split across a page boundary. We check for such quotas, and read/write the two parts from/to the corresponding pages holding these parts. With this patch, I don't see the bug anymore using the reproducer in Red Hat bz #555754. I successfully ran a couple of simple tests/mounts/umounts and it doesn't seem like this patch breaks anything else. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
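The page-boundary split described above can be illustrated with a small user-space sketch. It is not the GFS2 code: the 4096-byte page size, the `demo_` names, and the idea of passing the record's byte offset directly are assumptions made for illustration; only the 88-byte record size comes from the commit message.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE  4096u  /* assumed page size */
#define DEMO_QUOTA_SIZE 88u    /* quota record size quoted in the commit message */

struct demo_split {
	size_t first;   /* bytes that fit in the page holding the record start */
	size_t second;  /* bytes that spill into the following page (0 if none) */
};

/* Given the byte offset of a quota record, compute how the 88 bytes
 * are divided between the two pages that may hold it. */
static struct demo_split demo_quota_split(unsigned long long loc)
{
	size_t in_page = (size_t)(loc % DEMO_PAGE_SIZE);
	struct demo_split s = { DEMO_QUOTA_SIZE, 0 };

	if (in_page + DEMO_QUOTA_SIZE > DEMO_PAGE_SIZE) {
		s.first = DEMO_PAGE_SIZE - in_page;
		s.second = DEMO_QUOTA_SIZE - s.first;
	}
	return s;
}
```

A record starting 48 bytes before the end of a page would be written as a 48-byte part and a 40-byte part, each to its own page.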
2010-05-10 | fix "seperate" typos in comments | Anand Gadiyar
s/seperate/separate Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-10 | nilfs2: disallow remount of snapshot from/to a regular mount | Ryusuke Konishi
Snapshots and regular ro/rw mounts are essentially different in whether the checkpoint is static and whether it is marked with a snapshot flag. The current implementation, however, allows remounting a snapshot as a regular rw-mount if the checkpoint number equals the latest one. This transition is actually impossible, since changing a checkpoint to a snapshot creates another checkpoint; thus the condition is never satisfied. This fixes the weird state of affairs, and specifically separates snapshots and regular rw/ro-mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: use huge_encode_dev/huge_decode_dev | Ryusuke Konishi
This replaces uses of new_encode_dev/new_decode_dev with their 64-bit counterparts, huge_encode_dev/huge_decode_dev respectively. This is just for clarification and has no impact on the disk format. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
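For context, here is a user-space sketch of the two encodings as I understand them from the kernel's include/linux/kdev_t.h. The `demo_` names are stand-ins, and major and minor are passed separately instead of as a `dev_t`; treat the bit layout as an assumption to verify against the header.

```c
#include <stdint.h>

/* Stand-in for new_encode_dev(): low 8 bits of the minor, then the
 * major number, then the remaining minor bits placed above it. */
static uint32_t demo_new_encode_dev(unsigned int major, unsigned int minor)
{
	return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Stand-in for huge_encode_dev(): the same layout widened to 64 bits,
 * which is why switching helpers leaves the on-disk value unchanged. */
static uint64_t demo_huge_encode_dev(unsigned int major, unsigned int minor)
{
	return demo_new_encode_dev(major, minor);
}
```

Under this layout both helpers produce the same numeric value, consistent with the commit's statement that the disk format is not affected.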
2010-05-10 | nilfs2: update comment on deactivate_super at nilfs_get_sb | Ryusuke Konishi
deactivate_super was replaced with deactivate_locked_super, but the comment on nilfs_get_sb remained unchanged. This updates the comment. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: replace MS_VERBOSE with MS_SILENT | Ryusuke Konishi
MS_VERBOSE is deprecated. This replaces it with MS_SILENT, following the get_sb_bdev function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: add missing initialization of s_mode | Ryusuke Konishi
An fmode_t argument is passed to kill_block_super() through the s_mode member of the super_block structure. This is used to release the block device with the same mode; however, nilfs does not set s_mode anywhere. This modifies the nilfs_get_sb function to properly initialize the s_mode member. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive | Ryusuke Konishi
The second argument of open_bdev_exclusive/close_bdev_exclusive takes fmode_t flags instead of mount flags. This fixes the misuse. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
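The distinction between mount flags and open-mode flags can be sketched in user space as follows. The constants are illustrative stand-ins with a `DEMO_` prefix (the real ones live in kernel headers), and the conversion shown is the common pattern as I understand it, not a quote of the nilfs2 change.

```c
/* Illustrative stand-ins; the real constants are defined by the kernel. */
#define DEMO_MS_RDONLY   0x1u  /* mount flag: read-only mount */
#define DEMO_FMODE_READ  0x1u  /* open mode: device opened for reading */
#define DEMO_FMODE_WRITE 0x2u  /* open mode: device opened for writing */

/* Derive an open mode from mount flags instead of passing the mount
 * flags through unchanged: always read, and write unless read-only. */
static unsigned int demo_mount_flags_to_fmode(unsigned long ms_flags)
{
	unsigned int mode = DEMO_FMODE_READ;

	if (!(ms_flags & DEMO_MS_RDONLY))
		mode |= DEMO_FMODE_WRITE;
	return mode;
}
```

With these stand-in values, passing the mount flags of a read-write mount (0) directly as an open mode would request neither read nor write access, which is the kind of mismatch the commit removes.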
2010-05-10 | nilfs2: use checkpoint number instead of timestamp to select super block | Ryusuke Konishi
Nilfs maintains two super blocks, and selects the newer one at mount time if they both have valid checksums and their timestamps differ. However, this has potential for mis-selection since the system clock may be rewound and the resolution of the timestamps is not high. Usually this doesn't become an issue because both super blocks are updated at the same time when the file system is unmounted. Even if the file system wasn't unmounted cleanly, the roll-forward recovery will find the proper log which stores the latest super root. Thus, the issue can appear only if the update of one super block fails and the clock happens to be rewound. This fixes the issue by using checkpoint numbers instead of timestamps to pick the super block storing the location of the latest log. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
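A minimal sketch of the selection rule, assuming a simplified in-memory view of each super block copy. The struct, its field names, and `demo_pick_sb` are hypothetical and only mirror the idea in the commit message.

```c
/* Hypothetical summary of one super block copy. */
struct demo_sb {
	int valid;                   /* nonzero if the checksum verified */
	unsigned long long last_cno; /* latest checkpoint number recorded */
	unsigned long long wtime;    /* write timestamp, deliberately unused */
};

/* Return 0 or 1 for the copy to trust, or -1 if neither is usable.
 * Checkpoint numbers only grow, so they stay reliable even when the
 * system clock has been set back. */
static int demo_pick_sb(const struct demo_sb *sb0, const struct demo_sb *sb1)
{
	if (!sb0->valid && !sb1->valid)
		return -1;
	if (!sb1->valid)
		return 0;
	if (!sb0->valid)
		return 1;
	return sb1->last_cno > sb0->last_cno ? 1 : 0;
}
```

In the rewound-clock case, the copy with the higher checkpoint number wins even though its timestamp is older.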
2010-05-10 | nilfs2: add missing endian conversion on super block magic number | Ryusuke Konishi
This adds missing endian conversions in comparison of the magic number of super blocks. It was coincidence that prior versions didn't incur problems; the upper byte of the magic number happened to be equal to the lower byte. But, semantically it's wrong to depend on this. This won't change anything else nor suffer any compatibility issues. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
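Why the missing conversion went unnoticed can be shown with a short sketch. The value 0x3434 is what I believe the nilfs2 magic number to be (treat it as an assumption); the `demo_` helpers are user-space stand-ins, not kernel functions.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Portable little-endian load: the effect a little-endian-to-CPU
 * conversion has on a 16-bit field read from disk. */
static uint16_t demo_le16_load(const unsigned char b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

A value whose two bytes are equal, such as 0x3434, is unchanged by the swap, so a comparison without conversion gives the right answer on any byte order; a value like 0x1234 would not.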
2010-05-10 | nilfs2: make nilfs_sc_*_ops static | Ryusuke Konishi
This kills the following sparse warnings:
fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static?
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: add kernel doc comments to persistent object allocator functions | Ryusuke Konishi
The implementation of the persistent object allocator (alloc.c) is poorly documented. This adds kernel-doc style comments to those functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info | Li Hong
In nilfs_segctor_thread(), timer is a local variable allocated on the stack. Its address must not be stored in sci->sc_timer and passed around to several procedures. It works now only by chance, because the other procedures are called by nilfs_segctor_thread() directly or indirectly and the stack hasn't been deallocated yet. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: remove nilfs_segctor_init() in segment.c | Li Hong
There are only two lines of code in nilfs_segctor_init(). From a logic design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;' should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(), this initialization is needless because sci is kzalloc-ed. So nilfs_segctor_init() is only a wrapper around nilfs_segctor_start_thread(). Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: insert checkpoint number in segment summary header | Ryusuke Konishi
This adds a field to record the latest checkpoint number in the nilfs_segment_summary structure. This will help to recover the latest checkpoint number from logs on disk. This field is intended for crucial cases in which super blocks have lost the pointer to the latest log. Even though this will change the disk format, both backward and forward compatibility is preserved by a size field prepared in the segment summary header. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: add a print message after loading nilfs2 | Li Hong
Printing a message after loading a file system is common practice. Add this to provide a more user-friendly experience. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: cleanup multi kmem_cache_{create,destroy} code | Li Hong
This cleanup patch gives several improvements:
- Move all kmem_cache_{create,destroy} calls into one place, which removes some small function calls, cleans up error check code and clarifies the logic.
- Mark all initial code in __init section.
- Remove some very obvious comments.
- Adjust some declarations.
- Fix some space-tab issues.
Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: move out checksum routines to segment buffer code | Ryusuke Konishi
This moves out checksum routines in log writer to segbuf.c for cleanup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: move pointer to super root block into logs | Ryusuke Konishi
This moves a pointer to buffer storing super root block to each log buffer from nilfs_sc_info struct for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: change default of 'errors' mount option to 'remount-ro' mode | Ryusuke Konishi
Like ext3, nilfs has an 'errors' mount option to allow specifying the desired behavior on severe errors. Currently, the default action is 'errors=continue', which has the potential to worsen filesystem corruption after severe errors. This changes the default action to 'errors=remount-ro' to avoid the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path() | Li Hong
nilfs_btree_release_path() and nilfs_btree_free_path() are tightly bound to each other. Make them into one procedure to clarify the logic and avoid misuse. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 | nilfs2: Combine nilfs_btree_alloc_path() and nilfs_btree_init_path() | Li Hong
nilfs_btree_alloc_path() and nilfs_btree_init_path() are tightly bound to each other. Make them into one procedure to clarify the logic and avoid misuse. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-07 | nfsd4: fix bare destroy_session null dereference | J. Bruce Fields
It's legal to send a DESTROY_SESSION outside any session (as the only operation in a compound), in which case cstate->session will be NULL; check for that case. While we're at it, move these checks into a separate helper function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
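The NULL check moved into a helper might look roughly like the user-space sketch below. The struct layouts, the 16-byte id size, and the `demo_` names are hypothetical; the only point taken from the commit is that a compound outside any session has no session pointer to dereference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only. */
struct demo_sessionid { unsigned char data[16]; };
struct demo_session   { struct demo_sessionid id; };

/* Return 1 only if the compound is running inside the session named
 * by sid. A NULL session means the operation was sent outside any
 * session, so there is nothing to compare and nothing to dereference. */
static int demo_compound_in_session(const struct demo_session *session,
				    const struct demo_sessionid *sid)
{
	if (session == NULL)
		return 0;
	return memcmp(sid->data, session->id.data, sizeof(sid->data)) == 0;
}
```

Putting the NULL test first in one helper keeps every caller from repeating (or forgetting) it.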
2010-05-07 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 | Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix RCU issues in the NFSv4 delegation code
  NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
2010-05-07 | logfs: handle powerfail on NAND flash | Joern Engel
The write buffer may not have been written and may no longer be written due to an interrupted write in the affected page. Signed-off-by: Joern Engel <joern@logfs.org>
2010-05-07 | logfs: handle errors from get_mtd_device() | Dan Carpenter
The get_mtd_device() function returns error pointers on failure and if we don't handle it, it leads to a crash. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joern Engel <joern@logfs.org>
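The "error pointer" convention mentioned above can be imitated in user space. This sketch re-creates the idea behind the kernel's ERR_PTR/IS_ERR/PTR_ERR helpers under `demo_` names; `demo_get_device` is a made-up function standing in for any call that returns either a valid pointer or an encoded errno.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top 4095 addresses. */
static void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static int demo_device;

/* Made-up lookup: fails with -ENODEV for negative device numbers. */
static void *demo_get_device(int num)
{
	if (num < 0)
		return demo_err_ptr(-ENODEV);
	return &demo_device;
}

/* Caller must test for an error pointer before using the result;
 * a plain NULL check would not catch it. */
static int demo_open(int num)
{
	void *dev = demo_get_device(num);

	if (demo_is_err(dev))
		return (int)demo_ptr_err(dev);
	return 0;
}
```

Skipping the error-pointer test and dereferencing the result is what leads to the crash the commit describes.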
2010-05-07 | logfs: remove unused variable | Joern Engel
Signed-off-by: Joern Engel <joern@logfs.org>
2010-05-06 | GFS2: Add some useful messages | Steven Whitehouse
The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-05 | ceph: don't use writeback_control in writepages completion | Sage Weil
The ->writepages writeback_control is no longer valid in the writepages completion. We were touching it solely to adjust pages_skipped when there was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials), causing an oops in the writeback code shortly thereafter. Updating pages_skipped on error isn't correct anyway, so let's just rip out this (clearly broken) code to pass the wbc to the completion. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-05 | ocfs2/dlm: Increase o2dlm lockres hash size | Sunil Mushran
Lockres hash size of 16KB is far too small for large filesystems (where we have hundreds of thousands of lock resources stored in the table). This patch increases it to 128KB. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: Make ocfs2_extend_trans() really extend. | Tao Ma
In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's blocks. But if jbd2_journal_extend() fails, it will only restart with the new number of blocks. This tends to be awkward since in most cases we want additional reserved blocks. It makes our code harder to maintain since the caller can't be sure all the original blocks will not be accessed and dirtied again. There are 15 callers of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add h_buffer_credits before they call ocfs2_extend_trans(). This makes ocfs2_extend_trans() really extend atop the original block count. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
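The "extend atop the original block count" behavior can be modeled with a toy journal handle. Everything here is a stand-in under `demo_` names: the struct, the two helper functions imitating extend and restart, and the rule that extend returns a positive value when the running transaction cannot grow.

```c
/* Toy journal handle. */
struct demo_handle {
	int credits;      /* stand-in for the handle's buffer credits */
	int room_in_txn;  /* nonzero if the running transaction can grow */
};

/* Imitates extend: 0 on success, >0 if the transaction is too full. */
static int demo_journal_extend(struct demo_handle *h, int nblocks)
{
	if (!h->room_in_txn)
		return 1;
	h->credits += nblocks;
	return 0;
}

/* Imitates restart: a fresh transaction with exactly nblocks credits. */
static int demo_journal_restart(struct demo_handle *h, int nblocks)
{
	h->credits = nblocks;
	h->room_in_txn = 1;
	return 0;
}

/* Whichever path is taken, the caller ends up with the original
 * credits plus nblocks, so it never has to add them itself. */
static int demo_extend_trans(struct demo_handle *h, int nblocks)
{
	int old_nblocks = h->credits;
	int status = demo_journal_extend(h, nblocks);

	if (status > 0)
		status = demo_journal_restart(h, old_nblocks + nblocks);
	return status;
}
```

The design point is that the fallback restart asks for the old count plus the new one, instead of the new count alone.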
2010-05-05 | ocfs2/trivial: Code cleanup for allocation reservation. | Tao Ma
Two tiny cleanups for allocation reservation. 1. Remove some extra code in ocfs2_local_alloc_find_clear_bits. 2. Remove an unused variable in ocfs2_find_resv_lhs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: make ocfs2_adjust_resv_from_alloc simple. | Tao Ma
When we allocate some bits from the reservation, we always allocate from r_start (see ocfs2_resmap_resv_bits). So there should be no reason to check between r_start and start. And I don't think we will change this behaviour later by allocating from some bits after r_start. Why not make ocfs2_adjust_resv_from_alloc simple for now? The only chance we have to adjust the reservation is when we haven't reached the end. With this patch, the function is more readable. Note: this patch also fixes an existing bug in the function which I hadn't found before:

    if (end < ocfs2_resv_end(resv))
        rhs = end - ocfs2_resv_end(resv);

This code is of course buggy. ;) Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
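To make the bug in the quoted snippet concrete, here is a sketch assuming the quantities are unsigned integers (an assumption about the original types). The `demo_` functions are illustrative and take the reservation end as a plain argument.

```c
/* The quoted logic: when end is smaller than the reservation end,
 * subtracting in this order wraps around to a huge unsigned value. */
static unsigned int demo_rhs_buggy(unsigned int end, unsigned int resv_end)
{
	unsigned int rhs = 0;

	if (end < resv_end)
		rhs = end - resv_end;  /* wrong operand order */
	return rhs;
}

/* The remaining bits to the right of the allocation are the
 * reservation end minus the allocation end. */
static unsigned int demo_rhs_fixed(unsigned int end, unsigned int resv_end)
{
	unsigned int rhs = 0;

	if (end < resv_end)
		rhs = resv_end - end;
	return rhs;
}
```

With an allocation ending at bit 10 in a reservation ending at bit 15, the fixed version reports 5 remaining bits, while the quoted version wraps around.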
2010-05-05 | ocfs2: Make nointr a default mount option | Sunil Mushran
OCFS2 has never really supported intr. This patch acknowledges this reality and makes nointr the default mount option. In a later patch, we intend to support intr. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE | Sunil Mushran
o2dlm join and leave messages are more than informational as they are required for debugging locking issues. This patch changes them from KERN_INFO to KERN_NOTICE. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | o2net: log socket state changes | Srinivas Eeda
This patch logs socket state changes that lead to socket shutdown. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: print node # when tcp fails | Wengang Wang
Print the node number of a peer node if sending it a message failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: Add dir_resv_level mount option | Mark Fasheh
The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: change default reservation window sizes | Mark Fasheh
The default reservation size of 4 (32-bit windows) is a bit too ambitious. Scale it back to 16 bits (resv_level=2). I have been testing various sizes on a 4-node cluster which runs a mixed workload that is heavily threaded. With a 256MB local alloc, I get *roughly* the following levels of average file fragmentation:

resv_level=0  70%
resv_level=1  21%
resv_level=2  23%
resv_level=3  24%
resv_level=4  60%
resv_level=5  did not test
resv_level=6  60%

resv_level=2 seemed like a good compromise between not letting windows be too small, but not so big that heavier workloads will immediately suffer without tuning. This patch also changes the behavior of directory reservations - they now track file reservations. The previous compromise of giving directory windows only 8 bits wound up fragmenting more at some window sizes because file allocations had smaller unused windows to poach from. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: increase the default size of local alloc windows | Mark Fasheh
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up to and beyond window sizes of 512 megabytes. This makes sense for a couple of reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters:

localalloc=  avg. fragmentation
8            48
32           16
64           10
120          7

On larger cluster sizes, the difference is more dramatic. The new default size tops out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 | ocfs2: clean up localalloc mount option size parsing | Mark Fasheh
This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as:

    ocfs2_local_alloc_size(sb) * 8

which is correct, in bits. Unfortunately, though, the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
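The unit mismatch (bits versus megabytes) can be worked through with a short sketch. It assumes one bitmap bit tracks one cluster and that the cluster size is a power of two given as `cluster_bits`; the function name and parameters are hypothetical and this is not the ocfs2 implementation.

```c
#include <stdint.h>

/* la_bytes:     size in bytes of the local alloc bitmap (hypothetical input)
 * cluster_bits: log2 of the cluster size in bytes
 * Returns the largest local alloc window, in megabytes, that the
 * bitmap can describe. */
static uint64_t demo_max_local_alloc_megs(uint32_t la_bytes,
					  unsigned int cluster_bits)
{
	uint64_t max_bits = (uint64_t)la_bytes * 8;    /* one bit per cluster */
	uint64_t max_bytes = max_bits << cluster_bits; /* clusters to bytes */

	return max_bytes >> 20;                        /* bytes to megabytes */
}
```

For example, a 512-byte bitmap holds 4096 bits; with 4K clusters that covers 16 MB, so comparing a megabyte value from the mount option against 4096 would be far too permissive.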
2010-05-05 | ocfs2: remove ocfs2_local_alloc_in_range() | Mark Fasheh
Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05 | ocfs2: allocate btree internal block groups from the global bitmap | Mark Fasheh
Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05 | ocfs2: use allocation reservations for directory data | Mark Fasheh
Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked separately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05 | ocfs2: use allocation reservations during file write | Mark Fasheh
Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05 | ocfs2: allocation reservations | Mark Fasheh
This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inode's window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
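A heavily simplified sketch of the window search and LRU fallback described above. It swaps the red-black tree for a sorted array and takes the first free gap rather than the one nearest the previous window, so it illustrates the idea only; all names and fields are hypothetical.

```c
#include <stddef.h>

/* One reservation window over the local alloc bitmap. */
struct demo_resv {
	unsigned int start;       /* first bit of the window */
	unsigned int len;         /* number of bits reserved */
	unsigned long last_used;  /* larger means more recently used */
};

/* Windows are sorted by start and do not overlap. Look for a free
 * gap of at least `want` bits; return 1 and set *out_start if found. */
static int demo_find_gap(const struct demo_resv *w, size_t n,
			 unsigned int bitmap_len, unsigned int want,
			 unsigned int *out_start)
{
	unsigned int cursor = 0;

	for (size_t i = 0; i < n; i++) {
		if (w[i].start - cursor >= want) {
			*out_start = cursor;
			return 1;
		}
		cursor = w[i].start + w[i].len;
	}
	if (bitmap_len - cursor >= want) {
		*out_start = cursor;
		return 1;
	}
	return 0;
}

/* No gap found: pick the least recently used window to cannibalize. */
static size_t demo_lru_victim(const struct demo_resv *w, size_t n)
{
	size_t victim = 0;

	for (size_t i = 1; i < n; i++)
		if (w[i].last_used < w[victim].last_used)
			victim = i;
	return victim;
}
```

A tree keeps the ordered search cheap as the number of inodes with windows grows, which a linear array scan like this one does not.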
2010-05-05 | ocfs2: Make ocfs2_journal_dirty() void. | Joel Becker
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't-fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
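The shape of the change (a void wrapper that asserts on a status that cannot fail) can be sketched in user space, with `assert` playing the role of BUG_ON(). The `demo_` functions and the call counter are invented for illustration.

```c
#include <assert.h>

static int demo_dirty_calls;

/* Stand-in for the journal call that, per the commit, only returns 0. */
static int demo_journal_dirty_metadata(void)
{
	demo_dirty_calls++;
	return 0;
}

/* void wrapper: callers have no status to check, and a surprise
 * nonzero status trips the assertion instead of being ignored. */
static void demo_journal_dirty(void)
{
	int status = demo_journal_dirty_metadata();

	assert(status == 0);  /* plays the role of BUG_ON(status) */
	(void)status;         /* keeps builds with NDEBUG warning-free */
}
```

Callers then shrink from a call plus an error branch to a single statement.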