summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-10-02inotify: fix lock ordering wrt do_page_fault's mmap_semNick Piggin
Fix inotify lock order reversal with mmap_sem due to holding locks over copy_to_user. Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Tested-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-30UBIFS: check buffer length when scanning for LPT nodesAdrian Hunter
'is_a_node()' function was reading from a buffer before checking the buffer length, resulting in an OOPS as follows: BUG: unable to handle kernel paging request at f8f74002 IP: [<f8f9783f>] :ubifs:ubifs_unpack_bits+0xca/0x233 *pde = 19e95067 *pte = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ubifs ubi mtdchar bio2mtd mtd brd video output [last unloaded: mtd] Pid: 6414, comm: integck Not tainted (2.6.27-rc6ubifs34 #23) EIP: 0060:[<f8f9783f>] EFLAGS: 00010246 CPU: 0 EIP is at ubifs_unpack_bits+0xca/0x233 [ubifs] EAX: 00000000 EBX: f6090630 ECX: d9badcfc EDX: 00000000 ESI: 00000004 EDI: f8f74002 EBP: d9badcec ESP: d9badcc0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process integck (pid: 6414, ti=d9bac000 task=f727dae0 task.ti=d9bac000) Stack: 00000006 f7306240 00000002 00000000 d9badcfc d9badd00 0000001c 00000000 f6090630 f6090630 f8f74000 d9badd10 f8fa1cc9 00000000 f8f74002 00000000 f8f74002 f60fe128 f6090630 f8f74000 d9badd68 f8fa1e46 00000000 0001e000 Call Trace: [<f8fa1cc9>] ? is_a_node+0x30/0x90 [ubifs] [<f8fa1e46>] ? dbg_check_ltab+0x11d/0x5bd [ubifs] [<f8fa388f>] ? ubifs_lpt_start_commit+0x42/0xed3 [ubifs] [<c038e76a>] ? mutex_unlock+0x8/0xa [<f8f9625d>] ? ubifs_tnc_start_commit+0x1c8/0xedb [ubifs] [<f8f8d90b>] ? do_commit+0x187/0x523 [ubifs] [<c038e76a>] ? mutex_unlock+0x8/0xa [<f8f7ca17>] ? bud_wbuf_callback+0x22/0x28 [ubifs] [<f8f8dd1d>] ? ubifs_run_commit+0x76/0xc0 [ubifs] [<f8f8032c>] ? ubifs_sync_fs+0xd2/0xe6 [ubifs] [<c01a2e97>] ? vfs_quota_sync+0x0/0x17e [<c01a5ba6>] ? quota_sync_sb+0x26/0xbb [<c01a2e97>] ? vfs_quota_sync+0x0/0x17e [<c01a5c5d>] ? sync_dquots+0x22/0x12c [<c0173d1b>] ? __fsync_super+0x19/0x68 [<c0173d75>] ? fsync_super+0xb/0x19 [<c0174065>] ? generic_shutdown_super+0x22/0xe7 [<c01a31fc>] ? vfs_quota_off+0x0/0x5fd [<f8f7cf4d>] ? ubifs_kill_sb+0x31/0x35 [ubifs] [<c01741f9>] ? deactivate_super+0x5e/0x71 [<c0187610>] ? mntput_no_expire+0x82/0xe4 [<c0187905>] ? sys_umount+0x4c/0x2f6 [<c0187bc8>] ? sys_oldumount+0x19/0x1b [<c0103b71>] ? sysenter_do_call+0x12/0x25 ======================= Code: c1 f8 03 8d 04 07 8b 4d e8 89 01 8b 45 e4 89 10 89 d8 89 f1 d3 e8 85 c0 74 07 29 d6 83 fe 20 75 2a 89 d8 83 c4 20 5b 5e 5f 5d EIP: [<f8f9783f>] ubifs_unpack_bits+0xca/0x233 [ubifs] SS:ESP 0068:d9badcc0 ---[ end trace 1f02572436518c13 ]--- Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: correct condition to eliminate unecessary assignmentAdrian Hunter
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: add more debugging messages for LPTAdrian Hunter
Also add debugging checks for LPT size and separate out c->check_lpt_free from unrelated bitfields. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: fix bulk-read handling uptodate pagesAdrian Hunter
Bulk-read skips uptodate pages but this was putting its array index out and causing it to treat subsequent pages as holes. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: improve garbage collectionAdrian Hunter
Make garbage collection try to keep data nodes from the same inode together and in ascending order. This improves performance when reading those nodes especially when bulk-read is used. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: allow for sync_fs when read-onlyAdrian Hunter
sync_fs can be called even if the file system is mounted read-only. Ensure the commit is not run in that case. Reported-by: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: commit on sync_fsArtem Bityutskiy
Commit the journal when the FS is sync'ed. This will make statfs provide better free space report. And we anyway advice our users to sync the FS if they want better statfs report. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: correct comment for commit_on_unmountArtem Bityutskiy
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: update dbg_dump_inodeArtem Bityutskiy
'dbg_dump_inode()' is quite outdated and does not print all the fileds. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: fix commentaryArtem Bityutskiy
Znode may refer both data nodes and indexing nodes Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: fix races in bit-fieldsArtem Bityutskiy
We cannot store bit-fields together if the processes which change them may race, unless we serialize them. Thus, move the nospc and nospc_rp bit-fields eway from the mount option/constant bit-fields, to avoid races. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: ensure data read beyond i_size is zeroed out correctlyAdrian Hunter
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: correct key comparisonAdrian Hunter
The comparison was working, but more by accident than design. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: use bit-fields when possibleArtem Bityutskiy
The "bulk_read" and "no_chk_data_crc" have only 2 values - 0 and 1. We already have bit-fields in corresponding data structers, so make "bulk_read" and "no_chk_data_crc" bit-fields as well. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: check data CRC when in error stateArtem Bityutskiy
When UBIFS switches to R/O mode because of an error, it is reasonable to enable data CRC checking. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: improve znode splitting rulesAdrian Hunter
When inserting into a full znode it is split into two znodes. Because data node keys are usually consecutive, it is better to try to keep them together. This patch does a better job of that. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: add no_chk_data_crc mount optionAdrian Hunter
UBIFS read performance can be improved by skipping the CRC check when data nodes are read. This option can be used if the underlying media is considered to be highly reliable. Note that CRCs are always checked for metadata. Read speed on Arm platform with OneNAND goes from 19 MiB/s to 27 MiB/s with data CRC checking disabled. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: add bulk-read facilityAdrian Hunter
Some flash media are capable of reading sequentially at faster rates. UBIFS bulk-read facility is designed to take advantage of that, by reading in one go consecutive data nodes that are also located consecutively in the same LEB. Read speed on Arm platform with OneNAND goes from 17 MiB/s to 19 MiB/s. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: use an IS_ERR test rather than a NULL testJulien Brunel
In case of error, the function kthread_create returns an ERR pointer, but never returns a NULL pointer. So a NULL test that comes before an IS_ERR test should be deleted. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @match_bad_null_test@ expression x, E; statement S1,S2; @@ x = kthread_create(...) ... when != x = E * if (x == NULL) S1 else S2 // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: inline one-line functionsArtem Bityutskiy
'ubifs_get_lprops()' and 'ubifs_release_lprops()' basically wrap mutex lock and unlock. We have them because we want lprops subsystem be separate and as independent as possible. And we planned better locking rules for lprops. Anyway, because they are short, it is better to inline them. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: remove unneeded unlikely()Hirofumi Nakagawa
IS_ERR() macro already has unlikely(), so do not use constructions like 'if (unlikely(IS_ERR())'. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-30UBIFS: add a print, fix comments and more minor stuffArtem Bityutskiy
This commit adds a reserved pool size print and tweaks the prints to make them look nicer. It also fixes and cleans-up some comments. Additionally, it deletes some blank lines to make the code look a little nicer. In other words, nothing essential. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-09-29nfsd: use nfs client rpc callback programBenny Halevy
since commit ff7d9756b501744540be65e172d27ee321d86103 "nfsd: use static memory for callback program and stats" do_probe_callback uses a static callback program (NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog as passed in by the client in setclientid (4.0) or create_session (4.1). This patches introduces rpc_create_args.prognumber that allows overriding program->number when creating rpc_clnt. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: do_probe_callback should not clear rpc statsBenny Halevy
Now that cb_stats are static (since commit ff7d9756b501744540be65e172d27ee321d86103) there's no need to clear them. Initially I thought it might make sense to do that every callback probing but since the stats are per-program and they are shared between possibly several client callback instances, zeroing them out seems like the wrong thing to do. Note that that commit also introduced a bug since stats.program is also being cleared in the process and it is not restored after the memset as it used to be. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Update nsm_find() to support non-AF_INET addressesChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Combine __nsm_find() and nsm_find().Chuck Lever
Clean up: Having two separate functions doesn't add clarity, so eliminate one of them. Use contemporary kernel coding conventions where appropriate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Support AF_INET6 when hashing addresses in nlm_lookup_hostChuck Lever
Adopt an approach similar to the RPC server's auth cache (from Aurelien Charbon and Brian Haley). Note nlm_lookup_host()'s existing IP address hash function has the same issue with correctness on little-endian systems as the original IPv4 auth cache hash function, so I've also updated it with a hash function similar to the new auth cache hash function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Teach nlm_cmp_addr() to support AF_INET6 addressesChuck Lever
Update the nlm_cmp_addr() helper to support AF_INET6 as well as AF_INET addresses. New version takes two "struct sockaddr *" arguments instead of "struct sockaddr_in *" arguments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29NSM: Use sockaddr_storage for sm_addr fieldChuck Lever
To store larger addresses in the nsm_handle structure, make sm_addr a sockaddr_storage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Use sockaddr_storage for h_saddr fieldChuck Lever
To store larger addresses in the nlm_host structure, make h_saddr a sockaddr_storage. And let's call it something more self-explanatory: "saddr" could easily be mistaken for "server address". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Use sockaddr_storage + length for h_addr fieldChuck Lever
To store larger addresses in the nlm_host structure, make h_addr a sockaddr_storage, and add an address length field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Add address family-agnostic helper for zeroing the port numberChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: Specify address family for source addressChuck Lever
Make sure an address family is specified for source addresses passed to nlm_lookup_host(). nlm_lookup_host() will need this when it becomes capable of dealing with AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: address-family independent printable addressesChuck Lever
Knowing which source address is used for communicating with remote NLM services can be helpful for debugging configuration problems on hosts with multiple addresses. Keep the dprintk debugging here, but adapt it so it displays AF_INET6 addresses properly. There are also a couple of dprintk clean-ups as well. At some point we will aggregate the helpers that display presentation format addresses into a single set of shared helpers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29NLM: Clean up before introducing new debugging messagesChuck Lever
We're about to introduce some extra debugging messages in nlm_lookup_host(). Bring the coding style up to date first so we can cleanly introduce the new debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29SUNRPC: Support IPv6 when registering kernel RPC servicesChuck Lever
In order to advertise NFS-related services on IPv6 interfaces via rpcbind, the kernel RPC server implementation must use rpcb_v4_register() instead of rpcb_register(). A new kernel build option allows distributions to use the legacy v2 call until they integrate an appropriate user-space rpcbind daemon that can support IPv6 RPC services. I tried adding some automatic logic to fall back if registering with a v4 protocol request failed, but there are too many corner cases. So I just made it a compile-time switch that distributions can throw when they've replaced portmapper with rpcbind. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29lockd: don't depend on lockd main loop to end graceJ. Bruce Fields
End lockd's grace period using schedule_delayed_work() instead of a check on every pass through the main loop. After a later patch, we'll depend on lockd to end its grace period even if it's not currently handling requests; so it shouldn't depend on being woken up from the main loop to do so. Also, Nakano Hiroaki (who independently produced a similar patch) noticed that the current behavior is buggy in the face of jiffies wraparound: "lockd uses time_before() to determine whether the grace period has expired. This would seem to be enough to avoid timer wrap-around issues, but, unfortunately, that is not the case. The time_* family of comparison functions can be safely used to compare jiffies relatively close in time, but they stop working after approximately LONG_MAX/2 ticks. nfsd can suffer this problem because the time_before() comparison in lockd() is not performed until the first request comes in, which means that if there is no lockd traffic for more than LONG_MAX/2 ticks we are screwed. "The implication of this is that once time_before() starts misbehaving any attempt from a NFS client to execute fcntl() will be received with a NLM_LCK_DENIED_GRACE_PERIOD message for 25 days (assuming HZ=1000). In other words, the 50 seconds grace period could turn into a grace period of 50 days or more. "Note: This bug was analyzed independently by Oda-san <oda@valinux.co.jp> and myself." Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Nakano Hiroaki <nakano.hiroaki@oss.ntt.co.jp> Cc: Itsuro Oda <oda@valinux.co.jp>
2008-09-29locks: allow lockd to process blocked locks during grace periodJ. Bruce Fields
The check here is currently harmless but unnecessary, since, as the comment notes, there aren't any blocked-lock callbacks to process during the grace period anyway. And eventually we want to allow multiple grace periods that come and go for different filesystems over the course of the lifetime of lockd, at which point this check is just going to get in the way. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29knfsd: allocate readahead cache in individual chunksJeff Layton
I had a report from someone building a large NFS server that they were unable to start more than 585 nfsd threads. It was reported against an older kernel using the slab allocator, and I tracked it down to the large allocation in nfsd_racache_init failing. It appears that the slub allocator handles large allocations better, but large contiguous allocations can often be problematic. There doesn't seem to be any reason that the racache has to be allocated as a single large chunk. This patch breaks this up so that the racache is built up from separate allocations. (Thanks also to Takashi Iwai for a bugfix.) Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Takashi Iwai <tiwai@suse.de>
2008-09-29nfsd: nfs4xdr decode_stateid helper functionBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: properly xdr-decode NFS4_OPEN_CLAIM_DELEGATE_CUR stateidBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: don't declare p in ENCODE_SEQID_OP_HEADBenny Halevy
After using the encode_stateid helper the "p" pointer declared by ENCODE_SEQID_OP_HEAD is warned as unused. In the single site where it is still needed it can be declared separately using the ENCODE_HEAD macro. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: nfs4xdr encode_stateid helper functionBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: fix nfsd4_encode_open buffer space reservationBenny Halevy
nfsd4_encode_open first reservation is currently for 36 + sizeof(stateid_t) while it writes after the stateid a cinfo (20 bytes) and 5 more 4-bytes words, for a total of 40 + sizeof(stateid_t). Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: properly xdr-encode deleg stateid returned from openBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: properly xdr-encode stateid4.seqid as uint32_t for cb_recallBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29Configure out file locking featuresThomas Petazzoni
This patch adds the CONFIG_FILE_LOCKING option which allows to remove support for advisory locks. With this patch enabled, the flock() system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl() and NFS support are disabled. These features are not necessarly needed on embedded systems. It allows to save ~11 Kb of kernel code and data: text data bss dec hex filename 1125436 118764 212992 1457192 163c28 vmlinux.old 1114299 118564 212992 1445855 160fdf vmlinux -11137 -200 0 -11337 -2C49 +/- This patch has originally been written by Matt Mackall <mpm@selenic.com>, and is part of the Linux Tiny project. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: matthew@wil.cx Cc: linux-fsdevel@vger.kernel.org Cc: mpm@selenic.com Cc: akpm@linux-foundation.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29nfsd: permit unauthenticated stat of export rootJ. Bruce Fields
RFC 2623 section 2.3.2 permits the server to bypass gss authentication checks for certain operations that a client may perform when mounting. In the case of a client that doesn't have some form of credentials available to it on boot, this allows it to perform the mount unattended. (Presumably real file access won't be needed until a user with credentials logs in.) Being slightly more lenient allows lots of old clients to access krb5-only exports, with the only loss being a small amount of information leaked about the root directory of the export. This affects only v2 and v3; v4 still requires authentication for all access. Thanks to Peter Staubach testing against a Solaris client, which suggesting addition of v3 getattr, to the list, and to Trond for noting that doing so exposes no additional information. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
2008-09-29SUNRPC: Add address family field to svc_serv data structureChuck Lever
Introduce and initialize an address family field in the svc_serv structure. This field will determine what family to use for the service's listener sockets and what families are advertised via the local rpcbind daemon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>