summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-05-21Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "New notable features: - The seccomp work from Will Drewry - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski - Longer security labels for Smack from Casey Schaufler - Additional ptrace restriction modes for Yama by Kees Cook" Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) apparmor: fix long path failure due to disconnected path apparmor: fix profile lookup for unconfined ima: fix filename hint to reflect script interpreter name KEYS: Don't check for NULL key pointer in key_validate() Smack: allow for significantly longer Smack labels v4 gfp flags for security_inode_alloc()? Smack: recursive tramsmute Yama: replace capable() with ns_capable() TOMOYO: Accept manager programs which do not start with / . KEYS: Add invalidation support KEYS: Do LRU discard in full keyrings KEYS: Permit in-place link replacement in keyring list KEYS: Perform RCU synchronisation on keys prior to key destruction KEYS: Announce key type (un)registration KEYS: Reorganise keys Makefile KEYS: Move the key config into security/keys/Kconfig KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat Yama: remove an unused variable samples/seccomp: fix dependencies on arch macros Yama: add additional ptrace scopes ...
2012-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds
Pull GFS2 changes from Steven Whitehouse. * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits) GFS2: Fix quota adjustment return code GFS2: Add rgrp information to block_alloc trace point GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer GFS2: Update glock doc to add new stats info GFS2: Update main gfs2 doc GFS2: Remove redundant metadata block type check GFS2: Fix sgid propagation when using ACLs GFS2: eliminate log elements and simplify GFS2: Eliminate vestigial sd_log_le_rg GFS2: Eliminate needless parameter from function gfs2_setbit GFS2: Log code fixes GFS2: Remove unused argument from gfs2_internal_read GFS2: Remove bd_list_tr GFS2: Remove duplicate log code GFS2: Clean up log write code path GFS2: Use variable rather than qa to determine if unstuff necessary GFS2: Change variable blk to biblk GFS2: Fix function parameter comments in rgrp.c GFS2: Eliminate offset parameter to gfs2_setbit GFS2: Use slab for block reservation memory ...
2012-05-21Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu"Linus Torvalds
This reverts commit 8c01a529b861ba97c7d78368e6a5d4d42e946f75. It turns out the d_unhashed() check isn't unnecessary after all: while it's true that unhashing will increment the sequence numbers, that does not necessarily invalidate the RCU lookup, because it might have seen the dentry pointer (before it got unhashed), but by the time it loaded the sequence number, it could have seen the *new* sequence number (after it got unhashed). End result: we might look up an unhashed dentry that is about to be freed, with the sequence number never indicating anything bad about it. So checking that the dentry is still hashed (*after* reading the sequence number) is indeed the proper fix, and was never unnecessary. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-22Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into nextJames Morris
Per pull request, for 3.5.
2012-05-21vfs: be even more careful about dentry RCU name lookupsLinus Torvalds
Miklos Szeredi points out that we need to also worry about memory odering when doing the dentry name comparison asynchronously with RCU. In particular, doing a rename can do a memcpy() of one dentry name over another, and we want to make sure that any unlocked reader will always see the proper terminating NUL character, so that it won't ever run off the allocation. Rather than having to be extra careful with the name copy or at lookup time for each character, this resolves the issue by making sure that all names that are inlined in the dentry always have a NUL character at the end of the name allocation. If we do that at dentry allocation time, we know that no future name copy will ever change that final NUL to anything else, so there are no memory ordering issues. So even if a concurrent rename ends up overwriting the NUL character that terminates the original name, we always know that there is one final NUL at the end, and there is no worry about the lockless RCU lookup traversing the name too far. The out-of-line allocations are never copied over, so we can just make sure that we write the name (with terminating NULL) and do a write barrier before we expose the name to anything else by setting it in the dentry. Reported-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21vfs: make AIO use the proper rw_verify_area() area helpersLinus Torvalds
We had for some reason overlooked the AIO interface, and it didn't use the proper rw_verify_area() helper function that checks (for example) mandatory locking on the file, and that the size of the access doesn't cause us to overflow the provided offset limits etc. Instead, AIO did just the security_file_permission() thing (that rw_verify_area() also does) directly. This fixes it to do all the proper helper functions, which not only means that now mandatory file locking works with AIO too, we can actually remove lines of code. Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreamingLinus Torvalds
Pull c6x updates from Mark Salter: "Clean up some c6x Kconfig items and add support for Elf FDPIC loader." * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: C6X: remove unused config items C6X: add support to build with BINFMT_ELF_FDPIC C6X: change main arch kbuild symbol
2012-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking changes from David Miller: 1) Get rid of the error prone NLA_PUT*() macros that used an embedded goto. 2) Kill off the token-ring and MCA networking drivers, from Paul Gortmaker. 3) Reduce high-order allocations made by datagram AF_UNIX sockets, from Eric Dumazet. 4) Add PTP hardware clock support to IGB and IXGBE, from Richard Cochran and Jacob Keller. 5) Allow users to query timestamping capabilities of a card via ethtool, from Richard Cochran. 6) Add loadbalance mode to the teaming driver, from Jiri Pirko. Part of this is that we can now have BPF filters not attached to sockets, and the loadbalancing function is calculated using one. 7) Francois Romieu went through the network drivers removing gratuitous uses of netdev->base_addr, perhaps some day we can remove it completely but it's used for ISA probing still. 8) Add a BPF JIT for sparc. I know, who cares, right? :-) 9) Move networking sysctl registry away from using the compatability mode interfaces in the sysctl code. From Eric W Biederman. 10) Pavel Emelyanov added a way to save and restore TCP socket state via TCP_REPAIR, TCP_REPAIR_QUEUE, and TCP_QUEUE_SEQ socket options as well as a way to forcefully bind a socket to a port via the sk->sk_reuse value SK_FORCE_REUSE. There is also a TCP_REPAIR_OPTIONS which allows to reinstante the TCP options enabled on the connection. 11) Several enhancements from Eric Dumazet that, in particular, can enhance splice performance on TCP sockets significantly. a) Reset the offset of the per-socket sendmsg page when we know we're the only use of the page in linear_to_page(). b) Add facilities such that skb->data can be backed a page rather than SLAB kmalloc'd memory. In particular devices which were receiving into linear RX buffers can now end up providing paged data. The big result is that code like splice and GRO do not have to copy any more. 12) Allow a pure sender to more gracefully handle ACK backlogs in TCP. What can happen at high rates is that the sender hasn't grown his receive buffer limits at all (he's not receiving data so really doesn't need to), but the non-data ACKs consume receive buffer space. sk_add_backlog() is too aggressive in dropping frames in this case, so relax it's requirements by using the receive buffer plus the send buffer limit as the backlog limit instead of just the former. Also from Eric Dumazet. 13) Add ipv6 support to L2TP, from Benjamin LaHaise, James Chapman, and Chris Elston. 14) Implement TCP early retransmit (RFC 5827), from Yuchung Cheng. Basically, we can start fast retransmit before hiting the dupack threshold under certain conditions. 15) New CODEL active queue management packet scheduler, from Eric Dumazet based upon initial work by Dave Taht. Basically, the big feature is that packets are dropped (or ECN bits are set) based upon how long packets live in the queue, rather than the queue length (which is what RED uses). * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1341 commits) drivers/net/stmmac: seq_file fix memory leak ipv6/exthdrs: strict Pad1 and PadN check USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z USB: qmi_wwan: Add ZTE (Vodafone) K3765-Z USB: qmi_wwan: Make forced int 4 whitelist generic net/ipv4: replace simple_strtoul with kstrtoul net/ipv4/ipconfig: neaten __setup placement net: qmi_wwan: Add Vodafone/Huawei K5005 support net: cdc_ether: Add ZTE WWAN matches before generic Ethernet ipv6: use skb coalescing in reassembly ipv4: use skb coalescing in defragmentation net: introduce skb_try_coalesce() net:ipv6:fixed space issues relating to operators. net:ipv6:fixed a trailing white space issue. ipv6: disable GSO on sockets hitting dst_allfrag tg3: use netdev_alloc_frag() API net: napi_frags_skb() is static ppp: avoid false drop_monitor false positives ipv6: bool/const conversions phase2 ipx: Remove spurious NULL checking in ipx_ioctl(). ...
2012-05-21Merge branch 'dentry-cleanups' (dcache access cleanups and optimizations)Linus Torvalds
This branch simplifies and clarifies the dcache lookup, and allows us to do certain nice optimizations when comparing dentries. It also cleans up the interface to __d_lookup_rcu(), especially around passing the inode information around. * dentry-cleanups: vfs: make it possible to access the dentry hash/len as one 64-bit entry vfs: move dentry name length comparison from dentry_cmp() into callers vfs: do the careful dentry name access for all dentry_cmp cases vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
2012-05-21Merge branch 'vfs-cleanups' (random vfs cleanups)Linus Torvalds
This teaches vfs_fstat() to use the appropriate f[get|put]_light functions, allowing it to avoid some unnecessary locking for the common case. More noticeably, it also cleans up and simplifies the "getname_flags()" function, which now relies on the architecture strncpy_from_user() doing all the user access checks properly, instead of hacking around the fact that on x86 it didn't use to do it right (see commit 92ae03f2ef99: "x86: merge 32/64-bit versions of 'strncpy_from_user()' and speed it up"). * vfs-cleanups: VFS: make vfs_fstat() use f[get|put]_light() VFS: clean up and simplify getname_flags() x86: make word-at-a-time strncpy_from_user clear bytes at the end
2012-05-21Merge branch 'stat-cleanups' (clean up copying of stat info to user space)Linus Torvalds
This makes cp_new_stat() a bit more readable, and avoids having to memset() the whole structure just to fill in a couple of padding fields. This is another result of me looking at code generation of functions that show up high on certain kernel profiles, and just going "Oh, let's just clean that up". Architectures that don't supply the #define to fill just the padding fields will still fall back to memset(). * stat-cleanups: vfs: don't force a big memset of stat data just to clear padding fields vfs: de-crapify "cp_new_stat()" function
2012-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-05-19Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overlow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-18Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc fixes from Andrew Morton. * emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches) frv: delete incorrect task prototypes causing compile fail slub: missing test for partial pages flush work in flush_all() fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
2012-05-18proc: move fd symlink i_mode calculations into tid_fd_revalidate()Linus Torvalds
Instead of doing the i_mode calculations at proc_fd_instantiate() time, move them into tid_fd_revalidate(), which is where the other inode state (notably uid/gid information) is updated too. Otherwise we'll end up with stale i_mode information if an fd is re-used while the dentry still hangs around. Not that anything really *cares* (symlink permissions don't really matter), but Tetsuo Handa noticed that the owner read/write bits don't always match the state of the readability of the file descriptor, and we _used_ to get this right a long time ago in a galaxy far, far away. Besides, aside from fixing an ugly detail (that has apparently been this way since commit 61a28784028e: "proc: Remove the hard coded inode numbers" in 2006), this removes more lines of code than it adds. And it just makes sense to update i_mode in the same place we update i_uid/gid. Al Viro correctly points out that we could just do the inode fill in the inode iops ->getattr() function instead. However, that does require somewhat slightly more invasive changes, and adds yet *another* lookup of the file descriptor. We need to do the revalidate() for other reasons anyway, and have the file descriptor handy, so we might as well fill in the information at this point. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entriesCyrill Gorcunov
map_files/ entries are never supposed to be executed, still curious minds might try to run them, which leads to the following deadlock ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc4-24406-g841e6a6 #121 Not tainted ------------------------------------------------------- bash/1556 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1 but task is already holding lock: (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sig->cred_guard_mutex){+.+.+.}: validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_killable_nested+0x40/0x45 lock_trace+0x24/0x59 proc_map_files_lookup+0x5a/0x165 __lookup_hash+0x52/0x73 do_lookup+0x276/0x2b1 walk_component+0x3d/0x114 do_last+0xfc/0x540 path_openat+0xd3/0x306 do_filp_open+0x3d/0x89 do_sys_open+0x74/0x106 sys_open+0x21/0x23 tracesys+0xdd/0xe2 -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: check_prev_add+0x6a/0x1ef validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_nested+0x40/0x45 do_lookup+0x267/0x2b1 walk_component+0x3d/0x114 link_path_walk+0x1f9/0x48f path_openat+0xb6/0x306 do_filp_open+0x3d/0x89 open_exec+0x25/0xa0 do_execve_common+0xea/0x2f9 do_execve+0x43/0x45 sys_execve+0x43/0x5a stub_execve+0x6c/0xc0 This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex and when do_lookup happens we try to grab task->signal->cred_guard_mutex again in lock_trace. Fix it using plain ptrace_may_access() helper in proc_map_files_lookup() and in proc_map_files_readdir() instead of lock_trace(), the caller must be CAP_SYS_ADMIN granted anyway. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Jones <davej@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-05-16Merge git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fix from Jeff Layton * git://git.samba.org/sfrench/cifs-2.6: cifs: fix misspelling of "forcedirectio"
2012-05-16cifs: fix misspelling of "forcedirectio"Jeff Layton
...and add a "directio" synonym since that's what the manpage has always advertised. Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-16GFS2: Fix quota adjustment return codeBob Peterson
This patch changes function gfs2_adjust_quota so that it properly returns a good (zero) return code on the normal path through the code. Without this, mounting GFS2 with -o quota=account periodically gave this error message: GFS2: fsid=cluster:fs: gfs2_quotad: sync error -5 Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-15C6X: add support to build with BINFMT_ELF_FDPICMark Salter
C6x userspace supports a shared library mechanism called DSBT for systems with no MMU. DSBT is similar to FDPIC in allowing shared text segments and private copies of data segments without an MMU. Both methods access data using a base register and offset. With FDPIC, the caller of an external function sets up the base register for the callee. With DSBT, the called function sets up its own base register. Other details differ but both userspaces need the same thing from the kernel loader: a map of where each ELF segment was loaded. The FDPIC loader already provides this, so DSBT just uses it. This patch enables BINFMT_ELF_FDPIC by default for C6X and provides the necessary architecture hooks for the generic loader. Signed-off-by: Mark Salter <msalter@redhat.com>
2012-05-13Merge tag 'for-linus-3.4-20120513' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull three MTD fixes from David Woodhouse: - Fix a lock ordering deadlock in JFFS2 - Fix an oops in the dataflash driver, triggered by a dummy call to test whether it has OTP functionality. - Fix request_mem_region() failure on amsdelta NAND driver. * tag 'for-linus-3.4-20120513' of git://git.infradead.org/linux-mtd: mtd: ams-delta: fix request_mem_region() failure jffs2: Fix lock acquisition order bug in gc path mtd: fix oops in dataflash driver
2012-05-11bio allocation failure due to bio_get_nr_vecs()Bernd Schubert
The number of bio_get_nr_vecs() is passed down via bio_alloc() to bvec_alloc_bs(), which fails the bio allocation if nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an unexpected bio allocation failure. Limiting to queue_max_segments() is not sufficient, as max_segments also might be very large. bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs > BIO_MAX_PAGES bio_alloc_bioset(gfp_mask, nr_iovecs, ...) bio_alloc(GFP_NOIO, nvecs) xfs_alloc_ioend_bio() Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-11block: don't mark buffers beyond end of disk as mappedJeff Moyer
Hi, We have a bug report open where a squashfs image mounted on ppc64 would exhibit errors due to trying to read beyond the end of the disk. It can easily be reproduced by doing the following: [root@ibm-p750e-02-lp3 ~]# ls -l install.img -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null dd: reading `/dev/loop0': Input/output error 277376+0 records in 277376+0 records out 142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s In dmesg, you'll find the following: squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 43.106012] attempt to access beyond end of device [ 43.106029] loop0: rw=0, want=277410, limit=277408 [ 43.106039] Buffer I/O error on device loop0, logical block 138704 [ 43.106053] attempt to access beyond end of device [ 43.106057] loop0: rw=0, want=277412, limit=277408 [ 43.106061] Buffer I/O error on device loop0, logical block 138705 [ 43.106066] attempt to access beyond end of device [ 43.106070] loop0: rw=0, want=277414, limit=277408 [ 43.106073] Buffer I/O error on device loop0, logical block 138706 [ 43.106078] attempt to access beyond end of device [ 43.106081] loop0: rw=0, want=277416, limit=277408 [ 43.106085] Buffer I/O error on device loop0, logical block 138707 [ 43.106089] attempt to access beyond end of device [ 43.106093] loop0: rw=0, want=277418, limit=277408 [ 43.106096] Buffer I/O error on device loop0, logical block 138708 [ 43.106101] attempt to access beyond end of device [ 43.106104] loop0: rw=0, want=277420, limit=277408 [ 43.106108] Buffer I/O error on device loop0, logical block 138709 [ 43.106112] attempt to access beyond end of device [ 43.106116] loop0: rw=0, want=277422, limit=277408 [ 43.106120] Buffer I/O error on device loop0, logical block 138710 [ 43.106124] attempt to access beyond end of device [ 43.106128] loop0: rw=0, want=277424, limit=277408 [ 43.106131] Buffer I/O error on device loop0, logical block 138711 [ 43.106135] attempt to access beyond end of device [ 43.106139] loop0: rw=0, want=277426, limit=277408 [ 43.106143] Buffer I/O error on device loop0, logical block 138712 [ 43.106147] attempt to access beyond end of device [ 43.106151] loop0: rw=0, want=277428, limit=277408 [ 43.106154] Buffer I/O error on device loop0, logical block 138713 [ 43.106158] attempt to access beyond end of device [ 43.106162] loop0: rw=0, want=277430, limit=277408 [ 43.106166] attempt to access beyond end of device [ 43.106169] loop0: rw=0, want=277432, limit=277408 ... [ 43.106307] attempt to access beyond end of device [ 43.106311] loop0: rw=0, want=277470, limit=2774 Squashfs manages to read in the end block(s) of the disk during the mount operation. Then, when dd reads the block device, it leads to block_read_full_page being called with buffers that are beyond end of disk, but are marked as mapped. Thus, it would end up submitting read I/O against them, resulting in the errors mentioned above. I fixed the problem by modifying init_page_buffers to only set the buffer mapped if it fell inside of i_size. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Nick Piggin <npiggin@kernel.dk> -- Changes from v1->v2: re-used max_block, as suggested by Nick Piggin. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-11GFS2: Add rgrp information to block_alloc trace pointBob Peterson
This is a second attempt at a patch that adds rgrp information to the block allocation trace point for GFS2. As suggested, the patch was modified to list the rgrp information _after_ the fields that exist today. Again, the reason for this patch is to allow us to trace and debug problems with the block reservations patch, which is still in the works. We can debug problems with reservations if we can see what block allocations result from the block reservations. It may also be handy in figuring out if there are problems in rgrp free space accounting. In other words, we can use it to track the rgrp and its free space along side the allocations that are taking place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-11GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_bufferBob Peterson
It turns out that the "new" parameter to function gfs2_meta_indirect_buffer was always being passed in as zero. Therefore, this patch eliminates it and simplifies the function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-10vfs: make it possible to access the dentry hash/len as one 64-bit entryLinus Torvalds
This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about. The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper function for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10vfs: move dentry name length comparison from dentry_cmp() into callersLinus Torvalds
All callers do want to check the dentry length, but some of them can check the length and the hash together, so doing it in dentry_cmp() can be counter-productive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10vfs: do the careful dentry name access for all dentry_cmp casesLinus Torvalds
Commit 12f8ad4b0533 ("vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces") did the careful ACCESS_ONCE() of the dentry name only for the word-at-a-time case, even though the issue is generic. Admittedly I don't really see gcc ever reloading the value in the middle of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical issue. But better safe than sorry. Also, this consolidates the common parts of the word-at-a-time and bytewise logic, which includes checking the length. We'll be changing that later. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10vfs: remove unnecessary d_unhashed() check from __d_lookup_rcuLinus Torvalds
The check for d_unhashed() is not strictly incorrect, but at the same time it is also not sensible. The actual dentry removal from the dentry hash chains is totally asynchronous to the __d_lookup_rcu() logic, and we depend on __d_drop() updating the sequence number to invalidate any lookup of an unhashed dentry. So checking d_unhashed() is not incorrect, but it's not useful either: the code has to work correctly even without it. So just remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc fixes from Andrew Morton. * emailed from Andrew Morton <akpm@linux-foundation.org>: (8 patches) MAINTAINERS: add maintainer for LED subsystem mm: nobootmem: fix sign extend problem in __free_pages_memory() drivers/leds: correct __devexit annotations memcg: free spare array to avoid memory leak namespaces, pid_ns: fix leakage on fork() failure hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow() mm: fix division by 0 in percpu_pagelist_fraction() proc/pid/pagemap: correctly report non-present ptes and holes between vmas
2012-05-10proc/pid/pagemap: correctly report non-present ptes and holes between vmasKonstantin Khlebnikov
Reset the current pagemap-entry if the current pte isn't present, or if current vma is over. Otherwise pagemap reports last entry again and again. Non-present pte reporting was broken in commit 092b50bacd1c ("pagemap: introduce data structure for pagemap entry") Reporting for holes was broken in commit 5aaabe831eb5 ("pagemap: avoid splitting thp when reading /proc/pid/pagemap") Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reported-by: Pavel Emelyanov <xemul@parallels.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-09cifs: fix revalidation test in cifs_llseek()Dan Carpenter
This test is always true so it means we revalidate the length every time, which generates more network traffic. When it is SEEK_SET or SEEK_CUR, then we don't need to revalidate. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-08GFS2: Remove redundant metadata block type checkBob Peterson
This patch removes a redundant metadata block check. See description below. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07jffs2: Fix lock acquisition order bug in gc pathJosh Cartwright
The locking policy is such that the erase_complete_block spinlock is nested within the alloc_sem mutex. This fixes a case in which the acquisition order was erroneously reversed. This issue was caught by the following lockdep splat: ======================================================= [ INFO: possible circular locking dependency detected ] 3.0.5 #1 ------------------------------------------------------- jffs2_gcd_mtd6/299 is trying to acquire lock: (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890 but task is already holding lock: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}: [<c008bec4>] validate_chain+0xe6c/0x10bc [<c008c660>] __lock_acquire+0x54c/0xba4 [<c008d240>] lock_acquire+0xa4/0x114 [<c046780c>] _raw_spin_lock+0x3c/0x4c [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890 [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc [<c0071a68>] kthread+0x98/0xa0 [<c000f264>] kernel_thread_exit+0x0/0x8 -> #0 (&c->alloc_sem){+.+.+.}: [<c008ad2c>] print_circular_bug+0x70/0x2c4 [<c008c08c>] validate_chain+0x1034/0x10bc [<c008c660>] __lock_acquire+0x54c/0xba4 [<c008d240>] lock_acquire+0xa4/0x114 [<c0466628>] mutex_lock_nested+0x74/0x33c [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890 [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc [<c0071a68>] kthread+0x98/0xa0 [<c000f264>] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); *** DEADLOCK *** 1 lock held by jffs2_gcd_mtd6/299: #0: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890 stack backtrace: [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24) [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc) [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4) [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114) [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c) [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0) [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8) This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'. Cc: stable@kernel.org [2.6.37+] Signed-off-by: Josh Cartwright <joshc@linux.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-05-06vfs: don't force a big memset of stat data just to clear padding fieldsLinus Torvalds
Admittedly this is something that the compiler should be able to just do for us, but gcc just isn't that smart. And trying to use a structure initializer (which would get us the right semantics) ends up resulting in gcc allocating stack space for _two_ 'struct stat', and then copying one into the other. So do it by hand - just have a per-architecture macro that initializes the padding fields. And if the architecture doesn't provide one, fall back to the old behavior of just doing the whole memset() first. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-06vfs: de-crapify "cp_new_stat()" functionLinus Torvalds
It's an unreadable mess of 32-bit vs 64-bit #ifdef's that mostly follow a rather simple pattern. Make a helper #define to handle that pattern, in the process making the code both shorter and more readable. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The big ones here are a memory leak we introduced in rc1, and a scheduling while atomic if the transid on disk doesn't match the transid we expected. This happens for corrupt blocks, or out of date disks. It also fixes up the ioctl definition for our ioctl to resolve logical inode numbers. The __u32 was a merging error and doesn't match what we ship in the progs." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: avoid sleeping in verify_parent_transid while atomic Btrfs: fix crash in scrub repair code when device is missing btrfs: Fix mismatching struct members in ioctl.h Btrfs: fix page leak when allocing extent buffers Btrfs: Add properly locking around add_root_to_dirty_list
2012-05-06Btrfs: avoid sleeping in verify_parent_transid while atomicChris Mason
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verifiy things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04vfs: clean up __d_lookup_rcu() and dentry_cmp() interfacesLinus Torvalds
The calling conventions for __d_lookup_rcu() and dentry_cmp() are annoying in different ways, and there is actually one single underlying reason for both of the annoyances. The fundamental reason is that we do the returned dentry sequence number check inside __d_lookup_rcu() instead of doing it in the caller. This results in two annoyances: - __d_lookup_rcu() now not only needs to return the dentry and the sequence number that goes along with the lookup, it also needs to return the inode pointer that was validated by that sequence number check. - and because we did the sequence number check early (to validate the name pointer and length) we also couldn't just pass the dentry itself to dentry_cmp(), we had to pass the counted string that contained the name. So that sequence number decision caused two separate ugly calling conventions. Both of these problems would be solved if we just did the sequence number check in the caller instead. There's only one caller, and that caller already has to do the sequence number check for the parent anyway, so just do that. That allows us to stop returning the dentry->d_inode in that in-out argument (pointer-to-pointer-to-inode), so we can make the inode argument just a regular input inode pointer. The caller can just load the inode from dentry->d_inode, and then do the sequence number check after that to make sure that it's synchronized with the name we looked up. And it allows us to just pass in the dentry to dentry_cmp(), which is what all the callers really wanted. Sure, dentry_cmp() has to be a bit careful about the dentry (which is not stable during RCU lookup), but that's actually very simple. And now that dentry_cmp() can clearly see that the first string argument is a dentry, we can use the direct word access for that, instead of the careful unaligned zero-padding. The dentry name is always properly aligned, since it is a single path component that is either embedded into the dentry itself, or was allocated with kmalloc() (see __d_alloc). Finally, this also uninlines the nasty slow-case for dentry comparisons: that one *does* need to do a sequence number check, since it will call in to the low-level filesystems, and we want to give those a stable inode pointer and path component length/start arguments. Doing an extra sequence check for that slow case is not a problem, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04hfsplus: Fix potential buffer overflowsGreg Kroah-Hartman
Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few potential buffer overflows in the hfs filesystem. But as Timo Warns pointed out, these changes also need to be made on the hfsplus filesystem as well. Reported-by: Timo Warns <warns@pre-sense.de> Acked-by: WANG Cong <amwang@redhat.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Sage Weil <sage@newdream.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04Merge git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French. * git://git.samba.org/sfrench/cifs-2.6: fs/cifs: fix parsing of dfs referrals cifs: make sure we ignore the credentials= and cred= options [CIFS] Update cifs version to 1.78 cifs - check S_AUTOMOUNT in revalidate cifs: add missing initialization of server->req_lock cifs: don't cap ra_pages at the same level as default_backing_dev_info CIFS: Fix indentation in cifs_show_options
2012-05-04Btrfs: fix crash in scrub repair code when device is missingStefan Behrens
Fix that when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing, it crashes in bio_add_page because the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04btrfs: Fix mismatching struct members in ioctl.hAlexander Block
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used __u64 and the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix page leak when allocing extent buffersJosef Bacik
If we happen to alloc a extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer which can cause us to leak pages and I hope cause the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: Add properly locking around add_root_to_dirty_listChris Mason
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racey. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04GFS2: Fix sgid propagation when using ACLsSteven Whitehouse
This cleans up the mode setting code when creating inodes. The SGID bit was being reset by setattr_copy() when the user creating a subdirectory was not in the owning group. When ACLs are in use this SGID bit should have been propagated if the ACL allows creation of a subdirectory. GFS2's behaviour now matches that of the other ACL supporting filesystems in this regard. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-03fs/cifs: fix parsing of dfs referralsStefan Metzmacher
The problem was that the first referral was parsed more than once and so the caller tried the same referrals multiple times. The problem was introduced partly by commit 066ce6899484d9026acd6ba3a8dbbedb33d7ae1b, where 'ref += le16_to_cpu(ref->Size);' got lost, but that was also wrong... Cc: <stable@vger.kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Tested-by: Björn Jacke <bj@sernet.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-04Merge tag 'v3.4-rc5' into nextJames Morris
Linux 3.4-rc5 Merge to pull in prerequisite change for Smack: 86812bb0de1a3758dc6c7aa01a763158a7c0638a Requested by Casey.