summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-10-31Merge branch 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfsLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs: xfs: fix xfs_quota remove error xfs: free temporary cursor in xfs_dialloc
2009-10-30xfs: fix xfs_quota remove errorRyota Yamauchi
The xfs_quota returns ENOSYS when remove command is executed. Reproducable with following steps. # mount -t xfs -o uquota /dev/sda7 /mnt/mp1 # xfs_quota -x -c off -c remove XFS_QUOTARM: Function not implemented. The remove command is allowed during quotaoff, but xfs_fs_set_xstate() checks whether quota is running, and it leads to ENOSYS. To solve this problem, add a check for X_QUOTARM. Signed-off-by: Ryota Yamauchi <r-yamauchi@vf.jp.nec.com> Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-10-30xfs: free temporary cursor in xfs_diallocEric Sandeen
Commit bd169565993b39b9b4b102cdac8b13e0a259ce2f seems to have a slight regression where this code path: if (!--searchdistance) { /* * Not in range - save last search * location and allocate a new inode */ ... goto newino; } doesn't free the temporary cursor (tcur) that got dup'd in this function. This leaks an item in the xfs_btree_cur zone, and it's caught on module unload: =========================================================== BUG xfs_btree_cur: Objects remaining on kmem_cache_close() ----------------------------------------------------------- It seems like maybe a single free at the end of the function might be cleaner, but for now put a del_cursor right in this code block similar to the handling in the rest of the function. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-10-30Convert /proc/device-tree/ to seq_fileAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30powerpc: Cleanup Kconfig selection of hugetlbfs supportBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-29ocfs2: return f_fsid info in ocfs2_statfs()Coly Li
Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (vfs layer fills in 0 as default). Since in some conditions, f_fsid value might be used in a (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a unique defined f_fsid value from ocfs2_statfs(). Because uuid_str is the same on big or litlle endian machine, it's endian consistent to use osb->uuid_str to generate f_fsid value. Signed-off-by: Coly Li <coly.li@suse.de> Cc: Sunil Mushran <sunil.mushran@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-29Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: backing-dev: ensure that a removed bdi no longer has super_block referencing it block: use after free bug in __blkdev_get block: silently error unsupported empty barriers too
2009-10-29Merge branch 'sh/for-2.6.32' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations. sh: Document uImage.bin target in archhelp. sh: add uImage.bin target sh: rsk7203 CONFIG_MTD=n fix sh: Check for return_to_handler when unwinding the stack sh: Build fix: define more __movmem* symbols sh: __irq_entry annotate do_IRQ(). Fix up sh/powerpc conflicts in fs/Kconfig
2009-10-29Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: The link() operation should return any delegation on the file NFSv4: Fix two unbalanced put_rpccred() issues. NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE nfs: Panic when commit fails
2009-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fixing to avoid invalid kfree() in cifs_get_tcp_session()
2009-10-29Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule powerpc: Minor cleanup to lib/Kconfig.debug powerpc: Minor cleanup to sound/ppc/Kconfig powerpc: Minor cleanup to init/Kconfig powerpc: Limit memory hotplug support to PPC64 Book-3S machines powerpc: Limit hugetlbfs support to PPC64 Book-3S machines powerpc: Fix compile errors found by new ppc64e_defconfig powerpc: Add a Book-3E 64-bit defconfig powerpc/booke: Fix xmon single step on PowerPC Book-E powerpc: Align vDSO base address powerpc: Fix segment mapping in vdso32 powerpc/iseries: Remove compiler version dependent hack powerpc/perf_events: Fix priority of MSR HV vs PR bits powerpc/5200: Update defconfigs drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM powerpc/boot/dts: drop obsolete 'fsl5200-clocking' of: Remove nested function mpc5200: support for the MAN mpc5200 based board mucmc52 mpc5200: support for the MAN mpc5200 based board uc101
2009-10-29Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix double IRELE in xfs_dqrele_inode
2009-10-29hfs: fix oops on mount with corrupted btree extent recordsJeff Mahoney
A particular fsfuzzer run caused an hfs file system to crash on mount. This is due to a corrupted MDB extent record causing a miscalculation of HFS_I(inode)->first_blocks for the extent tree. If the extent records are zereod out, it won't trigger the first_blocks special case. Instead it falls through to the extent code which we're still in the middle of initializing. This patch catches the 0 size extent records, reports the corruption, and fails the mount. Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hfsplus: refuse to mount volumes larger than 2TBBen Hutchings
As found in <http://bugs.debian.org/550010>, hfsplus is using type u32 rather than sector_t for some sector number calculations. In particular, hfsplus_get_block() does: u32 ablock, dblock, mask; ... map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask)); I am not confident that I can find and fix all cases where a sector number may be truncated. For now, avoid data loss by refusing to mount HFS+ volumes with more than 2^32 sectors (2TB). [akpm@linux-foundation.org: fix 32 and 64-bit issues] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eric Sesterhenn <snakebyte@gmx.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29hwpoison: fix/proc/meminfo alignmentHugh Dickins
Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted line is being shown too far right (it does align with x86_64's VmallocChunk above, but I hope nobody will ever have that much corrupted!). Align it. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29blkdev: flush disk cache on ->fsyncChristoph Hellwig
Currently there is no barrier support in the block device code. That means we cannot guarantee any sort of data integerity when using the block device node with dis kwrite caches enabled. Using the raw block device node is a typical use case for virtualization (and I assume databases, too). This patch changes block_fsync to issue a cache flush and thus make fsync on block device nodes actually useful. Note that in mainline we would also need to add such code to the ->aio_write method for O_SYNC handling, but assuming that Jan's patch series for the O_SYNC rewrite goes in it will also call into ->fsync for 2.6.32. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-29block: move bdi/address_space unplug functions to backing-dev.hJens Axboe
There's nothing block related about them, the backing device is used by things like NFS etc as well. This gets rid of the need to protect such calls by CONFIG_BLOCK. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28ocfs2: Set MS_POSIXACL on remountJan Kara
We have to set MS_POSIXACL on remount as well. Otherwise VFS would not know we started supporting ACLs after remount and thus ACLs would not work. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Make acl use the defaultJan Kara
Change acl mount options handling to match the one of XFS and BTRFS and hopefully it is also easier to use now. When admin does not specify any acl mount option, acls are enabled if and only if the filesystem has xattr feature enabled. If admin specifies 'acl' mount option, we fail the mount if the filesystem does not have xattr feature and thus acls cannot be enabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: Always include ACL supportJan Kara
To become consistent with filesystems such as XFS or BTRFS, make posix ACLs always available. This also reduces possibility of misconfiguration on admin's side. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-10-28ocfs2: duplicate inline data properly during reflink.Tao Ma
The old reflink fails to handle inodes with inline data and will oops if it encounters them. This patch copies inline data to the new inode. Extended attributes may still be refcounted. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>
2009-10-28ocfs2: Move ocfs2_complete_reflink to the right place.Tao Ma
As its name ocfs2_complete_reflink indicates, it should be called after all the work for reflink is done, so it really should be called after we reflink xattr successfully. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>
2009-10-28ocfs2: Return -EINVAL when a device is not ocfs2.Joel Becker
In case of non-modular kernels the root filesystem is mounted by trying several filesystems. If ocfs2 was tried before the actual filesystem type, the mount would fail because ocfs2_sb_probe() returns -EAGAIN instead of -EINVAL. ocfs2 will now return -EINVAL properly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Reported-by: Laszlo Attila Toth <panther@balabit.hu>
2009-10-28aio: implement request batchingJeff Moyer
Hi, Some workloads issue batches of small I/O, and the performance is poor due to the call to blk_run_address_space for every single iocb. Nathan Roberts pointed this out, and suggested that by deferring this call until all I/Os in the iocb array are submitted to the block layer, we can realize some impressive performance gains (up to 30% for sequential 4k reads in batches of 16). Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28block: get rid of the WRITE_ODIRECT flagJeff Moyer
Hi, The WRITE_ODIRECT flag is only used in one place, and that code path happens to also call blk_run_address_space. The introduction of this flag, then, could result in the device being unplugged twice for every I/O. Further, with the batching changes in the next patch, we don't want an O_DIRECT write to imply a queue unplug. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-27nfsd: Fix sort_pacl in fs/nfsd/nf4acl.c to actually sort groupsFrank Filz
We have been doing some extensive testing of Linux support for ACLs on NFDS v4. We have noticed that the server rejects ACLs where the groups are out of order, for example, the following ACL is rejected: A::OWNER@:rwaxtTcCy A::user101@domain:rwaxtcy A::GROUP@:rwaxtcy A:g:group102@domain:rwaxtcy A:g:group101@domain:rwaxtcy A::EVERYONE@:rwaxtcy Examining the server code, I found that after converting an NFS v4 ACL to POSIX, sort_pacl is called to sort the user ACEs and group ACEs. Unfortunately, a minor bug causes the group sort to be skipped. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27nfsd4.1: common slot allocation size calculationJ. Bruce Fields
We do the same calculation in a couple places; use a helper function, and add a little documentation, in the hopes of preventing bugs like that fixed in the last patch. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27nfsd4.1: fix session memory use calculationJ. Bruce Fields
Unbalanced calculations on creation and destruction of sessions could cause our estimate of cache memory used to become negative, sometimes resulting in spurious SERVERFAULT returns to client CREATE_SESSION requests. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27nfs: new subdir Documentation/filesystems/nfsJ. Bruce Fields
We're adding enough nfs documentation that it may as well have its own subdirectory. Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27Merge commit 'v2.6.32-rc5' into for-2.6.33J. Bruce Fields
2009-10-27powerpc: Limit hugetlbfs support to PPC64 Book-3S machinesKumar Gala
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-27sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations.Paul Mundt
The hugetlb dependencies presently depend on SUPERH && MMU while the hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This unfortunately allows SH-3 + MMU configurations to enable hugetlbfs without a corresponding HPAGE_SHIFT definition, resulting in the build blowing up. As SH-3 doesn't support variable page sizes, we tighten up the dependenies a bit to prevent hugetlbfs from being enabled. These days we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using that rather than adding to the list of corner cases in fs/Kconfig. Reported-by: Kristoffer Ericson <kristoffer.ericson@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-10-26block: use after free bug in __blkdev_getNeil Brown
commit 0762b8bde9729f10f8e6249809660ff2ec3ad735 (from 14 months ago) introduced a use-after-free bug which has just recently started manifesting in my md testing. I tried git bisect to find out what caused the bug to start manifesting, and it could have been the recent change to blk_unregister_queue (48c0d4d4c04) but the results were inconclusive. This patch certainly fixes my symptoms and looks correct as the two calls are now in the same order as elsewhere in that function. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-26NFSv4: The link() operation should return any delegation on the fileTrond Myklebust
Otherwise, we have to wait for the server to recall it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-26NFSv4: Fix two unbalanced put_rpccred() issues.Trond Myklebust
Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence operation) introduce a couple of put_rpccred() calls on credentials for which there is no corresponding get_rpccred(). See http://bugzilla.kernel.org/show_bug.cgi?id=14249 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-25sched, cpuacct: Fix niced guest time accountingRyota Ozaki
CPU time of a guest is always accounted in 'user' time without concern for the nice value of its counterpart process although the guest is scheduled under the nice value. This patch fixes the defect and accounts cpu time of a niced guest in 'nice' time as same as a niced process. And also the patch adds 'guest_nice' to cpuacct. The value provides niced guest cpu time which is like 'nice' to 'user'. The original discussions can be found here: http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com> Acked-by: Avi Kivity <avi@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1256314810-7897-1-git-send-email-ozaki.ryota@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-25Merge branch 'linus' into sched/coreIngo Molnar
Conflicts: fs/proc/array.c Merge reason: resolve conflict and queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-25LSM: imbed ima calls in the security hooksMimi Zohar
Based on discussions on LKML and LSM, where there are consecutive security_ and ima_ calls in the vfs layer, move the ima_ calls to the existing security_ hooks. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-10-23NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCETrond Myklebust
RFC 3530 states that when we recieve the error NFS4ERR_RESOURCE, we are not supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc operations. The problem is that we map that error into EREMOTEIO in the XDR layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(), and nfs_increment_seqid() don't recognise it. The fix is to defer the mapping until after the middle layers have processed the error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-23nfs: Panic when commit failsTerry Loftin
Actually pass the NFS_FILE_SYNC option to the server to avoid a Panic in nfs_direct_write_complete() when a commit fails. At the end of an nfs write, if the nfs commit fails, all the writes will be rescheduled. They are supposed to be rescheduled as NFS_FILE_SYNC writes, but the rpc_task structure is not completely intialized and so the option is not passed. When the rescheduled writes complete, the return indicates that they are NFS_UNSTABLE and we try to do another commit. This leads to a Panic because the commit data structure pointer was set to null in the initial (failed) commit attempt. Signed-off-by: Terry Loftin <terry.loftin@hp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-22Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds
* 'for-linus' of git://git.infradead.org/users/eparis/notify: dnotify: ignore FS_EVENT_ON_CHILD inotify: fix coalesce duplicate events into a single event in special case inotify: deprecate the inotify kernel interface fsnotify: do not set group for a mark before it is on the i_list
2009-10-22nfs: Fix nfs_parse_mount_options() kfree() leakYinghai Lu
Fix a (small) memory leak in one of the error paths of the NFS mount options parsing code. Regression introduced in 2.6.30 by commit a67d18f (NFS: load the rpc/rdma transport module automatically). Reported-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22fs: pipe.c null pointer dereferenceEarl Chew
This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl | grep 'sleep 1' | grep -v grep | { read PID REST ; echo $PID; } ) OUT="${OUT%% *}" DELAY=$((RANDOM * 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode *inode, struct file *filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-20dnotify: ignore FS_EVENT_ON_CHILDAndreas Gruenbacher
Mask off FS_EVENT_ON_CHILD in dnotify_handle_event(). Otherwise, when there is more than one watch on a directory and dnotify_should_send_event() succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause spurious events. This case was overlooked in commit e42e2773. #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <signal.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> static void create_event(int s, siginfo_t* si, void* p) { printf("create\n"); } static void delete_event(int s, siginfo_t* si, void* p) { printf("delete\n"); } int main (void) { struct sigaction action; char *tmpdir, *file; int fd1, fd2; sigemptyset (&action.sa_mask); action.sa_flags = SA_SIGINFO; action.sa_sigaction = create_event; sigaction (SIGRTMIN + 0, &action, NULL); action.sa_sigaction = delete_event; sigaction (SIGRTMIN + 1, &action, NULL); # define TMPDIR "/tmp/test.XXXXXX" tmpdir = malloc(strlen(TMPDIR) + 1); strcpy(tmpdir, TMPDIR); mkdtemp(tmpdir); # define TMPFILE "/file" file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1); sprintf(file, "%s/%s", tmpdir, TMPFILE); fd1 = open (tmpdir, O_RDONLY); fcntl(fd1, F_SETSIG, SIGRTMIN); fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE); fd2 = open (tmpdir, O_RDONLY); fcntl(fd2, F_SETSIG, SIGRTMIN + 1); fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE); if (fork()) { /* This triggers a create event */ creat(file, 0600); /* This triggers a create and delete event (!) */ unlink(file); } else { sleep(1); rmdir(tmpdir); } return 0; } Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-18inet: rename some inet_sock fieldsEric Dumazet
In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-18inotify: fix coalesce duplicate events into a single event in special caseWei Yongjun
If we do rename a dir entry, like this: rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2") rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ") The duplicate events should be coalesced into a single event. But those two events do not be coalesced into a single event, due to some bad check in event_compare(). It can not match the two NULL inodes as the same event. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-18fsnotify: do not set group for a mark before it is on the i_listEric Paris
fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to set the group and inode for the mark. fsnotify_destroy_mark_by_entry uses the fact that ->group != NULL to know if this group should be destroyed or if it's already been done. But fsnotify_add_mark sets the group and inode before it actually adds the mark to the i_list and g_list. This can result in a race in inotify, it requires 3 threads. sys_inotify_add_watch("file") sys_inotify_add_watch("file") sys_inotify_rm_watch([a]) inotify_update_watch() inotify_new_watch() inotify_add_to_idr() ^--- returns wd = [a] inotfiy_update_watch() inotify_new_watch() inotify_add_to_idr() fsnotify_add_mark() ^--- returns wd = [b] returns to userspace; inotify_idr_find([a]) ^--- gives us the pointer from task 1 fsnotify_add_mark() ^--- this is going to set the mark->group and mark->inode fields, but will return -EEXIST because of the race with [b]. fsnotify_destroy_mark() ^--- since ->group != NULL we call back into inotify_freeing_mark() which calls inotify_remove_from_idr([a]) since fsnotify_add_mark() failed we call: inotify_remove_from_idr([a]) <------WHOOPS it's not in the idr, this could have been any entry added later! The fix is to make sure we don't set mark->group until we are sure the mark is on the inode and fsnotify_add_mark will return success. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fix socket fd translation dlm: fix lowcomms_connect_node for sctp
2009-10-15Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: always pin metadata in discard mode Btrfs: enable discard support Btrfs: add -o discard option Btrfs: properly wait log writers during log sync Btrfs: fix possible ENOSPC problems with truncate Btrfs: fix btrfs acl #ifdef checks Btrfs: streamline tree-log btree block writeout Btrfs: avoid tree log commit when there are no changes Btrfs: only write one super copy during fsync
2009-10-14sysfs: Allow sysfs_notify_dirent to be called from interrupt context.Neil Brown
sysfs_notify_dirent is a simple atomic operation that can be used to alert user-space that new data can be read from a sysfs attribute. Unfortunately it cannot currently be called from non-process context because of its use of spin_lock which is sometimes taken with interrupts enabled. So change all lockers of sysfs_open_dirent_lock to disable interrupts, thus making sysfs_notify_dirent safe to be called from non-process context (as drivers/md does in md_safemode_timeout). sysfs_get_open_dirent is (documented as being) only called from process context, so it uses spin_lock_irq. Other places use spin_lock_irqsave. The usage for sysfs_notify_dirent in md_safemode_timeout was introduced in 2.6.28, so this patch is suitable for that and more recent kernels. Reported-by: Joel Andres Granados <jgranado@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>