path: root/fs
Age | Commit message | Author
2009-01-05 | GFS2: Add more detail to debugfs glock dumps | Steven Whitehouse
Although the glock dumps print quite a lot of information about the glocks themselves, there are more things which can be usefully added to the dump relating to the objects themselves. This patch adds a few more fields to the inode and resource group lines, which should be useful for debugging. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Banish struct gfs2_rgrpd_host | Steven Whitehouse
This patch moves the final field so that we can get rid of struct gfs2_rgrpd_host, as promised some time ago. Also by rearranging the fields slightly, we are able to reduce the size of the gfs2_rgrpd structure at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
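The size reduction from rearranging fields comes from alignment padding. The following is a minimal sketch using two hypothetical structs (the names and fields are illustrative, not the real `struct gfs2_rgrpd` layout), assuming a typical ABI on which `uint64_t` is 8-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct gfs2_rgrpd: a 32-bit field on
 * either side of a 64-bit field leaves padding on ABIs that align
 * uint64_t to 8 bytes. */
struct rgrp_before {
	uint32_t free;         /* typically followed by 4 bytes of padding */
	uint64_t igeneration;
	uint32_t dinodes;      /* typically followed by 4 bytes of tail padding */
};

/* Same fields, largest first: the two 32-bit fields share one 8-byte slot. */
struct rgrp_after {
	uint64_t igeneration;
	uint32_t free;
	uint32_t dinodes;
};

/* Bytes saved by the reordering on the current ABI (may be 0). */
size_t rgrp_bytes_saved(void)
{
	assert(sizeof(struct rgrp_after) <= sizeof(struct rgrp_before));
	return sizeof(struct rgrp_before) - sizeof(struct rgrp_after);
}
```

On ABIs that align 64-bit integers to 4 bytes the two layouts can be the same size, so the saving depends on the target.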
2009-01-05 | GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd | Steven Whitehouse
The second of three fields which need to move, in order to remove the struct gfs2_rgrpd_host. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move rg_igeneration into struct gfs2_rgrpd | Steven Whitehouse
This moves one of the fields of struct gfs2_rgrpd_host into the struct gfs2_rgrpd with the eventual aim of removing the struct gfs2_rgrpd_host completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Banish struct gfs2_dinode_host | Steven Whitehouse
The final field in gfs2_dinode_host was the i_flags field. That's renamed to i_diskflags in order to avoid confusion with the existing inode flags, and moved into the inode proper at a suitable location to avoid creating a "hole". At that point struct gfs2_dinode_host is no longer needed and as promised (quite some time ago!) it can now be removed completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize | Steven Whitehouse
This patch moves the i_size field from the gfs2_dinode_host and, following the ext3 convention, renames it i_disksize. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move di_eattr into "proper" inode | Steven Whitehouse
This moves the di_eattr field out of gfs2_dinode_host and into the inode proper. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move "entries" into "proper" inode | Steven Whitehouse
This moves the directory entry count into the proper inode. Potentially we could get this to share the space used by something else in the future, but this is one more step on the way to removing the gfs2_dinode_host structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move generation number into "proper" part of inode | Steven Whitehouse
This moves the generation number from the gfs2_dinode_host into the gfs2_inode structure. Eventually the plan is to get rid of the gfs2_dinode_host structure completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: sparse annotation of gl->gl_spin | Harvey Harrison
fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
fs/gfs2/glock.c:308:5: context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
fs/gfs2/glock.c:529:2: context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
fs/gfs2/glock.c:925:3: context '*gl+28': wanted >= 1, got 0
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
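Warnings of this kind are usually silenced by telling sparse that a function deliberately releases and re-acquires a lock it was entered with. Below is a userspace sketch of that annotation pattern. The `toy_lock` type, `glock_model` struct and `do_xmote_model()` are hypothetical stand-ins, not the GFS2 code; only the `__acquires`/`__releases` macro shapes follow the kernel's sparse conventions.

```c
#include <assert.h>

/* Under sparse (__CHECKER__) these expand to context attributes; for a
 * normal compiler they expand to nothing. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

struct toy_lock { int held; };   /* userspace stand-in for a spinlock */

void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct glock_model {
	struct toy_lock gl_spin;
	int xmote_calls;
};

/* Entered with gl_spin held; drops it around the slow work and retakes it
 * before returning. The annotations document that imbalance for sparse. */
void do_xmote_model(struct glock_model *gl)
	__releases(&gl->gl_spin)
	__acquires(&gl->gl_spin)
{
	toy_lock_release(&gl->gl_spin);
	gl->xmote_calls++;           /* blocking work would go here, unlocked */
	toy_lock_acquire(&gl->gl_spin);
}
```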
2009-01-05 | GFS2: Fix up jdata writepage/delete_inode | Steven Whitehouse
There is a bug in writepage and delete_inode which allows jdata files to invalidate pages from the address space without being in a transaction at the time. This causes problems in case the pages are in the journal. This patch fixes that case and prevents the resulting oops. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Rationalise header files | Steven Whitehouse
Move the contents of some headers which contained very little into more sensible places, and remove the original header files. This should make it easier to find things. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Support for FIEMAP ioctl | Steven Whitehouse
This patch implements the FIEMAP ioctl for GFS2. We can use the generic code (aside from a lock order issue, solved as per Ted Tso's suggestion) for which I've introduced a new variant of the generic function. We also have one exception to deal with, namely stuffed files, so we do that "by hand", setting all the required flags. This has been tested with a modified version (I could only find an old one) of Eric's test program, and appears to work correctly. This patch does not currently support FIEMAP of xattrs, but the plan is to add that feature at some future point. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com>
2009-01-04 | jbd2: Submit writes to the journal using WRITE_SYNC | Theodore Ts'o
Since we will be waiting for the write of the commit record to the journal to complete in journal_submit_commit_record(), submit it using WRITE_SYNC. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-04 | Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current | Linus Torvalds
* 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: validate comparison operations, store them in sane form
  clean up audit_rule_{add,del} a bit
  make sure that filterkey of task,always rules is reported
  audit rules ordering, part 2
  fixing audit rule ordering mess, part 1
  audit_update_lsm_rules() misses the audit_inode_hash[] ones
  sanitize audit_log_capset()
  sanitize audit_fd_pair()
  sanitize audit_mq_open()
  sanitize AUDIT_MQ_SENDRECV
  sanitize audit_mq_notify()
  sanitize audit_mq_getsetattr()
  sanitize audit_ipc_set_perm()
  sanitize audit_ipc_obj()
  sanitize audit_socketcall
  don't reallocate buffer in every audit_sockaddr()
2009-01-04 | fs: symlink write_begin allocation context fix | Nick Piggin
With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in its write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
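The idea of turning "may this allocation re-enter the filesystem?" into a single flag can be sketched as follows. This is a userspace model only: the `MODEL_*` constants carry made-up bit values and `write_begin_gfp()` is a hypothetical helper, not the kernel's `grab_cache_page_write_begin()`.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define MODEL_GFP_IO        0x1u
#define MODEL_GFP_FS        0x2u   /* allocation may re-enter the filesystem */
#define MODEL_GFP_KERNEL    (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_AOP_FLAG_NOFS 0x4u   /* caller must not recurse into the fs */

/* Derive the page-cache allocation mask from the write_begin flags:
 * the only thing the caller can influence is whether the FS bit survives. */
unsigned int write_begin_gfp(unsigned int mapping_gfp, unsigned int aop_flags)
{
	unsigned int gfp_notmask = 0;

	if (aop_flags & MODEL_AOP_FLAG_NOFS)
		gfp_notmask = MODEL_GFP_FS;
	return mapping_gfp & ~gfp_notmask;
}
```

A caller such as a symlink helper would pass the NOFS flag, while an ordinary write path would pass no flags and keep the full mask.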
2009-01-04 | fs: introduce bgl_lock_ptr() | Pekka Enberg
As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in <linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to filesystem specific header files to break the hidden dependency to struct ext[234]_sb_info. Also, while at it, convert the macros to static inlines to try to make up for all the times I broke Andrew Morton's tree. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
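The split between a generic helper and a per-filesystem wrapper might look roughly like the sketch below. It is a userspace model under stated assumptions: `toy_lock`, the `*_model` structs and the fixed `NR_BG_LOCKS` value are hypothetical, and the real helpers return a spinlock pointer.

```c
#include <assert.h>

#define NR_BG_LOCKS 4   /* assumed power of two; chosen small for the sketch */

struct toy_lock { int held; };   /* userspace stand-in for a spinlock */

struct blockgroup_lock_model {
	struct toy_lock locks[NR_BG_LOCKS];
};

/* Generic helper: hash a block group number onto one of the locks.
 * It knows nothing about any filesystem's superblock structure. */
struct toy_lock *bgl_lock_ptr(struct blockgroup_lock_model *bgl,
			      unsigned int block_group)
{
	return &bgl->locks[block_group & (NR_BG_LOCKS - 1)];
}

/* Filesystem-private superblock info (hypothetical). */
struct fs_sb_info_model {
	struct blockgroup_lock_model s_blockgroup_lock;
};

/* Filesystem-specific wrapper, defined next to the private struct so the
 * generic header no longer depends on it. */
struct toy_lock *sb_bgl_lock(struct fs_sb_info_model *sbi,
			     unsigned int block_group)
{
	return bgl_lock_ptr(&sbi->s_blockgroup_lock, block_group);
}
```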
2009-01-04 | sanitize audit_fd_pair() | Al Viro
* no allocations
* return void
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-03 | jbd2: Add pid and journal device name to the "kjournald2 starting" message | Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-03 | ext4: Add markers for better debuggability | Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06 | ext4: Remove code to create the journal inode | Theodore Ts'o
This code has been obsolete for quite some time, since the supported method for adding a journal inode is to use tune2fs (or to create a new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-05 | ext4: provide function to release metadata pages under memory pressure | Toshiyuki Okajima
Pages in the page cache belonging to ext4 data files are released via the ext4_releasepage() function specified in the ext4 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they cannot be released by try_to_free_buffers() if they have a journal head attached to them. To address this, we supply a release_metadata function which calls the jbd2_journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
2009-01-05 | ext3: provide function to release metadata pages under memory pressure | Toshiyuki Okajima
Pages in the page cache belonging to ext3 data files are released via the ext3_releasepage() function specified in the ext3 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they cannot be released by try_to_free_buffers() if they have a journal head attached to them. To address this, we supply a try_to_free_pages() function which calls the journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
2009-01-03 | Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
  x86: setup_per_cpu_areas() cleanup
  cpumask: fix compile error when CONFIG_NR_CPUS is not defined
  cpumask: use alloc_cpumask_var_node where appropriate
  cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
  x86: use cpumask_var_t in acpi/boot.c
  x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
  sched: put back some stack hog changes that were undone in kernel/sched.c
  x86: enable cpus display of kernel_max and offlined cpus
  ia64: cpumask fix for is_affinity_mask_valid()
  cpumask: convert RCU implementations, fix
  xtensa: define __fls
  mn10300: define __fls
  m32r: define __fls
  h8300: define __fls
  frv: define __fls
  cris: define __fls
  cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
  cpumask: zero extra bits in alloc_cpumask_var_node
  cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
  cpumask: convert mm/
  ...
2009-01-03 | get rid of special-casing the /sbin/loader on alpha | Al Viro
... just make it a binfmt handler like #! one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-03 | sanitize ifdefs in binfmt_aout | Al Viro
They are actually alpha vs. i386/arm/m68k i.e. ecoff vs. aout. In the only place where we actually tried to handle arm and i386/m68k in different ways (START_DATA() in coredump handling), the arm variant works for all of them (i386 and m68k have u.start_code set to 0). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-03 | remove the rudiment of a.out for sparc | Al Viro
it's been used only in sunos compat Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-03 | add releasepage hooks to block devices which can be used by file systems | Theodore Ts'o
Implement blkdev_releasepage() to release the buffer_heads and pages after we release private data belonging to a mounted filesystem. Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-05 | ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc | Aneesh Kumar K.V
With the nodelalloc option we need to update the dirty block counter on block allocation failure. This is needed because we increment the dirty block counter early in the block allocation phase. Without the patch s_dirty_blocks_counter goes wrong so that the filesystem's free block count decreases incorrectly. Tested-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2009-01-05 | ext4: Init the complete page while building buddy cache | Aneesh Kumar K.V
We need to init the complete page during buddy cache init by setting the contents to '1'. Otherwise we can see the following errors after doing an online resize of the filesystem: EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used: Allocating block 1040385 in system zone of 127 group Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2009-01-05 | ext4: Don't allow new groups to be added during block allocation | Aneesh Kumar K.V
After we mark the blocks in the buddy cache as allocated, we need to ensure that we don't reinit the buddy cache until the block bitmap is updated. This commit achieves this by holding the group_info alloc_semaphore till ext4_mb_release_context. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2009-01-05 | ext4: mark the blocks/inode bitmap beyond end of group as used | Aneesh Kumar K.V
We need to mark the block/inode bitmap beyond the end of the group with '1'. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
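Marking the tail of a bitmap with '1' means setting every bit from the last valid entry up to the end of the bitmap block, so the allocator never hands out something past the end of the group. A minimal sketch, assuming a byte-addressed bitmap with bit `i` stored in byte `i / 8`; `mark_bitmap_end_model()` is a hypothetical name, not the ext4 function:

```c
#include <assert.h>
#include <stdint.h>

/* Set every bit in [start_bit, end_bit). start_bit is the number of valid
 * entries in the group, end_bit the number of bits in the bitmap block. */
void mark_bitmap_end_model(unsigned int start_bit, unsigned int end_bit,
			   uint8_t *bitmap)
{
	assert(start_bit <= end_bit);
	for (unsigned int i = start_bit; i < end_bit; i++)
		bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

A production version would typically fill whole bytes at once after handling the first partial byte; the bit-by-bit loop is kept here for clarity.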
2009-01-05 | ext4: Use new buffer_head flag to check uninit group bitmaps initialization | Aneesh Kumar K.V
For an uninit block group, the on-disk bitmap is not initialized. That implies we cannot depend on the uptodate flag on the bitmap buffer_head to find bitmap validity. Use a new buffer_head flag which would be set after we properly initialize the bitmap. This also prevents (re-)initializing the uninit group bitmap every time we call ext4_read_block_bitmap(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2009-01-05 | ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() | Aneesh Kumar K.V
We need to make sure we update the inode bitmap and clear the EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide whether to initialize the inode bitmap each time it is called. (introduced by commit c806e68f.)
ext4_read_inode_bitmap does:
    spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
    if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
            ext4_init_inode_bitmap(sb, bh, block_group, desc);
and ext4_new_inode does:
    if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group), ino, inode_bitmap_bh->b_data))
            ......
    ...
    spin_lock(sb_bgl_lock(sbi, group));
    gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap, then we take the sb_bgl_lock and clear the EXT4_BG_INODE_UNINIT flag. What can happen is that a parallel ext4_read_inode_bitmap can zero out the bitmap in between the above ext4_set_bit_atomic and spin_lock(sb_bgl_lock..).
The race results in the user-visible errors below:
    EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
    EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
    EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
    # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
    ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
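The ordering the message asks for (set the bit and clear the UNINIT flag inside one critical section) can be modelled in userspace as below. Everything here is a hypothetical stand-in: `toy_lock` replaces the spinlock returned by `sb_bgl_lock()`, `group_model` replaces the group descriptor plus bitmap buffer, and the function names are invented for the sketch rather than taken from the actual patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BG_INODE_UNINIT 0x1u   /* stand-in for EXT4_BG_INODE_UNINIT */

struct toy_lock { int held; };
static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct group_model {
	struct toy_lock bgl;        /* stand-in for the sb_bgl_lock() spinlock */
	unsigned int bg_flags;
	uint8_t inode_bitmap[8];
};

/* Reader: (re)initialise the bitmap only while the group is still UNINIT. */
void read_inode_bitmap_model(struct group_model *g)
{
	toy_lock_acquire(&g->bgl);
	if (g->bg_flags & BG_INODE_UNINIT)
		memset(g->inode_bitmap, 0, sizeof(g->inode_bitmap));
	toy_lock_release(&g->bgl);
}

/* Allocator, fixed ordering: the bit is set and UNINIT is cleared under the
 * same lock hold, so a reader can never observe "bit set, still UNINIT". */
void new_inode_model(struct group_model *g, unsigned int ino)
{
	assert(ino < 8 * sizeof(g->inode_bitmap));
	toy_lock_acquire(&g->bgl);
	g->inode_bitmap[ino / 8] |= (uint8_t)(1u << (ino % 8));
	g->bg_flags &= ~BG_INODE_UNINIT;
	toy_lock_release(&g->bgl);
}
```

With the buggy ordering, the reader could run between the bit update and the flag update and wipe the freshly set bit, which matches the "bit already cleared" errors quoted in the message.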
2009-01-03 | ext4: code cleanup | Aneesh Kumar K.V
Rename some variables. As part of the cleanup, we also release locks in the reverse order in which we acquired them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-05 | ext4: Use high 16 bits of the block group descriptor's free counts fields | Aneesh Kumar K.V
Rename the lower bits with suffix _lo and add helpers to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
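Accessor helpers of this kind combine a low and a high 16-bit half into one 32-bit count. The sketch below is a simplified model: the struct, the function names and the `has_hi` parameter are assumptions made for illustration, values are kept in host byte order, and the on-disk little-endian conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor fragment; the _hi half is assumed to exist only
 * when the filesystem uses the larger descriptor format. */
struct group_desc_model {
	uint16_t bg_free_blocks_count_lo;
	uint16_t bg_free_blocks_count_hi;
};

/* Read the full count, ignoring the high half when it is not present. */
uint32_t free_blks_count_model(const struct group_desc_model *gd, int has_hi)
{
	uint32_t hi = has_hi ? gd->bg_free_blocks_count_hi : 0;

	return gd->bg_free_blocks_count_lo | (hi << 16);
}

/* Store the count, splitting it across the two halves. */
void free_blks_set_model(struct group_desc_model *gd, uint32_t count, int has_hi)
{
	assert(has_hi || count <= UINT16_MAX);
	gd->bg_free_blocks_count_lo = (uint16_t)count;
	if (has_hi)
		gd->bg_free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through helpers like these is what lets the field rename happen without callers caring how many bits are stored.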
2009-01-05 | ext4: Fix race between read_block_bitmap() and mark_diskspace_used() | Aneesh Kumar K.V
We need to make sure we update the block bitmap and clear the EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide whether to initialize the block bitmap each time it is called (introduced by commit c806e68f), and this can race with block allocations in ext4_mb_mark_diskspace_used().
ext4_read_block_bitmap does:
    spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
    if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
            ext4_init_block_bitmap(sb, bh, block_group, desc);
Now on the block allocation side we do:
    mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data, ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
    ....
    spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
    if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
            gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
i.e., on allocation we update the bitmap, then we take the sb_bgl_lock and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is that a parallel ext4_read_block_bitmap can zero out the bitmap in between the above mb_set_bits and spin_lock(sb_bgl_lock..).
The race results in the user-visible errors below:
    EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
    EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2009-01-05 | ext4: fix BUG when calling ext4_error with locked block group | Aneesh Kumar K.V
The mballoc code likes to call ext4_error while it is holding locked block groups. This can cause a scheduling in atomic context BUG. We can't just unlock the block group and relock it after/if ext4_error returns since that might result in race conditions in the case where the filesystem is set to continue after finding errors. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-02 | Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 | Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (33 commits)
  UBIFS: add more useful debugging prints
  UBIFS: print debugging messages properly
  UBIFS: fix numerous spelling mistakes
  UBIFS: allow mounting when short of space
  UBIFS: fix writing uncompressed files
  UBIFS: fix checkpatch.pl warnings
  UBIFS: fix sparse warnings
  UBIFS: simplify make_free_space
  UBIFS: do not lie about used blocks
  UBIFS: restore budg_uncommitted_idx
  UBIFS: always commit on unmount
  UBIFS: use ubi_sync
  UBIFS: always commit in sync_fs
  UBIFS: fix file-system synchronization
  UBIFS: fix constants initialization
  UBIFS: avoid unnecessary calculations
  UBIFS: re-calculate min_idx_size after the commit
  UBIFS: use nicer 64-bit math
  UBIFS: fix available blocks count
  UBIFS: various comment improvements and fixes
  ...
2009-01-02 | Merge branch 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm | Linus Torvalds
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
  KVM: MMU: handle large host sptes on invlpg/resync
  KVM: Add locking to virtual i8259 interrupt controller
  KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
  MAINTAINERS: Maintainership changes for kvm/ia64
  KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
  KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
  KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
  KVM: fix handling of ACK from shared guest IRQ
  KVM: MMU: check for present pdptr shadow page in walk_shadow
  KVM: Consolidate userspace memory capability reporting into common code
  KVM: Advertise the bug in memory region destruction as fixed
  KVM: use cpumask_var_t for cpus_hardware_enabled
  KVM: use modern cpumask primitives, no cpumask_t on stack
  KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
  KVM: set owner of cpu and vm file operations
  anon_inodes: use fops->owner for module refcount
  x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
  KVM: MMU: prepopulate the shadow on invlpg
  KVM: MMU: skip global pgtables on sync due to cr3 switch
  KVM: MMU: collapse remote TLB flushes on root sync
  ...
2009-01-02 | CRED: Wrap task credential accesses in the devpts filesystem | David Howells
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
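The point of the wrappers is that call sites stop naming task fields directly, so the storage behind them can later move. A userspace sketch of the pattern follows; all `*_model` names, the credential values and `may_touch_model()` are hypothetical and only mirror the shape of accessors such as `current_fsuid()`.

```c
#include <assert.h>

typedef unsigned int uid_model_t;

struct cred_model { uid_model_t uid, euid, fsuid; };

struct task_model {
	const struct cred_model *cred;   /* credentials split out of the task */
};

static struct cred_model init_cred_model = { 1000, 1000, 1000 };
static struct task_model current_task_model = { &init_cred_model };
#define current_model (&current_task_model)

/* Accessors: the only place that knows where the IDs actually live. */
#define current_uid_model()   (current_model->cred->uid)
#define current_fsuid_model() (current_model->cred->fsuid)

/* A call site written against the accessor rather than the raw field. */
int may_touch_model(uid_model_t owner)
{
	return current_fsuid_model() == owner;
}
```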
2009-01-02 | devpts: fix unused function warning | Andrew Morton
fs/devpts/inode.c:324: warning: 'compare_init_pts_sb' defined but not used Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | devpts: Coding style clean up | Alan Cox
Just nail the oddments now while this code is being touched Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | Enable multiple instances of devpts | Sukadev Bhattiprolu
To support containers, allow multiple instances of devpts filesystem, such that indices of ptys allocated in one instance are independent of ptys allocated in other instances of devpts.
But to preserve backward compatibility, enable this support for multiple instances only if:
- CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and
- '-o newinstance' mount option is specified while mounting devpts
To use multi-instance mount, a container startup script could run:
    $ ns_exec -cm /bin/bash
    $ umount /dev/pts
    $ mount -t devpts -o newinstance lxcpts /dev/pts
    $ mount -o bind /dev/pts/ptmx /dev/ptmx
    $ /usr/sbin/sshd -p 1234
where 'ns_exec -cm /bin/bash' calls clone() with CLONE_NEWNS flag and execs /bin/bash in the child process. A pty created by the sshd is not visible in the original mount of /dev/pts.
USER-SPACE-IMPACT:
- See Documentation/fs/devpts.txt (included in next patch) for user-space impact in multi-instance and mixed-mode operation.
TODO:
- Update mount(8), pts(4) man pages. Highlight impact of not redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount.
Changelog[v6]:
- [Dave Hansen] Use new get_init_pts_sb() interface
- [Serge Hallyn] Don't bother displaying 'newinstance' in show_options
- [Serge Hallyn] Use macros (PARSE_REMOUNT/PARSE_MOUNT) instead of 0/1.
- [Serge Hallyn] Check error return from get_sb_single() (now get_init_pts_sb())
- devpts_pty_kill(): don't dput error dentries
Changelog[v5]:
- Move get_sb_ref() definition to earlier patch
- Move usage info to Documentation/filesystems/devpts.txt (next patch)
- Make ptmx node even in init_pts_ns, now that default mode is 0000 (defined in earlier patch, enabled here).
- Cache ptmx dentry and use to update mode during remount (defined in earlier patch, enabled here).
- Bugfix: explicitly ignore newinstance on remount (if newinstance was specified on remount of initial mount, it would be ignored but /proc/mounts would imply that the option was set)
Changelog[v4]:
- Update patch description to address H. Peter Anvin's comments
- Consolidate multi-instance mode code under new config token, CONFIG_DEVPTS_MULTIPLE_INSTANCES.
- Move usage-details from patch description to Documentation/fs/devpts.txt
Changelog[v3]:
- Rename new mount option to 'newinstance'
- Create ptmx nodes only in 'newinstance' mounts
- Bugfix: parse_mount_options() modifies @data but since we need to parse the @data twice (once in devpts_get_sb() and once during do_remount_sb()), parse a local copy of @data in devpts_get_sb(). (restructured code in devpts_get_sb() to fix this)
Changelog[v2]:
- Support both single-mount and multiple-mount semantics and provide '-onewmnt' option to select the semantics.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | Define get_init_pts_sb() | Sukadev Bhattiprolu
See comments in the function header for details. The new interface will be used in a follow-on patch. Changelog [v2]: [Dave Hansen] Replace get_sb_ref() in fs/super.c with get_init_pts_sb() and make the new interface private to devpts Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | Define mknod_ptmx() | Sukadev Bhattiprolu
/dev/ptmx is closely tied to the devpts filesystem. An open of /dev/ptmx allocates the next pty index, and the associated device shows up in the devpts fs as /dev/pts/n.
With multiple instances of devpts filesystem, during an open of /dev/ptmx we would be unable to determine which instance of the devpts is being accessed. So we move the 'ptmx' node into /dev/pts and use the inode of the 'ptmx' node to identify the superblock and hence the devpts instance.
This patch adds ability for the kernel to internally create the [ptmx, c, 5:2] device when mounting devpts filesystem. Since the ptmx node in devpts is new and may surprise some userspace scripts, the default permissions for the new node are 0000. These permissions can be changed either using chmod or by remounting with the new '-o ptmxmode=0666' mount option.
Changelog[v5]:
- [Serge Hallyn bugfix]: Letting new_inode() assign inode number to ptmx can collide with hand-assigning inode numbers to ptys. So, hand-assign specific inode number to ptmx node also.
- [Serge Hallyn]: Maybe safer to grab root dentry mutex while creating ptmx node
- [Bugfix with Serge Hallyn] Replace lookup_one_len() in mknod_ptmx() with d_alloc_name() (lookup during ->get_sb() locks up system). To simplify patchset, fold the ptmx_dentry patch into this.
Changelog[v4]:
- Change default permissions of pts/ptmx node to 0000.
- Move code for ptmxmode under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES.
Changelog[v3]:
- Rename ptmx_mode to ptmxmode (for consistency with 'newinstance')
Changelog[v2]:
- [H. Peter Anvin] Remove mknod() system call support and create the ptmx node internally.
Changelog[v1]:
- Earlier version of this patch enabled creating /dev/pts/tty as well. As pointed out by Al Viro and H. Peter Anvin, that is not really necessary.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | Extract option parsing to new function | Sukadev Bhattiprolu
Move code to parse mount options into a separate function so it can (later) be shared between mount and remount operations. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
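A parser that both mount and remount can call might be shaped like the sketch below. It is a userspace approximation, not the devpts code: the struct and function names are hypothetical, only three options are handled, the default pty mode of 0600 is an assumption, and standard `strtok()`/`strtoul()` are used in place of the kernel's option-matching helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pts_mount_opts_model {
	unsigned int mode;       /* mode for pty nodes */
	unsigned int ptmxmode;   /* mode for the ptmx node */
	int newinstance;
};

/* Parse a comma-separated option string in place. The string is modified
 * by strtok(), so a caller that needs to parse it twice must pass a copy.
 * Returns 0 on success, -1 on an unknown option. */
int parse_mount_options_model(char *data, struct pts_mount_opts_model *opts)
{
	assert(data != NULL && opts != NULL);

	opts->mode = 0600;       /* assumed default for the sketch */
	opts->ptmxmode = 0000;
	opts->newinstance = 0;

	for (char *p = strtok(data, ","); p; p = strtok(NULL, ",")) {
		if (strncmp(p, "mode=", 5) == 0)
			opts->mode = (unsigned int)strtoul(p + 5, NULL, 8) & 07777;
		else if (strncmp(p, "ptmxmode=", 9) == 0)
			opts->ptmxmode = (unsigned int)strtoul(p + 9, NULL, 8) & 07777;
		else if (strcmp(p, "newinstance") == 0)
			opts->newinstance = 1;
		else
			return -1;   /* unknown option */
	}
	return 0;
}
```

Keeping the parsed result in a plain struct is what makes it easy to reuse: mount stores it once, and remount can parse into the same structure again.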
2009-01-02 | Per-mount 'config' object | Sukadev Bhattiprolu
With support for multiple mounts of devpts, the 'config' structure really represents per-mount options rather than config parameters. Rename 'config' structure to 'pts_mount_opts' and store it in the super-block. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 | Per-mount allocated_ptys | Sukadev Bhattiprolu
To enable multiple mounts of devpts, 'allocated_ptys' must be a per-mount variable rather than a global variable. Move 'allocated_ptys' into the super_block's s_fs_info. Changelog[v2]: Define and use DEVPTS_SB() wrapper. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
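Moving the allocation state from a global into per-superblock private data can be sketched as follows. This is a userspace model: the `*_model` types and `new_index_model()` are hypothetical, and a plain in-use array is swapped in for whatever ID allocator the kernel code uses. Only the `DEVPTS_SB()` wrapper name comes from the changelog.

```c
#include <assert.h>

#define MAX_PTYS_MODEL 8

struct pts_fs_info_model {
	unsigned char allocated_ptys[MAX_PTYS_MODEL];   /* 1 = index in use */
};

struct super_block_model {
	void *s_fs_info;   /* filesystem-private data, one per mount */
};

/* Wrapper that hides the cast from the generic s_fs_info pointer. */
struct pts_fs_info_model *DEVPTS_SB(struct super_block_model *sb)
{
	return sb->s_fs_info;
}

/* Allocate the lowest free pty index in this instance, or -1 if full. */
int new_index_model(struct super_block_model *sb)
{
	struct pts_fs_info_model *fsi = DEVPTS_SB(sb);

	assert(fsi != 0);
	for (int i = 0; i < MAX_PTYS_MODEL; i++) {
		if (!fsi->allocated_ptys[i]) {
			fsi->allocated_ptys[i] = 1;
			return i;
		}
	}
	return -1;
}
```

Because each mount carries its own state, two instances can both hand out index 0, which is the independence between instances that the multi-instance work relies on.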
2009-01-02 | Remove devpts_root global | Sukadev Bhattiprolu
Remove the 'devpts_root' global variable and find the root dentry using the super_block. The super-block can be found from the device inode, using the new wrapper, pts_sb_from_inode(). Changelog: This patch is based on an earlier patchset from Serge Hallyn and Matt Helsley. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>