summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-02-26Btrfs: do not change inode flags in renameLiu Bo
Before we forced to change a file's NOCOW and COMPRESS flag due to the parent directory's, but this ends up a bad idea, because it confuses end users a lot about file's NOCOW status, eg. if someone change a file to NOCOW via 'chattr' and then rename it in the current directory which is without NOCOW attribute, the file will lose the NOCOW flag silently. This diables 'change flags in rename', so from now on we'll only inherit flags from the parent directory on creation stage while in other places we can use 'chattr' to set NOCOW or COMPRESS flags. Reported-by: Marios Titas <redneb8888@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: use reserved space for creating a snapshotLiu Bo
While inserting dir index and updating inode for a snapshot, we'd add delayed items which consume trans->block_rsv, if we don't have any space reserved in this trans handle, we either just return or reserve space again. But before creating pending snapshots during committing transaction, we've done a release on this trans handle, so we don't have space reserved in it at this stage. What we're using is block_rsv of pending snapshots which has already reserved well enough space for both inserting dir index and updating inode, so we need to set trans handle to indicate that we have space now. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26clear chunk_alloc flag on retryable failureAlexandre Oliva
I've experienced filesystem freezes with permanent spikes in the active process count for quite a while, particularly on filesystems whose available raw space has already been fully allocated to chunks. While looking into this, I found a pretty obvious error in do_chunk_alloc: it sets space_info->chunk_alloc, but if btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving that flag set, which causes any other threads waiting for space_info->chunk_alloc to become zero to spin indefinitely. I haven't double-checked that this patch fixes the failure I've observed fully (it's not exactly trivial to trigger), but it surely is a bug and the fix is trivial, so... Please put it in :-) What I saw in that function also happens to explain why in some cases I see filesystems allocate a huge number of chunks that remain unused (leading to the scenario above, of not having more chunks to allocate). It happens for data and metadata, but not necessarily both. I'm guessing some thread sets the force_alloc flag on the corresponding space_info, and then several threads trying to get disk space end up attempting to allocate a new chunk concurrently. All of them will see the force_alloc flag and bump their local copy of force up to the level they see first, and they won't clear it even if another thread succeeds in allocating a chunk, thus clearing the force flag. Then each thread that observed the force flag will, on its turn, force the allocation of a new chunk. And any threads that come in while it does that will see the force flag still set and pick it up, and so on. This sounds like a problem to me, but... what should the correct behavior be? Clear force_flag once we copy it to a local force? Reset force to the incoming value on every loop? Set the flag to our incoming force if we have it at first, clear our local flag, and move it from the space_info when we determined that we are the thread that's going to perform the allocation? btrfs: clear chunk_alloc flag on retryable failure From: Alexandre Oliva <oliva@gnu.org> If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc without clearing chunk_alloc in space_info. As a result, any further calls to do_chunk_alloc on that filesystem will start busy-waiting for chunk_alloc to be cleared, but it never will be. This patch adjusts do_chunk_alloc so that it clears this flag in case of an error. Signed-off-by: Alexandre Oliva <oliva@gnu.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: fix backref walking race with tree deletionsJan Schmidt
When a subvolume is removed, we remove the root item from the root tree, while the tree blocks and backrefs remain for a while. When backref walking comes across one of those orphan tree blocks, it can find a backref for a no longer existing root. This is all good, we only must tolerate __resolve_indirect_ref returning an error and continue with the good refs found. Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: make sure NODATACOW also gets NODATASUM setJosef Bacik
A user reported hitting the BUG_ON() in btrfs_finished_ordered_io() where we had csums on a NOCOW extent. This can happen if we have NODATACOW set but not NODATASUM set, which can happen in two cases, either we mount with -o nodatacow and then write into preallocated space, or chattr +C a directory and move a file into that directory. Liu has fixed the move case in a different place, but this fixes the mount -o nodatacow case. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26sfc: Detach net device when stopping queues for reconfigurationBen Hutchings
We must only ever stop TX queues when they are full or the net device is not 'ready' so far as the net core, and specifically the watchdog, is concerned. Otherwise, the watchdog may fire *immediately* if no packets have been added to the queue in the last 5 seconds. The device is ready if all the following are true: (a) It has a qdisc (b) It is marked present (c) It is running (d) The link is reported up (a) and (c) are normally true, and must not be changed by a driver. (d) is under our control, but fake link changes may disturb userland. This leaves (b). We already mark the device absent during reset and self-test, but we need to do the same during MTU changes and ring reallocation. We don't need to do this when the device is brought down because then (c) is already false. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-02-26sfc: Fix efx_rx_buf_offset() in the presence of swiotlbBen Hutchings
We assume that the mapping between DMA and virtual addresses is done on whole pages, so we can find the page offset of an RX buffer using the lower bits of the DMA address. However, swiotlb maps in units of 2K, breaking this assumption. Add an explicit page_offset field to struct efx_rx_buffer. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-02-26sfc: Properly sync RX DMA buffer when it is not the last in the pageBen Hutchings
We may currently allocate two RX DMA buffers to a page, and only unmap the page when the second is completed. We do not sync the first RX buffer to be completed; this can result in packet loss or corruption if the last RX buffer completed in a NAPI poll is the first in a page and is not DMA-coherent. (In the middle of a NAPI poll, we will handle the following RX completion and unmap the page *before* looking at the content of the first buffer.) Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-02-26i5100_edac: convert to use simple_open()Wei Yongjun
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26ghes_edac: fix to use list_for_each_entry_safe() when delete list itemsWei Yongjun
Since we will remove items off the list using list_del() we need to use a safe version of the list_for_each_entry() macro aptly named list_for_each_entry_safe(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge windowVineet Gupta
Since ARC port was not yet upstream, we missed a bunch of cross-arch Kconfig removals: * GENERIC_SIGALTSTACK: d64008a8f3 "burying unused conditionals" * HAVE_IRQ_WORK: 6147a9d807 "irq_work: Remove CONFIG_HAVE_IRQ_WORK" * ARCH_NO_VIRT_TO_BUS: e0cf2ef484 "arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS" Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-02-26ARC: make a copy of flat DTVineet Gupta
The flat DT (currently embedded in vmlinux) is in .init section. The unflattened/binary tree doesn't copy strings through and references them from orig flat DT - which could cause catestrohpy if of_* APIs are called post init, say from a driver which is a loadable module. Reported-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-02-26saner proc_get_inode() calling conventionsAl Viro
Make it drop the pde in *all* cases when no new reference to it is put into an inode - both when an inode had already been set up (as we were already doing) and when inode allocation has failed. Makes for simpler logics in callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26proc: avoid extra pde_put() in proc_fill_super()Maxim Patlasov
If proc_get_inode() succeeded, but d_make_root() failed, pde_put() for proc_root will be called twice: the first time due to iput() called from d_make_root() and the second time directly in the end of proc_fill_super(). Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs: change return values from -EACCES to -EPERMZhao Hongjiang
According to SUSv3: [EACCES] Permission denied. An attempt was made to access a file in a way forbidden by its file access permissions. [EPERM] Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource. So -EPERM should be returned if capability checks fails. Strictly speaking this is an API change since the error code user sees is altered. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs/exec.c: make bprm_mm_init() staticYuanhan Liu
There is only one user of bprm_mm_init, and it's inside the same file. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2/dlm: use GFP_ATOMIC inside a spin_lockDan Carpenter
My static checker complains that this is called with a spin_lock held in dlm_master_requery_handler() from dlmrecovery.c. Probably the reason we have not received any bug reports about this is that recovery is not a common operation. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2: fix possible use-after-free with AIOJan Kara
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code pathSunil Mushran
Commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1 was missing a var init. Reported-and-Tested-by: Vincent Etienne <vetienne@aprogsys.com> Signed-off-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zeroAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26target: writev() on single-element vector is pointlessAl Viro
... in other news: filp_open() can't return a struct file with NULL dentry filp_open() can't return a struct file negative dentry filp_close() of something that never had been in any descriptor tables is pointless - fput() is all you need Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26export kernel_write(), convert open-coded instancesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs: encode_fh: return FILEID_INVALID if invalid fid_typeNamjae Jeon
This patch is a follow up on below patch: [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26kill f_vfsmntAl Viro
very few users left... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry opJeff Layton
The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26nfsd: handle vfs_getattr errors in acl protocolJ. Bruce Fields
We're currently ignoring errors from vfs_getattr. The correct thing to do is to do the stat in the main service procedure not in the response encoding. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26switch vfs_getattr() to struct pathAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26default SET_PERSONALITY() in linux/elf.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ceph: prepopulate inodes only when request is abortedSage Weil
If r_aborted is true, we do not hold the dir i_mutex, and cannot touch the dcache. However, we still need to update the inodes with the state returned by the MDS. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26d_hash_and_lookup(): export, switch open-coded instancesAl Viro
* calling conventions change - ERR_PTR() is returned on ->d_hash() errors; NULL is just for dcache miss now. * exported, open-coded instances in ncpfs and cifs converted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: split dropping the acls from v9fs_set_create_acl()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_acl_chmod() from dentry to inode+fidAl Viro
caller has both, might as well pass them explicitly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: switch v9fs_set_acl() from dentry to fidAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: lift the call of set_cached_acl() into the callers of v9fs_set_acl()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-269p: add fid-based variant of v9fs_xattr_set()Al Viro
... making v9fs_xattr_set() a wrapper for it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26tegra: don't wank with d_find_alias()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26lirc: get rid of bogus checksAl Viro
file argument is a struct file being passed to ->open() or already opened; none of the checks in lirc_get_pdata() can fail. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26hugetlb_file_setup(): use d_alloc_pseudo()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26shmem_setup_file(): use d_alloc_pseudo() instead of d_alloc()Al Viro
Note that provided ->d_dname() reproduces what we used to get for those guys in e.g. /proc/self/maps; it might be a good idea to change that to something less ugly, but for now let's keep the existing user-visible behaviour Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25xtensa: add accept4 syscallChris Zankel
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-26openrisc: add missing header inclusionStefan Kristiansson
Prevents build issue with updated toolchain Reported-by: Jack Thomasson <jkt@moonlitsw.com> Tested-by: Christian Svensson <blue@cmd.nu> Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Signed-off-by: Jonas Bonn <jonas@southpole.se>
2013-02-25Merge tag 'pm+acpi-fixes-3.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - Fixes for blackfin and microblaze build problems introduced by the removal of global pm_idle. From Lars-Peter Clausen. - OPP core build fix from Shawn Guo. - Error condition check fix for the new imx6q-cpufreq driver from Wei Yongjun. - Fix for an AER driver crash related to the lack of APEI initialization for acpi=off. From Rafael J Wysocki. - Fix for a USB breakage on Thinkpad T430 related to ACPI power resources and PCI wakeup from Rafael J. Wysocki. * tag 'pm+acpi-fixes-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PM: Take unusual configurations of power resources into account imx6q-cpufreq: fix return value check in imx6q_cpufreq_probe() PM / OPP: fix condition for empty of_init_opp_table() ACPI / APEI: Fix crash in apei_hest_parse() for acpi=off microblaze idle: Fix compile error blackfin idle: Fix compile error
2013-02-25Merge tag 'pci-v3.9-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu) - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu) - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu) - Stop caching _PRT and make independent of bus numbers (Yinghai Lu) PCI device hotplug - Clean up cpqphp dead code (Sasha Levin) - Disable ARI unless device and upstream bridge support it (Yijing Wang) - Initialize all hot-added devices (not functions 0-7) (Yijing Wang) Power management - Don't touch ASPM if disabled (Joe Lawrence) - Fix ASPM link state management (Myron Stowe) Miscellaneous - Fix PCI_EXP_FLAGS accessor (Alex Williamson) - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov) - Document hotplug resource and MPS parameters (Yijing Wang) - Add accessor for PCIe capabilities (Myron Stowe) - Drop pciehp suspend/resume messages (Paul Bolle) - Make pci_slot built-in only (not a module) (Jiang Liu) - Remove unused PCI/ACPI bind ops (Jiang Liu) - Removed used pci_root_bus (Bjorn Helgaas)" * tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits) PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS ACPI / PCI: Make pci_slot built-in only, not a module PCI/PM: Clear state_saved during suspend PCI: Use atomic_inc_return() rather than atomic_add_return() PCI: Catch attempts to disable already-disabled devices PCI: Disable Bus Master unconditionally in pci_device_shutdown() PCI: acpiphp: Remove dead code for PCI host bridge hotplug PCI: acpiphp: Create companion ACPI devices before creating PCI devices PCI: Remove unused "rc" in virtfn_add_bus() PCI: pciehp: Drop suspend/resume ENTRY messages PCI/ASPM: Don't touch ASPM if forcibly disabled PCI/ASPM: Deallocate upstream link state even if device is not PCIe PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc PCI: Document hpiosize= and hpmemsize= resource reservation parameters PCI: Use PCI Express Capability accessor PCI: Introduce accessor to retrieve PCIe Capabilities Register PCI: Put pci_dev in device tree as early as possible PCI: Skip attaching driver in device_add() PCI: acpiphp: Keep driver loaded even if no slots found ...
2013-02-25NFSv4.1: Hold reference to layout hdr in layoutgetWeston Andros Adamson
This fixes an oops where a LAYOUTGET is in still in the rpciod queue, but the requesting processes has been killed. Without this, killing the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout to NULL while the LAYOUTGET rpc task still references it. Example oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] PGD 7365b067 PUD 7365d067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4 CPU 0 Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa01bd586>] [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] RSP: 0018:ffff88007b0c1c88 EFLAGS: 00010246 RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958 RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958 R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0) Stack: ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791 Call Trace: [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4] [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc] [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc] Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-26md/raid1,raid10: fix deadlock with freeze_array()NeilBrown
When raid1/raid10 needs to fix a read error, it first drains all pending requests by calling freeze_array(). This calls flush_pending_writes() if it needs to sleep, but some writes may be pending in a per-process plug rather than in the per-array request queue. When raid1{,0}_unplug() moves the request from the per-process plug to the per-array request queue (from which flush_pending_writes() can flush them), it needs to wake up freeze_array(), or freeze_array() will never flush them and so it will block forever. So add the requires wake_up() calls. This bug was introduced by commit f54a9d0e59c4bea3db733921ca9147612a6f292c for raid1 and a similar commit for RAID10, and so has been present since linux-3.6. As the bug causes a deadlock I believe this fix is suitable for -stable. Cc: stable@vger.kernel.org (3.6.y 3.7.y 3.8.y) Reported-by: Tregaron Bayly <tbayly@bluehost.com> Tested-by: Tregaron Bayly <tbayly@bluehost.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-26md/raid0: improve error message when converting RAID4-with-spares to RAID0NeilBrown
Mentioning "bad disk number -1" exposes irrelevant internal detail. Just say they are inactive and must be removed. Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-26md: raid0: fix error return from create_stripe_zones.NeilBrown
Create_stripe_zones returns an error slightly differently to raid0_run and to raid0_takeover_*. The error returned used by the second was wrong and an error would result in mddev->private being set to NULL and sooner or later a crash. So never return NULL, return ERR_PTR(err), not NULL from create_stripe_zones. This bug has been present since 2.6.35 so the fix is suitable for any kernel since then. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-26md: fix two bugs when attempting to resize RAID0 array.NeilBrown
You cannot resize a RAID0 array (in terms of making the devices bigger), but the code doesn't entirely stop you. So: disable setting of the available size on each device for RAID0 and Linear devices. This must not change as doing so can change the effective layout of data. Make sure that the size that raid0_size() reports is accurate, but rounding devices sizes to chunk sizes. As the device sizes cannot change now, this isn't so important, but it is best to be safe. Without this change: mdadm --grow /dev/md0 -z max mdadm --grow /dev/md0 -Z max then read to the end of the array can cause a BUG in a RAID0 array. These bugs have been present ever since it became possible to resize any device, which is a long time. So the fix is suitable for any -stable kerenl. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-26DM RAID: Add support for MD's RAID10 "far" and "offset" algorithmsJonathan Brassow
DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10 implementation. This patch adds support for the "far" and "offset" algorithms, but only with the improved redundancy that is brought with the introduction of the 'use_far_sets' bit, which shifts copied stripes according to smaller sets vs the entire array. That is, the 17th bit of the 'layout' variable that defines the RAID10 implementation will always be set. (More information on how the 'layout' variable selects the RAID10 algorithm can be found in the opening comments of drivers/md/raid10.c.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>