Age | Commit message | Author
2013-11-09 | elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) | Al Viro
we can't get to do_coredump() if that condition isn't satisfied... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | constify do_coredump() argument | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | constify copy_siginfo_to_user{,32}() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | ... and kill anon_inode_getfile_private() | Al Viro
it's a seriously misguided API, now fortunately without users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | rework aio migrate pages to use aio fs | Benjamin LaHaise
Don't abuse anon_inodes.c to host private files needed by aio; we can bloody well declare a mini-fs of our own instead of patching up what anon_inodes can create for us. Tested-by: Benjamin LaHaise <bcrl@kvack.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
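For illustration, a minimal sketch of what declaring such a private mini-fs can look like (the function names, the "aio:" prefix and the magic number here are illustrative assumptions, not necessarily the exact fs/aio.c code):

        #include <linux/fs.h>
        #include <linux/mount.h>
        #include <linux/err.h>
        #include <linux/init.h>

        /* hypothetical magic value for the pseudo-fs superblock */
        #define AIO_FS_MAGIC    0xa10a10a1

        static struct dentry *aio_mount(struct file_system_type *fs_type,
                                        int flags, const char *dev_name, void *data)
        {
                return mount_pseudo(fs_type, "aio:", NULL, NULL, AIO_FS_MAGIC);
        }

        static struct file_system_type aio_fs = {
                .name           = "aio",
                .mount          = aio_mount,
                .kill_sb        = kill_anon_super,
        };

        static struct vfsmount *aio_mnt;

        static int __init aio_fs_setup(void)
        {
                aio_mnt = kern_mount(&aio_fs);
                if (IS_ERR(aio_mnt))
                        panic("Failed to create aio pseudo-fs mount.");
                return 0;
        }

The aio ring files are then created on this private mount instead of being patched onto what anon_inodes produces.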
2013-11-09 | take anon inode allocation to libfs.c | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | new helper: dump_align() | Al Viro
dump_skip to given alignment... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
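A plausible shape for the helper, sketched on top of the dump_skip()/cprm->written interfaces introduced in this series (not necessarily the exact fs/coredump.c code):

        /* Pad the dump out to the next 'align' boundary (power of two only);
         * returns non-zero on success, like the other dump_* helpers. */
        static int dump_align(struct coredump_params *cprm, int align)
        {
                unsigned mod = cprm->written & (align - 1);

                if (align & (align - 1))
                        return 0;
                return mod ? dump_skip(cprm, align - mod) : 1;
        }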
2013-11-09 | spufs: get rid of dump_emit() wrappers | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | dump_skip(): dump_seek() replacement taking coredump_params | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | make dump_emit() use vfs_write() instead of banging at ->f_op->write directly | Al Viro
... and deal with short writes properly - the output might be to a pipe, after all; as it is, e.g. the no-MMU case of elf_fdpic coredump can write a whole lot more than a page worth of data in one call. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
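A sketch of the resulting write loop (assuming the cprm->written/cprm->limit bookkeeping from this series and a dump_interrupted() helper that aborts on a fatal signal; the coredump path of this era runs under set_fs(KERNEL_DS), which is why a kernel buffer can be handed to vfs_write()):

        int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
        {
                struct file *file = cprm->file;
                loff_t pos = file->f_pos;
                ssize_t n;

                if (cprm->written + nr > cprm->limit)
                        return 0;
                while (nr) {
                        if (dump_interrupted())
                                return 0;
                        n = vfs_write(file, addr, nr, &pos);
                        if (n <= 0)
                                return 0;       /* error, or no forward progress */
                        file->f_pos = pos;
                        cprm->written += n;
                        addr += n;              /* short write: advance and retry */
                        nr -= n;
                }
                return 1;
        }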
2013-11-09 | binfmt_elf: count notes towards coredump limit | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | aout: switch to dump_emit | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | switch elf_coredump_extra_notes_write() to dump_emit() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | convert the rest of binfmt_elf_fdpic to dump_emit() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | binfmt_elf: convert writing actual dump pages to dump_emit() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | switch elf_core_write_extra_data() to dump_emit() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | switch elf_core_write_extra_phdrs() to dump_emit() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | new helper: dump_emit() | Al Viro
dump_write() analog; takes coredump_params instead of file, keeps track of the amount written in cprm->written and checks it against cprm->limit. Start using it in binfmt_elf.c... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | restore 32bit aout coredump | Al Viro
just getting rid of bitrot Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | no need to keep brlock macros anymore... | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | coda_revalidate_inode(): switch to passing inode... | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | fold __d_shrink() into its only remaining caller | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | get rid of s_files and files_lock | Al Viro
The only thing we need it for is alt-sysrq-u (emergency remount r/o) and these days we can do just as well without going through the list of files. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | get rid of {lock,unlock}_rcu_walk() | Al Viro
those have become aliases for rcu_read_{lock,unlock}() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 | RCU'd vfsmounts | Al Viro
* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock() and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence number against seq, then grabs reference to mnt. Then it rechecks mount_lock again to close the race and either returns success or drops the reference it has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can simply decrement the refcount and sod off - aforementioned synchronize_rcu() makes sure that final mntput() won't come until we leave RCU mode. We need that, since we don't want to end up with some lazy pathwalk racing with umount() and stealing the final mntput() from it - caller of umount() may expect it to return only once the fs is shut down and we don't want to break that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do full-blown mntput() in case of mount_lock sequence number mismatch happening just as we'd grabbed the reference, but in those cases we won't be stealing the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now. Incidentally, SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock. It does, of course, bump the refcounts of vfsmount and dentry in the very end, but that's it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
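A sketch of the legitimize_mnt() logic described above (real_mount() and mnt_add_count() are existing fs-internal helpers; this is a simplified rendering, not the verbatim fs/namespace.c code):

        /* Called while in RCU mode; 'seq' is the mount_lock count sampled earlier. */
        static bool legitimize_mnt(struct vfsmount *m, unsigned seq)
        {
                struct mount *mnt = real_mount(m);

                mnt_add_count(mnt, 1);                          /* optimistic grab */
                if (likely(!read_seqretry(&mount_lock, seq)))
                        return true;                            /* nothing changed; ref is good */

                if (m->mnt_flags & MNT_SYNC_UMOUNT) {
                        /* umount_tree() guaranteed no surviving references, and the
                         * synchronize_rcu() in namespace_unlock() keeps the final
                         * mntput() away until we leave RCU mode, so a bare
                         * decrement is enough here. */
                        mnt_add_count(mnt, -1);
                        return false;
                }
                /* Otherwise do a full mntput(); we cannot be stealing the final
                 * mntput() from anything that cares. */
                rcu_read_unlock();
                mntput(m);
                rcu_read_lock();
                return false;
        }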
2013-11-09 | switch shrink_dcache_for_umount() to use of d_walk() | Al Viro
we have too many iterators in fs/dcache.c... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-08 | block: cleanup removing dependency on bootmem headers | Grygorii Strashko
Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-09 | ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h" | Chen Gang
In the current kernel-wide source code, apart from other architectures' own code, only the s390 SCSI drivers use atomic_clear_mask(), and arm/arm64 do not need to support the s390 drivers. So remove atomic_clear_mask() from "arm[64]/include/asm/atomic.h". Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-09 | ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg() | Chen Gang
For atomic_cmpxchg(), the type of 'oldval' needs to be 'int' to match the type of "*ptr" (used by the 'ldrex' instruction) and 'old' (used by the 'teq' instruction). Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
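A declaration-level sketch of the change (the ldrex/teq/strexeq inline assembly is unchanged and elided here):

        static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
        {
                int oldval;             /* was 'unsigned long'; must match ptr->counter and 'old' */
                unsigned long res;      /* strex success/failure flag stays unsigned long */

                /* Unchanged retry loop: ldrex loads *ptr into oldval, teq compares
                 * it with 'old', strexeq conditionally stores 'new' and writes the
                 * store status into res; the loop repeats while res is non-zero. */

                return oldval;
        }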
2013-11-09 | ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h | Chen Gang
atomic* values are signed values, and the atomic* functions also need to process signed values (both parameters and return values), so 32-bit ARM needs to use 'long long' instead of 'u64'. The replacement also fixes a bug in atomic64_add_negative(): a 'u64' is never less than 0. The modifications are: in vim, use the "1,% s/\<u64\>/long long/g" command; remove '__aligned(8)', which is useless for 64-bit types; and stay within the 80 column limit after the replacement. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
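The atomic64_add_negative() point can be seen directly from the declarations (an illustrative before/after, not a literal diff):

        /* before: unsigned arithmetic, so the "< 0" test is always false */
        static inline u64 atomic64_add_return(u64 i, atomic64_t *v);
        #define atomic64_add_negative(a, v)     (atomic64_add_return((a), (v)) < 0)

        /* after: signed 64-bit arithmetic, so a negative sum is actually detected */
        static inline long long atomic64_add_return(long long i, atomic64_t *v);
        #define atomic64_add_negative(a, v)     (atomic64_add_return((a), (v)) < 0)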
2013-11-09 | ARM: 7871/1: amba: Extend number of IRQS | Michal Simek
The Xilinx Zynq pl330 DMA driver has 9 IRQs, all of which have to be used by the driver to get it to work properly. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-09 | ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise() | Stephen Boyd
If we're running a kernel compiled with SMP_ON_UP=y and the hardware only supports UP operation, there isn't any smp_cross_call function assigned. Unfortunately, we call smp_cross_call() unconditionally in arch_irq_work_raise() and crash the kernel on UP devices. Check to make sure we're running on an SMP device before calling smp_cross_call() here.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 80000005 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc6-00018-g8d45144-dirty #16
task: de05b440 ti: de05c000 task.ti: de05c000
PC is at 0x0
LR is at arch_irq_work_raise+0x3c/0x48
pc : [<00000000>]    lr : [<c0019590>]    psr: 60000193
sp : de05dd60  ip : 00000001  fp : 00000000
r10: c085e2f0  r9 : de05c000  r8 : c07be0a4
r7 : de05c000  r6 : de05c000  r5 : c07c5778  r4 : c0824554
r3 : 00000000  r2 : 00000000  r1 : 00000006  r0 : c0529a58
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 80004019  DAC: 00000017
Process swapper/0 (pid: 1, stack limit = 0xde05c248)
Stack: (0xde05dd60 to 0xde05e000)
dd60: c07b9dbc c00cb2dc 00000001 c08242c0 c08242c0 60000113 c07be0a8 c00b0590
dd80: de05c000 c085e2f0 c08242c0 c08242c0 c1414c28 c00b07cc de05b440 c1414c28
dda0: c08242c0 c00b0af8 c0862bb0 c0862db0 c1414cd8 de05c028 c0824840 de05ddb8
ddc0: 00000000 00000009 00000001 00000024 c07be0a8 c07be0a4 de05c000 c085e2f0
dde0: 00000000 c004a4b0 00000010 de00d2dc 00000054 00000100 00000024 00000000
de00: de05c028 0000000a ffff8ae7 00200040 00000016 de05c000 60000193 de05c000
de20: 00000054 00000000 00000000 00000000 00000000 c004a704 00000000 de05c008
de40: c07ba254 c004aa1c c07c5778 c0014b70 fa200000 00000054 de05de80 c0861244
de60: 00000000 c0008634 de05b440 c051c778 20000113 ffffffff de05deb4 c051d0a4
de80: 00000001 00000001 00000000 de05b440 c082afac de057ac0 de057ac0 de0443c0
dea0: 00000000 00000000 00000000 00000000 c082afbc de05dec8 c009f2a0 c051c778
dec0: 20000113 ffffffff 00000000 c016edb0 00000000 000002b0 de057ac0 de057ac0
dee0: 00000000 c016ee40 c0875e50 de05df2e de057ac0 00000000 00000013 00000000
df00: 00000000 c016f054 de043600 de0443c0 c008eb38 de004ec0 c0875e50 c008eb44
df20: 00000012 00000000 00000000 3931f0f8 00000000 00000000 00000014 c0822e84
df40: 00000000 c008ed2c 00000000 00000000 00000000 c07b7490 c07b7490 c075ab3c
df60: 00000000 c00701ac 00000002 00000000 c0070160 dffadb73 7bf8edb4 00000000
df80: c051092c 00000000 00000000 00000000 00000000 00000000 00000000 c0510934
dfa0: de05aa40 00000000 c051092c c0013ce8 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 07efffe5 4dfac6f5
[<c0019590>] (arch_irq_work_raise+0x3c/0x48) from [<c00cb2dc>] (irq_work_queue+0xe4/0xf8)
[<c00cb2dc>] (irq_work_queue+0xe4/0xf8) from [<c00b0590>] (rcu_accelerate_cbs+0x1d4/0x1d8)
[<c00b0590>] (rcu_accelerate_cbs+0x1d4/0x1d8) from [<c00b07cc>] (rcu_start_gp+0x34/0x48)
[<c00b07cc>] (rcu_start_gp+0x34/0x48) from [<c00b0af8>] (rcu_process_callbacks+0x318/0x608)
[<c00b0af8>] (rcu_process_callbacks+0x318/0x608) from [<c004a4b0>] (__do_softirq+0x114/0x2a0)
[<c004a4b0>] (__do_softirq+0x114/0x2a0) from [<c004a704>] (do_softirq+0x6c/0x74)
[<c004a704>] (do_softirq+0x6c/0x74) from [<c004aa1c>] (irq_exit+0xac/0x100)
[<c004aa1c>] (irq_exit+0xac/0x100) from [<c0014b70>] (handle_IRQ+0x54/0xb4)
[<c0014b70>] (handle_IRQ+0x54/0xb4) from [<c0008634>] (omap3_intc_handle_irq+0x60/0x74)
[<c0008634>] (omap3_intc_handle_irq+0x60/0x74) from [<c051d0a4>] (__irq_svc+0x44/0x5c)
Exception stack(0xde05de80 to 0xde05dec8)
de80: 00000001 00000001 00000000 de05b440 c082afac de057ac0 de057ac0 de0443c0
dea0: 00000000 00000000 00000000 00000000 c082afbc de05dec8 c009f2a0 c051c778
dec0: 20000113 ffffffff
[<c051d0a4>] (__irq_svc+0x44/0x5c) from [<c051c778>] (_raw_spin_unlock_irq+0x28/0x2c)
[<c051c778>] (_raw_spin_unlock_irq+0x28/0x2c) from [<c016edb0>] (proc_alloc_inum+0x30/0xa8)
[<c016edb0>] (proc_alloc_inum+0x30/0xa8) from [<c016ee40>] (proc_register+0x18/0x130)
[<c016ee40>] (proc_register+0x18/0x130) from [<c016f054>] (proc_mkdir_data+0x44/0x6c)
[<c016f054>] (proc_mkdir_data+0x44/0x6c) from [<c008eb44>] (register_irq_proc+0x6c/0x128)
[<c008eb44>] (register_irq_proc+0x6c/0x128) from [<c008ed2c>] (init_irq_proc+0x74/0xb0)
[<c008ed2c>] (init_irq_proc+0x74/0xb0) from [<c075ab3c>] (kernel_init_freeable+0x84/0x1c8)
[<c075ab3c>] (kernel_init_freeable+0x84/0x1c8) from [<c0510934>] (kernel_init+0x8/0x150)
[<c0510934>] (kernel_init+0x8/0x150) from [<c0013ce8>] (ret_from_fork+0x14/0x2c)
Code: bad PC value
Fixes: bf18525fd79 "ARM: 7872/1: Support arch_irq_work_raise() via self IPIs"
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
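The shape of the fix (a sketch; is_smp() comes from asm/smp_plat.h and IPI_IRQ_WORK from the self-IPI support being fixed):

        void arch_irq_work_raise(void)
        {
                /* On an SMP_ON_UP kernel running on UP hardware no cross-call
                 * handler has been registered, so bail out instead of calling
                 * through a NULL pointer. */
                if (is_smp())
                        smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
        }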
2013-11-08 | IB/srp: Report receive errors correctly | Bart Van Assche
The IB spec does not guarantee that the opcode is available in error completions. Hence do not rely on it. See also commit 948d1e889e5b ("IB/srp: Introduce srp_handle_qp_err()"). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Avoid offlining operational SCSI devices | Bart Van Assche
If SCSI commands are submitted with a SCSI request timeout that is lower than the IB RC timeout, it can happen that the SCSI error handler has already started device recovery before transport layer error handling starts. So it can happen that the SCSI error handler tries to abort a SCSI command after it has been reset by srp_rport_reconnect(). Tell the SCSI error handler that such commands have finished and that it is not necessary to continue its recovery strategy for commands that have been reset by srp_rport_reconnect(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Remove target from list before freeing Scsi_Host structure | Vu Pham
Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because that last put frees the memory that holds the srp_target_port structure. This patch prevents the following kernel oops:
RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570
Call Trace:
[<ffffffff810b11e4>] lock_acquire+0xa4/0x120
[<ffffffff81531206>] _spin_lock+0x36/0x70
[<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp]
[<ffffffff8109125c>] worker_thread+0x21c/0x3d0
[<ffffffff81096e86>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
Signed-off-by: Vu Pham <vuhuong@mellanox.com> [ bvanassche - Modified patch description and CC'ed stable. ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
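The essence of the ordering fix, sketched with the driver's host/target naming (the exact field names are assumptions):

        /* Take the target off its host's list *before* the final put: that last
         * scsi_host_put() may free the srp_target_port embedded in the host. */
        spin_lock(&target->srp_host->target_lock);
        list_del(&target->list);
        spin_unlock(&target->srp_host->target_lock);

        /* ... tear down the RDMA connection and free per-target resources ... */

        scsi_host_put(target->scsi_host);       /* possibly the last reference */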
2013-11-08 | IB/srp: Add change_queue_depth and change_queue_type support | Jack Wang
Currently, it's not possible to change the queue depth for a device behind an SRP host. Sometimes we need to adjust queue_depth for performance reasons (e.g. when storage is busy, we need a lower queue_depth to avoid running into the SCSI error handler), so this patch adds that support to the SRP driver. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
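A sketch of such a callback against the 3.12-era SCSI midlayer interfaces (SCSI_QDEPTH_* reason codes, scsi_adjust_queue_depth() and scsi_track_queue_full()); the exact driver code may differ:

        static int srp_change_queue_depth(struct scsi_device *sdev, int qdepth,
                                          int reason)
        {
                if (reason == SCSI_QDEPTH_DEFAULT || reason == SCSI_QDEPTH_RAMP_UP) {
                        int max_depth = sdev->host->can_queue;

                        if (!sdev->tagged_supported)
                                max_depth = 1;
                        if (qdepth > max_depth)
                                qdepth = max_depth;
                        scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), qdepth);
                } else if (reason == SCSI_QDEPTH_QFULL) {
                        scsi_track_queue_full(sdev, qdepth);
                } else {
                        return -EOPNOTSUPP;
                }
                return sdev->queue_depth;
        }

The callback is then wired into the driver's scsi_host_template via .change_queue_depth; .change_queue_type is a similarly small callback that toggles tagged queueing via scsi_set_tag_type() and scsi_activate_tcq()/scsi_deactivate_tcq().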
2013-11-08 | IB/srp: Make queue size configurable | Bart Van Assche
Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Tested-by: Jack Wang <xjtuwjp@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Introduce srp_alloc_req_data() | Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: Vu Pham <vu@mellanox.com> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Export sgid to sysfs | Bart Van Assche
On an initiator system with multiple IB ports it is not yet possible to figure out what the originating port of an SRP connection is. Hence make the source GID available in sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
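Exposing the GID typically looks like the sketch below (host_to_target() and the location of the source GID within the target structure are assumptions about the driver's internals):

        static ssize_t show_sgid(struct device *dev, struct device_attribute *attr,
                                 char *buf)
        {
                struct srp_target_port *target = host_to_target(class_to_shost(dev));

                return sprintf(buf, "%pI6\n", target->path.sgid.raw);
        }

        static DEVICE_ATTR(sgid, S_IRUGO, show_sgid, NULL);

The new attribute then sits alongside the existing per-host SRP attributes in sysfs.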
2013-11-08 | IB/srp: Add periodic reconnect functionality | Bart Van Assche
After a transport layer error has occurred, periodically try to reconnect to the target until the dev_loss timer expires. Protect the callback functions that can be invoked from inside the SCSI EH against concurrent invocation with srp_reconnect_rport() via the rport mutex. Change the default dev_loss_tmo from 60s to 600s to give the reconnect mechanism a chance to kick in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | scsi_transport_srp: Add periodic reconnect support | Bart Van Assche
Add support for periodically reconnecting to an SRP target until the dev_loss timer expires. After the tenth reconnection attempt, gradually slow down subsequent reconnect attempts. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Start timers if a transport layer error occurs | Bart Van Assche
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Use SRP transport layer error recovery | Bart Van Assche
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow default values for these timeouts to be specified. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | scsi_transport_srp: Add transport layer error handling | Bart Van Assche
Add the necessary functions in the SRP transport module to allow an SRP initiator driver to implement transport layer error handling similar to the functionality already provided by the FC transport layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should elapse after having detected a transport layer problem and before failing I/O.
- Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port.
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
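A sketch of the entry point an initiator driver calls when it detects a transport failure (the work item and field names are assumptions consistent with the description above):

        /* Called by the initiator driver when the transport layer fails. */
        void srp_start_tl_fail_timers(struct srp_rport *rport)
        {
                if (rport->fast_io_fail_tmo >= 0)
                        queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
                                           rport->fast_io_fail_tmo * HZ);
                if (rport->dev_loss_tmo >= 0)
                        queue_delayed_work(system_long_wq, &rport->dev_loss_work,
                                           rport->dev_loss_tmo * HZ);
        }

When the fast_io_fail work fires, pending I/O on the rport is failed; when the dev_loss work fires, the remote port is removed.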
2013-11-08 | IB/srp: Keep rport as long as the IB transport layer | Bart Van Assche
Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback could get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback can get invoked after the rport data structure has been freed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/srp: Make transport layer retry count configurable | Vu Pham
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count reduces the path failover time. Signed-off-by: Vu Pham <vu@mellanox.com> [ bvanassche: Rewrote patch description / changed default retry count ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/qib: Fix txselect regression | Mike Marciniszyn
Commit 7fac33014f54 ("IB/qib: checkpatch fixes") was overzealous in removing a simple_strtoul() call from a parse routine, setup_txselect(). That routine is required to handle a multi-value string. Unwind that aspect of the fix. Cc: <stable@vger.kernel.org> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
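The reason a plain kstrtoul() does not work here is that it rejects trailing characters, while txselect carries several comma-separated values; a sketch of the required parsing (the signature follows the module_param_call() convention, details assumed):

        static int setup_txselect(const char *str, struct kernel_param *kp)
        {
                unsigned long val;
                char *next;

                /* kstrtoul() would return -EINVAL because of the trailing ",...";
                 * simple_strtoul() stops at the first non-digit instead. */
                val = simple_strtoul(str, &next, 0);
                if (next == str)
                        return -EINVAL;         /* no number at all */

                /* ... validate 'val', then keep parsing after the ',' in 'next' ... */
                return 0;
        }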
2013-11-08 | IB/qib: Fix checkpatch __packed warnings | Mike Marciniszyn
Convert __attribute__ ((packed)) to __packed. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
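The conversion is purely mechanical; for a hypothetical structure:

        /* Previously written as:
         *      struct qib_example_hdr { ... } __attribute__ ((packed));
         */
        struct qib_example_hdr {
                __u8    opcode;
                __be32  psn;
        } __packed;             /* identical layout; __packed is the kernel shorthand */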
2013-11-08 | IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() | Jan Kara
qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 | IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast() | Jan Kara
ipath_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function ipath_user_sdma_queue_pkts() (and also ipath_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Merged in fix for call to get_user_pages_fast from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
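Both conversions have the same shape; a sketch using the 3.12-era signatures (variable names are illustrative):

        static int pin_sdma_pages(unsigned long start_page, int num_pages,
                                  struct page **pages)
        {
                int ret;

                /* before: ran with the caller's mmap_sem held and used
                 *      get_user_pages(current, current->mm, start_page, num_pages,
                 *                     1, 0, pages, NULL);
                 */

                /* after: get_user_pages_fast() handles mm locking internally, so the
                 * caller no longer needs mmap_sem here and copy_from_user() in the
                 * same path can no longer deadlock on it. */
                ret = get_user_pages_fast(start_page, num_pages, 1 /* write */, pages);

                return ret;     /* number of pages pinned, or -errno */
        }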