summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-14IB/rdmavt: Avoid queuing work into a destroyed cq kthread workerPetr Mladek
The memory barrier is not enough to protect queuing works into a destroyed cq kthread. Just imagine the following situation: CPU1 CPU2 rvt_cq_enter() worker = cq->rdi->worker; rvt_cq_exit() rdi->worker = NULL; smp_wmb(); kthread_flush_worker(worker); kthread_stop(worker->task); kfree(worker); // nothing queued yet => // nothing flushed and // happily stopped and freed if (likely(worker)) { // true => read before CPU2 acted cq->notify = RVT_CQ_NONE; cq->triggered++; kthread_queue_work(worker, &cq->comptask); BANG: worker has been flushed/stopped/freed in the meantime. This patch solves this by protecting the critical sections by rdi->n_cqs_lock. It seems that this lock is not much contended and looks reasonable for this purpose. One catch is that rvt_cq_enter() might be called from IRQ context. Therefore we must always take the lock with IRQs disabled to avoid a possible deadlock. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/mlx4: avoid a -Wmaybe-uninitialize warningArnd Bergmann
There is an old warning about mlx4_SW2HW_EQ_wrapper on x86: ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’: ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized] The problem here is that gcc won't track the state of the variable across a spin_unlock. Moving the assignment out of the lock is safe here and avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/mlx5: avoid bogus -Wmaybe-uninitialized warningArnd Bergmann
We get a false-positive warning in linux-next for the mlx5 driver: infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’: infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here I had a trivial workaround for gcc-5 or higher, but that didn't work on gcc-4.9 unfortunately. The only way I found to avoid the warnings for gcc-4.9, short of initializing each of the arguments first was to change the calling conventions to separate the error code from the umem pointer. This avoids casting the error codes from one pointer to another incompatible pointer, and lets gcc figure out when that the data is actually valid whenever we return successfully. Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14Merge tag 'for-f2fs-4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...
2016-12-14rdma UAPI: Use __kernel_sockaddr_storageJason Gunthorpe
The kernel side is #ifdef'd to this type, and the UAPI header should use it directly. It has slightly different alignment requirments from the usual user space version. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14ubifs: Initialize fstr_real_lenRichard Weinberger
While fstr_real_len is only being used under if (encrypted), gcc-6 still warns. Fixes this false positive: fs/ubifs/dir.c: In function 'ubifs_readdir': fs/ubifs/dir.c:629:13: warning: 'fstr_real_len' may be used uninitialized in this function [-Wmaybe-uninitialized] fstr.len = fstr_real_len Initialize fstr_real_len to make gcc happy. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-14nvmet_rdma: log the connection reject messageSteve Wise
Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14ib_isert: log the connection reject messageSteve Wise
Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14rds_rdma: log the connection reject messageSteve Wise
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14ib_iser: log the connection reject messageSteve Wise
Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14nvme-rdma: use rdma connection reject helper functionsSteve Wise
Also add nvme cm status strings and use them. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14rdma_cm: add rdma_consumer_reject_data helper functionSteve Wise
rdma_consumer_reject_data() will return the private data pointer and length if any is available. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14rdma_cm: add rdma_is_consumer_reject() helper functionSteve Wise
Return true if the peer consumer application rejected the connection attempt. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14rdma_cm: add rdma_reject_msg() helper functionSteve Wise
rdma_reject_msg() returns a pointer to a string message associated with the transport reject reason codes. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14Merge tag 'dlm-4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm fixes from David Teigland: "This set fixes error reporting for dlm sockets, removes the unbound property on the dlm callback workqueue to improve performance, and includes a couple trivial changes" * tag 'dlm-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix error return code in sctp_accept_from_sock() dlm: don't specify WQ_UNBOUND for the ast callback workqueue dlm: remove lock_sock to avoid scheduling while atomic dlm: don't save callbacks after accept dlm: audit and remove any unnecessary uses of module.h dlm: make genl_ops const
2016-12-14Merge tag 'jfs-4.10' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs update from David Kleikamp: "The jfs piece of the current_time() series" * tag 'jfs-4.10' of git://github.com/kleikamp/linux-shaggy: fs: jfs: Replace CURRENT_TIME_SEC by current_time()
2016-12-14qedr: remove pointless NULL check in qedr_post_send()Wei Yongjun
Remove pointless NULL check for 'wr' in qedr_post_send(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14qedr: Use list_move_tail instead of list_del/list_add_tailWei Yongjun
Using list_move_tail() instead of list_del() + list_add_tail(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14qedr: Fix possible memory leak in qedr_create_qp()Wei Yongjun
'qp' is malloced in qedr_create_qp() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14qedr: return -EINVAL if pd is null and avoid null ptr dereferenceColin Ian King
Currently, if pd is null then we hit a null pointer derference on accessing pd->pd_id. Instead of just printing an error message we should also return -EINVAL immediately. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/mad: Eliminate redundant SM class version defines for OPAHal Rosenstock
and rename class version define to indicate SM rather than SMP or SMI Signed-off-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14s390/pci: query fmb lengthSebastian Ott
Query the length of the fmb and abort fmb registration if the size of the associated measurement block is too small. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: add missing memory clobber to ap_qci inline assemblyHeiko Carstens
The ap_qci() inline assembly writes to memory (*config) but misses to tell the compiler about it. Add the missing memory clobber to fix this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/extmem: add missing memory clobber to dcss_set_subcodesHeiko Carstens
Add the missing memory clobber / barrier to dcss_set_subcodes() to tell the compiler that the inline assembly accesses memory (name string). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/nmi: fix inline assembly constraintsHeiko Carstens
Add missing memory clobbers / barriers or use the Q constraint where possible to tell the compiler that the inline assemblies actually access memory and not only pointers to memory. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/lib: add missing memory barriers to string inline assembliesHeiko Carstens
We have a couple of inline assemblies like memchr() and strlen() that read from memory, but tell the compiler only they need the addresses of the strings they access. This allows the compiler to omit the initialization of such strings and therefore generate broken code. Add the missing memory barrier to all string related inline assemblies to fix this potential issue. It looks like the compiler currently does not generate broken code due to these bugs. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/cpumf: fix qsi inline assemblyHeiko Carstens
The qsi inline assembly takes an initialized "cc" variable as output operand but specifies it as write-to operand only instead of read/write operand. This allows the compiler to omit the initialization, which in fact it also does (gcc 6.1). Use the "+" constraint modifier to fix this. In addition also use the Q constraint to specify the hws_qsi_info_block memory location, so the compiler can generate slightly better code. Also get rid of the cc clobber since none of the instructions within the inline assembly modify the condition code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/setup: reword printk messagesMartin Schwidefsky
Two of the messages introduced by the memblock conversion are reworded. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/dasd: fix typos in DASD error messagesStefan Haberland
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390: fix compile error with memmove_early() inline assemblyHeiko Carstens
Old gcc versions can't handle a bogus early clobber on a Q constraint: arch/s390/kernel/early.c: In function 'memmove_early.part.1': arch/s390/kernel/early.c:432:2: error: '&' constraint used with no register class Simply remove it to fix this. Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com> Fixes: d543a106f96d ("s390: fix initrd corruptions with gcov/kcov instrumented kernels") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: tracepoint definitions for zcrypt device driver.Harald Freudenberger
This patch introduces tracepoint definitions and tracepoint event invocations for the s390 zcrypt device. Currently there are just two tracepoint events defined. An s390_zcrypt_req request event occurs as soon as the request is recognized by the zcrypt ioctl function. This event may act as some kind of request-processing-starts-now indication. As late as possible within the zcrypt ioctl function there occurs the s390_zcrypt_rep event which may act as the point in time where the request has been processed by the kernel and the result is about to be transferred back to userspace. The glue which binds together request and reply event is the ptr parameter, which is the local buffer address where the request from userspace has been stored by the ioctl function. The main purpose of this zcrypt tracepoint patch is to get some data for performance measurements together with information about the kind of request and on which card and queue the request has been processed. It is not an ffdc interface as there is already code in the zcrypt device driver to serve the s390 debug feature interface. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Rework debug feature invocations.Harald Freudenberger
Rework the debug feature calls and initialization. There are now two debug feature entries used by the zcrypt code. The first is 'ap' with all the AP bus related stuff and the second is 'zcrypt' with all the zcrypt and devices and driver related entries. However, there isn't much traffic on both debug features. The ap bus code emits only some debug info and for zcrypt devices on appearance and disappearance there is an entry written. The new dbf invocations use the sprintf buffer layout, whereas the old implementation used the ascii dbf buffer. There are now 5*8=40 bytes used for each entry, resulting in 5 parameters per call. As the sprintf buffer needs a format string the first parameter provides this and so up to 4 more parameters can be used. Alltogehter the new layout should be much more human readable for customers and test. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Improved invalid domain response handling.Harald Freudenberger
Add defines and switch case code to handle the two invalid domain response codes better. Until now these two response codes are handled via default resulting in -EAGAIN and switching the processed queue to offline. So this kind of malformed request bounced through all suitable queues and switched them off. Now this kind of malformed request is just rejected with EINVAL without switching off the queue. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Fix ap_max_domain_id for older machine typesIngo Tuchscherer
According to the system architecture the current implementation requires the presence of the N bit in GR2 in the TAPQ response field to validate the max. number of domains (Nd). Older machine types don't have this N bit, hence the max. domain field was ignored. Before the N bit was introduced the maximum number of domain was a constant value of 15. So set this value in case of N bit absence. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Correct function bits for CEX2x and CEX3x cards.Harald Freudenberger
For the older CEX2x and CEX3x cards the function bits returned by TAPQ do not reflect the functions of the card. Instead the functionality is implicit by the type of the card. The reworked zcrypt requires to have the function bits set correct, so this patch fixes this. The queue selection is not only based on these function bits but also on function pointers set by the individual drivers. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Fixed attrition of AP adapters and domainsIngo Tuchscherer
Currently the first eligible AP adapter respectively domain will be selected to service requests. In case of sequential workload, the very same adapter/domain will be used. The adapter/domain selection algorithm now considers the completed transactions per adaper/domain and therefore ensures a homogeneous utilization. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Introduce new zcrypt device status APIIngo Tuchscherer
Introduce new ioctl (ZDEVICESTATUS) to provide detailed information, like hardware type, domains, status and functionality of available crypto devices. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: add multi domain supportIngo Tuchscherer
Currently the ap infrastructure only supports one domain at a time. This feature extends the generic cryptographic device driver to support multiple cryptographic domains simultaneously. There are now card and queue devices on the AP bus with independent card and queue drivers. The new /sys layout is as follows: /sys/bus/ap devices <xx>.<yyyy> -> ../../../devices/ap/card<xx>/<xx>.<yyyy> ... card<xx> -> ../../../devices/ap/card<xx> ... drivers <drv>card card<xx> -> ../../../../devices/ap/card<xx> <drv>queue <xx>.<yyyy> -> ../../../../devices/ap/card<xx>/<xx>.<yyyy> ... /sys/devices/ap card<xx> <xx>.<yyyy> driver -> ../../../../bus/ap/drivers/<zzz>queue ... driver -> ../../../bus/ap/drivers/<drv>card ... The two digit <xx> field is the card number, the four digit <yyyy> field is the queue number and <drv> is the name of the device driver, e.g. "cex4". For compatability /sys/bus/ap/card<xx> for the old layout has to exist, including the attributes that used to reside there. With additional contributions from Harald Freudenberger and Martin Schwidefsky. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Introduce workload balancingIngo Tuchscherer
Crypto requests are very different in complexity and thus runtime. Also various crypto adapters are differ with regard to the execution time. Crypto requests can be balanced much better when the request type and eligible crypto adapters are rated in a more precise granularity. Therefore, request weights and adapter speed rates for dedicated requests will be introduced. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: get rid of ap_poll_requestsMartin Schwidefsky
The poll thread of the AP bus is burning CPU while waiting for crypto requests to complete. We can as well burn a few more cycles in the poll thread to check if there are pending requests and remove the atomic operations with the ap_poll_requests. This improves the code if the machine has adapter interrupts. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: header for the AP inline assmbliesMartin Schwidefsky
Move the inline assemblies for the AP bus into a separate header file. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: simplify message type handlingMartin Schwidefsky
Now that the message type modules are linked with the zcrypt_api into a single module the zcrypt_ops_list is initialized by the module init function of the zcyppt.ko module. After that the list is static and all message types are present. Drop the zcrypt_ops_list_lock spinlock and the module handling in regard to the message types. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Move the ap bus into kernelIngo Tuchscherer
Move the ap bus into the kernel and make it general available. Additionally include the message types and the API layer as a preparation for the workload management facility. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14s390/zcrypt: Introduce CEX6 tolerationHarald Freudenberger
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-14blk-mq: Avoid memory reclaim when remapping queuesGabriel Krisman Bertazi
While stressing memory and IO at the same time we changed SMT settings, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which stalls waiting on the block layer remmaping completion, thus deadlocking the system. The trace below was collected after the machine stalled, waiting for the hotplug event completion. The simplest fix for this is to make allocations in this path non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the issue anymore. This should apply on top of Jens's for-next branch cleanly. Changes since v1: - Use GFP_NOIO instead of GFP_NOWAIT. Call Trace: [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable) [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430 [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0 [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0 [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30 [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0 [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0 [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs] [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs] [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs] [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210 [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0 [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0 [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520 [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250 [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0 [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380 [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360 [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0 [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250 [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100 [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0 [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0 [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250 [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120 [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0 [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150 [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0 [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120 [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0 [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0 [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0 [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250 [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0 [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270 [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110 [c000000f0160be30] [c000000000009204] system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-14Merge branch 'syscalls' into for-linusRussell King
Conflicts: arch/arm/include/asm/unistd.h arch/arm/include/uapi/asm/unistd.h arch/arm/kernel/calls.S
2016-12-14Merge branches 'clkdev', 'fixes', 'misc' and 'sa1100-base' into for-linusRussell King
2016-12-14crypto: skcipher - fix crash in virtual walkArd Biesheuvel
The new skcipher walk API may crash in the following way. (Interestingly, the tcrypt boot time tests seem unaffected, while an explicit test using the module triggers it) Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... [<ffff000008431d84>] __memcpy+0x84/0x180 [<ffff0000083ec0d0>] skcipher_walk_done+0x328/0x340 [<ffff0000080c5c04>] ctr_encrypt+0x84/0x100 [<ffff000008406d60>] simd_skcipher_encrypt+0x88/0x98 [<ffff0000083fa05c>] crypto_rfc3686_crypt+0x8c/0x98 [<ffff0000009b0900>] test_skcipher_speed+0x518/0x820 [tcrypt] [<ffff0000009b31c0>] do_test+0x1408/0x3b70 [tcrypt] [<ffff0000009bd050>] tcrypt_mod_init+0x50/0x1000 [tcrypt] [<ffff0000080838f4>] do_one_initcall+0x44/0x138 [<ffff0000081aee60>] do_init_module+0x68/0x1e0 [<ffff0000081524d0>] load_module+0x1fd0/0x2458 [<ffff000008152c38>] SyS_finit_module+0xe0/0xf0 [<ffff0000080836f0>] el0_svc_naked+0x24/0x28 This is due to the fact that skcipher_done_slow() may be entered with walk->buffer unset. Since skcipher_walk_done() already deals with the case where walk->buffer == walk->page, it appears to be the intention that walk->buffer point to walk->page after skcipher_next_slow(), so ensure that is the case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-14sign-file: Fix inplace signing when src and dst names are both specifiedAlex Yashchenko
When src and dst both are specified and they point to the same file the sign-file utility will write only signature to the dst file and the module (.ko file) body will not be written. That happens because we open the same file with "rb" and "wb" flags, from fopen man: w Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file. ... bm = BIO_new_file(module_name, "rb"); ... bd = BIO_new_file(dest_name, "wb"); ... while ((n = BIO_read(bm, buf, sizeof(buf))), n > 0) { ERR(BIO_write(bd, buf, n) < 0, "%s", dest_name); } ... Signed-off-by: Alex Yashchenko <alexhoppus111@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-14crypto: asymmetric_keys - set error code on failurePan Bian
In function public_key_verify_signature(), returns variable ret on error paths. When the call to kmalloc() fails, the value of ret is 0, and it is not set to an errno before returning. This patch fixes the bug. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188891 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>