linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-04-21	nvme/pci: Poll CQ on timeout	Keith Busch
	If an IO timeout occurs, it's helpful to know if the controller did not post a completion or the driver missed an interrupt. While we never expect the latter, this patch will make it possible to tell the difference so we don't have to guess. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-21	nvmet_fc: Change traddr field separator to a colon	James Smart
	The FC-NVME spec revised syntax to avoid comma separators. Sync with the change in the parser for traddr on port attachments. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvme_fc: Add ls aborts on remote port teardown	James Smart
	remoteport teardown never aborted the LS opertions. Add support. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvme_fc: Move LS's to rport	James Smart
	Link LS's on the remoteport rather than the controller. LS's are between nport's. Makes more sense, especially on async teardown where the controller is torn down regardless of the LS (LS is more of a notifier to the target of the teardown), to have them on the remoteport. While revising ls send/done routines, issues were seen relative to refcounting and cleanup, especially in async path. Reworked these code paths. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvmet_fc: add missing reference in add_port	James Smart
	Add missing reference in add_port Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvmet_fc: Rework target side abort handling	James Smart
	target transport: ---------------------- There are cases when there is a need to abort in-progress target operations (writedata) so that controller termination or errors can clean up. That can't happen currently as the abort is another target op type, so it can't be used till the running one finishes (and it may not). Solve by removing the abort op type and creating a separate downcall from the transport to the lldd to request an io to be aborted. The transport will abort ios on queue teardown or io errors. In general the transport tries to call the lldd abort only when the io state is idle. Meaning: ops that transmit data (readdata or rsp) will always finish their transmit (or the lldd will see a state on the link or initiator port that fails the transmit) and the done call for the operation will occur. The transport will wait for the op done upcall before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; Similarly, ios that are not waiting for data or transmitting data must be in the nvmet layer being processed. The transport will wait for the nvmet layer completion before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; As for ops that are waiting for data (writedata), they may be outstanding indefinitely if the lldd doesn't see a condition where the initiatior port or link is bad. In those cases, the transport will call the abort function and wait for the lldd's op done upcall for the operation, where it will then clean up the io. Additionally, if a lldd receives an ABTS and matches it to an outstanding request in the transport, A new new transport upcall was created to abort the outstanding request in the transport. The transport expects any outstanding op call (readdata or writedata) will completed by the lldd and the operation upcall made. The transport doesn't act on the reported abort (e.g. clean up the io) until an op done upcall occurs, a new op is attempted, or the nvmet layer completes the io processing. fcloop: ---------------------- Updated to support the new target apis. On fcp io aborts from the initiator, the loopback context is updated to NULL out the half that has completed. The initiator side is immediately called after the abort request with an io completion (abort status). On fcp io aborts from the target, the io is stopped and the initiator side sees it as an aborted io. Target side ops, perhaps in progress while the initiator side is done, continue but noop the data movement as there's no structure on the initiator side to reference. patch also contains: ---------------------- Revised lpfc to support the new abort api commonized rsp buffer syncing and nulling of private data based on calling paths. errors in op done calls don't take action on the fod. They're bad operations which implies the fod may be bad. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvme_fcloop: split job struct from transport for req_release	James Smart
	Current design has the fcloop job struct, used for both initiator and target processing, allocated as part of the initiator request structure. On aborts, the initiator side (based on the request) may terminate, yet the target side wants to continue processing. the target side can't do that if the initiator side goes away. Revise fcloop to allocate an independent target side structure when it starts an io from the initiator. Added a lock to the request struct as well to synchronize pointer updates on abort calls. Modified target downcalls to recognize conditions where initiator has aborted the io (thus nulled the pointer between job structs), thus avoid referencing sgl lists which are gone and no longer making upcalls to the initiator. In conditions where the targetport is no longer connected, have the initiator return an access failure rather than simulating a command completion. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvmet_fc: add req_release to lldd api	James Smart
	With the advent of the opdone calls changing context, the lldd can no longer assume that once the op->done call returns for RSP operations that the request struct is no longer being accessed. As such, revise the lldd api for a req_release callback that the transport will call when the job is complete. This will also be used with abort cases. Fixed text in api header for change in io complete semantics. Revised lpfc to support the new req_release api. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvmet_fc: add target feature flags for upcall isr contexts	James Smart
	Two new feature flags were added to control whether upcalls to the transport result in context switches or stay in the calling context. NVMET_FCTGTFEAT_CMD_IN_ISR: By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the cmd handler is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. NVMET_FCTGTFEAT_OPDONE_IN_ISR By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the fcp operation done callback is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. Updated lpfc for flags Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvmet: convert from kmap to nvmet_copy_from_sgl	Logan Gunthorpe
	This is safer as it doesn't rely on the data being stored in a single page in an sgl. It also aids our effort to start phasing out users of sg_page. See [1]. For this we kmalloc some memory, copy to it and free at the end. Note: we can't allocate this memory on the stack as the kbuild test robot reports some frame size overflows on i386. [1] https://lwn.net/Articles/720053/ Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	nvme: improve performance for virtual NVMe devices	Helen Koike
	This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefy here. The main idea for the patch is provide the device two memory location locations: 1) to store the doorbell values so they can be lookup without the doorbell MMIO write 2) to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea on the performance boost with the vendor extension: Running fio [1], a stock NVMe driver I get about 200K read IOPs with my vendor patch I get about 1000K read IOPs. This was running with a null device i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: Rob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: Helen Koike <helen.koike@collabora.co.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2017-04-21	nvme/pci: Don't set reserved SQ create flags	Keith Busch
	The QPRIO field is only valid if weighted round robin arbitration is used, and this driver doesn't enable that controller configuration option. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21	net: dsa: Remove redundant NULL dst check	Florian Fainelli
	tag_lan9303.c does check for a NULL dst but that's already checked by dsa_switch_rcv() one layer above. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Juergen Borleis <jbe@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21	scsi: sas: move scsi_remove_host call into sas_remove_host	Johannes Thumshirn
	Move scsi_remove_host call into sas_remove_host and remove it from SAS HBA drivers, so we don't mess up the ordering. This solves an issue with double deleting sysfs entries that was introduced by the change of sysfs behaviour from commit bcdde7e221a8 ("sysfs: make __sysfs_remove_dir() recursive"). [mkp: addressed checkpatch complaints] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Jinpu Wang <jinpu.wang@profitbricks.com> Cc: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinpu Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-21	scsi: BusLogic: fix incorrect spelling of adatper_reset_req	Colin Ian King
	Trivial fix to spelling mistake, adatper_reset_req should be adapter_reset_req. Also break up very long seq_printf statement into multiple lines. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Khalid Aziz <khalid@gonehiking.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-21	scsi: bfa: use designated initializers	Kees Cook
	Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. This also initializes the array members using the enum used to look up __port_action entries. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-21	blk-stat: kill blk_stat_rq_ddir()	Jens Axboe
	No point in providing and exporting this helper. There's just one (real) user of it, just use rq_data_dir(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21	Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' ↵	Paul E. McKenney
	into HEAD doc.2017.04.12a: Documentation updates fixes.2017.04.19a: Miscellaneous fixes srcu.2017.04.21a: Parallelize SRCU callback handling
2017-04-21	rcu: Make non-preemptive schedule be Tasks RCU quiescent state	Paul E. McKenney
	Currently, a call to schedule() acts as a Tasks RCU quiescent state only if a context switch actually takes place. However, just the call to schedule() guarantees that the calling task has moved off of whatever tracing trampoline that it might have been one previously. This commit therefore plumbs schedule()'s "preempt" parameter into rcu_note_context_switch(), which then records the Tasks RCU quiescent state, but only if this call to schedule() was -not- due to a preemption. To avoid adding overhead to the common-case context-switch path, this commit hides the rcu_note_context_switch() check under an existing non-common-case check. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-21	srcu: Expedite srcu_schedule_cbs_snp() callback invocation	Paul E. McKenney
	Although Tree SRCU does reduce delays when there is at least one synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp() still waits for SRCU_INTERVAL before invoking callbacks. Since synchronize_srcu_expedited() now posts a callback and waits for that callback to do a wakeup, this destroys the expedited nature of synchronize_srcu_expedited(). This destruction became apparent to Marc Zyngier in the guise of a guest-OS bootup slowdown from five seconds to no fewer than forty seconds. This commit therefore invokes callbacks immediately at the end of the grace period when there is at least one synchronize_srcu_expedited() invocation pending. This brought Marc's guest-OS bootup times back into the realm of reason. Reported-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
2017-04-21	srcu: Parallelize callback handling	Paul E. McKenney
	Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2], however, there are workloads that could result in a high volume of concurrent invocations of call_srcu(), which with current SRCU would result in excessive lock contention on the srcu_struct structure's ->queue_lock, which protects SRCU's callback lists. This commit therefore moves SRCU to per-CPU callback lists, thus greatly reducing contention. Because a given SRCU instance no longer has a single centralized callback list, starting grace periods and invoking callbacks are both more complex than in the single-list Classic SRCU implementation. Starting grace periods and handling callbacks are now handled using an srcu_node tree that is in some ways similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is controlled by exactly the same Kconfig options and boot parameters that control the shape of the rcu_node tree). In addition, the old per-CPU srcu_array structure is now named srcu_data and contains an rcu_segcblist structure named ->srcu_cblist for its callbacks (and a spinlock to protect this). The srcu_struct gets an srcu_gp_seq that is used to associate callback segments with the corresponding completion-time grace-period number. These completion-time grace-period numbers are propagated up the srcu_node tree so that the grace-period workqueue handler can determine whether additional grace periods are needed on the one hand and where to look for callbacks that are ready to be invoked. The srcu_barrier() function must now wait on all instances of the per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected by ->lock, srcu_barrier() can remotely add the needed callbacks. In theory, it could also remotely start grace periods, but in practice doing so is complex and racy. And interestingly enough, it is never necessary for srcu_barrier() to start a grace period because srcu_barrier() only enqueues a callback when a callback is already present--and it turns out that a grace period has to have already been started for this pre-existing callback. Furthermore, it is only the callback that srcu_barrier() needs to wait on, not any particular grace period. Therefore, a new rcu_segcblist_entrain() function enqueues the srcu_barrier() function's callback into the same segment occupied by the last pre-existing callback in the list. The special case where all the pre-existing callbacks are on a different list (because they are in the process of being invoked) is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on the done-callbacks check that takes place after all callbacks are inovked. Note that the readers use the same algorithm as before. Note that there is a separate srcu_idx that tells the readers what counter to increment. This unfortunately cannot be combined with srcu_gp_seq because they need to be incremented at different times. This commit introduces some ugly #ifdefs in rcutorture. These will go away when I feel good enough about Tree SRCU to ditch Classic SRCU. Some crude performance comparisons, courtesy of a quickly hacked rcuperf asynchronous-grace-period capability: Callback Queuing Overhead ------------------------- # CPUS Classic SRCU Tree SRCU ------ ------------ --------- 2 0.349 us 0.342 us 16 31.66 us 0.4 us 41 --------- 0.417 us The times are the 90th percentiles, a statistic that was chosen to reject the overheads of the occasional srcu_barrier() call needed to avoid OOMing the test machine. The rcuperf test hangs when running Classic SRCU at 41 CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code and that statistics, this is a convincing demonstration of Tree SRCU's performance and scalability advantages. [1] https://lwn.net/Articles/309030/ [2] https://patchwork.kernel.org/patch/5108281/ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
2017-04-21	powerpc/mm: Add support for runtime configuration of ASLR limits	Michael Ellerman
	Add powerpc support for mmap_rnd_bits and mmap_rnd_compat_bits, which are two sysctls that allow a user to configure the number of bits of randomness used for ASLR. Because of the way the Kconfig for ARCH_MMAP_RND_BITS is defined, we have to construct at least the MIN value in Kconfig, vs in a header which would be more natural. Given that we just go ahead and do it all in Kconfig. At least according to the code (the documentation makes no mention of it), the value is defined as the number of bits of randomisation of the page, not the address. This makes some sense, with larger page sizes more of the low bits are forced to zero, which would reduce the randomisation if we didn't take the PAGE_SIZE into account. However it does mean the min/max values have to change depending on the PAGE_SIZE in order to actually limit the amount of address space consumed by the randomisation. The result of that is that we have to define the default values based on both 32-bit vs 64-bit, but also the configured PAGE_SIZE. Furthermore now that we have 128TB address space support on Book3S, we also have to take that into account. Finally we can wire up the value in arch_mmap_rnd(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-04-21	kvm: Move srcu_struct fields to end of struct kvm	Paul E. McKenney
	Parallelizing SRCU callback handling will increase the size of srcu_struct, which will move the kvm structure's kvm_arch field out of reach of powerpc's current assembly code, which will result in the following sort of build error: arch/powerpc/kvm/book3s_hv_rmhandlers.S:617: Error: operand out of range (0x000000000000b328 is not between 0xffffffffffff8000 and 0x0000000000007fff) This commit moves the srcu_struct fields in the kvm structure to follow the kvm_arch field, which will allow powerpc's assembly code to continue to be able to reach the kvm_arch field. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Michael Ellerman <michaele@au1.ibm.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> [ paulmck: Moved this commit to precede SRCU callback parallelization, and reworded the commit log into future tense, all in the name of bisectability. ]
2017-04-21	crypto: ccp - Disable interrupts early on unload	Gary R Hook
	Ensure that we disable interrupts first when shutting down the driver. Cc: <stable@vger.kernel.org> # 4.9.x+ Signed-off-by: Gary R Hook <ghook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: ccp - Use only the relevant interrupt bits	Gary R Hook
	Each CCP queue can product interrupts for 4 conditions: operation complete, queue empty, error, and queue stopped. This driver only works with completion and error events. Cc: <stable@vger.kernel.org> # 4.9.x+ Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	hwrng: mtk - Add driver for hardware random generator on MT7623 SoC	Sean Wang
	This patch adds support for hardware random generator on MT7623 SoC and should also work on other similar Mediatek SoCs. Currently, the driver is already tested successfully with rng-tools. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Reviewed-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	dt-bindings: hwrng: Add Mediatek hardware random generator bindings	Sean Wang
	Document the devicetree bindings for Mediatek random number generator which could be found on MT7623 SoC or other similar Mediatek SoCs. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: crct10dif-vpmsum - Fix missing preempt_disable()	Michael Ellerman
	In crct10dif_vpmsum() we call enable_kernel_altivec() without first disabling preemption, which is not allowed. It used to be sufficient just to call pagefault_disable(), because that also disabled preemption. But the two were decoupled in commit 8222dbe21e79 ("sched/preempt, mm/fault: Decouple preemption from the page fault logic") in mid 2015. The crct10dif-vpmsum code inherited this bug from the crc32c-vpmsum code on which it was modelled. So add the missing preempt_disable/enable(). We should also call disable_kernel_fp(), although it does nothing by default, there is a debug switch to make it active and all enables should be paired with disables. Fixes: b01df1c16c9a ("crypto: powerpc - Add CRC-T10DIF acceleration") Acked-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: testmgr - replace compression known answer test	Giovanni Cabiddu
	Compression implementations might return valid outputs that do not match what specified in the test vectors. For this reason, the testmgr might report that a compression implementation failed the test even if the data produced by the compressor is correct. This implements a decompress-and-verify test for acomp compression tests rather than a known answer test. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: acomp - allow registration of multiple acomps	Giovanni Cabiddu
	Add crypto_register_acomps and crypto_unregister_acomps to allow the registration of multiple implementations with one call. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	hwrng: n2 - Use devm_kcalloc() in n2rng_probe()	Markus Elfring
	* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "devm_kcalloc". * Replace the specification of a data structure by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: chcr - Fix error handling related to 'chcr_alloc_shash'	Christophe Jaillet
	Up to now, 'crypto_alloc_shash()' may return a valid pointer, an error pointer or NULL (in case of invalid parameter) Update it to always return an error pointer in case of error. It now returns ERR_PTR(-EINVAL) instead of NULL in case of invalid parameter. This simplifies error handling. Also fix a crash in 'chcr_authenc_setkey()' if 'chcr_alloc_shash()' returns an error pointer and the "goto out" path is taken. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	padata: get_next is never NULL	Jason A. Donenfeld
	Per Dan's static checker warning, the code that returns NULL was removed in 2010, so this patch updates the comments and fixes the code assumptions. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: exynos - Add new Exynos RNG driver	Krzysztof Kozlowski
	Replace existing hw_ranndom/exynos-rng driver with a new, reworked one. This is a driver for pseudo random number generator block which on Exynos4 chipsets must be seeded with some value. On newer Exynos5420 chipsets it might seed itself from true random number generator block but this is not implemented yet. New driver is a complete rework to use the crypto ALGAPI instead of hw_random API. Rationale for the change: 1. hw_random interface is for true RNG devices. 2. The old driver was seeding itself with jiffies which is not a reliable source for randomness. 3. Device generates five random 32-bit numbers in each pass but old driver was returning only one 32-bit number thus its performance was reduced. Compatibility with DeviceTree bindings is preserved. New driver does not use runtime power management but manually enables and disables the clock when needed. This is preferred approach because using runtime PM just to toggle clock is huge overhead. Another difference is reseeding itself with generated random data periodically and during resuming from system suspend (previously driver was re-seeding itself again with jiffies). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stephan Müller <smueller@chronox.de> Reviewed-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	linux/kernel.h: Add ALIGN_DOWN macro	Krzysztof Kozlowski
	Few parts of kernel define their own macro for aligning down so provide a common define for this, with the same usage and assumptions as existing ALIGN. Convert also three existing implementations to this one. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: caam - fix error return code in caam_qi_init()	Wei Yongjun
	Fix to return error code -ENOMEM from the kmem_cache_create() error handling case instead of 0(err is 0 here), as done elsewhere in this function. Fixes: 67c2315def06 ("crypto: caam - add Queue Interface (QI) backend support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: chcr - Add fallback for AEAD algos	Harsh Jain
	Fallback to sw when I AAD length greater than 511 II Zero length payload II No of sg entries exceeds Request size. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: chcr - Fix txq ids.	Harsh Jain
	The patch fixes a critical issue to map txqid with flows on the hardware appropriately, if tx queues created are more than flows configured then txqid shall map within the range of hardware flows configured. This ensure that un-mapped txqid does not remain un-handled. The patch also segregated the rxqid and txqid for clarity. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: chcr - Set hmac_ctrl bit to use HW register HMAC_CFG[456]	Harsh Jain
	Use hmac_ctrl bit value saved in setauthsize callback. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	crypto: chcr - Increase priority of AEAD algos.	Harsh Jain
	templates(gcm,ccm etc) inherit priority value of driver to calculate its priority. In some cases template priority becomes more than driver priority for same algo. Without this patch we will not be able to use driver authenc algos. It will be good if it pushed in stable kernel. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-21	i2c: rcar: clarify PM handling with more comments	Wolfram Sang
	PM handling is correct but might be a bit subtle. Add some comments for clarification. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: rcar: fix resume by always initializing registers before transfer	Wolfram Sang
	Resume failed because of uninitialized registers. Instead of adding a resume callback, we simply initialize registers before every transfer. This lightweight change is more robust and will keep us safe if we ever need support for power domains or dynamic frequency changes. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: tegra: fix spelling mistake: "contoller" -> "controller"	Colin Ian King
	trivial fix to spelling mistake in MODULE_DESCRIPTION text Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: exynos5: use core helper to get driver data	Andrzej Hajda
	Driver core provides of_device_get_match_data which can be used to get driver data instead of custom helper. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: exynos5: de-duplicate error logs on clock setup	Andrzej Hajda
	In case of clock setup error it is enough to log it once. Moreover patch simplifies clock setup routines. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: exynos5: simplify clock frequency handling	Andrzej Hajda
	There is no need to keep separate settings for high and fast speed clock. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	i2c: exynos5: simplify timings calculation	Andrzej Hajda
	Instead of using cryptic loop direct calculation of timings can be used. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Andi Shyti <andi.shyti@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-04-21	powerpc/mm: Wire up ioremap_cache()	Oliver O'Halloran
	The default implementation of ioremap_cache() is aliased to ioremap(). On powerpc ioremap() creates cache-inhibited mappings by default which is almost certainly not what you wanted. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-21	KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK	Marcelo Tosatti
	The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK attempts to disable software suspend from causing "non atomic behaviour" of the operation: Add a helper function to compute the kernel time and convert nanoseconds back to CPU specific cycles. Note that these must not be called in preemptible context, as that would mean the kernel could enter software suspend state, which would cause non-atomic operation. However, assume the kernel can enter software suspend at the following 2 points: ktime_get_ts(&ts); 1. hypothetical_ktime_get_ts(&ts) monotonic_to_bootbased(&ts); 2. monotonic_to_bootbased() should be correct relative to a ktime_get_ts(&ts) performed after point 1 (that is after resuming from software suspend), hypothetical_ktime_get_ts() Therefore it is also correct for the ktime_get_ts(&ts) before point 1, which is ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) + time-to-execute-suspend-code Note CLOCK_MONOTONIC does not count during suspension. So remove the irq disablement, which causes the following warning on -RT kernels: With this reasoning, and the -RT bug that the irq disablement causes (because spin_lock is now a sleeping lock), remove the IRQ protection as it causes: [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m [ 1064.668110] INFO: lockdep is turned off. [ 1064.668110] irq event stamp: 0 [ 1064.668112] hardirqs last enabled at (0): [< (null)>] ) [ 1064.668116] hardirqs last disabled at (0): [] c0 [ 1064.668118] softirqs last enabled at (0): [] c0 [ 1064.668118] softirqs last disabled at (0): [< (null)>] ) [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1 [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5 [ 1064.668123] ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3 [ 1064.668125] ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0 [ 1064.668126] 00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0 [ 1064.668126] Call Trace: [ 1064.668132] [] dump_stack+0x19/0x1b [ 1064.668135] [] __might_sleep+0x12d/0x1f0 [ 1064.668138] [] rt_spin_lock+0x24/0x60 [ 1064.668155] [] __get_kvmclock_ns+0x36/0x110 [k] [ 1064.668159] [] ? futex_wait_queue_me+0x103/0x10 [ 1064.668171] [] kvm_arch_vm_ioctl+0xa2/0xd70 [k] [ 1064.668173] [] ? futex_wait+0x1ac/0x2a0 v2: notice get_kvmclock_ns with the same problem (Pankaj). v3: remove useless helper function (Pankaj). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-21	kvm: better MWAIT emulation for guests	Michael S. Tsirkin
	Guests that are heavy on futexes end up IPI'ing each other a lot. That can lead to significant slowdowns and latency increase for those guests when running within KVM. If only a single guest is needed on a host, we have a lot of spare host CPU time we can throw at the problem. Modern CPUs implement a feature called "MWAIT" which allows guests to wake up sleeping remote CPUs without an IPI - thus without an exit - at the expense of never going out of guest context. The decision whether this is something sensible to use should be up to the VM admin, so to user space. We can however allow MWAIT execution on systems that support it properly hardware wise. This patch adds a CAP to user space and a KVM cpuid leaf to indicate availability of native MWAIT execution. With that enabled, the worst a guest can do is waste as many cycles as a "jmp ." would do, so it's not a privilege problem. We consciously do not expose the feature in our CPUID bitmap, as most people will want to benefit from sleeping vCPUs to allow for over commit. Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [agraf: fix amd, change commit message] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>