summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2017-02-19md/raid1: fix a use-after-free bugShaohua Li
Commit fd76863 (RAID1: a new I/O barrier implementation to remove resync window) introduces a user-after-free bug. Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-19RAID1: avoid unnecessary spin locks in I/O barrier codecolyli@suse.de
When I run a parallel reading performan testing on a md raid1 device with two NVMe SSDs, I observe very bad throughput in supprise: by fio with 64KB block size, 40 seq read I/O jobs, 128 iodepth, overall throughput is only 2.7GB/s, this is around 50% of the idea performance number. The perf reports locking contention happens at allow_barrier() and wait_barrier() code, - 41.41% fio [kernel.kallsyms] [k] _raw_spin_lock_irqsave - _raw_spin_lock_irqsave + 89.92% allow_barrier + 9.34% __wake_up - 37.30% fio [kernel.kallsyms] [k] _raw_spin_lock_irq - _raw_spin_lock_irq - 100.00% wait_barrier The reason is, in these I/O barrier related functions, - raise_barrier() - lower_barrier() - wait_barrier() - allow_barrier() They always hold conf->resync_lock firstly, even there are only regular reading I/Os and no resync I/O at all. This is a huge performance penalty. The solution is a lockless-like algorithm in I/O barrier code, and only holding conf->resync_lock when it has to. The original idea is from Hannes Reinecke, and Neil Brown provides comments to improve it. I continue to work on it, and make the patch into current form. In the new simpler raid1 I/O barrier implementation, there are two wait barrier functions, - wait_barrier() Which calls _wait_barrier(), is used for regular write I/O. If there is resync I/O happening on the same I/O barrier bucket, or the whole array is frozen, task will wait until no barrier on same barrier bucket, or the whold array is unfreezed. - wait_read_barrier() Since regular read I/O won't interfere with resync I/O (read_balance() will make sure only uptodate data will be read out), it is unnecessary to wait for barrier in regular read I/Os, waiting in only necessary when the whole array is frozen. The operations on conf->nr_pending[idx], conf->nr_waiting[idx], conf-> barrier[idx] are very carefully designed in raise_barrier(), lower_barrier(), _wait_barrier() and wait_read_barrier(), in order to avoid unnecessary spin locks in these functions. Once conf-> nr_pengding[idx] is increased, a resync I/O with same barrier bucket index has to wait in raise_barrier(). Then in _wait_barrier() if no barrier raised in same barrier bucket index and array is not frozen, the regular I/O doesn't need to hold conf->resync_lock, it can just increase conf->nr_pending[idx], and return to its caller. wait_read_barrier() is very similar to _wait_barrier(), the only difference is it only waits when array is frozen. For heavy parallel reading I/Os, the lockless I/O barrier code almostly gets rid of all spin lock cost. This patch significantly improves raid1 reading peroformance. From my testing, a raid1 device built by two NVMe SSD, runs fio with 64KB blocksize, 40 seq read I/O jobs, 128 iodepth, overall throughput increases from 2.7GB/s to 4.6GB/s (+70%). Changelog V4: - Change conf->nr_queued[] to atomic_t. - Define BARRIER_BUCKETS_NR_BITS by (PAGE_SHIFT - ilog2(sizeof(atomic_t))) V3: - Add smp_mb__after_atomic() as Shaohua and Neil suggested. - Change conf->nr_queued[] from atomic_t to int. - Change conf->array_frozen from atomic_t back to int, and use READ_ONCE(conf->array_frozen) to check value of conf->array_frozen in _wait_barrier() and wait_read_barrier(). - In _wait_barrier() and wait_read_barrier(), add a call to wake_up(&conf->wait_barrier) after atomic_dec(&conf->nr_pending[idx]), to fix a deadlock between _wait_barrier()/wait_read_barrier and freeze_array(). V2: - Remove a spin_lock/unlock pair in raid1d(). - Add more code comments to explain why there is no racy when checking two atomic_t variables at same time. V1: - Original RFC patch for comments. Signed-off-by: Coly Li <colyli@suse.de> Cc: Shaohua Li <shli@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Guoqing Jiang <gqjiang@suse.com> Reviewed-by: Neil Brown <neilb@suse.de> Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-19RAID1: a new I/O barrier implementation to remove resync windowcolyli@suse.de
'Commit 79ef3a8aa1cb ("raid1: Rewrite the implementation of iobarrier.")' introduces a sliding resync window for raid1 I/O barrier, this idea limits I/O barriers to happen only inside a slidingresync window, for regular I/Os out of this resync window they don't need to wait for barrier any more. On large raid1 device, it helps a lot to improve parallel writing I/O throughput when there are background resync I/Os performing at same time. The idea of sliding resync widow is awesome, but code complexity is a challenge. Sliding resync window requires several variables to work collectively, this is complexed and very hard to make it work correctly. Just grep "Fixes: 79ef3a8aa1" in kernel git log, there are 8 more patches to fix the original resync window patch. This is not the end, any further related modification may easily introduce more regreassion. Therefore I decide to implement a much simpler raid1 I/O barrier, by removing resync window code, I believe life will be much easier. The brief idea of the simpler barrier is, - Do not maintain a global unique resync window - Use multiple hash buckets to reduce I/O barrier conflicts, regular I/O only has to wait for a resync I/O when both them have same barrier bucket index, vice versa. - I/O barrier can be reduced to an acceptable number if there are enough barrier buckets Here I explain how the barrier buckets are designed, - BARRIER_UNIT_SECTOR_SIZE The whole LBA address space of a raid1 device is divided into multiple barrier units, by the size of BARRIER_UNIT_SECTOR_SIZE. Bio requests won't go across border of barrier unit size, that means maximum bio size is BARRIER_UNIT_SECTOR_SIZE<<9 (64MB) in bytes. For random I/O 64MB is large enough for both read and write requests, for sequential I/O considering underlying block layer may merge them into larger requests, 64MB is still good enough. Neil also points out that for resync operation, "we want the resync to move from region to region fairly quickly so that the slowness caused by having to synchronize with the resync is averaged out over a fairly small time frame". For full speed resync, 64MB should take less then 1 second. When resync is competing with other I/O, it could take up a few minutes. Therefore 64MB size is fairly good range for resync. - BARRIER_BUCKETS_NR There are BARRIER_BUCKETS_NR buckets in total, which is defined by, #define BARRIER_BUCKETS_NR_BITS (PAGE_SHIFT - 2) #define BARRIER_BUCKETS_NR (1<<BARRIER_BUCKETS_NR_BITS) this patch makes the bellowed members of struct r1conf from integer to array of integers, - int nr_pending; - int nr_waiting; - int nr_queued; - int barrier; + int *nr_pending; + int *nr_waiting; + int *nr_queued; + int *barrier; number of the array elements is defined as BARRIER_BUCKETS_NR. For 4KB kernel space page size, (PAGE_SHIFT - 2) indecates there are 1024 I/O barrier buckets, and each array of integers occupies single memory page. 1024 means for a request which is smaller than the I/O barrier unit size has ~0.1% chance to wait for resync to pause, which is quite a small enough fraction. Also requesting single memory page is more friendly to kernel page allocator than larger memory size. - I/O barrier bucket is indexed by bio start sector If multiple I/O requests hit different I/O barrier units, they only need to compete I/O barrier with other I/Os which hit the same I/O barrier bucket index with each other. The index of a barrier bucket which a bio should look for is calculated by sector_to_idx() which is defined in raid1.h as an inline function, static inline int sector_to_idx(sector_t sector) { return hash_long(sector >> BARRIER_UNIT_SECTOR_BITS, BARRIER_BUCKETS_NR_BITS); } Here sector_nr is the start sector number of a bio. - Single bio won't go across boundary of a I/O barrier unit If a request goes across boundary of barrier unit, it will be split. A bio may be split in raid1_make_request() or raid1_sync_request(), if sectors returned by align_to_barrier_unit_end() is smaller than original bio size. Comparing to single sliding resync window, - Currently resync I/O grows linearly, therefore regular and resync I/O will conflict within a single barrier units. So the I/O behavior is similar to single sliding resync window. - But a barrier unit bucket is shared by all barrier units with identical barrier uinit index, the probability of conflict might be higher than single sliding resync window, in condition that writing I/Os always hit barrier units which have identical barrier bucket indexs with the resync I/Os. This is a very rare condition in real I/O work loads, I cannot imagine how it could happen in practice. - Therefore we can achieve a good enough low conflict rate with much simpler barrier algorithm and implementation. There are two changes should be noticed, - In raid1d(), I change the code to decrease conf->nr_pending[idx] into single loop, it looks like this, spin_lock_irqsave(&conf->device_lock, flags); conf->nr_queued[idx]--; spin_unlock_irqrestore(&conf->device_lock, flags); This change generates more spin lock operations, but in next patch of this patch set, it will be replaced by a single line code, atomic_dec(&conf->nr_queueud[idx]); So we don't need to worry about spin lock cost here. - Mainline raid1 code split original raid1_make_request() into raid1_read_request() and raid1_write_request(). If the original bio goes across an I/O barrier unit size, this bio will be split before calling raid1_read_request() or raid1_write_request(), this change the code logic more simple and clear. - In this patch wait_barrier() is moved from raid1_make_request() to raid1_write_request(). In raid_read_request(), original wait_barrier() is replaced by raid1_read_request(). The differnece is wait_read_barrier() only waits if array is frozen, using different barrier function in different code path makes the code more clean and easy to read. Changelog V4: - Add alloc_r1bio() to remove redundant r1bio memory allocation code. - Fix many typos in patch comments. - Use (PAGE_SHIFT - ilog2(sizeof(int))) to define BARRIER_BUCKETS_NR_BITS. V3: - Rebase the patch against latest upstream kernel code. - Many fixes by review comments from Neil, - Back to use pointers to replace arraries in struct r1conf - Remove total_barriers from struct r1conf - Add more patch comments to explain how/why the values of BARRIER_UNIT_SECTOR_SIZE and BARRIER_BUCKETS_NR are decided. - Use get_unqueued_pending() to replace get_all_pendings() and get_all_queued() - Increase bucket number from 512 to 1024 - Change code comments format by review from Shaohua. V2: - Use bio_split() to split the orignal bio if it goes across barrier unit bounday, to make the code more simple, by suggestion from Shaohua and Neil. - Use hash_long() to replace original linear hash, to avoid a possible confilict between resync I/O and sequential write I/O, by suggestion from Shaohua. - Add conf->total_barriers to record barrier depth, which is used to control number of parallel sync I/O barriers, by suggestion from Shaohua. - In V1 patch the bellowed barrier buckets related members in r1conf are allocated in memory page. To make the code more simple, V2 patch moves the memory space into struct r1conf, like this, - int nr_pending; - int nr_waiting; - int nr_queued; - int barrier; + int nr_pending[BARRIER_BUCKETS_NR]; + int nr_waiting[BARRIER_BUCKETS_NR]; + int nr_queued[BARRIER_BUCKETS_NR]; + int barrier[BARRIER_BUCKETS_NR]; This change is by the suggestion from Shaohua. - Remove some inrelavent code comments, by suggestion from Guoqing. - Add a missing wait_barrier() before jumping to retry_write, in raid1_make_write_request(). V1: - Original RFC patch for comments Signed-off-by: Coly Li <colyli@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Guoqing Jiang <gqjiang@suse.com> Reviewed-by: Neil Brown <neilb@suse.de> Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-19of_mdio: Add "broadcom,bcm5241" to the whitelist.David Daney
Some Cavium dev boards have firmware which doesn't supply a proper ethernet-phy-ieee802.3-c22" compatible property. Restore these boards to working order by whitelisting this compatible value. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19net: ethernet: stmmac: dwmac-rk: Add RK3328 gmac supportdavid.wu
Add constants and callback functions for the dwmac on rk3328 socs. As can be seen, the base structure is the same, only registers and the bits in them moved slightly. Signed-off-by: david.wu <david.wu@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19virtio-net: batch stats updatingJason Wang
We already have counters for sent/recv packets and sent/recv bytes. Doing a batched update to reduce the number of u64_stats_update_begin/end(). Take care not to bother with stats update when called speculatively. Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19mlx4: fix potential divide by 0 in mlx4_en_auto_moderation()Eric Dumazet
1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high, mlx4_en_auto_moderation() does a divide by zero. 2) We want to properly change the moderation parameters if rx_frames was changed (like in ethtool -C eth0 rx-frames 16) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqsThomas Falcon
After sending device capability queries and requests to the vNIC Server, an interrupt is triggered and the responses are written to the driver's CRQ response buffer. Since the interrupt can be triggered before all responses are written and visible to the partition, there is a danger that the interrupt handler or tasklet can terminate before all responses are read, resulting in a failure to initialize the device. To avoid this scenario, when capability commands are sent, we set a flag that will be checked in the following interrupt tasklet that will handle the capability responses from the server. Once all responses have been handled, the flag is disabled; and the tasklet is allowed to terminate. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19ibmvnic: Use common counter for capabilities checksThomas Falcon
Two different counters were being used for capabilities requests and queries. These commands are not called at the same time so there is no reason a single counter cannot be used. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19ibmvnic: Handle processing of CRQ messages in a taskletThomas Falcon
Create a tasklet to process queued commands or messages received from firmware instead of processing them in the interrupt handler. Note that this handler does not process network traffic, but communications related to resource allocation and device settings. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19qed: Add support for hardware offloaded FCoE.Arun Easi
This adds the backbone required for the various HW initalizations which are necessary for the FCoE driver (qedf) for QLogic FastLinQ 4xxxx line of adapters - FW notification, resource initializations, etc. Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19Fix missing sanity check in /dev/sgAl Viro
What happens is that a write to /dev/sg is given a request with non-zero ->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing to an array full of empty iovecs. Having write permission to /dev/sg shouldn't be equivalent to the ability to trigger BUG_ON() while holding spinlocks... Found by Dmitry Vyukov and syzkaller. [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the underlying issue. - Linus ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-19scsi: don't BUG_ON() empty DMA transfersJohannes Thumshirn
Don't crash the machine just because of an empty transfer. Use WARN_ON() combined with returning an error. Found by Dmitry Vyukov and syzkaller. [ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON() might still be a cause of excessive log spamming. NOTE! If this warning ever triggers, we may end up leaking resources, since this doesn't bother to try to clean the command up. So this WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is much worse. People really need to stop using BUG_ON() for "this shouldn't ever happen". It makes pretty much any bug worse. - Linus ] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-19Merge remote-tracking branches 'spi/topic/ti-qspi' and ↵Mark Brown
'spi/topic/topcliff-pch' into spi-next
2017-02-19Merge remote-tracking branches 'spi/topic/rockchip', 'spi/topic/rspi', ↵Mark Brown
'spi/topic/s3c64xx', 'spi/topic/sh-msiof' and 'spi/topic/slave' into spi-next
2017-02-19Merge remote-tracking branches 'spi/topic/imx', 'spi/topic/lantiq-ssc', ↵Mark Brown
'spi/topic/mpc52xx', 'spi/topic/ppc4xx' and 'spi/topic/pxa2xx' into spi-next
2017-02-19Merge remote-tracking branches 'spi/topic/dw', 'spi/topic/ep93xx', ↵Mark Brown
'spi/topic/falcon' and 'spi/topic/fsl-lpspi' into spi-next
2017-02-19Merge remote-tracking branches 'spi/topic/armada', 'spi/topic/ath79', ↵Mark Brown
'spi/topic/bcm-qspi' and 'spi/topic/bcm53xx' into spi-next
2017-02-19Merge remote-tracking branch 'spi/topic/dma' into spi-nextMark Brown
2017-02-19Merge remote-tracking branch 'spi/topic/core' into spi-nextMark Brown
2017-02-19Merge remote-tracking branches 'spi/fix/pxa2xx', 'spi/fix/rspi' and ↵Mark Brown
'spi/fix/s3c64xx' into spi-linus
2017-02-19Merge remote-tracking branches 'regulator/topic/s2mpa01', ↵Mark Brown
'regulator/topic/supplies' and 'regulator/topic/tps65217' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/pv88080', ↵Mark Brown
'regulator/topic/pv88090', 'regulator/topic/qcom-smd', 'regulator/topic/rc5t583' and 'regulator/topic/rn5t618' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/pbias', ↵Mark Brown
'regulator/topic/pcap', 'regulator/topic/pcf50633', 'regulator/topic/pfuze100' and 'regulator/topic/pv88060' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/max77802', ↵Mark Brown
'regulator/topic/max8907', 'regulator/topic/max8925', 'regulator/topic/max8952' and 'regulator/topic/palmas' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/ltc3676', ↵Mark Brown
'regulator/topic/max14577', 'regulator/topic/max77620', 'regulator/topic/max77686' and 'regulator/topic/max77693' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/cpcap', ↵Mark Brown
'regulator/topic/fan53555', 'regulator/topic/gpio', 'regulator/topic/hi655x' and 'regulator/topic/lp8755' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/anatop', ↵Mark Brown
'regulator/topic/arizona', 'regulator/topic/as3711' and 'regulator/topic/bcm590xx' into regulator-next
2017-02-19Merge remote-tracking branches 'regulator/topic/88pm800', ↵Mark Brown
'regulator/topic/88pm8607', 'regulator/topic/aat2870', 'regulator/topic/act8945a' and 'regulator/topic/ad5938' into regulator-next
2017-02-19Merge remote-tracking branch 'regulator/topic/core' into regulator-nextMark Brown
2017-02-19Merge remote-tracking branches 'regulator/fix/debugfs' and ↵Mark Brown
'regulator/fix/tps65086' into regulator-linus
2017-02-19Merge remote-tracking branch 'regulator/fix/core' into regulator-linusMark Brown
2017-02-19Merge remote-tracking branches 'asoc/topic/max9867', 'asoc/topic/mtk', ↵Mark Brown
'asoc/topic/nau8540', 'asoc/topic/nau8825' and 'asoc/topic/omap' into asoc-next
2017-02-19Merge remote-tracking branches 'asoc/topic/atmel', 'asoc/topic/chmap', ↵Mark Brown
'asoc/topic/cq93vc' and 'asoc/topic/da7218' into asoc-next
2017-02-19spi: spi-ti-qspi: Fix error handlingChristophe JAILLET
'dma_request_chan_by_mask()' can not return NULL. Try to keep the logic in 'no_dma:' by resetting 'qspi->rx_chan' in case of error. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-02-19spi: spi-ti-qspi: Fix error handlingChristophe JAILLET
'dma_request_chan_by_mask()' can not return NULL. Try to keep the logic in 'no_dma:' by resetting 'qspi->rx_chan' in case of error. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-02-19spi: lantiq-ssc: activate under COMPILE_TESTHauke Mehrtens
This driver should compile on all platforms, activate it under compile test. The Lantiq specific parts are under ifdef and should be removed when Lantiq platform supports common clock framework. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-02-19spi: armada-3700: Remove spi_master_put in a3700_spi_remove()Wei Yongjun
The call to spi_master_put() in a3700_spi_remove() is redundant since the master is registered using devm_spi_register_master() and no reference hold by using spi_master_get() in a3700_spi_remove(). This is detected by Coccinelle semantic patch. Fixes: 5762ab71eb24 ("spi: Add support for Armada 3700 SPI Controller") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-02-19IB/srp: Drain the send queue before destroying a QPBart Van Assche
A quote from the IB spec: However, if the Consumer does not wait for the Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment leakage may occur. Therefore, it is good programming practice to tear down a QP that is associated with an SRQ by using the following process: * Put the QP in the Error State; * wait for the Affiliated Asynchronous Last WQE Reached Event; * either: * drain the CQ by invoking the Poll CQ verb and either wait for CQ to be empty or the number of Poll CQ operations has exceeded CQ capacity size; or * post another WR that completes on the same CQ and wait for this WR to return as a WC; * and then invoke a Destroy QP or Reset QP. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/core: Add support for draining IB_POLL_DIRECT completion queuesBart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/srp: Improve an error pathBart Van Assche
Avoid that the following message is printed if login fails: scsi host0: ib_srp: Sending CM DREQ failed Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/srp: Make a diagnostic message more informativeBart Van Assche
Report the destination port GID if connecting fails. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/srp: Document locking conventionsBart Van Assche
Use lockdep_assert_held() statements to verify at run-time whether the proper locks are held. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/srp: Fix race conditions related to task managementBart Van Assche
Avoid that srp_process_rsp() overwrites the status information in ch if the SRP target response timed out and processing of another task management function has already started. Avoid that issuing multiple task management functions concurrently triggers list corruption. This patch prevents that the following stack trace appears in the system log: WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0 list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68 CPU: 8 PID: 9269 Comm: sg_reset Tainted: G W 4.10.0-rc7-dbg+ #3 Call Trace: dump_stack+0x68/0x93 __warn+0xc6/0xe0 warn_slowpath_fmt+0x4a/0x50 __list_del_entry_valid+0xbc/0xc0 wait_for_completion_timeout+0x12e/0x170 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp] srp_reset_device+0x5b/0x110 [ib_srp] scsi_ioctl_reset+0x1c7/0x290 scsi_ioctl+0x12a/0x420 sd_ioctl+0x9d/0x100 blkdev_ioctl+0x51e/0x9f0 block_ioctl+0x38/0x40 do_vfs_ioctl+0x8f/0x700 SyS_ioctl+0x3c/0x70 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/srp: Avoid that duplicate responses trigger a kernel bugBart Van Assche
After srp_process_rsp() returns there is a short time during which the scsi_host_find_tag() call will return a pointer to the SCSI command that is being completed. If during that time a duplicate response is received, avoid that the following call stack appears: BUG: unable to handle kernel NULL pointer dereference at (null) IP: srp_recv_done+0x450/0x6b0 [ib_srp] Oops: 0000 [#1] SMP CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1 Call Trace: <IRQ> __ib_process_cq+0x4b/0xd0 [ib_core] ib_poll_handler+0x1d/0x70 [ib_core] irq_poll_softirq+0xba/0x120 __do_softirq+0xba/0x4c0 irq_exit+0xbe/0xd0 smp_apic_timer_interrupt+0x38/0x50 apic_timer_interrupt+0x90/0xa0 </IRQ> RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Cc: <stable@vger.kernel.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/SRP: Avoid using IB_MR_TYPE_SG_GAPSBart Van Assche
Tests have shown that the following error message is reported when using SG-GAPS registration with an mlx5 adapter: scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 2500002a ad9fafd1 scsi host1: ib_srp: reconnect succeeded mlx5_0:dump_cqe:262:(pid 7369): dump error cqe 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 25000032 00105dd0 scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138 Hence avoid using SG-GAPS memory registrations. Additionally, always configure the blk_queue_virt_boundary() to avoid to trigger a mapping failure when using adapters that support SG-GAPS (e.g. mlx5). Fixes: commit ad8e66b4a801 ("IB/srp: fix mr allocation when the device supports sg gaps") Fixes: commit 509c5f33f4f6 ("IB/srp: Prevent mapping failures") Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mark Bloch <markb@mellanox.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Cc: <stable@vger.kernel.org> # 4.7+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19RDMA/qedr: Fix some error handlingChristophe Jaillet
'qedr_alloc_pbl_tbl()' can not return NULL. In qedr_init_user_queue(): - simplify the test for the return value, no need to test for NULL - propagate the error pointer if needed, otherwise 0 (success) is returned. This is spurious. In init_mr_info(): - test the return value with IS_ERR - propagate the error pointer if needed instead of an exlictit -ENOMEM. This is a no-op as the only error pointer that we can have here is already -ENOMEM Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19RDMA/bnxt_re: add DCB dependencyArnd Bergmann
When CONFIG_DCB is disabled, we get a link error: drivers/infiniband/built-in.o: In function `bnxt_re_setup_qos': trace.c:(.text+0x155774): undefined reference to `dcb_ieee_getapp_mask' trace.c:(.text+0x155774): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask' trace.c:(.text+0x155794): undefined reference to `dcb_ieee_getapp_mask' trace.c:(.text+0x155794): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask' Like the other drivers that use this function, a Kconfig dependency is the correct fix. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/hns: include linux/module.hArnd Bergmann
I ran into a build error on arm64 randconfig testing: infiniband/hw/hns/hns_roce_main.c:539:1: error: data definition has no type or storage class [-Werror] infiniband/hw/hns/hns_roce_main.c:539:1: error: type defaults to 'int' in declaration of 'MODULE_DEVICE_TABLE' [-Werror=implicit-int] infiniband/hw/hns/hns_roce_main.c:539:1: error: parameter names (without types) in function declaration [-Werror] infiniband/hw/hns/hns_roce_main.c:979:226: error: data definition has no type or storage class [-Werror] infiniband/hw/hns/hns_roce_main.c:979:226: error: type defaults to 'int' in declaration of 'module_init' [-Werror=implicit-int] infiniband/hw/hns/hns_roce_main.c:979:1: error: parameter names (without types) in function declaration [-Werror] Including the module.h makes it build again. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>