linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-07-29	spi: lpspi: remove unused fsl_lpspi->chipselect	Clark Wang
	The cs-gpio is initailized by spi_get_gpio_descs() now. Remove the chipselect. Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Link: https://lore.kernel.org/r/20200727031448.31661-3-xiaoning.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-29	spi: lpspi: Fix kernel warning dump when probe fail after calling spi_register	Clark Wang
	Calling devm_spi_register_controller() too early will cause problem. When probe failed occurs after calling devm_spi_register_controller(), the call of spi_controller_put() will trigger the following warning dump. [ 2.092138] ------------[ cut here ]------------ [ 2.096876] kernfs: can not remove 'uevent', no directory [ 2.102440] WARNING: CPU: 0 PID: 181 at fs/kernfs/dir.c:1503 kernfs_remove_by_name_ns+0xa0/0xb0 [ 2.111142] Modules linked in: [ 2.114207] CPU: 0 PID: 181 Comm: kworker/0:7 Not tainted 5.4.24-05024-g775c6e8a738c-dirty #1314 [ 2.122991] Hardware name: Freescale i.MX8DXL EVK (DT) [ 2.128141] Workqueue: events deferred_probe_work_func [ 2.133281] pstate: 60000005 (nZCv daif -PAN -UAO) [ 2.138076] pc : kernfs_remove_by_name_ns+0xa0/0xb0 [ 2.142958] lr : kernfs_remove_by_name_ns+0xa0/0xb0 [ 2.147837] sp : ffff8000122bba70 [ 2.151145] x29: ffff8000122bba70 x28: ffff8000119d6000 [ 2.156462] x27: 0000000000000000 x26: ffff800011edbce8 [ 2.161779] x25: 0000000000000000 x24: ffff00003ae4f700 [ 2.167096] x23: ffff000010184c10 x22: ffff00003a3d6200 [ 2.172412] x21: ffff800011a464a8 x20: ffff000010126a68 [ 2.177729] x19: ffff00003ae5c800 x18: 000000000000000e [ 2.183046] x17: 0000000000000001 x16: 0000000000000019 [ 2.188362] x15: 0000000000000004 x14: 000000000000004c [ 2.193679] x13: 0000000000000000 x12: 0000000000000001 [ 2.198996] x11: 0000000000000000 x10: 00000000000009c0 [ 2.204313] x9 : ffff8000122bb7a0 x8 : ffff00003a3d6c20 [ 2.209630] x7 : ffff00003a3d6380 x6 : 0000000000000001 [ 2.214946] x5 : 0000000000000001 x4 : ffff00003a05eb18 [ 2.220263] x3 : 0000000000000005 x2 : ffff8000119f1c48 [ 2.225580] x1 : 2bcbda323bf5a800 x0 : 0000000000000000 [ 2.230898] Call trace: [ 2.233345] kernfs_remove_by_name_ns+0xa0/0xb0 [ 2.237879] sysfs_remove_file_ns+0x14/0x20 [ 2.242065] device_del+0x12c/0x348 [ 2.245555] device_unregister+0x14/0x30 [ 2.249492] spi_unregister_controller+0xac/0x120 [ 2.254201] devm_spi_unregister+0x10/0x18 [ 2.258304] release_nodes+0x1a8/0x220 [ 2.262055] devres_release_all+0x34/0x58 [ 2.266069] really_probe+0x1b8/0x318 [ 2.269733] driver_probe_device+0x54/0xe8 [ 2.273833] __device_attach_driver+0x80/0xb8 [ 2.278194] bus_for_each_drv+0x74/0xc0 [ 2.282034] __device_attach+0xdc/0x138 [ 2.285876] device_initial_probe+0x10/0x18 [ 2.290063] bus_probe_device+0x90/0x98 [ 2.293901] deferred_probe_work_func+0x64/0x98 [ 2.298442] process_one_work+0x198/0x320 [ 2.302451] worker_thread+0x1f0/0x420 [ 2.306208] kthread+0xf0/0x120 [ 2.309352] ret_from_fork+0x10/0x18 [ 2.312927] ---[ end trace 58abcdfae01bd3c7 ]--- So put this function at the end of the probe sequence. Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Link: https://lore.kernel.org/r/20200727031448.31661-2-xiaoning.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-29	RDMA/efa: Add EFA 0xefa1 PCI ID	Gal Pressman
	Add support for 0xefa1 devices. Link: https://lore.kernel.org/r/20200722140312.3651-5-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	RDMA/efa: User/kernel compatibility handshake mechanism	Gal Pressman
	Introduce a mechanism that performs an handshake between the userspace provider and kernel driver which verifies that the user supports all required features in order to operate correctly. The handshake verifies the needed functionality by comparing the reported device caps and the provider caps. If the device reports a non-zero capability the appropriate comp mask is required from the userspace provider in order to allocate the context. Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	RDMA/efa: Expose minimum SQ size	Gal Pressman
	The device reports the minimum SQ size required for creation. This patch queries the min SQ size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	RDMA/efa: Expose maximum TX doorbell batch	Gal Pressman
	The device reports the maximum number of bytes to be written before ringing the doorbell (zero means unlimited). This patch queries the max batch size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	usb: typec: tcpm: Add WARN_ON ensure we are not trying to send 2 VDM packets ↵	Hans de Goede
	at the same time The tcpm.c code for sending VDMs assumes that there will only be one VDM in flight at the time. The "queue" used by tcpm_queue_vdm is only 1 entry deep. This assumes that the higher layers (tcpm state-machine and alt-mode drivers) ensure that queuing a new VDM before the old one has been completely send (or it timed out) add a WARN_ON to check for this. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-6-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	usb: typec: tcpm: Fix AB BA lock inversion between tcpm code and the ↵	Hans de Goede
	alt-mode drivers When we receive a PD data packet which ends up being for the alt-mode driver we have the following lock order: 1. tcpm_pd_rx_handler take the tcpm-port lock 2. We call into the alt-mode driver which takes the alt-mode's lock And when the alt-mode driver initiates communication we have the following lock order: 3. alt-mode driver takes the alt-mode's lock 4. alt-mode driver calls tcpm_altmode_enter which takes the tcpm-port lock This is a classic AB BA lock inversion issue. With the refactoring of tcpm_handle_vdm_request() done before this patch, we don't rely on, or need to make changes to the tcpm-port data by the time we make call 2. from above. All data to be passed to the alt-mode driver sits on our stack at this point, and thus does not need locking. So after the refactoring we can simply fix this by releasing the tcpm-port lock before calling into the alt-mode driver. This fixes the following lockdep warning: [ 191.454238] ====================================================== [ 191.454240] WARNING: possible circular locking dependency detected [ 191.454244] 5.8.0-rc5+ #1 Not tainted [ 191.454246] ------------------------------------------------------ [ 191.454248] kworker/u8:5/794 is trying to acquire lock: [ 191.454251] ffff9bac8e30d4a8 (&dp->lock){+.+.}-{3:3}, at: dp_altmode_vdm+0x30/0xf0 [typec_displayport] [ 191.454263] but task is already holding lock: [ 191.454264] ffff9bac9dc240a0 (&port->lock#2){+.+.}-{3:3}, at: tcpm_pd_rx_handler+0x43/0x12c0 [tcpm] [ 191.454273] which lock already depends on the new lock. [ 191.454275] the existing dependency chain (in reverse order) is: [ 191.454277] -> #1 (&port->lock#2){+.+.}-{3:3}: [ 191.454286] __mutex_lock+0x7b/0x820 [ 191.454290] tcpm_altmode_enter+0x23/0x90 [tcpm] [ 191.454293] dp_altmode_work+0xca/0xe0 [typec_displayport] [ 191.454299] process_one_work+0x23f/0x570 [ 191.454302] worker_thread+0x55/0x3c0 [ 191.454305] kthread+0x138/0x160 [ 191.454309] ret_from_fork+0x22/0x30 [ 191.454311] -> #0 (&dp->lock){+.+.}-{3:3}: [ 191.454317] __lock_acquire+0x1241/0x2090 [ 191.454320] lock_acquire+0xa4/0x3d0 [ 191.454323] __mutex_lock+0x7b/0x820 [ 191.454326] dp_altmode_vdm+0x30/0xf0 [typec_displayport] [ 191.454330] tcpm_pd_rx_handler+0x11ae/0x12c0 [tcpm] [ 191.454333] process_one_work+0x23f/0x570 [ 191.454336] worker_thread+0x55/0x3c0 [ 191.454338] kthread+0x138/0x160 [ 191.454341] ret_from_fork+0x22/0x30 [ 191.454343] other info that might help us debug this: [ 191.454345] Possible unsafe locking scenario: [ 191.454347] CPU0 CPU1 [ 191.454348] ---- ---- [ 191.454350] lock(&port->lock#2); [ 191.454353] lock(&dp->lock); [ 191.454355] lock(&port->lock#2); [ 191.454357] lock(&dp->lock); [ 191.454360] * DEADLOCK * Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-5-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	usb: typec: tcpm: Refactor tcpm_handle_vdm_request	Hans de Goede
	Refactor tcpm_handle_vdm_request and its tcpm_pd_svdm helper function so that reporting the results of the vdm to the altmode-driver is separated out into a clear separate step inside tcpm_handle_vdm_request, instead of being scattered over various places inside the tcpm_pd_svdm helper. This is a preparation patch for fixing an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	usb: typec: tcpm: Refactor tcpm_handle_vdm_request payload handling	Hans de Goede
	Refactor the tcpm_handle_vdm_request payload handling by doing the endianness conversion only once directly inside tcpm_handle_vdm_request itself instead of doing it multiple times inside various helper functions called by tcpm_handle_vdm_request. This is a preparation patch for some further refactoring to fix an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-3-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper	Hans de Goede
	Various callers (all the typec_altmode_ops) take the port-lock just for the tcpm_queue_vdm() call. Add a new tcpm_queue_vdm_unlocked() helper which takes the lock, so that its callers don't have to do this themselves. This is a preparation patch for fixing an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	usb: typec: tcpm: Move mod_delayed_work(&port->vdm_state_machine) call into ↵	Hans de Goede
	tcpm_queue_vdm() All callers of tcpm_queue_vdm() immediately follow the tcpm_queue_vdm() vdm call with a: mod_delayed_work(port->wq, &port->vdm_state_machine, 0); Call, fold this into tcpm_queue_vdm() itself. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	IB/srpt: use new shared CQ mechanism	Yamin Friedman
	Have the driver use shared CQs provided by the rdma core driver. This provides the advantage of improved efficiency handling interrupts. Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	IB/isert: use new shared CQ mechanism	Yamin Friedman
	Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Previously when a single connection was opened a CQ was opened for every core with enough space for eight connections, this is a very large overhead that in most cases will not be utilized. Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	IB/iser: use new shared CQ mechanism	Yamin Friedman
	Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Acked-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29	staging/speakup: Move out of staging	Samuel Thibault
	The nasty TODO items are done. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Link: https://lore.kernel.org/r/20200729003531.907370-1-samuel.thibault@ens-lyon.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	staging: sm750fb: use generic power management	Vaibhav Gupta
	Drivers using legacy power management .suspen()/.resume() callbacks have to manage PCI states and device's PM states themselves. They also need to take care of standard configuration registers. Switch to generic power management framework using a single "struct dev_pm_ops" variable to take the unnecessary load from the driver. This also avoids the need for the driver to directly call most of the PCI helper functions and device power state control functions, as through the generic framework PCI Core takes care of the necessary operations, and drivers are required to do only device-specific jobs. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Link: https://lore.kernel.org/r/20200728123349.1331679-1-vaibhavgupta40@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	Staging: rtl8712: Fixed a coding sytle issue	Ankit Baluni
	Removed braces for a 'if' condition as it contain only single line & there is no need for braces for such case according to coding style rules. Signed-off-by: Ankit Baluni <b18007@students.iitmandi.ac.in> Link: https://lore.kernel.org/r/20200729074541.1972-1-b18007@students.iitmandi.ac.in Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	staging: rtl8723bs: remove redundant assignment to variable ret	Colin Ian King
	The variable ret is being assigned an error return value that is never read, the control passes to a return statement and ret is never referenced. Remove the redundant assignment. Also remove an empty line. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200729100525.573500-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	staging: most: usb: remove NET dependency	Christian Gromm
	This patch removes the dependency to NET as it is no longer needed. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Link: https://lore.kernel.org/r/1595927275-27462-1-git-send-email-christian.gromm@microchip.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	Merge tag 'usb-ci-v5.9-rc1' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-next Peter writes: ENDIAN issue fix and one query controller role API is introduced. * tag 'usb-ci-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb: usb: chipidea: imx: get available runtime dr mode for wakeup setting usb: chipidea: add query_available_role interface Documentation: ABI: usb: chipidea: Update Li Jun's e-mail usb: chipidea: udc: fix the ENDIAN issue
2020-07-29	thermal: core: Add thermal zone enable/disable notification	Daniel Lezcano
	Now the calls to enable/disable a thermal zone are centralized in a call to a function, we can add in these the corresponding netlink notifications. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200727231033.26512-1-daniel.lezcano@linaro.org
2020-07-29	habanalabs: goya_ctx_init() can be static	kernel test robot
	Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200729000313.GA14680@e442e3f624c4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	habanalabs: fix up absolute include instructions	Greg Kroah-Hartman
	There's no need to try to be cute with the include file locations in the Makefile, so just specify exactly where the files are. Bonus is this fixes the problem of building with O= as well as trying to just build the subdirectory alone. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Ben Segal <bpsegal20@gmail.com> Cc: Christine Gharzuzi <cgharzuzi@habana.ai> Cc: Pawel Piskorski <ppiskorski@habana.ai> Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	nvme: add a Identify Namespace Identification Descriptor list quirk	Christoph Hellwig
	Add a quirk for a device that does not support the Identify Namespace Identification Descriptor list despite claiming 1.3 compliance. Fixes: ea43d9709f72 ("nvme: fix identify error status silent ignore") Reported-by: Ingo Brunberg <ingo_brunberg@web.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Ingo Brunberg <ingo_brunberg@web.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-29	nvme-loop: remove extra variable in create ctrl	Chaitanya Kulkarni
	We can call the nvme_change_ctrl_state() directly and have WARN_ON_ONCE(1) call instead of having to use an extra variable which matches the name of the function. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-loop: set ctrl state connecting after init	Chaitanya Kulkarni
	When creating a loop controller (ctrl) in nvme_loop_create_ctrl() -> nvme_init_ctrl() we set the ctrl state to NVME_CTRL_NEW. Prior to [1] NVME_CTRL_NEW state was allowed in nvmf_check_ready() for fabrics command type connect. Now, this fails in the following code path for fabrics connect command when creating admin queue :- nvme_loop_create_ctrl() nvme_loo_configure_admin_queue() nvmf_connect_admin_queue() __nvme_submit_sync_cmd() blk_execute_rq() nvme_loop_queue_rq() nvmf_check_ready() # echo "transport=loop,nqn=fs" > /dev/nvme-fabrics [ 6047.741327] nvmet: adding nsid 1 to subsystem fs [ 6048.756430] nvme nvme1: Connect command failed, error wo/DNR bit: 880 We need to set the ctrl state to NVME_CTRL_CONNECTING after :- nvme_loop_create_ctrl() nvme_init_ctrl() so that the above mentioned check for nvmf_check_ready() will return true. This patch sets the ctrl state to connecting after we init the ctrl in nvme_loop_create_ctrl() nvme_init_ctrl() . [1] commit aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Fixes: aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths	Hannes Reinecke
	When nvme_round_robin_path() finds a valid namespace we should be using it; falling back to __nvme_find_path() for non-optimized paths will cause the result from nvme_round_robin_path() to be ignored for non-optimized paths. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-multipath: fix logic for non-optimized paths	Martin Wilck
	Handle the special case where we have exactly one optimized path, which we should keep using in this case. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-rdma: fix controller reset hang during traffic	Sagi Grimberg
	commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-tcp: fix controller reset hang during traffic	Sagi Grimberg
	commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet: introduce the passthru Kconfig option	Chaitanya Kulkarni
	This patch updates KConfig file for the NVMeOF target where we add new option so that user can selectively enable/disable passthru code. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [logang@deltatee.com: fixed some of the wording in the help message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet: introduce the passthru configfs interface	Logan Gunthorpe
	When CONFIG_NVME_TARGET_PASSTHRU as 'passthru' directory will be added to each subsystem. The directory is similar to a namespace and has two attributes: device_path and enable. The user must set the path to the nvme controller's char device and write '1' to enable the subsystem to use passthru. Any given subsystem is prevented from enabling both a regular namespace and the passthru device. If one is enabled, enabling the other will produce an error. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet: Add passthru enable/disable helpers	Logan Gunthorpe
	This patch adds helper functions which are used in the NVMeOF configfs when the user is configuring the passthru subsystem. Here we ensure that only one subsys is assigned to each nvme_ctrl by using an xarray on the cntlid. The subsystem's version number is overridden by the passed through controller's version. However, if that version is less than 1.2.1, then we bump the advertised version to that and print a warning in dmesg. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet: add passthru code to process commands	Logan Gunthorpe
	Add passthru command handling capability for the NVMeOF target and export passthru APIs which are used to integrate passthru code with nvmet-core. The new file passthru.c handles passthru cmd parsing and execution. In the passthru mode, we create a block layer request from the nvmet request and map the data on to the block layer request. Admin commands and features are on an allow list as there are a number of each that don't make too much sense with passthrough. We use an allow list such that new commands can be considered before being blindly passed through. In both cases, vendor specific commands are always allowed. We also reject reservation IO commands as the underlying device cannot differentiate between multiple hosts behind a fabric. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: export nvme_find_get_ns() and nvme_put_ns()	Logan Gunthorpe
	nvme_find_get_ns() and nvme_put_ns() are required by the target passthru code and are exported under the NVME_TARGET_PASSTHRU namespace. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: introduce nvme_ctrl_get_by_path()	Logan Gunthorpe
	nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). It makes use of filp_open() to open the file and uses the private data to obtain a pointer to the struct nvme_ctrl. If the fops of the file do not match, -EINVAL is returned. The purpose of this function is to support NVMe-OF target passthru and is exported under the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: introduce nvme_execute_passthru_rq to call nvme_passthru_[start\|end]()	Logan Gunthorpe
	Introduce a new nvme_execute_passthru_rq() helper which calls nvme_passthru_[start\|end]() around blk_execute_rq(). This ensures all passthru calls (including nvme_submit_io()) will be wrapped appropriately. nvme_execute_passthru_rq() will also be useful for the nvmet passthru code and is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: create helper function to obtain command effects	Logan Gunthorpe
	Separate the code to obtain command effects from the code to start a passthru request and move the nvme_passthru_start() and nvme_passthru_end() functions up above nvme_submit_user_cmd() in order that they may be used in a new helper a subsequent patch. The new helper function will be necessary for nvmet passthru code to determine if we need to change out of interrupt context to handle the effects. It is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: clear any SGL flags in passthru commands	Logan Gunthorpe
	The host driver should decide whether to use SGLs or PRPs and they currently assume the flags are cleared after the call to nvme_setup_cmd(). However, passed-through commands may erroneously set these bits; so clear them for all cases. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet-fc: remove redundant del_work_active flag	James Smart
	The transport has a del_work_active flag to avoid duplicate scheduling of the del_work item. This is redundant with the checks that schedule_work() makes. Remove the del_work_active flag. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet-fc: check successful reference in nvmet_fc_find_target_assoc	James Smart
	When searching for an association based on an association id, when there is a match, the code takes a reference. However, it is not validating that the reference taking was successful. Check the status of the reference. If unsuccessful, the device is being deleted and should be ignored. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-fc: set max_segments to lldd max value	James Smart
	Currently the FC transport is set max_hw_sectors based on the lldds max sgl segment count. However, the block queue max segments is set based on the controller's max_segments count, which the transport does not set. As such, the lldd is receiving sgl lists that are exceeding its max segment count. Set the controller max segment count and derive max_hw_sectors from the max segment count. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-hwmon: log the controller device name	Sagi Grimberg
	Stay consistent with the rest of the driver Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: fix deadlock in disconnect during scan_work and/or ana_work	Sagi Grimberg
	A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f214d ("nvme: add ANA support") Reported-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme: document nvme controller states	Sagi Grimberg
	We are starting to see some non-trivial states so lets start documenting them. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet: use xarray for ctrl ns storing	Chaitanya Kulkarni
	This patch replaces the ctrl->namespaces tracking from linked list to xarray and improves the performance when accessing one namespce :- XArray vs Default:- IOPS and BW (more the better) increase BW (~1.8%):- --------------------------------------------------- XArray :- read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) read: IOPS=162k, BW=631MiB/s (662MB/s)(18.5GiB/30001msec) Default:- read: IOPS=156k, BW=609MiB/s (639MB/s)(17.8GiB/30001msec) read: IOPS=157k, BW=613MiB/s (643MB/s)(17.0GiB/30001msec) read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) Submission latency (less the better) decrease (~8.3%):- ------------------------------------------------------- XArray:- slat (usec): min=7, max=8386, avg=11.19, stdev=5.96 slat (usec): min=7, max=441, avg=11.09, stdev=4.48 slat (usec): min=7, max=1088, avg=11.21, stdev=4.54 Default :- slat (usec): min=8, max=2826.5k, avg=23.96, stdev=3911.50 slat (usec): min=8, max=503, avg=12.52, stdev=5.07 slat (usec): min=8, max=2384, avg=12.50, stdev=5.28 CPU Usage (less the better) decrease (~5.2%):- ---------------------------------------------- XArray:- cpu : usr=1.84%, sys=18.61%, ctx=949471, majf=0, minf=250 cpu : usr=1.83%, sys=18.41%, ctx=950262, majf=0, minf=237 cpu : usr=1.82%, sys=18.82%, ctx=957224, majf=0, minf=234 Default:- cpu : usr=1.70%, sys=19.21%, ctx=858196, majf=0, minf=251 cpu : usr=1.82%, sys=19.98%, ctx=929720, majf=0, minf=227 cpu : usr=1.83%, sys=20.33%, ctx=947208, majf=0, minf=235. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvmet-rdma: use new shared CQ mechanism	Yamin Friedman
	Has the driver use shared CQs providing ~10%-20% improvement when multiple disks are used. Instead of opening a CQ for each QP per controller, a CQ for each core will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-rdma: use new shared CQ mechanism	Yamin Friedman
	Has the driver use shared CQs providing ~10%-20% improvement as seen in the patch introducing shared CQs. Instead of opening a CQ for each QP per controller connected, a CQ for each QP will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29	nvme-pci: add support for ACPI StorageD3Enable property	David E. Box
	This patch implements a solution for a BIOS hack used on some currently shipping Intel systems to change driver power management policy for PCIe NVMe drives. Some newer Intel platforms, like some Comet Lake systems, require that PCIe devices use D3 when doing suspend-to-idle in order to allow the platform to realize maximum power savings. This is particularly needed to support ATX power supply shutdown on desktop systems. In order to ensure this happens for root ports with storage devices, Microsoft apparently created this ACPI _DSD property as a way to influence their driver policy. To my knowledge this property has not been discussed with the NVME specification body. Though the solution is not ideal, it addresses a problem that also affects Linux since the NVMe driver's default policy of using NVMe APST during suspend-to-idle prevents the PCI root port from going to D3 and leads to higher power consumption for these platforms. The power consumption difference may be negligible on laptop systems, but many watts on desktop systems when the ATX power supply is blocked from powering down. The patch creates a new nvme_acpi_storage_d3 function to check for the StorageD3Enable property during probe and enables D3 as a quirk if set. It also provides a 'noacpi' module parameter to allow skipping the quirk if needed. Tested with: - PM961 NVMe SED Samsung 512GB - INTEL SSDPEKKF512G8 Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>