summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-29RDMA/efa: Add EFA 0xefa1 PCI IDGal Pressman
Add support for 0xefa1 devices. Link: https://lore.kernel.org/r/20200722140312.3651-5-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/efa: User/kernel compatibility handshake mechanismGal Pressman
Introduce a mechanism that performs an handshake between the userspace provider and kernel driver which verifies that the user supports all required features in order to operate correctly. The handshake verifies the needed functionality by comparing the reported device caps and the provider caps. If the device reports a non-zero capability the appropriate comp mask is required from the userspace provider in order to allocate the context. Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/efa: Expose minimum SQ sizeGal Pressman
The device reports the minimum SQ size required for creation. This patch queries the min SQ size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/efa: Expose maximum TX doorbell batchGal Pressman
The device reports the maximum number of bytes to be written before ringing the doorbell (zero means unlimited). This patch queries the max batch size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29usb: typec: tcpm: Add WARN_ON ensure we are not trying to send 2 VDM packets ↵Hans de Goede
at the same time The tcpm.c code for sending VDMs assumes that there will only be one VDM in flight at the time. The "queue" used by tcpm_queue_vdm is only 1 entry deep. This assumes that the higher layers (tcpm state-machine and alt-mode drivers) ensure that queuing a new VDM before the old one has been completely send (or it timed out) add a WARN_ON to check for this. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-6-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29usb: typec: tcpm: Fix AB BA lock inversion between tcpm code and the ↵Hans de Goede
alt-mode drivers When we receive a PD data packet which ends up being for the alt-mode driver we have the following lock order: 1. tcpm_pd_rx_handler take the tcpm-port lock 2. We call into the alt-mode driver which takes the alt-mode's lock And when the alt-mode driver initiates communication we have the following lock order: 3. alt-mode driver takes the alt-mode's lock 4. alt-mode driver calls tcpm_altmode_enter which takes the tcpm-port lock This is a classic AB BA lock inversion issue. With the refactoring of tcpm_handle_vdm_request() done before this patch, we don't rely on, or need to make changes to the tcpm-port data by the time we make call 2. from above. All data to be passed to the alt-mode driver sits on our stack at this point, and thus does not need locking. So after the refactoring we can simply fix this by releasing the tcpm-port lock before calling into the alt-mode driver. This fixes the following lockdep warning: [ 191.454238] ====================================================== [ 191.454240] WARNING: possible circular locking dependency detected [ 191.454244] 5.8.0-rc5+ #1 Not tainted [ 191.454246] ------------------------------------------------------ [ 191.454248] kworker/u8:5/794 is trying to acquire lock: [ 191.454251] ffff9bac8e30d4a8 (&dp->lock){+.+.}-{3:3}, at: dp_altmode_vdm+0x30/0xf0 [typec_displayport] [ 191.454263] but task is already holding lock: [ 191.454264] ffff9bac9dc240a0 (&port->lock#2){+.+.}-{3:3}, at: tcpm_pd_rx_handler+0x43/0x12c0 [tcpm] [ 191.454273] which lock already depends on the new lock. [ 191.454275] the existing dependency chain (in reverse order) is: [ 191.454277] -> #1 (&port->lock#2){+.+.}-{3:3}: [ 191.454286] __mutex_lock+0x7b/0x820 [ 191.454290] tcpm_altmode_enter+0x23/0x90 [tcpm] [ 191.454293] dp_altmode_work+0xca/0xe0 [typec_displayport] [ 191.454299] process_one_work+0x23f/0x570 [ 191.454302] worker_thread+0x55/0x3c0 [ 191.454305] kthread+0x138/0x160 [ 191.454309] ret_from_fork+0x22/0x30 [ 191.454311] -> #0 (&dp->lock){+.+.}-{3:3}: [ 191.454317] __lock_acquire+0x1241/0x2090 [ 191.454320] lock_acquire+0xa4/0x3d0 [ 191.454323] __mutex_lock+0x7b/0x820 [ 191.454326] dp_altmode_vdm+0x30/0xf0 [typec_displayport] [ 191.454330] tcpm_pd_rx_handler+0x11ae/0x12c0 [tcpm] [ 191.454333] process_one_work+0x23f/0x570 [ 191.454336] worker_thread+0x55/0x3c0 [ 191.454338] kthread+0x138/0x160 [ 191.454341] ret_from_fork+0x22/0x30 [ 191.454343] other info that might help us debug this: [ 191.454345] Possible unsafe locking scenario: [ 191.454347] CPU0 CPU1 [ 191.454348] ---- ---- [ 191.454350] lock(&port->lock#2); [ 191.454353] lock(&dp->lock); [ 191.454355] lock(&port->lock#2); [ 191.454357] lock(&dp->lock); [ 191.454360] *** DEADLOCK *** Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-5-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29usb: typec: tcpm: Refactor tcpm_handle_vdm_requestHans de Goede
Refactor tcpm_handle_vdm_request and its tcpm_pd_svdm helper function so that reporting the results of the vdm to the altmode-driver is separated out into a clear separate step inside tcpm_handle_vdm_request, instead of being scattered over various places inside the tcpm_pd_svdm helper. This is a preparation patch for fixing an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29usb: typec: tcpm: Refactor tcpm_handle_vdm_request payload handlingHans de Goede
Refactor the tcpm_handle_vdm_request payload handling by doing the endianness conversion only once directly inside tcpm_handle_vdm_request itself instead of doing it multiple times inside various helper functions called by tcpm_handle_vdm_request. This is a preparation patch for some further refactoring to fix an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-3-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helperHans de Goede
Various callers (all the typec_altmode_ops) take the port-lock just for the tcpm_queue_vdm() call. Add a new tcpm_queue_vdm_unlocked() helper which takes the lock, so that its callers don't have to do this themselves. This is a preparation patch for fixing an AB BA lock inversion between the tcpm code and some altmode drivers. Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29usb: typec: tcpm: Move mod_delayed_work(&port->vdm_state_machine) call into ↵Hans de Goede
tcpm_queue_vdm() All callers of tcpm_queue_vdm() immediately follow the tcpm_queue_vdm() vdm call with a: mod_delayed_work(port->wq, &port->vdm_state_machine, 0); Call, fold this into tcpm_queue_vdm() itself. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200724174702.61754-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29IB/srpt: use new shared CQ mechanismYamin Friedman
Have the driver use shared CQs provided by the rdma core driver. This provides the advantage of improved efficiency handling interrupts. Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29IB/isert: use new shared CQ mechanismYamin Friedman
Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Previously when a single connection was opened a CQ was opened for every core with enough space for eight connections, this is a very large overhead that in most cases will not be utilized. Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29IB/iser: use new shared CQ mechanismYamin Friedman
Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Acked-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29staging/speakup: Move out of stagingSamuel Thibault
The nasty TODO items are done. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Link: https://lore.kernel.org/r/20200729003531.907370-1-samuel.thibault@ens-lyon.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29staging: sm750fb: use generic power managementVaibhav Gupta
Drivers using legacy power management .suspen()/.resume() callbacks have to manage PCI states and device's PM states themselves. They also need to take care of standard configuration registers. Switch to generic power management framework using a single "struct dev_pm_ops" variable to take the unnecessary load from the driver. This also avoids the need for the driver to directly call most of the PCI helper functions and device power state control functions, as through the generic framework PCI Core takes care of the necessary operations, and drivers are required to do only device-specific jobs. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Link: https://lore.kernel.org/r/20200728123349.1331679-1-vaibhavgupta40@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29Staging: rtl8712: Fixed a coding sytle issueAnkit Baluni
Removed braces for a 'if' condition as it contain only single line & there is no need for braces for such case according to coding style rules. Signed-off-by: Ankit Baluni <b18007@students.iitmandi.ac.in> Link: https://lore.kernel.org/r/20200729074541.1972-1-b18007@students.iitmandi.ac.in Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29staging: rtl8723bs: remove redundant assignment to variable retColin Ian King
The variable ret is being assigned an error return value that is never read, the control passes to a return statement and ret is never referenced. Remove the redundant assignment. Also remove an empty line. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200729100525.573500-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29staging: most: usb: remove NET dependencyChristian Gromm
This patch removes the dependency to NET as it is no longer needed. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Link: https://lore.kernel.org/r/1595927275-27462-1-git-send-email-christian.gromm@microchip.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29Merge tag 'usb-ci-v5.9-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-next Peter writes: ENDIAN issue fix and one query controller role API is introduced. * tag 'usb-ci-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb: usb: chipidea: imx: get available runtime dr mode for wakeup setting usb: chipidea: add query_available_role interface Documentation: ABI: usb: chipidea: Update Li Jun's e-mail usb: chipidea: udc: fix the ENDIAN issue
2020-07-29Documentation/sysctl: Document uclamp sysctl knobsQais Yousef
Uclamp exposes 3 sysctl knobs: * sched_util_clamp_min * sched_util_clamp_max * sched_util_clamp_min_rt_default Document them in sysctl/kernel.rst. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-3-qais.yousef@arm.com
2020-07-29sched/uclamp: Add a new sysctl to control RT default boost valueQais Yousef
RT tasks by default run at the highest capacity/performance level. When uclamp is selected this default behavior is retained by enforcing the requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum value. This is also referred to as 'the default boost value of RT tasks'. See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks"). On battery powered devices, it is desired to control this default (currently hardcoded) behavior at runtime to reduce energy consumed by RT tasks. For example, a mobile device manufacturer where big.LITTLE architecture is dominant, the performance of the little cores varies across SoCs, and on high end ones the big cores could be too power hungry. Given the diversity of SoCs, the new knob allows manufactures to tune the best performance/power for RT tasks for the particular hardware they run on. They could opt to further tune the value when the user selects a different power saving mode or when the device is actively charging. The runtime aspect of it further helps in creating a single kernel image that can be run on multiple devices that require different tuning. Keep in mind that a lot of RT tasks in the system are created by the kernel. On Android for instance I can see over 50 RT tasks, only a handful of which created by the Android framework. To control the default behavior globally by system admins and device integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default to change the default boost value of the RT tasks. I anticipate this to be mostly in the form of modifying the init script of a particular device. To avoid polluting the fast path with unnecessary code, the approach taken is to synchronously do the update by traversing all the existing tasks in the system. This could race with a concurrent fork(), which is dealt with by introducing sched_post_fork() function which will ensure the racy fork will get the right update applied. Tested on Juno-r2 in combination with the RT capacity awareness [1]. By default an RT task will go to the highest capacity CPU and run at the maximum frequency, which is particularly energy inefficient on high end mobile devices because the biggest core[s] are 'huge' and power hungry. With this patch the RT task can be controlled to run anywhere by default, and doesn't cause the frequency to be maximum all the time. Yet any task that really needs to be boosted can easily escape this default behavior by modifying its requested uclamp.min value (p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall. [1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
2020-07-29sched/uclamp: Fix a deadlock when enabling uclamp static keyQais Yousef
The following splat was caught when setting uclamp value of a task: BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49 cpus_read_lock+0x68/0x130 static_key_enable+0x1c/0x38 __sched_setscheduler+0x900/0xad8 Fix by ensuring we enable the key outside of the critical section in __sched_setscheduler() Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-4-qais.yousef@arm.com
2020-07-29ASoC: tlv320adcx140: Fix various style errors and warningsDan Murphy
Fix white space issues and remove else case where it was not needed. Convert "static const char *" to "static const char * const" Fixes: 689c7655b50 ("ASoC: tlv320adcx140: Add the tlv320adcx140 codec driver family") Signed-off-by: Dan Murphy <dmurphy@ti.com> Link: https://lore.kernel.org/r/20200728164339.16841-1-dmurphy@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-29sched,tracing: Convert to sched_set_fifo()Peter Zijlstra
One module user of sched_setscheduler() was overlooked and is obviously causing build failures. Convert ring_buffer_benchmark to use sched_set_fifo_low() when fifo==1 and sched_set_fifo() when fifo==2. This is a bit of an abuse, but it makes the thing 'work' again. Specifically, it enables all combinations that were previously possible: producer higher than consumer consumer higher than producer Fixes: 616d91b68cd5 ("sched: Remove sched_setscheduler*() EXPORTs") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20200720214918.GM5523@worktop.programming.kicks-ass.net
2020-07-29ALSA: hda: fix NULL pointer dereference during suspendRanjani Sridharan
When the ASoC card registration fails and the codec component driver never probes, the codec device is not initialized and therefore memory for codec->wcaps is not allocated. This results in a NULL pointer dereference when the codec driver suspend callback is invoked during system suspend. Fix this by returning without performing any actions during codec suspend/resume if the card was not registered successfully. Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20200728231011.1454066-1-ranjani.sridharan@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-07-29habanalabs: goya_ctx_init() can be statickernel test robot
Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200729000313.GA14680@e442e3f624c4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29habanalabs: fix up absolute include instructionsGreg Kroah-Hartman
There's no need to try to be cute with the include file locations in the Makefile, so just specify exactly where the files are. Bonus is this fixes the problem of building with O= as well as trying to just build the subdirectory alone. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Ben Segal <bpsegal20@gmail.com> Cc: Christine Gharzuzi <cgharzuzi@habana.ai> Cc: Pawel Piskorski <ppiskorski@habana.ai> Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29nvme: add a Identify Namespace Identification Descriptor list quirkChristoph Hellwig
Add a quirk for a device that does not support the Identify Namespace Identification Descriptor list despite claiming 1.3 compliance. Fixes: ea43d9709f72 ("nvme: fix identify error status silent ignore") Reported-by: Ingo Brunberg <ingo_brunberg@web.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Ingo Brunberg <ingo_brunberg@web.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-29nvme-loop: remove extra variable in create ctrlChaitanya Kulkarni
We can call the nvme_change_ctrl_state() directly and have WARN_ON_ONCE(1) call instead of having to use an extra variable which matches the name of the function. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-loop: set ctrl state connecting after initChaitanya Kulkarni
When creating a loop controller (ctrl) in nvme_loop_create_ctrl() -> nvme_init_ctrl() we set the ctrl state to NVME_CTRL_NEW. Prior to [1] NVME_CTRL_NEW state was allowed in nvmf_check_ready() for fabrics command type connect. Now, this fails in the following code path for fabrics connect command when creating admin queue :- nvme_loop_create_ctrl() nvme_loo_configure_admin_queue() nvmf_connect_admin_queue() __nvme_submit_sync_cmd() blk_execute_rq() nvme_loop_queue_rq() nvmf_check_ready() # echo "transport=loop,nqn=fs" > /dev/nvme-fabrics [ 6047.741327] nvmet: adding nsid 1 to subsystem fs [ 6048.756430] nvme nvme1: Connect command failed, error wo/DNR bit: 880 We need to set the ctrl state to NVME_CTRL_CONNECTING after :- nvme_loop_create_ctrl() nvme_init_ctrl() so that the above mentioned check for nvmf_check_ready() will return true. This patch sets the ctrl state to connecting after we init the ctrl in nvme_loop_create_ctrl() nvme_init_ctrl() . [1] commit aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Fixes: aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-multipath: do not fall back to __nvme_find_path() for non-optimized pathsHannes Reinecke
When nvme_round_robin_path() finds a valid namespace we should be using it; falling back to __nvme_find_path() for non-optimized paths will cause the result from nvme_round_robin_path() to be ignored for non-optimized paths. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-multipath: fix logic for non-optimized pathsMartin Wilck
Handle the special case where we have exactly one optimized path, which we should keep using in this case. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-rdma: fix controller reset hang during trafficSagi Grimberg
commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-tcp: fix controller reset hang during trafficSagi Grimberg
commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: introduce the passthru Kconfig optionChaitanya Kulkarni
This patch updates KConfig file for the NVMeOF target where we add new option so that user can selectively enable/disable passthru code. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [logang@deltatee.com: fixed some of the wording in the help message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: introduce the passthru configfs interfaceLogan Gunthorpe
When CONFIG_NVME_TARGET_PASSTHRU as 'passthru' directory will be added to each subsystem. The directory is similar to a namespace and has two attributes: device_path and enable. The user must set the path to the nvme controller's char device and write '1' to enable the subsystem to use passthru. Any given subsystem is prevented from enabling both a regular namespace and the passthru device. If one is enabled, enabling the other will produce an error. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: Add passthru enable/disable helpersLogan Gunthorpe
This patch adds helper functions which are used in the NVMeOF configfs when the user is configuring the passthru subsystem. Here we ensure that only one subsys is assigned to each nvme_ctrl by using an xarray on the cntlid. The subsystem's version number is overridden by the passed through controller's version. However, if that version is less than 1.2.1, then we bump the advertised version to that and print a warning in dmesg. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: add passthru code to process commandsLogan Gunthorpe
Add passthru command handling capability for the NVMeOF target and export passthru APIs which are used to integrate passthru code with nvmet-core. The new file passthru.c handles passthru cmd parsing and execution. In the passthru mode, we create a block layer request from the nvmet request and map the data on to the block layer request. Admin commands and features are on an allow list as there are a number of each that don't make too much sense with passthrough. We use an allow list such that new commands can be considered before being blindly passed through. In both cases, vendor specific commands are always allowed. We also reject reservation IO commands as the underlying device cannot differentiate between multiple hosts behind a fabric. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: export nvme_find_get_ns() and nvme_put_ns()Logan Gunthorpe
nvme_find_get_ns() and nvme_put_ns() are required by the target passthru code and are exported under the NVME_TARGET_PASSTHRU namespace. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: introduce nvme_ctrl_get_by_path()Logan Gunthorpe
nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). It makes use of filp_open() to open the file and uses the private data to obtain a pointer to the struct nvme_ctrl. If the fops of the file do not match, -EINVAL is returned. The purpose of this function is to support NVMe-OF target passthru and is exported under the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: introduce nvme_execute_passthru_rq to call nvme_passthru_[start|end]()Logan Gunthorpe
Introduce a new nvme_execute_passthru_rq() helper which calls nvme_passthru_[start|end]() around blk_execute_rq(). This ensures all passthru calls (including nvme_submit_io()) will be wrapped appropriately. nvme_execute_passthru_rq() will also be useful for the nvmet passthru code and is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: create helper function to obtain command effectsLogan Gunthorpe
Separate the code to obtain command effects from the code to start a passthru request and move the nvme_passthru_start() and nvme_passthru_end() functions up above nvme_submit_user_cmd() in order that they may be used in a new helper a subsequent patch. The new helper function will be necessary for nvmet passthru code to determine if we need to change out of interrupt context to handle the effects. It is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: clear any SGL flags in passthru commandsLogan Gunthorpe
The host driver should decide whether to use SGLs or PRPs and they currently assume the flags are cleared after the call to nvme_setup_cmd(). However, passed-through commands may erroneously set these bits; so clear them for all cases. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet-fc: remove redundant del_work_active flagJames Smart
The transport has a del_work_active flag to avoid duplicate scheduling of the del_work item. This is redundant with the checks that schedule_work() makes. Remove the del_work_active flag. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet-fc: check successful reference in nvmet_fc_find_target_assocJames Smart
When searching for an association based on an association id, when there is a match, the code takes a reference. However, it is not validating that the reference taking was successful. Check the status of the reference. If unsuccessful, the device is being deleted and should be ignored. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-fc: set max_segments to lldd max valueJames Smart
Currently the FC transport is set max_hw_sectors based on the lldds max sgl segment count. However, the block queue max segments is set based on the controller's max_segments count, which the transport does not set. As such, the lldd is receiving sgl lists that are exceeding its max segment count. Set the controller max segment count and derive max_hw_sectors from the max segment count. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-fc: drop a duplicated word in a commentRandy Dunlap
Drop the repeated word "a" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-hwmon: log the controller device nameSagi Grimberg
Stay consistent with the rest of the driver Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: fix deadlock in disconnect during scan_work and/or ana_workSagi Grimberg
A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f214d ("nvme: add ANA support") Reported-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: document nvme controller statesSagi Grimberg
We are starting to see some non-trivial states so lets start documenting them. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>