summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-29nvmet: introduce the passthru Kconfig optionChaitanya Kulkarni
This patch updates KConfig file for the NVMeOF target where we add new option so that user can selectively enable/disable passthru code. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [logang@deltatee.com: fixed some of the wording in the help message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: introduce the passthru configfs interfaceLogan Gunthorpe
When CONFIG_NVME_TARGET_PASSTHRU as 'passthru' directory will be added to each subsystem. The directory is similar to a namespace and has two attributes: device_path and enable. The user must set the path to the nvme controller's char device and write '1' to enable the subsystem to use passthru. Any given subsystem is prevented from enabling both a regular namespace and the passthru device. If one is enabled, enabling the other will produce an error. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: Add passthru enable/disable helpersLogan Gunthorpe
This patch adds helper functions which are used in the NVMeOF configfs when the user is configuring the passthru subsystem. Here we ensure that only one subsys is assigned to each nvme_ctrl by using an xarray on the cntlid. The subsystem's version number is overridden by the passed through controller's version. However, if that version is less than 1.2.1, then we bump the advertised version to that and print a warning in dmesg. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: add passthru code to process commandsLogan Gunthorpe
Add passthru command handling capability for the NVMeOF target and export passthru APIs which are used to integrate passthru code with nvmet-core. The new file passthru.c handles passthru cmd parsing and execution. In the passthru mode, we create a block layer request from the nvmet request and map the data on to the block layer request. Admin commands and features are on an allow list as there are a number of each that don't make too much sense with passthrough. We use an allow list such that new commands can be considered before being blindly passed through. In both cases, vendor specific commands are always allowed. We also reject reservation IO commands as the underlying device cannot differentiate between multiple hosts behind a fabric. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: export nvme_find_get_ns() and nvme_put_ns()Logan Gunthorpe
nvme_find_get_ns() and nvme_put_ns() are required by the target passthru code and are exported under the NVME_TARGET_PASSTHRU namespace. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: introduce nvme_ctrl_get_by_path()Logan Gunthorpe
nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). It makes use of filp_open() to open the file and uses the private data to obtain a pointer to the struct nvme_ctrl. If the fops of the file do not match, -EINVAL is returned. The purpose of this function is to support NVMe-OF target passthru and is exported under the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: introduce nvme_execute_passthru_rq to call nvme_passthru_[start|end]()Logan Gunthorpe
Introduce a new nvme_execute_passthru_rq() helper which calls nvme_passthru_[start|end]() around blk_execute_rq(). This ensures all passthru calls (including nvme_submit_io()) will be wrapped appropriately. nvme_execute_passthru_rq() will also be useful for the nvmet passthru code and is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: create helper function to obtain command effectsLogan Gunthorpe
Separate the code to obtain command effects from the code to start a passthru request and move the nvme_passthru_start() and nvme_passthru_end() functions up above nvme_submit_user_cmd() in order that they may be used in a new helper a subsequent patch. The new helper function will be necessary for nvmet passthru code to determine if we need to change out of interrupt context to handle the effects. It is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: clear any SGL flags in passthru commandsLogan Gunthorpe
The host driver should decide whether to use SGLs or PRPs and they currently assume the flags are cleared after the call to nvme_setup_cmd(). However, passed-through commands may erroneously set these bits; so clear them for all cases. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet-fc: remove redundant del_work_active flagJames Smart
The transport has a del_work_active flag to avoid duplicate scheduling of the del_work item. This is redundant with the checks that schedule_work() makes. Remove the del_work_active flag. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet-fc: check successful reference in nvmet_fc_find_target_assocJames Smart
When searching for an association based on an association id, when there is a match, the code takes a reference. However, it is not validating that the reference taking was successful. Check the status of the reference. If unsuccessful, the device is being deleted and should be ignored. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-fc: set max_segments to lldd max valueJames Smart
Currently the FC transport is set max_hw_sectors based on the lldds max sgl segment count. However, the block queue max segments is set based on the controller's max_segments count, which the transport does not set. As such, the lldd is receiving sgl lists that are exceeding its max segment count. Set the controller max segment count and derive max_hw_sectors from the max segment count. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-fc: drop a duplicated word in a commentRandy Dunlap
Drop the repeated word "a" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-hwmon: log the controller device nameSagi Grimberg
Stay consistent with the rest of the driver Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: fix deadlock in disconnect during scan_work and/or ana_workSagi Grimberg
A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f214d ("nvme: add ANA support") Reported-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: document nvme controller statesSagi Grimberg
We are starting to see some non-trivial states so lets start documenting them. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: use xarray for ctrl ns storingChaitanya Kulkarni
This patch replaces the ctrl->namespaces tracking from linked list to xarray and improves the performance when accessing one namespce :- XArray vs Default:- IOPS and BW (more the better) increase BW (~1.8%):- --------------------------------------------------- XArray :- read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) read: IOPS=162k, BW=631MiB/s (662MB/s)(18.5GiB/30001msec) Default:- read: IOPS=156k, BW=609MiB/s (639MB/s)(17.8GiB/30001msec) read: IOPS=157k, BW=613MiB/s (643MB/s)(17.0GiB/30001msec) read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec) Submission latency (less the better) decrease (~8.3%):- ------------------------------------------------------- XArray:- slat (usec): min=7, max=8386, avg=11.19, stdev=5.96 slat (usec): min=7, max=441, avg=11.09, stdev=4.48 slat (usec): min=7, max=1088, avg=11.21, stdev=4.54 Default :- slat (usec): min=8, max=2826.5k, avg=23.96, stdev=3911.50 slat (usec): min=8, max=503, avg=12.52, stdev=5.07 slat (usec): min=8, max=2384, avg=12.50, stdev=5.28 CPU Usage (less the better) decrease (~5.2%):- ---------------------------------------------- XArray:- cpu : usr=1.84%, sys=18.61%, ctx=949471, majf=0, minf=250 cpu : usr=1.83%, sys=18.41%, ctx=950262, majf=0, minf=237 cpu : usr=1.82%, sys=18.82%, ctx=957224, majf=0, minf=234 Default:- cpu : usr=1.70%, sys=19.21%, ctx=858196, majf=0, minf=251 cpu : usr=1.82%, sys=19.98%, ctx=929720, majf=0, minf=227 cpu : usr=1.83%, sys=20.33%, ctx=947208, majf=0, minf=235. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet-rdma: use new shared CQ mechanismYamin Friedman
Has the driver use shared CQs providing ~10%-20% improvement when multiple disks are used. Instead of opening a CQ for each QP per controller, a CQ for each core will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-rdma: use new shared CQ mechanismYamin Friedman
Has the driver use shared CQs providing ~10%-20% improvement as seen in the patch introducing shared CQs. Instead of opening a CQ for each QP per controller connected, a CQ for each QP will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-pci: add support for ACPI StorageD3Enable propertyDavid E. Box
This patch implements a solution for a BIOS hack used on some currently shipping Intel systems to change driver power management policy for PCIe NVMe drives. Some newer Intel platforms, like some Comet Lake systems, require that PCIe devices use D3 when doing suspend-to-idle in order to allow the platform to realize maximum power savings. This is particularly needed to support ATX power supply shutdown on desktop systems. In order to ensure this happens for root ports with storage devices, Microsoft apparently created this ACPI _DSD property as a way to influence their driver policy. To my knowledge this property has not been discussed with the NVME specification body. Though the solution is not ideal, it addresses a problem that also affects Linux since the NVMe driver's default policy of using NVMe APST during suspend-to-idle prevents the PCI root port from going to D3 and leads to higher power consumption for these platforms. The power consumption difference may be negligible on laptop systems, but many watts on desktop systems when the ATX power supply is blocked from powering down. The patch creates a new nvme_acpi_storage_d3 function to check for the StorageD3Enable property during probe and enables D3 as a quirk if set. It also provides a 'noacpi' module parameter to allow skipping the quirk if needed. Tested with: - PM961 NVMe SED Samsung 512GB - INTEL SSDPEKKF512G8 Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-pci: use max of PRP or SGL for iod sizeChaitanya Kulkarni
>From the initial implementation of NVMe SGL kernel support commit a7a7cbe353a5 ("nvme-pci: add SGL support") with addition of the commit 943e942e6266 ("nvme-pci: limit max IO size and segments to avoid high order allocations") now there is only caller left for nvme_pci_iod_alloc_size() which statically passes true for last parameter that calculates allocation size based on SGL since we need size of biggest command supported for mempool allocation. This patch modifies the helper functions nvme_pci_iod_alloc_size() such that it is now uses maximum of PRP and SGL size for iod allocation size calculation. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-core: replace ctrl page size with a macroChaitanya Kulkarni
Saving the nvme controller's page size was from a time when the driver tried to use different sized pages, but this value is always set to a constant, and has been this way for some time. Remove the 'page_size' field and replace its usage with the constant value. This also lets the compiler make some micro-optimizations in the io path, and that's always a good thing. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: remove redundant validation in nvme_start_ctrl()Baolin Wang
We've already validated the 'kato' in nvme_start_keep_alive(), thus no need to validate it again in nvme_start_ctrl(). Remove it. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme: remove an unnecessary conditionDan Carpenter
"v" is an unsigned int so it can't be more than UINT_MAX. Removing this check makes it easier to preserve the error code as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29Merge tag 'drm-misc-fixes-2020-07-28' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes * drm: fix possible use-after-free * dbi: fix SPI Type 1 transfer * drm_fb_helper: use memcpy_io on bochs' sparc64 * mcde: fix stability * panel: fix display noise on auo,kd101n80-45na * panel: delay HPD checks for boe_nv133fhm_n61 * bridge: drop connector check in nwl-dsi bridge * bridge: set proper bridge type for adv7511 * of: fix a double free Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200728110446.GA8076@linux-uq9g
2020-07-28Documentation: bareudp: Corrected description of bareudp module.Martin Varghese
Removed redundant words. Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28bareudp: forbid mixing IP and MPLS in multiproto modeGuillaume Nault
In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and IPv6 packets regardless of the bareudp ethertype. In practice, this let an IP tunnel send multicast MPLS packets, or an MPLS tunnel send IPv6 packets. We need to restrict the test further, so that the multiproto mode only enables * IPv6 for IPv4 tunnels, * or multicast MPLS for unicast MPLS tunnels. To improve clarity, the protocol validation is moved to its own function, where each logical test has its own condition. v2: s/ntohs/htons/ Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28ipv6: Fix nexthop refcnt leak when creating ipv6 route infoXiyu Yang
ip6_route_info_create() invokes nexthop_get(), which increases the refcount of the "nh". When ip6_route_info_create() returns, local variable "nh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of ip6_route_info_create(). When nexthops can not be used with source routing, the function forgets to decrease the refcnt increased by nexthop_get(), causing a refcnt leak. Fix this issue by pulling up the error source routing handling when nexthops can not be used with source routing. Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver'David S. Miller
Subbaraya Sundeep says: ==================== Fix bugs in Octeontx2 netdev driver There are problems in the existing Octeontx2 netdev drivers like missing cancel_work for the reset task, missing lock in reset task and missing unergister_netdev in driver remove. This patch set fixes the above problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28octeontx2-pf: Unregister netdev at driver removeSubbaraya Sundeep
Added unregister_netdev in the driver remove function. Generally unregister_netdev is called after disabling all the device interrupts but here it is called before disabling device mailbox interrupts. The reason behind this is VF needs mailbox interrupt to communicate with its PF to clean up its resources during otx2_stop. otx2_stop disables packet I/O and queue interrupts first and by using mailbox interrupt communicates to PF to free VF resources. Hence this patch calls unregister_device just before disabling mailbox interrupts. Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28octeontx2-pf: cancel reset_task workSubbaraya Sundeep
During driver exit cancel the queued reset_task work in VF driver. Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28octeontx2-pf: Fix reset_task bugsSubbaraya Sundeep
Two bugs exist in the code related to reset_task in PF driver one is the missing protection against network stack ndo_open and ndo_close. Other one is the missing cancel_work. This patch fixes those problems. Fixes: 4ff7d1488a84 ("octeontx2-pf: Error handling support") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28mlx4: disable device on shutdownJakub Kicinski
It appears that not disabling a PCI device on .shutdown may lead to a Hardware Error with particular (perhaps buggy) BIOS versions: mlx4_en: eth0: Close port called mlx4_en 0000:04:00.0: removed PHC reboot: Restarting system {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 {1}[Hardware Error]: event severity: fatal {1}[Hardware Error]: Error 0, type: fatal {1}[Hardware Error]: section_type: PCIe error {1}[Hardware Error]: port_type: 4, root port {1}[Hardware Error]: version: 1.16 {1}[Hardware Error]: command: 0x4010, status: 0x0143 {1}[Hardware Error]: device_id: 0000:00:02.2 {1}[Hardware Error]: slot: 0 {1}[Hardware Error]: secondary_bus: 0x04 {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f06 {1}[Hardware Error]: class_code: 000604 {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003 {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000 {1}[Hardware Error]: aer_uncor_severity: 0x00062030 {1}[Hardware Error]: TLP Header: 40000018 040000ff 791f4080 00000000 [hw error repeats] Kernel panic - not syncing: Fatal hardware error! CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017 Fix the mlx4 driver. This is a very similar problem to what had been fixed in: commit 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown") to address https://bugzilla.kernel.org/show_bug.cgi?id=199779. Fixes: 2ba5fbd62b25 ("net/mlx4_core: Handle AER flow properly") Reported-by: Jake Lawrence <lawja@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28Merge branch 'rhashtable-Fix-unprotected-RCU-dereference-in-__rht_ptr'David S. Miller
Herbert Xu says: ==================== rhashtable: Fix unprotected RCU dereference in __rht_ptr This patch series fixes an unprotected dereference in __rht_ptr. The first patch is a minimal fix that does not use the correct RCU markings but is suitable for backport, and the second patch cleans up the RCU markings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28rhashtable: Restore RCU marking on rhash_lock_headHerbert Xu
This patch restores the RCU marking on bucket_table->buckets as it really does need RCU protection. Its removal had led to a fatal bug. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28rhashtable: Fix unprotected RCU dereference in __rht_ptrHerbert Xu
The rcu_dereference call in rht_ptr_rcu is completely bogus because we've already dereferenced the value in __rht_ptr and operated on it. This causes potential double readings which could be fatal. The RCU dereference must occur prior to the comparison in __rht_ptr. This patch changes the order of RCU dereference so that it is done first and the result is then fed to __rht_ptr. The RCU marking changes have been minimised using casts which will be removed in a follow-up patch. Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...") Reported-by: "Gong, Sishuai" <sishuai@purdue.edu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: ethernet: mtk_eth_soc: Always call mtk_gmac0_rgmii_adjust() for mt7623René van Dorst
Modify mtk_gmac0_rgmii_adjust() so it can always be called. mtk_gmac0_rgmii_adjust() sets-up the TRGMII clocks. Signed-off-by: René van Dorst <opensource@vdorst.com> Signed-off-By: David Woodhouse <dwmw2@infradead.org> Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28Merge tag 'mlx5-fixes-2020-07-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes-2020-07-28 This series introduces some fixes to mlx5 driver. v1->v2: - Drop the "Hold reference on mirred devices" patch, until Or's comments are addressed. - Imporve "Modify uplink state" patch commit message per Or's request. Please pull and let me know if there is any problem. For -Stable: For -stable v4.9 ('net/mlx5e: Fix error path of device attach') For -stable v4.15 ('net/mlx5: Verify Hardware supports requested ptp function on a given pin') For -stable v5.3 ('net/mlx5e: Modify uplink state on interface up/down') For -stable v5.4 ('net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev') ('net/mlx5: E-switch, Destroy TSAR when fail to enable the mode') For -stable v5.5 ('net/mlx5: E-switch, Destroy TSAR after reload interface') For -stable v5.7 ('net/mlx5: Fix a bug of using ptp channel index as pin index') ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28Merge branch 'net-lan78xx-fix-NULL-deref-and-memory-leak'David S. Miller
Johan Hovold says: ==================== net: lan78xx: fix NULL deref and memory leak The first two patches fix a NULL-pointer dereference at probe that can be triggered by a malicious device and a small transfer-buffer memory leak, respectively. For another subsystem I would have marked them: Cc: stable@vger.kernel.org # 4.3 The third one replaces the driver's current broken endpoint lookup helper, which could end up accepting incomplete interfaces and whose results weren't even useeren Johan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: lan78xx: replace bogus endpoint lookupJohan Hovold
Drop the bogus endpoint-lookup helper which could end up accepting interfaces based on endpoints belonging to unrelated altsettings. Note that the returned bulk pipes and interrupt endpoint descriptor were never actually used. Instead the bulk-endpoint numbers are hardcoded to 1 and 2 (matching the specification), while the interrupt- endpoint descriptor was assumed to be the third descriptor created by USB core. Try to bring some order to this by dropping the bogus lookup helper and adding the missing endpoint sanity checks while keeping the interrupt- descriptor assumption for now. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: lan78xx: fix transfer-buffer memory leakJohan Hovold
The interrupt URB transfer-buffer was never freed on disconnect or after probe errors. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: lan78xx: add missing endpoint sanity checkJohan Hovold
Add the missing endpoint sanity check to prevent a NULL-pointer dereference should a malicious device lack the expected endpoints. Note that the driver has a broken endpoint-lookup helper, lan78xx_get_endpoints(), which can end up accepting interfaces in an altsetting without endpoints as long as *some* altsetting has a bulk-in and a bulk-out endpoint. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28usb: hso: check for return value in hso_serial_common_create()Rustam Kovhaev
in case of an error tty_register_device_attr() returns ERR_PTR(), add IS_ERR() check Reported-and-tested-by: syzbot+67b2bd0e34f952d0321e@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=67b2bd0e34f952d0321e Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net/mlx5e: Fix kernel crash when setting vf VLANID on a VF devAlaa Hleihel
After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started to acquire esw->state_lock. However, esw is not defined for VF devices, hence attempting to set vf VLANID on a VF dev will cause a kernel panic. Fix it by moving up the (redundant) esw validation from function '__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have and use a valid esw. For example with vf device eth4: # ip link set dev eth4 vf 0 vlan 0 Trace of the panic: [ 411.409842] BUG: unable to handle page fault for address: 00000000000011b8 [ 411.449745] #PF: supervisor read access in kernel mode [ 411.452348] #PF: error_code(0x0000) - not-present page [ 411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0 [ 411.458382] Oops: 0000 [#1] SMP PTI [ 411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1 [ 411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 411.464158] RIP: 0010:__mutex_lock+0x4e/0x940 [ 411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00 [ 411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246 [ 411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158 [ 411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000 [ 411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002 [ 411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000 [ 411.474004] FS: 00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000 [ 411.475237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0 [ 411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 411.479332] Call Trace: [ 411.479760] ? __nla_validate_parse.part.6+0x57/0x8f0 [ 411.482825] ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core] [ 411.483804] mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core] [ 411.484733] mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core] [ 411.485545] do_setlink+0x613/0x1000 [ 411.486165] __rtnl_newlink+0x53d/0x8c0 [ 411.486791] ? mark_held_locks+0x49/0x70 [ 411.487429] ? __lock_acquire+0x8fe/0x1eb0 [ 411.488085] ? rcu_read_lock_sched_held+0x52/0x60 [ 411.488998] ? kmem_cache_alloc_trace+0x16d/0x2d0 [ 411.489759] rtnl_newlink+0x47/0x70 [ 411.490357] rtnetlink_rcv_msg+0x24e/0x450 [ 411.490978] ? netlink_deliver_tap+0x92/0x3d0 [ 411.491631] ? validate_linkmsg+0x330/0x330 [ 411.492262] netlink_rcv_skb+0x47/0x110 [ 411.492852] netlink_unicast+0x1ac/0x270 [ 411.493551] netlink_sendmsg+0x336/0x450 [ 411.494209] sock_sendmsg+0x30/0x40 [ 411.494779] ____sys_sendmsg+0x1dd/0x1f0 [ 411.495378] ? copy_msghdr_from_user+0x5c/0x90 [ 411.496082] ___sys_sendmsg+0x87/0xd0 [ 411.496683] ? lock_acquire+0xb9/0x3a0 [ 411.497322] ? lru_cache_add+0x5/0x170 [ 411.497944] ? find_held_lock+0x2d/0x90 [ 411.498568] ? handle_mm_fault+0xe46/0x18c0 [ 411.499205] ? __sys_sendmsg+0x51/0x90 [ 411.499784] __sys_sendmsg+0x51/0x90 [ 411.500341] do_syscall_64+0x59/0x2e0 [ 411.500938] ? asm_exc_page_fault+0x8/0x30 [ 411.501609] ? rcu_read_lock_sched_held+0x52/0x60 [ 411.502350] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 411.503093] RIP: 0033:0x7faee73b85a7 [ 411.503654] Code: Bad RIP value. Fixes: 0e18134f4f9f ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5e: Modify uplink state on interface up/downRon Diskin
When setting the PF interface up/down, notify the firmware to update uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled. This behavior will prevent sending traffic out on uplink port when PF is down, such as sending traffic from a VF interface which is still up. Currently when calling mlx5e_open/close(), the driver only sends PAOS command to notify the firmware to set the physical port state to up/down, however, it is not sufficient. When VF is in "auto" state, it follows the uplink state, which was not updated on mlx5e_open/close() before this patch. When switchdev mode is enabled and uplink representor is first enabled, set the uplink port state value back to its FW default "AUTO". Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5: Query PPS pin operational status before registering itEran Ben Elisha
In a special configuration, a ConnectX6-Dx pin pps-out might be activated when driver is loaded. Fix the driver to always read the operational pin mode when registering it, and advertise it accordingly. Fixes: ee7f12205abc ("net/mlx5e: Implement 1PPS support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5e: Fix slab-out-of-bounds in mlx5e_rep_is_lag_netdevRaed Salem
mlx5e_rep_is_lag_netdev is used as first check as part of netdev events handler for bond device of non-uplink representors, this handler can get any netdevice under the same network namespace of mlx5e netdevice. Current code treats the netdev as mlx5e netdev and only later on verifies this, hence causes the following Kasan trace: [15402.744990] ================================================================== [15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core] [15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347 [15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G B O --------- -t - 4.18.0-g3dcc204d291d-dirty #1 [15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [15402.757600] Call Trace: [15402.758968] dump_stack+0x71/0xab [15402.760427] print_address_description+0x6a/0x270 [15402.761969] kasan_report+0x179/0x2d0 [15402.763445] ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core] [15402.765121] mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core] [15402.766782] mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core] Fix by deferring the violating access to be post the netdev verify check. Fixes: 7e51891a237f ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5: Verify Hardware supports requested ptp function on a given pinEran Ben Elisha
Fix a bug where driver did not verify Hardware pin capabilities for PTP functions. Fixes: ee7f12205abc ("net/mlx5e: Implement 1PPS support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5: Fix a bug of using ptp channel index as pin indexEran Ben Elisha
On PTP mlx5_ptp_enable(on=0) flow, driver mistakenly used channel index as pin index. After ptp patch marked in fixes tag was introduced, driver can freely call ptp_find_pin() as part of the .enable() callback. Fix driver mlx5_ptp_enable(on=0) flow to always use ptp_find_pin(). With that, Driver will use the correct pin index in mlx5_ptp_enable(on=0) flow. In addition, when initializing the pins, always set channel to zero. As all pins can be attached to all channels, let ptp_set_pinfunc() to move them between the channels. For stable branches, this fix to be applied only on kernels that includes both patches in fixes tag. Otherwise, mlx5_ptp_enable(on=0) will be stuck on pincfg_mux. Fixes: 62582a7ee783 ("ptp: Avoid deadlocks in the programmable pin code.") Fixes: ee7f12205abc ("net/mlx5e: Implement 1PPS support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28net/mlx5e: Fix missing cleanup of ethtool steering during rep rx cleanupMaor Dickman
The cited commit add initialization of ethtool steering during representor rx initializations without cleaning it up in representor rx cleanup, this may cause for stale ethtool flows to remain after moving back from switchdev mode to legacy mode. Fixed by calling ethtool steering cleanup during rep rx cleanup. Fixes: 6783e8b29f63 ("net/mlx5e: Init ethtool steering for representors") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>