summaryrefslogtreecommitdiff
path: root/drivers/pci/host/pci-hyperv.c
AgeCommit message (Collapse)Author
2018-02-06Merge tag 'pci-v4.16-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - skip AER driver error recovery callbacks for correctable errors reported via ACPI APEI, as we already do for errors reported via the native path (Tyler Baicar) - fix DPC shared interrupt handling (Alex Williamson) - print full DPC interrupt number (Keith Busch) - enable DPC only if AER is available (Keith Busch) - simplify DPC code (Bjorn Helgaas) - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn Helgaas) - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn Helgaas) - move ASPM internal interfaces out of public header (Bjorn Helgaas) - allow hot-removal of VGA devices (Mika Westerberg) - speed up unplug and shutdown by assuming Thunderbolt controllers don't support Command Completed events (Lukas Wunner) - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling, Jay Cornwall) - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes) - clean up PCI DMA interface usage (Christoph Hellwig) - remove PCI pool API (replaced with DMA pool) (Romain Perier) - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan Kaya) - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring) - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler) - remove warnings on sysfs mmap failure (Bjorn Helgaas) - quiet ROM validation messages (Alex Deucher) - remove redundant memory alloc failure messages (Markus Elfring) - fill in types for compile-time VGA and other I/O port resources (Bjorn Helgaas) - make "pci=pcie_scan_all" work for Root Ports as well as Downstream Ports to help AmigaOne X1000 (Bjorn Helgaas) - add SPDX tags to all PCI files (Bjorn Helgaas) - quirk Marvell 9128 DMA aliases (Alex Williamson) - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas) - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas Cassel) - use DMA API to get MSI address for DesignWare IP (Niklas Cassel) - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I) - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun) - add support for ARTPEC-7 SoC (Niklas Cassel) - add endpoint-mode support for ARTPEC (Niklas Cassel) - add Cadence PCIe host and endpoint controller driver (Cyrille Pitchen) - handle multiple INTx status bits being set in dra7xx (Vignesh R) - translate dra7xx hwirq range to fix INTD handling (Vignesh R) - remove deprecated Exynos PHY initialization code (Jaehoon Chung) - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu) - fix NULL pointer dereference in iProc BCMA driver (Ray Jui) - fix Keystone interrupt-controller-node lookup (Johan Hovold) - constify qcom driver structures (Julia Lawall) - rework Tegra config space mapping to increase space available for endpoints (Vidya Sagar) - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy) - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy) - add support for Global Fabric Manager Server (GFMS) event to Microsemi Switchtec switch driver (Logan Gunthorpe) - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao) * tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller PCI: endpoint: Fix EPF device name to support multi-function devices PCI: endpoint: Add the function number as argument to EPC ops PCI: cadence: Add host driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller PCI: Add vendor ID for Cadence PCI: Add generic function to probe PCI host controllers PCI: generic: fix missing call of pci_free_resource_list() PCI: OF: Add generic function to parse and allocate PCI resources PCI: Regroup all PCI related entries into drivers/pci/Makefile PCI/DPC: Reformat DPC register definitions PCI/DPC: Add and use DPC Status register field definitions PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error() PCI/DPC: Remove unnecessary RP PIO register structs PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info() PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info() PCI/DPC: Make RP PIO log size check more generic PCI/DPC: Rename local "status" to "dpc_status" PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error() ...
2018-01-28PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplateBjorn Helgaas
Add SPDX GPL-2.0 to all PCI files that specified the GPL version 2 license. Remove the boilerplate GPL version 2 language, relying on the assertion in b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license") that the SPDX identifier may be used instead of the full boilerplate text. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-31Merge branch 'x86/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A couple of fixlets for x86: - Fix the ESPFIX double fault handling for 5-level pagetables - Fix the commandline parsing for 'apic=' on 32bit systems and update documentation - Make zombie stack traces reliable - Fix kexec with stack canary - Fix the delivery mode for APICs which was missed when the x86 vector management was converted to single target delivery. Caused a regression due to the broken hardware which ignores affinity settings in lowest prio delivery mode. - Unbreak modules when AMD memory encryption is enabled - Remove an unused parameter of prepare_switch_to" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Switch all APICs to Fixed delivery mode x86/apic: Update the 'apic=' description of setting APIC driver x86/apic: Avoid wrong warning when parsing 'apic=' in X86-32 case x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR) x86: Remove unused parameter of prepare_switch_to x86/stacktrace: Make zombie stack traces reliable x86/mm: Unbreak modules that use the DMA API x86/build: Make isoimage work on Debian x86/espfix/64: Fix espfix double-fault handling on 5-level systems
2017-12-29x86/apic: Switch all APICs to Fixed delivery modeThomas Gleixner
Some of the APIC incarnations are operating in lowest priority delivery mode. This worked as long as the vector management code allocated the same vector on all possible CPUs for each interrupt. Lowest priority delivery mode does not necessarily respect the affinity setting and may redirect to some other online CPU. This was documented somewhere in the old code and the conversion to single target delivery missed to update the delivery mode of the affected APIC drivers which results in spurious interrupts on some of the affected CPU/Chipset combinations. Switch the APIC drivers over to Fixed delivery mode and remove all leftovers of lowest priority delivery mode. Switching to Fixed delivery mode is not a problem on these CPUs because the kernel already uses Fixed delivery mode for IPIs. The reason for this is that th SDM explicitely forbids lowest prio mode for IPIs. The reason is obvious: If the irq routing does not honor destination targets in lowest prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be a fatal problem in many cases. As a consequence of this change, the apic::irq_delivery_mode field is now pointless, but this needs to be cleaned up in a separate patch. Fixes: fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity") Reported-by: vcaputo@pengaru.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: vcaputo@pengaru.com Cc: Pavel Machek <pavel@ucw.cz> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
2017-11-07PCI: hv: Use effective affinity maskDexuan Cui
The effective_affinity_mask is always set when an interrupt is assigned in __assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic apic_physflat: -> default_cpu_mask_to_apicid() -> irq_data_update_effective_affinity(), but it looks d->common->affinity remains all-1's before the user space or the kernel changes it later. In the early allocation/initialization phase of an IRQ, we should use the effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3 VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts. Tested-by: Adrian Suhov <v-adsuho@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Cc: stable@vger.kernel.org Cc: Jork Loeser <jloeser@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com>
2017-09-08Merge tag 'pci-v4.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add enhanced Downstream Port Containment support, which prints more details about Root Port Programmed I/O errors (Dongdong Liu) - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang) - add MediaTek MT2712 and MT7622 support (Ryder Lee) - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang) - add Qualcom IPQ8074 support (Varadarajan Narayanan) - add R-Car r8a7743/5 device tree support (Biju Das) - add Rockchip per-lane PHY support for better power management (Shawn Lin) - fix IRQ mapping for hot-added devices by replacing the pci_fixup_irqs() boot-time design with a host bridge hook called at probe-time (Lorenzo Pieralisi, Matthew Minter) - fix race when enabling two devices that results in upstream bridge not being enabled correctly (Srinath Mannam) - fix pciehp power fault infinite loop (Keith Busch) - fix SHPC bridge MSI hotplug events by enabling bus mastering (Aleksandr Bezzubikov) - fix a VFIO issue by correcting PCIe capability sizes (Alex Williamson) - fix an INTD issue on Xilinx and possibly other drivers by unifying INTx IRQ domain support (Paul Burton) - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg Roedel) - allow APM X-Gene device assignment to guests by adding an ACS quirk (Feng Kan) - fix driver crashes by disabling Extended Tags on Broadcom HT2100 (Extended Tags support is required for PCIe Receivers but not Requesters, and we now enable them by default when Requesters support them) (Sinan Kaya) - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs use the real Requester ID (not a phantom RID) (Robin Murphy) - prevent assignment of Intel VMD children to guests (which may be supported eventually, but isn't yet) by not associating an IOMMU with them (Jon Derrick) - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott Bauer) - fix a Function-Level Reset issue with Intel 750 NVMe by waiting longer (up to 60sec instead of 1sec) for device to become ready (Sinan Kaya) - fix a Function-Level Reset issue on iProc Stingray by working around hardware defects in the CRS implementation (Oza Pawandeep) - fix an issue with Intel NVMe P3700 after an iProc reset by adding a delay during shutdown (Oza Pawandeep) - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking in compose_msi_msg() (Stephen Hemminger) - fix a wireless LAN driver timeout by clearing DesignWare MSI interrupt status after it is handled, not before (Faiz Abbas) - fix DesignWare ATU enable checking (Jisheng Zhang) - reduce Layerscape dependencies on the bootloader by doing more initialization in the driver (Hou Zhiqiang) - improve Intel VMD performance allowing allocation of more IRQ vectors than present CPUs (Keith Busch) - improve endpoint framework support for initial DMA mask, different BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon Vijay Abraham I, Stan Drozd) - rework CRS support to add periodic messages while we poll during enumeration and after Function-Level Reset and prepare for possible other uses of CRS (Sinan Kaya) - clean up Root Port AER handling by removing unnecessary code and moving error handler methods to struct pcie_port_service_driver (Christoph Hellwig) - clean up error handling paths in various drivers (Bjorn Andersson, Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen, Lorenzo Pieralisi, Sergei Shtylyov) - clean up SR-IOV resource handling by disabling VF decoding before updating the corresponding resource structs (Gavin Shan) - clean up DesignWare-based drivers by unifying quirks to update Class Code and Interrupt Pin and related handling of write-protected registers (Hou Zhiqiang) - clean up by adding empty generic pcibios_align_resource() and pcibios_fixup_bus() and removing empty arch-specific implementations (Palmer Dabbelt) - request exclusive reset control for several drivers to allow cleanup elsewhere (Philipp Zabel) - constify various structures (Arvind Yadav, Bhumika Goyal) - convert from full_name() to %pOF (Rob Herring) - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn Lin) * tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits) PCI: xgene: Clean up whitespace PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset PCI: xgene: Fix platform_get_irq() error handling PCI: xilinx-nwl: Fix platform_get_irq() error handling PCI: rockchip: Fix platform_get_irq() error handling PCI: altera: Fix platform_get_irq() error handling PCI: spear13xx: Fix platform_get_irq() error handling PCI: artpec6: Fix platform_get_irq() error handling PCI: armada8k: Fix platform_get_irq() error handling PCI: dra7xx: Fix platform_get_irq() error handling PCI: exynos: Fix platform_get_irq() error handling PCI: iproc: Clean up whitespace PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP PCI: iproc: Add 500ms delay during device shutdown PCI: Fix typos and whitespace errors PCI: Remove unused "res" variable from pci_resource_io() PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag() PCI/AER: Reformat AER register definitions iommu/vt-d: Prevent VMD child devices from being remapping targets x86/PCI: Use is_vmd() rather than relying on the domain number ...
2017-08-10hyper-v: Globalize vp_indexVitaly Kuznetsov
To support implementing remote TLB flushing on Hyper-V with a hypercall we need to make vp_index available outside of vmbus module. Rename and globalize. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Xiao <sixiao@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20170802160921.21791-7-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-03PCI: hv: Do not sleep in compose_msi_msg()Stephen Hemminger
The setup of MSI with Hyper-V host was sleeping with locks held. This error is reported when doing SR-IOV hotplug with kernel built with lockdep: BUG: sleeping function called from invalid context at kernel/sched/completion.c:93 in_atomic(): 1, irqs_disabled(): 1, pid: 1405, name: ip 3 locks held by ip/1405: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff976b10bb>] rtnetlink_rcv+0x1b/0x40 #1: (&desc->request_mutex){+.+...}, at: [<ffffffff970ddd33>] __setup_irq+0xb3/0x720 #2: (&irq_desc_lock_class){-.-...}, at: [<ffffffff970ddd65>] __setup_irq+0xe5/0x720 irq event stamp: 3476 hardirqs last enabled at (3475): [<ffffffff971b3005>] get_page_from_freelist+0x225/0xc90 hardirqs last disabled at (3476): [<ffffffff978024e7>] _raw_spin_lock_irqsave+0x27/0x90 softirqs last enabled at (2446): [<ffffffffc05ef0b0>] ixgbevf_configure+0x380/0x7c0 [ixgbevf] softirqs last disabled at (2444): [<ffffffffc05ef08d>] ixgbevf_configure+0x35d/0x7c0 [ixgbevf] The workaround is to poll for host response instead of blocking on completion. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-07-02PCI: hv: Use vPCI protocol version 1.2Jork Loeser
Update the Hyper-V vPCI driver to use the Server-2016 version of the vPCI protocol, fixing MSI creation and retargeting issues. Signed-off-by: Jork Loeser <jloeser@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02PCI: hv: Add vPCI version protocol negotiationJork Loeser
Hyper-V vPCI offers different protocol versions. Add the infra for negotiating the one to use. Signed-off-by: Jork Loeser <jloeser@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02PCI: hv: Temporary own CPU-number-to-vCPU-number infraJork Loeser
To ease parallel effort to centralize CPU-number-to-vCPU-number conversion, temporarily stand up own version, file-local hv_tmp_cpu_nr_to_vp_nr(). Once the changes have merged, this work-around can be removed, and the calls replaced with hv_cpu_number_to_vp_number(). Signed-off-by: Jork Loeser <jloeser@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02PCI: hv: Use page allocation for hbus structureJork Loeser
The hv_pcibus_device structure contains an in-memory hypercall argument that must not cross a page boundary. Allocate the structure as a page to ensure that. Signed-off-by: Jork Loeser <jloeser@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02PCI: hv: Fix comment formatting and use proper integer fieldsJork Loeser
Fix comment formatting and use proper integer fields. Signed-off-by: Jork Loeser <jloeser@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-04-18PCI: hv: Convert hv_pci_dev.refs from atomic_t to refcount_tElena Reshetova
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-04-04PCI: hv: Allocate interrupt descriptors with GFP_ATOMICK. Y. Srinivasan
The memory allocation here needs to be non-blocking. Fix the issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Cc: <stable@vger.kernel.org>
2017-04-04PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUsK. Y. Srinivasan
When we have 32 or more CPUs in the affinity mask, we should use a special constant to specify that to the host. Fix this issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Cc: <stable@vger.kernel.org>
2017-03-24PCI: hv: Lock PCI bus on device ejectLong Li
A PCI_EJECT message can arrive at the same time we are calling pci_scan_child_bus() in the workqueue for the previous PCI_BUS_RELATIONS message or in create_root_hv_pci_bus(). In this case we could potentially modify the bus from multiple places. Properly lock the bus access. Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition in create_root_hv_pci_bus(). Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-03-24PCI: hv: Properly handle PCI bus removeLong Li
hv_pci_devices_present() is called in hv_pci_remove() when we remove a PCI device from the host, e.g., by disabling SR-IOV on a device. In hv_pci_remove(), the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present(). By introducing bus state hv_pcibus_removed, we can avoid this situation. Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-02-17PCI: hv: Use device serial number as PCI domainHaiyang Zhang
Use the device serial number as the PCI domain. The serial numbers start with 1 and are unique within a VM. So names, such as VF NIC names, that include domain number as part of the name, can be shorter than that based on part of bus UUID previously. The new names will also stay same for VMs created with copied VHD and same number of devices. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
2017-02-10PCI: hv: Fix wslot_to_devfn() to fix warnings on device removalDexuan Cui
The devfn of 00:02.0 is 0x10. devfn_to_wslot(0x10) == 0x2, and wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code. Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot() returns NULL and pci_stop_and_remove_bus_device() is not called. Later when the real device driver's .remove() is invoked by hv_pci_remove() -> pci_stop_root_bus(), some warnings can be noticed because the VM has lost the access to the underlying device at that time. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> CC: stable@vger.kernel.org CC: K. Y. Srinivasan <kys@microsoft.com> CC: Stephen Hemminger <sthemmin@microsoft.com>
2016-11-29PCI: hv: Allocate physically contiguous hypercall params bufferLong Li
hv_do_hypercall() assumes that we pass a segment from a physically contiguous buffer. A buffer allocated on the stack may not work if CONFIG_VMAP_STACK=y is set. Use kmalloc() to allocate this buffer. Reported-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-11-16PCI: hv: Delete the device earlier from hbus->children for hot-removeDexuan Cui
After we send a PCI_EJECTION_COMPLETE message to the host, the host will immediately send us a PCI_BUS_RELATIONS message with relations->device_count == 0, so pci_devices_present_work(), running on another thread, can find the being-ejected device, mark the hpdev->reported_missing to true, and run list_move_tail()/list_del() for the device -- this races hv_eject_device_work() -> list_del(). Move the list_del() in hv_eject_device_work() to an earlier place, i.e., before we send PCI_EJECTION_COMPLETE, so later the pci_devices_present_work() can't see the device. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16PCI: hv: Fix hv_pci_remove() for hot-removeDexuan Cui
1. We don't really need such a big on-stack buffer when sending the teardown_packet: vmbus_sendpacket() here only uses sizeof(struct pci_message). 2. In the hot-remove case (PCI_EJECT), after we send PCI_EJECTION_COMPLETE to the host, the host will send a RESCIND_CHANNEL message to us and the host won't access the per-channel ringbuffer any longer, so we needn't send PCI_RESOURCES_RELEASED/PCI_BUS_D0EXIT to the host, and we shouldn't expect the host's completion message of PCI_BUS_D0EXIT, which will never come. 3. We should send PCI_BUS_D0EXIT after hv_send_resources_released(). Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16PCI: hv: Use the correct buffer size in new_pcichild_device()Dexuan Cui
We don't really need such a big on-stack buffer. vmbus_sendpacket() here only uses sizeof(struct pci_child_message). Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com>
2016-10-31PCI: hv: Make unnecessarily global IRQ masking functions staticTobias Klauser
Make hv_irq_mask() and hv_irq_unmask() static as they are only used in pci-hyperv.c This fixes a sparse warning. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-09-06PCI: hv: Handle hv_pci_generic_compl() error caseDexuan Cui
'completion_status' is used in some places, e.g., hv_pci_protocol_negotiation(), so we should make sure it's initialized in error case too, though the error is unlikely here. [bhelgaas: fix changelog typo and nearby whitespace] Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Handle vmbus_sendpacket() failure in hv_compose_msi_msg()Dexuan Cui
Handle vmbus_sendpacket() failure in hv_compose_msi_msg(). I happened to find this when reading the code. I didn't get a real issue however. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Remove the unused 'wrk' in struct hv_pcibus_deviceDexuan Cui
Remove the unused 'wrk' member in struct hv_pcibus_device. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Use pci_function_description[0] in struct definitionsDexuan Cui
The 2 structs can use a zero-length array here, because dynamic memory of the correct size is allocated in hv_pci_devices_present() and we don't need this extra element. No functional change. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Use zero-length array in struct pci_packetDexuan Cui
Use zero-length array in struct pci_packet and rename struct pci_message's field "message_type" to "type". This makes the code more readable. No functionality change. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-08-22PCI: hv: Use list_move_tail() instead of list_del() + list_add_tail()Wei Yongjun
Use list_move_tail() instead of list_del() + list_add_tail(). No functional change intended Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-07-25PCI: hv: Fix interrupt cleanup pathCathy Avery
SR-IOV disabled from the host causes a memory leak. pci-hyperv usually first receives a PCI_EJECT notification and then proceeds to delete the hpdev list entry in hv_eject_device_work(). Later in hv_msi_free() since the device is no longer on the device list hpdev is NULL and hv_msi_free returns without freeing int_desc as part of hv_int_desc_free(). Signed-off-by: Cathy Avery <cavery@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()Vitaly Kuznetsov
When we have an interrupt from the host we have a bit set in event page indicating there are messages for the particular channel. We need to read them all as we won't get signaled for what was on the queue before we cleared the bit in vmbus_on_event(). This applies to all Hyper-V drivers and the pass-through driver should do the same. I did not meet any bugs; the issue was found by code inspection. We don't have many events going through hv_pci_onchannelcallback(), which explains why nobody reported the issue before. While on it, fix handling non-zero vmbus_recvpacket_raw() return values by dropping out. If the return value is not zero, it is wrong to inspect buffer or bytes_recvd as these may contain invalid data. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()Vitaly Kuznetsov
We don't free buffer on several code paths in hv_pci_onchannelcallback(), put kfree() to the end of the function to fix the issue. Direct { kfree(); return; } can now be replaced with a simple 'break'; Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-20Merge tag 'char-misc-4.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here's the big char and misc driver update for 4.7-rc1. Lots of different tiny driver subsystems have updates here with new drivers and functionality. Details in the shortlog. All have been in linux-next with no reported issues for a while" * tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (125 commits) mcb: Delete num_cells variable which is not required mcb: Fixed bar number assignment for the gdd mcb: Replace ioremap and request_region with the devm version mcb: Implement bus->dev.release callback mcb: export bus information via sysfs mcb: Correctly initialize the bus's device mei: bus: call mei_cl_read_start under device lock coresight: etb10: adjust read pointer only when needed coresight: configuring ETF in FIFO mode when acting as link coresight: tmc: implementing TMC-ETF AUX space API coresight: moving struct cs_buffers to header file coresight: tmc: keep track of memory width coresight: tmc: make sysFS and Perf mode mutually exclusive coresight: tmc: dump system memory content only when needed coresight: tmc: adding mode of operation for link/sinks coresight: tmc: getting rid of multiple read access coresight: tmc: allocating memory when needed coresight: tmc: making prepare/unprepare functions generic coresight: tmc: splitting driver in ETB/ETF and ETR components coresight: tmc: cleaning up header file ...
2016-05-04PCI: hv: Add explicit barriers to config space accessVitaly Kuznetsov
I'm trying to pass-through Broadcom BCM5720 NIC (Dell device 1f5b) on a Dell R720 server. Everything works fine when the target VM has only one CPU, but SMP guests reboot when the NIC driver accesses PCI config space with hv_pcifront_read_config()/hv_pcifront_write_config(). The reboot appears to be induced by the hypervisor and no crash is observed. Windows event logs are not helpful at all ('Virtual machine ... has quit unexpectedly'). The particular access point is always different and putting debug between them (printk/mdelay/...) moves the issue further away. The server model affects the issue as well: on Dell R420 I'm able to pass-through BCM5720 NIC to SMP guests without issues. While I'm obviously failing to reveal the essence of the issue I was able to come up with a (possible) solution: if explicit barriers are added to hv_pcifront_read_config()/hv_pcifront_write_config() the issue goes away. The essential minimum is rmb() at the end on _hv_pcifront_read_config() and wmb() at the end of _hv_pcifront_write_config() but I'm not confident it will be sufficient for all hardware. I suggest the following barriers: 1) wmb()/mb() between choosing the function and writing to its space. 2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/ _hv_pcifront_write_config() to ensure that consecutive reads/writes to the space won't get re-ordered as drivers may count on that. Config space access is not supposed to be performance-critical so these explicit barriers should not cause any slowdown. [bhelgaas: use Linux "barriers" terminology] Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-02PCI: hv: Report resources release after stopping the busVitaly Kuznetsov
Kernel hang is observed when pci-hyperv module is release with device drivers still attached. E.g., when I do 'rmmod pci_hyperv' with BCM5720 device pass-through-ed (tg3 module) I see the following: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104] ... Call Trace: [<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3] [<ffffffffa063f000>] ? 0xffffffffa063f000 [<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3] [<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3] [<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3] [<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3] ... [<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3] [<ffffffff813bee59>] pci_device_remove+0x39/0xc0 [<ffffffff814a3201>] __device_release_driver+0xa1/0x160 [<ffffffff814a32e3>] device_release_driver+0x23/0x30 [<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0 [<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60 [<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv] The problem seems to be that we report local resources release before stopping the bus and removing devices from it and device drivers may try to perform some operations with these resources on shutdown. Move resources release report after we do pci_stop_root_bus(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-04-30drivers:hv: Use new vmbus_mmio_free() from client drivers.Jake Oshins
This patch modifies all the callers of vmbus_mmio_allocate() to call vmbus_mmio_free() instead of release_mem_region(). Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-16PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMsJake Oshins
Add a new driver which exposes a root PCI bus whenever a PCI Express device is passed through to a guest VM under Hyper-V. The device can be single- or multi-function. The interrupts for the devices are managed by an IRQ domain, implemented within the driver. [bhelgaas: fold in race condition fix (http://lkml.kernel.org/r/1456340196-13717-1-git-send-email-jakeo@microsoft.com)] Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>