summaryrefslogtreecommitdiff
path: root/drivers/pci/host/pci-hyperv.c
AgeCommit message (Collapse)Author
2017-04-18PCI: hv: Convert hv_pci_dev.refs from atomic_t to refcount_tElena Reshetova
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-04-04PCI: hv: Allocate interrupt descriptors with GFP_ATOMICK. Y. Srinivasan
The memory allocation here needs to be non-blocking. Fix the issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Cc: <stable@vger.kernel.org>
2017-04-04PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUsK. Y. Srinivasan
When we have 32 or more CPUs in the affinity mask, we should use a special constant to specify that to the host. Fix this issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Long Li <longli@microsoft.com> Cc: <stable@vger.kernel.org>
2017-03-24PCI: hv: Lock PCI bus on device ejectLong Li
A PCI_EJECT message can arrive at the same time we are calling pci_scan_child_bus() in the workqueue for the previous PCI_BUS_RELATIONS message or in create_root_hv_pci_bus(). In this case we could potentially modify the bus from multiple places. Properly lock the bus access. Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition in create_root_hv_pci_bus(). Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-03-24PCI: hv: Properly handle PCI bus removeLong Li
hv_pci_devices_present() is called in hv_pci_remove() when we remove a PCI device from the host, e.g., by disabling SR-IOV on a device. In hv_pci_remove(), the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present(). By introducing bus state hv_pcibus_removed, we can avoid this situation. Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-02-17PCI: hv: Use device serial number as PCI domainHaiyang Zhang
Use the device serial number as the PCI domain. The serial numbers start with 1 and are unique within a VM. So names, such as VF NIC names, that include domain number as part of the name, can be shorter than that based on part of bus UUID previously. The new names will also stay same for VMs created with copied VHD and same number of devices. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
2017-02-10PCI: hv: Fix wslot_to_devfn() to fix warnings on device removalDexuan Cui
The devfn of 00:02.0 is 0x10. devfn_to_wslot(0x10) == 0x2, and wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code. Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot() returns NULL and pci_stop_and_remove_bus_device() is not called. Later when the real device driver's .remove() is invoked by hv_pci_remove() -> pci_stop_root_bus(), some warnings can be noticed because the VM has lost the access to the underlying device at that time. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> CC: stable@vger.kernel.org CC: K. Y. Srinivasan <kys@microsoft.com> CC: Stephen Hemminger <sthemmin@microsoft.com>
2016-11-29PCI: hv: Allocate physically contiguous hypercall params bufferLong Li
hv_do_hypercall() assumes that we pass a segment from a physically contiguous buffer. A buffer allocated on the stack may not work if CONFIG_VMAP_STACK=y is set. Use kmalloc() to allocate this buffer. Reported-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-11-16PCI: hv: Delete the device earlier from hbus->children for hot-removeDexuan Cui
After we send a PCI_EJECTION_COMPLETE message to the host, the host will immediately send us a PCI_BUS_RELATIONS message with relations->device_count == 0, so pci_devices_present_work(), running on another thread, can find the being-ejected device, mark the hpdev->reported_missing to true, and run list_move_tail()/list_del() for the device -- this races hv_eject_device_work() -> list_del(). Move the list_del() in hv_eject_device_work() to an earlier place, i.e., before we send PCI_EJECTION_COMPLETE, so later the pci_devices_present_work() can't see the device. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16PCI: hv: Fix hv_pci_remove() for hot-removeDexuan Cui
1. We don't really need such a big on-stack buffer when sending the teardown_packet: vmbus_sendpacket() here only uses sizeof(struct pci_message). 2. In the hot-remove case (PCI_EJECT), after we send PCI_EJECTION_COMPLETE to the host, the host will send a RESCIND_CHANNEL message to us and the host won't access the per-channel ringbuffer any longer, so we needn't send PCI_RESOURCES_RELEASED/PCI_BUS_D0EXIT to the host, and we shouldn't expect the host's completion message of PCI_BUS_D0EXIT, which will never come. 3. We should send PCI_BUS_D0EXIT after hv_send_resources_released(). Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16PCI: hv: Use the correct buffer size in new_pcichild_device()Dexuan Cui
We don't really need such a big on-stack buffer. vmbus_sendpacket() here only uses sizeof(struct pci_child_message). Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com>
2016-10-31PCI: hv: Make unnecessarily global IRQ masking functions staticTobias Klauser
Make hv_irq_mask() and hv_irq_unmask() static as they are only used in pci-hyperv.c This fixes a sparse warning. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-09-06PCI: hv: Handle hv_pci_generic_compl() error caseDexuan Cui
'completion_status' is used in some places, e.g., hv_pci_protocol_negotiation(), so we should make sure it's initialized in error case too, though the error is unlikely here. [bhelgaas: fix changelog typo and nearby whitespace] Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Handle vmbus_sendpacket() failure in hv_compose_msi_msg()Dexuan Cui
Handle vmbus_sendpacket() failure in hv_compose_msi_msg(). I happened to find this when reading the code. I didn't get a real issue however. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Remove the unused 'wrk' in struct hv_pcibus_deviceDexuan Cui
Remove the unused 'wrk' member in struct hv_pcibus_device. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Use pci_function_description[0] in struct definitionsDexuan Cui
The 2 structs can use a zero-length array here, because dynamic memory of the correct size is allocated in hv_pci_devices_present() and we don't need this extra element. No functional change. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06PCI: hv: Use zero-length array in struct pci_packetDexuan Cui
Use zero-length array in struct pci_packet and rename struct pci_message's field "message_type" to "type". This makes the code more readable. No functionality change. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: KY Srinivasan <kys@microsoft.com> CC: Jake Oshins <jakeo@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-08-22PCI: hv: Use list_move_tail() instead of list_del() + list_add_tail()Wei Yongjun
Use list_move_tail() instead of list_del() + list_add_tail(). No functional change intended Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-07-25PCI: hv: Fix interrupt cleanup pathCathy Avery
SR-IOV disabled from the host causes a memory leak. pci-hyperv usually first receives a PCI_EJECT notification and then proceeds to delete the hpdev list entry in hv_eject_device_work(). Later in hv_msi_free() since the device is no longer on the device list hpdev is NULL and hv_msi_free returns without freeing int_desc as part of hv_int_desc_free(). Signed-off-by: Cathy Avery <cavery@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()Vitaly Kuznetsov
When we have an interrupt from the host we have a bit set in event page indicating there are messages for the particular channel. We need to read them all as we won't get signaled for what was on the queue before we cleared the bit in vmbus_on_event(). This applies to all Hyper-V drivers and the pass-through driver should do the same. I did not meet any bugs; the issue was found by code inspection. We don't have many events going through hv_pci_onchannelcallback(), which explains why nobody reported the issue before. While on it, fix handling non-zero vmbus_recvpacket_raw() return values by dropping out. If the return value is not zero, it is wrong to inspect buffer or bytes_recvd as these may contain invalid data. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()Vitaly Kuznetsov
We don't free buffer on several code paths in hv_pci_onchannelcallback(), put kfree() to the end of the function to fix the issue. Direct { kfree(); return; } can now be replaced with a simple 'break'; Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-20Merge tag 'char-misc-4.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here's the big char and misc driver update for 4.7-rc1. Lots of different tiny driver subsystems have updates here with new drivers and functionality. Details in the shortlog. All have been in linux-next with no reported issues for a while" * tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (125 commits) mcb: Delete num_cells variable which is not required mcb: Fixed bar number assignment for the gdd mcb: Replace ioremap and request_region with the devm version mcb: Implement bus->dev.release callback mcb: export bus information via sysfs mcb: Correctly initialize the bus's device mei: bus: call mei_cl_read_start under device lock coresight: etb10: adjust read pointer only when needed coresight: configuring ETF in FIFO mode when acting as link coresight: tmc: implementing TMC-ETF AUX space API coresight: moving struct cs_buffers to header file coresight: tmc: keep track of memory width coresight: tmc: make sysFS and Perf mode mutually exclusive coresight: tmc: dump system memory content only when needed coresight: tmc: adding mode of operation for link/sinks coresight: tmc: getting rid of multiple read access coresight: tmc: allocating memory when needed coresight: tmc: making prepare/unprepare functions generic coresight: tmc: splitting driver in ETB/ETF and ETR components coresight: tmc: cleaning up header file ...
2016-05-04PCI: hv: Add explicit barriers to config space accessVitaly Kuznetsov
I'm trying to pass-through Broadcom BCM5720 NIC (Dell device 1f5b) on a Dell R720 server. Everything works fine when the target VM has only one CPU, but SMP guests reboot when the NIC driver accesses PCI config space with hv_pcifront_read_config()/hv_pcifront_write_config(). The reboot appears to be induced by the hypervisor and no crash is observed. Windows event logs are not helpful at all ('Virtual machine ... has quit unexpectedly'). The particular access point is always different and putting debug between them (printk/mdelay/...) moves the issue further away. The server model affects the issue as well: on Dell R420 I'm able to pass-through BCM5720 NIC to SMP guests without issues. While I'm obviously failing to reveal the essence of the issue I was able to come up with a (possible) solution: if explicit barriers are added to hv_pcifront_read_config()/hv_pcifront_write_config() the issue goes away. The essential minimum is rmb() at the end on _hv_pcifront_read_config() and wmb() at the end of _hv_pcifront_write_config() but I'm not confident it will be sufficient for all hardware. I suggest the following barriers: 1) wmb()/mb() between choosing the function and writing to its space. 2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/ _hv_pcifront_write_config() to ensure that consecutive reads/writes to the space won't get re-ordered as drivers may count on that. Config space access is not supposed to be performance-critical so these explicit barriers should not cause any slowdown. [bhelgaas: use Linux "barriers" terminology] Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-02PCI: hv: Report resources release after stopping the busVitaly Kuznetsov
Kernel hang is observed when pci-hyperv module is release with device drivers still attached. E.g., when I do 'rmmod pci_hyperv' with BCM5720 device pass-through-ed (tg3 module) I see the following: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104] ... Call Trace: [<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3] [<ffffffffa063f000>] ? 0xffffffffa063f000 [<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3] [<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3] [<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3] [<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3] ... [<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3] [<ffffffff813bee59>] pci_device_remove+0x39/0xc0 [<ffffffff814a3201>] __device_release_driver+0xa1/0x160 [<ffffffff814a32e3>] device_release_driver+0x23/0x30 [<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0 [<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60 [<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv] The problem seems to be that we report local resources release before stopping the bus and removing devices from it and device drivers may try to perform some operations with these resources on shutdown. Move resources release report after we do pci_stop_root_bus(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-04-30drivers:hv: Use new vmbus_mmio_free() from client drivers.Jake Oshins
This patch modifies all the callers of vmbus_mmio_allocate() to call vmbus_mmio_free() instead of release_mem_region(). Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-16PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMsJake Oshins
Add a new driver which exposes a root PCI bus whenever a PCI Express device is passed through to a guest VM under Hyper-V. The device can be single- or multi-function. The interrupts for the devices are managed by an IRQ domain, implemented within the driver. [bhelgaas: fold in race condition fix (http://lkml.kernel.org/r/1456340196-13717-1-git-send-email-jakeo@microsoft.com)] Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>