summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-04PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() docMika Westerberg
It is not immediately clear what the two functions actually return so add kernel-doc comment explaining it a bit better. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-06-04PCI: Move resource distribution for single bridge outside loopMika Westerberg
If there is only a single bridge on the bus, we assign all resources to it. Currently this is done as a part of the resource distribution loop but it does not have to be there, and moving it outside actually improves readability because we can then save one indent level in the loop. While there we can add hotplug_bridges == 1 && normal_bridges == 0 to the same block because they are dealt the same way. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-06-04PCI: Account for all bridges on bus when distributing bus numbersMika Westerberg
When distributing extra bus number space to hotplug bridges for future extension, we don't account for the fact that there might be non-hotplug bridges on the bus after the hotplug bridges. For example: 01:00.0 --+- 02:00.0 (HotPlug-) -- Thunderbolt host controller +- 02:01.0 (HotPlug+) \- 02:02.0 (HotPlug-) -- xHCI host controller pci_scan_child_bus_extend() is supposed to distribute the remaining bus numbers to the hotplug bridge at 02:01.0, but only after accounting for all bridges on bus 02. Since we don't check whether there's another non-hotplug bridge after the hotplug bridge 02:01.0, it may not leave space for the non-hotplug bridge: pci 0000:00:1b.0: PCI bridge to [bus 01-39] (Root Port) pci 0000:01:00.0: PCI bridge to [bus 02-39] ... pci 0000:02:00.0: PCI bridge to [bus 03] pci 0000:02:01.0: PCI bridge to [bus 04] pci_bus 0000:04: [bus 04-39] extended by 0x35 pci_bus 0000:04: bus scan returning with max=39 pci_bus 0000:04: busn_res: [bus 04-39] end is updated to 39 pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 1 pci_bus 0000:3a: scanning bus pci_bus 0000:3a: bus scan returning with max=3a pci_bus 0000:3a: busn_res: [bus 3a] end is updated to 3a pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:02 [bus 02-39] pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:01 [bus 01-39] pci_bus 0000:02: bus scan returning with max=3a pci_bus 0000:02: busn_res: [bus 02-39] end can not be updated to 3a The resulting 'lspci -t' output looks like this: +-1b.0-[01-39]----00.0-[02-3a]--+-00.0-[03]----00.0 ^^ +-01.0-[04-39]-- \-02.0-[3a]----00.0 ^^ The xHCI host controller behind 02:02.0 is not usable because it would have to be assigned bus 3a, which is not accessible through 00:1b.0. To fix this, reserve at least one bus for each bridge while scanning already configured bridges. Then use this information in the second scan to correct the available extra bus space for hotplug bridges. After this change the 'lspci -t' output is what is expected: +-1b.0-[01-39]----00.0-[02-39]--+-00.0-[03]----00.0 +-01.0-[04-38]-- \-02.0-[39]----00.0 The xHCI controller is now on bus 39, where it is usable. Fixes: 1c02ea810065 ("PCI: Distribute available buses to hotplug-capable bridges") Reported-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org
2018-06-04ACPI / hotplug / PCI: Drop unnecessary parenthesesMika Westerberg
Remove unnecessary parentheses. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-04ACPI / hotplug / PCI: Mark stale PCI devices disconnectedMika Westerberg
Following PCIehp mark the unplugged PCI devices disconnected. This makes sure PCI core code leaves the now missing hardware registers alone. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-06-04ACPI / hotplug / PCI: Don't scan bridges managed by native hotplugMika Westerberg
When acpiphp re-enumerates a PCI hierarchy because of an ACPI Notify() event, we should skip bridges managed by native hotplug (pciehp or shpchp). We don't want to scan below a native hotplug bridge until the hotplug controller generates a hot-add event. A typical scenario is a Root Port leading to a Thunderbolt host router that remains powered off until something is connected to it. See [1] for the lspci details. 1. Before something is connected, only the Root Port exists. It has PCI_EXP_SLTCAP_HPC set and pciehp is responsible for hotplug: 00:1b.0 Root Port (HotPlug+) 2. When a USB-C or Thunderbolt device is connected, the Switch in the Thunderbolt host router is powered up, the Root Port signals a hotplug add event and pciehp enumerates the Switch: 01:00.0 Switch Upstream Port to [bus 02-39] 02:00.0 Switch Downstream Port to [bus 03] (HotPlug-, to NHI) 02:01.0 Switch Downstream Port to [bus 04-38] (HotPlug+, to Thunderbolt connector) 02:02.0 Switch Downstream Port to [bus 39] (HotPlug-, to xHCI) The 02:00.0 and 02:02.0 Ports lead to Endpoints that are not powered up yet. The Ports have PCI_EXP_SLTCAP_HPC cleared, so pciehp doesn't handle hotplug for them and we assign minimal resources to them. The 02:01.0 Port has PCI_EXP_SLTCAP_HPC set, so pciehp handles native hotplug events for it. 3. The BIOS powers up the xHCI controller. If a Thunderbolt device was connected (not just a USB-C device), it also powers up the NHI. Then it sends an ACPI Notify() to the Root Port, and acpiphp enumerates the new device(s): 03:00.0 Thunderbolt Host Controller (NHI) Endpoint 39:00.0 xHCI Endpoint 4. If a Thunderbolt device was connected, the host router firmware uses the NHI to set up Thunderbolt tunnels and triggers a native hotplug event (via 02:01.0 in this example). Then pciehp enumerates the new Thunderbolt devices: 04:00.0 Switch Upstream Port to [bus 05-38] 05:01.0 Switch Downstream Port to [bus 06-09] (HotPlug-) 05:04.0 Switch Downstream Port to [bus 0a-38] (HotPlug+) In this example, 05:01.0 leads to another Switch and some NICs. This subtree is static, so 05:01.0 doesn't support hotplug and has PCI_EXP_SLTCAP_HPC cleared. In step 3, acpiphp previously enumerated everything below the Root Port, including things below the 02:01.0 Port. We don't want that because pciehp expects to manage hotplug below that Port, and firmware on the host router may be in the middle of configuring its Link so it may not be ready yet. To make this work better with the native PCIe (pciehp) and standard PCI (shpchp) hotplug drivers, we let them handle all slot management and resource allocation for hotplug bridges and restrict ACPI hotplug to non-hotplug bridges. [1] https://bugzilla.kernel.org/show_bug.cgi?id=199581#c5 Link: https://lkml.kernel.org/r/20180529160155.1738-1-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> [bhelgaas: changelog, use hotplug_is_native() instead of dev->is_hotplug_bridge] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-06-04PCI: hotplug: Add hotplug_is_native()Mika Westerberg
Add hotplug_is_native() to find out whether the OS is supposed to handle native hotplug of a given bridge. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-06-04PCI: shpchp: Add shpchp_is_native()Mika Westerberg
In the same way we do for pciehp, add shpchp_is_native(), which returns true if the bridge should be handled by the native SHPC driver. Then convert the driver to use this function. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-04PCI: shpchp: Fix AMD POGO identificationBjorn Helgaas
The fix for an AMD POGO erratum related to SHPC incorrectly identified the device. The workaround should be applied only for AMD POGO devices, but it was instead applied to: - all AMD bridges, and - all devices from any vendor with device ID 0x7458 Fixes: 53044f357448 ("[PATCH] PCI Hotplug: shpchp: AMD POGO errata fix") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-04PCI: mobiveil: Add MSI supportSubrahmanya Lingappa
Implement MSI support for Mobiveil PCIe Host Bridge IP device driver. Signed-off-by: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Marc Zyngier <marc.zyngier@arm.com>
2018-06-04PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driverSubrahmanya Lingappa
Add a driver for Mobiveil AXI PCIe Host Bridge Soft IP - GPEX 4.0, a PCIe gen4 IP. This IP has upto 8 outbound and inbound windows for the address translation. Signed-off-by: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> [bhelgaas: fold in mobiveil_pcie_of_match[] NULL termination from Wei Yongjun <weiyongjun1@huawei.com>] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Marc Zyngier <marc.zyngier@arm.com>
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-06-04igb: Clear TSICR interrupts together with ICRJoanna Yurdal
Issuing "ip link set up/down" can block TSICR interrupts, what results in missing PTP Tx timestamp and no PPS pulse generation. Problem happens when the link is set up with the TSICR interrupts pending. ICR is cleared before enabling interrupts, while TSICR is not. When all TSICR interrupts are pending at this moment, time_sync interrupt will never be generated. TSICR should be cleared as well. In order to reproduce the issue: 1. Setup linux with IEEE 1588 grandmaster and PPS output enabled 2. Continue setting link up/down with random intervals between commands 3. Wait until PPS is not generated ( only one pulse is generated and PPS dies), and ptp4l complains constantly about Tx timeout. Signed-off-by: Joanna Yurdal <jyu@trackman.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04netfilter: ebtables: fix compat entry paddingAlin Nastac
On arm64, ebt_entry_{match,watcher,target} structs are 40 bytes long while on 32-bit arm these structs have a size of 36 bytes. COMPAT_XT_ALIGN() macro cannot be used here to determine the necessary padding for the CONFIG_COMPAT because it imposes an 8-byte boundary alignment, condition that is not found in 32-bit ebtables application. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-04Merge branch 'work.rmdir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rmdir update from Al Viro: "More shrink_dcache_parent()-related stuff - killing the main source of potentially contended calls of that on large subtrees" * 'work.rmdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rmdir(),rename(): do shrink_dcache_parent() only on success
2018-06-04Documentation: e1000: Update kernel documentationJeff Kirsher
Updated the e1000.txt kernel documentation with the latest information. Also convert the text file to reStructuredText (RST) format, since the Linux kernel documentation now uses this format for documentation. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-06-04drm/msm: Fix NULL deref on bind/probe deferralSean Paul
This patch avoids dereferencing msm_host->dev when it is NULL. If we find ourselves tearing down dsi before calling (mdp4|mdp5|dpu)_kms_init(), we'll end up in a state where the dev pointer is NULL and trying to extract priv from it will fail. This was introduced in a seemingly innocuous commit to ensure the arguments to msm_gem_put_iova() are correct (even though that function has been a stub for ~5 years). Correctness FTW! \o/ Fixes: b01884a286b0 drm/msm: use correct aspace pointer in msm_gem_put_iova() Cc: Daniel Mack <daniel@zonque.org> Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04drm/msm: Switch to atomic_helper_commit()Sean Paul
Now that all of the msm-specific goo is tucked safely away we can switch over to using the atomic helper commit directly. \o/ Changes in v2: - None Changes in v3: - Rebased on Archit's private_obj set Changes in v4: - None Cc: Abhinav Kumar <abhinavk@codeaurora.org> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into ↵Ingo Molnar
x86/urgent Merge these small and simple 1-2 commit branches into the urgent branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04drm/msm: Remove msm_commit/worker, use atomic helper commitSean Paul
Moving further towards switching fully to the the atomic helpers, this patch removes the hand-rolled worker nonblock commit code and uses the atomic helpers commit_work model. Changes in v2: - Remove commit_destroy() - Shuffle order of commit_tail calls to further serialize commits - Use stall in swap_state to avoid abandoned events on disable Changes in v3: - Rebased on Archit's private_obj set Changes in v4: - None Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04drm/msm: Issue queued events when disabling crtcSean Paul
Ensure that any queued events are issued when disabling the crtc. This avoids timeouts when we come back and wait for dependencies (like the previous frame's flip_done). Changes in v2: - None Changes in v3: - Rebased on Archit's private_obj set Changes in v4: - None Reviewed-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04drm/msm: Move implicit sync handling to prepare_fbSean Paul
In preparation for moving to atomic helpers, move the implicit sync fence handling out of atomic commit and into the plane->prepare_fb() hook. While we're at it, de-duplicate the mdp*_prepare_fb functions. Changes in v4: - Added Reported-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04drm/msm: Refactor complete_commit() to look more the helpersSean Paul
Factor out the commit_tail() portions of complete_commit() into a separate function to facilitate moving to the atomic helpers in future patches. Changes in v2: - None Changes in v3: - Rebased on Archit's private_obj set Changes in v4: - None Cc: Jeykumar Sankaran <jsanka@codeaurora.org> Reviewed-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-04IB/hns: Use zeroing memory allocator instead of allocator/memsetYueHaibing
Use dma_zalloc_coherent for allocating zeroed memory and remove unnecessary memset function. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04Documentation: e100: Update the Intel 10/100 driver docJeff Kirsher
Over the years, several of the links have changed or are no longer valid so update them. In addition, the default values were incorrect for a couple of parameters. Converted the text file to the reStructuredText (RST) format, since the Linux kernel documentation now uses this format for documentation. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-06-04e1000e: Ignore TSYNCRXCTL when getting I219 clock attributesBenjamin Poirier
There have been multiple reports of crashes that look like kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50 [...] kernel: Call Trace: kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e] kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e] kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440 kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0 kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0 kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70 These can be traced back to the fact that e1000e_systim_reset() skips the timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which leads to a null deref in timecounter_read(). Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked e1000e_get_base_timinca() in such a way that it can return -EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL. Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12") adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads sometimes don't have the SYSCFI bit set. Retrying the read shortly after finds the bit to be set. This was observed at boot (probe) but also link up and link down. Moreover, the phc (PTP Hardware Clock) seems to operate normally even after reads where SYSCFI=0. Therefore, remove this register read and unconditionally set the clock parameters. Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876 Fixes: 83129b37ef35 ("e1000e: fix systim issues") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04ipvs: fix check on xmit to non-local addressesJulian Anastasov
There is mistake in the rt_mode_allow_non_local assignment. It should be used to check if sending to non-local addresses is allowed, now it checks if local addresses are allowed. As local addresses are allowed for most of the cases, the only places that are affected are for traffic to transparent cache servers: - bypass connections when cache server is not available - related ICMP in FORWARD hook when sent to cache server Fixes: 4a4739d56b00 ("ipvs: Pull out crosses_local_route_boundary logic") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-04netfilter: nft_reject_bridge: fix skb allocation size in ↵Taehee Yoo
nft_reject_br_send_v6_unreach In order to allocate icmpv6 skb, sizeof(struct ipv6hdr) should be used. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-04Merge tag 'tpmdd-next-20180602' of ↵James Morris
git://git.infradead.org/users/jjs/linux-tpmdd into next-tpm tpmdd fixes for Linux 4.17 (just missed 4.17)
2018-06-04NFSv4: Don't ask for delegated attributes when adding a hard linkTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFSv4: Don't ask for delegated attributes when revalidating the inodeTrond Myklebust
Again, when revalidating the inode, we don't need to ask for attributes for which we are authoritative. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: Pass the inode down to the getattr() callbackTrond Myklebust
Allow the getattr() callback to check things like whether or not we hold a delegation so that it can adjust the attributes that it is asking for. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFSv4: Don't request size+change attribute if they are delegated to usTrond Myklebust
When we hold a delegation, we should not need to request attributes such as the file size or the change attribute. For some servers, avoiding asking for these unneeded attributes can improve the overall system performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04Merge branch 'work.dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "This is the first part of dealing with livelocks etc around shrink_dcache_parent()." * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: restore cond_resched() in shrink_dcache_parent() dput(): turn into explicit while() loop dcache: move cond_resched() into the end of __dentry_kill() d_walk(): kill 'finish' callback d_invalidate(): unhash immediately
2018-06-04kvm: nVMX: Add support for "VMWRITE to any supported field"Jim Mattson
Add support for "VMWRITE to any supported field in the VMCS" and enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace clears the VMX capability bit, the old behavior will be restored. Note that this feature is a prerequisite for kvm in L1 to use VMCS shadowing, once that feature is available. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04kvm: nVMX: Restrict VMX capability MSR changesJim Mattson
Disallow changes to the VMX capability MSRs while the vCPU is in VMX operation. Although this does break the existing API, it helps to avoid some potentially tricky situations for which there is no architected behavior. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04KVM: VMX: Optimize tscdeadline timer latencyWanpeng Li
'Commit d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")' advances the tscdeadline (the timer is emulated by hrtimer) expiration in order that the latency which is incurred by hypervisor (apic_timer_fn -> vmentry) can be avoided. This patch adds the advance tscdeadline expiration support to which the tscdeadline timer is emulated by VMX preemption timer to reduce the hypervisor lantency (handle_preemption_timer -> vmentry). The guest can also set an expiration that is very small (for example in Linux if an hrtimer feeds a expiration in the past); in that case we set delta_tsc to 0, leading to an immediately vmexit when delta_tsc is not bigger than advance ns. This patch can reduce ~63% latency (~4450 cycles to ~1660 cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04of: platform: stop accessing invalid dev in of_platform_device_destroySrinivas Kandagatla
Immediately after the platform_device_unregister() the device will be cleaned up. Accessing the freed pointer immediately after that will crash the system. Found this bug when kernel is built with CONFIG_PAGE_POISONING and testing loading/unloading audio drivers in a loop on Qcom platforms. Fix this by moving of_node_clear_flag() just before the unregister calls. Below is the crash trace: Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c03 Mem abort info: ESR = 0x96000021 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000021 CM = 0, WnR = 0 [006b6b6b6b6b6c03] address between user and kernel address ranges Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 1784 Comm: sh Tainted: G W 4.17.0-rc7-02230-ge3a63a7ef641-dirty #204 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) pstate: 80000005 (Nzcv daif -PAN -UAO) pc : clear_bit+0x18/0x2c lr : of_platform_device_destroy+0x64/0xb8 sp : ffff00000c9c3930 x29: ffff00000c9c3930 x28: ffff80003d39b200 x27: ffff000008bb1000 x26: 0000000000000040 x25: 0000000000000124 x24: ffff80003a9a3080 x23: 0000000000000060 x22: ffff00000939f518 x21: ffff80003aa79e98 x20: ffff80003aa3dae0 x19: ffff80003aa3c890 x18: ffff800009feb794 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800009feb790 x14: 0000000000000000 x13: ffff80003a058778 x12: ffff80003a058728 x11: ffff80003a058750 x10: 0000000000000000 x9 : 0000000000000006 x8 : ffff80003a825988 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000001 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000008 x2 : 0000000000000001 x1 : 6b6b6b6b6b6b6c03 x0 : 0000000000000000 Process sh (pid: 1784, stack limit = 0x (ptrval)) Call trace: clear_bit+0x18/0x2c q6afe_remove+0x20/0x38 apr_device_remove+0x30/0x70 device_release_driver_internal+0x170/0x208 device_release_driver+0x14/0x20 bus_remove_device+0xcc/0x150 device_del+0x10c/0x310 device_unregister+0x1c/0x70 apr_remove_device+0xc/0x18 device_for_each_child+0x50/0x80 apr_remove+0x18/0x20 rpmsg_dev_remove+0x38/0x68 device_release_driver_internal+0x170/0x208 device_release_driver+0x14/0x20 bus_remove_device+0xcc/0x150 device_del+0x10c/0x310 device_unregister+0x1c/0x70 qcom_smd_remove_device+0xc/0x18 device_for_each_child+0x50/0x80 qcom_smd_unregister_edge+0x3c/0x70 smd_subdev_remove+0x18/0x28 rproc_stop+0x48/0xd8 rproc_shutdown+0x60/0xe8 state_store+0xbc/0xf8 dev_attr_store+0x18/0x28 sysfs_kf_write+0x3c/0x50 kernfs_fop_write+0x118/0x1e0 __vfs_write+0x18/0x110 vfs_write+0xa4/0x1a8 ksys_write+0x48/0xb0 sys_write+0xc/0x18 el0_svc_naked+0x30/0x34 Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22) ---[ end trace 32020935775616a2 ]--- Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2018-06-04infiniband: fix a possible use-after-free bugCong Wang
ucma_process_join() will free the new allocated "mc" struct, if there is any error after that, especially the copy_to_user(). But in parallel, ucma_leave_multicast() could find this "mc" through idr_find() before ucma_process_join() frees it, since it is already published. So "mc" could be used in ucma_leave_multicast() after it is been allocated and freed in ucma_process_join(), since we don't refcnt it. Fix this by separating "publish" from ID allocation, so that we can get an ID first and publish it later after copy_to_user(). Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04net: mvpp2: mvpp2_percpu_read_relaxed() can be statickbuild test robot
Fixes: db9d7d36eecc ("net: mvpp2: Split the PPv2 driver to a dedicated directory") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04iw_cxgb4: add INFINIBAND_ADDR_TRANS dependencyArnd Bergmann
The newly added fill_res_ep_entry function fails to link if CONFIG_INFINIBAND_ADDR_TRANS is not set: drivers/infiniband/hw/cxgb4/restrack.o: In function `fill_res_ep_entry': restrack.c:(.text+0x3cc): undefined reference to `rdma_res_to_id' restrack.c:(.text+0x3d0): undefined reference to `rdma_iw_cm_id' This adds a Kconfig dependency for the driver. Fixes: 116aeb887371 ("iw_cxgb4: provide detailed provider-specific CM_ID information") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Thelen <gthelen@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04net/packet: refine check for priv area sizeEric Dumazet
syzbot was able to trick af_packet again [1] Various commits tried to address the problem in the past, but failed to take into account V3 header size. [1] tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96 BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106 CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436 prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline] packet_current_rx_frame net/packet/af_packet.c:1117 [inline] tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282 dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018 xmit_one net/core/dev.c:3049 [inline] dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069 __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584 dev_queue_xmit+0x17/0x20 net/core/dev.c:3617 neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358 neigh_output include/net/neighbour.h:482 [inline] ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120 ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154 NF_HOOK_COND include/linux/netfilter.h:277 [inline] ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:288 [inline] ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491 ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633 addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 The buggy address belongs to the page: page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80 flags: 0x2fffc0000000000() raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80 raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 2b6867c2ce76 ("net/packet: fix overflow in check for priv area size") Fixes: dc808110bb62 ("packet: handle too big packets for PACKET_V3") Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04dt-bindings: net: ravb: Add support for r8a77990 SoCYoshihiro Shimoda
Add documentation for r8a77990 compatible string to renesas ravb device tree bindings documentation. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Rob Herring <robh@kernel.org>
2018-06-04net: aquantia: make function aq_fw2x_get_mac_permanent staticColin Ian King
The function aq_fw2x_get_mac_permanent is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: warning: symbol 'aq_fw2x_get_mac_permanent' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04samples/bpf: minor *_nb_free performance fixMagnus Karlsson
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04samples/bpf: adapted to new uapiBjörn Töpel
Here, the xdpsock sample application is adjusted to the new descriptor format. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04xsk: new descriptor addressing schemeBjörn Töpel
Currently, AF_XDP only supports a fixed frame-size memory scheme where each frame is referenced via an index (idx). A user passes the frame index to the kernel, and the kernel acts upon the data. Some NICs, however, do not have a fixed frame-size model, instead they have a model where a memory window is passed to the hardware and multiple frames are filled into that window (referred to as the "type-writer" model). By changing the descriptor format from the current frame index addressing scheme, AF_XDP can in the future be extended to support these kinds of NICs. In the index-based model, an idx refers to a frame of size frame_size. Addressing a frame in the UMEM is done by offseting the UMEM starting address by a global offset, idx * frame_size + offset. Communicating via the fill- and completion-rings are done by means of idx. In this commit, the idx is removed in favor of an address (addr), which is a relative address ranging over the UMEM. To convert an idx-based address to the new addr is simply: addr = idx * frame_size + offset. We also stop referring to the UMEM "frame" as a frame. Instead it is simply called a chunk. To transfer ownership of a chunk to the kernel, the addr of the chunk is passed in the fill-ring. Note, that the kernel will mask addr to make it chunk aligned, so there is no need for userspace to do that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or 3000 to the fill-ring will refer to the same chunk. On the completion-ring, the addr will match that of the Tx descriptor, passed to the kernel. Changing the descriptor format to use chunks/addr will allow for future changes to move to a type-writer based model, where multiple frames can reside in one chunk. In this model passing one single chunk into the fill-ring, would potentially result in multiple Rx descriptors. This commit changes the uapi of AF_XDP sockets, and updates the documentation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04xsk: proper Rx drop statistics updateBjörn Töpel
Previously, rx_dropped could be updated incorrectly, e.g. if the XDP program redirected the frame to a socket bound to a different queue than where the XDP program was executing. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04xsk: proper fill queue descriptor validationBjörn Töpel
Previously the fill queue descriptor was not copied to kernel space prior validating it, making it possible for userland to change the descriptor post-kernel-validation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04IB/isert: use T10-PI check mask definitions from core layerMax Gurtovoy
No reason to use hard-coded protection information checks in ib_isert driver. Use check masks from RDMA core driver. Also, while we here, reduce the number of instructions made for setting the check mask (no need to do bitwise or with 0 since we zero the mask in the beginning of the function). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>