Age | Commit message | Author
2018-03-31Btrfs: scrub: batch rebuild for raid56Liu Bo
In case of raid56, writes and rebuilds always take BTRFS_STRIPE_LEN (64K) as the unit; however, scrub_extent() uses blocksize as the unit, so the rebuild process may be triggered for every block on the same stripe.

A typical example is replacing a disappeared disk: all reads on the disks get -EIO, and every block (4K if blocksize is 4K) goes through these steps:

scrub_handle_errored_block
  scrub_recheck_block # re-read pages one by one
  scrub_recheck_block # rebuild by calling raid56_parity_recover() page by page

Although most of the reads during rebuild can be avoided thanks to the raid56 stripe cache, the parity recovery calculation (xor or raid6 algorithms) still needs to be done $(BTRFS_STRIPE_LEN / blocksize) times.

This makes it smarter by doing raid56 scrub/replace on stripe length.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
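The cost argument above comes down to simple arithmetic. The following is a minimal sketch, not the kernel code: `STRIPE_LEN` mirrors the 64K BTRFS_STRIPE_LEN named in the message, and `recover_calls_per_stripe()` is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>

/* Mirrors BTRFS_STRIPE_LEN (64K) as stated in the commit message. */
#define STRIPE_LEN (64u * 1024u)

/*
 * Hypothetical helper: number of parity-recover calculations needed for
 * one stripe when scrub processes it in chunks of `unit` bytes.
 */
static unsigned int recover_calls_per_stripe(unsigned int unit)
{
	/* The unit must be non-zero and divide the stripe evenly. */
	assert(unit != 0 && STRIPE_LEN % unit == 0);
	return STRIPE_LEN / unit;
}
```

Under these assumptions, a 4K unit gives `recover_calls_per_stripe(4096) == 16`, while working on the whole stripe gives `recover_calls_per_stripe(STRIPE_LEN) == 1`.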
2018-03-31btrfs: sort and group mount option definitionsDavid Sterba
Sort mount options by the primary name, followed by the 'no-' counterpart if it exists. Group the deprecated and debugging options. Enum and token definitions are synced. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Add nossd_spread mount optionHoward McLauchlan
Btrfs has two mount options for SSD optimizations: ssd and ssd_spread. Presently there is an option to disable all SSD optimizations, but there isn't an option to disable just ssd_spread. This patch adds a mount option nossd_spread that disables ssd_spread only. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Remove btrfs_fs_info::open_ioctl_transNikolay Borisov
Since userspace transactions have been removed, we no longer have a use for this field, so delete it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Remove code referencing unused TRANS_USERSPACENikolay Borisov
Now that the userspace transaction ioctls have been removed, TRANS_USERSPACE is no longer used hence we can remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Remove btrfs_file_private::transNikolay Borisov
Now that the userspace transaction ioctls have been removed, this member is no longer used, so just remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Remove userspace transaction ioctlsNikolay Borisov
Commit 3558d4f88ec8 ("btrfs: Deprecate userspace transaction ioctls") marked the beginning of the end of userspace transaction. This commit finishes the job! There are no known users and ceph does not use the ioctl anymore. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Sage Weil <sage@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabledQu Wenruo
When multiple pending snapshots referring to the same source subvolume are executed, enabled quota will cause root item corruption, where root items are using old bytenr (no backref in extent tree). This can be triggered by fstests btrfs/152. The cause is that when the source subvolume is still dirty, the extra commit (simplified transaction commit) of qgroup_account_snapshot() can skip dirty roots not recorded in the current transaction, so the root item of the source subvolume is not updated. Fix it by forcing the source subvolume to be recorded in the current transaction before the qgroup sub-transaction commit. Reported-by: Justin Maggard <jmaggard@netgear.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Relax memory barrier in btrfs_tree_unlockNikolay Borisov
When performing an unlock on an extent buffer we'd like to order the decrement of extent_buffer::blocking_writers with waking up any waiters. In such situations it's sufficient to use smp_mb__after_atomic rather than the heavy smp_mb. On architectures where atomic operations are fully ordered (such as x86 or s390), unconditionally executing a heavyweight smp_mb instruction causes a severe hit to performance while bringing no improvement in terms of correctness. The better thing is to use the appropriate smp_mb__after_atomic routine, which will do the correct thing (invoke a full smp_mb or, in the case of ordered atomics, insert a compiler barrier). Put another way, an RMW atomic op + smp_mb__after_atomic is semantically equivalent to a full smp_mb. This ensures that none of the problems described in the accompanying comment of waitqueue_active occur. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
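The ordering pattern described above can be approximated in userspace with C11 atomics. This is only a rough analogue under stated assumptions, not the kernel implementation: `struct fake_eb`, its `has_waiters` flag (standing in for waitqueue_active()) and `fake_tree_unlock()` are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` plays the role that smp_mb__after_atomic plays after a kernel atomic decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the extent buffer fields involved. */
struct fake_eb {
	atomic_int blocking_writers;
	atomic_bool has_waiters; /* models waitqueue_active() */
	int wakeups;             /* counts simulated wake-ups */
};

static void fake_tree_unlock(struct fake_eb *eb)
{
	/* RMW atomic op: drop the writer count without ordering of its own. */
	atomic_fetch_sub_explicit(&eb->blocking_writers, 1,
				  memory_order_relaxed);

	/* Full fence after the RMW, so the decrement is ordered before
	 * the waiter check below. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Only now look for waiters and wake them. */
	if (atomic_load_explicit(&eb->has_waiters, memory_order_relaxed))
		eb->wakeups++;
}
```

The point of the sketch is the shape of the sequence (atomic decrement, barrier, waiter check); how cheap the barrier is on a given architecture is a property of the kernel primitives and is not modelled here.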
2018-03-31btrfs: add define for oldest generationAnand Jain
Some functions can filter metadata by the generation. Add a define that will annotate such arguments. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: open code trivial helper btrfs_page_exists_in_rangeDavid Sterba
The called function name is self explanatory. Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: Use filemap_range_has_page()Matthew Wilcox
The current implementation of btrfs_page_exists_in_range() gives the wrong answer if the workingset code has stored a shadow entry in the page cache. The filemap_range_has_page() function does not have this problem, and it's shared code, so use it instead. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-30net/mlx5e: Use inline MTTs in UMR WQEsTariq Toukan
When modifying the page mapping of a HW memory region (via a UMR post), post the new values inlined in WQE, instead of using a data pointer. This is a micro-optimization, inline UMR WQEs of different rings scale better in HW. In addition, this obsoletes a few control flows and helps delete ~50 LOC. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Do not busy-wait for UMR completion in Striding RQTariq Toukan
Do not busy-wait a pending UMR completion. Under high HW load, busy-waiting a delayed completion would fully utilize the CPU core and mistakenly indicate a SW bottleneck. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Code movements in RX UMR WQE postTariq Toukan
Gather the process of a UMR WQE post into one function, in preparation for a downstream patch that inlines the WQE data. No functional change here. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Derive Striding RQ size from MTUTariq Toukan
In Striding RQ, each WQE serves multiple packets (hence called Multi-Packet WQE, MPWQE). The size of a MPWQE is constant (currently 256KB). Upon a ringparam set operation, we calculate the number of MPWQEs per RQ. For this, first it is needed to determine the number of packets that can reside within a single MPWQE. In this patch we use the actual MTU size instead of ETH_DATA_LEN for this calculation. This implies that a change in MTU might require a change in Striding RQ ring size. In addition, this obsoletes some WQEs-to-packets translation functions and helps delete ~60 LOC. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
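To see why a change in MTU can change the ring size, consider the arithmetic below. This is a deliberately simplified sketch and not the driver's calculation: it assumes each packet occupies exactly `frame_bytes` of the 256KB MPWQE, ignoring any stride rounding or headroom the hardware may need, and both helper names are hypothetical.

```c
#include <assert.h>

/* MPWQE size as stated in the commit message (currently 256KB). */
#define MPWQE_SIZE (256u * 1024u)

/* Hypothetical: packets of `frame_bytes` that fit in one MPWQE. */
static unsigned int packets_per_mpwqe(unsigned int frame_bytes)
{
	assert(frame_bytes != 0 && frame_bytes <= MPWQE_SIZE);
	return MPWQE_SIZE / frame_bytes;
}

/* Hypothetical: MPWQEs needed to hold `packets` packets (rounded up). */
static unsigned int mpwqes_for_packets(unsigned int packets,
				       unsigned int frame_bytes)
{
	unsigned int per_wqe = packets_per_mpwqe(frame_bytes);

	return (packets + per_wqe - 1) / per_wqe;
}
```

With these assumptions, a ring sized for 1024 packets needs 6 MPWQEs at a 1500-byte frame (the value of ETH_DATA_LEN) but 36 MPWQEs at a 9000-byte frame, which is the kind of dependency on MTU the message refers to.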
2018-03-30net/mlx5e: Save MTU in channels paramsTariq Toukan
Knowing the MTU is required for the RQ creation flow. By our design, the channels creation flow is totally isolated from priv/netdev, and can be completed with access to channels params and mdev. Adding the MTU to the channels params helps preserve that. In addition, we save it in the RQ to make its access faster in datapath checks. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: IPoIB, Fix spelling mistakeTalat Batheesh
Fix spelling mistake in debug message text. "dettaching" -> "detaching" Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5: Change teardown with force mode failure message to warningAlaa Hleihel
With ConnectX-4, we expect the force teardown to fail if DC was enabled, therefore change the message from error to warning. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5: Eliminate query xsrq dead codeSaeed Mahameed
1. This function is not used anywhere in mlx5 driver
2. It has a memcpy statement that makes no sense and produces build warning with gcc8

drivers/net/ethernet/mellanox/mlx5/core/transobj.c: In function 'mlx5_core_query_xsrq':
drivers/net/ethernet/mellanox/mlx5/core/transobj.c:347:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]

Fixes: 01949d0109ee ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Use eq ptr from cqSaeed Mahameed
Instead of looking for the EQ of the CQ, remove that redundant code and use the eq pointer stored in the cq struct. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30PCI: Always define the of_node helpersBjørn Mork
Simply move these inline functions outside the ifdef instead of duplicating them as stubs in the !OF case. The struct device of_node field does not depend on OF. This also fixes the missing stubbed pci_bus_to_OF_node(). Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-03-30PCI/DPC: Do not enable DPC if AER control is not allowed by the BIOSMika Westerberg
Commit eed85ff4c0da ("PCI/DPC: Enable DPC only if AER is available") made DPC control dependent on whether AER is enabled in the OS. However, it does not take into account situations where the BIOS has not given the OS control of AER:

acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
acpi PNP0A08:00: _OSC: platform does not support [AER]
acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]

I think here it is better not to enable DPC even if the capability is available, because then it would be against what the "Determination of DPC Control" note in PCIe 4.0 sec 6.1.10 recommends.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-03-30Input: stmfts, s6sy761 - update my e-mailAndi Shyti
Because I will be leaving Samsung soon, for reachability update my reference e-mail to etezian.org. Signed-off-by: Andi Shyti <andi.shyti@samsung.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-03-30Input: stmfts - use async probe & suspend/resume to avoid 2s delayMarek Szyprowski
Executing the stmfts_power_on() function takes over 2 seconds, which significantly slows down the boot and resume processes if the driver is compiled in. Avoid this delay by forcing this driver to be probed and suspended/resumed asynchronously. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-03-30Input: ALPS - fix TrackStick detection on Thinkpad L570 and Latitude 7370Masaki Ota
The primary interface for the touchpad device in Thinkpad L570 is SMBus, so ALPS overlooked the PS2 interface firmware setting of the TrackStick and shipped the device with the TrackStick otp bit disabled. The address 0xD7 contains device number information, so we can identify the device by checking this value, but to access it we need to enable Command mode, and then re-enable the device. Devices shipped in Thinkpad L570 report either 0x0C or 0x1D as device numbers; if we see them, we assume that the devices are DualPoints. The same issue exists on Dell Latitude 7370. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196929 Fixes: 646580f793 ("Input: ALPS - fix multi-touch decoding on SS4 plus touchpads") Signed-off-by: Masaki Ota <masaki.ota@jp.alps.com> Tested-by: Aaron Ma <aaron.ma@canonical.com> Tested-by: Jonathan Liu <net147@gmail.com> Tested-by: Jaak Ristioja <jaak@ristioja.ee> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-03-30PCI/AER: Use cached AER Capability offsetFrederick Lawler
Replace pci_find_ext_capability(..., PCI_EXT_CAP_ID_ERR) calls with pci_dev->aer_cap. pci_dev->aer_cap is initialized in pci_init_capabilities(), which happens before any of these users of the AER Capability. Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-03-30PCI/portdrv: Rename and reverse sense of pcie_ports_autoBjorn Helgaas
The platform may restrict the OS's use of PCIe services, e.g., via the ACPI _OSC method. The user may use "pcie_ports=native" to force the port driver to use PCIe services even if the platform asked us not to. The "pcie_ports=native" parameter determines the setting of pcie_ports_auto. Rename this to pcie_ports_native and reverse the sense to simplify the code. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-03-30PCI/portdrv: Encapsulate pcie_ports_auto inside the port driverBjorn Helgaas
"pcie_ports_auto" is only used inside the PCIe port driver itself, so move it from include/linux/pci.h to portdrv.h so it's not visible to the whole kernel. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-03-30PCI/portdrv: Remove unnecessary "pcie_ports=auto" parameterBjorn Helgaas
The "pcie_ports=auto" parameter set pcie_ports_disabled and pcie_ports_auto to their compiled-in defaults, so specifying the parameter is the same as not using it at all. Remove the "pcie_ports=auto" parameter and update the documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-03-30PCI/portdrv: Remove "pcie_hp=nomsi" kernel parameterBjorn Helgaas
7570a333d8b0 ("PCI: Add pcie_hp=nomsi to disable MSI/MSI-X for pciehp driver") added the "pcie_hp=nomsi" kernel parameter to work around this error on shutdown:

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1081, comm: reboot Not tainted 3.2.0 #1
...
Disabling IRQ #16

This happened on an unspecified system (possibly involving the Integrated Device Technology, Inc. Device 807f bridge) where "an un-wanted interrupt is generated when PCI driver switches from MSI/MSI-X to INTx while shutting down the device."

The implication was that the device was buggy, but it is normal for a device to use INTx after MSI/MSI-X have been disabled. The only problem was that the driver was still attached and it wasn't prepared for INTx interrupts. Prarit Bhargava fixed this issue with fda78d7a0ead ("PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()").

There is no automated way to set this parameter, so it's not very useful for distributions or end users. It's really only useful for debugging, and we have "pci=nomsi" for that purpose.

Revert 7570a333d8b0 to remove the "pcie_hp=nomsi" parameter.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
CC: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
CC: Prarit Bhargava <prarit@redhat.com>
2018-03-30PCI/portdrv: Remove unnecessary include of <linux/pci-aspm.h>Bjorn Helgaas
portdrv_pci.c doesn't use anything from <linux/pci-aspm.h>. Remove the include of it. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-30PCI/portdrv: Simplify PCIe feature permission checkingBjorn Helgaas
Some PCIe features (AER, DPC, hotplug, PME) can be managed by either the platform firmware or the OS, so the host bridge driver may have to request permission from the platform before using them. On ACPI systems, this is done by negotiate_os_control() in acpi_pci_root_add().

The PCIe port driver later uses pcie_port_platform_notify() and pcie_port_acpi_setup() to figure out whether it can use these features. But all we need is a single bit for each service, so these interfaces are needlessly complicated.

Simplify this by adding bits in the struct pci_host_bridge to show when the OS has permission to use each feature:

+ unsigned int native_aer:1; /* OS may use PCIe AER */
+ unsigned int native_hotplug:1; /* OS may use PCIe hotplug */
+ unsigned int native_pme:1; /* OS may use PCIe PME */

These are set when we create a host bridge, and the host bridge driver can clear the bits corresponding to any feature the platform doesn't want us to use.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
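The set-then-clear scheme for the permission bits can be sketched in plain C. This is an illustrative model only: the three bitfield names come from the commit message, while `struct host_bridge_model`, the `PLAT_KEEPS_*` flags and both functions are hypothetical and do not correspond to kernel interfaces.

```c
#include <assert.h>

/* Model of the per-feature permission bits named in the message. */
struct host_bridge_model {
	unsigned int native_aer:1;     /* OS may use PCIe AER */
	unsigned int native_hotplug:1; /* OS may use PCIe hotplug */
	unsigned int native_pme:1;     /* OS may use PCIe PME */
};

/* Hypothetical flags: features the platform wants to keep for itself. */
enum {
	PLAT_KEEPS_AER     = 1u << 0,
	PLAT_KEEPS_HOTPLUG = 1u << 1,
	PLAT_KEEPS_PME     = 1u << 2,
};

/* On creation, the OS is assumed to own every feature. */
static void bridge_init(struct host_bridge_model *b)
{
	b->native_aer = 1;
	b->native_hotplug = 1;
	b->native_pme = 1;
}

/* The host bridge driver clears bits for features the platform retains. */
static void bridge_apply_platform(struct host_bridge_model *b,
				  unsigned int kept)
{
	if (kept & PLAT_KEEPS_AER)
		b->native_aer = 0;
	if (kept & PLAT_KEEPS_HOTPLUG)
		b->native_hotplug = 0;
	if (kept & PLAT_KEEPS_PME)
		b->native_pme = 0;
}
```

A consumer such as a port service then only has to test a single bit, for example `if (b->native_aer)`, instead of going through a notification interface.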
2018-03-30PCI/portdrv: Remove unused PCIE_PORT_SERVICE_VCBjorn Helgaas
No driver registers for PCIE_PORT_SERVICE_VC, so remove it. This removes the VC "service" files from /sys/bus/pci_express/devices, e.g., 0000:07:00.0:pcie108, 0000:08:04.0:pcie208 (all the files that contained "8" as the last digit of the "pcieXXX" part). The port driver created these files for PCIe port devices that have a VC Capability. Since this reduces PCIE_PORT_DEVICE_MAXSERVICES and moves DPC down into the spot where VC used to be, the DPC sysfs files will now be named "pcieXX8". I don't think there's anything useful userspace can do with those files, so I hope nobody cares about these filenames. There is no VC driver that calls pcie_port_service_register(), so there never was a /sys/bus/pci_express/drivers/vc directory. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-03-30PCI/portdrv: Remove pcie_port_bus_type link order dependencyBjorn Helgaas
The pcie_port_bus_type must be registered before drivers that depend on it can be registered. Those drivers include:

pcied_init() # PCIe native hotplug driver
aer_service_init() # AER driver
dpc_service_init() # DPC driver
pcie_pme_service_init() # PME driver

Previously we registered pcie_port_bus_type from pcie_portdrv_init(), a device_initcall. The callers of pcie_port_service_register() (above) are also device_initcalls. This is fragile because the device_initcall ordering depends on link order, which is not explicit.

Register pcie_port_bus_type from pci_driver_init() along with pci_bus_type. This removes the link order dependency between portdrv and the pciehp, AER, DPC, and PCIe PME drivers.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-03-30PCI/portdrv: Disable port driver in compat modeBjorn Helgaas
The "pcie_ports=compat" kernel parameter sets pcie_ports_disabled, which is intended to disable the PCIe port driver. But even when it was disabled, we registered pcie_portdriver so we could work around a BIOS PME issue (see fe31e69740ed ("PCI/PCIe: Clear Root PME Status bits early during system resume")). Registering the driver meant that the pcie_portdrv_probe() path called pci_enable_device(), pci_save_state(), pm_runtime_set_autosuspend_delay(), pm_runtime_use_autosuspend(), etc., even when the driver was disabled. We've since moved the BIOS PME workaround from the port driver to the core, so stop registering the PCIe port driver in compat mode. This means "pcie_ports=compat" will now be basically the same as turning off CONFIG_PCIEPORTBUS completely. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-30PCI/PM: Clear PCIe PME Status bit for Root Complex Event CollectorsBjorn Helgaas
Per PCIe r4.0, sec 6.1.6, Root Complex Event Collectors can generate PME interrupts on behalf of Root Complex Integrated Endpoints. Linux does not currently enable PME interrupts from RC Event Collectors, but fe31e69740ed ("PCI/PCIe: Clear Root PME Status bits early during system resume") suggests PME interrupts may be enabled by the platform for ACPI-based runtime wakeup. Clear the PCIe PME Status bit for Root Complex Event Collectors during resume, just like we already do for Root Ports. If the BIOS enables PME interrupts for an event collector and neglects to clear the status bit on resume, this change should fix the same bug as fe31e69740ed (PMEs not working after waking from a sleep state), but for Root Complex Integrated Endpoints. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-30Merge branch 'bpf-sockmap-sg-api-fixes'Daniel Borkmann
Prashant Bhole says:

====================
These patches fix sg api usage in sockmap. Previously sockmap didn't use sg_init_table(), which caused hitting BUG_ON in sg api when CONFIG_DEBUG_SG is enabled.

v1: added sg_init_table() calls wherever needed.

v2:
- Patch1 adds a new helper function in sg api: sg_init_marker()
- Patch2 uses sg_init_marker() and sg_init_table() in appropriate places

Background: While reviewing v1, John Fastabend raised a valid point about the unnecessary memset in sg_init_table(), because sockmap uses an sg table which is embedded in a struct. As the enclosing struct is zeroed out, the memset in sg_init_table() is unnecessary. So Daniel Borkmann suggested defining another static inline function in scatterlist.h which only initializes sg_magic. This function will also be called from sg_init_table(). From this suggestion I defined a function sg_init_marker() which sets sg_magic and calls sg_mark_end().
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap: initialize sg table entries properlyPrashant Bhole
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized in sg_init_table() and it is verified in sg api while navigating. We hit BUG_ON when the magic check fails.

In functions sg_tcp_sendpage and sg_tcp_sendmsg, the struct containing the scatterlist is already zeroed out. So to avoid an extra memset, we use sg_init_marker() to initialize sg_magic.

Fixed the following things:
- In bpf_tcp_sendpage: initialize sg using sg_init_marker
- In bpf_tcp_sendmsg: Replace sg_init_table with sg_init_marker
- In bpf_tcp_push: Replace memset with sg_init_table where consumed sg entry needs to be re-initialized.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30lib/scatterlist: add sg_init_marker() helperPrashant Bhole
sg_init_marker initializes sg_magic in the sg table and calls sg_mark_end() on the last entry of the table. This can be useful to avoid the memset in sg_init_table() when the scatterlist is already zeroed out, for example when the scatterlist is embedded inside another struct and that container struct is zeroed out. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
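The split between the two helpers can be modelled in userspace. The sketch below is an assumption-laden model, not the kernel's scatterlist code: `struct model_sg`, the `MODEL_SG_MAGIC` value and both functions are hypothetical, and the magic field is always present here, whereas the message ties it to CONFIG_DEBUG_SG.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_SG_MAGIC 0x87654321UL /* illustrative value only */

/* Hypothetical, simplified scatterlist entry. */
struct model_sg {
	unsigned long magic; /* checked while walking the table */
	bool is_last;        /* models the end marker */
	unsigned int length; /* payload field that must survive */
};

/* Set the magic on every entry and mark the last one; no memset. */
static void model_sg_init_marker(struct model_sg *sgl, unsigned int nents)
{
	unsigned int i;

	assert(nents > 0);
	for (i = 0; i < nents; i++)
		sgl[i].magic = MODEL_SG_MAGIC;
	sgl[nents - 1].is_last = true;
}

/* Full initialization: zero the table, then reuse the marker helper. */
static void model_sg_init_table(struct model_sg *sgl, unsigned int nents)
{
	memset(sgl, 0, sizeof(*sgl) * nents);
	model_sg_init_marker(sgl, nents);
}
```

In this model, calling `model_sg_init_marker()` on a table that was zeroed as part of its container leaves already-populated fields such as `length` untouched, while `model_sg_init_table()` wipes them before marking.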
2018-03-30PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidthTal Gilboa
Add pcie_bandwidth_capable() to compute the max link bandwidth supported by a device, based on the max link speed and width, adjusted by the encoding overhead. The maximum bandwidth of the link is computed as: max_link_width * max_link_speed * (1 - encoding_overhead) 2.5 and 5.0 GT/s links use 8b/10b encoding, which reduces the raw bandwidth available by 20%; 8.0 GT/s and faster links use 128b/130b encoding, which reduces it by about 1.5%. The result is in Mb/s, i.e., megabits/second, of raw bandwidth. Signed-off-by: Tal Gilboa <talgi@mellanox.com> [bhelgaas: add 16 GT/s, adjust for pcie_get_speed_cap() and pcie_get_width_cap() signatures, don't export outside drivers/pci] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
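The formula above can be worked through with integer arithmetic. This is a sketch under stated assumptions rather than the kernel function: the enum and both helpers are hypothetical names, the encoding factors (8b/10b and 128b/130b) are the ones given in the message, and results are truncated to whole Mb/s.

```c
#include <assert.h>

/* Hypothetical enum for the link speeds mentioned in the message. */
enum link_speed { GT_2_5, GT_5_0, GT_8_0, GT_16_0 };

/* Per-lane bandwidth in Mb/s after subtracting encoding overhead. */
static unsigned int lane_mbps_encoded(enum link_speed s)
{
	switch (s) {
	case GT_2_5:
		return 2500u * 8u / 10u;      /* 8b/10b encoding */
	case GT_5_0:
		return 5000u * 8u / 10u;      /* 8b/10b encoding */
	case GT_8_0:
		return 8000u * 128u / 130u;   /* 128b/130b encoding */
	case GT_16_0:
		return 16000u * 128u / 130u;  /* 128b/130b encoding */
	}
	return 0;
}

/* max_link_width * max_link_speed * (1 - encoding_overhead), in Mb/s. */
static unsigned int bandwidth_capable_mbps(enum link_speed s,
					   unsigned int width)
{
	return width * lane_mbps_encoded(s);
}
```

For instance, in this sketch a x1 link at 2.5 GT/s comes out at 2000 Mb/s, and a x8 link at 8.0 GT/s at 63008 Mb/s (7876 Mb/s per lane after truncation).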
2018-03-30PCI: Add pcie_get_width_cap() to find max supported link widthTal Gilboa
Add pcie_get_width_cap() to find the max link width supported by a device. Change max_link_width_show() to use pcie_get_width_cap(). Signed-off-by: Tal Gilboa <talgi@mellanox.com> [bhelgaas: return width directly instead of error and *width, don't export outside drivers/pci] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2018-03-30PCI: Add pcie_get_speed_cap() to find max supported link speedTal Gilboa
Add pcie_get_speed_cap() to find the max link speed supported by a device. Change max_link_speed_show() to use pcie_get_speed_cap(). Signed-off-by: Tal Gilboa <talgi@mellanox.com> [bhelgaas: return speed directly instead of error and *speed, don't export outside drivers/pci] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2018-03-30blktrace: fix comment in blktrace_api.hSouvik Banerjee
The `__u64 time` field of the blk_io_trace struct refers to the time in nanoseconds, not in microseconds. It is set in __blk_add_trace, which does the following: t->time = ktime_to_ns(ktime_get()); ktime_to_ns returns ktime_t in nanoseconds, not microseconds. Signed-off-by: Souvik Banerjee <souvik1997@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-30rxrpc: Fix leak of rxrpc_peer objectsDavid Howells
When a new client call is requested, an rxrpc_conn_parameters struct object is passed in with a bunch of parameters set, such as the local endpoint to use. A pointer to the target peer record is also placed in there by rxrpc_get_client_conn() - and this is removed if and only if a new connection object is allocated. Thus it leaks if a new connection object isn't allocated. Fix this by putting any peer object attached to the rxrpc_conn_parameters object in the function that allocated it. Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info") Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Add a tracepoint to track rxrpc_peer refcountingDavid Howells
Add a tracepoint to track reference counting on the rxrpc_peer struct. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Fix apparent leak of rxrpc_local objectsDavid Howells
rxrpc_local objects cannot be disposed of until all the connections that point to them have been RCU'd as a connection object holds refcount on the local endpoint it is communicating through. Currently, this can cause an assertion failure to occur when a network namespace is destroyed as there's no check that the RCU destructors for the connections have been run before we start trying to destroy local endpoints.

The kernel reports:

rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/local_object.c:439!

Fix this by keeping a count of the live connections and waiting for it to go to zero at the end of rxrpc_destroy_all_connections().

Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Add a tracepoint to track rxrpc_local refcountingDavid Howells
Add a tracepoint to track reference counting on the rxrpc_local struct. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Fix potential call vs socket/net destruction raceDavid Howells
rxrpc_call structs don't pin sockets or network namespaces, but may attempt to access both after their refcount reaches 0 so that they can detach themselves from the network namespace. However, there's no guarantee that the socket still exists at this point (so sock_net(&call->socket->sk) may be invalid) and the namespace may have gone away if the call isn't pinning a peer.

Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b) waiting for all calls to be destroyed when the network namespace goes away.

This was detected by checker:

net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
net/rxrpc/call_object.c:634:57: expected struct sock const *sk
net/rxrpc/call_object.c:634:57: got struct sock [noderef] <asn:4>*<noident>

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Fix checker warnings and errorsDavid Howells
Fix various issues detected by checker.

Errors:

(*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set call->socket.

Warnings:

(*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to trace_rxrpc_conn() as the where argument.

(*) rxrpc_disconnect_client_call() should get its net pointer via the call->conn rather than call->sock to avoid a warning about accessing an RCU pointer without protection.

(*) Proc seq start/stop functions need annotation as they pass locks between the functions.

False positives:

(*) Checker doesn't correctly handle seq-retry lock context balance in rxrpc_find_service_conn_rcu().

(*) Checker thinks execution may proceed past the BUG() in rxrpc_publish_service_conn().

(*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in rxkad.c.

Signed-off-by: David Howells <dhowells@redhat.com>