path: root/include/linux
Age / Commit message / Author
2017-04-28Merge branch 'pci/host-imx6' into nextBjorn Helgaas
* pci/host-imx6:
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  PCI: imx6: Do not switch speed if Gen2 is disabled
  PCI: imx6: Do not wait for speed change on i.MX7
  PCI: imx6: Allow probe deferral by reset GPIO
  PCI: imx6: Add code to support i.MX7D
2017-04-28PCI: Add device IDs for DRA74x and DRA72xKishon Vijay Abraham I
Add device IDs for DRA74x and DRA72x devices. These devices have configurable PCI endpoint. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-28blk-mq: unify hctx delay_work and run_workJens Axboe
The only difference between ->run_work and ->delay_work, is that the latter is used to defer running a queue. This is done by marking the queue stopped, and scheduling ->delay_work to run sometime in the future. While the queue is stopped, direct runs or runs through ->run_work will not run the queue. If we combine the handlers, then we need to handle two things: 1) If a delayed/stopped run is scheduled, then we should not run the queue before that has been completed. 2) If a queue is delayed/stopped, the handler needs to restart the queue. Normally a run of a queue with the stopped bit set would be a no-op. Case 1 is handled by modifying a currently pending queue run to the deadline set by the caller of blk_mq_delay_queue(). Subsequent attempts to queue a queue run will find the work item already pending, and direct runs will see a stopped queue as before. Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN, that tells the work handler that it should clear a stopped queue and run the handler. Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-28block: add kblock_mod_delayed_work_on()Jens Axboe
This modifies (or adds, if not currently pending) an existing delayed work item. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-28blk-mq: unify hctx delayed_run_work and run_workJens Axboe
They serve the exact same purpose. Get rid of the non-delayed work variant, and just run it without delay for the normal case. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-28mac80211: Add support for BSS max idle period elementAvraham Stern
Parse the BSS max idle period element and set the BSS configuration accordingly so the driver can use this information to configure the max idle period and to use protected management frames for keep alive when required. The BSS max idle period element is defined in IEEE802.11-2016, section 9.4.2.79 Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-28Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-nextPaul Mackerras
This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-27NFSv4: Fix callback server shutdownTrond Myklebust
We want to use kthread_stop() in order to ensure the threads are shut down before we tear down the nfs_callback_info in nfs_callback_down. Tested-and-reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: bb6aeba736ba9 ("NFSv4.x: Switch to using svc_set_num_threads()...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-27dm: introduce enum dm_queue_mode to cleanup related codeBart Van Assche
Introduce an enumeration type for the queue mode. This patch does not change any functionality but makes the DM code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27bridge: add per-port broadcast flood flagMike Manning
Support for l2 multicast flood control was added in commit b6cb5ac8331b ("net: bridge: add per-port multicast flood flag"). It allows broadcast as it was introduced specifically for unknown multicast flood control. But as broadcast is a special case of multicast, this may also need to be disabled. For this purpose, introduce a flag to disable the flooding of received l2 broadcasts. This approach is backwards compatible and provides flexibility in filtering for the desired packet types. Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Mike Manning <mmanning@brocade.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
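The per-port decision described above can be sketched in userspace C. This is a minimal illustration, not the bridge code: the flag names, the packet-type enum and `demo_should_flood()` are hypothetical and chosen only for this example.

```c
#include <stdbool.h>

/* Hypothetical per-port flag bits, named for illustration only. */
enum demo_port_flag {
    DEMO_UCAST_FLOOD = 1u << 0, /* flood unknown unicast */
    DEMO_MCAST_FLOOD = 1u << 1, /* flood unknown multicast */
    DEMO_BCAST_FLOOD = 1u << 2, /* flood broadcast (the new flag) */
};

enum demo_pkt_type { DEMO_PKT_UNICAST, DEMO_PKT_MULTICAST, DEMO_PKT_BROADCAST };

/* Return true if a frame of the given type may be flooded out of a port. */
static bool demo_should_flood(unsigned int port_flags, enum demo_pkt_type type)
{
    switch (type) {
    case DEMO_PKT_UNICAST:
        return (port_flags & DEMO_UCAST_FLOOD) != 0;
    case DEMO_PKT_MULTICAST:
        return (port_flags & DEMO_MCAST_FLOOD) != 0;
    case DEMO_PKT_BROADCAST:
        return (port_flags & DEMO_BCAST_FLOOD) != 0;
    }
    return false;
}
```

Keeping broadcast on its own bit means a port with only the unicast and multicast bits set keeps its previous behavior for those types, while broadcast flooding can be switched off independently.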
2017-04-27net: update comment for netif_dormant() functionZhang Shengju
This patch updates the comment for netif_dormant() function to reflect the intended usage. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27rhashtable: Cap total number of entries to 2^31Herbert Xu
When max_size is not set or if it is set to a sufficiently large value, the nelems counter can overflow. This would cause havoc with the automatic shrinking as it would then attempt to fit a huge number of entries into a tiny hash table. This patch fixes this by adding max_elems to struct rhashtable to cap the number of elements. This is set to 2^31 as nelems is not a precise count. This is sufficiently smaller than UINT_MAX that it should be safe. When max_size is set max_elems will be lowered to at most twice max_size as is the status quo. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
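The capping rule in this message can be written out as a small function. This is a sketch under stated assumptions: `demo_max_elems()` is a hypothetical helper, and a `max_size` of 0 is taken here to mean "not set".

```c
#include <stdint.h>

/*
 * Compute the element cap: 2^31 by default, lowered to twice max_size
 * when max_size is set and small enough for that to be a tighter bound.
 * A max_size of 0 is treated as "not set" (an assumption of this sketch).
 */
static uint32_t demo_max_elems(uint32_t max_size)
{
    uint32_t max_elems = UINT32_C(1) << 31;

    if (max_size != 0 && max_size < max_elems / 2)
        max_elems = max_size * 2;

    return max_elems;
}
```

Because the default cap is 2^31 rather than UINT_MAX, an imprecise element count has ample headroom before a 32-bit counter could wrap.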
2017-04-27KVM: mark requests that need synchronizationPaolo Bonzini
kvm_make_all_requests() provides a synchronization that waits until all kicked VCPUs have acknowledged the kick. This is important for KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is underway. This patch adds the synchronization property into all requests that are currently being used with kvm_make_all_requests() in order to preserve the current behavior and only introduce a new framework. Removing it from requests where it is not necessary is left for future patches. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-27KVM: return if kvm_vcpu_wake_up() did wake up the VCPURadim Krčmář
No need to kick a VCPU that we have just woken up. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-27KVM: add explicit barrier to kvm_vcpu_kickAndrew Jones
kvm_vcpu_kick() must issue a general memory barrier prior to reading vcpu->mode in order to ensure correctness of the mutual-exclusion memory barrier pattern used with vcpu->requests. While the cmpxchg called from kvm_vcpu_kick():
  kvm_vcpu_kick
    kvm_arch_vcpu_should_kick
      kvm_vcpu_exiting_guest_mode
        cmpxchg
implies general memory barriers before and after the operation, that implication is only valid when cmpxchg succeeds. We need an explicit barrier for when it fails, otherwise a VCPU thread on its entry path that reads zero for vcpu->requests does not exclude the possibility the requesting thread sees !IN_GUEST_MODE when it reads vcpu->mode. kvm_make_all_cpus_request already had a barrier, so we remove it, as now it would be redundant. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-27KVM: mark requests that do not need a wakeupRadim Krčmář
Some operations must ensure that the guest is not running with stale data, but if the guest is halted, then the update can wait until another event happens. kvm_make_all_requests() currently doesn't wake up, so we can mark all requests used with it. First 8 bits were arbitrarily reserved for request numbers. Most uses of requests have the request type as a constant, so a compiler will optimize the '&'. An alternative would be to have an inline function that would return whether the request needs a wake-up or not, but I like this one better even though it might produce worse assembly. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
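The encoding described here, a request number in the low 8 bits with property flags above it, can be sketched as follows. All names are hypothetical and the request numbers are arbitrary; only the "low 8 bits" split comes from the message.

```c
#include <stdbool.h>

/* Low 8 bits carry the request number; higher bits carry properties. */
#define DEMO_REQ_NUMBER_MASK 0xffu
#define DEMO_REQ_NO_WAKEUP   (1u << 8)

/* Two example requests: one that may wait for the next event, one that may not. */
#define DEMO_REQ_TLB_FLUSH   (0u | DEMO_REQ_NO_WAKEUP)
#define DEMO_REQ_UNHALT      10u

/* Strip the property flags to get the bit index used in the request bitmap. */
static unsigned int demo_req_number(unsigned int req)
{
    return req & DEMO_REQ_NUMBER_MASK;
}

/* A halted VCPU needs to be woken only if the request lacks NO_WAKEUP. */
static bool demo_req_needs_wakeup(unsigned int req)
{
    return (req & DEMO_REQ_NO_WAKEUP) == 0;
}
```

When the request is a compile-time constant, as the message notes is the common case, both helpers reduce to constants after inlining.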
2017-04-27KVM: add kvm_{test,clear}_request to replace {test,clear}_bitRadim Krčmář
Users were expected to use kvm_check_request() for testing and clearing, but requests have expanded their use since then and some users want to only test or do a faster clear. Make sure that requests are not directly accessed with bit operations. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27mfd: axp20x: Support AXP803 variantIcenowy Zheng
AXP803 is a new PMIC chip produced by X-Powers, usually paired with A64 via RSB bus. The PMIC itself is like AXP288, but with RSB support and dedicated VBUS and ACIN. Add support for it in the axp20x mfd driver. Currently only power key function is supported. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27iommu: Move report_iommu_fault() to iommu.cJoerg Roedel
The function is in no fast-path, there is no need for it to be static inline in a header file. This also removes the need to include iommu trace-points in iommu.h. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-27iommu: Include device.h in iommu.hJoerg Roedel
We make use of 'struct device' in iommu.h, so include device.h to make it available explicitly. Re-order the other headers while at it. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-27fs: add a VALID_OPEN_FLAGSChristoph Hellwig
Add a central define for all valid open flags, and use it in the uniqueness check. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
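The idea of a central mask plus a uniqueness check can be illustrated in plain C. This is a sketch only: the flag values and names below are hypothetical, not the kernel's O_* definitions, and the check runs at run time here rather than at build time.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; each should own one bit. */
#define DEMO_O_CREAT   0x040u
#define DEMO_O_EXCL    0x080u
#define DEMO_O_TRUNC   0x200u
#define DEMO_O_APPEND  0x400u

/* Central definition of every valid flag, and how many flags it holds. */
#define DEMO_VALID_OPEN_FLAGS \
    (DEMO_O_CREAT | DEMO_O_EXCL | DEMO_O_TRUNC | DEMO_O_APPEND)
#define DEMO_FLAG_COUNT 4u

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned int demo_popcount(unsigned int v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Flags are unique if the combined mask has exactly one bit per flag. */
static bool demo_flags_are_unique(unsigned int mask, unsigned int count)
{
    return demo_popcount(mask) == count;
}
```

If two flags were accidentally given the same value, the OR would contain fewer set bits than there are flags and the check would fail.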
2017-04-27mfd: exynos-lpass: Remove pad retention controlMarek Szyprowski
Pad retention should be controlled from pin control driver, so remove it from Exynos LPASS driver. After this change, no more access to PMU regmap is needed, so remove also the code for handling PMU regmap. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Acked-by: Rob Herring <robh@kernel.org> Acked-for-MFD-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: Add support for DA9061Steve Twiss
MFD support for DA9061 is provided as part of the DA9062 device driver. The registers header file adds two new chip variant IDs defined in DA9061 and DA9062 hardware. The core header file adds new software enumerations for listing the valid DA9061 IRQs and a da9062_compatible_types enumeration for distinguishing between DA9061/62 devices in software. The core source code adds a new .compatible of_device_id entry. This is extended from DA9062 to support both "dlg,da9061" and "dlg,da9062". The .data entry now holds a reference to the enumerated device type. A new regmap_irq_chip model is added for DA9061 and this supports the new list of regmap_irq entries. A new mfd_cell da9061_devs[] array lists the new sub system components for DA9061. Support is added for a new DA9061 regmap_config which lists the correct readable, writable and volatile ranges for this chip. The probe function uses the device tree compatible string to switch on the da9062_compatible_types and configure the correct mfd cells, irq chip and regmap config. Kconfig is updated to reflect support for DA9061 and DA9062 PMICs. Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: syscon: atmel-smc: Remove unused helpers/macrosBoris Brezillon
All macros prefixed with AT91[SAM9]_SMC have been replaced by equivalent definitions prefixed with ATMEL_SMC, and the at91sam9_smc_xxxx() helpers are no longer used. Drop these definitions before someone starts using them again. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: syscon: atmel-smc: Add new helpers to ease SMC regs manipulationBoris Brezillon
These new helpers + macro definitions are meant to replace the old ones which are impractical to use. Note that the macros and function prefixes have been intentionally changed to ATMEL_[H]SMC_XX and atmel_[h]smc_ to reflect the fact that this IP is also embedded in avr32 SoCs (and not only in at91 ones). Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: intel_soc_pmic_bxtwc: Rename header to follow c-fileAndy Shevchenko
For better understanding of relationship between headers and modules rename: intel_bxtwc.h -> intel_soc_pmic_bxtwc.h While here, remove file name from the file itself. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: intel_soc_pmic_bxtwc: Move inclusion to c-fileAndy Shevchenko
There is no need to include intel_soc_pmic.h into header which doesn't require it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: axp20x: Correct name of temperature data ADC registersQuentin Schulz
The registers 0x56 and 0x57 of AXP22X PMIC store the value of the internal temperature of the PMIC. This patch modifies the name of these registers from AXP22X_PMIC_ADC_H/L to AXP22X_PMIC_TEMP_H/L so their purpose is clearer. Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: Add TI LMU driverMilo Kim
TI LMU (Lighting Management Unit) driver supports lighting devices below. LM3532, LM3631, LM3632, LM3633, LM3695 and LM3697.
LMU devices have common features.
- I2C interface for accessing device registers
- Hardware enable pin control
- Backlight brightness control
- Notifier for hardware fault monitoring
- Regulators for LCD display bias
It contains fault monitor, backlight, LED and regulator driver.
LMU fault monitor
-----------------
LM3633 and LM3697 provide hardware monitoring feature. It enables open or short circuit detection. After monitoring is done, each device should be re-initialized. Notifier is used for this case. Separate patch for 'ti-lmu-fault-monitor' will be sent later.
Backlight
---------
It's handled by TI LMU backlight consolidated driver and chip dependent data. Separate patchset will be sent later.
LED indicator
-------------
LM3633 has 6 indicator LEDs. Programmable dimming pattern is also supported. Separate patch for 'leds-lm3633' will be sent later.
Regulator
---------
LM3631 has 5 regulators for the display bias. LM3632 supports 3 regulators. One consolidated driver enables it. The lm363x regulator driver is already upstreamed.
Signed-off-by: Milo Kim <milo.kim@ti.com> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27mfd: cros_ec: Add ACPI GPE handler for LID0 devicesArchana Patni
This patch installs an ACPI GPE handler for the LID0 ACPI device to indicate to the ACPI core that this GPE should stay enabled for the lid to work in the suspend-to-idle path. Signed-off-by: Archana Patni <archana.patni@intel.com> Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-04-27Merge branches 'ib-mfd-gpio-4.12', 'ib-mfd-iio-input-4.12', 'ib-mfd-input-4.12', 'ib-mfd-leds-4.12', 'ib-mfd-phy-4.12' and 'ib-mfd-pinctrl-samsung-4.12' into ibs-for-mfd-mergedLee Jones
2017-04-26fs: remove _submit_bh()Eric Biggers
_submit_bh() allowed submitting a buffer_head for I/O using custom bio_flags. It used to be used by jbd to set BIO_SNAP_STABLE, introduced by commit 713685111774 ("mm: make snapshotting pages for stable writes a per-bio operation"). However, the code and flag have since been removed and no _submit_bh() users remain. These days, bio_flags are mostly used internally by the block layer to track the state of bios. As such, it doesn't really make sense for filesystems to use them instead of op_flags when wanting special behavior for block requests. Therefore, remove _submit_bh() and trim the bio_flags argument from submit_bh_wbc(). Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26fs: constify tree_descr arrays passed to simple_fill_super()Eric Biggers
simple_fill_super() is passed an array of tree_descr structures which describe the files to create in the filesystem's root directory. Since these arrays are never modified intentionally, they should be 'const' so that they are placed in .rodata and benefit from memory protection. This patch updates the function signature and all users, and also constifies tree_descr.name. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
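The effect of constifying such a descriptor table can be shown with a small userspace analogue. The struct, the table contents and the empty-name terminator convention below are hypothetical stand-ins for illustration, not the kernel's tree_descr definition.

```c
#include <stddef.h>

/* Hypothetical stand-in for a file descriptor table entry. */
struct demo_file_descr {
    const char *name; /* file name; const so the string is never written */
    int mode;         /* permission bits */
};

/* A const table can be placed in read-only data by the toolchain. */
static const struct demo_file_descr demo_files[] = {
    { "status",  0444 },
    { "control", 0644 },
    { "",        0    }, /* empty name terminates the table (assumed here) */
};

/* Consumers take a pointer to const, so they cannot modify the table. */
static size_t demo_count_entries(const struct demo_file_descr *files)
{
    size_t n = 0;

    while (files[n].name[0] != '\0')
        n++;
    return n;
}
```

With the parameter declared as pointer-to-const, any accidental write through `files` becomes a compile error rather than a silent modification.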
2017-04-26fs: drop duplicate header percpu-rwsem.hGeliang Tang
Drop duplicate header percpu-rwsem.h from linux/fs.h. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26fs/affs: import amigaffs.hFabian Frederick
Having that file in the global include/linux is not needed. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26srcu: Specify auto-expedite holdoff timePaul E. McKenney
On small systems, in the absence of readers, expedited SRCU grace periods can complete in less than a microsecond. This means that an eight-CPU system can have all CPUs doing synchronize_srcu() in a tight loop and almost always expedite. This might actually be desirable in some situations, but in general it is a good way to needlessly burn CPU cycles. And in those situations where it is desirable, your friend is the function synchronize_srcu_expedited(). For other situations, this commit adds a kernel parameter that specifies a holdoff between completing the last SRCU grace period and auto-expediting the next. If the next grace period starts before the holdoff expires, auto-expediting is disabled. The holdoff is 50 microseconds by default, and can be tuned to the desired number of nanoseconds. A value of zero disables auto-expediting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
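The holdoff rule can be sketched as a pure function of timestamps. This is a simplified model under stated assumptions: the names are hypothetical, times are plain nanosecond counters with `now_ns >= last_gp_end_ns`, and other conditions the real code may consider (such as whether readers are present) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default holdoff from the message: 50 microseconds, in nanoseconds. */
#define DEMO_DEFAULT_HOLDOFF_NS (50u * 1000u)

/*
 * Decide whether the next grace period may be auto-expedited.
 * A holdoff of zero disables auto-expediting entirely; otherwise the
 * holdoff must have fully elapsed since the last grace period ended.
 */
static bool demo_may_auto_expedite(uint64_t now_ns, uint64_t last_gp_end_ns,
                                   uint64_t holdoff_ns)
{
    if (holdoff_ns == 0)
        return false;

    return now_ns - last_gp_end_ns >= holdoff_ns;
}
```

A tight loop of grace-period requests therefore falls back to normal grace periods, since each new request arrives well inside the holdoff window of the previous one.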
2017-04-26srcu: Expedited grace periods with reduced memory contentionPaul E. McKenney
Commit f60d231a87c5 ("srcu: Crude control of expedited grace periods") introduced a per-srcu_struct atomic counter to track outstanding requests for grace periods. This works, but represents a memory-contention bottleneck. This commit therefore uses the srcu_node combining tree to remove this bottleneck. This commit adds new ->srcu_gp_seq_needed_exp fields to the srcu_data, srcu_node, and srcu_struct structures, which track the farthest-in-the-future grace period that must be expedited, which in turn requires that all nearer-term grace periods also be expedited. Requests for expediting start with the srcu_data structure, run up through the srcu_node tree, and end at the srcu_struct structure. Note that it may be necessary to expedite a grace period that just now started, and this is handled by a new srcu_funnel_exp_start() function, which is invoked when the grace period itself is already on its way, but when that grace period was not marked as expedited. A new srcu_get_delay() function returns zero if there is at least one expedited SRCU grace period in flight, or SRCU_INTERVAL otherwise. This function is used to calculate delays: Normal grace periods are allowed to extend in order to cover more requests with a given grace-period computation, which decreases per-request overhead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
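The delay selection described for srcu_get_delay() can be sketched as a comparison of two sequence numbers. The function name, the interval value and the wrap-safe comparison macro below are assumptions for illustration; only the "zero if an expedited grace period is outstanding, otherwise the normal interval" behavior comes from the message.

```c
#include <limits.h>

/* Hypothetical normal delay between grace-period steps, in ticks. */
#define DEMO_SRCU_INTERVAL 1ul

/* Wrap-safe "a is before b" for free-running unsigned sequence counters. */
#define DEMO_SEQ_BEFORE(a, b) (ULONG_MAX / 2 < (a) - (b))

/*
 * Return 0 if the current grace-period sequence has not yet reached the
 * farthest sequence that must be expedited, else the normal interval.
 */
static unsigned long demo_get_delay(unsigned long gp_seq,
                                    unsigned long gp_seq_needed_exp)
{
    if (DEMO_SEQ_BEFORE(gp_seq, gp_seq_needed_exp))
        return 0;

    return DEMO_SRCU_INTERVAL;
}
```

Tracking only the farthest sequence needing expediting is enough, because every nearer grace period compares as "before" it and so also gets a zero delay.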
2017-04-26x86, iommu/vt-d: Add an option to disable Intel IOMMU force onShaohua Li
IOMMU harms performance significantly when we run very fast networking workloads. It's 40GB networking doing XDP test. Software overhead is almost unnoticeable, but it's the IOTLB miss (based on our analysis) which kills the performance. We observed the same performance issue even with software passthrough (identity mapping), only the hardware passthrough survives. The pps with iommu (with software passthrough) is only about ~30% of that without it. This is a limitation in hardware based on our observation, so we'd like to disable the IOMMU force on, but we do want to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not enabling IOMMU is totally ok. So introduce a new boot option to disable the force on. It's kind of silly we need to run into intel_iommu_init even without force on, but we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26ieee80211: fix kernel-doc parsing errorsJohannes Berg
Some of the enum definitions are unnamed but there's still an attempt at documenting them - that doesn't work. Name them to make that work. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-26ieee80211: add FT-802.1X AKM suite selectorLuca Coelho
Add the definition for the FT-802.1X AKM selector as defined in IEEE Std 802.11-2016, table 9-133. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-26ieee80211: add SUITE_B AKM selectorsLuca Coelho
Add the definitions for SUITE_B and SUITE_B_192 AKM selectors as defined in IEEE802.11REVmc_D5.0, table 9-132. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-26blk-mq: Add blk_mq_ops.show_rq()Bart Van Assche
This new callback function will be used in the next patch to show more information about SCSI requests. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26tcp: switch rcv_rtt_est and rcvq_space to high resolution timestampsEric Dumazet
Some devices or distributions use HZ=100 or HZ=250. TCP receive buffer autotuning has poor behavior caused by this choice. Since autotuning happens after 4 ms or 10 ms, short distance flows get their receive buffer tuned to a very high value, but after an initial period where it was frozen at a (too small) initial value. With the introduction of tp->tcp_mstamp, we can switch to high resolution timestamps almost for free (at the expense of 8 additional bytes per TCP structure). Note that some TCP stacks use usec TCP timestamps where this patch makes even more sense: many TCP flows have < 500 usec RTT. Hopefully this finer TS option can be standardized soon.
Tested: HZ=100 kernel
./netperf -H lpaa24 -t TCP_RR -l 1000 -- -r 10000,10000 &
Peer without patch:
lpaa24:~# ss -tmi dst lpaa23
... skmem:(r0,rb8388608,...) rcv_rtt:10 rcv_space:3210000 minrtt:0.017
Peer with the patch:
lpaa23:~# ss -tmi dst lpaa24
... skmem:(r0,rb428800,...) rcv_rtt:0.069 rcv_space:30000 minrtt:0.017
We can see saner RCVBUF, and more precise rcv_rtt information.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
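The arithmetic behind the rcv_rtt values reported in this message (10 ms without the patch, 0.069 ms with it) can be sketched with a simplified model of tick-based timing. The helper names are hypothetical, and the model assumes a tick-based measurement is rounded down to whole ticks but never reported as less than one tick.

```c
/* Length of one scheduler tick in microseconds for a given HZ. */
static unsigned long demo_tick_us(unsigned int hz)
{
    return 1000000ul / hz;
}

/*
 * What a tick-based clock reports for a true interval: whole ticks only,
 * with a floor of one tick (simplifying assumption of this sketch).
 */
static unsigned long demo_coarse_rtt_us(unsigned long true_us, unsigned int hz)
{
    unsigned long tick = demo_tick_us(hz);
    unsigned long ticks = true_us / tick;

    if (ticks == 0)
        ticks = 1;
    return ticks * tick;
}
```

Under this model a true receive RTT of 69 usec is reported as 10000 usec at HZ=100 and 4000 usec at HZ=250, which matches the 4 ms and 10 ms figures the message gives for when autotuning happens.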
2017-04-26tcp: add tp->tcp_mstamp fieldEric Dumazet
We want to use precise timestamps in TCP stack, but we do not want to call possibly expensive kernel time services too often. tp->tcp_mstamp is guaranteed to be updated once per incoming packet. We will use it in the following patches, removing specific skb_mstamp_get() calls, and removing ack_time from struct tcp_sacktag_state. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26rhashtable: remove insecure_max_entries paramFlorian Westphal
No users in the tree, insecure_max_entries is always set to ht->p.max_size * 2 in rhashtable_init(). Replace the only spot that uses it with a ht->p.max_size check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26net: phy: fix auto-negotiation stall due to unavailable interruptAlexander Kochetkov
The Ethernet link on an interrupt-driven PHY was not coming up if the Ethernet cable was plugged in before the Ethernet interface was brought up. The patch triggers the PHY state machine to update the link state if the PHY was requested to do auto-negotiation and the auto-negotiation complete flag is already set. During the power-up cycle the PHY does auto-negotiation, generates an interrupt and sets the auto-negotiation complete flag. The interrupt is handled by the PHY state machine but doesn't update the link state because the PHY is in the PHY_READY state. After some time the MAC is brought up, starts and requests the PHY to do auto-negotiation. If there are no new settings to advertise, genphy_config_aneg() doesn't start PHY auto-negotiation. The PHY continues to stay in the auto-negotiation complete state and doesn't fire an interrupt. At the same time the PHY state machine expects that the PHY started auto-negotiation and is waiting for an interrupt from the PHY that it won't get. Fixes: 321beec5047a ("net: phy: Use interrupts when available in NOLINK state") Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Cc: stable <stable@vger.kernel.org> # v4.9+ Tested-by: Roger Quadros <rogerq@ti.com> Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26srcu: Make rcutorture writer stalls print SRCU GP statePaul E. McKenney
In the past, SRCU was simple enough that there was little point in making the rcutorture writer stall messages print the SRCU grace-period number state. With the advent of Tree SRCU, this has changed. This commit therefore makes Classic, Tiny, and Tree SRCU report this state to rcutorture as needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26srcu: Exact tracking of srcu_data structures containing callbacksPaul E. McKenney
The current Tree SRCU implementation schedules a workqueue for every srcu_data covered by a given leaf srcu_node structure having callbacks, even if only one of those srcu_data structures actually contains callbacks. This is clearly inefficient for workloads that don't feature callbacks everywhere all the time. This commit therefore adds an array of masks that are used by the leaf srcu_node structures to track exactly which srcu_data structures contain callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
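The mask-based tracking can be sketched with one bit per per-CPU structure under a leaf node. All names here are hypothetical, and the sketch records CPU numbers in an array where the real code would schedule work.

```c
/* Record that the structure at this offset under the leaf has callbacks. */
static void demo_mark_has_callbacks(unsigned long *mask, unsigned int offset)
{
    *mask |= 1ul << offset;
}

/*
 * Walk the leaf's mask and collect only the CPUs whose bit is set.
 * Returns the number of CPUs written to out[], which must have room
 * for ncpus entries. ncpus must not exceed the bits in unsigned long.
 */
static unsigned int demo_collect_cpus(unsigned long mask,
                                      unsigned int first_cpu,
                                      unsigned int ncpus,
                                      unsigned int *out)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < ncpus; i++) {
        if (mask & (1ul << i))
            out[n++] = first_cpu + i;
    }
    return n;
}
```

For a leaf covering CPUs 8 to 11 where only offsets 1 and 3 were marked, the walk yields CPUs 9 and 11 instead of all four.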