summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2016-03-03block: check virt boundary in bio_will_gap()Ming Lei
In the following patch, the way for figuring out the last bvec will be changed with a bit cost introduced, so return immediately if the queue doesn't have virt boundary limit. Actually most of devices have not this limit. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03block: bio: introduce helpers to get the 1st and last bvecMing Lei
The bio passed to bio_will_gap() may be fast cloned from upper layer(dm, md, bcache, fs, ...), or from bio splitting in block core. Unfortunately bio_will_gap() just figures out the last bvec via 'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously wrong. This patch introduces two helpers for getting the first and last bvec of one bio for fixing the issue. Cc: stable@vger.kernel.org Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mld, igmp: Fix reserved tailroom calculationBenjamin Poirier
The current reserved_tailroom calculation fails to take hlen and tlen into account. skb: [__hlen__|__data____________|__tlen___|__extra__] ^ ^ head skb_end_offset In this representation, hlen + data + tlen is the size passed to alloc_skb. "extra" is the extra space made available in __alloc_skb because of rounding up by kmalloc. We can reorder the representation like so: [__hlen__|__data____________|__extra__|__tlen___] ^ ^ head skb_end_offset The maximum space available for ip headers and payload without fragmentation is min(mtu, data + extra). Therefore, reserved_tailroom = data + extra + tlen - min(mtu, data + extra) = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) Compare the second line to the current expression: reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset) and we can see that hlen and tlen are not taken into account. The min() in the third line can be expanded into: if mtu < skb_tailroom - tlen: reserved_tailroom = skb_tailroom - mtu else: reserved_tailroom = tlen Depending on hlen, tlen, mtu and the number of multicast address records, the current code may output skbs that have less tailroom than dev->needed_tailroom or it may output more skbs than needed because not all space available is used. Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03stmmac: Fix 'eth0: No PHY found' regressionGabriel Fernandez
This patch manages the case when you have an Ethernet MAC with a "fixed link", and not connected to a normal MDIO-managed PHY device. The test of phy_bus_name was not helpful because it was never affected and replaced by the mdio test node. Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03cgroup: introduce cgroup_{save|propagate|restore}_control()Tejun Heo
While controllers are being enabled and disabled in cgroup_subtree_control_write(), the original subsystem masks are stashed in local variables so that they can be restored if the operation fails in the middle. This patch adds dedicated fields to struct cgroup to be used instead of the local variables and implements functions to stash the current values, propagate the changes and restore them recursively. Combined with the previous changes, this makes subsystem management operations fully recursive and modularlized. This will be used to expand cgroup core functionalities. While at it, remove now unused @css_enable and @css_disable from cgroup_subtree_control_write(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>
2016-03-03cgroup: explicitly track whether a cgroup_subsys_state is visible to userlandTejun Heo
Currently, whether a css (cgroup_subsys_state) has its interface files created is not tracked and assumed to change together with the owning cgroup's lifecycle. cgroup directory and interface creation is being separated out from internal object creation to help refactoring and eventually allow cgroups which are not visible through cgroupfs. This patch adds CSS_VISIBLE to track whether a css has its interface files created and perform management operations only when necessary which helps decoupling interface file handling from internal object lifecycle. After this patch, all css interface file management functions can be called regardless of the current state and will achieve the expected result. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>
2016-03-03mm: Some arch may want to use HPAGE_PMD related values as variablesKirill A. Shutemov
With next generation power processor, we are having a new mmu model [1] that require us to maintain a different linux page table format. Inorder to support both current and future ppc64 systems with a single kernel we need to make sure kernel can select between different page table format at runtime. With the new MMU (radix MMU) added, we will have two different pmd hugepage size 16MB for hash model and 2MB for Radix model. Hence make HPAGE_PMD related values as a variable. Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a followup patch. [1] http://ibm.biz/power-isa3 (Needs registration). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03hrtimer: Revert CLOCK_MONOTONIC_RAW supportThomas Gleixner
Revert commits: a6e707ddbdf1: KVM: arm/arm64: timer: Switch to CLOCK_MONOTONIC_RAW 9006a01829a5: hrtimer: Catch illegal clockids 9c808765e88e: hrtimer: Add support for CLOCK_MONOTONIC_RAW Marc found out, that there are fundamental issues with that patch series because __hrtimer_get_next_event() and hrtimer_forward() need support for CLOCK_MONOTONIC_RAW. Nothing which is easily fixed, so revert the whole lot. Reported-by: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/56D6CEF0.8060607@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-02time: Add history to cross timestamp interface supporting slower devicesChristopher S. Hall
Another representative use case of time sync and the correlated clocksource (in addition to PTP noted above) is PTP synchronized audio. In a streaming application, as an example, samples will be sent and/or received by multiple devices with a presentation time that is in terms of the PTP master clock. Synchronizing the audio output on these devices requires correlating the audio clock with the PTP master clock. The more precise this correlation is, the better the audio quality (i.e. out of sync audio sounds bad). From an application standpoint, to correlate the PTP master clock with the audio device clock, the system clock is used as a intermediate timebase. The transforms such an application would perform are: System Clock <-> Audio clock System Clock <-> Network Device Clock [<-> PTP Master Clock] Modern Intel platforms can perform a more accurate cross timestamp in hardware (ART,audio device clock). The audio driver requires ART->system time transforms -- the same as required for the network driver. These platforms offload audio processing (including cross-timestamps) to a DSP which to ensure uninterrupted audio processing, communicates and response to the host only once every millsecond. As a result is takes up to a millisecond for the DSP to receive a request, the request is processed by the DSP, the audio output hardware is polled for completion, the result is copied into shared memory, and the host is notified. All of these operation occur on a millisecond cadence. This transaction requires about 2 ms, but under heavier workloads it may take up to 4 ms. Adding a history allows these slow devices the option of providing an ART value outside of the current interval. In this case, the callback provided is an accessor function for the previously obtained counter value. If get_system_device_crosststamp() receives a counter value previous to cycle_last, it consults the history provided as an argument in history_ref and interpolates the realtime and monotonic raw system time using the provided counter value. If there are any clock discontinuities, e.g. from calling settimeofday(), the monotonic raw time is interpolated in the usual way, but the realtime clock time is adjusted by scaling the monotonic raw adjustment. When an accessor function is used a history argument *must* be provided. The history is initialized using ktime_get_snapshot() and must be called before the counter values are read. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Fixed up cycles_t/cycle_t type confusion] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-03-02time: Add driver cross timestamp interface for higher precision time ↵Christopher S. Hall
synchronization ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner <tglx@linutronix.de>. It has changed considerably and any mistakes are mine. The precision with which events on multiple networked systems can be synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited by the precision of the cross timestamps between the system clock and the device (timestamp) clock. Precision here is the degree of simultaneity when capturing the cross timestamp. Currently the PTP cross timestamp is captured in software using the PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are interleaved with reads of the realtime clock. At best, the precision of this cross timestamp is on the order of several microseconds due to software latencies. Sub-microsecond precision is required for industrial control and some media applications. To achieve this level of precision hardware supported cross timestamping is needed. The function get_device_system_crosstimestamp() allows device drivers to return a cross timestamp with system time properly scaled to nanoseconds. The realtime value is needed to discipline that clock using PTP and the monotonic raw value is used for applications that don't require a "real" time, but need an unadjusted clock time. The get_device_system_crosstimestamp() code calls back into the driver to ensure that the system counter is within the current timekeeping update interval. Modern Intel hardware provides an Always Running Timer (ART) which is exactly related to TSC through a known frequency ratio. The ART is routed to devices on the system and is used to precisely and simultaneously capture the device clock with the ART. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Reworked to remove extra structures and simplify calling] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-03-02time: Remove duplicated code in ktime_get_raw_and_real()Christopher S. Hall
The code in ktime_get_snapshot() is a superset of the code in ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is called only by the PPS code, pps_get_ts(). Consolidate the pps_get_ts() code into a single function calling ktime_get_snapshot() and eliminate ktime_get_raw_and_real(). A side effect of this is that the raw and real results of pps_get_ts() correspond to exactly the same clock cycle. Previously these values represented separate reads of the system clock. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-03-02time: Add timekeeping snapshot code capturing system time and counterChristopher S. Hall
In the current timekeeping code there isn't any interface to atomically capture the current relationship between the system counter and system time. ktime_get_snapshot() returns this triple (counter, monotonic raw, realtime) in the system_time_snapshot struct. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Moved structure definitions around to clean things up, fixed cycles_t/cycle_t confusion.] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-03-02Input: rotary_encoder - move away from platform data structureDmitry Torokhov
Drop support for platform data passed via a C-structure and switch to device properties instead, which should make the driver compatible with all platforms: OF, ACPI and static boards. Static boards should use property sets to communicate device parameters to the driver. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-03-02net/mlx5e: Fix ethtool RX hash func configuration changeTariq Toukan
We should modify TIRs explicitly to apply the new RSS configuration. The light ndo close/open calls do not "refresh" them. Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02stmmac: rework DMA bus setting and introduce new platform AXI structureGiuseppe Cavallaro
This patch restructures the DMA bus settings and this is done by introducing a new platform structure used for programming the AXI Bus Mode Register inside the DMA module. This structure can be populated from device-tree as documented in the binding txt file. After initializing the DMA, the AXI register can be optionally tuned for platform drivers based. This patch also reworks some parameters to make coherent the DMA configuration now that AXI register is introduced. For example, the burst_len is managed by using the mentioned axi support above; so the snps,burst-len parameter has been removed. It makes sense to provide the AAL parameter from DT to Address-Aligned Beats inside the Register0 and review the PBL settings when initialize the engine. For PCI glue, rebuilding the story of this setting, it was added to align a configuration so not for fixing some known problem. No issue raised after this patch. It is safe to use the default burst length instead of tuning it to the maximum value Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02netfilter: don't call hooks unless neededFlorian Westphal
With the previous patches in place, a netns nf_hook_list might be empty, even if e.g. init_net performs filtering. Thus change nf_hook_thresh to check the hook_list as well before initializing hook_state and calling nf_hook_slow(). We still make use of static keys; if no netfilter modules are loaded list is guaranteed to be empty. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02netfilter: xtables: don't hook tables by defaultFlorian Westphal
delay hook registration until the table is being requested inside a namespace. Historically, a particular table (iptables mangle, ip6tables filter, etc) was registered on module load. When netns support was added to iptables only the ip/ip6tables ruleset was made namespace aware, not the actual hook points. This means f.e. that when ipt_filter table/module is loaded on a system, then each namespace on that system has an (empty) iptables filter ruleset. In other words, if a namespace sends a packet, such skb is 'caught' by netfilter machinery and fed to hooking points for that table (i.e. INPUT, FORWARD, etc). Thanks to Eric Biederman, hooks are no longer global, but per namespace. This means that we can avoid allocation of empty ruleset in a namespace and defer hook registration until we need the functionality. We register a tables hook entry points ONLY in the initial namespace. When an iptables get/setockopt is issued inside a given namespace, we check if the table is found in the per-namespace list. If not, we attempt to find it in the initial namespace, and, if found, create an empty default table in the requesting namespace and register the needed hooks. Hook points are destroyed only once namespace is deleted, there is no 'usage count' (it makes no sense since there is no 'remove table' operation in xtables api). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02netfilter: xtables: prepare for on-demand hook registerFlorian Westphal
This change prepares for upcoming on-demand xtables hook registration. We change the protoypes of the register/unregister functions. A followup patch will then add nf_hook_register/unregister calls to the iptables one. Once a hook is registered packets will be picked up, so all assignments of the form net->ipv4.iptable_$table = new_table have to be moved to ip(6)t_register_table, else we can see NULL net->ipv4.iptable_$table later. This patch doesn't change functionality; without this the actual change simply gets too big. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02Input: rotary_encoder - convert to use gpiod APIDmitry Torokhov
Instead of using old GPIO API, let's switch to GPIOD API, which automatically handles polarity. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-03-02Merge tag 'gpmc-omap-for-v4.6' of https://github.com/rogerq/linux into ↵Arnd Bergmann
next/drivers Merge "OMAP-GPMC: Add address/data muxed timings" from Roger Quadros: * Add support for the GPMC Advanced AAD (address/data muxed) timings on hardware supporting the feature like the AM335x and DM816X. * tag 'gpmc-omap-for-v4.6' of https://github.com/rogerq/linux: dt-bindings: bus: ti-gpmc: Add AAD timings properties memory: omap-gpmc: Add support for AAD timings
2016-03-02posix-cpu-timers: Migrate to use new tick dependency mask modelFrederic Weisbecker
Instead of providing asynchronous checks for the nohz subsystem to verify posix cpu timers tick dependency, migrate the latter to the new mask. In order to keep track of the running timers and expose the tick dependency accordingly, we must probe the timers queuing and dequeuing on threads and process lists. Unfortunately it implies both task and signal level dependencies. We should be able to further optimize this and merge all that on the task level dependency, at the cost of a bit of complexity and may be overhead. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02sched: Migrate sched to use new tick dependency mask modelFrederic Weisbecker
Instead of providing asynchronous checks for the nohz subsystem to verify sched tick dependency, migrate sched to the new mask. Everytime a task is enqueued or dequeued, we evaluate the state of the tick dependency on top of the policy of the tasks in the runqueue, by order of priority: SCHED_DEADLINE: Need the tick in order to periodically check for runtime SCHED_FIFO : Don't need the tick (no round-robin) SCHED_RR : Need the tick if more than 1 task of the same priority for round robin (simplified with checking if more than one SCHED_RR task no matter what priority). SCHED_NORMAL : Need the tick if more than 1 task for round-robin. We could optimize that further with one flag per sched policy on the tick dependency mask and perform only the checks relevant to the policy concerned by an enqueue/dequeue operation. Since the checks aren't based on the current task anymore, we could get rid of the task switch hook but it's still needed for posix cpu timers. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02perf: Migrate perf to use new tick dependency mask modelFrederic Weisbecker
Instead of providing asynchronous checks for the nohz subsystem to verify perf event tick dependency, migrate perf to the new mask. Perf needs the tick for two situations: 1) Freq events. We could set the tick dependency when those are installed on a CPU context. But setting a global dependency on top of the global freq events accounting is much easier. If people want that to be optimized, we can still refine that on the per-CPU tick dependency level. This patch dooesn't change the current behaviour anyway. 2) Throttled events: this is a per-cpu dependency. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02nohz: Use enum code for tick stop failure tracing messageFrederic Weisbecker
It makes nohz tracing more lightweight, standard and easier to parse. Examples: user_loop-2904 [007] d..1 517.701126: tick_stop: success=1 dependency=NONE user_loop-2904 [007] dn.1 518.021181: tick_stop: success=0 dependency=SCHED posix_timers-6142 [007] d..1 1739.027400: tick_stop: success=0 dependency=POSIX_TIMER user_loop-5463 [007] dN.1 1185.931939: tick_stop: success=0 dependency=PERF_EVENTS Suggested-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02nohz: New tick dependency maskFrederic Weisbecker
The tick dependency is evaluated on every IRQ and context switch. This consists is a batch of checks which determine whether it is safe to stop the tick or not. These checks are often split in many details: posix cpu timers, scheduler, sched clock, perf events.... each of which are made of smaller details: posix cpu timer involves checking process wide timers then thread wide timers. Perf involves checking freq events then more per cpu details. Checking these informations asynchronously every time we update the full dynticks state bring avoidable overhead and a messy layout. Let's introduce instead tick dependency masks: one for system wide dependency (unstable sched clock, freq based perf events), one for CPU wide dependency (sched, throttling perf events), and task/signal level dependencies (posix cpu timers). The subsystems are responsible for setting and clearing their dependency through a set of APIs that will take care of concurrent dependency mask modifications and kick targets to restart the relevant CPU tick whenever needed. This new dependency engine stays beside the old one until all subsystems having a tick dependency are converted to it. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02virtio: Add improved queue allocation APIAndy Lutomirski
This leaves vring_new_virtqueue alone for compatbility, but it adds two new improved APIs: vring_create_virtqueue: Creates a virtqueue backed by automatically allocated coherent memory. (Some day it this could be extended to support non-coherent memory, too, if there ends up being a platform on which it's worthwhile.) __vring_new_virtqueue: Creates a virtqueue with a manually-specified layout. This should allow mic_virtio to work much more cleanly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02dma: Provide simple noop dma opsChristian Borntraeger
We are going to require dma_ops for several common drivers, even for systems that do have an identity mapping. Lets provide some minimal no-op dma_ops that can be used for that purpose. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02regulator: helper: Add helper to configure active-discharge using regmapLaxman Dewangan
Add helper function to set the state of active-discharge of regulator using regmap. The HW regulator driver can directly use this by providing the necessary information in the regulator descriptor. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-02regulator: core: Add support for active-discharge configurationLaxman Dewangan
Add support to enable/disable active discharge of regulator via machine constraints. This configuration is done when setting machine constraint during regulator register and if regulator driver implemented the callback ops. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-01Drivers: hv: util: Pass the channel information during the init callK. Y. Srinivasan
Pass the channel information to the util drivers that need to defer reading the channel while they are processing a request. This would address the following issue reported by Vitaly: Commit 3cace4a61610 ("Drivers: hv: utils: run polling callback always in interrupt context") removed direct *_transaction.state = HVUTIL_READY assignments from *_handle_handshake() functions introducing the following race: if a userspace daemon connects before we get first non-negotiation request from the server hv_poll_channel() won't set transaction state to HVUTIL_READY as (!channel) condition will fail, we set it to non-NULL on the first real request from the server. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01misc: at24: replace memory_accessor with nvmem_device_readAndrew Lunn
Now that the AT24 uses the NVMEM framework, replace the memory_accessor in the setup() callback with nvmem API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01eeprom: at25: Remove in kernel API for accessing the EEPROMAndrew Lunn
The setup() callback is not used by any in kernel code. Remove it. Any new code which requires access to the eeprom can use the NVMEM API. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01nvmem: Add backwards compatibility support for older EEPROM drivers.Andrew Lunn
Older drivers made an 'eeprom' file available in the /sys device directory. Have the NVMEM core provide this to retain backwards compatibility. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01nvmem: Add flag to export NVMEM to root onlyAndrew Lunn
Legacy AT24, AT25 EEPROMs are exported in sys so that only root can read the contents. The EEPROMs may contain sensitive information. Add a flag so the provide can indicate that NVMEM should also restrict access to root only. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01Merge tag 'extcon-next-for-4.6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-testing Chanwoo writes: Update extcon for 4.6 Detailed description for patchset: 1. Add new EXTCON_CHG_USB_SDP type - SDP (Standard Downstream Port) USB Charging Port means the charging connector.a 2. Add the VBUS detection by using GPIO on extcon-palmas - Beaglex15 board uses the extcon-palmas driver But, beaglex15 board need the GPIO support for VBUS detection. 3. Fix the minor issue of extcon drivers
2016-03-01Merge 4.5-rc6 into char-misc-nextGreg Kroah-Hartman
We want the fixes in here, and others are sending us pull requests based on this kernel tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01Merge tag 'iio-for-4.6c' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Third set of IIO new device support, features and cleanups for the 4.6 cycle. Good to see several new contributors in this set - and more generally a number of new 'faces' over this whole cycle. Staging movements * hmc5843 - out of staging. * periodic RTC trigger - driver dropped. This is an ancient driver (brings back some memories ;) that was always somewhat of a bodge. Originally there was a driver that never went into mainline that supported large numbers of periodict timers on the PXA270 via this route. Discussions to have a generic periodic timer subsystem never went anywhere. At the time RTC periodic interrupts were real - now they are emulated using high resolution timers so with the HRT driver this has become pointless. New device support * mpu6050 driver - Add support for the mpu6500. * TI tpl0102 potentiometer - new driver. * Vybrid SoC DAC - new driver. The ADC on this SoC has been supported for a while, this adds a separate driver for the DAC. New Features * hmc5844 - Attributes to configure the bias current (typically part of a self test) This could be done before via a somewhat obscure custom interface. This at least makes it easy to tell what is going on. - Document all custom attributes. * mpu6050 - Add support for calibration offset control and readback. * ms5611 - power regulator support. This is always one that gets added the first time someone has a board that needs it. Here it was needed, hence it was added. Cleanups / minor fixes * tree wide - clean up all the myriad different return values in response to a failure of i2c_check_functionality. After discussions everyone seemed happy wiht -EOPNOTSUPP which seems to describe the situation well. I encouraged a tree wide cleanup to set a good example in future for this. * core - Typos in the iio_event_spec documentation in iio.h * afe4403 - select REGMAP_SPI to avoid dependency issues - mark suspend/resume as __maybe_unused to avoid warnings * afe4404 - mark suspend/resume as __maybe_unused to avoid warnings * atlas-ph-sensor - switch the regmap cache type from linear to rbtree to gain reading of registers on initial startup. It's not immediately obvious, but regmap flat is meant for high performances cases so doesn't read these registers. - use regmap_bulk_read in one case where it was using i2c_smbus_read_i2c_block_data directly (unlike everything else that was through regmap). * ina2xx - stype cleanups (lots of them!) * isl29018 - Get the struct device back from regmap rather than storing another copy of it in the private data. This cleanup makes sense in a number of other drivers so patches may well follow. * mpu6050 - style cleanups (lots of them!) - improved return value handling - use usleep_range to avoid the usual issues with very short msleeps. - add some missing documentation. * ms5611 - use the probed device name for the device rather than the driver name. - select IIO_BUFFER to avoid dependency issues * palmas - drop IRQF_EARLY_RESUME as no longer needed after genirq changes.
2016-03-01Merge 4.5-rc6 into usb-nextGreg Kroah-Hartman
We want the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01Merge 4.5-rc6 into staging-nextGreg Kroah-Hartman
We want the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull d_inode/d_flags race fix from Al Viro. I love this fix. Not only does it fix the race in the dentry type handling, it entirely gets rid of the nasty and subtle memory ordering rules for d_type and d_inode, and replaces them with the basic dentry locking rules (sequence numbers under RCU, d_lock elsewhere). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use ->d_seq to get coherency between ->d_inode and ->d_flags
2016-03-01qed: Semantic refactoring of interrupt codeYuval Mintz
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: remove skb_sender_cpu_clear()WANG Cong
After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net/mlx5: Fix global UAR mappingMoshe Lazer
Avoid double mapping of io mapped memory, Device page may be mapped to non-cached(NC) or to write-combining(WC). The code before this fix tries to map it both to WC and NC contrary to what stated in Intel's software developer manual. Here we remove the global WC mapping of all UARS "dev->priv.bf_mapping", since UAR mapping should be decided per UAR (e.g we want different mappings for EQs, CQs vs QPs). Caller will now have to choose whether to map via write-combining API or not. mlx5e SQs will choose write-combining in order to perform BlueFlame writes. Fixes: 88a85f99e51f ('TX latency optimization to save DMA reads') Signed-off-by: Moshe Lazer <moshel@mellanox.com> Reviewed-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net/mlx5: Make command timeout way shorterOr Gerlitz
The command timeout is terribly long, whole two hours. Make it 60s so if things do go wrong, the user gets feedback in relatively short time, so they can take corrective actions and/or investigate using tools and such. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Merge tag 'mac80211-next-for-davem-2016-02-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Here's another round of updates for -next: * big A-MSDU RX performance improvement (avoid linearize of paged RX) * rfkill changes: cleanups, documentation, platform properties * basic PBSS support in cfg80211 * MU-MIMO action frame processing support * BlockAck reordering & duplicate detection offload support * various cleanups & little fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01mlx4: Implement devlink interfaceJiri Pirko
Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> v2->v3: -add dev param to devlink_register (api change) Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01svcrdma: Use new CQ API for RPC-over-RDMA server send CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Use new CQ API for RPC-over-RDMA server receive CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Use correct XID in error repliesChuck Lever
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Make RDMA_ERROR messages workChuck Lever
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>