summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-05-03irqchip/gic-v2: Parse and export virtual GIC informationJulien Grall
For now, the firmware tables are parsed 2 times: once in the GIC drivers, the other timer when initializing the vGIC. It means code duplication and make more tedious to add the support for another firmware table (like ACPI). Introduce a new structure and set of helpers to get/set the virtual GIC information. Also fill up the structure for GICv2. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03clocksource: arm_arch_timer: Extend arch_timer_kvm_info to get the virtual IRQJulien Grall
Currently, the firmware table is parsed by the virtual timer code in order to retrieve the virtual timer interrupt. However, this is already done by the arch timer driver. To avoid code duplication, extend arch_timer_kvm_info to get the virtual IRQ. Note that the KVM code will be modified in a subsequent patch. Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03clocksource: arm_arch_timer: Gather KVM specific information in a structureJulien Grall
Introduce a structure which are filled up by the arch timer driver and used by the virtual timer in KVM. The first member of this structure will be the timecounter. More members will be added later. A stub for the new helper isn't introduced because KVM requires the arch timer for both ARM64 and ARM32. The function arch_timer_get_timecounter is kept for the time being and will be dropped in a subsequent patch. Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03signals/sigaltstack: Implement SS_AUTODISARM flagStas Sergeev
This patch implements the SS_AUTODISARM flag that can be OR-ed with SS_ONSTACK when forming ss_flags. When this flag is set, sigaltstack will be disabled when entering the signal handler; more precisely, after saving sas to uc_stack. When leaving the signal handler, the sigaltstack is restored by uc_stack. When this flag is used, it is safe to switch from sighandler with swapcontext(). Without this flag, the subsequent signal will corrupt the state of the switched-away sighandler. To detect the support of this functionality, one can do: err = sigaltstack(SS_DISABLE | SS_AUTODISARM); if (err && errno == EINVAL) unsupported(); Signed-off-by: Stas Sergeev <stsp@list.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Amanieu d'Antras <amanieu@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jason Low <jason.low2@hp.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Moore <pmoore@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1460665206-13646-4-git-send-email-stsp@list.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03signals/sigaltstack: Prepare to add new SS_xxx flagsStas Sergeev
This patch adds SS_FLAG_BITS - the mask that splits sigaltstack mode values and bit-flags. Since there is no bit-flags yet, the mask is defined to 0. The flags are added by subsequent patches. With every new flag, the mask should have the appropriate bit cleared. This makes sure if some flag is tried on a kernel that doesn't support it, the -EINVAL error will be returned, because such a flag will be treated as an invalid mode rather than the bit-flag. That way the existence of the particular features can be probed at run-time. This change was suggested by Andy Lutomirski: https://lkml.org/lkml/2016/3/6/158 Signed-off-by: Stas Sergeev <stsp@list.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amanieu d'Antras <amanieu@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1460665206-13646-3-git-send-email-stsp@list.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03net: relax expensive skb_unclone() in iptunnel_handle_offloads()Eric Dumazet
Locally generated TCP GSO packets having to go through a GRE/SIT/IPIP tunnel have to go through an expensive skb_unclone() Reallocating skb->head is a lot of work. Test should really check if a 'real clone' of the packet was done. TCP does not care if the original gso_type is changed while the packet travels in the stack. This adds skb_header_unclone() which is a variant of skb_clone() using skb_header_cloned() check instead of skb_cloned(). This variant can probably be used from other points. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02netdevice: shrink size of struct netdev_queueFlorian Westphal
- trans_timeout is incremented when tx queue timed out (tx watchdog). - tx_maxrate is set via sysfs Moving tx_maxrate to read-mostly part shrinks the struct by 64 bytes. While at it, also move trans_timeout (it is out-of-place in the 'write-mostly' part). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02bridge: netlink: export per-vlan statsNikolay Aleksandrov
Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and get_linkxstats_size) in order to export the per-vlan stats. The paddings were added because soon these fields will be needed for per-port per-vlan stats (or something else if someone beats me to it) so avoiding at least a few more netlink attributes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02bridge: vlan: learn to countNikolay Aleksandrov
Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets allocated a per-cpu stats which is then set in each per-port vlan context for quick access. The br_allowed_ingress() common function is used to account for Rx packets and the br_handle_vlan() common function is used to account for Tx packets. Stats accounting is performed only if the bridge-wide vlan_stats_enabled option is set either via sysfs or netlink. A struct hole between vlan_enabled and vlan_proto is used for the new option so it is in the same cache line. Currently it is binary (on/off) but it is intentionally restricted to exactly 0 and 1 since other values will be used in the future for different purposes (e.g. per-port stats). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02net: rtnetlink: add linkxstats callbacks and attributeNikolay Aleksandrov
Add callbacks to calculate the size and fill link extended statistics which can be split into multiple messages and are dumped via the new rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute. Also add that attribute to the idx mask check since it is expected to be able to save state and resume dumping (e.g. future bridge per-vlan stats will be dumped via this attribute and callbacks). Each link type should nest its private attributes under the per-link type attribute. This allows to have any number of separated private attributes and to avoid one call to get the dev link type. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03PM / devfreq: Add new passive governorChanwoo Choi
This patch adds the new passive governor for DEVFREQ framework. The following governors are already present and used for DVFS (Dynamic Voltage and Frequency Scaling) drivers. The following governors are independently used for one device driver which don't give the influence to other device drviers and also don't receive the effect from other device drivers. - ondemand / performance / powersave / userspace The passive governor depends on operation of parent driver with specific governos extremely and is not able to decide the new frequency by oneself. According to the decided new frequency of parent driver with governor, the passive governor uses it to decide the appropriate frequency for own device driver. The passive governor must need the following information from device tree: - the source clock and OPP tables - the instance of parent device For exameple, there are one more devfreq device drivers which need to change their source clock according to their utilization on runtime. But, they share the same power line (e.g., regulator). So, specific device driver is operated as parent with ondemand governor and then the rest device driver with passive governor is influenced by parent device. Suggested-by: Myungjoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> [tjakobi: Reported RCU locking issue and cw00.choi fix it] Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> [linux.amoon: Reported possible recursive locking and cw00.choi fix it] Reported-by: Anand Moon <linux.amoon@gmail.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-03PM / devfreq: Add new DEVFREQ_TRANSITION_NOTIFIER notifierChanwoo Choi
This patch adds the new DEVFREQ_TRANSITION_NOTIFIER notifier to send the notification when the frequency of device is changed. This notifier has two state as following: - DEVFREQ_PRECHANGE : Notify it before chaning the frequency of device - DEVFREQ_POSTCHANGE : Notify it after changed the frequency of device And this patch adds the resourced-managed function to release the resource automatically when error happen. Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> [m.reichl and linux.amoon: Tested it on exynos4412-odroidu3 board] Tested-by: Markus Reichl <m.reichl@fivetechno.de> Tested-by: Anand Moon <linux.amoon@gmail.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-03PM / devfreq: Add devfreq_get_devfreq_by_phandle()Chanwoo Choi
This patch adds the new devfreq_get_devfreq_by_phandle() OF helper function which can find the instance of devfreq device by using phandle ("devfreq"). Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> [m.reichl and linux.amoon: Tested it on exynos4412-odroidu3 board] Tested-by: Markus Reichl <m.reichl@fivetechno.de> Tested-by: Anand Moon <linux.amoon@gmail.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-02tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logicSteven Rostedt (Red Hat)
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-02Merge tag 'tegra-for-4.7-clk' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into clk-next Pull tegra clk driver changes from Thierry Reding: This set of changes contains a bunch of cleanups and minor fixes along with some new clocks, mainly on Tegra210, in preparation for supporting DisplayPort and HDMI 2.0. * tag 'tegra-for-4.7-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: clk: tegra: dfll: Reformat CVB frequency table clk: tegra: dfll: Properly clean up on failure and removal clk: tegra: dfll: Make code more comprehensible clk: tegra: dfll: Reference CVB table instead of copying data clk: tegra: dfll: Update kerneldoc clk: tegra: Fix PLL_U post divider and initial rate on Tegra30 clk: tegra: Initialize PLL_C to sane rate on Tegra30 clk: tegra: Fix pllre Tegra210 and add pll_re_out1 clk: tegra: Add sor_safe clock clk: tegra: dpaux and dpaux1 are fixed factor clocks clk: tegra: Add dpaux1 clock clk: tegra: Use correct parent for dpaux clock clk: tegra: Add fixed factor peripheral clock type clk: tegra: Special-case mipi-cal parent on Tegra114 clk: tegra: Remove trailing blank line clk: tegra: Constify peripheral clock registers clk: tegra: Add interface to enable hardware control of SATA/XUSB PLLs
2016-05-02Merge branch 'for-linus' into work.lookupsAl Viro
2016-05-02introduce a parallel variant of ->iterate()Al Viro
New method: ->iterate_shared(). Same arguments as in ->iterate(), called with the directory locked only shared. Once all filesystems switch, the old one will be gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02give readdir(2)/getdents(2)/etc. uniform exclusion with lseek()Al Viro
same as read() on regular files has, and for the same reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups: actual switch to rwsemAl Viro
ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups machinery, part 4 (and last)Al Viro
If we *do* run into an in-lookup match, we need to wait for it to cease being in-lookup. Fortunately, we do have unused space in in-lookup dentries - d_lru is never looked at until it stops being in-lookup. So we can stash a pointer to wait_queue_head from stack frame of the caller of ->lookup(). Some precautions are needed while waiting, but it's not that hard - we do hold a reference to dentry we are waiting for, so it can't go away. If it's found to be in-lookup the wait_queue_head is still alive and will remain so at least while ->d_lock is held. Moreover, the condition we are waiting for becomes true at the same point where everything on that wq gets woken up, so we can just add ourselves to the queue once. d_alloc_parallel() gets a pointer to wait_queue_head_t from its caller; lookup_slow() adjusted, d_add_ci() taught to use d_alloc_parallel() if the dentry passed to it happens to be in-lookup one (i.e. if it's been called from the parallel lookup). That's pretty much it - all that remains is to switch ->i_mutex to rwsem and have lookup_slow() take it shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups machinery, part 3Al Viro
We will need to be able to check if there is an in-lookup dentry with matching parent/name. Right now it's impossible, but as soon as start locking directories shared such beasts will appear. Add a secondary hash for locating those. Hash chains go through the same space where d_alias will be once it's not in-lookup anymore. Search is done under the same bitlock we use for modifications - with the primary hash we can rely on d_rehash() into the wrong chain being the worst that could happen, but here the pointers are buggered once it's removed from the chain. On the other hand, the chains are not going to be long and normally we'll end up adding to the chain anyway. That allows us to avoid bothering with ->d_lock when doing the comparisons - everything is stable until removed from chain. New helper: d_alloc_parallel(). Right now it allocates, verifies that no hashed and in-lookup matches exist and adds to in-lookup hash. Returns ERR_PTR() for error, hashed match (in the unlikely case it's been found) or new dentry. In-lookup matches trigger BUG() for now; that will change in the next commit when we introduce waiting for ongoing lookup to finish. Note that in-lookup matches won't be possible until we actually go for shared locking. lookup_slow() switched to use of d_alloc_parallel(). Again, these commits are separated only for making it easier to review. All this machinery will start doing something useful only when we go for shared locking; it's just that the combination is too large for my taste. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups machinery, part 2Al Viro
We'll need to verify that there's neither a hashed nor in-lookup dentry with desired parent/name before adding to in-lookup set. One possible solution would be to hold the parent's ->d_lock through both checks, but while the in-lookup set is relatively small at any time, dcache is not. And holding the parent's ->d_lock through something like __d_lookup_rcu() would suck too badly. So we leave the parent's ->d_lock alone, which means that we watch out for the following scenario: * we verify that there's no hashed match * existing in-lookup match gets hashed by another process * we verify that there's no in-lookup matches and decide that everything's fine. Solution: per-directory kinda-sorta seqlock, bumped around the times we hash something that used to be in-lookup or move (and hash) something in place of in-lookup. Then the above would turn into * read the counter * do dcache lookup * if no matches found, check for in-lookup matches * if there had been none of those either, check if the counter has changed; repeat if it has. The "kinda-sorta" part is due to the fact that we don't have much spare space in inode. There is a spare word (shared with i_bdev/i_cdev/i_pipe), so the counter part is not a problem, but spinlock is a different story. We could use the parent's ->d_lock, and it would be less painful in terms of contention, for __d_add() it would be rather inconvenient to grab; we could do that (using lock_parent()), but... Fortunately, we can get serialization on the counter itself, and it might be a good idea in general; we can use cmpxchg() in a loop to get from even to odd and smp_store_release() from odd to even. This commit adds the counter and updating logics; the readers will be added in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02beginning of transition to parallel lookups - marking in-lookup dentriesAl Viro
marked as such when (would be) parallel lookup is about to pass them to actual ->lookup(); unmarked when * __d_add() is about to make it hashed, positive or not. * __d_move() (from d_splice_alias(), directly or via __d_unalias()) puts a preexisting dentry in its place * in caller of ->lookup() if it has escaped all of the above. Bug (WARN_ON, actually) if it reaches the final dput() or d_instantiate() while still marked such. As the result, we are guaranteed that for as long as the flag is set, dentry will * remain negative unhashed with positive refcount * never have its ->d_alias looked at * never have its ->d_lru looked at * never have its ->d_parent and ->d_name changed Right now we have at most one such for any given parent directory. With parallel lookups that restriction will weaken to * only exist when parent is locked shared * at most one with given (parent,name) pair (comparison of names is according to ->d_compare()) * only exist when there's no hashed dentry with the same (parent,name) Transition will take the next several commits; unfortunately, we'll only be able to switch to rwsem at the end of this series. The reason for not making it a single patch is to simplify review. New primitives: d_in_lookup() (a predicate checking if dentry is in the in-lookup state) and d_lookup_done() (tells the system that we are done with lookup and if it's still marked as in-lookup, it should cease to be such). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02Merge getxattr prototype change into work.lookupsAl Viro
The rest of work.xattr stuff isn't needed for this branch
2016-05-02Merge tag 'v4.7-rockchip-clk3' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-next Pull rockchip clk updates from Heiko Stuebner: A spelling fix and a bunch of rk3399 clock fixes. * tag 'v4.7-rockchip-clk3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: fix the rk3399 cifout clock clk: rockchip: drop unnecessary CLK_IGNORE_UNUSED flags from rk3399 clk: rockchip: add some frequencies on the rk3399 PLL table clk: rockchip: assign more necessary rk3399 clock ids clk: rockchip: export some necessary rk3399 clock ids clk: rockchip: rename rga clock-id on rk3399 clk: rockchip: add general gpu soft-reset on rk3399 clk: rockchip: fix the gate bit for i2c4 and i2c8 on rk3399 clk: rockchip: fix of spelling mistake on unsuccessful in pll clock type
2016-05-02ipv6: Generic tunnel cleanupTom Herbert
A few generic changes to generalize tunnels in IPv6: - Export ip6_tnl_change_mtu so that it can be called by ip6_gre - Add tun_hlen to ip6_tnl structure. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02gre: Create common functions for transmitTom Herbert
Create common functions for both IPv4 and IPv6 GRE in transmit. These are put into gre.h. Common functions are for: - GRE checksum calculation. Move gre_checksum to gre.h. - Building a GRE header. Move GRE build_header and rename gre_build_header. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02ipv6: Create ip6_tnl_xmitTom Herbert
This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other users like GRE will be able to call this. The original ip6_tnl_xmit function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit function). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02gre: Move utility functions to common headersTom Herbert
Several of the GRE functions defined in net/ipv4/ip_gre.c are usable for IPv6 GRE implementation (that is they are protocol agnostic). These include: - GRE flag handling functions are move to gre.h - GRE build_header is moved to gre.h and renamed gre_build_header - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header - iptunnel_pull_header is taken out of gre_parse_header. This is now done by caller. The header length is returned from gre_parse_header in an int* argument. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02ipv6: Cleanup IPv6 tunnel receive pathTom Herbert
Some basic changes to make IPv6 tunnel receive path look more like IPv4 path: - Make ip6_tnl_rcv non-static so that GREv6 and others can call it - Make ip6_tnl_rcv look like ip_tunnel_rcv - Switch to gro_cells_receive - Make ip6_tnl_rcv non-static and export it Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02tcp: make tcp_sendmsg() aware of socket backlogEric Dumazet
Large sendmsg()/write() hold socket lock for the duration of the call, unless sk->sk_sndbuf limit is hit. This is bad because incoming packets are parked into socket backlog for a long time. Critical decisions like fast retransmit might be delayed. Receivers have to maintain a big out of order queue with additional cpu overhead, and also possible stalls in TX once windows are full. Bidirectional flows are particularly hurt since the backlog can become quite big if the copy from user space triggers IO (page faults) Some applications learnt to use sendmsg() (or sendmmsg()) with small chunks to avoid this issue. Kernel should know better, right ? Add a generic sk_flush_backlog() helper and use it right before a new skb is allocated. Typically we put 64KB of payload per skb (unless MSG_EOR is requested) and checking socket backlog every 64KB gives good results. As a matter of fact, tests with TSO/GSO disabled give very nice results, as we manage to keep a small write queue and smaller perceived rtt. Note that sk_flush_backlog() maintains socket ownership, so is not equivalent to a {release_sock(sk); lock_sock(sk);}, to ensure implicit atomicity rules that sendmsg() was giving to (possibly buggy) applications. In this simple implementation, I chose to not call tcp_release_cb(), but we might consider this later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02Minimal fix-up of bad hashing behavior of hash_64()Linus Torvalds
This is a fairly minimal fixup to the horribly bad behavior of hash_64() with certain input patterns. In particular, because the multiplicative value used for the 64-bit hash was intentionally bit-sparse (so that the multiply could be done with shifts and adds on architectures without hardware multipliers), some bits did not get spread out very much. In particular, certain fairly common bit ranges in the input (roughly bits 12-20: commonly with the most information in them when you hash things like byte offsets in files or memory that have block factors that mean that the low bits are often zero) would not necessarily show up much in the result. There's a bigger patch-series brewing to fix up things more completely, but this is the fairly minimal fix for the 64-bit hashing problem. It simply picks a much better constant multiplier, spreading the bits out a lot better. NOTE! For 32-bit architectures, the bad old hash_64() remains the same for now, since 64-bit multiplies are expensive. The bigger hashing cleanup will replace the 32-bit case with something better. The new constants were picked by George Spelvin who wrote that bigger cleanup series. I just picked out the constants and part of the comment from that series. Cc: stable@vger.kernel.org Cc: George Spelvin <linux@horizon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-02PCI: imx6: Implement reset sequence for i.MX6+Andrey Smirnov
I.MX6+ has a dedicated bit for resetting PCIe core, which should be used instead of a regular reset sequence since using the latter will hang the SoC. This commit is based on c34068d48273e24d392d9a49a38be807954420ed from http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git Tested-by: Gary Bisson <gary.bisson@boundarydevices.com> Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
2016-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips, from Sara Sharon. 2) Fix SKB size checks in batman-adv stack on receive, from Sven Eckelmann. 3) Leak fix on mac80211 interface add error paths, from Johannes Berg. 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver, fix from Stanislaw Gruszka. 5) Fix sign extension problem when computing feature masks in net_gso_ok(), from Marcelo Ricardo Leitner. 6) lan78xx driver doesn't count packets and packet lengths in its statistics properly, fix from Woojung Huh. 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko Manolov. 8) Fix refcount overflows in bpf, from Alexei Starovoitov. 9) Unified dst cache handling introduced a preempt warning in ip_tunnel, fix by resetting rather then setting the cached route. From Paolo Abeni. 10) Listener hash collision test fix in soreuseport, from Craig Gallak * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) gre: do not pull header in ICMP error processing net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case tipc: only process unicast on intended node cxgb3: fix out of bounds read net/smscx5xx: use the device tree for mac address soreuseport: Fix TCP listener hash collision net: l2tp: fix reversed udp6 checksum flags ip_tunnel: fix preempt warning in ip tunnel creation/updating samples/bpf: fix trace_output example bpf: fix check_map_func_compatibility logic bpf: fix refcnt overflow drivers: net: cpsw: use of_phy_connect() in fixed-link case dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive drivers: net: cpsw: don't ignore phy-mode if phy-handle is used drivers: net: cpsw: fix segfault in case of bad phy-handle drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver gre: reject GUE and FOU in collect metadata mode pegasus: fixes reported packet length pegasus: fixes URB buffer allocation size; ...
2016-05-02isa: Implement the max_num_isa_dev macroWilliam Breathitt Gray
max_num_isa_dev is a macro to determine the maximum possible number of ISA devices which may be registered in the I/O port address space given the address extent of the ISA devices. The highest base address possible for an ISA device is 0x3FF; this results in 1024 possible base addresses. Dividing the number of possible base addresses by the address extent taken by each device results in the maximum number of devices on a system. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-02isa: Implement the module_isa_driver macroWilliam Breathitt Gray
The module_isa_driver macro is a helper macro for ISA drivers which do not do anything special in module init/exit. This eliminates a lot of boilerplate code. Each module may only use this macro once, and calling it replaces module_init and module_exit. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-02block: add __blkdev_issue_discardChristoph Hellwig
This is a version of blkdev_issue_discard which doesn't wait for the I/O to complete, but instead allows the caller to submit the final bio and/or chain it to others. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagig@grimberg.me> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02NVMe: correct comment for offset enum of controller registers in nvme.hWang Sheng-Hui
Section 3.1 gives the comment for the offset of controller registers in the specification 1.2a. Some are mis-copied in the header file nvme.h. Correct them. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02drm/atomic: Rename drm_atomic_async_commit to nonblocking.Maarten Lankhorst
Another step in renaming async to nonblocking for atomic commit. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1461679905-30177-3-git-send-email-maarten.lankhorst@linux.intel.com
2016-05-02drm/atomic: Rename async parameter to nonblocking.Maarten Lankhorst
This is the first step of renaming async commit to nonblocking commit. The flag passed by userspace is NONBLOCKING, and async has a different meaning for page flips, where it means as soon as possible. Fixing up comments in drm core is done manually, to make sure I didn't miss anything. For drivers, the following cocci script is used to rename bool async to bool nonblock: @@ identifier I =~ "^async"; identifier func; @@ func(..., bool - I + nonblock , ...) { <... - I + nonblock ...> } @@ identifier func; type T; identifier I =~ "^async"; @@ T func(..., bool - I + nonblock , ...); Thanks to Tvrtko Ursulin for the cocci script. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1461679905-30177-2-git-send-email-maarten.lankhorst@linux.intel.com
2016-05-02drm/fb-cma-helper: Add fb_deferred_io supportNoralf Trønnes
This adds fbdev deferred io support if CONFIG_FB_DEFERRED_IO is enabled. The driver has to provide a (struct drm_framebuffer_funcs *)->dirty() callback to get notification of fbdev framebuffer changes. If the dirty() hook is set, then fb_deferred_io is set up automatically by the helper. Two functions have been added so that the driver can provide a dirty() function: - drm_fbdev_cma_init_with_funcs() This makes it possible for the driver to provided a custom (struct drm_fb_helper_funcs *)->fb_probe() function. - drm_fbdev_cma_create_with_funcs() This is used by the .fb_probe hook to set a driver provided (struct drm_framebuffer_funcs *)->dirty() function. Cc: laurent.pinchart@ideasonboard.com Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1461856717-6476-6-git-send-email-noralf@tronnes.org
2016-05-02fbdev: fb_defio: Export fb_deferred_io_mmapNoralf Trønnes
Export fb_deferred_io_mmap so drivers can change vma->vm_page_prot. When the framebuffer memory is allocated using dma_alloc_writecombine() instead of vmalloc(), I get cache syncing problems on ARM. This solves it: static int drm_fbdev_cma_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma) { fb_deferred_io_mmap(info, vma); vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); return 0; } Could this have been done in the core? Drivers that don't set (struct fb_ops *)->fb_mmap, gets a call to fb_pgprotect() at the end of the default fb_mmap implementation (drivers/video/fbdev/core/fbmem.c). This is an architecture specific function that on many platforms uses pgprot_writecombine(), but not on all. And looking at some of the fb_mmap implementations, some of them sets vm_page_prot to nocache for instance, so I think the safest bet is to do this in the driver and not in the fbdev core. And we can't call fb_pgprotect() from fb_deferred_io_mmap() either because we don't have access to the file pointer that powerpc needs. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1461856717-6476-5-git-send-email-noralf@tronnes.org
2016-05-02drm/fb-helper: Add fb_deferred_io supportNoralf Trønnes
This adds deferred io support to drm_fb_helper. The fbdev framebuffer changes are flushed using the callback (struct drm_framebuffer *)->funcs->dirty() by a dedicated worker ensuring that it always runs in process context. For those wondering why we need to be able to handle atomic calling contexts: Both panic paths and cursor code and fbcon blanking can run from atomic. See commit bcb39af4486be07e896fc374a2336bad3104ae0a Author: Dave Airlie <airlied@redhat.com> Date: Thu Feb 7 11:19:15 2013 +1000 drm/udl: make usage as a console safer for where this was originally discovered. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Augment commit message with why we need to handle atomic contexts.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1461856717-6476-4-git-send-email-noralf@tronnes.org
2016-05-02of: include errno.h in of_graph.hArnd Bergmann
When CONFIG_OF is disabled, we have to include linux/errno.h before including of_graph.h, or get build errors like in the newly added sun4i drm driver: In file included from ../drivers/gpu/drm/sun4i/sun4i_drv.c:14:0: include/linux/of_graph.h: In function 'of_graph_parse_endpoint': include/linux/of_graph.h:58:10: error: 'ENOSYS' undeclared (first use in this function) A better solution is to ensure that the header can be included by itself, so let's include linux/errno.h here to fix the error we just got, and any similar future error. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support") Signed-off-by: Rob Herring <robh@kernel.org>
2016-05-02irqchip: Add per-cpu interrupt partitioning libraryMarc Zyngier
We've unfortunately started seeing a situation where percpu interrupts are partitioned in the system: one arbitrary set of CPUs has an interrupt connected to a type of device, while another disjoint set of CPUs has the same interrupt connected to another type of device. This makes it impossible to have a device driver requesting this interrupt using the current percpu-interrupt abstraction, as the same interrupt number is now potentially claimed by at least two drivers, and we forbid interrupt sharing on per-cpu interrupt. A solution to this is to turn things upside down. Let's assume that our system describes all the possible partitions for a given interrupt, and give each of them a unique identifier. It is then possible to create a namespace where the affinity identifier itself is a form of interrupt number. At this point, it becomes easy to implement a set of partitions as a cascaded irqchip, each affinity identifier being the HW irq. This allows us to keep a number of nice properties: - Each partition results in a separate percpu-interrupt (with a restrictied affinity), which keeps drivers happy. - Because the underlying interrupt is still per-cpu, the overhead of the indirection can be kept pretty minimal. - The core code can ignore most of that crap. For that purpose, we implement a small library that deals with some of the boilerplate code, relying on platform-specific drivers to provide a description of the affinity sets and a set of callbacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-4-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-02genirq: Allow the affinity of a percpu interrupt to be set/retrievedMarc Zyngier
In order to prepare the genirq layer for the concept of partitionned percpu interrupts, let's allow an affinity to be associated with such an interrupt. We introduce: - irq_set_percpu_devid_partition: flag an interrupt as a percpu-devid interrupt, and associate it with an affinity - irq_get_percpu_devid_partition: allow the affinity of that interrupt to be retrieved. This will allow a driver to discover which CPUs the per-cpu interrupt can actually fire on. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-02irqdomain: Allow domain matching on irq_fwspecMarc Zyngier
When iterating over the irq domain list, we try to match a domain either by calling a match() function or by comparing a number of fields passed as parameters. Both approaches are a bit restrictive: - match() is DT specific and only takes a device node - the fallback case only deals with the fwnode_handle It would be useful if we had a per-domain function that would actually perform the matching check on the whole of the irq_fwspec structure. This would allow for a domain to triage matching attempts that need to extend beyond the fwnode. Let's introduce irq_find_matching_fwspec(), which takes a full blown irq_fwspec structure, and call into a select() function implemented by the irqdomain. irq_find_matching_fwnode() is made a wrapper around irq_find_matching_fwspec in order to preserve compatibility. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-02genirq: Add error code reporting to irq_{reserve,destroy}_ipiMatt Redfearn
Make these functions return appropriate error codes when something goes wrong. Previously irq_destroy_ipi returned void making it impossible to notify the caller if the request could not be fulfilled. Patch 1 in the series added another condition in which this could fail in addition to the existing ones. irq_reserve_ipi returned an unsigned int meaning it could only return 0 on failure and give the caller no indication as to why the request failed. As time goes on there are likely to be further conditions added in which these functions can fail. These APIs and the IPI IRQ domain are new in 4.6 and the number of existing call sites are low, changing the API now has little impact on the code, while making it easier for these functions to grow over time. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: ralf@linux-mips.org Cc: Qais Yousef <qsyousef@gmail.com> Cc: lisa.parratt@imgtec.com Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1461568464-31701-2-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-02genirq: Make irq_destroy_ipi take a cpumask of IPIs to destroyMatt Redfearn
Previously irq_destroy_ipi() would destroy IPIs to all CPUs that were configured by irq_reserve_ipi(). This change makes it possible to destroy just a subset of the IPIs. This may be useful to remove IPIs to CPUs that have been hot removed so that the IRQ numbers allocated within the IPI domain can be re-used. The original behaviour is restored by passing the complete mask that the IPI was created with. There are currently no users of this function that would break from the API change. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: ralf@linux-mips.org Cc: Qais Yousef <qsyousef@gmail.com> Cc: lisa.parratt@imgtec.com Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1461568464-31701-1-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-02Merge tag 'v4.6-rc6' into patchworkMauro Carvalho Chehab
Linux 4.6-rc6 * tag 'v4.6-rc6': (762 commits) Linux 4.6-rc6 EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Documentation/sysctl/vm.txt: update numa_zonelist_order description lib/stackdepot.c: allow the stack trace hash to be zero rapidio: fix potential NULL pointer dereference mm/memory-failure: fix race with compound page split/merge ocfs2/dlm: return zero if deref_done message is successfully handled Ananth has moved kcov: don't profile branches in kcov kcov: don't trace the code coverage code mm: wake kcompactd before kswapd's short sleep .mailmap: add Frank Rowand mm/hwpoison: fix wrong num_poisoned_pages accounting mm: call swap_slot_free_notify() with page lock held mm: vmscan: reclaim highmem zone if buffer_heads is over limit numa: fix /proc/<pid>/numa_maps for THP mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check mailmap: fix Krzysztof Kozlowski's misspelled name thp: keep huge zero page pinned until tlb flush mm: exclude HugeTLB pages from THP page_mapped() logic ...