summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-11net/mlx5_core: Set priority attributesMaor Gottlieb
Each priority has two attributes: 1. max_ft - maximum allowed flow tables under this priority. 2. start_level - start level range of the flow tables in the priority. These attributes are set by traversing the tree nodes by DFS and set start level and max flow tables to each priority. Start level depends on the max flow tables of the prior priorities in the tree. The leaves of the trees have max_ft set in them. Each node accumulates the max_ft of its children and set it accordingly. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net/mlx5_core: Connect flow tablesMaor Gottlieb
Flow tables from different priorities should be chained together. When a packet arrives we search for a match in the by-pass flow tables (first we search for a match in priority 0 and if we don't find a match we move to the next priority). If we can't find a match in any of the bypass flow-tables, we continue searching in the flow-tables of the next priority, which are the kernel's flow tables. Setting the miss flow table in a new flow table to be the next one in the list is performed via create flow table API. If we want to change an existing flow table, for example in order to point from an existing flow table to the new next-in-list flow table, we use the modify flow table API. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net/mlx5_core: Introduce modify flow table commandMaor Gottlieb
Introduce the modify flow table command. This command is used when we want to change the next flow table of an existing flow table. The next flow table is defined as the table we search (in order to find a match), if we couldn't find a match in any of the flow table entries in the current flow table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net/mlx5_core: Managing root flow tableMaor Gottlieb
The root Flow Table for each Flow Table Type is defined, by default, as the Flow Table with level 0. In order not to use an empty flow tables and introduce new hops, but still preserve space for flow-tables that have a priority greater(lower number) than the current flow table, we introduce this new set root flow table command. This command tells the HW to start matching packets from the assigned root flow table. This command is used when we create new flow table with level lower than the current lowest flow table or it is the first flow table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net/mlx5_core: Add utilities to find next and prev flow-tablesMaor Gottlieb
Add two utility functions for find next and prev flow table. Find next flow table function gets priority and return the first flow table of the next priority in the tree. Find prev flow table return the last flow table of the previous priority in the tree. These utility functions are used for chaining flow table from different priorities. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net/mlx5_core: Introduce flow steering autogrouped flow tableMaor Gottlieb
When user add rule to autogrouped flow table, we search for flow group with the same match criteria, if we don't find such group then we create new flow group with the required match criteria and insert the rule to this group. We divide the flow table into required_groups + 1, in order to reserve a part of the flow table for rules which don't match any existing group. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11udp: disallow UFO for sockets with SO_NO_CHECK optionMichal Kubeček
Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM sockets") disallows UFO for packets sent from raw sockets. We need to do the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if for a bit different reason: while such socket would override the CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and bad offloading flags warning is triggered in __skb_gso_segment(). In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow UFO for packets sent by sockets with UDP_NO_CHECK6_TX option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11net: pktgen: fix null ptr deref in skb allocationJohn Fastabend
Fix possible null pointer dereference that may occur when calling skb_reserve() on a null skb. Fixes: 879c7220e82 ("net: pktgen: Observe needed_headroom of the device") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Intel Knights Landing support. (Harish Chegondi) - Intel Broadwell-EP uncore PMU support. (Kan Liang) - Core code improvements. (Peter Zijlstra.) - Event filter, LBR and PEBS fixes. (Stephane Eranian) - Enable cycles:pp on Intel Atom. (Stephane Eranian) - Add cycles:ppp support for Skylake. (Andi Kleen) - Various x86 NMI overhead optimizations. (Andi Kleen) - Intel PT enhancements. (Takao Indoh) - AMD cache events fix. (Vince Weaver) Tons of tooling changes: - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim) - perf report now defaults to --group if the perf.data file has grouped events, try it with: # perf record -e '{cycles,instructions}' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ] # perf report # Samples: 1K of event 'anon group { cycles, instructions }' # Event count (approx.): 1955219195 # # Overhead Command Shared Object Symbol 2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle 1.05% 0.33% firefox libxul.so [.] js::SetObjectElement 1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno 0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab 0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1> 0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay 0.62% 1.27% firefox libxul.so [.] js::GetIterator 0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty 0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining - Introduce the 'perf stat record/report' workflow: Generate perf.data files from 'perf stat', to tap into the scripting capabilities perf has instead of defining a 'perf stat' specific scripting support to calculate event ratios, etc. Simple example: $ perf stat record -e cycles usleep 1 Performance counter stats for 'usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ perf stat report Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ It generates PERF_RECORD_ userspace records to store the details: $ perf report -D | grep PERF_RECORD 0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637 0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535 0x12a [0x40]: PERF_RECORD_STAT_CONFIG 0x16a [0x30]: PERF_RECORD_STAT -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text 0x1da [0x18]: PERF_RECORD_STAT_ROUND [acme@ssdandy linux]$ An effort was made to make perf.data files generated like this to not generate cryptic messages when processed by older tools. The 'perf script' bits need rebasing, will go up later. - Make command line options always available, even when they depend on some feature being enabled, warning the user about use of such options (Wang Nan) - Support hw breakpoint events (mem:0xAddress) in the default output mode in 'perf script' (Wang Nan) - Fixes and improvements for supporting annotating ARM binaries, support ARM call and jump instructions, more work needed to have arch specific stuff separated into tools/perf/arch/*/annotate/ (Russell King) - Add initial 'perf config' command, for now just with a --list command to the contents of the configuration file in use and a basic man page describing its format, commands for doing edits and detailed documentation are being reviewed and proof-read. (Taeung Song) - Allows BPF scriptlets specify arguments to be fetched using DWARF info, using a prologue generated at compile/build time (He Kuang, Wang Nan) - Allow attaching BPF scriptlets to module symbols (Wang Nan) - Allow attaching BPF scriptlets to userspace code using uprobe (Wang Nan) - BPF programs now can specify 'perf probe' tunables via its section name, separating key=val values using semicolons (Wang Nan) Testing some of these new BPF features: Use case: get callchains when receiving SSL packets, filter then in the kernel, at arbitrary place. # cat ssl.bpf.c #define SEC(NAME) __attribute__((section(NAME), used)) struct pt_regs; SEC("func=__inet_lookup_established hnum") int func(struct pt_regs *ctx, int err, unsigned short port) { return err == 0 && port == 443; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; # # perf record -a -g -e ssl.bpf.c ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ] # perf script | head -30 swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux) 856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux) 2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux) 2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux) 96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux) 969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux) 2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux) 95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux) 1163ffa start_kernel ([kernel.vmlinux].init.text) 11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text) 1163623 x86_64_start_kernel ([kernel.vmlinux].init.text) qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux) 8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux) 430a br_handle_frame_finish ([bridge]) 48bc br_handle_frame ([bridge]) 855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) # - Use 'perf probe' various options to list functions, see what variables can be collected at any given point, experiment first collecting without a filter, then filter, use it together with 'perf trace', 'perf top', with or without callchains, if it explodes, please tell us! - Introduce a new callchain mode: "folded", that will list per line representations of all callchains for a give histogram entry, facilitating 'perf report' output processing by other tools, such as Brendan Gregg's flamegraph tools (Namhyung Kim) E.g: # perf report | grep -v ^# | head 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry | ---cpu_startup_entry | |--12.07%--start_secondary | --6.30%--rest_init start_kernel x86_64_start_reservations x86_64_start_kernel # Becomes, in "folded" mode: # perf report -g folded | grep -v ^# | head -5 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry 12.07% cpu_startup_entry;start_secondary 6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle 11.23% call_cpuidle;cpu_startup_entry;start_secondary 5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter 11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state # The user can also select one of "count", "period" or "percent" as the first column. ... and lots of infrastructure enhancements, plus fixes and other changes, features I failed to list - see the shortlog and the git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits) perf evlist: Add --trace-fields option to show trace fields perf record: Store data mmaps for dwarf unwind perf libdw: Check for mmaps also in MAP__VARIABLE tree perf unwind: Check for mmaps also in MAP__VARIABLE tree perf unwind: Use find_map function in access_dso_mem perf evlist: Remove perf_evlist__(enable|disable)_event functions perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does) perf report: Show random usage tip on the help line perf hists: Export a couple of hist functions perf diff: Use perf_hpp__register_sort_field interface perf tools: Add overhead/overhead_children keys defaults via string perf tools: Remove list entry from struct sort_entry perf tools: Include all tools/lib directory for tags/cscope/TAGS targets perf script: Align event name properly perf tools: Add missing headers in perf's MANIFEST perf tools: Do not show trace command if it's not compiled in perf report: Change default to use event group view perf top: Decay periods in callchains tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/ tools lib: Sync tools/lib/find_bit.c with the kernel ...
2016-01-11Merge branch 'bpf-next'David S. Miller
Daniel Borkmann says: ==================== BPF update This set adds IPv6 support for bpf_skb_{set,get}_tunnel_key() helper. It also exports flags to user space that are being used in helpers and weren't exported thus far. For more details, please see the individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11bpf: support ipv6 for bpf_skb_{set,get}_tunnel_keyDaniel Borkmann
After IPv6 support has recently been added to metadata dst and related encaps, add support for populating/reading it from an eBPF program. Commit d3aa45ce6b ("bpf: add helpers to access tunnel metadata") started with initial IPv4-only support back then (due to IPv6 metadata support not being available yet). To stay compatible with older programs, we need to test for the passed structure size. Also TOS and TTL support from the ip_tunnel_info key has been added. Tested with vxlan devs in collect meta data mode with IPv4, IPv6 and in compat mode over different network namespaces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11bpf: export helper function flags and reject invalid onesDaniel Borkmann
Export flags used by eBPF helper functions through UAPI, so they can be used by programs (instead of them redefining all flags each time or just using the hard-coded values). It also gives a better overview what flags are used where and we can further get rid of the extra macros defined in filter.c. Moreover, reject invalid flags. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11Merge branch 'renesas-eth-fixes'David S. Miller
Sergei Shtylyov says: ==================== Fix some dubious code in the Renesas Ethernet drivers Here's a set of 2 patches against DaveM's 'net.git' repo. While initializing EMAC the code tries to respect the duplex mode both programmed into ECMR and stored in its own private data -- this just can't be right. [1/2] ravb: stop reading ECMR in ravb_emac_init() [2/2] sh_eth: stop reading ECMR in sh_eth_dev_init() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11sh_eth: stop reading ECMR in sh_eth_dev_init()Sergei Shtylyov
The code in sh_eth_dev_init() twiddling the ECMR bits always looked a bit strange to me: if one intends to respect 'mdp->duplex', why save old value of the ECMR.DM bit? As all the other bits are zeroed anyway, we don't really need to read ECMR before writing to it. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11ravb: stop reading ECMR in ravb_emac_init()Sergei Shtylyov
The code in ravb_emac_init() twiddling the ECMR bits always looked a bit strange to me: if one intends to respect 'priv->duplex', why save old value of the ECMR.DM bit? As all the other bits are zeroed anyway, we don't really need to read ECMR before writing to it. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11sched,cls_flower: set key address type when presentJamal Hadi Salim
only when user space passes the addresses should we consider their presence Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11tcp_yeah: don't set ssthresh below 2Neal Cardwell
For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno and CUBIC, per RFC 5681 (equation 4). tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh value if the intended reduction is as big or bigger than the current cwnd. Congestion control modules should never return a zero or negative ssthresh. A zero ssthresh generally results in a zero cwnd, causing the connection to stall. A negative ssthresh value will be interpreted as a u32 and will set a target cwnd for PRR near 4 billion. Oleksandr Natalenko reported that a system using tcp_yeah with ECN could see a warning about a prior_cwnd of 0 in tcp_cwnd_reduction(). Testing verified that this was due to tcp_yeah_ssthresh() misbehaving in this way. Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "So we have a laundry list of locking subsystem changes: - continuing barrier API and code improvements - futex enhancements - atomics API improvements - pvqspinlock enhancements: in particular lock stealing and adaptive spinning - qspinlock micro-enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op futex: Cleanup the goto confusion in requeue_pi() futex: Remove pointless put_pi_state calls in requeue() futex: Document pi_state refcounting in requeue code futex: Rename free_pi_state() to put_pi_state() futex: Drop refcount if requeue_pi() acquired the rtmutex locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation lcoking/barriers, arch: Use smp barriers in smp_store_release() locking/cmpxchg, arch: Remove tas() definitions locking/pvqspinlock: Queue node adaptive spinning locking/pvqspinlock: Allow limited lock stealing locking/pvqspinlock: Collect slowpath lock statistics sched/core, locking: Document Program-Order guarantees locking, sched: Introduce smp_cond_acquire() and use it locking/pvqspinlock, x86: Optimize the PV unlock code path locking/qspinlock: Avoid redundant read of next pointer locking/qspinlock: Prefetch the next node cacheline locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg() atomics: Add test for atomic operations with _relaxed variants
2016-01-11bonding: make mii_status sysfs node consistentJarod Wilson
The spew in /proc/net/bonding/bond0 uses netif_carrier_ok() to determine mii_status, while /sys/class/net/bond0/bonding/mii_status looks at curr_active_slave, which doesn't actually seem to be set sometimes when the bond actually is up. A mode 4 bond configured via ifcfg-foo files on a Red Hat Enterprise Linux system, after boot, comes up clean and functional, but the sysfs node shows mii_status of down, while proc shows up. A simple enough fix here seems to be to use the same method for determining up or down in both places, and I'd opt for the one that seems to match reality. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11sctp: fix use-after-free in pr_debug statementMarcelo Ricardo Leitner
Dmitry Vyukov reported a use-after-free in the code expanded by the macro debug_post_sfx, which is caused by the use of the asoc pointer after it was freed within sctp_side_effect() scope. This patch fixes it by allowing sctp_side_effect to clear that asoc pointer when the TCB is freed. As Vlad explained, we also have to cover the SCTP_DISPOSITION_ABORT case because it will trigger DELETE_TCB too on that same loop. Also, there were places issuing SCTP_CMD_INIT_FAILED and ASSOC_FAILED but returning SCTP_DISPOSITION_CONSUME, which would fool the scheme above. Fix it by returning SCTP_DISPOSITION_ABORT instead. The macro is already prepared to handle such NULL pointer. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The changes in this cycle were: - Adding transitivity uniformly to rcu_node structure ->lock acquisitions. (This is implemented by the first two commits on top of v4.4-rc2 due to the pervasive nature of this change.) - Documentation updates, including RCU requirements. - Expedited grace-period changes. - Miscellaneous fixes. - Linked-list fixes, courtesy of KTSAN. - Torture-test updates. - Late-breaking fix to sysrq-generated crash. One thing I should note is that these pieces of documentation are fairly large files: .../RCU/Design/Requirements/Requirements.html | 2897 ++++++++++++++++++++ .../RCU/Design/Requirements/Requirements.htmlx | 2741 ++++++++++++++++++ and are written in HTML, not the usual .txt style. I hope they are fine" Paul McKenney explains the html docs: "For whatever it is worth, the reason for this unconventional choice was that attempts to do the diagrams in ASCII art failed miserably. And attempts to do ASCII art for the upcoming documentation of the data structures failed even more miserably" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) sysrq: Fix warning in sysrq generated crash. list: Add lockless list traversal primitives rcu: Make rcu_gp_init() be bool rather than int rcu: Move wakeup out from under rnp->lock rcu: Fix comment for rcu_dereference_raw_notrace rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}() rcu: Make cpu_needs_another_gp() be bool rcu: Eliminate unused rcu_init_one() argument rcu: Remove TINY_RCU bloat from pointless boot parameters torture: Place console.log files correctly from the get-go torture: Abbreviate console error dump rcutorture: Print symbolic name for ->gp_state rcutorture: Print symbolic name for rcu_torture_writer_state rcutorture: Remove CONFIG_RCU_USER_QS from rcutorture selftest doc rcutorture: Default grace period to three minutes, allow override rcutorture: Dump stack when GP kthread stalls rcutorture: Flag nonexistent RCU GP kthread rcutorture: Add batch number to script printout Documentation/memory-barriers.txt: Fix ACCESS_ONCE thinko documentation: Update RCU requirements based on expedited changes ...
2016-01-11Merge branch 'work.xattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "Andreas' xattr cleanup series. It's a followup to his xattr work that went in last cycle; -0.5KLoC" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr handlers: Simplify list operation ocfs2: Replace list xattr handler operations nfs: Move call to security_inode_listsecurity into nfs_listxattr xfs: Change how listxattr generates synthetic attributes tmpfs: listxattr should include POSIX ACL xattrs tmpfs: Use xattr handler infrastructure btrfs: Use xattr handler infrastructure vfs: Distinguish between full xattr names and proper prefixes posix acls: Remove duplicate xattr name definitions gfs2: Remove gfs2_xattr_acl_chmod vfs: Remove vfs_xattr_cmp
2016-01-11Merge branch 'work.symlinks' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs compat_ioctl fixes from Al Viro: "This is basically Jann's patches from last week. I have _not_ included the stuff like switching i2c to ->compat_ioctl() into this one - those need more testing. Ideally I would like fs/compat_ioctl.c shrunk a lot, but that's a separate story" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS) compat_ioctl: don't pass fd around when not needed compat_ioctl: don't look up the fd twice
2016-01-12Merge branch 'xfs-misc-fixes-for-4.5-2' into for-nextDave Chinner
2016-01-12xfs: handle dquot buffer readahead in log recovery correctlyDave Chinner
When we do dquot readahead in log recovery, we do not use a verifier as the underlying buffer may not have dquots in it. e.g. the allocation operation hasn't yet been replayed. Hence we do not want to fail recovery because we detect an operation to be replayed has not been run yet. This problem was addressed for inodes in commit d891400 ("xfs: inode buffers may not be valid during recovery readahead") but the problem was not recognised to exist for dquots and their buffers as the dquot readahead did not have a verifier. The result of not using a verifier is that when the buffer is then next read to replay a dquot modification, the dquot buffer verifier will only be attached to the buffer if *readahead is not complete*. Hence we can read the buffer, replay the dquot changes and then add it to the delwri submission list without it having a verifier attached to it. This then generates warnings in xfs_buf_ioapply(), which catches and warns about this case. Fix this and make it handle the same readahead verifier error cases as for inode buffers by adding a new readahead verifier that has a write operation as well as a read operation that marks the buffer as not done if any corruption is detected. Also make sure we don't run readahead if the dquot buffer has been marked as cancelled by recovery. This will result in readahead either succeeding and the buffer having a valid write verifier, or readahead failing and the buffer state requiring the subsequent read to resubmit the IO with the new verifier. In either case, this will result in the buffer always ending up with a valid write verifier on it. Note: we also need to fix the inode buffer readahead error handling to mark the buffer with EIO. Brian noticed the code I copied from there wrong during review, so fix it at the same time. Add comments linking the two functions that handle readahead verifier errors together so we don't forget this behavioural link in future. cc: <stable@vger.kernel.org> # 3.12 - current Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-12xfs: inode recovery readahead can race with inode buffer creationDave Chinner
When we do inode readahead in log recovery, we do can do the readahead before we've replayed the icreate transaction that stamps the buffer with inode cores. The inode readahead verifier catches this and marks the buffer as !done to indicate that it doesn't yet contain valid inodes. In adding buffer error notification (i.e. setting b_error = -EIO at the same time as as we clear the done flag) to such a readahead verifier failure, we can then get subsequent inode recovery failing with this error: XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32 This occurs when readahead completion races with icreate item replay such as: inode readahead find buffer lock buffer submit RA io .... icreate recovery xfs_trans_get_buffer find buffer lock buffer <blocks on RA completion> ..... <ra completion> fails verifier clear XBF_DONE set bp->b_error = -EIO release and unlock buffer <icreate gains lock> icreate initialises buffer marks buffer as done adds buffer to delayed write queue releases buffer At this point, we have an initialised inode buffer that is up to date but has an -EIO state registered against it. When we finally get to recovering an inode in that buffer: inode item recovery xfs_trans_read_buffer find buffer lock buffer sees XBF_DONE is set, returns buffer sees bp->b_error is set fail log recovery! Essentially, we need xfs_trans_get_buf_map() to clear the error status of the buffer when doing a lookup. This function returns uninitialised buffers, so the buffer returned can not be in an error state and none of the code that uses this function expects b_error to be set on return. Indeed, there is an ASSERT(!bp->b_error); in the transaction case in xfs_trans_get_buf_map() that would have caught this if log recovery used transactions.... This patch firstly changes the inode readahead failure to set -EIO on the buffer, and secondly changes xfs_buf_get_map() to never return a buffer with an error state set so this first change doesn't cause unexpected log recovery failures. cc: <stable@vger.kernel.org> # 3.12 - current Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-11Merge remote-tracking branches 'spi/topic/sun4i', 'spi/topic/topcliff-pch' ↵Mark Brown
and 'spi/topic/zynq' into spi-next
2016-01-11Merge remote-tracking branches 'spi/topic/overlay', 'spi/topic/pxa2xx', ↵Mark Brown
'spi/topic/s3c64xx', 'spi/topic/sh-msiof' and 'spi/topic/spidev' into spi-next
2016-01-11Merge remote-tracking branches 'spi/topic/lm70llp', 'spi/topic/loopback', ↵Mark Brown
'spi/topic/mtk' and 'spi/topic/omap2-mcspi' into spi-next
2016-01-11Merge remote-tracking branches 'spi/topic/dw', 'spi/topic/dw-mid', ↵Mark Brown
'spi/topic/fsl-espi' and 'spi/topic/imx' into spi-next
2016-01-11Merge remote-tracking branches 'spi/topic/bcm63xx', 'spi/topic/butterfly', ↵Mark Brown
'spi/topic/cadence' and 'spi/topic/davinci' into spi-next
2016-01-11Merge remote-tracking branch 'spi/topic/sunxi' into spi-nextMark Brown
2016-01-11Merge remote-tracking branch 'spi/topic/core' into spi-nextMark Brown
2016-01-11Merge remote-tracking branch 'spi/fix/mtk' into spi-linusMark Brown
2016-01-11[media] Postpone the addition of MEDIA_IOC_G_TOPOLOGYMauro Carvalho Chehab
There are a few discussions left with regards to this ioctl: 1) the name of the new structs will contain _v2_ on it? 2) what's the best alternative to avoid compat32 issues? Due to that, let's postpone the addition of this new ioctl to the next Kernel version, to give people more time to discuss it. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] mxl111sf: Add a tuner entityMauro Carvalho Chehab
While mxl111sf may have multiple frontends, it has just one tuner. Reflect that on the media graph. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] dvbdev: create links on devices with multiple frontendsMauro Carvalho Chehab
Devices like mxl111sf-based WinTV Aero-m have multiple frontends, all linked on the same demod. Currently, the dvb_create_graph() function is not smart enough to create multiple links. Fix it. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-entitiy: add a function to create multiple linksMauro Carvalho Chehab
Sometimes, it is desired to create 1:n and n:1 or even n:n links between different entities with the same function. This is actually needed to support DVB devices that have multiple frontends. While we could do a function like that internally at the DVB core, such function is generic enough to be at media-entity, and it could be useful on some other places. So, add such function. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] dvb-usb-v2: postpone removal of media_deviceMauro Carvalho Chehab
We should not remove the media_device until its last usage, or we may have use after free troubles. So, move the per-adapter media_device removal to happen at the end of the adapter removal code. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] dvbdev: Add RF connector if neededMauro Carvalho Chehab
Several pure digital TV devices have a frontend with the tuner integrated on it. Add the RF connector when dvb_create_media_graph() is called on such devices. Tested with siano and dvb_usb_mxl111sf drivers. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] dvbdev: remove two dead functions if !CONFIG_MEDIA_CONTROLLER_DVBMauro Carvalho Chehab
Those functions are used only if CONFIG_MEDIA_CONTROLLER_DVB. Without that, if !CONFIG_MEDIA_CONTROLLER_DVB, it would produce two warnings: drivers/media/dvb-core/dvbdev.c:219:12: warning: 'dvb_create_tsout_entity' defined but not used [-Wunused-function] static int dvb_create_tsout_entity(struct dvb_device *dvbdev, ^ drivers/media/dvb-core/dvbdev.c:264:12: warning: 'dvb_create_media_entity' defined but not used [-Wunused-function] static int dvb_create_media_entity(struct dvb_device *dvbdev, ^ Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] call media_device_init() before registering the V4L2 deviceMauro Carvalho Chehab
Currently, v4l2_device_register() doesn't use the media_device struct. So, calling media_device_init() could be called either before or after v4l2_device_register(). Yet, it is a good practice to initialize everything before calling the register functions. Also, the other drivers call media_device_init() before registering the V4L2 device. So, move the call for media_device_init() to happen earlier on exynos4-is and s3c-camif. This is just a cleanup patch. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] uapi/media.h: Use u32 for the number of graph objectsMauro Carvalho Chehab
While we need to keep a u64 alignment to avoid compat32 issues, having the number of entities/pads/links/interfaces represented by an u64 is incoherent with the ID number, with is an u32. In order to make it coherent, change those quantities to u32. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-entity: don't sleep at media_device_register_entity()Mauro Carvalho Chehab
media_device_register_entity() is protected by a spin_lock. Calling ida_pre_get() with GFP_KERNEL may put it to sleep, with is a bad idea and causes this warning: [ 8812.397195] BUG: sleeping function called from invalid context at mm/slub.c:1287 [ 8812.397203] in_atomic(): 1, irqs_disabled(): 0, pid: 15179, name: modprobe [ 8812.397207] INFO: lockdep is turned off. [ 8812.397213] CPU: 2 PID: 15179 Comm: modprobe Tainted: G B 4.4.0-rc2+ #41 [ 8812.397218] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015 [ 8812.397222] 0000000000000000 ffff880314c77268 ffffffff818f8ba7 ffff8803b17dde00 [ 8812.397232] ffff880314c77290 ffffffff811c2ee5 ffff8803b17dde00 ffffffff8284dbc9 [ 8812.397241] 0000000000000507 ffff880314c772d0 ffffffff811c30d5 0000000041b58ab3 [ 8812.397250] Call Trace: [ 8812.397258] [<ffffffff818f8ba7>] dump_stack+0x4b/0x64 [ 8812.397265] [<ffffffff811c2ee5>] ___might_sleep+0x245/0x3a0 [ 8812.397270] [<ffffffff811c30d5>] __might_sleep+0x95/0x1a0 [ 8812.397276] [<ffffffff818fd083>] ? ida_pre_get+0x113/0x250 [ 8812.397282] [<ffffffff8153bb77>] kmem_cache_alloc+0x197/0x250 [ 8812.397288] [<ffffffff818fd083>] ida_pre_get+0x113/0x250 [ 8812.397293] [<ffffffff818fd265>] ida_simple_get+0xa5/0x170 [ 8812.397298] [<ffffffff818fd1c0>] ? ida_pre_get+0x250/0x250 [ 8812.397306] [<ffffffffa07382d1>] media_device_register_entity+0x171/0x420 [media] [ 8812.397318] [<ffffffffa129e76f>] v4l2_device_register_subdev+0x34f/0x640 [videodev] [ 8812.397324] [<ffffffffa0768dea>] v4l2_i2c_new_subdev_board+0x12a/0x250 [v4l2_common] [ 8812.397330] [<ffffffffa0768fe7>] v4l2_i2c_new_subdev+0xd7/0x110 [v4l2_common] [ 8812.397337] [<ffffffffa0768f10>] ? v4l2_i2c_new_subdev_board+0x250/0x250 [v4l2_common] [ 8812.397347] [<ffffffffa13d2f76>] au0828_card_analog_fe_setup+0x2e6/0x3f0 [au0828] [ 8812.397352] [<ffffffff814450cc>] ? power_down+0xc4/0xc4 [ 8812.397361] [<ffffffffa13d2c90>] ? au0828_tuner_callback+0x160/0x160 [au0828] [ 8812.397370] [<ffffffffa13d319f>] au0828_card_setup+0x11f/0x340 [au0828] [ 8812.397378] [<ffffffffa13d3080>] ? au0828_card_analog_fe_setup+0x3f0/0x3f0 [au0828] [ 8812.397384] [<ffffffff812a575b>] ? msleep+0x7b/0xc0 [ 8812.397393] [<ffffffffa13d0d79>] au0828_usb_probe+0x679/0xcf0 [au0828] [ 8812.397399] [<ffffffff81d7619d>] usb_probe_interface+0x45d/0x940 [ 8812.397406] [<ffffffff81ca7004>] driver_probe_device+0x454/0xd90 [ 8812.397411] [<ffffffff81ca7940>] ? driver_probe_device+0xd90/0xd90 [ 8812.397417] [<ffffffff81ca7940>] ? driver_probe_device+0xd90/0xd90 [ 8812.397422] [<ffffffff81ca7a61>] __driver_attach+0x121/0x160 [ 8812.397427] [<ffffffff81ca141f>] bus_for_each_dev+0x11f/0x1a0 [ 8812.397433] [<ffffffff81ca1300>] ? subsys_dev_iter_exit+0x10/0x10 [ 8812.397439] [<ffffffff822917d7>] ? _raw_spin_unlock+0x27/0x40 [ 8812.397445] [<ffffffff81ca5d4d>] driver_attach+0x3d/0x50 [ 8812.397450] [<ffffffff81ca5039>] bus_add_driver+0x4c9/0x770 [ 8812.397456] [<ffffffff81ca944c>] driver_register+0x18c/0x3b0 [ 8812.397462] [<ffffffff8124c952>] ? __raw_spin_lock_init+0x32/0x100 [ 8812.397468] [<ffffffff81d71e58>] usb_register_driver+0x1f8/0x440 [ 8812.397473] [<ffffffffa0208000>] ? 0xffffffffa0208000 [ 8812.397481] [<ffffffffa02080b7>] au0828_init+0xb7/0x1000 [au0828] [ 8812.397486] [<ffffffff810021b1>] do_one_initcall+0x141/0x300 [ 8812.397492] [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40 [ 8812.397497] [<ffffffff8123bbf6>] ? trace_hardirqs_on_caller+0x16/0x590 [ 8812.397502] [<ffffffff815406e6>] ? kasan_unpoison_shadow+0x36/0x50 [ 8812.397507] [<ffffffff815406e6>] ? kasan_unpoison_shadow+0x36/0x50 [ 8812.397512] [<ffffffff815406e6>] ? kasan_unpoison_shadow+0x36/0x50 [ 8812.397517] [<ffffffff815407f7>] ? __asan_register_globals+0x87/0xa0 [ 8812.397524] [<ffffffff814454e5>] do_init_module+0x1d0/0x5a4 [ 8812.397530] [<ffffffff812ed7e8>] load_module+0x6648/0x9d70 [ 8812.397535] [<ffffffff812e4b70>] ? symbol_put_addr+0x50/0x50 [ 8812.397546] [<ffffffff812e71a0>] ? module_frob_arch_sections+0x20/0x20 [ 8812.397552] [<ffffffff8158e950>] ? open_exec+0x50/0x50 [ 8812.397559] [<ffffffff811648db>] ? ns_capable+0x5b/0xd0 [ 8812.397565] [<ffffffff812f1208>] SyS_finit_module+0x108/0x130 [ 8812.397571] [<ffffffff812f1100>] ? SyS_init_module+0x1f0/0x1f0 [ 8812.397576] [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14 [ 8812.397582] [<ffffffff82292236>] entry_SYSCALL_64_fastpath+0x16/0x7a Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-entity: increase max number of PADsMauro Carvalho Chehab
The DVB drivers may have 257 PADs. Get the next power of two that would accomodate that amount. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-entity.h: document the remaining functionsMauro Carvalho Chehab
There are two ancillary functions that are missing comments. While those are used only internally at media-entity.c, document them, for completeness. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-device.h: use just one u32 counter for object IDMauro Carvalho Chehab
Instead of using one u32 counter per type for object IDs, use just one counter. With such change, it makes sense to simplify the debug logs too. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] media-entity.h fix documentation for several parametersMauro Carvalho Chehab
Several parameters added by the media_ent_enum patches were declared with wrong argument names: include/media/media-device.h:333: warning: No description found for parameter 'entity_internal_idx_max' include/media/media-device.h:354: warning: No description found for parameter 'ent_enum' include/media/media-device.h:354: warning: Excess function parameter 'e' description in 'media_entity_enum_init' include/media/media-device.h:333: warning: No description found for parameter 'entity_internal_idx_max' include/media/media-device.h:354: warning: No description found for parameter 'ent_enum' include/media/media-device.h:354: warning: Excess function parameter 'e' description in 'media_entity_enum_init' include/media/media-entity.h:397: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:397: warning: Excess function parameter 'e' description in 'media_entity_enum_zero' include/media/media-entity.h:409: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:409: warning: Excess function parameter 'e' description in 'media_entity_enum_set' include/media/media-entity.h:424: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:424: warning: Excess function parameter 'e' description in 'media_entity_enum_clear' include/media/media-entity.h:441: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:441: warning: Excess function parameter 'e' description in 'media_entity_enum_test' include/media/media-entity.h:458: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:458: warning: Excess function parameter 'e' description in 'media_entity_enum_test_and_set' include/media/media-entity.h:474: warning: No description found for parameter 'ent_enum' include/media/media-entity.h:474: warning: Excess function parameter 'e' description in 'media_entity_enum_empty' include/media/media-entity.h:474: warning: Excess function parameter 'entity' description in 'media_entity_enum_empty' include/media/media-entity.h:489: warning: No description found for parameter 'ent_enum1' include/media/media-entity.h:489: warning: No description found for parameter 'ent_enum2' include/media/media-entity.h:489: warning: Excess function parameter 'e' description in 'media_entity_enum_intersects' include/media/media-entity.h:489: warning: Excess function parameter 'f' description in 'media_entity_enum_intersects' Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-01-11[media] DocBook: document media_entity_graph_walk_cleanup()Mauro Carvalho Chehab
This function was added recently, but weren't documented. Add documentation for it. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>