summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-04-14fm10k: mbx_update_max_size does not drop all oversized messagesJeff Kirsher
When we call update_max_size it does not drop all oversized messages. This is due to the difficulty in performing this operation, since it is a FIFO which makes updating anything other than head or tail very difficult. To fix this, modify validate_msg_size to ensure that we error out later when trying to transmit the message that could be oversized. This will generally be a rare condition, as it requires the FIFO to include a message larger than the max_size negotiated during mailbox connect. Note that max_size is always smaller than rx.size so it should be safe to use here. Also, update the update_max_size function header comment to clearly indicate that it does not drop all oversized messages, but only those at the head of the FIFO. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: reset head instead of calling update_max_sizeJeff Kirsher
When we forcefully shutdown the mailbox, we then go about resetting max size to 0, and clearing all messages in the FIFO. Instead, we should just reset the head pointer so that the FIFO becomes empty, rather than changing the max size to 0. This helps prevent increment in tx_dropped counter during mailbox negotiation, which is confusing to viewers of Linux ethtool statistics output. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: renamed mbx_tx_dropped to mbx_tx_oversizedJeff Kirsher
The use of dropped doesn't really mean dropped mailbox messages, but rather specifically messages which were too large to fit in the remote Rx FIFO. Rename the stat to more clearly indicate what it means. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A final pull request, I know it's very late but this time I think it's worth a bit of rush. The following patchset contains Netfilter/nf_tables updates for net-next, more specifically concatenation support and dynamic stateful expression instantiation. This also comes with a couple of small patches. One to fix the ebtables.h userspace header and another to get rid of an obsolete example file in tree that describes a nf_tables expression. This time, I decided to paste the original descriptions. This will result in a rather large commit description, but I think these bytes to keep. Patrick McHardy says: ==================== netfilter: nf_tables: concatenation support The following patches add support for concatenations, which allow multi dimensional exact matches in O(1). The basic idea is to split the data registers, currently consisting of 4 registers of 16 bytes each, into smaller units, 16 registers of 4 bytes each, and making sure each register store always leaves the full 32 bit in a well defined state, meaning smaller stores will zero the remaining bits. Based on that, we can load multiple adjacent registers with different values, thereby building a concatenated bigger value, and use that value for set lookups. Sets are changed to use variable sized extensions for their key and data values, removing the fixed limit of 16 bytes while saving memory if less space is needed. As a side effect, these patches will allow some nice optimizations in the future, like using jhash2 in nft_hash, removing the masking in nft_cmp_fast, optimized data comparison using 32 bit word size etc. These are not done so far however. The patches are split up as follows: * the first five patches add length validation to register loads and stores to make sure we stay within bounds and prepare the validation functions for the new addressing mode * the next patches prepare for changing to 32 bit addressing by introducing a struct nft_regs, which holds the verdict register as well as the data registers. The verdict members are moved to a new struct nft_verdict to allow to pull struct nft_data out of the stack. * the next patches contain preparatory conversions of expressions and sets to use 32 bit addressing * the next patch introduces so far unused register conversion helpers for parsing and dumping register numbers over netlink * following is the real conversion to 32 bit addressing, consisting of replacing struct nft_data in struct nft_regs by an array of u32s and actually translating and validating the new register numbers. * the final two patches add support for variable sized data items and variable sized keys / data in set elements The patches have been verified to work correctly with nft binaries using both old and new addressing. ==================== Patrick McHardy says: ==================== netfilter: nf_tables: dynamic stateful expression instantiation The following patches are the grand finale of my nf_tables set work, using all the building blocks put in place by the previous patches to support something like iptables hashlimit, but a lot more powerful. Sets are extended to allow attaching expressions to set elements. The dynset expression dynamically instantiates these expressions based on a template when creating new set elements and evaluates them for all new or updated set members. In combination with concatenations this effectively creates state tables for arbitrary combinations of keys, using the existing expression types to maintain that state. Regular set GC takes care of purging expired states. We currently support two different stateful expressions, counter and limit. Using limit as a template we can express the functionality of hashlimit, but completely unrestricted in the combination of keys. Using counter we can perform accounting for arbitrary flows. The following examples from patch 5/5 show some possibilities. Userspace syntax is still WIP, especially the listing of state tables will most likely be seperated from normal set listings and use a more structured format: 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh | ack : counter packets 35 bytes 3144 } In the future the "expressions attached to elements" will be extended to also support user created non-stateful expressions to allow to efficiently select beween a set of parameter sets, f.i. a set of log statements with different prefixes based on the interface, which currently require one rule each. This will most likely have to wait until the next kernel version though. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14fm10k: update xcast mode before synchronizing multicast addressesJeff Kirsher
When the PF receives a request to update a multicast address for the VF, it checks the enabled multicast mode first. Fix a bug where the VF tried to set a multicast address before requesting the required xcast mode. This ensures the multicast addresses are honored as long as the xcast mode was allowed. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: start service timer on probeJeff Kirsher
Since the service task handles varying work that doesn't all require the interface to be up, launch the service timer immediately. This ensures that we continually check the mailbox, as well as handle other tasks while the device is down. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: fix function header commentJeff Kirsher
The header comment included a miscopy of a C-code line, and also mis-used Rx FIFO when it clearly meant Tx FIFO Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: comment next_vf_mbx flowJeff Kirsher
Add a header comment explaining why we have the somewhat crazy mailbox flow. This flow is necessary as it prevents the PF<->SM mailbox from being flooded by the VF messages, which normally trigger a message to the PF. This helps prevent the case where we see a PF mailbox timeout. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: don't handle mailbox events in iov_event path and always process mailboxJeff Kirsher
Since we already schedule the service task, we can just wait for this task to handle the mailbox events from the VF. This reduces some complex code flow, and makes it so we have a single path for handling the VF messages. There is a possibility that we have a slight delay in handling VF messages, but it should be minimal. The result of tx_complete and !rx_ready is insufficient to determine whether we need to process the mailbox. There is a possible race condition whereby the VF fills up the mbmem for us, but we have already recently processed the mailboxes in the interrupt. During this time, the interrupt is disabled. Thus, our Rx FIFO is empty, but the mbmem now has data in it. Since we continually check whether Rx FIFO is empty, we then never call process. This results in the possibility to prevent PF from handling the VF mailbox messages. Instead, just call process every time, despite the fact that we may or may not have anything to process for the VF. There should be minimal overhead for doing this, and it resolves an issue where the VF never comes up due to never getting response for its SET_LPORT_STATE message. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: use separate workqueue for fm10k driverJeff Kirsher
Since we run the watchdog periodically, which might take a while and potentially monopolize the system default workqueue, create our own separate work queue. This also helps reduce and stabilize latency between scheduling the work in our interrupt and actually performing the work. Still use a timer for the regular scheduled interval but queue the work onto its own work queue. It seemed overkill to create a single workqueue per interface, so we just spawn a single work queue for all interfaces upon driver load. For this reason, use a multi-threaded workqueue with one thread per processor, rather than single threaded queue. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Set PF queues to unlimited bandwidth during virtualizationJeff Kirsher
When returning virtualization queues from the VF back to the PF, do not retain the VF rate limiter. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Todd Russell <todd.a.russell@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: expose tx_timeout_count as an ethtool statJeff Kirsher
Named it tx_hang_count to differentiate it from tx_hwtstamp_timeout. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: only increment tx_timeout_count in Tx hang pathJeff Kirsher
We were incrementing the tx_timeout_count for both the Tx hang and then for all reset flows. Instead, we should only increment tx_timeout_count in the Tx hang path, so that our Tx hang counter does not increment when it was not caused by a Tx hang. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: remove extraneous "Reset interface" messageJeff Kirsher
Since we already print this message when a reset is requested via the RESET_REQUESTED flag, we do not need to print it before setting the flag. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: separate PF only stats so that VF does not display themJeff Kirsher
This patch resolves an issue with ethtool stats displaying useless values on the VF, because some stats simply have no meaning to the VF. Resolve this by splitting these out into PF_STATS and only showing them if we aren't the VF. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: use hw->mac.max_queues for statsJeff Kirsher
Even though it shouldn't strictly matter, don't count queue stats higher than the max_queues value stored for this mac. This ensures that we don't attempt to check queues which don't belong to use in VFs. This shouldn't be a visible change, as the VFs should see zero for queues which don't belong to them. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: only show actual queues, not the maximum in hardwareJeff Kirsher
Currently, we show statistics for all 128 queues, even though we don't necessarily have that many queues available especially in the VF case. Instead, use the hw->mac.max_queues value, which tells us how many queues we actually have, rather than the space for the rings we allocated. In this way, we prevent dumping statistics that are useless on the VF. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: allow creation of VLAN on default vidJeff Kirsher
Previously, the user was not allowed to create a VLAN interface on top of the switch default vid. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14Merge branch 'for-linus-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: "Part one: - struct filename-related cleanups - saner iov_iter_init() replacements (and switching the syscalls to use of those) - ntfs switch to ->write_iter() (Anton) - aio cleanups and splitting iocb into common and async parts (Christoph) - assorted fixes (me, bfields, Andrew Elble) There's a lot more, including the completion of switchover to ->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags race fixes, etc, but that goes after #for-davem merge. David has pulled it, and once it's in I'll send the next vfs pull request" * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits) sg_start_req(): use import_iovec() sg_start_req(): make sure that there's not too many elements in iovec blk_rq_map_user(): use import_single_range() sg_io(): use import_iovec() process_vm_access: switch to {compat_,}import_iovec() switch keyctl_instantiate_key_common() to iov_iter switch {compat_,}do_readv_writev() to {compat_,}import_iovec() aio_setup_vectored_rw(): switch to {compat_,}import_iovec() vmsplice_to_user(): switch to import_iovec() kill aio_setup_single_vector() aio: simplify arguments of aio_setup_..._rw() aio: lift iov_iter_init() into aio_setup_..._rw() lift iov_iter into {compat_,}do_readv_writev() NFS: fix BUG() crash in notify_change() with patch to chown_common() dcache: return -ESTALE not -EBUSY on distributed fs race NTFS: Version 2.1.32 - Update file write from aio_write to write_iter. VFS: Add iov_iter_fault_in_multipages_readable() drop bogus check in file_open_root() switch security_inode_getattr() to struct path * constify tomoyo_realpath_from_path() ...
2015-04-14fm10k: fix unused warningsJeff Kirsher
The were several functions which had parameters which were never or sometimes used in functions. To resolve possible compiler warnings, use __always_unused or __maybe_unused kernel macros to resolve. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Add netconsole supportJeff Kirsher
This change adds a function called "fm10k_netpoll" that's used to define "ndo_poll_controller" in "fm10k_netdev_ops". This is required to enable support for "netconsole" in fm10k. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Have the VF get the default VLAN during initJeff Kirsher
Currently, the VFs do not read the default VLAN during initialization, so they will not be able to indicate untagged frames properly. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Correct spelling mistakeJeff Kirsher
Corrected a spelling mistake that was found over time. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14fm10k: Remove redundant rx_errors in ethtoolJeff Kirsher
Output of ethtool was reporting 2 rx_errors entries. This change removes one of the redundant entries. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
2015-04-14fm10k: Corrected an error in Tx statisticsJeff Kirsher
The function collecting Tx statistics was actually using values from the RX ring. Thus, Tx and Rx statistics values reported by "ifconfig" will return identical values. This change corrects this error and the Tx statistics is now reading from the Tx ring. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
2015-04-14Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
2015-04-14Merge branch 'devel-stable' into for-nextRussell King
2015-04-14Merge branches 'misc', 'vdso' and 'fixes' into for-nextRussell King
Conflicts: arch/arm/mm/proc-macros.S
2015-04-14ARM: update errata 430973 documentation to cover Cortex A8 r1p*Russell King
This errata covers all r1 variants of Cortex A8, it's not limited to just r1p0..r1p2. Update the documentation to reflect this. The code already applies the workaround to all r1p* A8 CPUs. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: ensure delay timer has sufficient accuracy for delaysRussell King
We have recently had an example of someone wanting to use a 90kHz timer for the software delay loop. udelay() needs to have at least microsecond resolution to allow drivers access to a delay mechanism with a reasonable chance of delaying the period they requested within at least a 50% marging of error, especially for small delays. Discussion about the udelay() accuracy can be found at: https://lkml.org/lkml/2011/1/9/37 Reject timers which are unable to supply this level of resolution. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: switch to use the generic show_mem() implementationRussell King
Switch ARM to use the generic show_mem() implementation, which displays the statistics from the mm zone rather than walking the page arrays. Acked-by: Mel Gorman <mgorman <mgorman@suse.de> Tested-by: Gregory Fong <gregory.0xf0@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: proc-v7: avoid errata 430973 workaround for non-Cortex A8 CPUsRussell King
Avoid the errata 430973 workaround for non-Cortex A8 CPUs. Having this workaround enabled introduces an additional branch target buffer flush into the context switching path, something we wish to avoid. To allow this errata to be enabled in multiplatform kernels while reducing its impact, rearrange the Cortex-A8 CPU support to avoid impacting on other Version 7 CPUs. Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: enable ARM errata 643719 workaround by defaultRussell King
The effects of not having ARM errata 643719 enabled on affected CPUs can be very confusing and hard to debug. Rather than leave this to chance, enable this workaround by default. Now that we have rearranged the code, it should have a low impact on the majority of CPUs. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: cache-v7: optimise test for Cortex A9 r0pX devicesRussell King
Eliminate one unnecessary instruction from this test by pre-shifting the Cortex A9 ID - we can shift the actual ID in the teq instruction thereby losing the pX bit of the ID at no cost. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: cache-v7: optimise branches in v7_flush_cache_louisRussell King
Optimise the branches such that for the majority of unaffected devices, we avoid needing to execute the errata work-around code path by branching to start_flush_levels early. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: cache-v7: consolidate initialisation of cache level indexRussell King
Both v7_flush_cache_louis and v7_flush_dcache_all both begin the flush_levels loop with r10 initialised to zero. In each case, this is done immediately prior to entering the loop. Branch to this instruction in v7_flush_dcache_all from v7_flush_cache_louis and eliminate the unnecessary initialisation in v7_flush_cache_louis. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: cache-v7: shift CLIDR to extract appropriate field before maskingRussell King
Rather than have code which masks and then shifts, such as: mrc p15, 1, r0, c0, c0, 1 ALT_SMP(ands r3, r0, #7 << 21) ALT_UP( ands r3, r0, #7 << 27) ALT_SMP(mov r3, r3, lsr #20) ALT_UP( mov r3, r3, lsr #26) re-arrange this as a shift and then mask. The masking is the same for each field which we want to extract, so this allows the mask to be shared amongst code paths: mrc p15, 1, r0, c0, c0, 1 ALT_SMP(mov r3, r0, lsr #20) ALT_UP( mov r3, r0, lsr #26) ands r3, r3, #7 << 1 Use this method for the LoUIS, LoUU and LoC fields. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: cache-v7: use movw/movt instructionsRussell King
We always build cache-v7.S for ARMv7, so we can use the ARMv7 16-bit move instructions to load large constants, rather than using constants in a literal pool. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14ARM: allow 16-bit instructions in ALT_UP()Russell King
Allow ALT_UP() to cope with a 16-bit Thumb instruction by automatically inserting a following nop instruction. This allows us to care less about getting the assembler to emit a 32-bit thumb instruction. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull NOHZ changes from Ingo Molnar: "This tree adds full dynticks support to KVM guests (support the disabling of the timer tick on the guest). The main missing piece was the recognition of guest execution as RCU extended quiescent state and related changes" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest context_tracking: Export context_tracking_user_enter/exit context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER context_tracking: Add stub context_tracking_is_enabled context_tracking: Generalize context tracking APIs to support user and guest context_tracking: Rename context symbols to prepare for transition state ppc: Remove unused cpp symbols in kvm headers
2015-04-14Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "The main changes in this cycle were: - changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - add in-kernel API to enable and disable expediting of normal RCU grace periods. - improve RCU's handling of (hotplug-) outgoing CPUs. - NO_HZ_FULL_SYSIDLE fixes. - tiny-RCU updates to make it more tiny. - documentation updates. - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well cpu: Defer smpboot kthread unparking until CPU known to scheduler rcu: Associate quiescent-state reports with grace period rcu: Yet another fix for preemption and CPU hotplug rcu: Add diagnostics to grace-period cleanup rcutorture: Default to grace-period-initialization delays rcu: Handle outgoing CPUs on exit from idle loop cpu: Make CPU-offline idle-loop transition point more precise rcu: Eliminate ->onoff_mutex from rcu_node structure rcu: Process offlining and onlining only at grace-period start rcu: Move rcu_report_unblock_qs_rnp() to common code rcu: Rework preemptible expedited bitmask handling rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs rcutorture: Enable slow grace-period initializations rcu: Provide diagnostic option to slow down grace-period initialization rcu: Detect stalls caused by failure to propagate up rcu_node tree rcu: Eliminate empty HOTPLUG_CPU ifdef rcu: Simplify sync_rcu_preempt_exp_init() rcu: Put all orphan-callback-related code under same comment rcu: Consolidate offline-CPU callback initialization ...
2015-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The dwmac-socfpga.c conflict was a case of a bug fix overlapping changes in net-next to handle an error pointer differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14Merge branch 'cxgb4-next'David S. Miller
Hariprasad Shenai says: ==================== cxgb4: Misc. fixes for sge Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist starving threshold based on adapter type, add comments for tx flits and sge length code and don't call t4_slow_intr_handler when we are not master PF. This patch series has been created against net-next tree and includes patches on cxgb4 driver We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Don't call t4_slow_intr_handler when we're not the Master PFHariprasad Shenai
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Add comment for calculate tx flits and sge length codeHariprasad Shenai
Add comment for tx filt and sge length calucaltion code, also remove a hardcoded value Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Use device node in page allocationHariprasad Shenai
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Freelist starving threshold varies from adapter to adapterHariprasad Shenai
fl_starv_thres could be different from adapter to adapter, don't use hardcoded values Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14cxgb4: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytesHariprasad Shenai
This allows a significant latency drop for packets of sizes between 128 and 192 bytes Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: drop ring->num_slotsFelix Fietkau
The ring size is always known at compile time, so make the code a bit more efficient Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14bgmac: fix DMA rx corruptionFelix Fietkau
The driver needs to inform the hardware about the first invalid (not yet filled) rx slot, by writing its DMA descriptor pointer offset to the BGMAC_DMA_RX_INDEX register. This register was set to a value exceeding the rx ring size, effectively allowing the hardware constant access to the full ring, regardless of which slots are initialized. To fix this issue, always mark the last filled rx slot as invalid. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>