summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-01net/mlx5e: Re-attempt to offload flows on multipath port affinity eventsRoi Dayan
Under multipath it's possible for us to offload the flow only through the e-switch for which proper route through the uplink exists. When the port is up and the next-hop route is set again we want to offload through it as well. We generate SW event from the FIB event handler when multipath port affinity changes. The tc offloads code gets this event, goes over the flows which were marked as of having missing route and attempts to offload them. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Emit port affinity event for multipath offloadsRoi Dayan
Under multipath offload scheme, as part of handling fib events, emit mlx5 port affinity event on the enabled ports which will be handled by the tc offloads code. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Allow one failure when offloading tc encap rules under multipathRoi Dayan
In a similar manner to uplink/VF LAG, under multipath we add encap peer rule on the second port as well. However, unlike the LAG case, we do want to allow failure for adding one of the rules. This happens due to using a routing hint while doing the route lookup when one path (next hop device) is down. Introduce a new flag to indicate that route lookup failed for encap flow. Note that a flow may still not be offloaded to hw due to missing neighbour, in that case, the neigh update event will take care of it. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Don't inherit flow flags on peer flow creationRoi Dayan
Currently the peer flow inherits the flags from the original flow after we've set it. At this time the flags are set according to the flow state, e.g marked as going to slow path and such. Even if not getting us to real bugs now, this opens the door to get us to troubles later. Future proof the code and avoid the inheritance, use the peer flags as were set on input when we started adding the original flow. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Activate HW multipath and handle port affinity based on FIB eventsRoi Dayan
To support multipath offload we are going to track SW multipath route and related nexthops. To do that we register to FIB notifier and handle the route and next-hops events and reflect that as port affinity to HW. When there is a new multipath route entry that all next-hops are the ports of an HCA we will activate LAG in HW. Egress wise, we use HW LAG as the means to emulate multipath on current HW which doesn't support port selection based on xmit hash. In the presence of multiple VFs which use multiple SQs (send queues) this yields fairly good distribution. HA wise, HW LAG buys us the ability for a given RQ (receive queue) to receive traffic from both ports and for SQs to migrate xmitting over the active port if their base port fails. When the route entry is being updated to single path we will update the HW port affinity to use that port only. If a next-hop becomes dead we update the HW port affinity to the living port. When all next-hops are alive again we reset the affinity to default. Due to FW/HW limitations, when a route is deleted we are not disabling the HW LAG since doing so will not allow us to enable it again while VFs are bounded. Typically this is just a temporary state when a routing daemon removes dead routes and later adds them back as needed. This patch only handles events for AF_INET. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Add multipath modeRoi Dayan
In order to offload ecmp-on-host scheme where next-hop routes are used, we will make use of HW LAG. Add accessor function to let upper layers in the driver to realize if the lag acts in multi-path mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Use own workqueue for lag netdev events processingRoi Dayan
Instead of using the system workqueue, allocate our own workqueue. This workqueue will be used to handle more work in the next patch. This patch doesn't change functionality. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Expose lag operations in header fileRoi Dayan
The change is a refactoring step towards a multipath use case. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Use unsigned int bit instead of bool as a struct memberRoi Dayan
This fix checkpatch check CHECK: Avoid using bool structure members because of possible alignment issues Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Don't make internal use of errno to denote missing neighRoi Dayan
EAGAIN is treated as a specific case when we consider the attachment successful but wait for neigh event before offloading the flow. This can result in unwanted behavior when sub calls on the offloading path will return EAGAIN and we pass this error up. Instead of attaching to a specific error code return a boolean value from the attach encap operation saying if the encap is valid or not. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Cleanup attach encap functionRoi Dayan
Remove the tunnel info argument which we can get from the other args. Also reorder the args to have input args first and output args later. This patch doesn't change functionality. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as staticEran Ben Elisha
Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx reporter, move it to be statically declared in en/reporter_tx.c. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01Merge branch 'nfp-control-processor-DMA-support-and-RJ45'David S. Miller
Jakub Kicinski says: ==================== nfp: control processor DMA support and RJ45 This series starts with adding support for reporting twisted pair media type in ethtool. Remaining patches add support for using DMA with the control/service processor. Currently we always copy the command data into card's memory. DMA support allows us to have the NSP read the data from host memory by itself. Unfortunately, the FW loading and flashing cannot directly map the buffers for DMA because (a) the firmware ABI returns const buffers, and (b) the buffers may be vmalloc()ed in many mysterious/unmappable way. So just bite the bullet - allocate new host buffer for the command and copy. As Dirk explains, the NSP now supports updating all FWs at once which means the max flashing time grew significantly. He bumps the max wait to avoid timeouts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: set higher timeout for flash bundleDirk van der Merwe
The management firmware now supports being passed a bundle with multiple components to be stored in flash at once. This makes it easier to update all components to a known state with a single user command, however, this also has the potential to increase the time required to perform the update significantly. The management firmware only updates the components out of a bundle which are outdated, however, we need to make sure we can handle the absolute worst case where a CPLD update can take a long time to perform. We set a very conservative total timeout of 900s which already adds a contingency. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: allow the use of DMA bufferJakub Kicinski
Newer versions of NSP can access host memory. Simplest access type requires all data to be in one contiguous area. Since we don't have the guarantee on where callers of the NSP ABI will allocate their buffers we allocate a bounce buffer and copy the data in and out. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: move default buffer handling into its own functionJakub Kicinski
DMA version of NSP communication is coming, move the code which copies data into the NFP buffer into a separate function. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: nsp: use fractional size of the bufferJakub Kicinski
NSP expresses the buffer size in MB and 4 kB blocks. For small buffers the kB part may make a difference, so count it in. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01nfp: report RJ45 connector in ethtoolJakub Kicinski
Add support for reporting twisted pair port type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01lan743x: Fix TX Stall IssueBryan Whitehead
It has been observed that tx queue stalls while downloading from certain web sites (example www.speedtest.net) The cause has been tracked down to a corner case where dma descriptors where not setup properly. And there for a tx completion interrupt was not signaled. This fix corrects the problem by properly marking the end of a multi descriptor transmission. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: phy: phylink: fix uninitialized variable in phylink_get_mac_stateHeiner Kallweit
When debugging an issue I found implausible values in state->pause. Reason in that state->pause isn't initialized and later only single bits are changed. Also the struct itself isn't initialized in phylink_resolve(). So better initialize state->pause and other not yet initialized fields. v2: - use right function name in subject v3: - initialize additional fields Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: aquantia: regression on cpus with high cores: set mode with 8 queuesDmitry Bogdanov
Recently the maximum number of queues was increased up to 8, but NIC was not fully configured for 8 queues. In setups with more than 4 CPU cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th. This patch sets a tx hw traffic mode with 8 queues. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651 Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues") Reported-by: Nicholas Johnson <nicholas.johnson@outlook.com.au> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01selftests: fixes for UDP GROPaolo Abeni
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sending is still transmitting and incorrectly report RX errors, with a wrong number of packet received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: marvell: neta: disable comphy when setting modeMarek Behún
The comphy driver for Armada 3700 by Miquèl Raynal (which is currently in linux-next) does not actually set comphy mode when phy_set_mode_ext is called. The mode is set at next call of phy_power_on. Update the driver to semantics similar to mvpp2: helper mvneta_comphy_init sets comphy mode and powers it on. When mode is to be changed in mvneta_mac_config, first power the comphy off, then call mvneta_comphy_init (which sets the mode to new one). Only do this when new mode is different from old mode. This should also work for Armada 38x, since in that comphy driver methods power_on and power_off are unimplemented. Signed-off-by: Marek Behún <marek.behun@nic.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01Merge branch 'enetc-Add-mdio-support-and-device-tree-nodes'David S. Miller
Claudiu Manoil says: ==================== enetc: Add mdio support and device tree nodes This is the missing part to enable PCI probing of the ENETC ethernet ports on the LS1028A SoC and external traffic on the LS1028A RDB board. It's one of the first items on the TODO list for the recently merged ENETC ethernet driver. v3: Add DT bindings doc for ENETC connections v4: none ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01dt-bindings: net: freescale: enetc: Add connection bindings for ENETC ↵Claudiu Manoil
ethernet nodes Define connection bindings (external PHY connections and internal links) for the ENETC on-chip ethernet controllers. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01enetc: Add ENETC PF level external MDIO supportClaudiu Manoil
Each ENETC PF has its own MDIO interface, the corresponding MDIO registers are mapped in the ENETC's Port register block. The current patch adds a driver for these PF level MDIO buses, so that each PF can manage directly its own external link. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A ↵Claudiu Manoil
RDB board The LS1028A RDB board features an Atheros PHY connected over SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no external connection on this board, so it can be disabled for now. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpointsClaudiu Manoil
The LS1028A SoC features a PCI Integrated Endpoint Root Complex (IERC) defining several integrated PCI devices, including the ENETC ethernet controller integrated endpoints (IEPs). The IERC implements ECAM (Enhanced Configuration Access Mechanism) to provide access to the PCIe config space of the IEPs. This means the the IEPs (including ENETC) do not support the standard PCIe BARs, instead the Enhanced Allocation (EA) capability structures in the ECAM space are used to fix the base addresses in the system, and the PCI subsystem uses these structures for device enumeration and discovery. The "ranges" entries contain basic information from these EA capabily structures required by the kernel for device enumeration. The current patch also enables the first 2 ENETC PFs (Physiscal Functions) and the associated VFs (Virtual Functions), 2 VFs for each PF. Each of these ENETC PFs has an external ethernet port on the LS1028A SoC. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01perf beauty msg_flags: Add missing %s lost when adding prefix suppression logicArnaldo Carvalho de Melo
When the prefix suppresion/enabling logic was added, I forgot to add an extra %, which ended up chopping off the strings: Before: # perf trace -e *mmsg --map-dump syscalls [299] = 1, [307] = 1, DNS Res~ver #3/14587 sendmmsg(106<socket:[3462393]>, 0x7f252b0fcaf0, 2, MSG_) = 2 chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_, NULL) = 1 DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2 DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2 DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2 DNS Res~ver #3/14587 sendmmsg(148<socket:[3460636]>, 0x7f252b0fcaf0, 2, MSG_) = 2 DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2 ^C# After: # perf trace -e *mmsg --map-dump syscalls [299] = 1, [307] = 1, NetworkManager/17467 sendmmsg(22<socket:[3466493]>, 0x7f28927f9bb0, 2, MSG_NOSIGNAL) = 2 pool/17478 sendmmsg(10<socket:[3466523]>, 0x7f2769f95e90, 2, MSG_NOSIGNAL) = 2 DNS Res~ver #3/14587 sendmmsg(121<socket:[3466132]>, 0x7f252b0fcaf0, 2, MSG_NOSIGNAL) = 2 chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_DONTWAIT, NULL) = 1 Socket Thread/17433 sendmmsg(121<socket:[3460903]>, 0x7f252668baf0, 2, MSG_NOSIGNAL) = 2 ^C# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: c65c83ffe904 ("perf trace: Allow asking for not suppressing common string prefixes") Link: https://lkml.kernel.org/n/tip-t2eu1rqx710k6jr4814mlzg7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: exported-sql-viewer.py: Add call treeAdrian Hunter
Add a new report to display a call tree. The Call Tree report is very similar to the Context-Sensitive Call Graph, but the data is not aggregated. Also the 'Count' column, which would be always 1, is replaced by the 'Call Time'. Committer testing: $ cat simple-retpoline.c /* https://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c $ objdump -d simple-retpoline */ __attribute__((noinline)) int bar(void) { return -1; } int foo(void) { return bar() + 1; } __attribute__((indirect_branch("thunk"))) int main() { int (*volatile fn)(void) = foo; fn(); return fn(); } $ $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline $ perf script -i simple-retpoline.perf.data --itrace=be -s ~acme/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls $ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db And in the GUI select: "Reports" "Call Tree" Call Path | Object | Call Time (ns) | Time (ns) | Time (%) | Branch Count | Brach Count (%) | > simple-retpolin > PID:TID > _start ld-2.28.so 2193855505777 156267 100.0 10602 100.0 unknown unknown 2193855506010 2276 1.5 1 0.0 > _dl_start ld-2.28.so 2193855508286 137047 87.7 10088 95.2 > _dl_init ld-2.28.so 2193855645444 9142 5.9 326 3.1 > _start simple-retpoline 2193855654587 7457 4.8 182 1.7 > __libc_start_main <SNIP> <SNIP> > main simple-retpoline 2193855657493 32 0.5 12 6.7 > foo simple-retpoline 2193855657493 14 43.8 5 41.7 <SNIP> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-enf0w96gqzfpv4fi16pw9ovc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: exported-sql-viewer.py: Factor out CallGraphModelBaseAdrian Hunter
Factor out a base class CallGraphModelBase from CallGraphModel, so that CallGraphModelBase can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-76eybebzjwvgnadkm2oufrqi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: exported-sql-viewer.py: Improve TreeModel abstractionAdrian Hunter
Instead of passing the tree root, get it from a method that can be implemented in any derived class. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-ovcv28bg4mt9swk36ypdyz14@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: exported-sql-viewer.py: Factor out TreeWindowBaseAdrian Hunter
Factor out a base class TreeWindowBase from CallGraphWindow, so that TreeWindowBase can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-ifirw0c0mhkwxg6l12lk6k4p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: export-to-postgresql.py: Export calls parent_idAdrian Hunter
Export to the 'calls' table the newly created 'parent_id' and create an index for it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-eybd6fnk6j9r7g643lsideoo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: export-to-postgresql.py: Fix invalid input syntax for ↵Adrian Hunter
integer error Fix SQL query error "invalid input syntax for integer": Traceback (most recent call last): File "tools/perf/scripts/python/export-to-postgresql.py", line 465, in <module> do_query(query, 'CREATE VIEW calls_view AS ' File "tools/perf/scripts/python/export-to-postgresql.py", line 274, in do_query raise Exception("Query failed: " + q.lastError().text()) Exception: Query failed: ERROR: invalid input syntax for integer: "" LINE 1: ...ch_count,call_id,return_id,CASE WHEN flags=0 THEN '' WHEN fl... ^ (22P02) QPSQL: Unable to create query Error running python script tools/perf/scripts/python/export-to-postgresql.py Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") Link: https://lkml.kernel.org/n/tip-strfpdozrvg7bi1xzrivxzqt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf scripts python: export-to-sqlite.py: Export calls parent_idAdrian Hunter
Export to the 'calls' table the newly created 'parent_id'. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-b09oukl48rsl9azkp2wmh0bl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf db-export: Add calls parent_id to enable creation of call treesAdrian Hunter
The call_path can be used to find the parent symbol for a call but not the exact parent call. To do that add parent_id to the call_return export. This enables the creation of a call tree from the exported data. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf intel-pt: Fix divide by zero when TSC is not availableAdrian Hunter
When TSC is not available, "timeless" decoding is used but a divide by zero occurs if perf_time_to_tsc() is called. Ensure the divisor is not zero. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.9+ Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf auxtrace: Improve address filter error message when there is no DSOAdrian Hunter
The message does not indicate the possibility that the symbol is not found because the file does not exist. Before: $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls Symbol 'strcmp' not found. Note that symbols must be functions. Failed to parse address filter: 'filter strcmp / strcpy @ foo ' Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>] Where multiple filters are separated by space or comma. After: $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls File 'foo' not found or has no symbols. Symbol 'strcmp' not found. Note that symbols must be functions. Failed to parse address filter: 'filter strcmp / strcpy @ foo ' Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>] Where multiple filters are separated by space or comma. Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lkml.kernel.org/n/tip-dvngzxd0jkplzw1ary69dilb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01Merge tag 'iommu-fix-v5.0-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fix from Joerg Roedel: "One important fix for a memory corruption issue in the Intel VT-d driver that triggers on hardware with deep PCI hierarchies" * tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/dmar: Fix buffer overflow during PCI bus notification
2019-03-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "2 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hugetlbfs: fix races and page leaks during migration kasan: turn off asan-stack for clang-8 and earlier
2019-03-01hugetlbfs: fix races and page leaks during migrationMike Kravetz
hugetlb pages should only be migrated if they are 'active'. The routines set/clear_page_huge_active() modify the active state of hugetlb pages. When a new hugetlb page is allocated at fault time, set_page_huge_active is called before the page is locked. Therefore, another thread could race and migrate the page while it is being added to page table by the fault code. This race is somewhat hard to trigger, but can be seen by strategically adding udelay to simulate worst case scheduling behavior. Depending on 'how' the code races, various BUG()s could be triggered. To address this issue, simply delay the set_page_huge_active call until after the page is successfully added to the page table. Hugetlb pages can also be leaked at migration time if the pages are associated with a file in an explicitly mounted hugetlbfs filesystem. For example, consider a two node system with 4GB worth of huge pages available. A program mmaps a 2G file in a hugetlbfs filesystem. It then migrates the pages associated with the file from one node to another. When the program exits, huge page counts are as follows: node0 1024 free_hugepages 1024 nr_hugepages node1 0 free_hugepages 1024 nr_hugepages Filesystem Size Used Avail Use% Mounted on nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool That is as expected. 2G of huge pages are taken from the free_hugepages counts, and 2G is the size of the file in the explicitly mounted filesystem. If the file is then removed, the counts become: node0 1024 free_hugepages 1024 nr_hugepages node1 1024 free_hugepages 1024 nr_hugepages Filesystem Size Used Avail Use% Mounted on nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool Note that the filesystem still shows 2G of pages used, while there actually are no huge pages in use. The only way to 'fix' the filesystem accounting is to unmount the filesystem If a hugetlb page is associated with an explicitly mounted filesystem, this information in contained in the page_private field. At migration time, this information is not preserved. To fix, simply transfer page_private from old to new page at migration time if necessary. There is a related race with removing a huge page from a file and migration. When a huge page is removed from the pagecache, the page_mapping() field is cleared, yet page_private remains set until the page is actually freed by free_huge_page(). A page could be migrated while in this state. However, since page_mapping() is not set the hugetlbfs specific routine to transfer page_private is not called and we leak the page count in the filesystem. To fix that, check for this condition before migrating a huge page. If the condition is detected, return EBUSY for the page. Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> [mike.kravetz@oracle.com: v2] Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com [mike.kravetz@oracle.com: update comment and changelog] Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01kasan: turn off asan-stack for clang-8 and earlierArnd Bergmann
Building an arm64 allmodconfig kernel with clang results in over 140 warnings about overly large stack frames, the worst ones being: drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare' drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable' drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread' drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe' drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd' drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo' None of these happen with gcc today, and almost all of these are the result of a single known issue in llvm. Hopefully it will eventually get fixed with the clang-9 release. In the meantime, the best idea I have is to turn off asan-stack for clang-8 and earlier, so we can produce a kernel that is safe to run. I have posted three patches that address the frame overflow warnings that are not addressed by turning off asan-stack, so in combination with this change, we get much closer to a clean allmodconfig build, which in turn is necessary to do meaningful build regression testing. It is still possible to turn on the CONFIG_ASAN_STACK option on all versions of clang, and it's always enabled for gcc, but when CONFIG_COMPILE_TEST is set, the option remains invisible, so allmodconfig and randconfig builds (which are normally done with a forced CONFIG_COMPILE_TEST) will still result in a mostly clean build. Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de Link: https://bugs.llvm.org/show_bug.cgi?id=38809 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Qian Cai <cai@lca.pw> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Three final fixes, one for a feature that is new in this kernel, one bochs fix for qemu riscv and one atomic modesetting fix. I've left a few of the other late fixes until next as I didn't want to throw in anything that wasn't really necessary" * tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm: drm/bochs: Fix the ID mismatch error drm: Block fb changes for async plane updates drm/amd/display: Use vrr friendly pageflip throttling in DC.
2019-03-01s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=yMartin Schwidefsky
The dasd_eckd_restore_device() function calls dasd_generic_read_dev_chars with a temporary buffer on the stack. With CONFIG_VMAP_STACK=y this is a vmalloc address but dasd_generic_restore_device() uses the address of the buffer as I/O address. Circumvent this by using the already allocated cqr->data buffer for the RDC data. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01s390/suspend: fix prefix register reset in swsusp_arch_resumeMartin Schwidefsky
The reset of the prefix to zero in swsusp_arch_resume uses a 4 byte stack slot. With CONFIG_VMAP_STACK=y this is now in the vmalloc area, this works only with DAT enabled. Move the DAT disable in swsusp_arch_resume after the prefix reset. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file descriptor is supposed to return to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <sironhide0null@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01Merge tag 'qcom-fixes-for-5.0-rc8' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/fixes Qualcomm ARM64 Fixes for 5.0-rc8 * Fix TZ memory area size to avoid crashes during boot * tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: arm64: dts: qcom: msm8998: Extend TZ reserved memory area
2019-03-01perf time-utils: Refactor time range parsing codeJin Yao
Jiri points out that we don't need any time checking and time string parsing if the --time option is not set. That makes sense. This patch refactors the time range parsing code, move the duplicated code from perf report and perf script to time_utils and check if --time option is set before parsing the time string. This patch is no logic change expected. So the usage of --time is same as before. For example: Select the first and second 10% time slices: perf report --time 10%/1,10%/2 perf script --time 10%/1,10%/2 Select the slices from 0% to 10% and from 30% to 40%: perf report --time 0%-10%,30%-40% perf script --time 0%-10%,30%-40% Select the time slices from timestamp 3971 to 3973 perf report --time 3971,3973 perf script --time 3971,3973 Committer testing: Using the above examples, check before and after to see if it remains the same: $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ] $ $ perf report --time 10%/1,10%/2 > /tmp/report.before.1 $ perf script --time 10%/1,10%/2 > /tmp/script.before.1 $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2 $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2 $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3 $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3 For example, the 3rd test produces this slice: $ cat /tmp/script.before.3 cat 3147 180457.375844: 2143 cycles:uppp: 7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so) cat 3147 180457.375986: 2245 cycles:uppp: 558b70f3d86e [unknown] (/usr/bin/cat) cat 3147 180457.376012: 2164 cycles:uppp: 7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so) cat 3147 180457.376140: 2921 cycles:uppp: 558b70f3a554 [unknown] (/usr/bin/cat) cat 3147 180457.376296: 2844 cycles:uppp: 7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so) cat 3147 180457.376431: 2717 cycles:uppp: 558b70f3b0ca [unknown] (/usr/bin/cat) cat 3147 180457.376667: 2630 cycles:uppp: 558b70f3d86e [unknown] (/usr/bin/cat) cat 3147 180457.376795: 2442 cycles:uppp: 7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so) cat 3147 180457.376927: 2376 cycles:uppp: ffffffff9aa00163 [unknown] ([unknown]) cat 3147 180457.376954: 2307 cycles:uppp: 7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so) cat 3147 180457.377116: 3091 cycles:uppp: 7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so) cat 3147 180457.377362: 2945 cycles:uppp: 558b70f3a3b0 [unknown] (/usr/bin/cat) cat 3147 180457.377517: 2727 cycles:uppp: 558b70f3a9aa [unknown] (/usr/bin/cat) $ Install 'coreutils-debuginfo' to see cat's guts (symbols), but then, the above chunk translates into this 'perf report' output: $ cat /tmp/report.before.3 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 13 of event 'cycles:uppp' (time slices: 180457.375844,180457.377717) # Event count (approx.): 33552 # # Overhead Command Shared Object Symbol # ........ ....... ................ ...................... # 17.69% cat libc-2.28.so [.] malloc 14.53% cat cat [.] 0x000000000000586e 13.33% cat libc-2.28.so [.] _int_malloc 8.78% cat cat [.] 0x00000000000023b0 8.71% cat cat [.] 0x0000000000002554 8.13% cat cat [.] 0x00000000000029aa 8.10% cat cat [.] 0x00000000000030ca 7.28% cat libc-2.28.so [.] read 7.08% cat [unknown] [k] 0xffffffff9aa00163 6.39% cat libc-2.28.so [.] cfree@GLIBC_2.2.5 # # (Tip: Order by the overhead of source file name and line number: perf report -s srcline) # $ Now lets see after applying this patch, nothing should change: $ perf report --time 10%/1,10%/2 > /tmp/report.after.1 $ perf script --time 10%/1,10%/2 > /tmp/script.after.1 $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2 $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2 $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3 $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3 $ diff -u /tmp/report.before.1 /tmp/report.after.1 $ diff -u /tmp/script.before.1 /tmp/script.after.1 $ diff -u /tmp/report.before.2 /tmp/report.after.2 --- /tmp/report.before.2 2019-03-01 11:01:53.526094883 -0300 +++ /tmp/report.after.2 2019-03-01 11:09:18.231770467 -0300 @@ -352,5 +352,5 @@ # -# (Tip: Generate a script for your data: perf script -g <lang>) +# (Tip: Treat branches as callchains: perf report --branch-history) # $ diff -u /tmp/script.before.2 /tmp/script.after.2 $ diff -u /tmp/report.before.3 /tmp/report.after.3 --- /tmp/report.before.3 2019-03-01 11:03:08.890045588 -0300 +++ /tmp/report.after.3 2019-03-01 11:09:40.660224002 -0300 @@ -22,5 +22,5 @@ # -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline) +# (Tip: List events using substring match: perf list <keyword>) # $ diff -u /tmp/script.before.3 /tmp/script.after.3 $ Cool, just the 'perf report' tips changed, QED. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01Merge tag 'tee-fix-for-v5.0' of ↵Arnd Bergmann
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes OP-TEE driver - add missing of_node_put after of_device_is_available * tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: optee: add missing of_node_put after of_device_is_available