summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-08-08cls_api: add flow_indr_block_call functionwenxu
This patch make indr_block_call don't access struct tc_indr_block_cb and tc_indr_block_dev directly Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08cls_api: remove the tcf_block cachewenxu
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem support. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08cls_api: modify the tc_indr_block_ing_cmd parameters.wenxu
This patch make tc_indr_block_ing_cmd can't access struct tc_indr_block_dev and tc_indr_block_cb. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08Merge branch 'net-batched-receive-in-GRO-path'David S. Miller
Edward Cree says: ==================== net: batched receive in GRO path This series listifies part of GRO processing, in a manner which allows those packets which are not GROed (i.e. for which dev_gro_receive returns GRO_NORMAL) to be passed on to the listified regular receive path. dev_gro_receive() itself is not listified, nor the per-protocol GRO callback, since GRO's need to hold packets on lists under napi->gro_hash makes keeping the packets on other lists awkward, and since the GRO control block state of held skbs can refer only to one 'new' skb at a time. Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb onto a list in the napi struct, which is received at the end of the napi poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch. Performance figures with this series, collected on a back-to-back pair of Solarflare sfn8522-r2 NICs with 120-second NetPerf tests. In the stats, sample size n for old and new code is 6 runs each; p is from a Welch t-test. Tests were run both with GRO enabled and disabled, the latter simulating uncoalesceable packets (e.g. due to IP or TCP options). The receive side (which was the device under test) had the NetPerf process pinned to one CPU, and the device interrupts pinned to a second CPU. CPU utilisation figures (used in cases of line-rate performance) are summed across all CPUs. net.core.gro_normal_batch was left at its default value of 8. TCP 4 streams, GRO on: all results line rate (9.415Gbps) net-next: 210.3% cpu after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next) after #3: 196.7% cpu (- 8.4%, p=0.136 vs net-next) TCP 4 streams, GRO off: net-next: 8.017 Gbps after #1: 7.785 Gbps (- 2.9%, p=0.385 vs net-next) after #3: 7.604 Gbps (- 5.1%, p=0.282 vs net-next. But note *) TCP 1 stream, GRO off: net-next: 6.553 Gbps after #1: 6.444 Gbps (- 1.7%, p=0.302 vs net-next) after #3: 6.790 Gbps (+ 3.6%, p=0.169 vs net-next) TCP 1 stream, GRO on, busy_read = 50: all results line rate net-next: 156.0% cpu after #1: 174.5% cpu (+11.9%, p=0.015 vs net-next) after #3: 165.0% cpu (+ 5.8%, p=0.147 vs net-next) TCP 1 stream, GRO off, busy_read = 50: net-next: 6.488 Gbps after #1: 6.625 Gbps (+ 2.1%, p=0.059 vs net-next) after #3: 7.351 Gbps (+13.3%, p=0.026 vs net-next) TCP_RR 100 streams, GRO off, 8000 byte payload net-next: 995.083 us after #1: 969.167 us (- 2.6%, p=0.204 vs net-next) after #3: 976.433 us (- 1.9%, p=0.254 vs net-next) TCP_RR 100 streams, GRO off, 8000 byte payload, busy_read = 50: net-next: 2.851 ms after #1: 2.871 ms (+ 0.7%, p=0.134 vs net-next) after #3: 2.937 ms (+ 3.0%, p<0.001 vs net-next) TCP_RR 100 streams, GRO off, 1 byte payload, busy_read = 50: net-next: 867.317 us after #1: 865.717 us (- 0.2%, p=0.334 vs net-next) after #3: 868.517 us (+ 0.1%, p=0.414 vs net-next) (*) These tests produced a mixture of line-rate and below-line-rate results, meaning that statistically speaking the results were 'censored' by the upper bound, and were thus not normally distributed, making a Welch t-test mathematically invalid. I therefore also calculated estimators according to [1], which gave the following: net-next: 8.133 Gbps after #1: 8.130 Gbps (- 0.0%, p=0.499 vs net-next) after #3: 7.680 Gbps (- 5.6%, p=0.285 vs net-next) (though my procedure for determining ν wasn't mathematically well-founded either, so take that p-value with a grain of salt). A further check came from dividing the bandwidth figure by the CPU usage for each test run, giving: net-next: 3.461 after #1: 3.198 (- 7.6%, p=0.145 vs net-next) after #3: 3.641 (+ 5.2%, p=0.280 vs net-next) The above results are fairly mixed, and in most cases not statistically significant. But I think we can roughly conclude that the series marginally improves non-GROable throughput, without hurting latency (except in the large-payload busy-polling case, which in any case yields horrid performance even on net-next (almost triple the latency without busy-poll). Also, drivers which, unlike sfc, pass UDP traffic to GRO would expect to see a benefit from gaining access to batching. Changed in v3: * gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one * removed RFC tags (no comments after a week means no-one objects, right?) Changed in v2: * During busy poll, call gro_normal_list() to receive batched packets after each cycle of the napi busy loop. See comments in Patch #3 for complications of doing the same in busy_poll_stop(). [1]: Cohen 1959, doi: 10.1080/00401706.1959.10489859 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08net: use listified RX for handling GRO_NORMAL skbsEdward Cree
When GRO decides not to coalesce a packet, in napi_frags_finish(), instead of passing it to the stack immediately, place it on a list in the napi struct. Then, at flush time (napi_complete_done(), napi_poll(), or napi_busy_loop()), call netif_receive_skb_list_internal() on the list. We'd like to do that in napi_gro_flush(), but it's not called if !napi->gro_bitmask, so we have to do it in the callers instead. (There are a handful of drivers that call napi_gro_flush() themselves, but it's not clear why, or whether this will affect them.) Because a full 64 packets is an inefficiently large batch, also consume the list whenever it exceeds gro_normal_batch, a new net/core sysctl that defaults to 8. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08sfc: falcon: don't score irq moderation points for GROEdward Cree
Same rationale as for sfc, except that this wasn't performance-tested. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08sfc: don't score irq moderation points for GROEdward Cree
We already scored points when handling the RX event, no-one else does this, and looking at the history it appears this was originally meant to only score on merges, not on GRO_NORMAL. Moreover, it gets in the way of changing GRO to not immediately pass GRO_NORMAL skbs to the stack. Performance testing with four TCP streams received on a single CPU (where throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction in RX CPU usage (n=6, p=0.03). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08qed: Add new ethtool supported port types based on media.Rahul Verma
Supported ports in ethtool <eth1> are displayed based on media type. For media type fibre and twinaxial, port type is "FIBRE". Media type Base-T is "TP" and media KR is "Backplane". V1->V2: Corrected the subject. Signed-off-by: Rahul Verma <rahulv@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08cxgb4: smt: Use normal int for refcountChuhong Yuan
All refcount operations are protected by spinlocks now. Then the atomic counter can be replaced by a normal int. This patch depends on PATCH 1/2. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08cxgb4: smt: Add lock for atomic_dec_and_testChuhong Yuan
The atomic_dec_and_test() is not safe because it is outside of locks. Move the locks of t4_smte_free() to its caller, cxgb4_smt_release() to protect the atomic decrement. Fixes: 3bdb376e6944 ("cxgb4: introduce SMT ops to prepare for SMAC rewrite support") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08selftests: Add l2tp testsDavid Ahern
Add IPv4 and IPv6 l2tp tests. Current set is over IP and with IPsec. v2 - add l2tp.sh to TEST_PROGS in Makefile Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08net: delete "register" keywordAlexey Dobriyan
Delete long obsoleted "register" keyword. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08mkiss: Use refcount_t for refcountChuhong Yuan
refcount_t is better for reference counters since its implementation can prevent overflows. So convert atomic_t ref counters to refcount_t. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08dpaa_eth: Use refcount_t for refcountChuhong Yuan
refcount_t is better for reference counters since its implementation can prevent overflows. So convert atomic_t ref counters to refcount_t. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08Merge tag 'batadv-next-for-davem-20190808' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Replace usage of strlcpy with strscpy, by Sven Eckelmann - Add OGMv2 per-interface queue and aggregations, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-07tools/bpf: fix core_reloc.c compilation errorYonghong Song
On my local machine, I have the following compilation errors: ===== In file included from prog_tests/core_reloc.c:3:0: ./progs/core_reloc_types.h:517:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘fancy_char_ptr_t’ typedef const char * const volatile restrict fancy_char_ptr_t; ^ ./progs/core_reloc_types.h:527:2: error: unknown type name ‘fancy_char_ptr_t’ fancy_char_ptr_t d; ^ ===== I am using gcc 4.8.5. Later compilers may change their behavior not emitting the error. Nevertheless, let us fix the issue. "restrict" can be tested without typedef. Fixes: 9654e2ae908e ("selftests/bpf: add CO-RE relocs modifiers/typedef tests") Cc: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07Merge branch 'compile-once-run-everywhere'Alexei Starovoitov
Andrii Nakryiko says: ==================== This patch set implements central part of CO-RE (Compile Once - Run Everywhere, see [0] and [1] for slides and video): relocating fields offsets. Most of the details are written down as comments to corresponding parts of the code. Patch #1 adds a bunch of commonly useful btf_xxx helpers to simplify working with BTF types. Patch #2 converts existing libbpf code to these new helpers and removes some of pre-existing ones. Patch #3 adds loading of .BTF.ext offset relocations section and macros to work with its contents. Patch #4 implements CO-RE relocations algorithm in libbpf. Patch #5 introduced BPF_CORE_READ macro, hiding usage of Clang's __builtin_preserve_access_index intrinsic that records offset relocation. Patches #6-#14 adds selftests validating various parts of relocation handling, type compatibility, etc. For all tests to work, you'll need latest Clang/LLVM supporting __builtin_preserve_access_index intrinsic, used for recording offset relocations. Kernel on which selftests run should have BTF information built in (CONFIG_DEBUG_INFO_BTF=y). [0] http://vger.kernel.org/bpfconf2019.html#session-2 [1] http://vger.kernel.org/lpc-bpf2018.html#session-2 v5->v6: - fix bad comment formatting for real (Alexei); v4->v5: - drop constness for btf_xxx() helpers, allowing to avoid type casts (Alexei); - rebase on latest bpf-next, change test__printf back to printf; v3->v4: - added btf_xxx helpers (Alexei); - switched libbpf code to new helpers; - reduced amount of logging and simplified format in few places (Alexei); - made flavor name parsing logic more strict (exactly three underscores); - no uname() error checking (Alexei); - updated misc tests to reflect latest Clang fixes (Yonghong); v2->v3: - enclose BPF_CORE_READ args in parens (Song); v1->v2: - add offsetofend(), fix btf_ext optional fields checks (Song); - add bpf_core_dump_spec() for logging spec representation; - move special first element processing out of the loop (Song); - typo fixes (Song); - drop BPF_ST | BPF_MEM insn relocation (Alexei); - extracted BPF_CORE_READ into bpf_helpers (Alexei); - added extra tests validating Clang capturing relocs correctly (Yonghong); - switch core_relocs.c to use sub-tests; - updated mods tests after Clang bug was fixed (Yonghong); - fix bug enumerating candidate types; ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs misc testsAndrii Nakryiko
Add tests validating few edge-cases of capturing offset relocations. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs ints testsAndrii Nakryiko
Add various tests validating handling compatible/incompatible integer types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs ptr-as-array testsAndrii Nakryiko
Add test validating correct relocation handling for cases where pointer to something is used as an array. E.g.: int *ptr = ...; int x = ptr[42]; Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs modifiers/typedef testsAndrii Nakryiko
Add tests validating correct handling of various combinations of typedefs and const/volatile/restrict modifiers. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs enum/ptr/func_proto testsAndrii Nakryiko
Test CO-RE relocation handling of ints, enums, pointers, func protos, etc. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs array testsAndrii Nakryiko
Add tests for various array handling/relocation scenarios. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs nesting testsAndrii Nakryiko
Add a bunch of test validating correct handling of nested structs/unions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs struct flavors testsAndrii Nakryiko
Add tests verifying that BPF program can use various struct/union "flavors" to extract data from the same target struct/union. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add CO-RE relocs testing setupAndrii Nakryiko
Add CO-RE relocation test runner. Add one simple test validating that libbpf's logic for searching for kernel image and loading BTF out of it works. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: add BPF_CORE_READ relocatable read macroAndrii Nakryiko
Add BPF_CORE_READ macro used in tests to do bpf_core_read(), which automatically captures offset relocation. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07libbpf: implement BPF CO-RE offset relocation algorithmAndrii Nakryiko
This patch implements the core logic for BPF CO-RE offsets relocations. Every instruction that needs to be relocated has corresponding bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying to match recorded "local" relocation spec against potentially many compatible "target" types, creating corresponding spec. Details of the algorithm are noted in corresponding comments in the code. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07libbpf: add .BTF.ext offset relocation section loadingAndrii Nakryiko
Add support for BPF CO-RE offset relocations. Add section/record iteration macros for .BTF.ext. These macro are useful for iterating over each .BTF.ext record, either for dumping out contents or later for BPF CO-RE relocation handling. To enable other parts of libbpf to work with .BTF.ext contents, moved a bunch of type definitions into libbpf_internal.h. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07libbpf: convert libbpf code to use new btf helpersAndrii Nakryiko
Simplify code by relying on newly added BTF helper functions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07libbpf: add helpers for working with BTF typesAndrii Nakryiko
Add lots of frequently used helpers that simplify working with BTF types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge branch 'test_progs-stdio'Alexei Starovoitov
Stanislav Fomichev says: ==================== I was looking into converting test_sockops* to test_progs framework and that requires using cgroup_helpers.c which rely on stdio/stderr. Let's use open_memstream to override stdout into buffer during subtests instead of custom test_{v,}printf wrappers. That lets us continue to use stdio in the subtests and dump it on failure if required. That would also fix bpf_find_map which currently uses printf to signal failure (missed during test_printf conversion). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06selftests/bpf: test_progs: drop extra trailing tabStanislav Fomichev
Small (un)related cleanup. Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06selftests/bpf: test_progs: test__printf -> printfStanislav Fomichev
Now that test__printf is a simple wraper around printf, let's drop it (and test__vprintf as well). Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06selftests/bpf: test_progs: switch to open_memstreamStanislav Fomichev
Use open_memstream to override stdout during test execution. The copy of the original stdout is held in env.stdout and used to print subtest info and dump failed log. test_{v,}printf are now simple wrappers around stdout and will be removed in the next patch. v5: * fix -v crash by always setting env.std{in,err} (Alexei Starovoitov) * drop force_log check from stdio_hijack (Andrii Nakryiko) v4: * one field per line for stdout/stderr (Andrii Nakryiko) v3: * don't do strlen over log_buf, log_cnt has it already (Andrii Nakryiko) v2: * add ifdef __GLIBC__ around open_memstream (maybe pointless since we already depend on glibc for argp_parse) * hijack stderr as well (Andrii Nakryiko) * don't hijack for every test, do it once (Andrii Nakryiko) * log_cap -> log_size (Andrii Nakryiko) * do fseeko in a proper place (Andrii Nakryiko) * check open_memstream returned value (Andrii Nakryiko) Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Yeah I should have sent a pull request last week, so there is a lot more here than usual: 1) Fix memory leak in ebtables compat code, from Wenwen Wang. 2) Several kTLS bug fixes from Jakub Kicinski (circular close on disconnect etc.) 3) Force slave speed check on link state recovery in bonding 802.3ad mode, from Thomas Falcon. 4) Clear RX descriptor bits before assigning buffers to them in stmmac, from Jose Abreu. 5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF loops, from Nishka Dasgupta. 6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean. 7) Need to hold sock across skb->destructor invocation, from Cong Wang. 8) IP header length needs to be validated in ipip tunnel xmit, from Haishuang Yan. 9) Use after free in ip6 tunnel driver, also from Haishuang Yan. 10) Do not use MSI interrupts on r8169 chips before RTL8168d, from Heiner Kallweit. 11) Upon bridge device init failure, we need to delete the local fdb. From Nikolay Aleksandrov. 12) Handle erros from of_get_mac_address() properly in stmmac, from Martin Blumenstingl. 13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef Kadlecsik. 14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with some devices, so revert. From Johannes Berg. 15) Fix deadlock in rxrpc, from David Howells. 16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue Haibing. 17) Fix mvpp2 crash on module removal, from Matteo Croce. 18) Fix race in genphy_update_link, from Heiner Kallweit. 19) bpf_xdp_adjust_head() stopped working with generic XDP when we fixes generic XDP to support stacked devices properly, fix from Jesper Dangaard Brouer. 20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from David Ahern. 21) Several memory leaks in new sja1105 driver, from Vladimir Oltean" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits) net: dsa: sja1105: Fix memory leak on meta state machine error path net: dsa: sja1105: Fix memory leak on meta state machine normal path net: dsa: sja1105: Really fix panic on unregistering PTP clock net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well net: dsa: sja1105: Fix broken learning with vlan_filtering disabled net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus() net: sched: sample: allow accessing psample_group with rtnl net: sched: police: allow accessing police->params with rtnl net: hisilicon: Fix dma_map_single failed on arm64 net: hisilicon: fix hip04-xmit never return TX_BUSY net: hisilicon: make hip04_tx_reclaim non-reentrant tc-testing: updated vlan action tests with batch create/delete net sched: update vlan action for batched events operations net: stmmac: tc: Do not return a fragment entry net: stmmac: Fix issues when number of Queues >= 4 net: stmmac: xgmac: Fix XGMAC selftests be2net: disable bh with spin_lock in be_process_mcc net: cxgb3_main: Fix a resource leak in a error path in 'init_one()' net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER ...
2019-08-06Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2019-08-05 This series contains updates to i40e driver only. Dmitrii adds missing statistic counters for VEB and VEB TC's. Slawomir adds support for logging the "Disable Firmware LLDP" flag option and its current status. Jake fixes an issue where VF's being notified of their link status before their queues are enabled which was causing issues. So always report link status down when the VF queues are not enabled. Also adds future proofing when statistics are added or removed by adding checks to ensure the data pointer for the strings lines up with the expected statistics count. Czeslaw fixes the advertised mode reported in ethtool for FEC, where the "None BaseR RS" was always being displayed no matter what the mode it was in. Also added logging information when the PF is entering or leaving "allmulti" (or promiscuous) mode. Fixed up the logging logic for VF's when leaving multicast mode to not include unicast as well. v2: drop Aleksandr's patch (previously patch #2 in the series) to display the VF MAC address that is set by the VF while community feedback is addressed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06openvswitch: Print error when ovs_execute_actions() failsYifeng Sun
Currently in function ovs_dp_process_packet(), return values of ovs_execute_actions() are silently discarded. This patch prints out an debug message when error happens so as to provide helpful hints for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge branch 'sja1105-fixes'David S. Miller
Vladimir Oltean says: ==================== Fixes for SJA1105 DSA: FDBs, Learning and PTP This is an assortment of functional fixes for the sja1105 switch driver targeted for the "net" tree (although they apply on net-next just as well). Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering disabled") repairs a breakage introduced in the early development stages of the driver: support for traffic from the CPU has broken "normal" frame forwarding (based on DMAC) - there is connectivity through the switch only because all frames are flooded. I debated whether this patch qualifies as a fix, since it puts the switch into a mode it has never operated in before (aka SVL). But "normal" forwarding did use to work before the "Traffic support for SJA1105 DSA driver" patchset, and arguably this patch should have been part of that. Also, it would be strange for this feature to be broken in the 5.2 LTS. Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well") is a simplification of a previous FDB-related patch that is currently in the 5.3 rc's. Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the switch ports for extended periods of time, or in conjunction with other error conditions. The fixed-up commits were all introduced in 5.2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: sja1105: Fix memory leak on meta state machine error pathVladimir Oltean
When RX timestamping is enabled and two link-local (non-meta) frames are received in a row, this constitutes an error. The tagger is always caching the last link-local frame, in an attempt to merge it with the meta follow-up frame when that arrives. To recover from the above error condition, the initial cached link-local frame is dropped and the second frame in a row is cached (in expectance of the second meta frame). However, when dropping the initial link-local frame, its backing memory was being leaked. Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: sja1105: Fix memory leak on meta state machine normal pathVladimir Oltean
After a meta frame is received, it is associated with the cached sp->data->stampable_skb from the DSA tagger private structure. Cached means its refcount is incremented with skb_get() in order for dsa_switch_rcv() to not free it when the tagger .rcv returns NULL. The mistake is that skb_unref() is not the correct function to use. It will correctly decrement the refcount (which will go back to zero) but the skb memory will not be freed. That is the job of kfree_skb(), which also calls skb_unref(). But it turns out that freeing the cached stampable_skb is in fact not necessary. It is still a perfectly valid skb, and now it is even annotated with the partial RX timestamp. So remove the skb_copy() altogether and simply pass the stampable_skb with a refcount of 1 (incremented by us, decremented by dsa_switch_rcv) up the stack. Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: sja1105: Really fix panic on unregistering PTP clockVladimir Oltean
The IS_ERR_OR_NULL(priv->clock) check inside sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync from actually being run. Additionally, sja1105_ptp_clock_unregister() does not actually get run, when placed in sja1105_remove(). The DSA switch gets torn down, but the sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister needs to be moved to sja1105_teardown, to be symmetrical with sja1105_ptp_clock_register which is called from the DSA sja1105_setup. It is strange to fix a "fixes" patch, but the probe failure can only be seen when the attached PHY does not respond to MDIO (issue which I can't pinpoint the reason to) and it goes away after I power-cycle the board. This time the patch was validated on a failing board, and the kernel panic from the fixed commit's message can no longer be seen. Fixes: 29dd908d355f ("net: dsa: sja1105: Cancel PTP delayed work on unregister") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as wellVladimir Oltean
It looks like the FDB dump taken from first-generation switches also contains information on whether entries are static or not. So use that instead of searching through the driver's tables. Fixes: d763778224ea ("net: dsa: sja1105: Implement is_static for FDB entries on E/T") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: sja1105: Fix broken learning with vlan_filtering disabledVladimir Oltean
When put under a bridge with vlan_filtering 0, the SJA1105 ports will flood all traffic as if learning was broken. This is because learning interferes with the rx_vid's configured by dsa_8021q as unique pvid's. So learning technically still *does* work, it's just that the learnt entries never get matched due to their unique VLAN ID. The setting that saves the day is Shared VLAN Learning, which on this switch family works exactly as desired: VLAN tagging still works (untagged traffic gets the correct pvid) and FDB entries are still populated with the correct contents including VID. Also, a frame cannot violate the forwarding domain restrictions enforced by its classified VLAN. It is just that the VID is ignored when looking up the FDB for taking a forwarding decision (selecting the egress port). This patch activates SVL, and the result is that frames with a learnt DMAC are no longer flooded in the scenario described above. Now exactly *because* SVL works as desired, we have to revisit some earlier patches: - It is no longer necessary to manipulate the VID of the 'bridge fdb {add,del}' command when vlan_filtering is off. This is because now, SVL is enabled for that case, so the actual VID does not matter*. - It is still desirable to hide dsa_8021q VID's in the FDB dump callback. But right now the dump callback should no longer hide duplicates (one per each front panel port's pvid, plus one for the VLAN that the CPU port is going to tag a TX frame with), because there shouldn't be any (the switch will match a single FDB entry no matter its VID anyway). * Not really... It's no longer necessary to transform a 'bridge fdb add' into 5 fdb add operations, but the user might still add a fdb entry with any vid, and all of them would appear as duplicates in 'bridge fdb show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we can prune the duplicates at insertion time. ** The VID of 0 is better than 1 because it is always guaranteed to be in the ports' hardware filter. DSA also avoids putting the VID inside the netlink response message towards the bridge driver when we return this particular VID, which makes it suitable for FDB entries learnt with vlan_filtering off. Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()Nishka Dasgupta
Each iteration of for_each_available_child_of_node() puts the previous node, but in the case of a return from the middle of the loop, there is no put, thus causing a memory leak. Hence add an of_node_put() before the return. Additionally, the local variable ports in the function qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(), which gets a node but does not put it. If the function returns without putting ports, it may cause a memory leak. Hence put ports before the mid-loop return statement, and also outside the loop after its last usage in this function. Issues found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge branch 'Support-tunnels-over-VLAN-in-NFP'David S. Miller
John Hurley says: ==================== Support tunnels over VLAN in NFP This patchset deals with tunnel encap and decap when the end-point IP address is on an internal port (for example and OvS VLAN port). Tunnel encap without VLAN is already supported in the NFP driver. This patchset extends that to include a push VLAN along with tunnel header push. Patches 1-4 extend the flow_offload IR API to include actions that use skbedit to set the ptype of an SKB and that send a packet to port ingress from the act_mirred module. Such actions are used in flower rules that forward tunnel packets to internal ports where they can be decapsulated. OvS and its TC API is an example of a user-space app that produces such rules. Patch 5 modifies the encap offload code to allow the pushing of a VLAN header after a tunnel header push. Patches 6-10 deal with tunnel decap when the end-point is on an internal port. They detect 'pre-tunnel rules' which do not deal with tunnels themselves but, rather, forward packets to internal ports where they can be decapped if required. Such rules are offloaded to a table in HW along with an indication of whether packets need to be passed to this table of not (based on their destination MAC address). Matching against this table prior to decapsulation in HW allows the correct parsing and handling of outer VLANs on tunnelled packets and the correct updating of stats for said 'pre-tunnel' rules. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06nfp: flower: encode mac indexes with pre-tunnel rule checkJohn Hurley
When a tunnel packet arrives on the NFP card, its destination MAC is looked up and MAC index returned for it. This index can help verify the tunnel by, for example, ensuring that the packet arrived on the expected port. If the packet is destined for a known MAC that is not connected to a given physical port then the mac index can have a global value (e.g. when a series of bonded ports shared the same MAC). If the packet is to be detunneled at a bridge device or internal port like an Open vSwitch VLAN port, then it should first match a 'pre-tunnel' rule to direct it to that internal port. Use the MAC index to indicate if a packet should match a pre-tunnel rule before decap is allowed. Do this by tracking the number of internal ports associated with a MAC address and, if the number if >0, set a bit in the mac_index to forward the packet to the pre-tunnel table before continuing with decap. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06nfp: flower: remove offloaded MACs when reprs are applied to OvS bridgesJohn Hurley
MAC addresses along with an identifying index are offloaded to firmware to allow tunnel decapsulation. If a tunnel packet arrives with a matching destination MAC address and a verified index, it can continue on the decapsulation process. This replicates the MAC verifications carried out in the kernel network stack. When a netdev is added to a bridge (e.g. OvS) then packets arriving on that dev are directed through the bridge datapath instead of passing through the network stack. Therefore, tunnelled packets matching the MAC of that dev will not be decapped here. Replicate this behaviour on firmware by removing offloaded MAC addresses when a MAC representer is added to an OvS bridge. This can prevent any false positive tunnel decaps. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06nfp: flower: offload pre-tunnel rulesJohn Hurley
Pre-tunnel rules are TC flower and OvS rules that forward a packet to the tunnel end point where it can then pass through the network stack and be decapsulated. These are required if the tunnel end point is, say, an OvS internal port. Currently, firmware determines that a packet is in a tunnel and decaps it if it has a known destination IP and MAC address. However, this bypasses the flower pre-tunnel rule and so does not update the stats. Further to this it ignores VLANs that may exist outside of the tunnel header. Offload pre-tunnel rules to the NFP. This embeds the pre-tunnel rule into the tunnel decap process based on (firmware) mac index and VLAN. This means that decap can be carried out correctly with VLANs and that stats can be updated for all kernel rules correctly. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>