git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-10-12	netfilter: add nf_static_key_{inc,dec}	Pablo Neira Ayuso
	Add helper functions increment and decrement the hook static keys. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12	ipvs: inspect reply packets from DR/TUN real servers	longguang.yue
	Just like for MASQ, inspect the reply packets coming from DR/TUN real servers and alter the connection's state and timeout according to the protocol. It's ipvs's duty to do traffic statistic if packets get hit, no matter what mode it is. Signed-off-by: longguang.yue <bigclouds@163.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-11	bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org.	Alexei Starovoitov
	Move the bpf/bpf-next patch processing queue to patchwork.kernel.org. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201011200149.66537-1-alexei.starovoitov@gmail.com
2020-10-11	bpf: Always return target ifindex in bpf_fib_lookup	Toke Høiland-Jørgensen
	The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com
2020-10-11	Merge branch 'samples: bpf: Refactor XDP programs with libbpf'	Alexei Starovoitov
	"Daniel T. Lee" says: ==================== To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to convert the previous bpf_load loader with the libbpf loader. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. And due to addition of generic bpf_program__attach() to libbpf, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This patchset refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Also, attach_tracepoint() is replaced with the generic __attach() method in xdp_redirect_cpu. Moreover, maps in kern program have been converted to BTF-defined map. --- Changes in v2: - added cleanup logic for bpf_link and bpf_object in xdp_monitor - program section match with bpf_program__is_<type> instead of strncmp - revert BTF key/val type to default of BPF_MAP_TYPE_PERF_EVENT_ARRAY - split increment into seperate satement - refactor pointer array initialization - error code cleanup ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11	samples: bpf: Refactor XDP kern program maps with BTF-defined map	Daniel T. Lee
	Most of the samples were converted to use the new BTF-defined MAP as they moved to libbpf, but some of the samples were missing. Instead of using the previous BPF MAP definition, this commit refactors xdp_monitor and xdp_sample_pkts_kern MAP definition with the new BTF-defined MAP format. Also, this commit removes the max_entries attribute at PERF_EVENT_ARRAY map type. The libbpf's bpf_object__create_map() will automatically set max_entries to the maximum configured number of CPUs on the host. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-4-danieltimlee@gmail.com
2020-10-11	samples: bpf: Replace attach_tracepoint() to attach() in xdp_redirect_cpu	Daniel T. Lee
	>From commit d7a18ea7e8b6 ("libbpf: Add generic bpf_program__attach()"), for some BPF programs, it is now possible to attach BPF programs with __attach() instead of explicitly calling __attach_<type>(). This commit refactors the __attach_tracepoint() with libbpf's generic __attach() method. In addition, this refactors the logic of setting the map FD to simplify the code. Also, the missing removal of bpf_load.o in Makefile has been fixed. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-3-danieltimlee@gmail.com
2020-10-11	samples: bpf: Refactor xdp_monitor with libbpf	Daniel T. Lee
	To avoid confusion caused by the increasing fragmentation of the BPF Loader program, this commit would like to change to the libbpf loader instead of using the bpf_load. Thanks to libbpf's bpf_link interface, managing the tracepoint BPF program is much easier. bpf_program__attach_tracepoint manages the enable of tracepoint event and attach of BPF programs to it with a single interface bpf_link, so there is no need to manage event_fd and prog_fd separately. This commit refactors xdp_monitor with using this libbpf API, and the bpf_load is removed and migrated to libbpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010181734.1109-2-danieltimlee@gmail.com
2020-10-11	Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'	Jakub Kicinski
	Vladimir Oltean says: ==================== Offload tc-vlan mangle to mscc_ocelot switch This series offloads one more action to the VCAP IS1 ingress TCAM, which is to change the classified VLAN for packets, according to the VCAP IS1 keys (VLAN, source MAC, source IP, EtherType, etc). ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	selftests: net: mscc: ocelot: add test for VLAN modify action	Vladimir Oltean
	Create a test that changes a VLAN ID from 200 to 300. We also need to modify the preferences of the filters installed for the other rules so that they are unique, because we now install the "tc-vlan modify" filter in VCAP IS1 only temporarily, and we need to perform the deletion by filter preference number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	net: dsa: tag_ocelot: use VLAN information from tagging header when available	Vladimir Oltean
	When the Extraction Frame Header contains a valid classified VLAN, use that instead of the VLAN header present in the packet. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	net: mscc: ocelot: offload VLAN mangle action to VCAP IS1	Vladimir Oltean
	The VCAP_IS1_ACT_VID_REPLACE_ENA action, from the VCAP IS1 ingress TCAM, changes the classified VLAN. We are only exposing this ability for switch ports that are under VLAN aware bridges. This is because in standalone ports mode and under a bridge with vlan_filtering=0, the ocelot driver configures the switch to operate as VLAN-unaware, so the classified VLAN is not derived from the 802.1Q header from the packet, but instead is always equal to the port-based VLAN ID of the ingress port. We _can_ still change the classified VLAN for packets when operating in this mode, but the end result will most likely be a drop, since both the ingress and the egress port need to be members of the modified VLAN. And even if we install the new classified VLAN into the VLAN table of the switch, the result would still not be as expected: we wouldn't see, on the output port, the modified VLAN tag, but the original one, even though the classified VLAN was indeed modified. This is because of how the hardware works: on egress, what is pushed to the frame is a "port tag", which gives us the following options: - Tag all frames with port tag (derived from the classified VLAN) - Tag all frames with port tag, except if the classified VLAN is 0 or equal to the native VLAN of the egress port - No port tag Needless to say, in VLAN-unaware mode we are disabling the port tag. Otherwise, the existing VLAN tag would be ignored, and a second VLAN tag (the port tag), holding the classified VLAN, would be pushed (instead of replacing the existing 802.1Q tag). This is definitely not what the user wanted when installing a "vlan modify" action. So it is simply not worth bothering with VLAN modify rules under other configurations except when the ports are fully VLAN-aware. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	Merge branch 'enetc-Migrate-to-PHYLINK-and-PCS_LYNX'	Jakub Kicinski
	Claudiu Manoil says: ==================== enetc: Migrate to PHYLINK and PCS_LYNX Transitioning the enetc driver from phylib to phylink. Offloading the serdes configuration to the PCS_LYNX module is a mandatory part of this transition. Aiming for a cleaner, more maintainable design, and better code reuse. The first 2 patches are clean up prerequisites. Tested on a p1028rdb board. v2: validate() explicitly rejects now all interface modes not supported by the driver instead of relying on the device tree to provide only supported interfaces, and dropped redundant activation of pcs_poll (addressing Ioana's findings) ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	enetc: Migrate to PHYLINK and PCS_LYNX	Claudiu Manoil
	This is a methodical transition of the driver from phylib to phylink, following the guidelines from sfp-phylink.rst. The MAC register configurations based on interface mode were moved from the probing path to the mac_config() hook. MAC enable and disable commands (enabling Rx and Tx paths at MAC level) were also extracted and assigned to their corresponding phylink hooks. As part of the migration to phylink, the serdes configuration from the driver was offloaded to the PCS_LYNX module, introduced in commit 0da4c3d393e4 ("net: phy: add Lynx PCS module"), the PCS_LYNX module being a mandatory component required to make the enetc driver work with phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	arm64: dts: fsl-ls1028a-rdb: Specify in-band mode for ENETC port 0	Claudiu Manoil
	As part of the transition of the enetc ethernet driver from phylib to phylink, the in-band operation mode of the SGMII interface from enetc port 0 needs to be specified explicitly for phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	enetc: Clean up serdes configuration	Claudiu Manoil
	Decouple internal mdio bus creation from serdes configuration, as a prerequisite to offloading serdes configuration to a different module. Group together mdio bus creation routines, cleanup. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	enetc: Clean up MAC and link configuration	Claudiu Manoil
	Decouple level MAC configuration based on phy interface type from general port configuration. Group together MAC and link configuration code. Decouple external mdio bus creation from interface type parsing. No longer return an (unhandled) error code when phy_node not found, use phy_node to indicate whether the port has a phy or not. No longer fall-through when serdes configuration fails for the link modes that require internal link configuration. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11	Merge branch 'Follow-up BPF helper improvements'	Alexei Starovoitov
	Daniel Borkmann says: ==================== This series addresses most of the feedback [0] that was to be followed up from the last series, that is, UAPI helper comment improvements and getting rid of the ifindex obj file hacks in the selftest by using a BPF map instead. The __sk_buff data/data_end pointer work, I'm planning to do in a later round as well as the mem*() BPF improvements we have in Cilium for libbpf. Next, the series adds two features, i) a helper called redirect_peer() to improve latency on netns switch, and ii) to allow map in map with dynamic inner array map sizes. Selftests for each are added as well. For details, please check individual patches, thanks! [0] https://lore.kernel.org/bpf/cover.1601477936.git.daniel@iogearbox.net/ v5 -> v6: - Going with Andrii's suggestion to make the misconfigured verifier test more robust, and only probe on -EOPNOTSUPP (Andrii) v4 -> v5: - Replace cnt == -EOPNOTSUPP check with cnt < 0; I've used < 0 here as I think it's useful to keep the existing cnt == 0 \|\| cnt >= ARRAY_SIZE(insn_buf) for error detection (Andrii) v3 -> v4: - Rename new array map flag to BPF_F_INNER_MAP (Alexei) v2 -> v3: - Remove tab that slipped into uapi helper desc (Jakub) - Rework map in map for array to error from map_gen_lookup (Andrii) v1 -> v2: - Fixed selftest comment wrt inner1/inner2 value (Yonghong) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11	bpf, selftests: Add redirect_peer selftest	Daniel Borkmann
	Extend the test_tc_redirect test and add a small test that exercises the new redirect_peer() helper for the IPv4 and IPv6 case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-7-daniel@iogearbox.net
2020-10-11	bpf, selftests: Make redirect_neigh test more extensible	Daniel Borkmann
	Rename into test_tc_redirect.sh and move setup and test code into separate functions so they can be reused for newly added tests in here. Also remove the crude hack to override ifindex inside the object file via xxd and sed and just use a simple map instead. Map given iproute2 does not support BTF fully and therefore neither global data at this point. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-6-daniel@iogearbox.net
2020-10-11	bpf, selftests: Add test for different array inner map size	Daniel Borkmann
	Extend the "diff_size" subtest to also include a non-inlined array map variant where dynamic inner #elems are possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-5-daniel@iogearbox.net
2020-10-11	bpf: Allow for map-in-map with dynamic inner array map entries	Daniel Borkmann
	Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf ("bpf: Relax max_entries check for most of the inner map types") added support for dynamic inner max elements for most map-in-map types. Exceptions were maps like array or prog array where the map_gen_lookup() callback uses the maps' max_entries field as a constant when emitting instructions. We recently implemented Maglev consistent hashing into Cilium's load balancer which uses map-in-map with an outer map being hash and inner being array holding the Maglev backend table for each service. This has been designed this way in order to reduce overall memory consumption given the outer hash map allows to avoid preallocating a large, flat memory area for all services. Also, the number of service mappings is not always known a-priori. The use case for dynamic inner array map entries is to further reduce memory overhead, for example, some services might just have a small number of back ends while others could have a large number. Right now the Maglev backend table for small and large number of backends would need to have the same inner array map entries which adds a lot of unneeded overhead. Dynamic inner array map entries can be realized by avoiding the inlined code generation for their lookup. The lookup will still be efficient since it will be calling into array_map_lookup_elem() directly and thus avoiding retpoline. The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips inline code generation and relaxes array_map_meta_equal() check to ignore both maps' max_entries. This also still allows to have faster lookups for map-in-map when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed. Example code generation where inner map is dynamic sized array: # bpftool p d x i 125 int handle__sys_enter(void * ctx): ; int handle__sys_enter(void ctx) 0: (b4) w1 = 0 ; int key = 0; 1: (63) (u32 )(r10 -4) = r1 2: (bf) r2 = r10 ; 3: (07) r2 += -4 ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key); 4: (18) r1 = map[id:468] 6: (07) r1 += 272 7: (61) r0 = (u32 )(r2 +0) 8: (35) if r0 >= 0x3 goto pc+5 9: (67) r0 <<= 3 10: (0f) r0 += r1 11: (79) r0 = (u64 )(r0 +0) 12: (15) if r0 == 0x0 goto pc+1 13: (05) goto pc+1 14: (b7) r0 = 0 15: (b4) w6 = -1 ; if (!inner_map) 16: (15) if r0 == 0x0 goto pc+6 17: (bf) r2 = r10 ; 18: (07) r2 += -4 ; val = bpf_map_lookup_elem(inner_map, &key); 19: (bf) r1 = r0 \| No inlining but instead 20: (85) call array_map_lookup_elem#149280 \| call to array_map_lookup_elem() ; return val ? val : -1; \| for inner array lookup. 21: (15) if r0 == 0x0 goto pc+1 ; return val ? val : -1; 22: (61) r6 = (u32 *)(r0 +0) ; } 23: (bc) w0 = w6 24: (95) exit Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11	bpf: Add redirect_peer helper	Daniel Borkmann
	Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
2020-10-11	bpf: Improve bpf_redirect_neigh helper description	Daniel Borkmann
	Follow-up to address David's feedback that we should better describe internals of the bpf_redirect_neigh() helper. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
2020-10-10	drivers/net/wan/hdlc_fr: Move the skb_headroom check out of fr_hard_header	Xie He
	Move the skb_headroom check out of fr_hard_header and into pvc_xmit. This has two benefits: 1. Originally we only do this check for skbs sent by users on Ethernet- emulating PVC devices. After the change we do this check for skbs sent on normal PVC devices, too. (Also add a comment to make it clear that this is only a protection against upper layers that don't take dev->needed_headroom into account. Such upper layers should be rare and I believe they should be fixed.) 2. After the change we can simplify the parameter list of fr_hard_header. We no longer need to use a pointer to pointers (skb_p) because we no longer need to replace the skb inside fr_hard_header. Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	ipv4: Restore flowi4_oif update before call to xfrm_lookup_route	David Ahern
	Tobias reported regressions in IPsec tests following the patch referenced by the Fixes tag below. The root cause is dropping the reset of the flowi4_oif after the fib_lookup. Apparently it is needed for xfrm cases, so restore the oif update to ip_route_output_flow right before the call to xfrm_lookup_route. Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: dsa: rtl8366rb: Roof MTU for switch	Linus Walleij
	The MTU setting for this DSA switch is global so we need to keep track of the MTU set for each port, then as soon as any MTU changes, roof the MTU to the biggest common denominator and poke that into the switch MTU setting. To achieve this we need a per-chip-variant state container for the RTL8366RB to use for the RTL8366RB-specific stuff. Other SMI switches does seem to have per-port MTU setting capabilities. Fixes: 5f4a8ef384db ("net: dsa: rtl8366rb: Support setting MTU") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	Merge branch 'mptcp-some-fallback-fixes'	Jakub Kicinski
	Paolo Abeni says: ==================== mptcp: some fallback fixes pktdrill pointed-out we currently don't handle properly some fallback scenario for MP_JOIN subflows The first patch addresses such issue. Patch 2/2 fixes a related pre-existing issue that is more evident after 1/2: we could keep using for MPTCP signaling closed subflows. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	mptcp: subflows garbage collection	Paolo Abeni
	The msk can close MP_JOIN subflows if the initial handshake fails. Currently such subflows are kept alive in the conn_list until the msk itself is closed. Beyond the wasted memory, we could end-up sending the DATA_FIN and the DATA_FIN ack on such socket, even after a reset. Fixes: 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	mptcp: fix fallback for MP_JOIN subflows	Paolo Abeni
	Additional/MP_JOIN subflows that do not pass some initial handshake tests currently causes fallback to TCP. That is an RFC violation: we should instead reset the subflow and leave the the msk untouched. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/91 Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: phy: Move of_mdio from drivers/of to drivers/net/mdio	Calvin Johnson
	Better place for of_mdio.c is drivers/net/mdio. Move of_mdio.c from drivers/of to drivers/net/mdio Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	dpaa_eth: enable NETIF_MSG_HW by default	Maxim Kochetkov
	When packets are received on the error queue, this function under net_ratelimit(): netif_err(priv, hw, net_dev, "Err FD status = 0x%08x\n"); does not get printed. Instead we only see: [ 3658.845592] net_ratelimit: 244 callbacks suppressed [ 3663.969535] net_ratelimit: 230 callbacks suppressed [ 3669.085478] net_ratelimit: 228 callbacks suppressed Enabling NETIF_MSG_HW fixes this issue, and we can see some information about the frame descriptors of packets. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	r8169: factor out handling rtl8169_stats	Heiner Kallweit
	Factor out handling the private packet/byte counters to new functions rtl_get_priv_stats() and rtl_inc_priv_stats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: usbnet: remove driver version	Heiner Kallweit
	Obviously this driver version doesn't make sense. Go with the default and let ethtool display the kernel version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: smc: fix missing brace warning for old compilers	Pujin Shi
	For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_add_link_local': net/smc/smc_llc.c:1212:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_add_link add_llc = {0}; ^ net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') [-Wmissing-braces] net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local': net/smc/smc_llc.c:1245:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link del_llc = {0}; ^ net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') [-Wmissing-braces] 2 warnings generated Fixes: 4dadd151b265 ("net/smc: enqueue local LLC messages") Signed-off-by: Pujin Shi <shipujin.t@gmail.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: smc: fix missing brace warning for old compilers	Pujin Shi
	For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all': net/smc/smc_llc.c:1317:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link delllc = {0}; ^ net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') [-Wmissing-braces] 1 warnings generated Fixes: f3811fd7bc97 ("net/smc: send DELETE_LINK, ALL message and wait for send to complete") Signed-off-by: Pujin Shi <shipujin.t@gmail.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	net: thunderx: Use struct_size() helper in kmalloc()	Gustavo A. R. Silva
	Make use of the new struct_size() helper instead of the offsetof() idiom. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	Merge tag 'linux-can-fixes-for-5.9-20201008' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can ==================== linux-can-fixes-for-5.9-20201008 The first patch is by Lucas Stach and fixes m_can driver by removing an erroneous call to m_can_class_suspend() in runtime suspend. Which causes the pinctrl state to get stuck on the "sleep" state, which breaks all CAN functionality on SoCs where this state is defined. The last two patches target the j1939 protocol: Cong Wang fixes a syzbot finding of an uninitialized variable in the j1939 transport protocol. I contribute a patch, that fixes the initialization of a same uninitialized variable in a different function. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	Merge tag 'wireless-drivers-next-2020-10-09' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.10 Fourth and last set of patches for v5.10. Most of these are iwlwifi patches, but few small fixes to other drivers as well. Major changes: iwlwifi * PNVM support (platform-specific phy config data) * bump the FW API support to 59 ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10	Merge tag 'mac80211-next-for-net-next-2020-10-08' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A handful of changes: * fixes for the recent S1G work * a docbook build time improvement * API to pass beacon rate to lower-level driver ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	Merge branch 'netlink-export-policy-on-validation-failures'	Jakub Kicinski
	Johannes Berg says: ==================== netlink: export policy on validation failures Export the policy used for attribute validation when it fails, so e.g. for an out-of-range attribute userspace immediately gets the valid ranges back. v2 incorporates the suggestion from Jakub to have a function to estimate the size (netlink_policy_dump_attr_size_estimate()) and check that it does the right thing on the normal policy dumps, not (just) when calling it from the error scenario. v3 only addresses a few minor style issues. v4 fixes up a forgotten 'git add' ... sorry. v5 is a resend, I messed up v4's cover letter subject (saying v3) and apparently the second patch didn't go out at all. Tested using nl80211/iw in a few scenarios, seems to work fine and return the policy back, e.g. kernel reports: integer out of range policy: 04 00 0b 00 0c 00 04 00 01 00 00 00 00 00 00 00 ^ padding ^ minimum allowed value policy: 04 00 0b 00 0c 00 05 00 ff ff ff ff 00 00 00 00 ^ padding ^ maximum allowed value policy: 08 00 01 00 04 00 00 00 ^ type 4 == U32 for an out-of-range case. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	netlink: export policy in extended ACK	Johannes Berg
	Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK to advertise the policy, e.g. if an attribute was out of range, you'll know the range that's permissible. Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_ERR_MSG_ATTR_POL() macros to set this, since realistically it's only useful to do this when the bad attribute (offset) is also returned. Use it in lib/nlattr.c which practically does all the policy validation. v2: - add and use netlink_policy_dump_attr_size_estimate() v3: - remove redundant break v4: - really remove redundant break ... sorry Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	netlink: policy: refactor per-attr policy writing	Johannes Berg
	Refactor the per-attribute policy writing into a new helper function, to be used later for dumping out the policy of a rejected attribute. v2: - fix some indentation v3: - change variable order in netlink_policy_dump_write() Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	tipc: fix NULL pointer dereference in tipc_named_rcv	Hoang Huu Le
	In the function node_lost_contact(), we call __skb_queue_purge() without grabbing the list->lock. This can cause to a race-condition why processing the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue(). [] BUG: kernel NULL pointer dereference, address: 0000000000000000 [] #PF: supervisor read access in kernel mode [] #PF: error_code(0x0000) - not-present page [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0 [] Oops: 0000 [#1] SMP NOPTI [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.9.0-rc6+ #2 [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...] [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc] [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...] [] RSP: 0018:ffffc900000a7c58 EFLAGS: 00000282 [] RAX: 00000000000012ec RBX: 0000000000000000 RCX: ffff88807bde1270 [] RDX: 0000000000002c7c RSI: 0000000000002c7c RDI: ffff88807b38f1a8 [] RBP: ffff88807b006288 R08: ffff88806a367800 R09: ffff88806a367900 [] R10: ffff88806a367a00 R11: ffff88806a367b00 R12: ffff88807b006258 [] R13: ffff88807b00628a R14: ffff888069334d00 R15: ffff88806a434600 [] FS: 0000000000000000(0000) GS:ffff888079480000(0000) knlGS:0[...] [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [] CR2: 0000000000000000 CR3: 0000000077320000 CR4: 00000000000006e0 [] Call Trace: [] ? tipc_bcast_rcv+0x9a/0x1a0 [tipc] [] tipc_rcv+0x40d/0x670 [tipc] [] ? _raw_spin_unlock+0xa/0x20 [] tipc_l2_rcv_msg+0x55/0x80 [tipc] [] __netif_receive_skb_one_core+0x8c/0xa0 [] process_backlog+0x98/0x140 [] net_rx_action+0x13a/0x420 [] __do_softirq+0xdb/0x316 [] ? smpboot_thread_fn+0x2f/0x1e0 [] ? smpboot_thread_fn+0x74/0x1e0 [] ? smpboot_thread_fn+0x14e/0x1e0 [] run_ksoftirqd+0x1a/0x40 [] smpboot_thread_fn+0x149/0x1e0 [] ? sort_range+0x20/0x20 [] kthread+0x131/0x150 [] ? kthread_unuse_mm+0xa0/0xa0 [] ret_from_fork+0x22/0x30 [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...] [] CR2: 0000000000000000 [] ---[ end trace 65c276a8e2e2f310 ]--- To fix this, we need to grab the lock of the 'namedq' list on both path calling. Fixes: cad2929dc432 ("tipc: update a binding service via broadcast") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	tipc: fix the skb_unshare() in tipc_buf_append()	Cong Wang
	skb_unshare() drops a reference count on the old skb unconditionally, so in the failure case, we end up freeing the skb twice here. And because the skb is allocated in fclone and cloned by caller tipc_msg_reassemble(), the consequence is actually freeing the original skb too, thus triggered the UAF by syzbot. Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy(). Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()") Reported-and-tested-by: syzbot+e96a7ba46281824cc46a@syzkaller.appspotmail.com Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	Merge branch 'net-smc-updates-2020-10-07'	Jakub Kicinski
	Karsten Graul says: ==================== net/smc: updates 2020-10-07 Patch 1 and 2 address warnings from static code checkers, and patch 3 handles a case when all proposed ISM V2 devices fail to init and no V1 devices are tried afterwards. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	net/smc: restore smcd_version when all ISM V2 devices failed to init	Karsten Graul
	Field ini->smcd_version is set to SMC_V2 before calling smc_listen_ism_init(). This clears the V1 bit that may be set. When all matching ISM V2 devices fail to initialize then the smcd_version field needs to get restored to allow any possible V1 devices to initialize. And be consistent, always go to the not_found label when no device was found. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	net/smc: cleanup buffer usage in smc_listen_work()	Karsten Graul
	coccinelle informs about net/smc/af_smc.c:1770:10-11: WARNING: opportunity for kzfree/kvfree_sensitive Its not that kzfree() would help here, the memset() is done to prepare the buffer for another socket receive. Fix that warning message by reordering the calls, while at it eliminate the unneeded variable cclc2 and use sizeof(*buf) as above in the same function. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	net/smc: consolidate unlocking in same function	Karsten Graul
	Static code checkers warn of inconsistent returns because the lgr mutex is locked in one function and unlocked in a function called by the locking function: net/smc/af_smc.c:823 smc_connect_rdma() warn: inconsistent returns 'smc_client_lgr_pending'. net/smc/af_smc.c:897 smc_connect_ism() warn: inconsistent returns 'smc_server_lgr_pending'. Make the code consistent by doing the unlock in the same function that fetches the lock. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09	Merge tag 'linux-can-next-for-5.10-20201007' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.10-20201007 The first 3 patches are by me and fix several warnings found when compiling the kernel with W=1. Lukas Bulwahn's patch adjusts the MAINTAINERS file, to accommodate the renaming of the mcp251xfd driver. Vincent Mailhol contributes 3 patches for the CAN networking layer. First error queue support is added the the CAN RAW protocol. The second patch converts the get_can_dlc() and get_canfd_dlc() in-Kernel-only macros from using __u8 to u8. The third patch adds a helper function to calculate the length of one bit in in multiple of time quanta. Oliver Hartkopp's patch add support for the ISO 15765-2:2016 transport protocol to the CAN stack. Three patches by Lad Prabhakar add documentation for various new rcar controllers to the device tree bindings of the rcar_can and rcan_canfd driver. Michael Walle's patch adds various processors to the flexcan driver binding documentation. The next two patches are by me and target the flexcan driver aswell. The remove the ack_grp and ack_bit from the fsl,stop-mode DT property and the driver, as they are not used anymore. As these are the last two arguments this change will not break existing device trees. The last three patches are by Srinivas Neeli and target the xilinx_can driver. The first one increases the lower limit for the bit rate prescaler to 2, the other two fix sparse and coverity findings. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>