git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
2020-05-26	Merge branch 'mlxsw-Various-trap-changes-part-2'	David S. Miller
	Ido Schimmel says: ==================== mlxsw: Various trap changes - part 2 This patch set contains another set of small changes in mlxsw trap configuration. It is the last set before exposing control traps (e.g., IGMP query, ARP request) via devlink-trap. Tested with existing devlink-trap selftests. Please see individual patches for a detailed changelog. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum_router: Allow programming link-local prefix routes	Ido Schimmel
	The device has a trap for IPv6 packets that need to be routed and have a unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw to ignore link-local routes, as the packets will be trapped to the CPU in any case. However, since link-local routes are not programmed, it is possible for routed packets to hit the default route, which might also be programmed to trap packets. This means that packets with a link-local destination IP might be trapped for the wrong reason. To overcome this, allow programming link-local prefix routes (usually one fe80::/64 per-table), so that the packets will be forwarded until reaching the link-local trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
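The effect described above can be illustrated with a small longest-prefix-match model. This is only a user-space sketch, not mlxsw code: the route tables and the action strings are hypothetical, and the only facts taken from the commit message are the fe80::/10 link-local range and the fe80::/64 prefix route.

```python
import ipaddress

# Link-local unicast range named in the commit message.
LINK_LOCAL = ipaddress.ip_network("fe80::/10")


def longest_prefix_match(routes, dst):
    """Return the action of the most specific route covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, action)
    return best[1] if best else None


# Hypothetical tables; the action names are illustrative only.
without_ll = [("::/0", "trap-default-route")]
with_ll = without_ll + [("fe80::/64", "forward")]

if __name__ == "__main__":
    dst = "fe80::1"
    print(dst, "is link-local:", ipaddress.ip_address(dst) in LINK_LOCAL)
    print("without fe80::/64:", longest_prefix_match(without_ll, dst))
    print("with fe80::/64:   ", longest_prefix_match(with_ll, dst))
```

Without the prefix route, the lookup falls through to the default route and its trap; with it, the packet matches the more specific fe80::/64 entry and is left for the dedicated link-local trap.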
2020-05-26	mlxsw: spectrum: Add packet traps for BFD packets	Ido Schimmel
	Bidirectional Forwarding Detection (BFD) provides "low-overhead, short-duration detection of failures in the path between adjacent forwarding engines" (RFC 5880). This is accomplished by exchanging BFD packets between the two forwarding engines. Up until now these packets were trapped via the general local delivery (i.e., IP2ME) trap which also traps a lot of other packets that are not as time-sensitive as BFD packets. Expose dedicated traps for BFD packets so that user space could configure a dedicated policer for them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Treat IPv6 link-local SIP as an exception	Ido Schimmel
	IPv6 packets that need to be forwarded and have a link-local source IP are dropped by the kernel and an ICMPv6 "Destination unreachable" is sent to the sending host. As such, change the trap group of such packets so that they do not interfere with IPv6 management packets. In the future this trap will be exposed as an exception via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Share one group for all locally delivered packets	Ido Schimmel
	Routed IP packets with the Router Alert option need to be trapped to the CPU as they might need to be locally delivered to raw sockets with the IP_ROUTER_ALERT / IPV6_ROUTER_ALERT socket option. Move them to the same group with other packets that might need to be trapped following route lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: reg: Move all trap groups under the same enum	Ido Schimmel
	After the previous patch the split is no longer necessary and all the trap groups can be moved under the same enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum_trap: Do not hard code "thin" policer identifier	Ido Schimmel
	As explained in commit e612523041ab ("mlxsw: spectrum_trap: Introduce dummy group with thin policer"), the purpose of the "thin" policer is to pass as few packets as possible to the CPU. The identifier of this policer is currently set according to the maximum number of used trap groups, but this is fragile: on Spectrum-1 the maximum number of policers is less than the maximum number of trap groups, which might result in an invalid policer identifier in case the number of used trap groups grows beyond the policer limit. Solve this by dynamically allocating the policer identifier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum	Ido Schimmel
	The number of Spectrum trap groups is not infinite, but two identifiers are occupied by SwitchX-2 specific trap groups. Free these identifiers by moving them out of the main enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Reduce priority of locally delivered packets	Ido Schimmel
	To align with recent recommended values. Will be configurable by future patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same trap group for local routes and link-local destination	Ido Schimmel
	Packets with an IPv6 link-local destination (i.e., fe80::/10) should not be forwarded and are therefore trapped to the CPU for local delivery. Since these packets are trapped for the same logical reason as packets hitting local routes, associate both traps with the same group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use separate trap group for FID miss	Ido Schimmel
	When a packet enters the device it is classified to a filtering identifier (FID) based on the ingress port and VLAN. The FID miss trap is used to trap packets for which a FID could not be found. In mlxsw this trap should only be triggered when a port is enslaved to an OVS bridge and a matching ACL rule could not be found, so as to trigger learning. These packets are therefore completely unrelated to packets hitting local routes and should be in a different group. Move them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same trap group for various IPv6 packets	Ido Schimmel
	Group these various IPv6 packets (e.g., router solicitations, router advertisement) together and subject them to the same policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Rename IPv6 ND trap group	Ido Schimmel
	The IPv6 Neighbour Discovery (ND) group will be used for various IPv6 packets, not all of which fall under the definition of ND, so rename it to "IPV6" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same switch case for identical groups	Ido Schimmel
	Trap groups that use the same policer settings can share the same switch case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use dedicated trap group for ACL trap	Ido Schimmel
	Packets that are trapped via tc's trap action are currently subject to the same policer as packets hitting local routes. The latter are critical to the correct functioning of the control plane, while the former are mainly used for traffic inspection. Split the ACL trap to a separate group with its own policer. Use a higher priority for these traps than for traps using mirror action (e.g., ARP, IGMP). Otherwise, packets matching both traps will not be forwarded in hardware (because of trap action) and also not forwarded in software because they will be marked with 'offload_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: sctp: Fix spelling in Kconfig help	Chris Packham
	Change 'handeled' to 'handled' in the Kconfig help for SCTP. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	Merge branch 'bnxt_en-Bug-fixes'	David S. Miller
	Michael Chan says: ==================== bnxt_en: Bug fixes. 3 bnxt_en driver fixes, covering a bug in preserving the counters during some resets, proper error code when flashing NVRAM fails, and an endian bug when extracting the firmware response message length. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	bnxt_en: fix firmware message length endianness	Edwin Peer
	The explicit mask and shift is not the appropriate way to parse fields out of a little endian struct. The length field is internally __le16 and the strategy employed only happens to work on little endian machines because the offset used is actually incorrect (length is at offset 6). Also remove the related and no longer used definitions from bnxt.h. Fixes: 845adfe40c2a ("bnxt_en: Improve valid bit checking in firmware response message.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
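A short sketch of why mask-and-shift parsing depends on host byte order. The header layout, the 32-bit read at offset 4, and the mask value below are assumptions made for illustration, not the driver's actual definitions; only the facts that the length is a little-endian 16-bit field at offset 6 come from the commit message.

```python
import struct

RESP_LEN_OFFSET = 6  # offset of the __le16 length field, per the commit message


def resp_len_le16(buf):
    """Read the length as an explicit little-endian 16-bit field."""
    return struct.unpack_from("<H", buf, RESP_LEN_OFFSET)[0]


def resp_len_mask_shift(buf, byteorder):
    """Hypothetical mask-and-shift parse: load 32 bits at offset 4 in the
    host's byte order and take the upper 16 bits."""
    word = int.from_bytes(buf[4:8], byteorder)
    return (word & 0xFFFF0000) >> 16


# Hypothetical 8-byte header: three little-endian 16-bit fields, then length 24.
header = struct.pack("<HHHH", 0x0001, 0x0002, 0x3344, 24)

if __name__ == "__main__":
    print(resp_len_le16(header))
    print(resp_len_mask_shift(header, "little"))
    print(resp_len_mask_shift(header, "big"))
```

On a little-endian host the upper half of the word at offset 4 happens to be the bytes at offset 6, so the two approaches agree; on a big-endian host the upper half is the bytes at offset 4, and the result is unrelated to the length.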
2020-05-26	bnxt_en: Fix return code to "flash_device".	Vasundhara Volam
	When NVRAM directory is not found, return the error code properly as per firmware command failure instead of the hardcode -ENOBUFS. Fixes: 3a707bed13b7 ("bnxt_en: Return -EAGAIN if fw command returns BUSY") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	bnxt_en: Fix accumulation of bp->net_stats_prev.	Michael Chan
	We have logic to maintain network counters across resets by storing the counters in bp->net_stats_prev before reset. But not all resets will clear the counters. Certain resets that don't need to change the number of rings do not clear the counters. The current logic accumulates the counters before all resets, causing big jumps in the counters after some resets, such as ethtool -G. Fix it by only accumulating the counters during reset if the irq_re_init parameter is set. The parameter signifies that all rings and interrupts will be reset and that means that the counters will also be reset. Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com> Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
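The accumulation rule can be modelled in a few lines. This is a toy model, not the driver: the class, its attributes, and the `buggy` switch are hypothetical, while `irq_re_init` and the idea of a saved "previous" total come from the commit message.

```python
class RingStats:
    """Toy model of counters that must survive resets."""

    def __init__(self):
        self.prev = 0  # models bp->net_stats_prev
        self.live = 0  # models the counters kept with the rings

    def total(self):
        # What user space sees: saved history plus live counters.
        return self.prev + self.live

    def reset(self, irq_re_init, buggy=False):
        # Old behaviour (buggy=True): accumulate before every reset.
        # Fixed behaviour: accumulate only when the rings are re-created.
        if irq_re_init or buggy:
            self.prev += self.live
        # Live counters are cleared only when rings and interrupts are reset.
        if irq_re_init:
            self.live = 0


if __name__ == "__main__":
    fixed, old = RingStats(), RingStats()
    fixed.live = old.live = 100
    fixed.reset(irq_re_init=False)
    old.reset(irq_re_init=False, buggy=True)
    print("fixed:", fixed.total(), "old:", old.total())
```

In the old behaviour a reset that keeps the rings counts the same packets twice, which matches the "big jumps" reported after operations such as ethtool -G.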
2020-05-26	mptcp: attempt coalescing when moving skbs to mptcp rx queue	Florian Westphal
	We can try to coalesce skbs we take from the subflows rx queue with the tail of the mptcp rx queue. If successful, the skb head can be discarded early. We can also free the skb extensions, we do not access them after this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: usb: qmi_wwan: add Telit LE910C1-EUX composition	Daniele Palmas
	Add support for Telit LE910C1-EUX composition 0x1031: tty, tty, tty, rmnet Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	r8169: improve rtl_remove_one	Heiner Kallweit
	Don't call netif_napi_del() manually, free_netdev() does this for us. In addition reorder calls to match reverse order of calls in probe(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: check untrusted gso_size at kernel entry	Willem de Bruijn
	Syzkaller again found a path to a kernel crash through bad gso input: a packet with gso size exceeding len. These packets are dropped in tcp_gso_segment and udp[46]_ufo_fragment. But they may affect gso size calculations earlier in the path. Now that we have thlen as of commit 9274124f023b ("net: stricter validation of untrusted gso packets"), check gso_size at entry too. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
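One plausible shape for such an entry check, as a hedged sketch: the function name and the exact rule are assumptions, not the kernel's code. The only property taken from the commit message is that a packet whose gso size exceeds its length must not be treated as a valid GSO packet.

```python
def accept_as_gso(pkt_len, hdr_len, gso_size):
    """Return True only if the packet can plausibly be segmented.

    pkt_len  -- total packet length in bytes
    hdr_len  -- length of the headers preceding the payload (hypothetical input)
    gso_size -- untrusted segment size supplied by user space
    """
    if gso_size <= 0:
        return False
    payload_len = pkt_len - hdr_len
    # A payload that fits in a single segment is not really a GSO packet.
    return payload_len > gso_size


if __name__ == "__main__":
    print(accept_as_gso(3000, 54, 1448))
    print(accept_as_gso(100, 54, 1448))
```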
2020-05-26	Merge branch 'net-ethernet-fec-move-GPR-register-offset-and-bit-into-DT'	David S. Miller
	Fugang Duan says: ==================== net: ethernet: fec: move GPR register offset and bit into DT The commit da722186f654 (net: fec: set GPR bit on suspend by DT configuration) set the GPR register offset and bit in the driver for wol feature support. This makes it hard to enable the wol feature on imx6sx/imx6ul/imx7d platforms that have multiple ethernet instances with different GPR bits for stop mode control. So the patch set moves the GPR register offset and bit definitions into DT, and enables imx6q/imx6dl imx6qp/imx6sx/imx6ul/imx7d stop mode support. Currently, the following NXP i.MX boards support wol: - imx6q/imx6dl/imx6qp sabresd - imx6sx sabreauto - imx7d sdb The imx6q/imx6dl/imx6qp sabresd board dts file is missing the property "fsl,magic-packet;", so patch#4 adds the property for stop mode support. v1 -> v2: - driver: switch back to store the quirks bitmask in driver_data - dt-bindings: rename 'gpr' property string to 'fsl,stop-mode' - imx6/7 dtsi: add imx6sx/imx6ul/imx7d ethernet stop mode property v2 -> v3: - driver: suggested by Sascha Hauer, use a struct fec_devinfo for abstracting differences between different hardware variants, it can give more freedom to describe the differences. - imx6/7 dtsi: correct one typo pointed out by Andrew. Thanks Martin, Andrew and Sascha Hauer for the review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ARM: dts: imx6qdl-sabresd: enable fec wake-on-lan	Fugang Duan
	Enable ethernet wake-on-lan feature for imx6q/dl/qp sabresd boards since the PHY clock is supplied by external osc. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ARM: dts: imx: add ethernet stop mode property	Fugang Duan
	- Update the imx6qdl gpr property to define gpr register offset and bit in DT. - Add imx6sx/imx6ul/imx7d ethernet stop mode property. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	dt-bindings: fec: update the gpr property	Fugang Duan
	- rename the 'gpr' property string to 'fsl,stop-mode'. - Update the property to define gpr register offset and bit in DT, since different instance have different gpr bit. v2: * rename 'gpr' property string to 'fsl,stop-mode'. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: ethernet: fec: move GPR register offset and bit into DT	Fugang Duan
	The commit da722186f654 (net: fec: set GPR bit on suspend by DT configuration) set the GPR register offset and bit in the driver for the wake-on-lan feature. But it introduces two issues: - one SOC has two instances, and they have different bits - different SOCs may have different offsets and bits So to support the wake-on-lan feature on other i.MX platforms, the GPR register offset and bit should be configured from DT. This patch improves on commit da722186f654 (net: fec: set GPR bit on suspend by DT configuration) to support multiple ethernet instances on the i.MX series. v2: * switch back to store the quirks bitmask in driver_data v3: * suggested by Sascha Hauer, use a struct fec_devinfo for abstracting differences between different hardware variants, it can give more freedom to describe the differences. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net/smc: mark smc_pnet_policy as const	Dmitry Vyukov
	Netlink policies are generally declared as const. This is safer and prevents potential bugs. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mptcp: avoid NULL-ptr derefence on fallback	Paolo Abeni
	In the MPTCP receive path we must cope with TCP fallback on blocking recvmsg(). Currently in such code path we detect the fallback condition, but we don't fetch the struct socket required for fallback. The above allowed syzkaller to trigger a NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline] RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904 Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d RSP: 0018:ffffc90001077b98 EFLAGS: 00010202 RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000 FS: 00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891 inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:886 [inline] sock_recvmsg net/socket.c:904 [inline] __sys_recvfrom+0x2f3/0x470 net/socket.c:2057 __do_sys_recvfrom net/socket.c:2075 [inline] __se_sys_recvfrom net/socket.c:2071 [inline] __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071 
do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Address the issue initializing the struct socket reference before entering the fallback code. Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Fixes: 8ab183deb26a ("mptcp: cope with later TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	Merge tag 'mac80211-next-for-net-next-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	David S. Miller
	Johannes Berg says: ==================== One batch of changes, containing: * hwsim improvements from Jouni and myself, to be able to test more scenarios easily * some more HE (802.11ax) support * some initial S1G (sub 1 GHz) work for fractional MHz channels * some (action) frame registration updates to help DPP support * along with other various improvements/fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	clk: qcom: gcc: Fix parent for gpll0_out_even	Vinod Koul
	Documentation says that gpll0 is parent of gpll0_out_even, somehow driver coded that as bi_tcxo, so fix it Fixes: 2a1d7eb854bb ("clk: qcom: gcc: Add global clock controller driver for SM8150") Reported-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Vinod Koul <vkoul@kernel.org> Link: https://lkml.kernel.org/r/20200521052728.2141377-1-vkoul@kernel.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-05-26	clk: qcom: sm8250 gcc depends on QCOM_GDSC	Jonathan Marek
	The driver will always fail to probe without QCOM_GDSC, so select it. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Link: https://lkml.kernel.org/r/20200523040947.31946-1-jonathan@marek.ca Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Fixes: 3e5770921a88 ("clk: qcom: gcc: Add global clock controller driver for SM8250") Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-05-26	net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a	Fugang Duan
	The rx filter 'HWTSTAMP_FILTER_PTP_V2_EVENT' should cover PTP v2/802.AS1, any layer, any kind of event packet, but the HW only takes timestamp snapshots for the following PTP messages: sync, Pdelay_req, Pdelay_resp. This causes the following issue when testing the E2E case: ptp4l[2479.534]: port 1: received DELAY_REQ without timestamp ptp4l[2481.423]: port 1: received DELAY_REQ without timestamp ptp4l[2481.758]: port 1: received DELAY_REQ without timestamp ptp4l[2483.524]: port 1: received DELAY_REQ without timestamp ptp4l[2484.233]: port 1: received DELAY_REQ without timestamp ptp4l[2485.750]: port 1: received DELAY_REQ without timestamp ptp4l[2486.888]: port 1: received DELAY_REQ without timestamp ptp4l[2487.265]: port 1: received DELAY_REQ without timestamp ptp4l[2487.316]: port 1: received DELAY_REQ without timestamp Timestamp snapshot dependency on register bits in the receive path: SNAPTYPSEL=01, TSMSTRENA=x, TSEVNTENA=0: SYNC, Follow_Up, Delay_Req, Delay_Resp, Pdelay_Req, Pdelay_Resp, Pdelay_Resp_Follow_Up; SNAPTYPSEL=01, TSMSTRENA=0, TSEVNTENA=1: SYNC, Pdelay_Req, Pdelay_Resp. For dwmac v5.10a, enable all events by setting register DWC_EQOS_TIME_STAMPING[SNAPTYPSEL] to 2’b01 and clearing bit [TSEVNTENA] to 0’b0, which supports all required events. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
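The register-bit dependency listed in this commit message can be written as a small lookup. The function name is hypothetical and only the two SNAPTYPSEL=01 rows quoted in the message are modelled; any other combination is rejected rather than guessed.

```python
ALL_MESSAGES = frozenset({
    "SYNC", "Follow_Up", "Delay_Req", "Delay_Resp",
    "Pdelay_Req", "Pdelay_Resp", "Pdelay_Resp_Follow_Up",
})
EVENT_SUBSET = frozenset({"SYNC", "Pdelay_Req", "Pdelay_Resp"})


def snapshot_messages(snaptypsel, tsmstrena, tsevntena):
    """Return the PTP messages that get a timestamp snapshot.

    Only the rows quoted in the commit message are covered.
    """
    if snaptypsel != 0b01:
        raise ValueError("only SNAPTYPSEL=01 rows are listed")
    if tsevntena == 0:
        # TSMSTRENA is a don't-care in this row.
        return ALL_MESSAGES
    if tsmstrena == 0:
        return EVENT_SUBSET
    raise ValueError("combination not listed in the commit message")


if __name__ == "__main__":
    # With TSEVNTENA set, Delay_Req is not timestamped, which matches the
    # "received DELAY_REQ without timestamp" log lines.
    print("Delay_Req" in snapshot_messages(0b01, 0, 1))
    print("Delay_Req" in snapshot_messages(0b01, 0, 0))
```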
2020-05-26	Merge branch 'nexthop-group-fixes'	David S. Miller
	David Ahern says: ==================== nexthops: Fix 2 fundamental flaws with nexthop groups Nik's torture tests have exposed 2 fundamental mistakes with the initial nexthop code for groups. First, the nexthops entries and num_nh in the nh_grp struct should not be modified once the struct is set under rcu. Doing so has major effects on the datapath seeing valid nexthop entries. Second, the helpers in the header file were convenient for not repeating code, but they cause datapath walks to potentially see 2 different group structs after an rcu replace, disrupting a walk of the path objects. This second problem applies solely to IPv4 as I re-used too much of the existing code in walking legs of a multipath route. Patch 1 is a refactoring change to simplify the overhead of reviewing and understanding the change in patch 2, which fixes the update of nexthop groups when a component leg is removed. Patches 3-5 address the second problem. Patch 3 inlines the multipath check such that the mpath lookup and subsequent calls all use the same nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path with iterative calls to fib_info_nhc. fib_info_num_path can be used in control plane path in a 'for loop' with subsequent fib_info_nhc calls to get each leg since the nh_grp struct is only changed while holding the rtnl; the combination can not be used in the data plane with external nexthops as it involves repeated dereferences of nh_grp struct which can change between calls. Similarly, nexthop_is_multipath can be used for branching decisions in the datapath since the nexthop type can not be changed (a group can not be converted to standalone and vice versa). Patch set developed in coordination with Nikolay Aleksandrov. He did a lot of work creating a good reproducer, discussing options to fix it and testing iterations. I have adapted Nik's commands into additional tests in the nexthops selftest script which I will send against -next. v2 - fixed whitespace errors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ipv4: nexthop version of fib_info_nh_uses_dev	David Ahern
	Similar to the last patch, need to fix fib_info_nh_uses_dev for external nexthops to avoid referencing multiple nh_grp structs. Move the device check in fib_info_nh_uses_dev to a helper and create a nexthop version that is called if the fib_info uses an external nexthop. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ipv4: Refactor nhc evaluation in fib_table_lookup	David Ahern
	FIB lookups can return an entry that references an external nexthop. While walking the nexthop struct we do not want to make multiple calls into the nexthop code which can result in 2 different structs getting accessed - one returning the number of paths and the rest of the loop seeing a different nh_grp struct. If the nexthop group shrunk, the result is an attempt to access a fib_nh_common that does not exist for the new nh_grp struct but did for the old one. To fix that, move the device evaluation code to a helper that can be used for inline fib_nh path as well as external nexthops. Update the existing check for fi->nh in fib_table_lookup to call a new helper, nexthop_get_nhc_lookup, which walks the external nexthop with a single rcu dereference. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
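The hazard and the fix can be shown with a toy model in which a writer replaces the group between two reads. Everything here is hypothetical (class, function names, the `writer` callback standing in for a concurrent RCU replace); the point is only the difference between dereferencing the group twice and taking one snapshot.

```python
class Nexthop:
    """Toy stand-in for a nexthop whose group pointer can be replaced."""

    def __init__(self, paths):
        self.grp = tuple(paths)  # models the published nh_grp


def racy_walk(nh, writer):
    """Read the path count, then re-read the group for every leg."""
    count = len(nh.grp)  # first dereference
    writer()             # the group is replaced in between
    return [nh.grp[i] for i in range(count)]  # later dereferences


def snapshot_walk(nh, writer):
    """Dereference the group once and walk only that snapshot."""
    grp = nh.grp         # single dereference
    writer()
    return list(grp)


if __name__ == "__main__":
    nh = Nexthop(["eth0", "eth1", "eth2"])

    def shrink():
        nh.grp = nh.grp[:1]

    print(snapshot_walk(nh, shrink))
    nh = Nexthop(["eth0", "eth1", "eth2"])
    try:
        racy_walk(nh, shrink)
    except IndexError as exc:
        print("racy walk failed:", exc)
```

The racy walk indexes past the end of the shrunken group, the Python analogue of accessing a fib_nh_common that no longer exists in the new nh_grp.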
2020-05-26	nexthop: Expand nexthop_is_multipath in a few places	David Ahern
	I got too fancy consolidating checks on multipath type. The result is that path lookups can access 2 different nh_grp structs as exposed by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to avoid multiple nh_grp dereferences and make decisions based on the consistent struct. The only 2 places left using nexthop_is_multipath are within IPv6; both only check that the nexthop is a multipath for a branching decision, which is acceptable. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	nexthops: don't modify published nexthop groups	Nikolay Aleksandrov
	We must avoid modifying published nexthop groups while they might be in use, otherwise we might see NULL ptr dereferences. In order to do that we allocate 2 nexthop group structures upon nexthop creation and swap between them when we have to delete an entry. The reason is that we can't fail nexthop group removal, so we can't handle allocation failure; thus we move the extra allocation to creation, where we can safely fail and return ENOMEM. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry	David Ahern
	Move nh_grp dereference and check for removing nexthop group due to all members gone into remove_nh_grp_entry. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	Merge branch 'net-phy-mscc-miim-reduce-waiting-time-between-MDIO-transactions'	David S. Miller
	Antoine Tenart says: ==================== net: phy: mscc-miim: reduce waiting time between MDIO transactions This series aims at reducing the waiting time between MDIO transactions when using the MSCC MIIM MDIO controller. I'm not sure we need patch 4/4 and we could reasonably drop it from the series. I'm including the patch as it could help to ensure the system is functional with a non-optimal configuration. We needed to improve the driver's performance because, when using a PHY requiring lots of register accesses (such as the VSC85xx family), delays would add up and end up being quite large, which would cause issues such as a slow initialization of the PHY and issues when using timestamping operations (this feature will be sent quite soon to the mailing lists). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: read poll when high resolution timers are disabled	Antoine Tenart
	The driver uses a read polling mechanism to check the status of the MDIO bus, to know if it is ready to accept next commands. This polling mechanism uses usleep_range() under the hood between reads, which is fine as long as high resolution timers are enabled. Otherwise the delays will end up being much longer than expected. This patch fixes this by using udelay() under the hood when CONFIG_HIGH_RES_TIMERS isn't enabled. This increases CPU usage. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: improve waiting logic	Antoine Tenart
	The MSCC MIIM MDIO driver uses a waiting logic to wait for the MDIO bus to be ready to accept next commands. It does so by polling the BUSY status bit which indicates the MDIO bus has completed all pending operations. This can take time, and the controller supports writing the next command as soon as there are no pending commands (which happens while the MDIO bus is busy completing its current command). This patch implements this improved logic by adding a helper to poll the PENDING status bit, and by adjusting where we should wait for the bus to not be busy or to not be pending. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: remove redundant timeout check	Antoine Tenart
	readl_poll_timeout already returns -ETIMEDOUT if the condition isn't satisfied, there's no need to check again the condition after calling it. Remove the redundant timeout check. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
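A user-space model of a poll-with-timeout loop shows why a second check after the call adds nothing. This is a sketch under assumptions: `poll_timeout` is a hypothetical stand-in, time is simulated rather than slept, and the final re-read after the deadline reflects my understanding of how the kernel helper behaves rather than something stated in the commit message.

```python
import errno


def poll_timeout(read, cond, sleep_us, timeout_us):
    """Re-read until cond(value) holds or the simulated deadline passes.

    Returns (0, value) on success or (-ETIMEDOUT, value) on timeout, so the
    return code alone already says whether the condition was met.
    """
    elapsed_us = 0
    while True:
        value = read()
        if cond(value):
            return 0, value
        if elapsed_us >= timeout_us:
            # One last read after the deadline, then report the outcome.
            value = read()
            return (0 if cond(value) else -errno.ETIMEDOUT), value
        elapsed_us += sleep_us  # stands in for sleeping between reads


if __name__ == "__main__":
    BUSY = 0x1  # hypothetical status bit
    samples = iter([BUSY, BUSY, 0])
    ret, status = poll_timeout(lambda: next(samples),
                               lambda v: not (v & BUSY), 50, 10000)
    print(ret, status)
    ret, status = poll_timeout(lambda: BUSY,
                               lambda v: not (v & BUSY), 50, 10000)
    print(ret == -errno.ETIMEDOUT)
```

The 50 and 10000 microsecond values are the ones mentioned for this driver elsewhere in the series; the status bit value is made up for the example.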
2020-05-26	net: phy: mscc-miim: use more reasonable delays	Antoine Tenart
	The MSCC MIIM MDIO driver uses delays to read poll a status register. I made multiple tests on an Ocelot PCS120 platform which led me to reduce those delays. The delay in between which the polling function is allowed to sleep is reduced from 100us to 50us which in almost all cases is a good value to succeed at the first retry. The overall delay is also lowered as the prior value was really way too high; 10000us is large enough. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: mdiobus: add clause 45 mdiobus accessors	Russell King
	There is a recurring pattern throughout some of the PHY code converting a devad and regnum to our packed clause 45 representation. Rather than having this scattered around the code, let's put a common translation function in mdio.h, and provide some register accessors. Convert the phylib core, phylink, bcm87xx and cortina to use these. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
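The packed clause 45 representation can be sketched as follows. The constant names and values mirror what I understand the kernel's mdio.h to define (flag in bit 30, device address shifted by 16, 16-bit register number); treat them as assumptions. The helper names are hypothetical, and the 5-bit device address width used when unpacking comes from the clause 45 standard rather than from this commit.

```python
MII_ADDR_C45 = 1 << 30            # marks an address as clause 45
MII_DEVADDR_C45_SHIFT = 16        # position of the device address (devad)
MII_REGADDR_C45_MASK = 0xFFFF     # 16-bit register number


def c45_addr(devad, regnum):
    """Pack a devad/regnum pair into a single clause 45 address."""
    return (MII_ADDR_C45
            | (devad << MII_DEVADDR_C45_SHIFT)
            | (regnum & MII_REGADDR_C45_MASK))


def c45_unpack(addr):
    """Split a packed clause 45 address back into (devad, regnum)."""
    if not addr & MII_ADDR_C45:
        raise ValueError("not a clause 45 address")
    devad = (addr >> MII_DEVADDR_C45_SHIFT) & 0x1F
    return devad, addr & MII_REGADDR_C45_MASK


if __name__ == "__main__":
    packed = c45_addr(1, 0x0002)
    print(hex(packed), c45_unpack(packed))
```

Keeping the translation in one function is the point of the commit: callers pass devad and regnum and no longer repeat the shifting and OR-ing themselves.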
2020-05-26	Merge branch 'flow-mpls'	David S. Miller
	Guillaume Nault says: ==================== flow_dissector, cls_flower: Add support for multiple MPLS Label Stack Entries Currently, the flow dissector and the Flower classifier can only handle the first entry of an MPLS label stack. This patch series generalises the code to allow parsing and matching the Label Stack Entries that follow. Patch 1 extends the flow dissector to parse MPLS LSEs until the Bottom Of Stack bit is reached. The number of parsed LSEs is capped at FLOW_DIS_MPLS_MAX (arbitrarily set to 7). Flower and the NFP driver are updated to take into account the new layout of struct flow_dissector_key_mpls. Patch 2 extends Flower. It defines new netlink attributes, which are independent from the previous MPLS ones. Mixing the old and the new attributes in the same filter is not allowed. For backward compatibility, the old attributes are used when dumping filters that don't require the new ones. Changes since v2: * Fix compilation with the new MLX5 bareudp tunnel code. Changes since v1: * Fix compilation of NFP driver (kbuild test robot). * Fix sparse warning with entropy label (kbuild test robot). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	cls_flower: Support filtering on multiple MPLS Label Stack Entries	Guillaume Nault
	With struct flow_dissector_key_mpls now recording the first FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of these LSEs independently. In order to avoid creating new netlink attributes for every possible depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute that contains the list of LSEs to match. Each LSE is represented by another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains the attributes representing the depth and the MPLS fields to match at this depth (label, TTL, etc.). For each MPLS field, the mask is always set to all-ones, as this is what the original API did. We could allow user configurable masks in the future if there is demand for more flexibility. The new API also allows specifying only an LSE depth. In that case, Flower only verifies that the MPLS label stack depth is greater than or equal to the provided depth (that is, an LSE exists at this depth). Filters that only match on one (or more) fields of the first LSE are dumped using the old netlink attributes, to avoid confusing user space programs that don't understand the new API. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
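The depth-only match described above amounts to asking whether an LSE was recorded at a given depth. The sketch below shows one way to express that check against a "used LSEs" bit field. It assumes depths are counted from 1 and that bit (depth - 1) is set when the LSE at that depth was parsed; `SKETCH_MPLS_MAX` and `sketch_lse_exists` are hypothetical names for this example and are not the Flower implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the cap of 7 entries quoted in the commit messages. */
#define SKETCH_MPLS_MAX 7

/* Return true if an LSE was recorded at the given 1-based depth.
 * Assumption: bit (depth - 1) of used_lses marks a parsed entry. */
static bool sketch_lse_exists(uint8_t used_lses, unsigned int depth)
{
	if (depth < 1 || depth > SKETCH_MPLS_MAX)
		return false;
	return (used_lses & (1u << (depth - 1))) != 0;
}
```

Because label stack entries are parsed in order, an entry at depth N implies the stack is at least N entries deep, which is the condition the commit message describes.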
2020-05-26	flow_dissector: Parse multiple MPLS Label Stack Entries	Guillaume Nault
	The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably be better off doing a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
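The parsing loop described above can be sketched in user space as follows. This is not the kernel's `__skb_flow_dissect_mpls()`: it walks a plain byte buffer, and the `sketch_*` types and function are hypothetical. What it takes from the commit message is the idea of storing an array of entries, setting one "used" bit per parsed entry, stopping at the Bottom Of Stack bit, and capping the walk at 7 entries. The field layout (20-bit label, 3-bit traffic class, 1-bit bottom of stack, 8-bit TTL, in network byte order) is the standard MPLS LSE encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the FLOW_DIS_MPLS_MAX value of 7 quoted in the commit message. */
#define SKETCH_MPLS_MAX 7

struct sketch_lse {
	uint32_t label; /* 20 bits */
	uint8_t tc;     /* 3 bits */
	uint8_t bos;    /* 1 bit: bottom of stack */
	uint8_t ttl;    /* 8 bits */
};

struct sketch_key_mpls {
	struct sketch_lse ls[SKETCH_MPLS_MAX];
	uint8_t used_lses; /* bit n set when ls[n] was filled in */
};

/* Parse LSEs from buf until the bottom-of-stack bit is seen, the buffer
 * runs out, or SKETCH_MPLS_MAX entries were read. Returns the number of
 * entries recorded in key. */
static size_t sketch_parse_mpls(const uint8_t *buf, size_t len,
				struct sketch_key_mpls *key)
{
	size_t n = 0;

	key->used_lses = 0;
	while (n < SKETCH_MPLS_MAX && (n + 1) * 4 <= len) {
		const uint8_t *p = buf + 4 * n;
		uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			     ((uint32_t)p[2] << 8) | p[3];
		struct sketch_lse *lse = &key->ls[n];

		lse->label = w >> 12;
		lse->tc = (uint8_t)((w >> 9) & 0x7);
		lse->bos = (uint8_t)((w >> 8) & 0x1);
		lse->ttl = (uint8_t)(w & 0xff);
		key->used_lses |= (uint8_t)(1u << n);
		n++;
		if (lse->bos)
			break;
	}
	return n;
}
```

A single bit field for the whole stack is what lets later code test "is there an entry at depth N" without a separate key per level.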