linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-05-14	bnxt: Add XDP frame size to driver	Jesper Dangaard Brouer
	This driver uses full PAGE_SIZE pages when XDP is enabled. In case of XDP uses driver uses __bnxt_alloc_rx_page which does full page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for XDP_TX action that does DMA-sync. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com> Link: https://lore.kernel.org/bpf/158945334769.97035.13437970179897613984.stgit@firesoul
2020-05-14	xdp: Add frame size to xdp_buff	Jesper Dangaard Brouer
	XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. The frame size (frame_sz) member of xdp_buff is introduced to know the real size of the memory the frame is delivered in. When introducing this also make it clear that some tailroom is reserved/required when creating SKBs using build_skb(). It would also have been an option to introduce a pointer to data_hard_end (with reserved offset). The advantage with frame_sz is that (like rxq) drivers only need to setup/assign this value once per NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to store frame_sz inside xdp_rxq_info, because it's varies per packet as it can be based/depend on packet length. V2: nitpick: deduct -> deduce Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945334261.97035.555255657490688547.stgit@firesoul
2020-05-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller
	Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-14 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON. 2) support for narrow loads in bpf_sock_addr progs and additional helpers in cg-skb progs, from Andrey. 3) bpf benchmark runner, from Andrii. 4) arm and riscv JIT optimizations, from Luke. 5) bpf iterator infrastructure, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15	Merge tag 'drm-intel-fixes-2020-05-13-1' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Handle idling during i915_gem_evict_something busy loops (Chris) - Mark current submissions with a weak-dependency (Chris) - Propagate errror from completed fences (Chris) - Fixes on execlist to avoid GPU hang situation (Chris) - Fixes couple deadlocks (Chris) - Timeslice preemption fixes (Chris) - Fix Display Port interrupt handling on Tiger Lake (Imre) - Reduce debug noise around Frame Buffer Compression +(Peter) - Fix logic around IPC W/a for Coffee Lake and Kaby Lake +(Sultan) - Avoid dereferencing a dead context (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200514040235.GA2164266@intel.com
2020-05-14	Merge branch 'expand-cg_skb-helpers'	Alexei Starovoitov
	Andrey Ignatov says: ==================== v2->v3: - better documentation for bpf_sk_cgroup_id in uapi (Yonghong Song) - save/restore errno in network helpers (Yonghong Song) - cleanup leftover after switching selftest to skeleton (Yonghong Song) - switch from map to skel->bss in selftest (Yonghong Song) v1->v2: - switch selftests to skeleton. This patch set allows a bunch of existing sk lookup and skb cgroup id helpers, and adds two new bpf_sk_{,ancestor_}cgroup_id helpers to be used in cgroup skb programs. It fills the gap to cover a use-case to apply intra-host cgroup-bpf network policy based on a source cgroup a packet comes from. For example, there can be multiple containers A, B, C running on a host. Every such container runs in its own cgroup that can have multiple sub-cgroups. But all these containers can share some IP addresses. At the same time container A wants to have a policy for a server S running in it so that only clients from this same container can connect to S, but not from other containers (such as B, C). Source IP address can't be used to decide whether to allow or deny a packet, but it looks reasonable to filter by cgroup id. The patch set allows to implement the following policy: * when an ingress packet comes to container's cgroup, lookup peer (client) socket this packet comes from; * having peer socket, get its cgroup id; * compare peer cgroup id with self cgroup id and allow packet only if they match, i.e. it comes from same cgroup; * the "sub-cgroup" part of the story can be addressed by getting not direct cgroup id of the peer socket, but ancestor cgroup id on specified level, similar to existing "ancestor" flavors of cgroup id helpers. A newly introduced selftest implements such a policy in its basic form to provide a better idea on the use-case. Patch 1 allows existing sk lookup helpers in cgroup skb. Patch 2 allows skb_ancestor_cgroup_id in cgrou skb. Patch 3 introduces two new helpers to get cgroup id of socket. Patch 4 extends network helpers to use them in the next patch. Patch 5 adds selftest / example of use-case. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-14	selftests/bpf: Test for sk helpers in cgroup skb	Andrey Ignatov
	Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and bpf_sk_ancestor_cgroup_id helpers from cgroup skb program. The test creates a testing cgroup, starts a TCPv6 server inside the cgroup and creates two client sockets: one inside testing cgroup and one outside. Then it attaches cgroup skb program to the cgroup that checks all TCP segments coming to the server and allows only those coming from the cgroup of the server. If a segment comes from a peer outside of the cgroup, it'll be dropped. Finally the test checks that client from inside testing cgroup can successfully connect to the server, but client outside the cgroup fails to connect by timeout. The main goal of the test is to check newly introduced bpf_sk_{,ancestor_}cgroup_id helpers. It also checks a couple of socket lookup helpers (tcp & release), but lookup helpers were introduced much earlier and covered by other tests. Here it's mostly checked that they can be called from cgroup skb. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
2020-05-14	selftests/bpf: Add connect_fd_to_fd, connect_wait net helpers	Andrey Ignatov
	Add two new network helpers. connect_fd_to_fd connects an already created client socket fd to address of server fd. Sometimes it's useful to separate client socket creation and connecting this socket to a server, e.g. if client socket has to be created in a cgroup different from that of server cgroup. Additionally connect_to_fd is now implemented using connect_fd_to_fd, both helpers don't treat EINPROGRESS as an error and let caller decide how to proceed with it. connect_wait is a helper to work with non-blocking client sockets so that if connect_to_fd or connect_fd_to_fd returned -1 with errno == EINPROGRESS, caller can wait for connect to finish or for connection timeout. The helper returns -1 on error, 0 on timeout (1sec, hard-coded), and positive number on success. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
2020-05-14	bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers	Andrey Ignatov
	With having ability to lookup sockets in cgroup skb programs it becomes useful to access cgroup id of retrieved sockets so that policies can be implemented based on origin cgroup of such socket. For example, a container running in a cgroup can have cgroup skb ingress program that can lookup peer socket that is sending packets to a process inside the container and decide whether those packets should be allowed or denied based on cgroup id of the peer. More specifically such ingress program can implement intra-host policy "allow incoming packets only from this same container and not from any other container on same host" w/o relying on source IP addresses since quite often it can be the case that containers share same IP address on the host. Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id(). These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id helpers with the only difference that sk is used to get cgroup id instead of skb, and share code with them. See documentation in UAPI for more details. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
2020-05-14	bpf: Allow skb_ancestor_cgroup_id helper in cgroup skb	Andrey Ignatov
	cgroup skb programs already can use bpf_skb_cgroup_id. Allow bpf_skb_ancestor_cgroup_id as well so that container policies can be implemented for a container that can have sub-cgroups dynamically created, but policies should still be implemented based on cgroup id of container itself not on an id of a sub-cgroup. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/8874194d6041eba190356453ea9f6071edf5f658.1589486450.git.rdna@fb.com
2020-05-14	bpf: Allow sk lookup helpers in cgroup skb	Andrey Ignatov
	Currently sk lookup helpers are allowed in tc, xdp, sk skb, and cgroup sock_addr programs. But they would be useful in cgroup skb as well so that for example cgroup skb ingress program can lookup a peer socket a packet comes from on same host and make a decision whether to allow or deny this packet based on the properties of that socket, e.g. cgroup that peer socket belongs to. Allow the following sk lookup helpers in cgroup skb: * bpf_sk_lookup_tcp; * bpf_sk_lookup_udp; * bpf_sk_release; * bpf_skc_lookup_tcp. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f8c7ee280f1582b586629436d777b6db00597d63.1589486450.git.rdna@fb.com
2020-05-14	selftest/bpf: Fix spelling mistake "SIGALARM" -> "SIGALRM"	Colin Ian King
	There is a spelling mistake in an error message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com
2020-05-14	bpf: Fix bpf_iter's task iterator logic	Andrii Nakryiko
	task_seq_get_next might stop prematurely if get_pid_task() fails to get task_struct. Failure to do so doesn't mean that there are no more tasks with higher pids. Procfs's iteration algorithm (see next_tgid in fs/proc/base.c) does a retry in such case. After this fix, instead of stopping prematurely after about 300 tasks on my server, bpf_iter program now returns >4000, which sounds much closer to reality. Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514055137.1564581-1-andriin@fb.com
2020-05-14	selftests/bpf: Test narrow loads for bpf_sock_addr.user_port	Andrey Ignatov
	Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr programs. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
2020-05-14	bpf: Support narrow loads from bpf_sock_addr.user_port	Andrey Ignatov
	bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly code in BPF programs, like: volatile __u32 user_port = ctx->user_port; __u16 port = bpf_ntohs(user_port); Since otherwise clang may optimize the load to be 2-byte and it's rejected by verifier. Add support for 1- and 2-byte loads same way as it's supported for other fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
2020-05-14	samples/bpf: xdp_redirect_cpu: Set MAX_CPUS according to NR_CPUS	Lorenzo Bianconi
	xdp_redirect_cpu is currently failing in bpf_prog_load_xattr() allocating cpu_map map if CONFIG_NR_CPUS is less than 64 since cpu_map_alloc() requires max_entries to be less than NR_CPUS. Set cpu_map max_entries according to NR_CPUS in xdp_redirect_cpu_kern.c and get currently running cpus in xdp_redirect_cpu_user.c Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/374472755001c260158c4e4b22f193bdd3c56fb7.1589300442.git.lorenzo@kernel.org
2020-05-14	MAINTAINERS: Mark networking drivers as Maintained.	David S. Miller
	Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	r8169: don't include linux/moduleparam.h	Heiner Kallweit
	93882c6f210a ("r8169: switch from netif_xxx message functions to netdev_xxx") removed the last module parameter from the driver, therefore there's no need any longer to include linux/moduleparam.h. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	r8169: remove not needed checks in rtl8169_set_eee	Heiner Kallweit
	After 9de5d235b60a ("net: phy: fix aneg restart in phy_ethtool_set_eee") we don't need the check for aneg being enabled any longer, and as discussed with Russell configuring the EEE advertisement should be supported even if we're in a half-duplex mode currently. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: dsa: felix: fix incorrect clamp calculation for burst	Colin Ian King
	Currently burst is clamping on rate and not burst, the assignment of burst from the clamping discards the previous assignment of burst. This looks like a cut-n-paste error from the previous clamping calculation on ramp. Fix this by replacing ramp with burst. Addresses-Coverity: ("Unused value") Fixes: 0fbabf875d18 ("net: dsa: felix: add support Credit Based Shaper(CBS) for hardware offload") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	ipmr: Add lockdep expression to ipmr_for_each_table macro	Amol Grover
	During the initialization process, ipmr_new_table() is called to create new tables which in turn calls ipmr_get_table() which traverses net->ipv4.mr_tables without holding the writer lock. However, this is safe to do so as no tables exist at this time. Hence add a suitable lockdep expression to silence the following false-positive warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-next-20200428-syzkaller #0 Not tainted ----------------------------- net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!! ipmr_get_table+0x130/0x160 net/ipv4/ipmr.c:136 ipmr_new_table net/ipv4/ipmr.c:403 [inline] ipmr_rules_init net/ipv4/ipmr.c:248 [inline] ipmr_net_init+0x133/0x430 net/ipv4/ipmr.c:3089 Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Reported-by: syzbot+1519f497f2f9f08183c6@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	ipmr: Fix RCU list debugging warning	Amol Grover
	ipmr_for_each_table() macro uses list_for_each_entry_rcu() for traversing outside of an RCU read side critical section but under the protection of rtnl_mutex. Hence, add the corresponding lockdep expression to silence the following false-positive warning at boot: [ 4.319347] ============================= [ 4.319349] WARNING: suspicious RCU usage [ 4.319351] 5.5.4-stable #17 Tainted: G E [ 4.319352] ----------------------------- [ 4.319354] net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!! Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables") Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: phy: mdio-moxart: remove unneeded include	Bartosz Golaszewski
	mdio-moxart doesn't use regulators in the driver code. We can remove the regulator include. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	dt-bindings: dp83867: Convert DP83867 to yaml	Dan Murphy
	Convert the dp83867 binding to yaml. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	dt-bindings: net: dp83869: Update licensing info	Dan Murphy
	Add BSD 2 Clause to the licensing. CC: Rob Herring <robh@kernel.org> Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c	Madhuparna Bhowmik
	This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc5-next-20200514-syzkaller #0 Not tainted ----------------------------- drivers/net/hamradio/bpqether.c:149 RCU-list traversed in non-reader section!! Since rtnl lock is held, pass this cond in list_for_each_entry_rcu(). Reported-by: syzbot+bb82cafc737c002d11ca@syzkaller.appspotmail.com Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810	Kevin Lo
	Set the correct bit when checking for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54810 PHY. Fixes: 0ececcfc9267 ("net: phy: broadcom: Allow BCM54810 to use bcm54xx_adjust_rxrefclk()") Signed-off-by: Kevin Lo <kevlo@kevlo.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	hinic: update huawei ethernet driver maintainer	Luo bin
	update huawei ethernet driver maintainer from aviad to Bin luo Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	hinic: add set_ringparam ethtool_ops support	Luo bin
	support to change TX/RX queue depth with ethtool -G Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	devlink: refactor end checks in devlink_nl_cmd_region_read_dumpit	Jakub Kicinski
	Clean up after recent fixes, move address calculations around and change the variable init, so that we can have just one start_offset == end_offset check. Make the check a little stricter to preserve the -EINVAL error if requested start offset is larger than the region itself. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	Merge branch 'am65-cpsw-add-taprio-EST-offload-support'	David S. Miller
	Murali Karicheri says: ==================== am65-cpsw: add taprio/EST offload support AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018) configuration. EST allows express queue traffic to be scheduled (placed) on the wire at specific repeatable time intervals. In Linux kernel, EST configuration is done through tc command and the taprio scheduler in the net core implements a software only scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration, user indicate "flag 2" in the command which is then parsed by taprio scheduler in net core and indicate that the command is to be offloaded to h/w. taprio then offloads the command to the driver by calling ndo_setup_tc() ndo ops. This patch implements ndo_setup_tc() as well as other changes required to offload EST configuration to CPSW h/w For more details please refer patch 2/2. This series is based on original work done by Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> to add taprio offload support to AM65 CPSW 2G. 1. Example configuration 3 Gates ifconfig eth0 down ethtool -L eth0 tx 3 ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off ifconfig eth0 192.168.2.20 tc qdisc replace dev eth0 parent root handle 100 taprio \ num_tc 3 \ map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 \ base-time 0000 \ sched-entry S 4 125000 \ sched-entry S 2 125000 \ sched-entry S 1 250000 \ flags 2 2. Example configuration 8 Gates ifconfig eth0 down ethtool -L eth0 tx 8 ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off ifconfig eth0 192.168.2.20 tc qdisc replace dev eth0 parent root handle 100 taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0000 \ sched-entry S 80 125000 \ sched-entry S 40 125000 \ sched-entry S 20 125000 \ sched-entry S 10 125000 \ sched-entry S 08 125000 \ sched-entry S 04 125000 \ sched-entry S 02 125000 \ sched-entry S 01 125000 \ flags 2 Classify frames to particular priority using skbedit so that they land at a specific queue in cpsw h/w which is Gated by the EST gate which opens based on the sched-entry. tc qdisc add dev eth0 clsact In the below for example an iperf3 session with destination port 5007 will go through Q7. tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5007 0xffff action skbedit priority 7 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5006 0xffff action skbedit priority 6 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5005 0xffff action skbedit priority 5 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5004 0xffff action skbedit priority 4 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5003 0xffff action skbedit priority 3 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5002 0xffff action skbedit priority 2 tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5001 0xffff action skbedit priority 1 iperf3 -c 192.168.2.10 -u -l1470 -b32M -t1 -p 5007 Testing was done by capturing frames at the PC using wireshark and checking for the bust interval or cycle time of UDP frames with a specific port number. Verified that the distance between first frame of a burst (cycle-time) is 1 milli second and burst duration is within 125 usec based on the received packet timestamp shown in wireshark packet display. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	ethernet: ti: am65-cpsw-qos: add TAPRIO offload support	Ivan Khoronzhuk
	AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018) configuration. EST allows express queue traffic to be scheduled (placed) on the wire at specific repeatable time intervals. In Linux kernel, EST configuration is done through tc command and the taprio scheduler in the net core implements a software only scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration, user indicate "flag 2" in the command which is then parsed by taprio scheduler in net core and indicate that the command is to be offloaded to h/w. taprio then offloads the command to the driver by calling ndo_setup_tc() ndo ops. This patch implements ndo_setup_tc() to offload EST configuration to CPSW h/w. Currently driver supports only SetGateStates operation. EST operates on a repeating time interval generated by the CPTS EST function generator. Each Ethernet port has a global EST fetch RAM that can be configured as 2 buffers, each of 64 locations or one large buffer of 128 locations. In 2 buffer configuration, a ping pong mechanism is used to hold the active schedule (oper) in one buffer and new (admin) command in the other. Each 22-bit fetch command consists of a 14-bit fetch count (14 MSB’s) and an 8-bit priority fetch allow (8 LSB’s) that will be applied for the fetch count time in wireside clocks. Driver process each of the sched-entry in the offload command and update the fetch RAM. Driver configures duration in sched-entry into the fetch count and Gate mask into the priority fetch bits of the RAM. Then configures the CPTS EST function generator to activate the schedule. Currently driver supports only 2 buffer configuration which means driver supports a max cycle time of ~8 msec. CPSW supports a configurable number of priority queues (up to 8) and needs to be switched to this mode from the default round robin mode before EST can be offloaded. User configures these through ethtool commands (-L for changing number of queues and --set-priv-flags to disable round robin mode). Driver doesn't enable EST if pf_p0_rx_ptype_rrobin privat flag is set. The flag is common for all ports, and so can't be just overridden by taprio configuration w/o user involvement. Command fails if pf_p0_rx_ptype_rrobin is already set in the driver. Scheds (commands) configuration depends on interface speed so driver translates the duration to the fetch count based on link speed. Each schedule can be constructed with several command entries in fetch RAM depending on interval. For example if each sched has timer interval < ~130us on 1000 Mb link then each sched consumes one command and have 1:1 mapping. When Ethernet link goes down, driver purge the configuration if link is down for more than 1 second. The patch allows to update the timer and scheds memory only if it's really needed, and skip cases required the user to stop timer by configuring only shceds memory. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	ethernet: ti: am65-cpts: add routines to support taprio offload	Ivan Khoronzhuk
	TAPRIO/EST offload support in CPSW2G requires EST scheduler function enabled in CPTS. So this patch add a function to set cycle time for EST scheduler. It also add a function for getting time in ns of PHC clock for taprio qdisc configuration. Mostly to verify if timer update is needed or to get actual state of oper/admin schedule. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	scsi: target: Put lun_ref at end of tmr processing	Bodo Stroesser
	Testing with Loopback I found that, after a Loopback LUN has executed a TMR, I can no longer unlink the LUN. The rm command hangs in transport_clear_lun_ref() at wait_for_completion(&lun->lun_shutdown_comp) The reason is, that transport_lun_remove_cmd() is not called at the end of target_tmr_work(). It seems, that in other fabrics this call happens implicitly when the fabric drivers call transport_generic_free_cmd() during their ->queue_tm_rsp(). Unfortunately Loopback seems to not comply to the common way of calling transport_generic_free_cmd() from ->queue_*(). Instead it calls transport_generic_free_cmd() from its ->check_stop_free() only. But the ->check_stop_free() is called by transport_cmd_check_stop_to_fabric() after it has reset the se_cmd->se_lun pointer. Therefore the following transport_generic_free_cmd() skips the transport_lun_remove_cmd(). So this patch re-adds the transport_lun_remove_cmd() at the end of target_tmr_work(), which was removed during commit 2c9fa49e100f ("scsi: target/core: Make ABORT and LUN RESET handling synchronous"). For fabrics using transport_generic_free_cmd() in the usual way the double call to transport_lun_remove_cmd() doesn't harm, as transport_lun_remove_cmd() checks for this situation and does not release lun_ref twice. Link: https://lore.kernel.org/r/20200513153443.3554-1-bstroesser@ts.fujitsu.com Fixes: 2c9fa49e100f ("scsi: target/core: Make ABORT and LUN RESET handling synchronous") Cc: stable@vger.kernel.org Tested-by: Bryant G. Ly <bryangly@gmail.com> Reviewed-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-14	evm: Fix a small race in init_desc()	Dan Carpenter
	The IS_ERR_OR_NULL() function has two conditions and if we got really unlucky we could hit a race where "ptr" started as an error pointer and then was set to NULL. Both conditions would be false even though the pointer at the end was NULL. This patch fixes the problem by ensuring that "tfm" can only be NULL or valid. I have introduced a "tmp_tfm" variable to make that work. I also reversed a condition and pulled the code in one tab. Reported-by: Roberto Sassu <roberto.sassu@huawei.com> Fixes: 53de3b080d5e ("evm: Check also if tfm is an error pointer in init_desc()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Krzysztof Struczynski <krzysztof.struczynski@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2020-05-14	cifs: fix leaked reference on requeued write	Adam McCoy
	Failed async writes that are requeued may not clean up a refcount on the file, which can result in a leaked open. This scenario arises very reliably when using persistent handles and a reconnect occurs while writing. cifs_writev_requeue only releases the reference if the write fails (rc != 0). The server->ops->async_writev operation will take its own reference, so the initial reference can always be released. Signed-off-by: Adam McCoy <adam@forsedomani.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-05-14	NFSv3: fix rpc receive buffer size for MOUNT call	Olga Kornievskaia
	Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when computing reply buffer size"), there was enough slack in the reply buffer to commodate filehandles of size 60bytes. However, the real problem was that the reply buffer size for the MOUNT operation was not correctly calculated. Received buffer size used the filehandle size for NFSv2 (32bytes) which is much smaller than the allowed filehandle size for the v3 mounts. Fix the reply buffer size (decode arguments size) for the MNT command. Fixes: 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-14	tcp: fix error recovery in tcp_zerocopy_receive()	Eric Dumazet
	If user provides wrong virtual address in TCP_ZEROCOPY_RECEIVE operation we want to return -EINVAL error. But depending on zc->recv_skip_hint content, we might return -EIO error if the socket has SOCK_DONE set. Make sure to return -EINVAL in this case. BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728 sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131 __sys_getsockopt+0x533/0x7b0 net/socket.c:2177 __do_sys_getsockopt net/socket.c:2192 [inline] __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189 __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829 RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009 RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000 R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4 Local variable ----zc@do_tcp_getsockopt created at: do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 Fixes: 05255b823a61 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	ARC: show_regs: avoid extra line of output	Vineet Gupta
	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2020-05-14	s390/kexec_file: fix initrd location for kdump kernel	Philipp Rudo
	initrd_start must not point at the location the initrd is loaded into the crashkernel memory but at the location it will be after the crashkernel memory is swapped with the memory at 0. Fixes: ee337f5469fd ("s390/kexec_file: Add crash support to image loader") Reported-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Tested-by: Lianbo Jiang <lijiang@redhat.com> Link: https://lore.kernel.org/r/20200512193956.15ae3f23@laptop2-ibm.local Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-14	s390/pci: Fix s390_mmio_read/write with MIO	Niklas Schnelle
	The s390_mmio_read/write syscalls are currently broken when running with MIO. The new pcistb_mio/pcstg_mio/pcilg_mio instructions are executed similiarly to normal load/store instructions and do address translation in the current address space. That means inside the kernel they are aware of mappings into kernel address space while outside the kernel they use user space mappings (usually created through mmap'ing a PCI device file). Now when existing user space applications use the s390_pci_mmio_write and s390_pci_mmio_read syscalls, they pass I/O addresses that are mapped into user space so as to be usable with the new instructions without needing a syscall. Accessing these addresses with the old instructions as done currently leads to a kernel panic. Also, for such a user space mapping there may not exist an equivalent kernel space mapping which means we can't just use the new instructions in kernel space. Instead of replicating user mappings in the kernel which then might collide with other mappings, we can conceptually execute the new instructions as if executed by the user space application using the secondary address space. This even allows us to directly store to the user pointer without the need for copy_to/from_user(). Cc: stable@vger.kernel.org Fixes: 71ba41c9b1d9 ("s390/pci: provide support for MIO instructions") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-14	char: ipmi: convert to use i2c_new_client_device()	Wolfram Sang
	Move away from the deprecated API. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-Id: <20200326210958.13051-2-wsa+renesas@sang-engineering.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-05-14	SUNRPC: 'Directory with parent 'rpc_clnt' already present!'	J. Bruce Fields
	Each rpc_client has a cl_clid which is allocated from a global ida, and a debugfs directory which is named after cl_clid. We're releasing the cl_clid before we free the debugfs directory named after it. As soon as the cl_clid is released, that value is available for another newly created client. That leaves a window where another client may attempt to create a new debugfs directory with the same name as the not-yet-deleted debugfs directory from the dying client. Symptoms are log messages like Directory 4 with parent 'rpc_clnt' already present! Fixes: 7c4310ff5642 "SUNRPC: defer slow parts of rpc_free_client() to a workqueue." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-14	Merge branch 'net-qed-qede-critical-hw-error-handling'	David S. Miller
	Igor Russkikh says: ==================== net: qed/qede: critical hw error handling FastLinQ devices as a complex systems may observe various hardware level error conditions, both severe and recoverable. Driver is able to detect and report this, but so far it only did trace/dmesg based reporting. Here we implement an extended hw error detection, service task handler captures a dump for the later analysis. I also resubmit a patch from Denis Bolotin on tx timeout handler, addressing David's comment regarding recovery procedure as an extra reaction on this event. v2: Removing the patch with ethtool dump and udev magic. Its quite isolated, I'm working on devlink based logic for this separately. v1: https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irusskikh@marvell.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qed: fix bad formatting	Igor Russkikh
	On some adjacent code, fix bad code formatting Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qed: introduce critical hardware error handler	Igor Russkikh
	MCP may signal driver about generic critical failure. Driver has to collect mdump information (get_retain), it pushes that to logs and triggers generic notification on "hardware attention" event. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qed: introduce critical fan failure handler	Igor Russkikh
	Fan failure is sent by firmware, driver reacts on this error with newly introduced notification path. It will collect dump and shut down the device to prevent physical breakage Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qede: Implement ndo_tx_timeout	Denis Bolotin
	Upon tx timeout detection we do disable carrier and print TX queue info on TX timeout. We then raise hw error condition and trigger service task to handle this. This handler will capture extra debug info and then optionally trigger recovery procedure to try restore function. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qede: optional hw recovery procedure	Igor Russkikh
	Driver has an ability to initiate a recovery process as a reaction to detected errors. But the codepath (recovery_process) was disabled and never active. Here we add ethtool private flag to allow user have the recovery procedure activated. We still do not enable this by default though, since in some configurations this is not desirable. E.g. this may impact other PFs/VFs. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qed: attention clearing properties	Igor Russkikh
	On different hardware events we have to respond differently, on some of hardware indications hw attention (error condition) should be cleared by the driver to continue normal functioning. Here we introduce attention clear flags, and put them on some important events (in aeu_descs). Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14	net: qed: cleanup debug related declarations	Igor Russkikh
	Thats probably a legacy code had double declaration of some fields. Cleanup this, removing copy and fixing references. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>