git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-02-20	net: lan966x: Use automatic selection of VCAP rule actionset	Horatiu Vultur
	Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic selection of VCAP rule actionset") the VCAP API has the capability to select automatically the actionset based on the actions that are attached to the rule. So it is not needed anymore to hardcode the actionset in the driver, therefore it is OK to remove this. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge branch 'default_rps_mask-follow-up'	David S. Miller
	Paolo Abeni says: ==================== net: default_rps_mask follow-up The first patch namespacify the setting. In the common case, once proper isolation is in place in the main namespace, forwarding to/from each child netns will allways happen on the desidered CPUs. Any additional RPS stage inside the child namespace will not provide additional isolation and could hurt performance badly if picking a CPU on a remote node. The 2nd patch adds more self-tests coverage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	self-tests: more rps self tests	Paolo Abeni
	Explicitly check for child netns and main ns independency Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: make default_rps_mask a per netns attribute	Paolo Abeni
	That really was meant to be a per netns attribute from the beginning. The idea is that once proper isolation is in place in the main namespace, additional demux in the child namespaces will be redundant. Let's make child netns default rps mask empty by default. To avoid bloating the netns with a possibly large cpumask, allocate it on-demand during the first write operation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge tag 'wireless-next-2023-02-17' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.3 Third set of patches for v6.3. This time only a set of small fixes submitted during the last day or two. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add safeguard to check for NULL tupe in objects updates via NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari. 2) Incorrect pointer check in the new destroy rule command, from Yang Yingliang. 3) Incorrect status bitcheck in nf_conntrack_udp_packet(), from Florian Westphal. 4) Simplify seq_print_acct(), from Ilia Gavrilov. 5) Use 2-arg optimal variant of kfree_rcu() in IPVS, from Julian Anastasov. 6) TCP connection enters CLOSE state in conntrack for locally originated TCP reset packet from the reject target, from Florian Westphal. The fixes #2 and #3 in this series address issues from the previous pull nf-next request in this net-next cycle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: microchip: sparx5: reduce stack usage	Arnd Bergmann
	The vcap_admin structures in vcap_api_next_lookup_advanced_test() take several hundred bytes of stack frame, but when CONFIG_KASAN_STACK is enabled, each one of them also has extra padding before and after it, which ends up blowing the warning limit: In file included from drivers/net/ethernet/microchip/vcap/vcap_api.c:3521: drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c: In function 'vcap_api_next_lookup_advanced_test': drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1954:1: error: the frame size of 1448 bytes is larger than 1400 bytes [-Werror=frame-larger-than=] 1954 \| } Reduce the total stack usage by replacing the five structures with an array that only needs one pair of padding areas. Fixes: 1f741f001160 ("net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	sfc: use IS_ENABLED() checks for CONFIG_SFC_SRIOV	Arnd Bergmann
	One local variable has become unused after a recent change: drivers/net/ethernet/sfc/ef100_nic.c: In function 'ef100_probe_netdev_pf': drivers/net/ethernet/sfc/ef100_nic.c:1155:21: error: unused variable 'net_dev' [-Werror=unused-variable] struct net_device *net_dev = efx->net_dev; ^~~~~~~ The variable is still used in an #ifdef. Replace the #ifdef with an if(IS_ENABLED()) check that lets the compiler see where it is used, rather than adding another #ifdef. This also fixes an uninitialized return value in ef100_probe_netdev_pf() that gcc did not spot. Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ice: properly alloc ICE_VSI_LB	Michal Swiatkowski
	Devlink reload patchset introduced regression. ICE_VSI_LB wasn't taken into account when doing default allocation. Fix it by adding a case for ICE_VSI_LB in ice_vsi_alloc_def(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	sfc: Fix spelling mistake "creationg" -> "creating"	Colin Ian King
	There is a spelling mistake in a pci_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	octeontx2-af: Add NIX Errata workaround on CN10K silicon	Geetha sowjanya
	This patch adds workaround for below 2 HW erratas 1. Due to improper clock gating, NIXRX may free the same NPA buffer multiple times.. to avoid this, always enable NIX RX conditional clock. 2. NIX FIFO does not get initialized on reset, if the SMQ flush is triggered before the first packet is processed, it will lead to undefined state. The workaround to perform SMQ flush only if packet count is non-zero in MDQ. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: phy: Read EEE abilities when using .features	Andrew Lunn
	A PHY driver can use a static integer value to indicate what link mode features it supports, i.e, its abilities.. This is the old way, but useful when dynamically determining the devices features does not work, e.g. support of fibre. EEE support has been moved into phydev->supported_eee. This needs to be set otherwise the code assumes EEE is not supported. It is normally set as part of reading the devices abilities. However if a static integer value was used, the dynamic reading of the abilities is not performed. Add a call to genphy_c45_read_eee_abilities() to read the EEE abilities. Fixes: 8b68710a3121 ("net: phy: start using genphy_c45_ethtool_get/set_eee()") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge branch 'phydev-locks'	David S. Miller
	Andrew Lunn says: ==================== Add additional phydev locks The phydev lock should be held when accessing members of phydev, or calling into the driver. Some of the phy_ethtool_ functions are missing locks. Add them. To avoid deadlock the marvell driver is modified since it calls one of the functions which gain locks, which would result in a deadlock. The missing locks have not caused noticeable issues, so these patches are for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: phy: Add locks to ethtool functions	Andrew Lunn
	The phydev lock should be held while accessing members of phydev, or calling into the driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: phy: marvell: Use the unlocked genphy_c45_ethtool_get_eee()	Andrew Lunn
	phy_ethtool_get_eee() is about to gain locking of the phydev lock. This means it cannot be used within a PHY driver without causing a deadlock. Swap to using genphy_c45_ethtool_get_eee() which assumes the lock has already been taken. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: bcmgenet: fix MoCA LED control	Doug Berger
	When the bcmgenet_mii_config() code was refactored it was missed that the LED control for the MoCA interface got overwritten by the port_ctrl value. Its previous programming is restored here. Fixes: 4f8d81b77e66 ("net: bcmgenet: Refactor register access in bcmgenet_mii_config") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()	Shigeru Yoshida
	When a file descriptor of pppol2tp socket is passed as file descriptor of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register(). This situation is reproduced by the following program: int main(void) { int sock; struct sockaddr_pppol2tp addr; sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); if (sock < 0) { perror("socket"); return 1; } addr.sa_family = AF_PPPOX; addr.sa_protocol = PX_PROTO_OL2TP; addr.pppol2tp.pid = 0; addr.pppol2tp.fd = sock; addr.pppol2tp.addr.sin_family = PF_INET; addr.pppol2tp.addr.sin_port = htons(0); addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1"); addr.pppol2tp.s_tunnel = 1; addr.pppol2tp.s_session = 0; addr.pppol2tp.d_tunnel = 0; addr.pppol2tp.d_session = 0; if (connect(sock, (const struct sockaddr )&addr, sizeof(addr)) < 0) { perror("connect"); return 1; } return 0; } This program causes the following lockdep warning: ============================================ WARNING: possible recursive locking detected 6.2.0-rc5-00205-gc96618275234 #56 Not tainted -------------------------------------------- repro/8607 is trying to acquire lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0 but task is already holding lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_PPPOX); lock(sk_lock-AF_PPPOX); DEADLOCK * May be due to missing lock nesting notation 1 lock held by repro/8607: #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 stack backtrace: CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x178 __lock_acquire.cold+0x119/0x3b9 ? lockdep_hardirqs_on_prepare+0x410/0x410 lock_acquire+0x1e0/0x610 ? l2tp_tunnel_register+0x2b7/0x11c0 ? lock_downgrade+0x710/0x710 ? __fget_files+0x283/0x3e0 lock_sock_nested+0x3a/0xf0 ? l2tp_tunnel_register+0x2b7/0x11c0 l2tp_tunnel_register+0x2b7/0x11c0 ? sprintf+0xc4/0x100 ? l2tp_tunnel_del_work+0x6b0/0x6b0 ? debug_object_deactivate+0x320/0x320 ? lockdep_init_map_type+0x16d/0x7a0 ? lockdep_init_map_type+0x16d/0x7a0 ? l2tp_tunnel_create+0x2bf/0x4b0 ? l2tp_tunnel_create+0x3c6/0x4b0 pppol2tp_connect+0x14e1/0x1a30 ? pppol2tp_put_sk+0xd0/0xd0 ? aa_sk_perm+0x2b7/0xa80 ? aa_af_perm+0x260/0x260 ? bpf_lsm_socket_connect+0x9/0x10 ? pppol2tp_put_sk+0xd0/0xd0 __sys_connect_file+0x14f/0x190 __sys_connect+0x133/0x160 ? __sys_connect_file+0x190/0x190 ? lockdep_hardirqs_on+0x7d/0x100 ? ktime_get_coarse_real_ts64+0x1b7/0x200 ? ktime_get_coarse_real_ts64+0x147/0x200 ? __audit_syscall_entry+0x396/0x500 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x38/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes the issue by getting/creating the tunnel before locking the pppol2tp socket. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge branch 'icmp6-drop-reason'	David S. Miller
	Eric Dumazet says: ==================== ipv6: icmp6: better drop reason support This series aims to have more precise drop reason reports for icmp6. This should reduce false positives on most usual cases. This can be extended as needed later. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to icmpv6_echo_reply()	Eric Dumazet
	Change icmpv6_echo_reply() to return a drop reason. For the moment, return NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST	Eric Dumazet
	Hosts can often receive neighbour discovery messages that are not for them. Use a dedicated drop reason to make clear the packet is dropped for this normal case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS	Eric Dumazet
	This is a generic drop reason for any error detected in ndisc_parse_options(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to ndisc_redirect_rcv()	Eric Dumazet
	Change ndisc_redirect_rcv() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and values from icmpv6_notify(). More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to ndisc_router_discovery()	Eric Dumazet
	Change ndisc_router_discovery() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to ndisc_recv_rs()	Eric Dumazet
	Change ndisc_recv_rs() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to ndisc_recv_na()	Eric Dumazet
	Change ndisc_recv_na() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	ipv6: icmp6: add drop reason support to ndisc_recv_ns()	Eric Dumazet
	Change ndisc_recv_ns() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net: add location to trace_consume_skb()	Eric Dumazet
	kfree_skb() includes the location, it makes sense to add it to consume_skb() as well. After patch: taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	selftests/net: Interpret UDP_GRO cmsg data as an int value	Jakub Sitnicki
	Data passed to user-space with a (SOL_UDP, UDP_GRO) cmsg carries an int (see udp_cmsg_recv), not a u16 value, as strace confirms: recvmsg(8, {msg_name=..., msg_iov=[{iov_base="\0\0..."..., iov_len=96000}], msg_iovlen=1, msg_control=[{cmsg_len=20, <-- sizeof(cmsghdr) + 4 cmsg_level=SOL_UDP, cmsg_type=0x68}], <-- UDP_GRO msg_controllen=24, msg_flags=0}, 0) = 11200 Interpreting the data as an u16 value won't work on big-endian platforms. Since it is too late to back out of this API decision [1], fix the test. [1]: https://lore.kernel.org/netdev/20230131174601.203127-1-jakub@cloudflare.com/ Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	qede: fix interrupt coalescing configuration	Manish Chopra
	On default driver load device gets configured with unexpected higher interrupt coalescing values instead of default expected values as memory allocated from krealloc() is not supposed to be zeroed out and may contain garbage values. Fix this by allocating the memory of required size first with kcalloc() and then use krealloc() to resize and preserve the contents across down/up of the interface. Signed-off-by: Manish Chopra <manishc@marvell.com> Fixes: b0ec5489c480 ("qede: preserve per queue stats across up/down of interface") Cc: stable@vger.kernel.org Cc: Bhaskar Upadhaya <bupadhaya@marvell.com> Cc: David S. Miller <davem@davemloft.net> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2160054 Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	xsk: support use vaddr as ring	Xuan Zhuo
	When we try to start AF_XDP on some machines with long running time, due to the machine's memory fragmentation problem, there is no sufficient contiguous physical memory that will cause the start failure. If the size of the queue is 8 * 1024, then the size of the desc[] is 8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is 16page+. This is necessary to apply for a 4-order memory. If there are a lot of queues, it is difficult to these machine with long running time. Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but we only use 17 pages. This patch replaces __get_free_pages() by vmalloc() to allocate memory to solve these problems. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net/smc: fix application data exception	D. Wythe
	There is a certain probability that following exceptions will occur in the wrk benchmark test: Running 10s test @ http://11.213.45.6:80 8 threads and 64 connections Thread Stats Avg Stdev Max +/- Stdev Latency 3.72ms 13.94ms 245.33ms 94.17% Req/Sec 1.96k 713.67 5.41k 75.16% 155262 requests in 10.10s, 23.10MB read Non-2xx or 3xx responses: 3 We will find that the error is HTTP 400 error, which is a serious exception in our test, which means the application data was corrupted. Consider the following scenarios: CPU0 CPU1 buf_desc->used = 0; cmpxchg(buf_desc->used, 0, 1) deal_with(buf_desc) memset(buf_desc->cpu_addr,0); This will cause the data received by a victim connection to be cleared, thus triggering an HTTP 400 error in the server. This patch exchange the order between clear used and memset, add barrier to ensure memory consistency. Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()	D. Wythe
	There is a certain chance to trigger the following panic: PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48" #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62 [exception RIP: ib_alloc_mr+19] RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc] #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc] #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc] The reason here is that when the server tries to create a second link, smc_llc_srv_add_link() has no protection and may add a new link to link group. This breaks the security environment protected by llc_conf_mutex. Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge branch 'taprio-queuemaxsdu-fixes'	Paolo Abeni
	Vladimir Oltean says: ==================== taprio queueMaxSDU fixes This fixes 3 issues noticed while attempting to reoffload the dynamically calculated queueMaxSDU values. These are: - Dynamic queueMaxSDU is not calculated correctly due to a lost patch - Dynamically calculated queueMaxSDU needs to be clamped on the low end - Dynamically calculated queueMaxSDU needs to be clamped on the high end ==================== Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited	Vladimir Oltean
	It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net/sched: taprio: don't allow dynamic max_sdu to go negative after stab ↵	Vladimir Oltean
	adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net/sched: taprio: fix calculation of maximum gate durations	Vladimir Oltean
	taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	rxrpc: Fix overproduction of wakeups to recvmsg()	David Howells
	Fix three cases of overproduction of wakeups: (1) rxrpc_input_split_jumbo() conditionally notifies the app that there's data for recvmsg() to collect if it queues some data - and then its only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway. Fix the rxrpc_input_data() to only do the wakeup in failure cases. (2) If a DATA packet is received for a call by the I/O thread whilst recvmsg() is busy draining the call's rx queue in the app thread, the call will left on the recvmsg() queue for recvmsg() to pick up, even though there isn't any data on it. This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR set after the reply has been posted to a service call. Fix this by discarding pending calls from the recvmsg() queue that don't need servicing yet. (3) Not-yet-completed calls get requeued after having data read from them, even if they have no data to read. Fix this by only requeuing them if they have data waiting on them; if they don't, the I/O thread will requeue them when data arrives or they fail. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	Merge branch 'net-final-gsi-register-updates'	Paolo Abeni
	Alex Elder says: ==================== net: final GSI register updates I believe this is the last set of changes required to allow IPA v5.0 to be supported. There is a little cleanup work remaining, but that can happen in the next Linux release cycle. Otherwise we just need config data and register definitions for IPA v5.0 (and DTS updates). These are ready but won't be posted without further testing. The first patch in this series fixes a minor bug in a patch just posted, which I found too late. The second eliminates the GSI memory "adjustment"; this was done previously to avoid/delay the need to implement a more general way to define GSI register offsets. Note that this patch causes "checkpatch" warnings due to indentation that aligns with an open parenthesis. The third patch makes use of the newly-defined register offsets, to eliminate the need for a function that hid a few details. The next modifies a different helper function to work properly for IPA v5.0+. The fifth patch changes the way the event ring size is specified based on how it's now done for IPA v5.0+. And the last defines a new register required for IPA v5.0+. ==================== Link: https://lore.kernel.org/r/20230215195352.755744-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: add HW_PARAM_4 GSI register	Alex Elder
	Starting at IPA v5.0, the number of event rings per EE is defined in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2. Define this new register and its fields, and update the code that checks the number of rings supported by hardware to use the proper field based on IPA version. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: support different event ring encoding	Alex Elder
	Starting with IPA v5.0, a channel's event ring index is encoded in a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0. Define a new field ID for the former register and encode the event ring in the appropriate register. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: avoid setting an undefined field	Alex Elder
	The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is widened starting IPA v5.0, making the CHTYPE_PROTOCOL_MSB field added in IPA v4.5 unnecessary. Update the code to reflect this. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: kill ev_ch_e_cntxt_1_length_encode()	Alex Elder
	Now that we explicitly define each register field width there is no need to have a special encoding function for the event ring length. Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it in place of ev_ch_e_cntxt_1_length_encode() (which can be removed). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: kill gsi->virt_raw	Alex Elder
	Starting at IPA v4.5, almost all GSI registers had their offsets changed by a fixed amount (shifted downward by 0xd000). Rather than defining offsets for all those registers dependent on version, an adjustment was applied for most register accesses. This was implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register addresses"). It was later modified to be a bit more obvious about the adjusment, in commit 571b1e7e58ad3 ("net: ipa: use a separate pointer for adjusted GSI memory"). We now are able to define every GSI register with its own offset, so there's no need to implement this special adjustment. So get rid of the "virt_raw" pointer, and just maintain "virt" as the (non-adjusted) base address of I/O mapped GSI register memory. Redefine the offsets of all GSI registers (other than the INTER_EE ones, which were not subject to the adjustment) for IPA v4.5+, subtracting 0xd000 from their defined offsets instead. Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the register definition files so all registers are defined in order of their offset. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: ipa: fix an incorrect assignment	Alex Elder
	I spotted an error in a patch posted this week, unfortunately just after it got accepted. The effect of the bug is that time-based interrupt moderation is disabled. This is not technically a bug, but it is not what is intended. The problem is that a \|= assignment got implemented as a simple assignment, so the previously assigned value was ignored. Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20	net: dpaa2-eth: do not always set xsk support in xdp_features flag	Lorenzo Bianconi
	Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag but check if the NIC really supports it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-19	Linux 6.2v6.2	Linus Torvalds

2023-02-18	Merge tag 'x86-urgent-2023-02-19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for x86. Revert the recent change to the MTRR code which aimed to support SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0 kernels. The underlying issue of MTTR (mis)handling in the x86 code needs some deeper investigation and is definitely not 6.2 material" * tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
2023-02-18	Merge tag 'timers-urgent-2023-02-19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A fix for a long standing issue in the alarmtimer code. Posix-timers armed with a short interval with an ignored signal result in an unpriviledged DoS. Due to the ignored signal the timer switches into self rearm mode. This issue had been "fixed" before but a rework of the alarmtimer code 5 years ago lost that workaround. There is no real good solution for this issue, which is also worked around in the core posix-timer code in the same way, but it certainly moved way up on the ever growing todo list" * tag 'timers-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Prevent starvation by small intervals and SIG_IGN
2023-02-18	Merge tag 'irq-urgent-2023-02-19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single build fix for the PCI/MSI infrastructure. The addition of the new alloc/free interfaces in this cycle forgot to add stub functions for pci_msix_alloc_irq_at() and pci_msix_free_irq() for the CONFIG_PCI_MSI=n case" * tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Provide missing stubs for CONFIG_PCI_MSI=n
2023-02-19	Merge tag 'irqchip-6.3' of ↵	Thomas Gleixner
	git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - New and improved irqdomain locking, closing a number of races that became apparent now that we are able to probe drivers in parallel - A bunch of OF node refcounting bugs have been fixed - We now have a new IPI mux, lifted from the Apple AIC code and made common. It is expected that riscv will eventually benefit from it - Two small fixes for the Broadcom L2 drivers - Various cleanups and minor bug fixes Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org