git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
2024-03-08	Merge branch 'nexthop-group-stats'	David S. Miller
	Petr Machata says: ==================== Support for nexthop group statistics ECMP is a fundamental component in L3 designs. However, it's fragile. Many factors influence whether an ECMP group will operate as intended: hash policy (i.e. the set of fields that contribute to ECMP hash calculation), neighbor validity, hash seed (which might lead to polarization) or the type of ECMP group used (hash-threshold or resilient). At the same time, collection of statistics that would help an operator determine that the group performs as desired, is difficult. A solution that we present in this patchset is to add counters to next hop group entries. For SW-datapath deployments, this will on its own allow collection and evaluation of relevant statistics. For HW-datapath deployments, we further add a way to request that HW counters be installed for a given group, in-kernel interfaces to collect the HW statistics, and netlink interfaces to query them. For example: # ip nexthop replace id 4000 group 4001/4002 hw_stats on # ip -s -d nexthop show id 4000 id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on stats: id 4001 packets 5002 packets_hw 5000 id 4002 packets 4999 packets_hw 4999 The point of the patchset is visibility of ECMP balance, and that is influenced by packet headers, not their payload. Correspondingly, we only include packet counters in the statistics, not byte counters. We also decided to model HW statistics as a nexthop group attribute, not an arbitrary nexthop one. The latter would count any traffic going through a given nexthop, regardless of which ECMP group it is in, or any at all. The reason is again that the point of the patchset is ECMP balance visibility, not arbitrary inspection of how busy a particular nexthop is. Implementation of individual-nexthop statistics is certainly possible, and could well follow the general approach we are taking in this patchset. 
For resilient groups, per-bucket statistics could be done in a similar manner as well. This patchset contains the core code. mlxsw support will be sent in a follow-up patch set. This patchset progresses as follows: - Patches #1 and #2 add support for a new next-hop object attribute, NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in particular whether SW- and HW-collected nexthop stats should be part of the get or dump response. The idea is to avoid wasting message space, and time for collection of HW statistics, when the values are not needed. - Patches #3 and #4 add SW-datapath stats and corresponding UAPI. - Patches #5, #6 and #7 add support for HW-datapath stats and UAPI. Individual drivers still need to contribute the appropriate HW-specific support code. v4: - Patch #2: - s/nla_get_bitfield32/nla_get_u32/ in __nh_valid_dump_req(). v3: - Patch #3: - Convert to u64_stats_t - Patch #4: - Give a symbolic name to the set of all valid dump flags for the NHA_OP_FLAGS attribute. - Convert to u64_stats_t - Patch #6: - Use a named constant for the NHA_HW_STATS_ENABLE policy. v2: - Patch #2: - Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK - Patch #3: - Set err on nexthop_create_group() error path - Patch #4: - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS - Rename jump target in nla_put_nh_group_stats() to avoid having to rename further in the patchset. - Patch #7: - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW - Do not cancel outside of nesting in nla_put_nh_group_stats() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
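The stated goal of the series is visibility of ECMP balance, so a small sketch of how an operator might evaluate the per-nexthop packet counters may help. It parses only the `stats:` portion of the `ip -s -d nexthop show` output quoted in the cover letter and computes each member's share of the group's packets. The function names, the regular expression, and the assumption that the output keeps the exact textual shape shown above are illustrative and not part of the patchset or of iproute2.

```python
import re

# Stats portion of the example output quoted in the cover letter.
SAMPLE = "id 4001 packets 5002 packets_hw 5000 id 4002 packets 4999 packets_hw 4999"

# Hypothetical pattern: one entry per group member, packets_hw is optional.
ENTRY_RE = re.compile(r"id (\d+) packets (\d+)(?: packets_hw (\d+))?")


def parse_group_stats(text):
    """Return {nexthop_id: {"packets": int, "packets_hw": int or None}}."""
    stats = {}
    for nh_id, packets, packets_hw in ENTRY_RE.findall(text):
        stats[int(nh_id)] = {
            "packets": int(packets),
            "packets_hw": int(packets_hw) if packets_hw else None,
        }
    return stats


def packet_shares(stats):
    """Fraction of the group's packets that went through each nexthop."""
    total = sum(entry["packets"] for entry in stats.values())
    if total == 0:
        return {nh_id: 0.0 for nh_id in stats}
    return {nh_id: entry["packets"] / total for nh_id, entry in stats.items()}


if __name__ == "__main__":
    for nh_id, share in packet_shares(parse_group_stats(SAMPLE)).items():
        print(f"nexthop {nh_id}: {share:.2%} of group packets")
```

For an equal-weight two-member group, shares far from one half would suggest hash polarization or a similar imbalance, which is the kind of problem the counters are meant to expose.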
2024-03-08	net: nexthop: Expose nexthop group HW stats to user space	Ido Schimmel
	Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: nexthop: Add ability to enable / disable hardware statistics	Ido Schimmel
	Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: nexthop: Add hardware statistics notifications	Ido Schimmel
	Add hw_stats field to several notifier structures to communicate to the drivers that HW statistics should be configured for nexthops within a given group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: nexthop: Expose nexthop group stats to user space	Ido Schimmel
	Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: nexthop: Add nexthop group entry stats	Ido Schimmel
	Add nexthop group entry stats to count the number of packets forwarded via each nexthop in the group. The stats will be exposed to user space for better data path observability in the next patch. The per-CPU stats pointer is placed at the beginning of 'struct nh_grp_entry', so that all the fields accessed for the data path reside on the same cache line: struct nh_grp_entry { struct nexthop * nh; /* 0 8 */ struct nh_grp_entry_stats * stats; /* 8 8 */ u8 weight; /* 16 1 */ /* XXX 7 bytes hole, try to pack */ union { struct { atomic_t upper_bound; /* 24 4 */ } hthr; /* 24 4 */ struct { struct list_head uw_nh_entry; /* 24 16 */ u16 count_buckets; /* 40 2 */ u16 wants_buckets; /* 42 2 */ } res; /* 24 24 */ }; /* 24 24 */ struct list_head nh_list; /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct nexthop * nh_parent; /* 64 8 */ /* size: 72, cachelines: 2, members: 6 */ /* sum members: 65, holes: 1, sum holes: 7 */ /* last cacheline: 8 bytes */ }; Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
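The offsets in the layout dump above can be cross-checked with a rough user-space model. The sketch below mirrors the struct with Python's `ctypes`, under the assumptions that the host is a 64-bit platform (8-byte pointers), that `nh`, `stats` and `nh_parent` are plain pointers, and that `atomic_t` is a 4-byte integer. The Python class names are made up for illustration; this is not kernel code.

```python
import ctypes


class ListHead(ctypes.Structure):
    # struct list_head: two pointers.
    _fields_ = [("next", ctypes.c_void_p), ("prev", ctypes.c_void_p)]


class Hthr(ctypes.Structure):
    # Hash-threshold variant; atomic_t modelled as a 4-byte int (assumption).
    _fields_ = [("upper_bound", ctypes.c_int)]


class Res(ctypes.Structure):
    # Resilient variant.
    _fields_ = [
        ("uw_nh_entry", ListHead),
        ("count_buckets", ctypes.c_uint16),
        ("wants_buckets", ctypes.c_uint16),
    ]


class Mode(ctypes.Union):
    _fields_ = [("hthr", Hthr), ("res", Res)]


class NhGrpEntry(ctypes.Structure):
    _anonymous_ = ("mode",)
    _fields_ = [
        ("nh", ctypes.c_void_p),         # struct nexthop *
        ("stats", ctypes.c_void_p),      # per-CPU stats pointer
        ("weight", ctypes.c_uint8),
        ("mode", Mode),                  # anonymous union in the C struct
        ("nh_list", ListHead),
        ("nh_parent", ctypes.c_void_p),  # struct nexthop *
    ]


def layout(struct_type):
    """Return [(field_name, offset, size)] for a ctypes structure."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ in struct_type._fields_
    ]


if __name__ == "__main__":
    for name, offset, size in layout(NhGrpEntry):
        print(f"{name:10s} offset {offset:3d} size {size:3d}")
    print("total size:", ctypes.sizeof(NhGrpEntry))
```

In this model the fields touched on the data path (`nh`, `stats`, `weight` and the union) all sit below byte 64, which is the point the commit message makes about cache-line placement.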
2024-03-08	net: nexthop: Add NHA_OP_FLAGS	Petr Machata
	In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: nexthop: Adjust netlink policy parsing for a new attribute	Petr Machata
	A following patch will introduce a new attribute, op-specific flags to adjust the behavior of an operation. Different operations will recognize different flags. - To make the differentiation possible, stop sharing the policies for get and del operations. - To allow querying for presence of the attribute, have all the attribute arrays sized to NHA_MAX, regardless of what is permitted by policy, and pass the corresponding value to nlmsg_parse() as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	octeontx2-pf: Add TC flower offload support for TCP flags	Sai Krishna
	This patch adds TC offload support for matching TCP flags from TCP header. Example usage: tc qdisc add dev eth0 ingress TC rule to drop the TCP SYN packets: tc filter add dev eth0 ingress protocol ip flower ip_proto tcp tcp_flags 0x02/0x3f skip_sw action drop Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
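The `tcp_flags 0x02/0x3f` argument in the rule above is a value/mask pair. The following sketch, with hypothetical helper names, shows the usual masked-compare semantics such a match implies: a packet matches when its flag bits, restricted to the mask, equal the value. It is a model of the matching rule only, not of the driver or hardware implementation.

```python
# Standard TCP header flag bits.
TCP_FIN, TCP_SYN, TCP_RST = 0x01, 0x02, 0x04
TCP_PSH, TCP_ACK, TCP_URG = 0x08, 0x10, 0x20
TCP_ECE, TCP_CWR = 0x40, 0x80


def parse_value_mask(spec):
    """Parse a 'value/mask' string such as '0x02/0x3f' into two ints."""
    value, mask = spec.split("/")
    return int(value, 16), int(mask, 16)


def flags_match(packet_flags, value, mask):
    """Masked compare: only the bits selected by the mask are considered."""
    return (packet_flags & mask) == (value & mask)


if __name__ == "__main__":
    value, mask = parse_value_mask("0x02/0x3f")
    for name, flags in [
        ("SYN", TCP_SYN),
        ("SYN+ACK", TCP_SYN | TCP_ACK),
        ("SYN+ECE", TCP_SYN | TCP_ECE),
    ]:
        print(name, "matches" if flags_match(flags, value, mask) else "no match")
```

With the mask `0x3f`, a pure SYN matches and a SYN+ACK does not, while the ECE and CWR bits fall outside the mask and are ignored.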
2024-03-08	tcp: Add skb addr and sock addr to arguments of tracepoint tcp_probe.	fuyuanli
	It is useful to expose the skb address and the sock address in the tcp_probe tracepoint, so that more information is available when monitoring the receive path of TCP data with eBPF or other tools. For example, to calculate the transmit latency between layer 2 and layer 4 with eBPF, a packet has to be identified by seq and end_seq, which are not available in tcp_probe, so the only option is a kprobe hooking tcp_rcv_established. With the skb and sock addresses available, tcp_probe can be used directly, which is more efficient. Signed-off-by: fuyuanli <fuyuanli@didiglobal.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: dqs: add NIC stall detector based on BQL	Jakub Kicinski
	softnet_data->time_squeeze is sometimes used as a proxy for host overload or indication of scheduling problems. In practice this statistic is very noisy and has hard to grasp units - e.g. is 10 squeezes a second to be expected, or high? Delaying network (NAPI) processing leads to drops on NIC queues but also RTT bloat, impacting pacing and CA decisions. Stalls are a little hard to detect on the Rx side, because there may simply have not been any packets received in given period of time. Packet timestamps help a little bit, but again we don't know if packets are stale because we're not keeping up or because someone (cough cgroups) disabled IRQs for a long time. We can, however, use Tx as a proxy for Rx stalls. Most drivers use combined Rx+Tx NAPIs so if Tx gets starved so will Rx. On the Tx side we know exactly when packets get queued, and completed, so there is no uncertainty. This patch adds stall checks to BQL. Why BQL? Because it's a convenient place to add such checks, already called by most drivers, and it has copious free space in its structures (this patch adds no extra cache references or dirtying to the fast path). The algorithm takes one parameter - max delay AKA stall threshold and increments a counter whenever NAPI got delayed for at least that amount of time. It also records the length of the longest stall. To be precise every time NAPI has not polled for at least stall thrs we check if there were any Tx packets queued between last NAPI run and now - stall_thrs/2. Unlike the classic Tx watchdog this mechanism does not ignore stalls caused by Tx being disabled, or loss of link. I don't think the check is worth the complexity, and stall is a stall, whether due to host overload, flow control, link down... doesn't matter much to the application. We have been running this detector in production at Meta for 2 years, with the threshold of 8ms. It's the lowest value where false positives become rare. 
There's still a constant stream of reported stalls (especially without the ksoftirqd deferral patches reverted), those who like their stall metrics to be 0 may prefer higher value. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
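As a reading aid for the algorithm described above, here is a simplified user-space model. It takes the single stall threshold parameter, counts a stall whenever the completion (NAPI) side has not run for at least that long while a Tx packet was queued between the last run and `now - threshold/2`, and records the longest stall seen. The class, its method names, the integer millisecond timestamps and the choice to measure stall length as the gap between completion runs are all assumptions made for illustration; the kernel code lives in BQL and is structured differently.

```python
class StallDetector:
    """Toy model of the Tx-based stall check; times are integer milliseconds."""

    def __init__(self, stall_thrs):
        self.stall_thrs = stall_thrs  # max tolerated delay between polls
        self.stall_cnt = 0            # number of detected stalls
        self.stall_max = 0            # longest observed stall
        self.last_poll = 0            # time of the previous completion run
        self._queued_at = []          # enqueue times since the previous run

    def queued(self, now):
        """Record that a Tx packet was queued at time `now`."""
        self._queued_at.append(now)

    def completed(self, now):
        """Called when completions are processed (the NAPI poll ran)."""
        gap = now - self.last_poll
        if gap >= self.stall_thrs:
            # Only count a stall if something was waiting for long enough:
            # a packet queued between the last poll and now - thrs/2.
            cutoff = now - self.stall_thrs // 2
            if any(self.last_poll <= t <= cutoff for t in self._queued_at):
                self.stall_cnt += 1
                self.stall_max = max(self.stall_max, gap)
        self._queued_at.clear()
        self.last_poll = now


if __name__ == "__main__":
    det = StallDetector(stall_thrs=8)  # 8 ms, the threshold quoted above
    det.completed(0)
    det.queued(1)
    det.completed(12)   # 12 ms without a poll while a packet waited
    det.completed(40)   # long gap, but nothing was queued: idle, not a stall
    print(det.stall_cnt, det.stall_max)
```

The idle case in the usage example illustrates why Tx is a convenient proxy: a long gap with nothing queued is not counted, which is exactly the ambiguity that makes Rx-side detection hard.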
2024-03-08	net: chelsio: remove unused function calc_tx_descs	Colin Ian King
	The inlined helper function calc_tx_descs is not used and is redundant. Remove it. Cleans up clang scan build warning: drivers/net/ethernet/chelsio/cxgb4/sge.c:814:28: warning: unused function 'calc_tx_descs' [-Wunused-function] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: phy: fix phy_get_internal_delay accessing an empty array	Kévin L'hôpital
	The phy_get_internal_delay function could try to access to an empty array in the case that the driver is calling phy_get_internal_delay without defining delay_values and rx-internal-delay-ps or tx-internal-delay-ps is defined to 0 in the device-tree. This will lead to "unable to handle kernel NULL pointer dereference at virtual address 0". To avoid this kernel oops, the test should be delay >= 0. As there is already delay < 0 test just before, the test could only be size == 0. Fixes: 92252eec913b ("net: phy: Add a helper to return the index for of the internal delay") Co-developed-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Signed-off-by: Kévin L'hôpital <kevin.lhopital@savoirfairelinux.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
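The failure mode can be modelled in a few lines. The sketch below is a Python approximation of the table lookup, written from the description above: the old guard is assumed to have been "delay is non-zero and the table is empty", and the new one "the table is empty". The function name, the `fixed` switch and the sample table are hypothetical, and an `IndexError` stands in for the NULL pointer dereference.

```python
import errno


def internal_delay_index(delay, delay_values, fixed=True):
    """Return the table index for `delay`, or `delay` itself if no table exists.

    `delay` models the value read from rx-/tx-internal-delay-ps; a negative
    value models a lookup error and is passed through unchanged.
    """
    if delay < 0:
        return delay

    if fixed:
        # Fixed guard: an empty table is never indexed.
        if len(delay_values) == 0:
            return delay
    else:
        # Assumed pre-fix guard: delay == 0 slips through with an empty table.
        if delay and len(delay_values) == 0:
            return delay

    if delay < delay_values[0] or delay > delay_values[-1]:
        return -errno.EINVAL
    if delay == delay_values[0]:
        return 0

    for i in range(1, len(delay_values)):
        if delay == delay_values[i]:
            return i
        # Between two entries: pick the closer one.
        if delay_values[i - 1] < delay < delay_values[i]:
            if delay - delay_values[i - 1] < delay_values[i] - delay:
                return i - 1
            return i
    return -errno.EINVAL


if __name__ == "__main__":
    print(internal_delay_index(0, [], fixed=True))
    try:
        internal_delay_index(0, [], fixed=False)
    except IndexError:
        print("pre-fix guard indexes an empty table")
```

A delay of 0 with no table is the one input that distinguishes the two guards, which matches the device-tree scenario described in the commit message.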
2024-03-08	octeontx2-af: Fix devlink params	Sunil Goutham
	The devlink param for adjusting the NPC MCAM high zone area is in the wrong param list and is not getting activated on CN10KA silicon. This patch fixes the issue. Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	net: ip_tunnel: make sure to pull inner header in ip_tunnel_rcv()	Eric Dumazet
	Apply the same fix than ones found in : 8d975c15c0cd ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()") 1ca1ba465e55 ("geneve: make sure to pull inner header in geneve_rx()") We have to save skb->network_header in a temporary variable in order to be able to recompute the network_header pointer after a pskb_inet_may_pull() call. pskb_inet_may_pull() makes sure the needed headers are in skb->head. syzbot reported: BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] BUG: KMSAN: uninit-value in ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409 __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409 __ipgre_rcv+0x9bc/0xbc0 net/ipv4/ip_gre.c:389 ipgre_rcv net/ipv4/ip_gre.c:411 [inline] gre_rcv+0x423/0x19f0 net/ipv4/ip_gre.c:447 gre_rcv+0x2a4/0x390 net/ipv4/gre_demux.c:163 ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5534 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648 netif_receive_skb_internal net/core/dev.c:5734 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5793 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1556 tun_get_user+0x53b9/0x66e0 drivers/net/tun.c:2009 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055 call_write_iter include/linux/fs.h:2087 [inline] new_sync_write 
fs/read_write.c:497 [inline] vfs_write+0xb6b/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590 alloc_pages_mpol+0x62b/0x9d0 mm/mempolicy.c:2133 alloc_pages+0x1be/0x1e0 mm/mempolicy.c:2204 skb_page_frag_refill+0x2bf/0x7c0 net/core/sock.c:2909 tun_build_skb drivers/net/tun.c:1686 [inline] tun_get_user+0xe0a/0x66e0 drivers/net/tun.c:1826 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055 call_write_iter include/linux/fs.h:2087 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb6b/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	ipv6: fib6_rules: flush route cache when rule is changed	Shiming Cheng
	When a policy rule is changed, the IPv6 socket route cache is not refreshed. The socket's skb still uses an outdated route cache entry and is sent to the wrong interface. To avoid this, update the fib node's version when a rule is changed. The skb's route is then re-checked, because the route cache version no longer matches the fib node version, and the route cache is refreshed to match the latest rule. Fixes: 101367c2f8c4 ("[IPV6]: Policy Routing Rules") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	gpio: sysfs: repair export returning -EPERM on 1st attempt	Alexander Sverdlin
	It would make sense to return -EPERM if the bit was already set (already used), not if it was cleared. Before this fix pins can only be exported on the 2nd attempt: $ echo 522 > /sys/class/gpio/export sh: write error: Operation not permitted $ echo 522 > /sys/class/gpio/export Fixes: 35b545332b80 ("gpio: remove gpio_lock") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-03-08	i2c: aspeed: Fix the dummy irq expected print	Tommy Huang
	When the i2c error condition occurred and master state was not idle, the master irq function will goto complete state without any other interrupt handling. It would cause dummy irq expected print. Under this condition, assign the irq_status into irq_handle. For example, when the abnormal start / stop occurred (bit 5) with normal stop status (bit 4) at same time. Then the normal stop status would not be handled and it would cause irq expected print in the aspeed_i2c_bus_irq. ... aspeed-i2c-bus x. i2c-bus: irq handled != irq. Expected 0x00000030, but was 0x00000020 ... Fixes: 3e9efc3299dd ("i2c: aspeed: Handle master/slave combined irq events properly") Cc: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com> Signed-off-by: Tommy Huang <tommy_huang@aspeedtech.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-03-08	i2c: wmt: Fix an error handling path in wmt_i2c_probe()	Christophe JAILLET
	wmt_i2c_reset_hardware() calls clk_prepare_enable(). So, should an error occur after it, it should be undone by a corresponding clk_disable_unprepare() call, as already done in the remove function. Fixes: 560746eb79d3 ("i2c: vt8500: Add support for I2C bus on Wondermedia SoCs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-03-08	i2c: i801: Avoid potential double call to gpiod_remove_lookup_table	Heiner Kallweit
	If registering the platform device fails, the lookup table is removed in the error path. On module removal we would try to remove the lookup table again. Fix this by setting priv->lookup only if registering the platform device was successful. In addition free the memory allocated for the lookup table in the error path. Fixes: d308dfbf62ef ("i2c: mux/i801: Switch to use descriptor passing") Cc: stable@vger.kernel.org Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-03-08	i2c: i801: Fix using mux_pdev before it's set	Heiner Kallweit
	i801_probe_optional_slaves() is called before i801_add_mux(). This results in mux_pdev being checked before it's set by i801_add_mux(). Fix this by changing the order of the calls. I consider this safe as I see no dependencies. Fixes: 80e56b86b59e ("i2c: i801: Simplify class-based client device instantiation") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-03-08	Merge branches 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next	Joerg Roedel
2024-03-08	iommu: Fix compilation without CONFIG_IOMMU_INTEL	Bert Karwatzki
	When the kernel is compiled with CONFIG_IRQ_REMAP=y but without CONFIG_IOMMU_INTEL compilation fails since commit def054b01a8678 with an undefined reference to device_rbtree_find(). This patch makes sure that intel specific code is only compiled with CONFIG_IOMMU_INTEL=y. Signed-off-by: Bert Karwatzki <spasswolf@web.de> Fixes: 80a9b50c0b9e ("iommu/vt-d: Improve ITE fault handling if target device isn't present") Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240307194419.15801-1-spasswolf@web.de Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-08	x86/sev: Disable KMSAN for memory encryption TUs	Changbin Du
	Instrumenting sev.c and mem_encrypt_identity.c with KMSAN will result in a triple-faulting kernel. Some of the code is invoked too early during boot, before KMSAN is ready. Disable KMSAN instrumentation for the two translation units. [ bp: Massage commit message. ] Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240308044401.1120395-1-changbin.du@huawei.com
2024-03-08	iommu/amd: Fix sleeping in atomic context	Vasant Hegde
	Commit cf70873e3d01 ("iommu/amd: Refactor GCR3 table helper functions") changed GFP flag we use for GCR3 table. Original plan was to move GCR3 table allocation outside spinlock. But this requires complete rework of attach device path. Hence we didn't do it as part of SVA series. For now revert the GFP flag to ATOMIC (same as original code). Fixes: cf70873e3d01 ("iommu/amd: Refactor GCR3 table helper functions") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20240307052738.116035-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-08	Merge tag 'asoc-fix-v6.8-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus	Takashi Iwai
	ASoC: Fixes for v6.8 Some more driver specific fixes for v6.8, plus one new x86 platform quirk. All good fixes to have if you have systems that use the relevant hardware.
2024-03-07	Merge branch 'netdev-add-per-queue-statistics'	Jakub Kicinski
	Jakub Kicinski says: ==================== netdev: add per-queue statistics Per queue stats keep coming up, so it's about time someone laid the foundation. This series adds the uAPI, a handful of stats and a sample support for bnxt. It's not very comprehensive in terms of stat types or driver support. The expectation is that the support will grow organically. If we have the basic pieces in place it will be easy for reviewers to request new stats, or use of the API in place of ethtool -S. See patch 3 for sample output. v2: https://lore.kernel.org/all/20240229010221.2408413-1-kuba@kernel.org/ v1: https://lore.kernel.org/all/20240226211015.1244807-1-kuba@kernel.org/ rfc: https://lore.kernel.org/all/20240222223629.158254-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20240306195509.1502746-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	eth: bnxt: support per-queue statistics	Jakub Kicinski
	Support per-queue statistics API in bnxt. $ ethtool -S eth0 NIC statistics: [0]: rx_ucast_packets: 1418 [0]: rx_mcast_packets: 178 [0]: rx_bcast_packets: 0 [0]: rx_discards: 0 [0]: rx_errors: 0 [0]: rx_ucast_bytes: 1141815 [0]: rx_mcast_bytes: 16766 [0]: rx_bcast_bytes: 0 [0]: tx_ucast_packets: 1734 ... $ ./cli.py --spec netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue"}' [{'ifindex': 2, 'queue-id': 0, 'queue-type': 'rx', 'rx-alloc-fail': 0, 'rx-bytes': 1164931, 'rx-packets': 1641}, ... {'ifindex': 2, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 631494, 'tx-packets': 1771}, ... Reset the per queue counters: $ ethtool -L eth0 combined 4 Inspect again: $ ./cli.py --spec netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue"}' [{'ifindex': 2, 'queue-id': 0, 'queue-type': 'rx', 'rx-alloc-fail': 0, 'rx-bytes': 32397, 'rx-packets': 145}, ... {'ifindex': 2, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 37481, 'tx-packets': 196}, ... $ ethtool -S eth0 | head NIC statistics: [0]: rx_ucast_packets: 174 [0]: rx_mcast_packets: 3 [0]: rx_bcast_packets: 0 [0]: rx_discards: 0 [0]: rx_errors: 0 [0]: rx_ucast_bytes: 37151 [0]: rx_mcast_bytes: 267 [0]: rx_bcast_bytes: 0 [0]: tx_ucast_packets: 267 ... 
Totals are still correct: $ ./cli.py --spec netlink/specs/netdev.yaml --dump qstats-get [{'ifindex': 2, 'rx-alloc-fail': 0, 'rx-bytes': 281949995, 'rx-packets': 216524, 'tx-bytes': 52694905, 'tx-packets': 75546}] $ ip -s link show dev eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 14:23:f2:61:05:40 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 282519546 218100 0 0 0 516 TX: bytes packets errors dropped carrier collsns 53323054 77674 0 0 0 0 Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	netdev: add queue stat for alloc failures	Jakub Kicinski
	Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	netdev: add per-queue statistics	Jakub Kicinski
	The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	Merge branch 'net-group-together-hot-data'	Jakub Kicinski
	Eric Dumazet says: ==================== net: group together hot data While our recent structure reorganizations were focused on increasing max throughput, there is still an area where improvements are much needed. In many cases, a cpu handles one packet at a time, instead of a nice batch. Hardware interrupt. -> Software interrupt. -> Network/Protocol stacks. If the cpu was idle or busy in other layers, it has to pull many cache lines. This series adds a new net_hotdata structure, where some critical (and read-mostly) data used in rx and tx path is packed in a small number of cache lines. Synthetic benchmarks will not see much difference, but latency of single packet should improve. net_hotdata current size on 64bit is 416 bytes, but might grow in the future. Also move RPS definitions to a new include file. ==================== Link: https://lore.kernel.org/r/20240306160031.874438-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move rps_sock_flow_table to net_hotdata	Eric Dumazet
	rps_sock_flow_table and rps_cpu_mask are used in fast path. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-19-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: introduce include/net/rps.h	Eric Dumazet
	Move RPS related structures and helpers from include/linux/netdevice.h and include/net/sock.h to a new include file. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	ipv6: move tcp_ipv6_hash_secret and udp_ipv6_hash_secret to net_hotdata	Eric Dumazet
	Use a 32bit hole in "struct net_offload" to store the remaining 32bit secrets used by TCPv6 and UDPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-17-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	ipv6: move inet6_ehash_secret and udp6_ehash_secret into net_hotdata	Eric Dumazet
	"struct inet6_protocol" has a 32bit hole in 32bit arches. Use it to store the 32bit secret used by UDP and TCP, to increase cache locality in rx path. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-16-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	inet: move inet_ehash_secret and udp_ehash_secret into net_hotdata	Eric Dumazet
	"struct net_protocol" has a 32bit hole in 32bit arches. Use it to store the 32bit secret used by UDP and TCP, to increase cache locality in rx path. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-15-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	inet: move tcp_protocol and udp_protocol to net_hotdata	Eric Dumazet
	These structures are read in rx path, move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-14-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	ipv6: move tcpv6_protocol and udpv6_protocol to net_hotdata	Eric Dumazet
	These structures are read in rx path, move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-13-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	udp: move udpv4_offload and udpv6_offload to net_hotdata	Eric Dumazet
	These structures are used in GRO and GSO paths. Move them to net_hotdata for better cache locality. v2: udpv6_offload definition depends on CONFIG_INET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move skbuff_cache(s) to net_hotdata	Eric Dumazet
	skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move dev_rx_weight to net_hotdata	Eric Dumazet
	dev_rx_weight is read from process_backlog(). Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move dev_tx_weight to net_hotdata	Eric Dumazet
	dev_tx_weight is used in tx fast path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move tcpv4_offload and tcpv6_offload to net_hotdata	Eric Dumazet
	These are used in TCP fast paths. Move them into net_hotdata for better cache locality. v2: tcpv6_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move ip_packet_offload and ipv6_packet_offload to net_hotdata	Eric Dumazet
	These structures are used in GRO and GSO paths. v2: ipv6_packet_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move netdev_max_backlog to net_hotdata	Eric Dumazet
	netdev_max_backlog is used in rx fast path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move ptype_all into net_hotdata	Eric Dumazet
	ptype_all is used in rx/tx fast paths. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move netdev_tstamp_prequeue into net_hotdata	Eric Dumazet
	netdev_tstamp_prequeue is used in rx path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move netdev_budget and netdev_budget_usecs to net_hotdata	Eric Dumazet
	netdev_budget and netdev_budget_usecs are used in rx path (net_rx_action()). Move them into net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: introduce struct net_hotdata	Eric Dumazet
	Instead of spreading networking critical fields all over the place, add a custom net_hotdata structure so that we can precisely control its layout. In this first patch, move: - gro_normal_batch used in rx (GRO stack) - offload_base used in rx and tx (GRO and TSO stacks) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
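	The idea behind the series above (packing read-mostly fast-path fields into one structure so they share few cache lines) can be sketched in userspace C. This is only an illustration, not the kernel's definition: struct hot_data, its field types, the 64-byte CACHE_LINE value and the cache_lines_spanned() helper are all assumptions made for the example; only the field names are borrowed from the commit subjects.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size; real hardware may differ. */
#define CACHE_LINE 64

/*
 * Hypothetical stand-in for a "hot data" structure: read-mostly tunables
 * that a fast path consults are kept next to each other. Aligning the
 * first member to a cache line raises the alignment of the whole struct,
 * so an instance always starts on a cache line boundary.
 */
struct hot_data {
	alignas(CACHE_LINE) int gro_normal_batch;
	int netdev_budget;
	unsigned int netdev_budget_usecs;
	int max_backlog;
	int dev_tx_weight;
	int dev_rx_weight;
};

/* Number of cache lines an aligned object of the given size occupies. */
static size_t cache_lines_spanned(size_t size)
{
	return (size + CACHE_LINE - 1) / CACHE_LINE;
}
```

	Under these assumptions, the six fields occupy a single cache line, whereas six unrelated globals could each land on a different line. Applying the same helper to the 416 bytes quoted in the cover letter gives 7 lines of 64 bytes.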
2024-03-07	Merge branch 'selftests-mptcp-share-code-and-fix-shellcheck-warnings'	Jakub Kicinski
	Matthieu Baerts says: ==================== selftests: mptcp: share code and fix shellcheck warnings This series cleans MPTCP selftests code. Patch 1 stops using 'iptables-legacy' if available, but uses 'iptables', which is likely 'iptables-nft' behind. Patches 2, 4 and 6 move duplicated code to mptcp_lib.sh. Patch 3 is a preparation for patch 4, and patch 5 adds generic actions at the creation and deletion of netns. Patches 7 to 11 disable a few shellcheck warnings, and fix the rest, so it is easy to spot real issues later. MPTCP CI is checking that now. Patch 12 avoids redoing some actions at init time twice, e.g. restarting the pm events tool. v1: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v1-0-66618ea5504e@kernel.org ==================== Link: https://lore.kernel.org/r/20240306-upstream-net-next-20240304-selftests-mptcp-shared-code-shellcheck-v2-0-bc79e6e5e6a0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>