|
David Howells says:
====================
rxrpc: Miscellaneous fixes (part)
Here are some miscellaneous fixes for AF_RXRPC:
(1) Fix the congestion control algorithm to start cwnd at 4 and to not cut
ssthresh when the peer cuts its rwind size.
(2) Only transmit a single ACK for all the DATA packets glued together
into a jumbo packet to reduce the number of ACKs being generated.
====================
Link: https://lore.kernel.org/r/20240503150749.1001323-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only generate one ACK packet for all the subpackets in a jumbo packet. If
we would like to generate more than one ACK, we prioritise them based on
their reason code, in the following order, highest first:
OutOfSeq > NoSpace > ExceedsWin > Duplicate > Requested > Delay > Idle
For the first four, we reference the lowest offending subpacket; for the
last three, the highest.
This reduces the number of ACKs we end up transmitting to one per UDP
packet transmitted, which reduces network loading and packet parsing.
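As a rough illustration (a sketch, not the actual rxrpc code; the table and
helper names are assumed), the prioritisation can be thought of as a small
priority table keyed by the ACK reason code:
  static const u8 ack_prio[] = {
      [RXRPC_ACK_IDLE]            = 1,
      [RXRPC_ACK_DELAY]           = 2,
      [RXRPC_ACK_REQUESTED]       = 3,
      [RXRPC_ACK_DUPLICATE]       = 4,
      [RXRPC_ACK_EXCEEDS_WINDOW]  = 5,
      [RXRPC_ACK_NOSPACE]         = 6,
      [RXRPC_ACK_OUT_OF_SEQUENCE] = 7,
  };

  /* Keep only the highest-priority reason seen across the jumbo subpackets. */
  static u8 pick_ack_reason(u8 prev, u8 reason)
  {
      return ack_prio[reason] > ack_prio[prev] ? reason : prev;
  }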
Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Link: https://lore.kernel.org/r/20240503150749.1001323-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make the following fixes to the congestion control algorithm:
(1) Don't vary the cwnd starting value by the size of RXRPC_TX_SMSS since
that's currently held constant - set to the size of a jumbo subpacket
payload so that we can create jumbo packets on the fly. The current
code invariably picks 3 as the starting value.
Further, the starting cwnd needs to be an even number because we ack
every other packet, so set it to 4.
(2) Don't cut ssthresh when we see an ACK come from the peer with a
receive window (rwind) less than ssthresh. ssthresh keeps track of
characteristics of the connection whereas rwind may be reduced by the
peer for any reason - and may be reduced to 0.
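A minimal sketch of the two rules (field names are assumed for illustration;
this is not the actual rxrpc code):
  /* (1) Start cwnd at a fixed, even value of 4 because we ACK every other
   *     DATA packet; don't derive it from RXRPC_TX_SMSS (which always gave 3).
   */
  call->cong_cwnd = 4;

  /* (2) On receiving an ACK, clamp the transmit window to the peer's rwind,
   *     but leave ssthresh alone - rwind is only a momentary hint and may be 0.
   */
  call->tx_winsize = rwind;
  /* no longer: cut cong_ssthresh down to rwind */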
Fixes: 1fc4fa2ac93d ("rxrpc: Fix congestion management")
Fixes: 0851115090a3 ("rxrpc: Reduce ssthresh to peer's receive window")
Signed-off-by: David Howells <dhowells@redhat.com>
Suggested-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Link: https://lore.kernel.org/r/20240503150749.1001323-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve this, there is an ongoing effort to make the remove callback
return void. As a first step, all drivers are converted to .remove_new(),
which already returns void. Eventually, after all drivers are converted,
.remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
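A hedged sketch of the mechanical change (the "foo" driver and its
foo_hw_shutdown() helper are made up for illustration):
  #include <linux/platform_device.h>

  /* After conversion: the callback returns void instead of an ignored int. */
  static void foo_remove(struct platform_device *pdev)
  {
      /* Previously "static int foo_remove(...)" ending in "return 0;". */
      foo_hw_shutdown(platform_get_drvdata(pdev));
  }

  static struct platform_driver foo_driver = {
      .remove_new = foo_remove,   /* was: .remove = foo_remove */
      .driver     = { .name = "foo" },
  };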
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
|
|
When LSE atomics are available, BPF atomic instructions are implemented
as single ARM64 atomic instructions, therefore it is easy to enable
these in bpf_arena using the currently available exception handling
setup.
LL_SC atomics use loops and therefore would need more work to enable in
bpf_arena.
Enable LSE atomics based instructions in bpf_arena and use the
bpf_jit_supports_insn() callback to reject atomics in bpf_arena if LSE
atomics are not available.
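A hedged sketch of the rejection logic (close in spirit to, but not
necessarily identical with, the arm64 JIT change):
  bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
  {
      if (!in_arena)
          return true;
      switch (insn->code) {
      case BPF_STX | BPF_ATOMIC | BPF_W:
      case BPF_STX | BPF_ATOMIC | BPF_DW:
          /* Arena atomics need single LSE instructions; LL/SC loops
           * are not handled by the exception handling setup.
           */
          return cpus_have_cap(ARM64_HAS_LSE_ATOMICS);
      }
      return true;
  }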
All atomics and arena_atomics selftests are passing:
[root@ip-172-31-2-216 bpf]# ./test_progs -a atomics,arena_atomics
#3/1 arena_atomics/add:OK
#3/2 arena_atomics/sub:OK
#3/3 arena_atomics/and:OK
#3/4 arena_atomics/or:OK
#3/5 arena_atomics/xor:OK
#3/6 arena_atomics/cmpxchg:OK
#3/7 arena_atomics/xchg:OK
#3 arena_atomics:OK
#10/1 atomics/add:OK
#10/2 atomics/sub:OK
#10/3 atomics/and:OK
#10/4 atomics/or:OK
#10/5 atomics/xor:OK
#10/6 atomics/cmpxchg:OK
#10/7 atomics/xchg:OK
#10 atomics:OK
Summary: 2/14 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240426161116.441-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If we're updating an existing slot, we clear the slot's bitmap bit only to
set it again right after. Just leave the bit set rather than toggling it
off and on, and move the setting of an unused slot's bit into the branch
taken when no file already occupies the slot.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When creating the topology for the test, three veth pairs are created in
the initial network namespace before being moved to one of the network
namespaces created by the test.
On systems where systemd-udev uses MACAddressPolicy=persistent (default
since systemd version 242), this will result in some net devices having
the same MAC address since they were created with the same name in the
initial network namespace. In turn, this leads to arping / ndisc6
failing since packets are dropped by the bridge's loopback filter.
Fix by creating each net device in the correct network namespace instead
of moving it there from the initial network namespace.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240426074015.251854d4@kernel.org/
Fixes: 7648ac72dcd7 ("selftests: net: Add bridge neighbor suppression test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240507113033.1732534-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is possible that the host connected and saw a cm established
event and started sending nvme capsules on the qp, however the
ctrl did not yet see an established event. This is why the
rsp_wait_list exists (for async handling of these cmds, we move
them to a pending list).
Furthermore, it is possible that the ctrl cm times out, resulting
in a connect-error cm event. In this case we hit a bad deref [1]
because in nvmet_rdma_free_rsps we assume that all the responses
are in the free list.
We are freeing the cmds array anyway, so don't even bother to
remove the rsp from the free_list. It is also guaranteed that we
are not racing anything when releasing the queue, so no other
context should be accessing this array.
[1]:
--
Workqueue: nvmet-free-wq nvmet_rdma_free_queue_work [nvmet_rdma]
[...]
pc : nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma]
lr : nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma]
Call trace:
nvmet_rdma_free_rsps+0x78/0xb8 [nvmet_rdma]
nvmet_rdma_free_queue_work+0x88/0x120 [nvmet_rdma]
process_one_work+0x1ec/0x4a0
worker_thread+0x48/0x490
kthread+0x158/0x160
ret_from_fork+0x10/0x18
--
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
lapic_vector_set_in_irr() is already available, use it for checking
pending vectors at the local APIC. No functional change.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20240506175612.1141095-1-jacob.jun.pan@linux.intel.com
|
|
The nsid value is a u32 that comes from nvmet_req_find_ns(). It's
endian data and we're on an error path and both of those raise red
flags. So let's make this safer.
1) Make the buffer large enough for any u32.
2) Remove the unnecessary initialization.
3) Use snprintf() instead of sprintf() for even more safety.
4) The sprintf() function returns the number of bytes printed, not
counting the NUL terminator. It is impossible for the return value to
be <= 0, so delete that check.
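A minimal sketch of the resulting pattern (buffer name is illustrative):
  char buf[11];   /* large enough for any u32 in decimal plus the NUL */

  /* nsid is the u32 found via nvmet_req_find_ns(); snprintf() cannot
   * return <= 0 here, so its return value is not checked.
   */
  snprintf(buf, sizeof(buf), "%u", nsid);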
Fixes: 505363957fad ("nvmet: fix nvme status code when namespace is disabled")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Only four lcluster types here, remove redundant code.
No real logic changes.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240508123357.3266173-1-hsiangkao@linux.alibaba.com
|
|
The number of updated CPU capabilities per netlink event is hard-coded to
16. On systems with more than 16 CPUs (a common case), it takes more than
one thermal netlink event to relay all the new capabilities after an HFI
interrupt. This adds unnecessary overhead to both the kernel and user space
entities.
Increase the number of CPU capabilities updated per event to 64. Any system
with 64 CPUs or less can now update all the capabilities in a single
thermal netlink event.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When processing a hardware update, HFI generates as many thermal netlink
events as needed to relay all the updated CPU capabilities to user space.
The constant HFI_MAX_THERM_NOTIFY_COUNT is the number of CPU capabilities
updated per each of those events.
Give this constant a more descriptive name.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The delay between an HFI interrupt and its corresponding thermal netlink
event has so far been hard-coded to CONFIG_HZ jiffies (1 second). This
delay is too long for hardware that generates updates every tens of
milliseconds.
The HFI driver uses a delayed workqueue to send thermal netlink events. No
subsequent events will be sent if there is pending work.
As a result, much of the information of consecutive hardware updates will
be lost if the workqueue delay is too long. User space entities may act on
obsolete data. If the delay is too short, multiple events may overwhelm
listeners.
Set the delay to 100ms to strike a balance between too many and too few
events. Use milliseconds instead of jiffies to improve readability.
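A hedged sketch of the scheduling change (constant and field names are
assumptions, not necessarily the exact driver code):
  #define HFI_UPDATE_DELAY_MS 100

  /* Schedule the netlink notification work 100ms after the HFI interrupt,
   * expressed in milliseconds rather than a raw CONFIG_HZ jiffies value.
   */
  queue_delayed_work(hfi_updates_wq, &hfi_instance->update_work,
                     msecs_to_jiffies(HFI_UPDATE_DELAY_MS));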
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not a
periodic interval at which HFI updates are processed. It is the delay in
the processing of an HFI update after the arrival of an HFI interrupt.
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix a performance drop caused by an incorrect highest performance value.
On AMD CPUs with Family ID 0x19 and Model IDs ranging from 0x70 to 0x7F,
the low-level power firmware sets a wrong highest performance value.
To resolve this, add a check that accurately determines the CPU family and
model ID, and set the correct highest performance value, eliminating the
performance drop caused by the wrong value.
Before the fix, the highest frequency was set to 4200 MHz; now it is
correctly set to 4971 MHz.
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 4971.0000 400.0000 400.0000
1 0 0 0 0:0:0:0 yes 4971.0000 400.0000 400.0000
2 0 0 1 1:1:1:0 yes 4971.0000 400.0000 4865.8140
3 0 0 1 1:1:1:0 yes 4971.0000 400.0000 400.0000
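A hedged sketch of the model check (the helper name is made up):
  static bool amd_pstate_broken_highest_perf(void)
  {
      /* Family 0x19, models 0x70-0x7F report a bogus highest perf value
       * from the power firmware, so the correct value must be used instead.
       */
      return boot_cpu_data.x86 == 0x19 &&
             boot_cpu_data.x86_model >= 0x70 &&
             boot_cpu_data.x86_model <= 0x7f;
  }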
Fixes: f3a052391822 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218759
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Gaha Bana <gahabana@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Without this change, the netns instances created by this script are not
cleaned up after it finishes. Fix this by calling the cleanup_all_ns
function from ../lib.sh.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the explicit call to "return" in the void ax25_ds_set_timer
function that was introduced in 78a7b5dbc060 ("ax.25: x.25: Remove the
now superfluous sentinel elements from ctl_table array").
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Let's make all IPVS sysctls writable even when the
network namespace is owned by a non-initial user namespace.
Let's keep a few sysctls read-only for non-privileged users:
- sync_qlen_max
- sync_sock_size
- run_estimation
- est_cpulist
- est_nice
I'm trying to be conservative with this to prevent
introducing any security issues. Maybe we can allow
more sysctls to be writable, but let's do this on demand,
when we see a real use case.
This patch is motivated by a user request in the LXC
project [1]. Having this can help with running some
Kubernetes [2] or Docker Swarm [3] workloads inside system
containers.
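A hedged sketch of the intended permission handling (not the exact ipvs
code; the table indexing is illustrative):
  /* Writable from any netns owner, but keep the sensitive entries
   * (sync_qlen_max, sync_sock_size, run_estimation, est_cpulist, est_nice)
   * read-only when the netns is owned by a non-initial user namespace.
   */
  if (net->user_ns != &init_user_ns)
      table[idx].mode = 0444;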
Link: https://github.com/lxc/lxc/issues/4278 [1]
Link: https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103 [2]
Link: https://github.com/moby/libnetwork/blob/3797618f9a38372e8107d8c06f6ae199e1133ae8/osl/namespace_linux.go#L682 [3]
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in
__ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags
instead of testing HDRINCL on the socket to avoid a race condition which
causes uninit-value access.
Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the ipq806x dwmac driver is almost always used attached to the
CPU port of a switch and phy-mode was always set to "rgmii" or "sgmii".
Some devices came up with a special configuration where the PHY is
directly attached to the GMAC port, and in those cases phy-mode needs to
be set to "rgmii-id" to make the PHY work correctly and receive packets.
Since the driver supports only the "rgmii" and "sgmii" modes, when
"rgmii-id" (or one of its variants) is set, the mode is rejected and the
probe fails.
Also add support for these phy-modes to correctly set up PHYs that require
delays applied to tx/rx.
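A hedged sketch of the extended phy-mode handling (not the exact ipq806x
glue code):
  switch (phy_mode) {
  case PHY_INTERFACE_MODE_RGMII:
  case PHY_INTERFACE_MODE_RGMII_ID:
  case PHY_INTERFACE_MODE_RGMII_RXID:
  case PHY_INTERFACE_MODE_RGMII_TXID:
      /* MAC-side setup is the same for all RGMII variants;
       * the tx/rx delays are handled by the PHY itself.
       */
      break;
  case PHY_INTERFACE_MODE_SGMII:
      break;
  default:
      return -EINVAL;     /* anything else is still rejected */
  }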
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enhance the error reporting mechanism in the switchdev framework to
provide more informative and user-friendly error messages.
Following feedback from users struggling to understand the implications
of error messages like "failed (err=-28) to add object (id=2)", this
update aims to clarify what operation failed and how this might impact
the system or network.
With this change, error messages now include a description of the failed
operation, the specific object involved, and a brief explanation of the
potential impact on the system. This approach helps administrators and
developers better understand the context and severity of errors,
facilitating quicker and more effective troubleshooting.
Example of the improved logging:
[ 70.516446] ksz-switch spi0.0 uplink: Failed to add Port Multicast
Database entry (object id=2) with error: -ENOSPC (-28).
[ 70.516446] Failure in updating the port's Multicast Database could
lead to multicast forwarding issues.
[ 70.516446] Current HW/SW setup lacks sufficient resources.
This comprehensive update includes handling for a range of switchdev
object IDs, ensuring that most operations within the switchdev framework
benefit from clearer error reporting.
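A hedged sketch of how an object ID can be turned into a readable
description (subset of IDs shown; not necessarily the exact helper added):
  static const char *switchdev_obj_id_str(enum switchdev_obj_id id)
  {
      switch (id) {
      case SWITCHDEV_OBJ_ID_PORT_VLAN:
          return "Port VLAN";
      case SWITCHDEV_OBJ_ID_PORT_MDB:
          return "Port Multicast Database entry";
      case SWITCHDEV_OBJ_ID_HOST_MDB:
          return "Host Multicast Database entry";
      default:
          return "Unknown object";
      }
  }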
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Different revisions of the Marvell 88q2xxx phy need different init
sequences.
Add init sequence for Rev B1 and Rev B2. Rev B2 init sequence skips one
register write.
Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a broadcast AppleTalk packet is received, prefer queuing it on the
socket whose address matches the address of the interface that received
the packet (and is listening on the correct port). Userspace
applications that handle such packets will usually send a response on
the same socket that received the packet; this fix allows the response
to be sent on the correct interface.
If a socket matching the interface's address is not found, an arbitrary
socket listening on the correct port will be used, if any. This matches
the implementation's previous behavior.
Fixes atalkd's responses to network information requests when multiple
network interfaces are configured to use AppleTalk.
Link: https://lore.kernel.org/netdev/20200722113752.1218-2-vincent.ldev@duvert.net/
Link: https://gist.github.com/VinDuv/4db433b6dce39d51a5b7847ee749b2a4
Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net>
Signed-off-by: Doug Brown <doug@schmorgal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a tracepoint for icmp_send, which can help users to get more
detailed information conveniently when abnormal ICMP events happen.
1. Usecase example:
===================
When an application experiences packet loss due to an unreachable UDP
destination port, the kernel will send an exception message through the
icmp_send function. By adding a tracepoint for icmp_send, developers or
system administrators can obtain detailed information about the UDP
packet loss, including the type, code, source address, destination address,
source port, and destination port. This facilitates troubleshooting of
UDP packet loss issues, especially for network-service applications.
2. Operation Instructions:
==========================
Switch to the tracing directory.
cd /sys/kernel/tracing
Filter for destination port unreachable.
echo "type==3 && code==3" > events/icmp/icmp_send/filter
Enable trace event.
echo 1 > events/icmp/icmp_send/enable
3. Result View:
================
udp_client_erro-11370 [002] ...s.12 124.728002:
icmp_send: icmp_send: type=3, code=3.
From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23
skbaddr=00000000589b167a
Signed-off-by: Peilin He <he.peilin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Liu Chun <liu.chun2@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The change from skb_copy to pskb_copy unfortunately changed the data
copying to omit the ethernet header, since it was pulled before reaching
this point. Fix this by calling __skb_push/pull around pskb_copy.
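The fix boils down to something like the following sketch (skb is assumed
to have had its Ethernet header pulled already, copy is the resulting
sk_buff):
  __skb_push(skb, ETH_HLEN);          /* temporarily expose the header again */
  copy = pskb_copy(skb, GFP_ATOMIC);  /* so the copy includes it */
  __skb_pull(skb, ETH_HLEN);          /* restore the original state */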
Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Oleksij Rempel says:
====================
add DCB and DSCP support for KSZ switches
This patch series is aimed at improving support for DCB (Data Center
Bridging) and DSCP (Differentiated Services Code Point) on KSZ switches.
The main goal is to introduce global DSCP and PCP (Priority Code Point)
mapping support, addressing the limitation of KSZ switches not having
per-port DSCP priority mapping. This involves extending the DSA
framework with new callbacks for managing trust settings for global DSCP
and PCP maps. Additionally, we introduce IEEE 802.1q helpers for default
configurations, benefiting other drivers too.
Change logs are in separate patches.
Compared to v6, this series includes some new patches for global DSCP
mapping support and a QoS selftest script for KSZ9477 switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add tests covering the following functionality on the KSZ9477 switch family:
- default port priority
- global DSCP to Internal Priority Mapping
- apptrust configuration
This script was tested on a KSZ9893R.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Microchip KSZ and LAN variants do not have per port DSCP priority
configuration. Instead there is a global DSCP mapping table.
This patch provides write access to this global DSCP map. In case an entry
is "deleted", we map the corresponding DSCP entry to the best-effort prio,
which is expected to be the default priority for all untagged traffic.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some switches like Microchip KSZ variants do not support per port DSCP
priority configuration. Instead there is a global DSCP mapping table.
To handle it, we will accept set/del requests on any of the user ports,
apply the configuration globally, and update the dcb app entries for all
other ports.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
802.1P (PCP) and DiffServ (DSCP) are now handled by the DCB code. Let it
do all the needed initial configuration.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For KSZ8xxx variants, init the priority-to-queue mapping in the way shown
in the IEEE 802.1Q mapping example.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I tested ETS support on KSZ9893, so it should work on other KSZ989X
variants too, which until now were not listed as supported.
With this change, only the ksz8 family of chips now remains officially
unsupported.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KSZ88X3 switches have different behavior on different ports:
- It seems it is not possible to disable VLAN PCP classification on port
2. This means that, as soon as multiqueue support is enabled, frames with
a VLAN tag will get PCP prios. This behavior does not affect Port 1,
where it is possible to disable PCP prios.
- DSCP classification does not work on Port 2.
Since there are still usable configuration combinations, I added some
quirks to make sure the user gets an appropriate error message if an
impossible configuration is chosen.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add DCB support to configure app trust sources and default port priority.
The following commands can be used for testing:
dcb apptrust set dev lan1 order pcp dscp
dcb app replace dev lan1 default-prio 3
Since it is not possible to configure the DSCP-Prio mapping per port, this
patch only provides the ability to read the switch's global dscp-prio
mapping and a way to enable/disable app trust for DSCP.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KSZ88X3 switches support up to 4 queues. Rework ksz8795_set_prio_queue()
to support KSZ8795 and KSZ88X3 families of switches.
By default, configure KSZ88X3 to use one queue, since it needs special
handling due to priority-related errata. Errata handling is implemented
in a separate patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IEEE 802.1Q specification provides recommendations and examples which
can be used as good default values for different drivers.
This patch implements the mapping examples documented in IEEE 802.1Q-2022,
Annex I "I.3 Traffic type to traffic class mapping", as well as IETF DSCP
naming and a DSCP to Traffic Type mapping inspired by RFC 8325.
These helpers will be used in follow-up patches for the dsa/microchip DCB
implementation.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most Microchip KSZ switches use an Internal Priority Value (IPV)
associated with every frame. For example, it is possible to map any VLAN
PCP or DSCP value to an IPV and, in the end, map the IPV to a queue.
Since the number of IPVs is not equal to the number of queues, add this
information and make use of it in some functions.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add DCB support to get/set the trust configuration for different packet
priority information sources. Some switches allow choosing among different
sources of packet priority classification. For example, on KSZ switches it
is possible to configure VLAN PCP and/or DSCP sources.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When tmigr_setup_groups() fails the level 0 group allocation, the
cleanup dereferences index -1 of the local stack array.
Prevent this by checking the loop condition first.
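A hedged sketch of the cleanup-loop shape (names are illustrative, not the
exact tmigr code):
  /* i counts the groups set up so far; if the level-0 allocation failed,
   * i is 0 and the loop body must not run at all.
   */
  while (i > 0) {
      i--;
      tmigr_free_group(stack[i]);     /* hypothetical cleanup helper */
  }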
Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/20240506041059.86877-1-ppbuk5246@gmail.com
|
|
Use the superblock's UUID to generate the fsid when it's non-null.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
This adds a special global buffer pool (in the end) for reserved pages.
Using a reserved pool for LZ4 decompression significantly reduces the
time spent on extra temporary page allocation for the extreme cases in
low memory scenarios.
The table below shows the reduction in time spent on page allocation for
LZ4 decompression when using a reserved pool. The results were obtained
from multi-app launch benchmarks on ARM64 Android devices running the
5.15 kernel with an 8-core CPU and 8GB of memory. In the benchmark, we
launched 16 frequently-used apps, and the camera app was the last one in
each round. The data in the table is the average time of camera app for
each round.
After using the reserved pool, there was an average improvement of 150ms
in the overall launch time of our camera app, which was obtained from
the systrace log.
+--------------+---------------+--------------+---------+
| | w/o page pool | w/ page pool | diff |
+--------------+---------------+--------------+---------+
| Average (ms) | 3434 | 21 | -99.38% |
+--------------+---------------+--------------+---------+
Based on the benchmark logs, 64 pages are sufficient for 95% of
scenarios. This value can be adjusted with a module parameter
`reserved_pages`. The default value is 0.
This pool is currently only used for the LZ4 decompressor, but it can be
applied to more decompressors if needed.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402131523.2703948-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Let's use alloc_pages_bulk_array() for simplicity and get rid of
unnecessary pagepool.
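For reference, a minimal usage sketch of the bulk allocator (not the EROFS
call site itself):
  struct page *pages[16] = { };
  unsigned long filled;

  /* fills NULL slots in the array; returns how many slots are now populated */
  filled = alloc_pages_bulk_array(GFP_KERNEL, ARRAY_SIZE(pages), pages);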
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402092757.2635257-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
It costs more time if compressed buffers are allocated on demand for
low-latency algorithms (like lz4), so EROFS uses per-CPU buffers to keep
compressed data when in-place decompression is unfulfilled. However, this
is rather wasteful of memory on a device with hundreds of CPUs, since only
a small number of CPUs concurrently decompress most of the time.
This patch renames it as 'global buffer pool' and makes it configurable.
This allows two or more CPUs to share a common buffer to reduce memory
occupation.
Suggested-by: Gao Xiang <xiang@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Currently, utils.c is only useful if CONFIG_EROFS_FS_ZIP is on.
So let's rename it to zutil.c as well as avoid its inclusion if
CONFIG_EROFS_FS_ZIP is explicitly disabled.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240401135550.2550043-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|