linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-11-11	net: atlantic: use irq_update_affinity_hint()	Mohammad Heib
	irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. atlantic device reopening will resets the affinity in aq_ndev_open(). 3. atlantic has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107120739.415743-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	nfp: use irq_update_affinity_hint()	Mohammad Heib
	irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. nfp device reopening will resets the affinity in nfp_net_netdev_open(). 3. nfp has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241107115002.413358-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	bnxt_en: use irq_update_affinity_hint()	Mohammad Heib
	irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. bnxt_en device reopening will resets the affinity in bnxt_open(). 3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net: fix data-races around sk->sk_forward_alloc	Wang Liang
	Syzkaller reported this warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x1c5/0x1e0 Modules linked in: CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc5 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:inet_sock_destruct+0x1c5/0x1e0 Code: 24 12 4c 89 e2 5b 48 c7 c7 98 ec bb 82 41 5c e9 d1 18 17 ff 4c 89 e6 5b 48 c7 c7 d0 ec bb 82 41 5c e9 bf 18 17 ff 0f 0b eb 83 <0f> 0b eb 97 0f 0b eb 87 0f 0b e9 68 ff ff ff 66 66 2e 0f 1f 84 00 RSP: 0018:ffffc9000008bd90 EFLAGS: 00010206 RAX: 0000000000000300 RBX: ffff88810b172a90 RCX: 0000000000000007 RDX: 0000000000000002 RSI: 0000000000000300 RDI: ffff88810b172a00 RBP: ffff88810b172a00 R08: ffff888104273c00 R09: 0000000000100007 R10: 0000000000020000 R11: 0000000000000006 R12: ffff88810b172a00 R13: 0000000000000004 R14: 0000000000000000 R15: ffff888237c31f78 FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc63fecac8 CR3: 000000000342e000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x88/0x130 ? inet_sock_destruct+0x1c5/0x1e0 ? report_bug+0x18e/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x1c5/0x1e0 __sk_destruct+0x2a/0x200 rcu_do_batch+0x1aa/0x530 ? rcu_do_batch+0x13b/0x530 rcu_core+0x159/0x2f0 handle_softirqs+0xd3/0x2b0 ? __pfx_smpboot_thread_fn+0x10/0x10 run_ksoftirqd+0x25/0x30 smpboot_thread_fn+0xdd/0x1d0 kthread+0xd3/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Its possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add() concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked, which triggers a data-race around sk->sk_forward_alloc: tcp_v6_rcv tcp_v6_do_rcv skb_clone_and_charge_r sk_rmem_schedule __sk_mem_schedule sk_forward_alloc_add() skb_set_owner_r sk_mem_charge sk_forward_alloc_add() __kfree_skb skb_release_all skb_release_head_state sock_rfree sk_mem_uncharge sk_forward_alloc_add() sk_mem_reclaim // set local var reclaimable __sk_mem_reclaim sk_forward_alloc_add() In this syzkaller testcase, two threads call tcp_v6_do_rcv() with skb->truesize=768, the sk_forward_alloc changes like this: (cpu 1) \| (cpu 2) \| sk_forward_alloc ... \| ... \| 0 __sk_mem_schedule() \| \| +4096 = 4096 \| __sk_mem_schedule() \| +4096 = 8192 sk_mem_charge() \| \| -768 = 7424 \| sk_mem_charge() \| -768 = 6656 ... \| ... \| sk_mem_uncharge() \| \| +768 = 7424 reclaimable=7424 \| \| \| sk_mem_uncharge() \| +768 = 8192 \| reclaimable=8192 \| __sk_mem_reclaim() \| \| -4096 = 4096 \| __sk_mem_reclaim() \| -8192 = -4096 != 0 The skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when sk->sk_state is TCP_LISTEN, it happens later in tcp_v6_syn_recv_sock(). Fix the same issue in dccp_v6_do_rcv(). Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20241107023405.889239-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	rxrpc: Add a tracepoint for aborts being proposed	David Howells
	Add a tracepoint to rxrpc to trace the proposal of an abort. The abort is performed asynchronously by the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/726356.1730898045@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	ipv6: Fix soft lockups in fib6_select_path under high next hop churn	Omid Ehtemam-Haghighi
	Soft lockups have been observed on a cluster of Linux-based edge routers located in a highly dynamic environment. Using the `bird` service, these routers continuously update BGP-advertised routes due to frequently changing nexthop destinations, while also managing significant IPv6 traffic. The lockups occur during the traversal of the multipath circular linked-list in the `fib6_select_path` function, particularly while iterating through the siblings in the list. The issue typically arises when the nodes of the linked list are unexpectedly deleted concurrently on a different core—indicated by their 'next' and 'previous' elements pointing back to the node itself and their reference count dropping to zero. This results in an infinite loop, leading to a soft lockup that triggers a system panic via the watchdog timer. Apply RCU primitives in the problematic code sections to resolve the issue. Where necessary, update the references to fib6_siblings to annotate or use the RCU APIs. Include a test script that reproduces the issue. The script periodically updates the routing table while generating a heavy load of outgoing IPv6 traffic through multiple iperf3 clients. It consistently induces infinite soft lockups within a couple of minutes. Kernel log: 0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb 1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3 2 [ffffbd13003e8e58] panic at ffffffff8cef65d4 3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03 4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f 5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756 6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af 7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d -- <IRQ stack> -- 8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb [exception RIP: fib6_select_path+299] RIP: ffffffff8ddafe7b RSP: ffffbd13003d37b8 RFLAGS: 00000287 RAX: ffff975850b43600 RBX: ffff975850b40200 RCX: 0000000000000000 RDX: 000000003fffffff RSI: 0000000051d383e4 RDI: ffff975850b43618 RBP: ffffbd13003d3800 R8: 0000000000000000 R9: ffff975850b40200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffbd13003d3830 R13: ffff975850b436a8 R14: ffff975850b43600 R15: 0000000000000007 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c 10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c 11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5 12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47 13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0 14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274 15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474 16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615 17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec 18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3 19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9 20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice] 21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice] 22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice] 23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000 24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581 25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9 26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47 27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30 28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f 29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64 30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Adrian Oliver <kernel@aoliver.ca> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241106010236.1239299-1-omid.ehtemamhaghighi@menlosecurity.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()	Catalin Marinas
	Since commit 49f59573e9e0 ("selftests/mm: Enable pkey_sighandler_tests on arm64"), pkey_sighandler_tests.c (which includes pkey-arm64.h via pkey-helpers.h) ends up compiled for arm64. Since it doesn't use aarch64_write_signal_pkey(), the compiler warns: In file included from pkey-helpers.h:106, from pkey_sighandler_tests.c:31: pkey-arm64.h:130:13: warning: ‘aarch64_write_signal_pkey’ defined but not used [-Wunused-function] 130 \| static void aarch64_write_signal_pkey(ucontext_t *uctxt, u64 pkey) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ Make the aarch64_write_signal_pkey() a 'static inline void' function to avoid the compiler warning. Fixes: f5b5ea51f78f ("selftests: mm: make protection_keys test work on arm64") Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20241108110549.1185923-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests	Catalin Marinas
	Fix the incorrect length modifiers in arm64/abi/syscall-abi.c. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241108134920.1233992-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test	Catalin Marinas
	While prctl() returns an 'int', the PR_MTE_TCF_MASK is defined as unsigned long which results in the larger type following a bitwise 'and' operation. Cast the printf() argument to 'int'. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241108134920.1233992-3-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests	Catalin Marinas
	Lots of incorrect length modifiers, missing arguments or conversion specifiers. Fix them. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241108134920.1233992-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	kselftest/arm64: Fix build with stricter assemblers	Mark Brown
	While some assemblers (including the LLVM assembler I mostly use) will happily accept SMSTART as an instruction by default others, specifically gas, require that any architecture extensions be explicitly enabled. The assembler SME test programs use manually encoded helpers for the new instructions but no SMSTART helper is defined, only SM and ZA specific variants. Unfortunately the irritators that were just added use plain SMSTART so on stricter assemblers these fail to build: za-test.S:160: Error: selected processor does not support `smstart' Switch to using SMSTART ZA via the manually encoded smstart_za macro we already have defined. Fixes: d65f27d240bb ("kselftest/arm64: Implement irritators for ZA and ZT") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241108-arm64-selftest-asm-error-v1-1-7ce27b42a677@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-11	Merge branch 'knobs-for-npc-default-rule-counters'	Jakub Kicinski
	Linu Cherian says: ==================== Knobs for NPC default rule counters Patch 1 introduce _rvu_mcam_remove/add_counter_from/to_rule by refactoring existing code Patch 2 adds a devlink param to enable/disable counters for default rules. Once enabled, counters can Patch 3 adds documentation for devlink params v4: https://lore.kernel.org/20241029035739.1981839-1-lcherian@marvell.com ==================== Link: https://patch.msgid.link/20241105125620.2114301-1-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	devlink: Add documentation for OcteonTx2 AF	Linu Cherian
	Add documentation for the following devlink params - npc_mcam_high_zone_percent - npc_def_rule_cntr - nix_maxlf Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-4-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	octeontx2-af: Knobs for NPC default rule counters	Linu Cherian
	Add devlink knobs to enable/disable counters on NPC default rule entries. Sample command to enable default rule counters: devlink dev param set <dev> name npc_def_rule_cntr value true cmode runtime Sample command to read the counter: cat /sys/kernel/debug/cn10k/npc/mcam_rules Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-3-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	octeontx2-af: Refactor few NPC mcam APIs	Linu Cherian
	Introduce lowlevel variant of rvu_mcam_remove/add_counter_from/to_rule for better code reuse, which assumes necessary locks are taken at higher level. These low level functions would be used for implementing default rule counter APIs in the subsequent patch. Signed-off-by: Linu Cherian <lcherian@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241105125620.2114301-2-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mlx5/core: deduplicate {mlx5_,}eq_update_ci()	Caleb Sander Mateos
	The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only additional work done by mlx5_eq_update_ci() is to increment eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid the duplication. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183054.2443218-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mlx5/core: relax memory barrier in eq_update_ci()	Caleb Sander Mateos
	The memory barrier in eq_update_ci() after the doorbell write is a significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU time spent on the mfence instruction. 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci update") already relaxed the full memory barrier to just a write barrier in mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with wmb() in eq_update_ci() too. On strongly ordered architectures, no barrier is actually needed because the MMIO writes to the doorbell register are guaranteed to appear to the device in the order they were made. However, the kernel's ordered MMIO primitive writel() lacks a convenient big-endian interface. Therefore, we opt to stick with __raw_writel() + a barrier. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183054.2443218-1-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	Merge branch ↵	Jakub Kicinski
	'macsec-inherit-lower-device-s-features-and-tso-limits-when-offloading' Sabrina Dubroca says: ==================== macsec: inherit lower device's features and TSO limits when offloading When macsec is offloaded to a NIC, we can take advantage of some of its features, mainly TSO and checksumming. This increases performance significantly. Some features cannot be inherited, because they require additional ops that aren't provided by the macsec netdevice. We also need to inherit TSO limits from the lower device, like VLAN/macvlan devices do. This series also moves the existing macsec offload selftest to the netdevsim selftests before adding tests for the new features. To allow this new selftest to work, netdevsim's hw_features are expanded. ==================== Link: https://patch.msgid.link/cover.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	selftests: netdevsim: add ethtool features to macsec offload tests	Sabrina Dubroca
	The test verifies that available features aren't changed by toggling offload on the device. Creating a device with offload off and then enabling it later should result in the same features as creating the device with offload enabled directly. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ba801bd0a75b02de2dddbfc77f9efceb8b3d8a2e.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	selftests: netdevsim: add test toggling macsec offload	Sabrina Dubroca
	The test verifies that toggling offload works (both via rtnetlink and macsec's genetlink APIs). This is only possible when no SA is configured. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/bf8e27ee0d921caa4eb35f1e830eca6d4080ddb2.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	selftests: move macsec offload tests from net/rtnetlink to drivers/net/netdvesim	Sabrina Dubroca
	We're going to expand this test, and macsec offload is only lightly related to rtnetlink. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/a1f92c250cc129b4bb111a206c4b560bab4e24a5.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	macsec: inherit lower device's TSO limits when offloading	Sabrina Dubroca
	If macsec is offloaded, we need to follow the lower device's capabilities, like VLAN devices do. Leave the limits unchanged when the offload is disabled. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8240c0181e851f169d815f59658a01fb9dfc5073.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	macsec: clean up local variables in macsec_notify	Sabrina Dubroca
	For all events, we need to loop over the list of secys, so let's move the common variables out of the switch/case. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/9b8996af518fbeb3b7d527feb15d5788495e3108.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	macsec: add some of the lower device's features when offloading	Sabrina Dubroca
	This commit extends the set of netdevice features supported by macsec devices when offload is enabled, which increases performance significantly (for a single TCP stream: 17.5Gbps to 38.5Gbps on my test machines). Commit c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled") previously attempted something similar, but had to be reverted (commit 8bcd560ae878 ("Revert "net: macsec: report real_dev features when HW offloading is enabled"")) because the set of features it exposed was too large. During initialization, all features are set, and they're then removed via ndo_fix_features (macsec_fix_features). This allows the offloadable features to be automatically enabled if offloading is turned on after device creation. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8b32c3011d269d6f149724e80c1ffe67c9534067.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	selftests: netdevsim: add a test checking ethtool features	Sabrina Dubroca
	Add a test checking that some features are active by default and changeable. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/fff58fa70f8a300440958b5020f6a4eb2e9dad61.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	netdevsim: add more hw_features	Sabrina Dubroca
	netdevsim currently only set HW_TC in its hw_features, but other features should also be present to better reflect the behavior of real HW. In my macsec offload testing, this ends up as HW_CSUM being missing from hw_features, so it doesn't stick in wanted_features when offload is turned off. Then HW_CSUM (and thus TSO, thanks to netdev_fix_features) is not automatically turned back on when offload is re-enabled. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b918dc4dd76410a57f7516a855f66b0a2bd58326.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	Merge tag 'sched_ext-for-6.12-rc7-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - The fair sched class currently has a bug where its balance() returns true telling the sched core that it has tasks to run but then NULL from pick_task(). This makes sched core call sched_ext's pick_task() without preceding balance() which can lead to stalls in partial mode. For now, work around by detecting the condition and forcing the CPU to go through another scheduling cycle. - Add a missing newline to an error message and fix drgn introspection tool which went out of sync. * tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx() sched_ext: Update scx_show_state.py to match scx_ops_bypass_depth's new type sched_ext: Add a missing newline at the end of an error message
2024-11-11	HID: magicmouse: Apple Magic Trackpad 2 USB-C driver support	Callahan Kovacs
	Adds driver support for the USB-C model of Apple's Magic Trackpad 2. The 2024 USB-C model is compatible with the existing Magic Trackpad 2 driver but has a different hardware ID. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219470 Signed-off-by: Callahan Kovacs <callahankovacs@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-11-11	empty include/asm-generic/vga.h	Al Viro
	all places that use anything defined in it (vgacon, mdacon and vga16fb) are built only on architectures that have all that stuff in their native asm/vga.h allows to kill stub asm/vga.h on sh, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-11	sparc: get rid of asm/vga.h	Al Viro
	The only thing we are using it for on sparc is telling vt_buffer.h to pick what it would pick by default anyway - we are not accessing any VRAM here... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-11	asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to	Al Viro
	... if they are identical to fallbacks, just leave them alone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-11	vt_buffer.h: get rid of dead code in default scr_...() instances	Al Viro
	Only 4 architectures define VT_BUF_HAVE_RW (alpha, mips, powerpc, sparc) and all of them define VT_BUF_HAVE_MEM{SET,CPY,MOVE}W. In other words, the code under #ifdef VT_BUF_HAVE_RW in default scr_mem...w() instances won't be compiled anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-11	drm/amdgpu/mes12: correct kiq unmap latency	Jack Xiao
	Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cfe98204a06329b6b7fce1b828b7d620473181ff) Cc: stable@vger.kernel.org # 6.11.x
2024-11-11	drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()	Christian König
	The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b4ca8546f5b5c482717bedb8e031227b1541539) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0	Tim Huang
	Currently, the pp_dpm_mclk values are reported in descending order on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency with other clock interfaces. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d4be16ccfd5bf822176740a51ff2306679a2247e) Cc: stable@vger.kernel.org
2024-11-11	drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size	David Rosca
	H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69e9a9e65b1ea542d07e3fdd4222b46e9f5a3a29) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Adjust VSDB parser for replay feature	Rodrigo Siqueira
	At some point, the IEEE ID identification for the replay check in the AMD EDID was added. However, this check causes the following out-of-bounds issues when using KASAN: [ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu] [ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383 ... [ 27.821207] Memory state around the buggy address: [ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821243] ^ [ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821268] ================================================================== This is caused because the ID extraction happens outside of the range of the edid lenght. This commit addresses this issue by considering the amd_vsdb_block size. Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7e381b1ccd5e778e3d9c44c669ad38439a861d8) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Require minimum VBlank size for stutter optimization	Dillon Varone
	If the nominal VBlank is too small, optimizing for stutter can cause the prefetch bandwidth to increase drasticaly, resulting in higher clock and power requirements. Only optimize if it is >3x the stutter latency. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 003215f962cdf2265f126a3f4c9ad20917f87fca) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Handle dml allocation failure to avoid crash	Ryan Seto
	[Why] In the case where a dml allocation fails for any reason, the current state's dml contexts would no longer be valid. Then subsequent calls dc_state_copy_internal would shallow copy invalid memory and if the new state was released, a double free would occur. [How] Reset dml pointers in new_state to NULL and avoid invalid pointer Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bcafdc61529a48f6f06355d78eb41b3aeda5296c) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Fix Panel Replay not update screen correctly	Tom Chung
	[Why] In certain use case such as KDE login screen, there will be no atomic commit while do the frame update. If the Panel Replay enabled, it will cause the screen not updated and looks like system hang. [How] Delay few atomic commits before enabled the Panel Replay just like PSR. Fixes: be64336307a6c ("drm/amd/display: Re-enable panel replay feature") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3686 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3682 Tested-By: Corey Hickey <bugfood-c@fatooh.org> Tested-By: James Courtier-Dutton <james.dutton@gmail.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca628f0eddd73adfccfcc06b2a55d915bca4a342) Cc: stable@vger.kernel.org # 6.11+
2024-11-11	drm/amd/display: Change some variable name of psr	Tom Chung
	Panel Replay feature may also use the same variable with PSR. Change the variable name and make it not specify for PSR. Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c7fafb7a46b38a11a19342d153f505749bf56f3e) Cc: stable@vger.kernel.org # 6.11+
2024-11-11	Merge branch 'replace-page_frag-with-page_frag_cache-part-1'	Jakub Kicinski
	Yunsheng Lin says: ==================== Replace page_frag with page_frag_cache (Part-1) This is part 1 of "Replace page_frag with page_frag_cache", which mainly contain refactoring and optimization for the implementation of page_frag API before the replacing. As the discussion in [1], it would be better to target net-next tree to get more testing as all the callers page_frag API are in networking, and the chance of conflicting with MM tree seems low as implementation of page_frag API seems quite self-contained. After [2], there are still two implementations for page frag: 1. mm/page_alloc.c: net stack seems to be using it in the rx part with 'struct page_frag_cache' and the main API being page_frag_alloc_align(). 2. net/core/sock.c: net stack seems to be using it in the tx part with 'struct page_frag' and the main API being skb_page_frag_refill(). This patchset tries to unfiy the page frag implementation by replacing page_frag with page_frag_cache for sk_page_frag() first. net_high_order_alloc_disable_key for the implementation in net/core/sock.c doesn't seems matter that much now as pcp is also supported for high-order pages: commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") As the related change is mostly related to networking, so targeting the net-next. And will try to replace the rest of page_frag in the follow patchset. After this patchset: 1. Unify the page frag implementation by taking the best out of two the existing implementations: we are able to save some space for the 'page_frag_cache' API user, and avoid 'get_page()' for the old 'page_frag' API user. 2. Future bugfix and performance can be done in one place, hence improving maintainability of page_frag's implementation. Kernel Image changing: Linux Kernel total \| text data bss ------------------------------------------------------ after 45250307 \| 27274279 17209996 766032 before 45254134 \| 27278118 17209984 766032 delta -3827 \| -3839 +12 +0 1. https://lore.kernel.org/all/add10dd4-7f5d-4aa1-aa04-767590f944e0@redhat.com/ 2. https://lore.kernel.org/all/20240228093013.8263-1-linyunsheng@huawei.com/ ==================== Link: https://patch.msgid.link/20241028115343.3405838-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: page_frag: use __alloc_pages() to replace alloc_pages_node()	Yunsheng Lin
	It seems there is about 24Bytes binary size increase for __page_frag_cache_refill() after refactoring in arm64 system with 64K PAGE_SIZE. By doing the gdb disassembling, It seems we can have more than 100Bytes decrease for the binary size by using __alloc_pages() to replace alloc_pages_node(), as there seems to be some unnecessary checking for nid being NUMA_NO_NODE, especially when page_frag is part of the mm system. CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-8-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'	Yunsheng Lin
	Currently there is one 'struct page_frag' for every 'struct sock' and 'struct task_struct', we are about to replace the 'struct page_frag' with 'struct page_frag_cache' for them. Before begin the replacing, we need to ensure the size of 'struct page_frag_cache' is not bigger than the size of 'struct page_frag', as there may be tens of thousands of 'struct sock' and 'struct task_struct' instances in the system. By or'ing the page order & pfmemalloc with lower bits of 'va' instead of using 'u16' or 'u32' for page size and 'u8' for pfmemalloc, we are able to avoid 3 or 5 bytes space waste. And page address & pfmemalloc & order is unchanged for the same page in the same 'page_frag_cache' instance, it makes sense to fit them together. After this patch, the size of 'struct page_frag_cache' should be the same as the size of 'struct page_frag'. CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-7-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	xtensa: remove the get_order() implementation	Yunsheng Lin
	As the get_order() implemented by xtensa supporting 'nsau' instruction seems be the same as the generic implementation in include/asm-generic/getorder.h when size is not a constant value as the generic implementation calling the fls*() is also utilizing the 'nsau' instruction for xtensa. So remove the get_order() implemented by xtensa, as using the generic implementation may enable the compiler to do the computing when size is a constant value instead of runtime computing and enable the using of get_order() in BUILD_BUG_ON() macro in next patch. CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-6-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: page_frag: avoid caller accessing 'page_frag_cache' directly	Yunsheng Lin
	Use appropriate frag_page API instead of caller accessing 'page_frag_cache' directly. CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20241028115343.3405838-5-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: page_frag: use initial zero offset for page_frag_alloc_align()	Yunsheng Lin
	We are about to use page_frag_alloc_*() API to not just allocate memory for skb->data, but also use them to do the memory allocation for skb frag too. Currently the implementation of page_frag in mm subsystem is running the offset as a countdown rather than count-up value, there may have several advantages to that as mentioned in [1], but it may have some disadvantages, for example, it may disable skb frag coalescing and more correct cache prefetching We have a trade-off to make in order to have a unified implementation and API for page_frag, so use a initial zero offset in this patch, and the following patch will try to make some optimization to avoid the disadvantages as much as possible. 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/ CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-4-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: move the page fragment allocator from page_alloc into its own file	Yunsheng Lin
	Inspired by [1], move the page fragment allocator from page_alloc into its own c file and header file, as we are about to make more change for it to replace another page_frag implementation in sock.c As this patchset is going to replace 'struct page_frag' with 'struct page_frag_cache' in sched.h, including page_frag_cache.h in sched.h has a compiler error caused by interdependence between mm_types.h and mm.h for asm-offsets.c, see [2]. So avoid the compiler error by moving 'struct page_frag_cache' to mm_types_task.h as suggested by Alexander, see [3]. 1. https://lore.kernel.org/all/20230411160902.4134381-3-dhowells@redhat.com/ 2. https://lore.kernel.org/all/15623dac-9358-4597-b3ee-3694a5956920@gmail.com/ 3. https://lore.kernel.org/all/CAKgT0UdH1yD=LSCXFJ=YM_aiA4OomD-2wXykO42bizaWMt_HOA@mail.gmail.com/ CC: David Howells <dhowells@redhat.com> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-3-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	mm: page_frag: add a test module for page_frag	Yunsheng Lin
	The testing is done by ensuring that the fragment allocated from a frag_frag_cache instance is pushed into a ptr_ring instance in a kthread binded to a specified cpu, and a kthread binded to a specified cpu will pop the fragment from the ptr_ring and free the fragment. CC: Andrew Morton <akpm@linux-foundation.org> CC: Linux-MM <linux-mm@kvack.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/20241028115343.3405838-2-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net: convert to nla_get_*_default()	Johannes Berg
	Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange. @@ expression attr, def; expression val; identifier fn =~ "^nla_get_.*"; fresh identifier dfn = fn ## "_default"; @@ ( -if (attr) - val = fn(attr); -else - val = def; +val = dfn(attr, def); \| -if (!attr) - val = def; -else - val = fn(attr); +val = dfn(attr, def); \| -if (!attr) - return def; -return fn(attr); +return dfn(attr, def); \| -attr ? fn(attr) : def +dfn(attr, def) \| -!attr ? def : fn(attr) +dfn(attr, def) ) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>