git.armlinux.org.uk/linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-07-25	net/mlx5e: Fix potential deadlock by deferring RX timeout recovery	Shahar Shitrit
	mlx5e_reporter_rx_timeout() is currently invoked synchronously in the driver's open error flow. This causes the thread holding priv->state_lock to attempt acquiring the devlink lock, which can result in a circular dependency with other devlink operations. For example: - Devlink health diagnose flow: - __devlink_nl_pre_doit() acquires the devlink lock. - devlink_nl_health_reporter_diagnose_doit() invokes the driver's diagnose callback. - mlx5e_rx_reporter_diagnose() then attempts to acquire priv->state_lock. - Driver open flow: - mlx5e_open() acquires priv->state_lock. - If an error occurs, devlink_health_reporter may be called, attempting to acquire the devlink lock. To prevent this circular locking scenario, defer the RX timeout recovery by scheduling it via a workqueue. This ensures that the recovery work acquires locks in a consistent order: first the devlink lock, then priv->state_lock. Additionally, make the recovery work acquire the netdev instance lock to safely synchronize with the open/close channel flows, similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire the netdev instance lock until it is taken or the target RQ is no longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit. Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25	net/mlx5e: Remove skb secpath if xfrm state is not found	Jianbo Liu
	Hardware returns a unique identifier for a decrypted packet's xfrm state, this state is looked up in an xarray. However, the state might have been freed by the time of this lookup. Currently, if the state is not found, only a counter is incremented. The secpath (sp) extension on the skb is not removed, resulting in sp->len becoming 0. Subsequently, functions like __xfrm_policy_check() attempt to access fields such as xfrm_input_state(skb)->xso.type (which dereferences sp->xvec[sp->len - 1]) without first validating sp->len. This leads to a crash when dereferencing an invalid state pointer. This patch prevents the crash by explicitly removing the secpath extension from the skb if the xfrm state is not found after hardware decryption. This ensures downstream functions do not operate on a zero-length secpath. BUG: unable to handle page fault for address: ffffffff000002c8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 282e067 P4D 282e067 PUD 0 Oops: Oops: 0000 [#1] SMP CPU: 12 UID: 0 PID: 0 Comm: swapper/12 Not tainted 6.15.0-rc7_for_upstream_min_debug_2025_05_27_22_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:__xfrm_policy_check+0x61a/0xa30 Code: b6 77 7f 83 e6 02 74 14 4d 8b af d8 00 00 00 41 0f b6 45 05 c1 e0 03 48 98 49 01 c5 41 8b 45 00 83 e8 01 48 98 49 8b 44 c5 10 <0f> b6 80 c8 02 00 00 83 e0 0c 3c 04 0f 84 0c 02 00 00 31 ff 80 fa RSP: 0018:ffff88885fb04918 EFLAGS: 00010297 RAX: ffffffff00000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffffffff8311af80 R08: 0000000000000020 R09: 00000000c2eda353 R10: ffff88812be2bbc8 R11: 000000001faab533 R12: ffff88885fb049c8 R13: ffff88812be2bbc8 R14: 0000000000000000 R15: ffff88811896ae00 FS: 0000000000000000(0000) GS:ffff8888dca82000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff000002c8 CR3: 0000000243050002 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? try_to_wake_up+0x108/0x4c0 ? udp4_lib_lookup2+0xbe/0x150 ? udp_lib_lport_inuse+0x100/0x100 ? __udp4_lib_lookup+0x2b0/0x410 __xfrm_policy_check2.constprop.0+0x11e/0x130 udp_queue_rcv_one_skb+0x1d/0x530 udp_unicast_rcv_skb+0x76/0x90 __udp4_lib_rcv+0xa64/0xe90 ip_protocol_deliver_rcu+0x20/0x130 ip_local_deliver_finish+0x75/0xa0 ip_local_deliver+0xc1/0xd0 ? ip_protocol_deliver_rcu+0x130/0x130 ip_sublist_rcv+0x1f9/0x240 ? ip_rcv_finish_core+0x430/0x430 ip_list_rcv+0xfc/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x200/0x360 ? mlx5e_build_rx_skb+0x1bc/0xda0 [mlx5_core] gro_receive_skb+0xfd/0x210 mlx5e_handle_rx_cqe_mpwrq+0x141/0x280 [mlx5_core] mlx5e_poll_rx_cq+0xcc/0x8e0 [mlx5_core] ? mlx5e_handle_rx_dim+0x91/0xd0 [mlx5_core] mlx5e_napi_poll+0x114/0xab0 [mlx5_core] __napi_poll+0x25/0x170 net_rx_action+0x32d/0x3a0 ? mlx5_eq_comp_int+0x8d/0x280 [mlx5_core] ? notifier_call_chain+0x33/0xa0 handle_softirqs+0xda/0x250 irq_exit_rcu+0x6d/0xc0 common_interrupt+0x81/0xa0 </IRQ> Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25	net/mlx5e: Clear Read-Only port buffer size in PBMC before update	Alexei Lazar
	When updating the PBMC register, we read its current value, modify desired fields, then write it back. The port_buffer_size field within PBMC is Read-Only (RO). If this RO field contains a non-zero value when read, attempting to write it back will cause the entire PBMC register update to fail. This commit ensures port_buffer_size is explicitly cleared to zero after reading the PBMC register but before writing back the modified value. This allows updates to other fields in the PBMC register to succeed. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25	spi: SPISG: Fix less than zero comparison on a u32 variable	Colin Ian King
	The check for ns < 0 is always false because variable ns is a u32 which is not a signed type. Fix this by making ns a s32 type. Fixes: cef9991e04ae ("spi: Add Amlogic SPISG driver") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250725171701.839927-1-colin.i.king@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-25	perf sort: Use perf_env to set arch sort keys and header	Ian Rogers
	Previously arch_support_sort_key and arch_perf_header_entry used a weak symbol to compile as appropriate for x86 and powerpc. A limitation to this is that the handling of a data file could vary in cross-platform development. Change to using the perf_env of the current session to determine the architecture kind and set the sort key and header entries as appropriate. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-23-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test	Ian Rogers
	test__x86_sample_parsing is identical to test__sample_parsing except it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample parsing test and remove the x86 version. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-22-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf sample: Remove arch notion of sample parsing	Ian Rogers
	By definition arch sample parsing and synthesis will inhibit certain kinds of cross-platform record then analysis (report, script, etc.). Remove arch_perf_parse_sample_weight and arch_perf_synthesize_sample_weight replacing with a common implementation. Combine perf_sample p_stage_cyc and retire_lat as weight3 to capture the differing uses regardless of compiled for architecture. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf env: Remove global perf_env	Ian Rogers
	The global perf_env was used for the host, but if a perf_env wasn't easy to come by it was used in a lot of places where potentially recorded and host data could be confused. Remove the global variable as now the majority of accesses retrieve the perf_env for the host from the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf trace: Avoid global perf_env with evsel__env	Ian Rogers
	There is no session in perf trace unless in replay mode, so in host mode no session can be associated with the evlist. If the evsel__env call fails resort to the host_env that's part of the trace. Remove errno_to_name as it becomes a called once 1-line function once the argument is turned into a perf_env, just call perf_env__arch_strerrno directly. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf auxtrace: Pass perf_env from session through to mmap read	Ian Rogers
	auxtrace_mmap__read and auxtrace_mmap__read_snapshot end up calling `evsel__env(NULL)` which returns the global perf_env variable for the host. Their only call is in perf record. Rather than use the global variable pass through the perf_env for `perf record`. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf machine: Explicitly pass in host perf_env	Ian Rogers
	When creating a machine for the host explicitly pass in a scoped perf_env. This removes a use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf bench synthesize: Avoid use of global perf_env	Ian Rogers
	The benchmark doesn't use a data file and so the header perf_env isn't used. Stack allocate a host perf_env for use to avoid the use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf top: Make perf_env locally scoped	Ian Rogers
	The use of the global host perf_env variable is potentially inconsistent within the code. Switch perf top to using a locally scoped variable that is generally accessed through the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf session: Add host_env argument to perf_session__new	Ian Rogers
	When creating a perf_session the host perf_env may or may not want to be used. For example, `perf top` uses a host perf_env while `perf inject` does not. Add a host_env argument to perf_session__new so that sessions requiring a host perf_env can pass it in. Currently if none is specified the global perf_env variable is used, but this will change in later patches. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf test: Avoid use perf_env	Ian Rogers
	The perf_env global variable holds the host perf_env data but its use is hit and miss. Switch to using local perf_env variables and ensure scoped perf_env__init and perf_env__exit. This loses command line setting of the perf_env, but this doesn't matter for tests. So the perf_env is fully initialized, clear it with memset in perf_env__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf header: Clean up use of perf_env	Ian Rogers
	Always use the perf_env from the feat_fd's perf_header. Cache the value on entry to a function in `env` and use `env->` consistently in the code. Ensure the header is initialized for use in perf_session__do_write_header. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf evlist: Change env variable to session	Ian Rogers
	The session holds a perf_env pointer env. In UI code container_of is used to turn the env to a session, but this assumes the session header's env is in use. Rather than a dubious container_of, hold the session in the evlist and derive the env from the session with evsel__env, perf_session__env, etc. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf session: Add accessor for session->header.env	Ian Rogers
	The perf_env from the header in the session is frequently accessed, add an accessor function rather than access directly. Cache the value to avoid repeated calls. No behavioral change. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf record: Make --buildid-mmap the default	Ian Rogers
	Support for build IDs in mmap2 perf events has been present since Linux v5.12: https://lore.kernel.org/lkml/20210219194619.1780437-1-acme@kernel.org/ Build ID mmap events don't avoid the need to inject build IDs for DSO touched by samples as the build ID cache is populated by perf record. They can avoid some cases of symbol mis-resolution caused by the file system changing from when a sample occurred and when the DSO is sought. Unlike the --buildid-mmap option, this chnage doesn't disable the build ID cache but it does disable the processing of samples looking for DSOs to inject build IDs for. To disable the build ID cache the -B (--no-buildid) option should be used. Making this option the default was raised on the list in: https://lore.kernel.org/linux-perf-users/CAP-5=fXP7jN_QrGUcd55_QH5J-Y-FCaJ6=NaHVtyx0oyNh8_-Q@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf jitdump: Directly mark the jitdump DSO	Ian Rogers
	The DSO being generated was being accessed through a thread's maps, this is unnecessary as the dso can just be directly found. This avoids problems with passing a NULL evsel which may be inspected to determine properties of a callchain when using the buildid DSO marking code. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf dso: Move build_id to dso_id	Ian Rogers
	The dso_id previously contained the major, minor, inode and inode generation information from a mmap2 event - the inode generation would be zero when reading from /proc/pid/maps. The build_id was in the dso. With build ID mmap2 events these fields wouldn't be initialized which would largely mean the special empty case where any dso would match for equality. This isn't desirable as if a dso is replaced we want the comparison to yield a difference. To support detecting the difference between DSOs based on build_id, move the build_id out of the DSO and into the dso_id. The dso_id is also stored in the DSO so nothing is lost. Capture in the dso_id what parts have been initialized and rename dso_id__inject to dso_id__improve_id so that it is clear the dso_id is being improved upon with additional information. With the build_id in the dso_id, use memcmp to compare for equality. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf build-id: Ensure struct build_id is empty before use	Ian Rogers
	If a build ID is read then not all code paths may ensure it is empty before use. Initialize the build_id to be zero-ed unless there is clear initialization such as a call to build_id__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf build-id: Mark DSO in sample callchains	Ian Rogers
	Previously only the sample IP's map DSO would be marked hit for the purposes of populating the build ID cache. Walk the call chain to mark all IPs and DSOs. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	perf build-id: Change sprintf functions to snprintf	Ian Rogers
	Pass in a size argument rather than implying all build id strings must be SBUILD_ID_SIZE. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-4-irogers@google.com [ fixed some build errors ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25	net: Fix typos	Bjorn Helgaas
	Fix typos in comments and error messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25	Documentation/ABI/testing/debugfs-cxl: Add 'cxl' to clear_poison path	Alison Schofield
	'cxl' is missing from the path to the clear_poison attribute. Add it. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20250724224308.2101255-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-25	selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0	Yi Chen
	Although setup_ns() set net.ipv4.conf.default.rp_filter=0, loading certain module such as ipip will automatically create a tunl0 interface in all netns including new created ones. In the script, this is before than default.rp_filter=0 applied, as a result tunl0.rp_filter remains set to 1 which causes the test report FAIL when ipip module is preloaded. Before fix: Testing DR mode... Testing NAT mode... Testing Tunnel mode... ipvs.sh: FAIL After fix: Testing DR mode... Testing NAT mode... Testing Tunnel mode... ipvs.sh: PASS Fixes: 7c8b89ec506e ("selftests: netfilter: remove rp_filter configuration") Signed-off-by: Yi Chen <yiche@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	selftests: netfilter: Ignore tainted kernels in interface stress test	Phil Sutter
	Complain about kernel taint value only if it wasn't set at start already. Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: xt_nfacct: don't assume acct name is null-terminated	Florian Westphal
	BUG: KASAN: slab-out-of-bounds in .. lib/vsprintf.c:721 Read of size 1 at addr ffff88801eac95c8 by task syz-executor183/5851 [..] string+0x231/0x2b0 lib/vsprintf.c:721 vsnprintf+0x739/0xf00 lib/vsprintf.c:2874 [..] nfacct_mt_checkentry+0xd2/0xe0 net/netfilter/xt_nfacct.c:41 xt_check_match+0x3d1/0xab0 net/netfilter/x_tables.c:523 nfnl_acct_find_get() handles non-null input, but the error printk relied on its presence. Reported-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4ff165b9251e4d295690 Tested-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com Fixes: ceb98d03eac5 ("netfilter: xtables: add nfacct match to support extended accounting") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps	Florian Westphal
	The scratchmap size depends on the number of elements in the set. For huge sets, each scratch map can easily require very large allocations, e.g. for 100k entries each scratch map will require close to 64kbyte of memory. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nft_set_pipapo: merge pipapo_get/lookup	Florian Westphal
	The matching algorithm has implemented thrice: 1. data path lookup, generic version 2. data path lookup, avx2 version 3. control plane lookup Merge 1 and 3 by refactoring pipapo_get as a common helper, then make nft_pipapo_lookup and nft_pipapo_get both call the common helper. Aside from the code savings this has the benefit that we no longer allocate temporary scratch maps for each control plane get and insertion operation. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nft_set: remove indirection from update API call	Florian Westphal
	This stems from a time when sets and nft_dynset resided in different kernel modules. We can replace this with a direct call. We could even remove both ->update and ->delete, given its only supported by rhashtable, but on the off-chance we'll see runtime add/delete for other types or a new set type keep that as-is for now. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nft_set: remove one argument from lookup and update functions	Florian Westphal
	Return the extension pointer instead of passing it as a function argument to be filled in by the callee. As-is, whenever false is returned, the extension pointer is not used. For all set types, when true is returned, the extension pointer was set to the matching element. Only exception: nft_set_bitmap doesn't support extensions. Return a pointer to a static const empty element extension container. return false -> return NULL return true -> return the elements' extension pointer. This saves one function argument. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nft_set_pipapo: remove unused arguments	Florian Westphal
	They are not used anymore, so remove them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nfnetlink_hook: Dump flowtable info	Phil Sutter
	Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks from base chain ones. Nested attributes are shared with the old NFTABLES hook info type since they fit apart from their misleading name. Old nftables in user space will ignore this new hook type and thus continue to print flowtable hooks just like before, e.g.: \| family netdev { \| hook ingress device test0 { \| 0000000000 nf_flow_offload_ip_hook [nf_flow_table] \| } \| } With this patch in place and support for the new hook info type, output becomes more useful: \| family netdev { \| hook ingress device test0 { \| 0000000000 flowtable ip mytable myft [nf_flow_table] \| } \| } Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper	Phil Sutter
	Introduce a helper routine adding the nested attribute for use by a second caller later. Note how this introduces cancelling of 'nest2' for categorical reasons. Since always followed by cancelling of the outer 'nest', it is technically not needed. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	ipvs: Rename del_timer in comment in ip_vs_conn_expire_now()	WangYuli
	Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer to timer_delete, but did not modify the comment for ip_vs_conn_expire_now(). Now fix it. Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	selftests: netfilter: Enable CONFIG_INET_SCTP_DIAG	Sebastian Andrzej Siewior
	The config snippet specifies CONFIG_SCTP_DIAG. This was never an option. Replace CONFIG_SCTP_DIAG with the intended CONFIG_INET_SCTP_DIAG. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	selftests: net: Enable legacy netfilter legacy options.	Florian Westphal
	Some specified options rely on NETFILTER_XTABLES_LEGACY to be enabled. IP_NF_TARGET_TTL for instance depends on IP_NF_MANGLE which in turn depends on IP_NF_IPTABLES_LEGACY -> NETFILTER_XTABLES_LEGACY. Enable relevant iptables config options explicitly, this is needed to avoid breakage when symbols related to iptables-legacy will depend on NETFILTER_LEGACY resp. IP_TABLES_LEGACY. This also means that the classic tables (Kernel modules) will not be enabled by default, so enable them too. Signed-off-by: Florian Westphal <fw@strlen.de> [bigeasy: Split out the config bits from the main patch] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: Exclude LEGACY TABLES on PREEMPT_RT.	Pablo Neira Ayuso
	The seqcount xt_recseq is used to synchronize the replacement of xt_table::private in xt_replace_table() against all readers such as ipt_do_table() To ensure that there is only one writer, the writing side disables bottom halves. The sequence counter can be acquired recursively. Only the first invocation modifies the sequence counter (signaling that a writer is in progress) while the following (recursive) writer does not modify the counter. The lack of a proper locking mechanism for the sequence counter can lead to live lock on PREEMPT_RT if the high prior reader preempts the writer. Additionally if the per-CPU lock on PREEMPT_RT is removed from local_bh_disable() then there is no synchronisation for the per-CPU sequence counter. The affected code is "just" the legacy netfilter code which is replaced by "netfilter tables". That code can be disabled without sacrificing functionality because everything is provided by the newer implementation. This will only requires the usage of the "-nft" tools instead of the "-legacy" ones. The long term plan is to remove the legacy code so lets accelerate the progress. Relax dependencies on iptables legacy, replace select with depends on, this should cause no harm to existing kernel configs and users can still toggle IP{6}_NF_IPTABLES_LEGACY in any case. Make EBTABLES_LEGACY, IPTABLES_LEGACY and ARPTABLES depend on NETFILTER_XTABLES_LEGACY. Hide xt_recseq and its users, xt_register_table() and xt_percpu_counter_alloc() behind NETFILTER_XTABLES_LEGACY. Let NETFILTER_XTABLES_LEGACY depend on !PREEMPT_RT. This will break selftest expecing the legacy options enabled and will be addressed in a following patch. Co-developed-by: Florian Westphal <fw@strlen.de> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: conntrack: Remove unused net in nf_conntrack_double_lock()	Yue Haibing
	Since commit a3efd81205b1 ("netfilter: conntrack: move generation seqcnt out of netns_ct") this param is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: nf_tables: Remove unused nft_reduce_is_readonly()	Yue Haibing
	Since commit 9e539c5b6d9c ("netfilter: nf_tables: disable expression reduction infra") this is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: x_tables: Remove unused functions xt_{in\|out}name()	Yue Haibing
	Since commit 2173c519d5e9 ("audit: normalize NETFILTER_PKT") these are unused, so can be removed. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid	Lance Yang
	When no logger is registered, nf_conntrack_log_invalid fails to log invalid packets, leaving users unaware of actual invalid traffic. Improve this by loading nf_log_syslog, similar to how 'iptables -I FORWARD 1 -m conntrack --ctstate INVALID -j LOG' triggers it. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Zi Li <zi.li@linux.dev> Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: conntrack: table full detailed log	lvxiafei
	Add the netns field in the "nf_conntrack: table full, dropping packet" log to help locate the specific netns when the table is full. Signed-off-by: lvxiafei <lvxiafei@sensetime.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	Merge patch series "can: kvaser_usb: Simplify identification of physical CAN ↵	Marc Kleine-Budde
	interfaces" Jimmy Assarsson <extja@kvaser.com> says: This patch series simplifies the process of identifying which network interface (can0..canX) corresponds to which physical CAN channel on Kvaser USB based CAN interfaces. Note that this patch series is based on [1] "can: kvaser_pciefd: Simplify identification of physical CAN interfaces" Changes in v3: - Fix GCC compiler array warning (-Warray-bounds) - Fix transient Sparse warning - Add tag Reviewed-by Vincent Mailhol Changes in v2: - New patch with devlink documentation - New patch assigning netdev.dev_port - Formatting and refactoring [1] https://lore.kernel.org/linux-can/20250725123230.8-1-extja@kvaser.com Link: https://patch.msgid.link/20250725123452.41-1-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	Documentation: devlink: add devlink documentation for the kvaser_usb driver	Jimmy Assarsson
	List the version information reported by the kvaser_usb driver through devlink. Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add devlink port support	Jimmy Assarsson
	Register each CAN channel of the device as an devlink physical port. This makes it easier to get device information for a given network interface (i.e. can2). Example output: $ devlink dev usb/1-1.3:1.0 $ devlink port usb/1-1.3:1.0/0: type eth netdev can0 flavour physical port 0 splittable false usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 1 splittable false $ devlink port show can1 usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 0 splittable false $ devlink dev info usb/1-1.3:1.0: driver kvaser_usb serial_number 1020 versions: fixed: board.rev 1 board.id 7330130009653 running: fw 3.22.527 $ ethtool -i can1 driver: kvaser_usb version: 6.12.10-arch1-1 firmware-version: 3.22.527 expansion-rom-version: bus-info: 1-1.3:1.0 supports-statistics: no supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-11-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Expose device information via devlink info_get()	Jimmy Assarsson
	Expose device information via devlink info_get(): * Serial number * Firmware version * Hardware revision * EAN (product number) Example output: $ devlink dev usb/1-1.2:1.0 $ devlink dev info usb/1-1.2:1.0: driver kvaser_usb serial_number 1020 versions: fixed: board.rev 1 board.id 7330130009653 running: fw 3.22.527 Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-10-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add devlink support	Jimmy Assarsson
	Add devlink support at device level. Example output: $ devlink dev usb/1-1.3:1.0 $ devlink dev info usb/1-1.3:1.0: driver kvaser_usb Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-9-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>