Age | Commit message (Collapse) | Author |
|
mlx5e_reporter_rx_timeout() is currently invoked synchronously
in the driver's open error flow. This causes the thread holding
priv->state_lock to attempt acquiring the devlink lock, which
can result in a circular dependency with other devlink operations.
For example:
- Devlink health diagnose flow:
- __devlink_nl_pre_doit() acquires the devlink lock.
- devlink_nl_health_reporter_diagnose_doit() invokes the
driver's diagnose callback.
- mlx5e_rx_reporter_diagnose() then attempts to acquire
priv->state_lock.
- Driver open flow:
- mlx5e_open() acquires priv->state_lock.
- If an error occurs, devlink_health_reporter may be called,
attempting to acquire the devlink lock.
To prevent this circular locking scenario, defer the RX timeout
recovery by scheduling it via a workqueue. This ensures that the
recovery work acquires locks in a consistent order: first the
devlink lock, then priv->state_lock.
Additionally, make the recovery work acquire the netdev instance
lock to safely synchronize with the open/close channel flows,
similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire
the netdev instance lock until it is taken or the target RQ is no
longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit.
Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hardware returns a unique identifier for a decrypted packet's xfrm
state, this state is looked up in an xarray. However, the state might
have been freed by the time of this lookup.
Currently, if the state is not found, only a counter is incremented.
The secpath (sp) extension on the skb is not removed, resulting in
sp->len becoming 0.
Subsequently, functions like __xfrm_policy_check() attempt to access
fields such as xfrm_input_state(skb)->xso.type (which dereferences
sp->xvec[sp->len - 1]) without first validating sp->len. This leads to
a crash when dereferencing an invalid state pointer.
This patch prevents the crash by explicitly removing the secpath
extension from the skb if the xfrm state is not found after hardware
decryption. This ensures downstream functions do not operate on a
zero-length secpath.
BUG: unable to handle page fault for address: ffffffff000002c8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 282e067 P4D 282e067 PUD 0
Oops: Oops: 0000 [#1] SMP
CPU: 12 UID: 0 PID: 0 Comm: swapper/12 Not tainted 6.15.0-rc7_for_upstream_min_debug_2025_05_27_22_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:__xfrm_policy_check+0x61a/0xa30
Code: b6 77 7f 83 e6 02 74 14 4d 8b af d8 00 00 00 41 0f b6 45 05 c1 e0 03 48 98 49 01 c5 41 8b 45 00 83 e8 01 48 98 49 8b 44 c5 10 <0f> b6 80 c8 02 00 00 83 e0 0c 3c 04 0f 84 0c 02 00 00 31 ff 80 fa
RSP: 0018:ffff88885fb04918 EFLAGS: 00010297
RAX: ffffffff00000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffffff8311af80 R08: 0000000000000020 R09: 00000000c2eda353
R10: ffff88812be2bbc8 R11: 000000001faab533 R12: ffff88885fb049c8
R13: ffff88812be2bbc8 R14: 0000000000000000 R15: ffff88811896ae00
FS: 0000000000000000(0000) GS:ffff8888dca82000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff000002c8 CR3: 0000000243050002 CR4: 0000000000372eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
? try_to_wake_up+0x108/0x4c0
? udp4_lib_lookup2+0xbe/0x150
? udp_lib_lport_inuse+0x100/0x100
? __udp4_lib_lookup+0x2b0/0x410
__xfrm_policy_check2.constprop.0+0x11e/0x130
udp_queue_rcv_one_skb+0x1d/0x530
udp_unicast_rcv_skb+0x76/0x90
__udp4_lib_rcv+0xa64/0xe90
ip_protocol_deliver_rcu+0x20/0x130
ip_local_deliver_finish+0x75/0xa0
ip_local_deliver+0xc1/0xd0
? ip_protocol_deliver_rcu+0x130/0x130
ip_sublist_rcv+0x1f9/0x240
? ip_rcv_finish_core+0x430/0x430
ip_list_rcv+0xfc/0x130
__netif_receive_skb_list_core+0x181/0x1e0
netif_receive_skb_list_internal+0x200/0x360
? mlx5e_build_rx_skb+0x1bc/0xda0 [mlx5_core]
gro_receive_skb+0xfd/0x210
mlx5e_handle_rx_cqe_mpwrq+0x141/0x280 [mlx5_core]
mlx5e_poll_rx_cq+0xcc/0x8e0 [mlx5_core]
? mlx5e_handle_rx_dim+0x91/0xd0 [mlx5_core]
mlx5e_napi_poll+0x114/0xab0 [mlx5_core]
__napi_poll+0x25/0x170
net_rx_action+0x32d/0x3a0
? mlx5_eq_comp_int+0x8d/0x280 [mlx5_core]
? notifier_call_chain+0x33/0xa0
handle_softirqs+0xda/0x250
irq_exit_rcu+0x6d/0xc0
common_interrupt+0x81/0xa0
</IRQ>
Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1753256672-337784-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When updating the PBMC register, we read its current value,
modify desired fields, then write it back.
The port_buffer_size field within PBMC is Read-Only (RO).
If this RO field contains a non-zero value when read,
attempting to write it back will cause the entire PBMC
register update to fail.
This commit ensures port_buffer_size is explicitly cleared
to zero after reading the PBMC register but before writing
back the modified value.
This allows updates to other fields in the PBMC register to succeed.
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1753256672-337784-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check for ns < 0 is always false because variable ns is a u32 which
is not a signed type. Fix this by making ns a s32 type.
Fixes: cef9991e04ae ("spi: Add Amlogic SPISG driver")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250725171701.839927-1-colin.i.king@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Previously arch_support_sort_key and arch_perf_header_entry used a
weak symbol to compile as appropriate for x86 and powerpc. A
limitation to this is that the handling of a data file could vary in
cross-platform development. Change to using the perf_env of the
current session to determine the architecture kind and set the sort
key and header entries as appropriate.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-23-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
test__x86_sample_parsing is identical to test__sample_parsing except
it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code
is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample
parsing test and remove the x86 version.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-22-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
By definition arch sample parsing and synthesis will inhibit certain
kinds of cross-platform record then analysis (report, script,
etc.). Remove arch_perf_parse_sample_weight and
arch_perf_synthesize_sample_weight replacing with a common
implementation. Combine perf_sample p_stage_cyc and retire_lat as
weight3 to capture the differing uses regardless of compiled for
architecture.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The global perf_env was used for the host, but if a perf_env wasn't
easy to come by it was used in a lot of places where potentially
recorded and host data could be confused. Remove the global variable
as now the majority of accesses retrieve the perf_env for the host
from the session.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There is no session in perf trace unless in replay mode, so in host
mode no session can be associated with the evlist. If the evsel__env
call fails resort to the host_env that's part of the trace. Remove
errno_to_name as it becomes a called once 1-line function once the
argument is turned into a perf_env, just call perf_env__arch_strerrno
directly.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-19-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
auxtrace_mmap__read and auxtrace_mmap__read_snapshot end up calling
`evsel__env(NULL)` which returns the global perf_env variable for the
host. Their only call is in perf record. Rather than use the global
variable pass through the perf_env for `perf record`.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-18-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When creating a machine for the host explicitly pass in a scoped
perf_env. This removes a use of the global perf_env.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-17-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The benchmark doesn't use a data file and so the header perf_env isn't
used. Stack allocate a host perf_env for use to avoid the use of the
global perf_env.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-16-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The use of the global host perf_env variable is potentially
inconsistent within the code. Switch perf top to using a locally
scoped variable that is generally accessed through the session.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-15-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When creating a perf_session the host perf_env may or may not want to
be used. For example, `perf top` uses a host perf_env while `perf
inject` does not. Add a host_env argument to perf_session__new so that
sessions requiring a host perf_env can pass it in. Currently if none
is specified the global perf_env variable is used, but this will
change in later patches.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-14-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The perf_env global variable holds the host perf_env data but its use
is hit and miss. Switch to using local perf_env variables and ensure
scoped perf_env__init and perf_env__exit. This loses command line
setting of the perf_env, but this doesn't matter for tests. So the
perf_env is fully initialized, clear it with memset in perf_env__init.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-13-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Always use the perf_env from the feat_fd's perf_header. Cache the
value on entry to a function in `env` and use `env->` consistently in
the code. Ensure the header is initialized for use in
perf_session__do_write_header.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The session holds a perf_env pointer env. In UI code container_of is
used to turn the env to a session, but this assumes the session
header's env is in use. Rather than a dubious container_of, hold the
session in the evlist and derive the env from the session with
evsel__env, perf_session__env, etc.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The perf_env from the header in the session is frequently accessed,
add an accessor function rather than access directly. Cache the value
to avoid repeated calls. No behavioral change.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Support for build IDs in mmap2 perf events has been present since
Linux v5.12:
https://lore.kernel.org/lkml/20210219194619.1780437-1-acme@kernel.org/
Build ID mmap events don't avoid the need to inject build IDs for DSO
touched by samples as the build ID cache is populated by perf
record. They can avoid some cases of symbol mis-resolution caused by
the file system changing from when a sample occurred and when the DSO
is sought.
Unlike the --buildid-mmap option, this chnage doesn't disable the
build ID cache but it does disable the processing of samples looking
for DSOs to inject build IDs for. To disable the build ID cache the -B
(--no-buildid) option should be used.
Making this option the default was raised on the list in:
https://lore.kernel.org/linux-perf-users/CAP-5=fXP7jN_QrGUcd55_QH5J-Y-FCaJ6=NaHVtyx0oyNh8_-Q@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The DSO being generated was being accessed through a thread's maps,
this is unnecessary as the dso can just be directly found. This avoids
problems with passing a NULL evsel which may be inspected to determine
properties of a callchain when using the buildid DSO marking code.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The dso_id previously contained the major, minor, inode and inode
generation information from a mmap2 event - the inode generation would
be zero when reading from /proc/pid/maps. The build_id was in the
dso. With build ID mmap2 events these fields wouldn't be initialized
which would largely mean the special empty case where any dso would
match for equality. This isn't desirable as if a dso is replaced we
want the comparison to yield a difference.
To support detecting the difference between DSOs based on build_id,
move the build_id out of the DSO and into the dso_id. The dso_id is
also stored in the DSO so nothing is lost. Capture in the dso_id what
parts have been initialized and rename dso_id__inject to
dso_id__improve_id so that it is clear the dso_id is being improved
upon with additional information. With the build_id in the dso_id, use
memcmp to compare for equality.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If a build ID is read then not all code paths may ensure it is empty
before use. Initialize the build_id to be zero-ed unless there is
clear initialization such as a call to build_id__init.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Previously only the sample IP's map DSO would be marked hit for the
purposes of populating the build ID cache. Walk the call chain to mark
all IPs and DSOs.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Pass in a size argument rather than implying all build id strings must
be SBUILD_ID_SIZE.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250724163302.596743-4-irogers@google.com
[ fixed some build errors ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Fix typos in comments and error messages.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'cxl' is missing from the path to the clear_poison attribute. Add it.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20250724224308.2101255-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Although setup_ns() set net.ipv4.conf.default.rp_filter=0,
loading certain module such as ipip will automatically create a tunl0 interface
in all netns including new created ones. In the script, this is before than
default.rp_filter=0 applied, as a result tunl0.rp_filter remains set to 1
which causes the test report FAIL when ipip module is preloaded.
Before fix:
Testing DR mode...
Testing NAT mode...
Testing Tunnel mode...
ipvs.sh: FAIL
After fix:
Testing DR mode...
Testing NAT mode...
Testing Tunnel mode...
ipvs.sh: PASS
Fixes: 7c8b89ec506e ("selftests: netfilter: remove rp_filter configuration")
Signed-off-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Complain about kernel taint value only if it wasn't set at start
already.
Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
BUG: KASAN: slab-out-of-bounds in .. lib/vsprintf.c:721
Read of size 1 at addr ffff88801eac95c8 by task syz-executor183/5851
[..]
string+0x231/0x2b0 lib/vsprintf.c:721
vsnprintf+0x739/0xf00 lib/vsprintf.c:2874
[..]
nfacct_mt_checkentry+0xd2/0xe0 net/netfilter/xt_nfacct.c:41
xt_check_match+0x3d1/0xab0 net/netfilter/x_tables.c:523
nfnl_acct_find_get() handles non-null input, but the error
printk relied on its presence.
Reported-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4ff165b9251e4d295690
Tested-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com
Fixes: ceb98d03eac5 ("netfilter: xtables: add nfacct match to support extended accounting")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The scratchmap size depends on the number of elements in the set.
For huge sets, each scratch map can easily require very large
allocations, e.g. for 100k entries each scratch map will require
close to 64kbyte of memory.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The matching algorithm has implemented thrice:
1. data path lookup, generic version
2. data path lookup, avx2 version
3. control plane lookup
Merge 1 and 3 by refactoring pipapo_get as a common helper, then make
nft_pipapo_lookup and nft_pipapo_get both call the common helper.
Aside from the code savings this has the benefit that we no longer allocate
temporary scratch maps for each control plane get and insertion operation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This stems from a time when sets and nft_dynset resided in different kernel
modules. We can replace this with a direct call.
We could even remove both ->update and ->delete, given its only
supported by rhashtable, but on the off-chance we'll see runtime
add/delete for other types or a new set type keep that as-is for now.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Return the extension pointer instead of passing it as a function
argument to be filled in by the callee.
As-is, whenever false is returned, the extension pointer is not used.
For all set types, when true is returned, the extension pointer was set
to the matching element.
Only exception: nft_set_bitmap doesn't support extensions.
Return a pointer to a static const empty element extension container.
return false -> return NULL
return true -> return the elements' extension pointer.
This saves one function argument.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
They are not used anymore, so remove them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks
from base chain ones. Nested attributes are shared with the old NFTABLES
hook info type since they fit apart from their misleading name.
Old nftables in user space will ignore this new hook type and thus
continue to print flowtable hooks just like before, e.g.:
| family netdev {
| hook ingress device test0 {
| 0000000000 nf_flow_offload_ip_hook [nf_flow_table]
| }
| }
With this patch in place and support for the new hook info type, output
becomes more useful:
| family netdev {
| hook ingress device test0 {
| 0000000000 flowtable ip mytable myft [nf_flow_table]
| }
| }
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce a helper routine adding the nested attribute for use by a
second caller later.
Note how this introduces cancelling of 'nest2' for categorical reasons.
Since always followed by cancelling of the outer 'nest', it is
technically not needed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer to timer_delete, but did not modify the comment for
ip_vs_conn_expire_now(). Now fix it.
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The config snippet specifies CONFIG_SCTP_DIAG. This was never an option.
Replace CONFIG_SCTP_DIAG with the intended CONFIG_INET_SCTP_DIAG.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some specified options rely on NETFILTER_XTABLES_LEGACY to be enabled.
IP_NF_TARGET_TTL for instance depends on IP_NF_MANGLE which in turn
depends on IP_NF_IPTABLES_LEGACY -> NETFILTER_XTABLES_LEGACY.
Enable relevant iptables config options explicitly, this is needed
to avoid breakage when symbols related to iptables-legacy
will depend on NETFILTER_LEGACY resp. IP_TABLES_LEGACY.
This also means that the classic tables (Kernel modules) will
not be enabled by default, so enable them too.
Signed-off-by: Florian Westphal <fw@strlen.de>
[bigeasy: Split out the config bits from the main patch]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The seqcount xt_recseq is used to synchronize the replacement of
xt_table::private in xt_replace_table() against all readers such as
ipt_do_table()
To ensure that there is only one writer, the writing side disables
bottom halves. The sequence counter can be acquired recursively. Only the
first invocation modifies the sequence counter (signaling that a writer
is in progress) while the following (recursive) writer does not modify
the counter.
The lack of a proper locking mechanism for the sequence counter can lead
to live lock on PREEMPT_RT if the high prior reader preempts the
writer. Additionally if the per-CPU lock on PREEMPT_RT is removed from
local_bh_disable() then there is no synchronisation for the per-CPU
sequence counter.
The affected code is "just" the legacy netfilter code which is replaced
by "netfilter tables". That code can be disabled without sacrificing
functionality because everything is provided by the newer
implementation. This will only requires the usage of the "-nft" tools
instead of the "-legacy" ones.
The long term plan is to remove the legacy code so lets accelerate the
progress.
Relax dependencies on iptables legacy, replace select with depends on,
this should cause no harm to existing kernel configs and users can still
toggle IP{6}_NF_IPTABLES_LEGACY in any case.
Make EBTABLES_LEGACY, IPTABLES_LEGACY and ARPTABLES depend on
NETFILTER_XTABLES_LEGACY. Hide xt_recseq and its users,
xt_register_table() and xt_percpu_counter_alloc() behind
NETFILTER_XTABLES_LEGACY. Let NETFILTER_XTABLES_LEGACY depend on
!PREEMPT_RT.
This will break selftest expecing the legacy options enabled and will be
addressed in a following patch.
Co-developed-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit a3efd81205b1 ("netfilter: conntrack: move generation
seqcnt out of netns_ct") this param is unused.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit 9e539c5b6d9c ("netfilter: nf_tables: disable expression
reduction infra") this is unused.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit 2173c519d5e9 ("audit: normalize NETFILTER_PKT")
these are unused, so can be removed.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When no logger is registered, nf_conntrack_log_invalid fails to log invalid
packets, leaving users unaware of actual invalid traffic. Improve this by
loading nf_log_syslog, similar to how 'iptables -I FORWARD 1 -m conntrack
--ctstate INVALID -j LOG' triggers it.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Zi Li <zi.li@linux.dev>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add the netns field in the "nf_conntrack: table full, dropping packet"
log to help locate the specific netns when the table is full.
Signed-off-by: lvxiafei <lvxiafei@sensetime.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
interfaces"
Jimmy Assarsson <extja@kvaser.com> says:
This patch series simplifies the process of identifying which network
interface (can0..canX) corresponds to which physical CAN channel on
Kvaser USB based CAN interfaces.
Note that this patch series is based on [1]
"can: kvaser_pciefd: Simplify identification of physical CAN interfaces"
Changes in v3:
- Fix GCC compiler array warning (-Warray-bounds)
- Fix transient Sparse warning
- Add tag Reviewed-by Vincent Mailhol
Changes in v2:
- New patch with devlink documentation
- New patch assigning netdev.dev_port
- Formatting and refactoring
[1] https://lore.kernel.org/linux-can/20250725123230.8-1-extja@kvaser.com
Link: https://patch.msgid.link/20250725123452.41-1-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
List the version information reported by the kvaser_usb driver
through devlink.
Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Register each CAN channel of the device as an devlink physical port.
This makes it easier to get device information for a given network
interface (i.e. can2).
Example output:
$ devlink dev
usb/1-1.3:1.0
$ devlink port
usb/1-1.3:1.0/0: type eth netdev can0 flavour physical port 0 splittable false
usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 1 splittable false
$ devlink port show can1
usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 0 splittable false
$ devlink dev info
usb/1-1.3:1.0:
driver kvaser_usb
serial_number 1020
versions:
fixed:
board.rev 1
board.id 7330130009653
running:
fw 3.22.527
$ ethtool -i can1
driver: kvaser_usb
version: 6.12.10-arch1-1
firmware-version: 3.22.527
expansion-rom-version:
bus-info: 1-1.3:1.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123452.41-11-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Expose device information via devlink info_get():
* Serial number
* Firmware version
* Hardware revision
* EAN (product number)
Example output:
$ devlink dev
usb/1-1.2:1.0
$ devlink dev info
usb/1-1.2:1.0:
driver kvaser_usb
serial_number 1020
versions:
fixed:
board.rev 1
board.id 7330130009653
running:
fw 3.22.527
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123452.41-10-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add devlink support at device level.
Example output:
$ devlink dev
usb/1-1.3:1.0
$ devlink dev info
usb/1-1.3:1.0:
driver kvaser_usb
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123452.41-9-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|