summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-23selftests: ublk: add test for UBLK_F_QUIESCEMing Lei
Add test generic_11 for covering new control command of UBLK_U_CMD_QUIESCE_DEV. Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning device state to quiesce states, then verify the feature via generic_10 by doing quiesce and recovery. Cc: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23ublk: add feature UBLK_F_QUIESCEMing Lei
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV` for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED` or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server cooperation. This feature can help to support to upgrade ublk server application by shutting down ublk server gracefully, meantime keep ublk block device persistent during the upgrading period. The feature is only available for UBLK_F_USER_RECOVERY. Suggested-by: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZEMing Lei
Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE. Add 'update_size -s|--size size_in_bytes' sub-command on ublk utility for supporting this feature, then verify the feature via generic_10. Cc: Omri Mann <omri@nvidia.com> Cc: Jared Holzman <jholzman@nvidia.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23traceevent/block: Add REQ_ATOMIC flag to block trace eventsRitesh Harjani (IBM)
Filesystems like XFS can implement atomic write I/O using either REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful if we have a flag in trace events to distinguish between the two. This patch adds char 'U' (Untorn writes) to rwbs field of the trace events if REQ_ATOMIC flag is set in the bio. <W/ REQ_ATOMIC> ================= xfs_io-4238 [009] ..... 4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [009] d.h1. 4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0] <W/O REQ_ATOMIC> =============== xfs_io-4237 [010] ..... 4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [010] d.H1. 4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23thermal: qcom: ipq5018: make ops_ipq5018 struct staticGeorge Moussalem
Fix a sparse warning by making the ops_ipq5018 struct static. Fixes: 04b31cc53fe0 ("thermal/drivers/qcom/tsens: Add support for IPQ5018 tsens") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505202356.S21Sc7bk-lkp@intel.com/ Signed-off-by: George Moussalem <george.moussalem@outlook.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ipq5018-tsens-sparse-v1-1-97edaaaef27c@outlook.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23Merge tag 'soc-fixes-6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "A few last minute fixes: - two driver fixes for samsung/google platforms, both addressing mistakes in changes from the 6.15 merge window - a revert for an allwinner devicetree change that caused problems - a fix for an older regression with the LEDs on Marvell Armada 3720 - a defconfig change to enable chacha20 again after a crypto subsystem change in 6.15 inadventently turned it off" * tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection" soc: samsung: usi: prevent wrong bits inversion during unconfiguring firmware: exynos-acpm: check saved RX before bailing out on empty RX queue
2025-05-23thermal/drivers/airoha: Fix spelling mistake "calibrarion" -> "calibration"Colin Ian King
There is a spelling mistake in a dev_info message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250522083157.1957240-1-colin.i.king@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23Merge tag 'platform-drivers-x86-v6.15-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - dell-wmi-sysman: Avoid buffer overflow in current_password_store() - fujitsu-laptop: Support Lifebook S2110 hotkeys - intel/pmc: Fix Arrow Lake U/H NPU PCI ID - think-lmi: Fix attribute name usage for non-compliant items - thinkpad_acpi: Ignore battery threshold change event notification * tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID platform/x86: think-lmi: Fix attribute name usage for non-compliant items platform/x86: thinkpad_acpi: Ignore battery threshold change event notification platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store() platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys
2025-05-23ACPI: bus: Bail out if acpi_kobj registration failsArmin Wolf
The ACPI sysfs code will fail to initialize if acpi_kobj is NULL, together with some ACPI drivers. Follow the other firmware subsystems and bail out if the kobject cannot be registered. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20250518185111.3560-2-W_Armin@gmx.de Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23Merge tag 'vfs-6.15-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a small set of fixes for the blocking buffer lookup conversion done earlier this cycle. It adds a missing conversion in the getblk slowpath and a few minor optimizations and cleanups" * tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/buffer: optimize discard_buffer() fs/buffer: remove superfluous statements fs/buffer: avoid redundant lookup in getblk slowpath fs/buffer: use sleeping lookup in __getblk_slowpath()
2025-05-23ACPI: platform_profile: Avoid initializing on non-ACPI platformsAlexandre Ghiti
The platform profile driver is loaded even on platforms that do not have ACPI enabled. The initialization of the sysfs entries was recently moved from platform_profile_register() to the module init call, and those entries need acpi_kobj to be initialized which is not the case when ACPI is disabled. This results in the following warning: WARNING: CPU: 5 PID: 1 at fs/sysfs/group.c:131 internal_create_group+0xa22/0xdd8 Modules linked in: CPU: 5 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.15.0-rc7-dirty #6 PREEMPT Tainted: [W]=WARN Hardware name: riscv-virtio,qemu (DT) epc : internal_create_group+0xa22/0xdd8 ra : internal_create_group+0xa22/0xdd8 Call Trace: internal_create_group+0xa22/0xdd8 sysfs_create_group+0x22/0x2e platform_profile_init+0x74/0xb2 do_one_initcall+0x198/0xa9e kernel_init_freeable+0x6d8/0x780 kernel_init+0x28/0x24c ret_from_fork+0xe/0x18 Fix this by checking if ACPI is enabled before trying to create sysfs entries. Fixes: 77be5cacb2c2 ("ACPI: platform_profile: Create class for ACPI platform profile") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://patch.msgid.link/20250522141410.31315-1-alexghiti@rivosinc.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23firmware: cs_dsp: Fix OOB memory read access in KUnit testJaroslav Kysela
KASAN reported out of bounds access - cs_dsp_mock_bin_add_name_or_info(), because the source string length was rounded up to the allocation size. Cc: Simon Trimmer <simont@opensource.cirrus.com> Cc: Charles Keepax <ckeepax@opensource.cirrus.com> Cc: Richard Fitzgerald <rf@opensource.cirrus.com> Cc: patches@opensource.cirrus.com Cc: stable@vger.kernel.org Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://patch.msgid.link/20250523102102.1177151-1-perex@perex.cz Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-23io_uring/cmd: warn on reg buf imports by ineligible cmdsPavel Begunkov
For IORING_URING_CMD_FIXED-less commands io_uring doesn't pull buf_index from the sqe, so imports might succeed if the index coincide, e.g. when it's 0, but otherwise it's error prone. Warn if someone tries to import without the flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/a1c2c88e53c3fe96978f23d50c6bc66c2c79c337.1747991070.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23statmount: update STATMOUNT_SUPPORTED macroDmitry V. Levin
According to commit 8f6116b5b77b ("statmount: add a new supported_mask field"), STATMOUNT_SUPPORTED macro shall be updated whenever a new flag is added. Fixes: 7a54947e727b ("Merge patch series "fs: allow changing idmappings"") Signed-off-by: "Dmitry V. Levin" <ldv@strace.io> Link: https://lore.kernel.org/20250511224953.GA17849@strace.io Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23fs: convert mount flags to enumStephen Brennan
In prior kernel versions (5.8-6.8), commit 9f6c61f96f2d9 ("proc/mounts: add cursor") introduced MNT_CURSOR, a flag used by readers from /proc/mounts to keep their place while reading the file. Later, commit 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") removed this flag and its value has since been repurposed. For debuggers iterating over the list of mounts, cursors should be skipped as they are irrelevant. Detecting whether an element is a cursor can be difficult. Since the MNT_CURSOR flag is a preprocessor constant, it's not present in debuginfo, and since its value is repurposed, we cannot hard-code it. For this specific issue, cursors are possible to detect in other ways, but ideally, we would be able to read the mount flag definitions out of the debuginfo. For that reason, convert the mount flags to an enum. Link: https://github.com/osandov/drgn/pull/496 Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Link: https://lore.kernel.org/20250507223402.2795029-1-stephen.s.brennan@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23->mnt_devname is never NULLAl Viro
Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" back in 2018. Get rid of the dead checks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23mount: add a comment about concurrent changes with statmount()/listmount()Christian Brauner
Add some comments in there highlighting a few non-obvious assumptions. Link: https://lore.kernel.org/20250416-zerknirschen-aluminium-14a55639076f@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23io_uring/io-wq: only create a new worker if it can make progressJens Axboe
Hashed work is serialized by io-wq, intended to be used for cases like serializing buffered writes to a regular file, where the file system will serialize the workers anyway with a mutex or similar. Since they would be forcibly serialized and blocked, it's more efficient for io-wq to handle these individually rather than issue them in parallel. If a worker is currently handling a hashed work item and gets blocked, don't create a new worker if the next work item is also hashed and mapped to the same bucket. That new worker would not be able to make any progress anyway. Reported-by: Fengnan Chang <changfengnan@bytedance.com> Reported-by: Diangang Li <lidiangang@bytedance.com> Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: ignore non-busy worker going to sleepJens Axboe
When an io-wq worker goes to sleep, it checks if there's work to do. If there is, it'll create a new worker. But if this worker is currently idle, it'll either get woken right back up immediately, or someone else has already created the necessary worker to handle this work. Only go through the worker creation logic if the current worker is currently handling a work item. That means it's being scheduled out as part of handling that work, not just going to sleep on its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: move hash helpers to the topJens Axboe
Just in preparation for using them higher up in the function, move these generic helpers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23bcachefs: Path must be locked if trans->locked && should_be_lockedKent Overstreet
If path->should_be_locked is true, that means user code (of the btree API) has seen, in this transaction, something guarded by the node this path has locked, and we have to keep it locked until the end of the transaction. Assert that we're not violating this; should_be_locked should also be cleared only in _very_ special situations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Simplify bch2_path_put()Kent Overstreet
Simplify the "do we need to keep this locked?" checks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Plumb btree_trans for more locking assertsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Clear trans->locked before unlockKent Overstreet
We're adding new should_be_locked assertions: it's going to be illegal to unlock a should_be_locked path when trans->locked is true. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Clear should_be_locked before unlock in key_cache_drop()Kent Overstreet
We're adding new should_be_locked assertions, also add a comment explaining why clearing should_be_locked is safe here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_lockedKent Overstreet
Small additional optimization over the previous patch, bringing us closer to the original behaviour, except when we need to clone to avoid a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Give out new path if upgrade failsKent Overstreet
Avoid transaction restarts due to failure to upgrade - we can traverse a new iterator without a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix btree_path_get_locks when not doing trans restartKent Overstreet
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing a transaction restart: we might drop locks we're not supposed to (if path->should_be_locked is set). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: btree_node_locked_type_nowrite()Kent Overstreet
Small helper to improve locking assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Kill bch2_path_put_nokeep()Kent Overstreet
bch2_path_put_nokeep() was intended for paths we wouldn't need to preserve for a transaction restart - it always frees them right away when the ref hits 0. But since paths are shared, freeing unconditionally is a bug, the path might have been used elsewhere and have should_be_locked set, i.e. we need to keep it locked until the end of the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23selftests: netfilter: Torture nftables netdev hooksPhil Sutter
Add a ruleset which binds to various interface names via netdev-family chains and flowtables and massage the notifiers by frequently renaming interfaces to match these names. While doing so: - Keep an 'nft monitor' running in background to receive the notifications - Loop over 'nft list ruleset' to exercise ruleset dump codepath - Have iperf running so the involved chains/flowtables see traffic If supported, also test interface wildcard support separately by creating a flowtable with 'wild*' interface spec and quickly add/remove matching dummy interfaces. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Add notifications for hook changesPhil Sutter
Notify user space if netdev hooks are updated due to netdev add/remove events. Send minimal notification messages by introducing NFT_MSG_NEWDEV/DELDEV message types describing a single device only. Upon NETDEV_CHANGENAME, the callback has no information about the interface's old name. To provide a clear message to user space, include the hook's stored interface name in the notification. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Support wildcard netdev hook specsPhil Sutter
User space may pass non-nul-terminated NFTA_DEVICE_NAME attribute values to indicate a suffix wildcard. Expect for multiple devices to match the given prefix in nft_netdev_hook_alloc() and populate 'ops_list' with them all. When checking for duplicate hooks, compare the shortest prefix so a device may never match more than a single hook spec. Finally respect the stored prefix length when hooking into new devices from event handlers. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()Phil Sutter
No point in having err_hook_alloc, just call return directly. Also rename err_hook_dev - it's not about the hook's device but freeing the hook itself. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Handle NETDEV_CHANGENAME eventsPhil Sutter
For the sake of simplicity, treat them like consecutive NETDEV_REGISTER and NETDEV_UNREGISTER events. If the new name matches a hook spec and registration fails, escalate the error and keep things as they are. To avoid unregistering the newly registered hook again during the following fake NETDEV_UNREGISTER event, leave hooks alone if their interface spec matches the new name. Note how this patch also skips for NETDEV_REGISTER if the device is already registered. This is not yet possible as the new name would have to match the old one. This will change with wildcard interface specs, though. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Wrap netdev notifiersPhil Sutter
Handling NETDEV_CHANGENAME events has to traverse all chains/flowtables twice, prepare for this. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Respect NETDEV_REGISTER eventsPhil Sutter
Hook into new devices if their name matches the hook spec. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Prepare for handling NETDEV_REGISTER eventsPhil Sutter
Put NETDEV_UNREGISTER handling code into a switch, no functional change intended as the function is only called for that event yet. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Have a list of nf_hook_ops in nft_hookPhil Sutter
Supporting a 1:n relationship between nft_hook and nf_hook_ops is convenient since a chain's or flowtable's nft_hooks may remain in place despite matching interfaces disappearing. This stabilizes ruleset dumps in that regard and opens the possibility to claim newly added interfaces which match the spec. Also it prepares for wildcard interface specs since these will potentially match multiple interfaces. All spots dealing with hook registration are updated to handle a list of multiple nf_hook_ops, but nft_netdev_hook_alloc() only adds a single item for now to retain the old behaviour. The only expected functional change here is how vanishing interfaces are handled: Instead of dropping the respective nft_hook, only the matching nf_hook_ops are dropped. To safely remove individual ops from the list in netdev handlers, an rcu_head is added to struct nf_hook_ops so kfree_rcu() may be used. There is at least nft_flowtable_find_dev() which may be iterating through the list at the same time. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()Phil Sutter
The function accesses only the hook's ops field, pass it directly. This prepares for nft_hooks holding a list of nf_hook_ops in future. While at it, make use of the function in __nft_unregister_flowtable_net_hooks() as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_register_flowtable_ops()Phil Sutter
Facilitate binding and registering of a flowtable hook via a single function call. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()Phil Sutter
Also a pretty dull wrapper around the hook->ops.dev comparison for now. Will search the embedded nf_hook_ops list in future. The ugly cast to eliminate the const qualifier will vanish then, too. Since this future list will be RCU-protected, also introduce an _rcu() variant here. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce functions freeing nft_hook objectsPhil Sutter
Pointless wrappers around kfree() for now, prep work for an embedded list of nf_hook_ops. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: add packets conntrack state to debug trace infoFlorian Westphal
Add the minimal relevant info needed for userspace ("nftables monitor trace") to provide the conntrack view of the packet: - state (new, related, established) - direction (original, reply) - status (e.g., if connection is subject to dnat) - id (allows to query ctnetlink for remaining conntrack state info) Example: trace id a62 inet filter PRE_RAW packet: iif "enp0s3" ether [..] [..] trace id a62 inet filter PRE_MANGLE conntrack: ct direction original ct state new ct id 32 trace id a62 inet filter PRE_MANGLE packet: [..] [..] trace id a62 inet filter IN conntrack: ct direction original ct state new ct status dnat-done ct id 32 [..] In this case one can see that while NAT is active, the new connection isn't subject to a translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: conntrack: make nf_conntrack_id callable without a module dependencyFlorian Westphal
While nf_conntrack_id() doesn't need any functionaliy from conntrack, it does reside in nf_conntrack_core.c -- callers add a module dependency on conntrack. Followup patch will need to compute the conntrack id from nf_tables_trace.c to include it in nf_trace messages emitted to userspace via netlink. I don't want to introduce a module dependency between nf_tables and conntrack for this. Since trace is slowpath, the added indirection is ok. One alternative is to move nf_conntrack_id to the netfilter/core.c, but I don't see a compelling reason so far. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior
nf_dup_skb_recursion is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move nf_dup_skb_recursion to struct netdev_xmit, provide wrappers. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctxSebastian Andrzej Siewior
nft_pcpu_tun_ctx is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a nft_inner_tun_ctx member (original nft_pcpu_tun_ctx) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup{4, 6}: Move duplication check to task_structSebastian Andrzej Siewior
nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Due to the recursion involved, the simplest change is to make it a per-task variable. Move the per-CPU variable nf_skb_duplicated to task_struct and name it in_nf_duplicate. Add it to the existing bitfield so it doesn't use additional memory. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_tunnel: fix geneve_opt dumpFernando Fernandez Mancera
When dumping a nft_tunnel with more than one geneve_opt configured the netlink attribute hierarchy should be as follow: NFTA_TUNNEL_KEY_OPTS | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE ... Otherwise, userspace tools won't be able to fetch the geneve options configured correctly. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFsFlorian Westphal
Replace the existing VRF test with a more comprehensive one. It tests following combinations: - fib type (returns address type, e.g. unicast) - fib oif (route output interface index - both with and without 'iif' keyword (changes result, e.g. 'fib daddr type local' will be true when the destination address is configured on the local machine, but 'fib daddr . iif type local' will only be true when the destination address is configured on the incoming interface. Add all types of addresses to test with for both ipv4 and ipv6: - local address on the incoming interface - local address on another interface - local address on another interface thats part of a vrf - address on another host The ruleset stores obtained results from 'fib' in nftables sets and then queries the sets to check that it has the expected results. Perform one pass while packets are coming in on interface NOT part of a VRF and then again when it was added and make sure fib returns the expected routes and address types for the various addresses in the setup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>