Age | Commit message (Collapse) | Author |
|
Add error pointer check after calling otx2_mbox_get_rsp().
Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool")
Fixes: d0cf9503e908 ("octeontx2-pf: ethtool fec mode support")
Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
Add error pointer check after calling otx2_mbox_get_rsp().
Fixes: ab58a416c93f ("octeontx2-pf: cn10k: Get max mtu supported from admin function")
Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
MAXAG is a legitimate value for bmp->db_numag
Fixes: e63866a47556 ("jfs: fix out-of-bounds in dbNextAG() and diAlloc()")
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The ret may be zero in btrfs_search_dir_index_item() and should not
passed to ERR_PTR(). Now btrfs_unlink_subvol() is the only caller to
this, reconstructed it to check ERR_PTR(-ENOENT) while ret >= 0.
This fixes smatch warnings:
fs/btrfs/dir-item.c:353
btrfs_search_dir_index_item() warn: passing zero to 'ERR_PTR'
Fixes: 9dcbe16fccbb ("btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
Syzbot reports the following crash:
BTRFS info (device loop0 state MCS): disabling free space tree
BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1)
BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:backup_super_roots fs/btrfs/disk-io.c:1691 [inline]
RIP: 0010:write_all_supers+0x97a/0x40f0 fs/btrfs/disk-io.c:4041
Call Trace:
<TASK>
btrfs_commit_transaction+0x1eae/0x3740 fs/btrfs/transaction.c:2530
btrfs_delete_free_space_tree+0x383/0x730 fs/btrfs/free-space-tree.c:1312
btrfs_start_pre_rw_mount+0xf28/0x1300 fs/btrfs/disk-io.c:3012
btrfs_remount_rw fs/btrfs/super.c:1309 [inline]
btrfs_reconfigure+0xae6/0x2d40 fs/btrfs/super.c:1534
btrfs_reconfigure_for_mount fs/btrfs/super.c:2020 [inline]
btrfs_get_tree_subvol fs/btrfs/super.c:2079 [inline]
btrfs_get_tree+0x918/0x1920 fs/btrfs/super.c:2115
vfs_get_tree+0x90/0x2b0 fs/super.c:1800
do_new_mount+0x2be/0xb40 fs/namespace.c:3472
do_mount fs/namespace.c:3812 [inline]
__do_sys_mount fs/namespace.c:4020 [inline]
__se_sys_mount+0x2d6/0x3c0 fs/namespace.c:3997
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[CAUSE]
To support mounting different subvolume with different RO/RW flags for
the new mount APIs, btrfs introduced two workaround to support this feature:
- Skip mount option/feature checks if we are mounting a different
subvolume
- Reconfigure the fs to RW if the initial mount is RO
Combining these two, we can have the following sequence:
- Mount the fs ro,rescue=all,clear_cache,space_cache=v1
rescue=all will mark the fs as hard read-only, so no v2 cache clearing
will happen.
- Mount a subvolume rw of the same fs.
We go into btrfs_get_tree_subvol(), but fc_mount() returns EBUSY
because our new fc is RW, different from the original fs.
Now we enter btrfs_reconfigure_for_mount(), which switches the RO flag
first so that we can grab the existing fs_info.
Then we reconfigure the fs to RW.
- During reconfiguration, option/features check is skipped
This means we will restart the v2 cache clearing, and convert back to
v1 cache.
This will trigger fs writes, and since the original fs has "rescue=all"
option, it skips the csum tree read.
And eventually causing NULL pointer dereference in super block
writeback.
[FIX]
For reconfiguration caused by different subvolume RO/RW flags, ensure we
always run btrfs_check_options() to ensure we have proper hard RO
requirements met.
In fact the function btrfs_check_options() doesn't really do many
complex checks, but hard RO requirement and some feature dependency
checks, thus there is no special reason not to do the check for mount
reconfiguration.
Reported-by: syzbot+56360f93efa90ff15870@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/0000000000008c5d090621cb2770@google.com/
Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
CC: stable@vger.kernel.org # 6.8+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In debugging some corrupt squashfs files, we observed symptoms of
corrupt page cache pages but correct on-disk contents. Further
investigation revealed that the exact symptom was a correct page
followed by an incorrect, duplicate, page. This got us thinking about
extent maps.
commit ac05ca913e9f ("Btrfs: fix race between using extent maps and merging them")
enforces a reference count on the primary `em` extent_map being merged,
as that one gets modified.
However, since,
commit 3d2ac9922465 ("btrfs: introduce new members for extent_map")
both 'em' and 'merge' get modified, which started modifying 'merge'
and thus introduced the same race.
We were able to reproduce this by looping the affected squashfs workload
in parallel on a bunch of separate btrfs-es while also dropping caches.
We are still working on a simple enough reproducer to make into an fstest.
The simplest fix is to stop modifying 'merge', which is not essential,
as it is dropped immediately after the merge. This behavior is simply
a consequence of the order of the two extent maps being important in
computing the new values. Modify merge_ondisk_extents to take prev and
next by const* and also take a third merged parameter that it puts the
results in. Note that this introduces the rather odd behavior of passing
'em' to merge_ondisk_extents as a const * and as a regular ptr.
Fixes: 3d2ac9922465 ("btrfs: introduce new members for extent_map")
CC: stable@vger.kernel.org # 6.11+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Inside lock_delalloc_folios(), there are several problems related to
sector size < page size handling:
- Set the writer locks without checking if the folio is still valid
We call btrfs_folio_start_writer_lock() just like it's folio_lock().
But since the folio may not even be the folio of the current mapping,
we can easily screw up the folio->private.
- The range is not clamped inside the page
This means we can over write other bitmaps if the start/len is not
properly handled, and trigger the btrfs_subpage_assert().
- @processed_end is always rounded up to page end
If the delalloc range is not page aligned, and we need to retry
(returning -EAGAIN), then we will unlock to the page end.
Thankfully this is not a huge problem, as now
btrfs_folio_end_writer_lock() can handle range larger than the locked
range, and only unlock what is already locked.
Fix all these problems by:
- Lock and check the folio first, then call
btrfs_folio_set_writer_lock()
So that if we got a folio not belonging to the inode, we won't
touch folio->private.
- Properly truncate the range inside the page
- Update @processed_end to the locked range end
Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Since commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to
avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can
automatically skip large subtree scan at the cost of marking qgroup
inconsistent.
It's designed to address the final performance problem of snapshot drop
with qgroup enabled, but to be safe the default value is
BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value
to make it work.
I'd say it's not a good idea to rely on user space tool to set this
default value, especially when some operations (snapshot dropping) can
be triggered immediately after mount, leaving a very small window to
that that sysfs interface.
So instead of disabling this new feature by default, enable it with a
low threshold (3), so that large subvolume tree drop at mount time won't
cause huge qgroup workload.
CC: stable@vger.kernel.org # 6.1
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
After the migration to use fs context for processing mount options we had
a slight change in the semantics for remounting a filesystem that was
mounted with compress-force. Before we could clear compress-force by
passing only "-o compress[=algo]" during a remount, but after that change
that does not work anymore, force-compress is still present and one needs
to pass "-o compress-force=no,compress[=algo]" to the mount command.
Example, when running on a kernel 6.8+:
$ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
$ mount -o remount,compress=zlib:5 /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
On a 6.7 kernel (or older):
$ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
$ mount -o remount,compress=zlib:5 /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
So update btrfs_parse_param() to clear "compress-force" when "compress" is
given, providing the same semantics as kernel 6.7 and older.
Reported-by: Roman Mamedov <rm@romanrm.net>
Link: https://lore.kernel.org/linux-btrfs/20241014182416.13d0f8b0@nvm/
CC: stable@vger.kernel.org # 6.8+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The kernel test robot reported a build failure on m68k in the intel
driver due to the recent shapers-related changes.
The mentioned arch has funny alignment properties, let's be explicit
about the binary layout expectation introducing a padding field.
Fixes: 608a5c05c39b ("virtchnl: support queue rate limit and quanta size configuration")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410131710.71Wt6LKO-lkp@intel.com/
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/e45d1c9f17356d431b03b419f60b8b763d2ff768.1729000481.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Breno Leitao says:
====================
net: netconsole refactoring and warning fix
The netconsole driver was showing a warning related to userdata
information, depending on the message size being transmitted:
------------[ cut here ]------------
WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
? write_ext_msg+0x3b6/0x3d0
console_flush_all+0x1e9/0x330
...
Identifying the cause of this warning proved to be non-trivial due to:
* The write_ext_msg() function being over 100 lines long
* Extensive use of pointer arithmetic
* Inconsistent naming conventions and concept application
The send_ext_msg() function grew organically over time:
* Initially, the UDP packet consisted of a header and body
* Later additions included release prepend and userdata
* Naming became inconsistent (e.g., "body" excludes userdata, "header"
excludes prepended release)
This lack of consistency made investigating issues like the above warning
more challenging than what it should be.
To address these issues, the following steps were taken:
* Breaking down write_ext_msg() into smaller functions with clear scopes
* Improving readability and reasoning about the code
* Simplifying and clarifying naming conventions
Warning Fix
-----------
The warning occurred when there was insufficient buffer space to append
userdata. While this scenario is acceptable (as userdata can be sent in a
separate packet later), the kernel was incorrectly raising a warning. A
one-line fix has been implemented to resolve this issue.
The fix was already sent to net, and is already available in net-next
also.
v4:
* https://lore.kernel.org/all/20240930131214.3771313-1-leitao@debian.org/
v3:
* https://lore.kernel.org/all/20240910100410.2690012-1-leitao@debian.org/
v2:
* https://lore.kernel.org/all/20240909130756.2722126-1-leitao@debian.org/
v1:
* https://lore.kernel.org/all/20240903140757.2802765-1-leitao@debian.org/
====================
Link: https://patch.msgid.link/20241017095028.3131508-1-leitao@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Refactor the send_msg_fragmented() function by extracting the logic for
sending the message body into a new function called
send_fragmented_body().
Now, send_msg_fragmented() handles appending the release and header, and
then delegates the task of breaking up the body and sending the
fragments to send_fragmented_body().
This is the final flow now:
When send_ext_msg_udp() is called to send a message, it will:
- call send_msg_no_fragmentation() if no fragmentation is needed
or
- call send_msg_fragmented() if fragmentation is needed
* send_msg_fragmented() appends the header to the buffer, which is
be persisted until the function returns
* call send_fragmented_body() to iterate and populate the body of
the message. It will not touch the header, and it will only
replace the body, writing the msgbody and/or userdata.
Also add some comment to make the code easier to review.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Do not pass userdata to send_msg_fragmented, since we can get it later.
This will be more useful in the next patch, where send_msg_fragmented()
will be split even more, and userdata is only necessary in the last
function.
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Refactor the code by extracting the logic for appending the
release into the buffer into a separate function.
The goal is to reduce the size of send_msg_fragmented() and improve
code readability.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The current check to determine if the message body was fully sent is
difficult to follow. To improve clarity, introduce a variable that
explicitly tracks whether the message body (msgbody) has been completely
sent, indicating when it's time to begin sending userdata.
Additionally, add comments to make the code more understandable for
others who may work with it.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This new variable tracks the total length of the data to be sent,
encompassing both the message body (msgbody) and userdata, which is
collectively called body.
By explicitly defining body_len, the code becomes clearer and easier to
reason about, simplifying offset calculations and improving overall
readability of the function.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With the introduction of the userdata concept, the term body has become
ambiguous and less intuitive.
To improve clarity, body is renamed to msg_body, making it clear that
the body is not the only content following the header.
In an upcoming patch, the term body_len will also be revised for further
clarity.
The current packet structure is as follows:
release, header, body, [msg_body + userdata]
Here, [msg_body + userdata] collectively forms what is currently
referred to as "body." This renaming helps to distinguish and better
understand each component of the packet.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Following the previous change, where the non-fragmented case was moved
to its own function, this update introduces a new function called
send_msg_fragmented to specifically manage scenarios where message
fragmentation is required.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The send_ext_msg_udp() function has become quite large, currently
spanning 102 lines. Its complexity, along with extensive pointer and
offset manipulation, makes it difficult to read and error-prone.
The function has evolved over time, and it’s now due for a refactor.
To improve readability and maintainability, isolate the case where no
message fragmentation occurs into a separate function, into a new
send_msg_no_fragmentation() function. This scenario covers about 95% of
the messages.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Variable msg_ready is useless, since it does not represent anything. Get
rid of it, using buf directly instead.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Change ynl-gen-c.py to use NLA_BE16 and NLA_BE32 types to represent
big-endian u16 and u32 ynl types.
Doing this enables those attributes to have range checks applied, as
the validator will then convert to host endianness prior to validation.
The autogenerated kernel/uapi code have been regenerated by running:
./tools/net/ynl/ynl-regen.sh -f
This changes the policy types of the following attributes:
FOU_ATTR_PORT (NLA_U16 -> NLA_BE16)
FOU_ATTR_PEER_PORT (NLA_U16 -> NLA_BE16)
These two are used with nla_get_be16/nla_put_be16().
MPTCP_PM_ADDR_ATTR_ADDR4 (NLA_U32 -> NLA_BE32)
This one is used with nla_get_in_addr/nla_put_in_addr(),
which uses nla_get_be32/nla_put_be32().
IOWs the generated changes are AFAICT aligned with their implementations.
The generated userspace code remains identical, and have been verified
by comparing the output generated by the following command:
make -C tools/net/ynl/generated
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241017094704.3222173-1-ast@fiberby.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Petr Machata says:
====================
selftests: net: Introduce deferred commands
Recently, a defer helper was added to Python selftests. The idea is to keep
cleanup commands close to their dirtying counterparts, thereby making it
more transparent what is cleaning up what, making it harder to miss a
cleanup, and make the whole cleanup business exception safe. All these
benefits are applicable to bash as well, exception safety can be
interpreted in terms of safety vs. a SIGINT.
This patchset therefore introduces a framework of several helpers that
serve to schedule cleanups in bash selftests.
- Patch #1 has more details about the primitives being introduced.
Patch #2 adds a fallback cleanup() function to lib.sh, because ideally
selftests wouldn't need to introduce a dedicated cleanup function at all.
- Patch #3 adds a parameter to stop_traffic(), which makes it possible to
start other background processes after the traffic is started without
confusing the cleanup.
- Patches #4 to #10 convert a number of selftests.
The goal was to convert all tests that use start_traffic / stop_traffic
to the defer framework. Leftover traffic generators are a particularly
painful sort of a missed cleanup. Normal unfinished cleanups can usually
be cleaned up simply by rerunning the test and interrupting it early to
let the cleanups run again / in full. This does not work with
stop_traffic, because it is only issued at the end of the test case that
starts the traffic. At the same time, leftover traffic generators
influence follow-up test runs, and are hard to notice.
The tests were however converted whole-sale, not just their traffic bits.
Thus they form a proof of concept of the defer framework.
====================
Link: https://patch.msgid.link/cover.1729157566.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Note that the start_traffic commands in __burst_test() are each sending a
fixed number of packets (note the -c flag) and then ending. They therefore
do not need a matching stop_traffic.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the defer framework to schedule cleanups as soon as the command is
executed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Instead of having a suite of dedicated cleanup functions, use the defer
framework to schedule cleanups right as their setup functions are run.
The sleep after stop_traffic() in mlxsw selftests is necessary, but
scheduling it as "defer sleep; defer stop_traffic" is silly. Instead, add a
local helper to stop traffic and sleep afterwards.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Now that it is possible to schedule a deferral of stop_traffic() right
after the traffic is started, we do not have to rely on the %% magic to
kill the background process that was started last. Instead we can just give
the PID explicitly. This makes it possible to start other background
processes after the traffic is started without confusing the cleanup.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Consistent use of defers obviates the need for a separate test-specific
cleanup function -- everything is just taken care of in defers. So in this
patch, introduce a cleanup() helper in the forwarding lib.sh, which calls
just pre_cleanup() and defer_scopes_cleanup(). Selftests are obviously
still free to override the function.
Since pre_cleanup() is too entangled with forwarding-specific minutia, the
function cannot currently be in net/lib.sh.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In commit 8510801a9dbd ("selftests: drv-net: add ability to schedule
cleanup with defer()"), a defer helper was added to Python selftests.
The idea is to keep cleanup commands close to their dirtying counterparts,
thereby making it more transparent what is cleaning up what, making it
harder to miss a cleanup, and make the whole cleanup business exception
safe. All these benefits are applicable to bash as well, exception safety
can be interpreted in terms of safety vs. a SIGINT.
This patch therefore introduces a framework of several helpers that serve
to schedule cleanups in bash selftests:
- defer_scope_push(), defer_scope_pop(): Deferred statements can be batched
together in scopes. When a scope is popped, the deferred commands
scheduled in that scope are executed in the order opposite to order of
their scheduling.
- defer(): Schedules a defer to the most recently pushed scope (or the
default scope if none was pushed.)
- defer_prio(): Schedules a defer on the priority track. The priority defer
queue is run before the default defer queue when scope is popped.
The issue that this is addressing is specifically the one of restoring
devlink shared buffer threshold type. When setting up static thresholds,
one has to first change the threshold type to static, then override the
individual thresholds. When cleaning up, it would be natural to reset the
threshold values first, then change the threshold type. But the values
that are valid for dynamic thresholds are generally invalid for static
thresholds and vice versa. Attempts to restore the values first would be
bounced. Thus one has to first reset the threshold type, then adjust the
thresholds.
(You could argue that the shared buffer threshold type API is broken and
you would be right, but here we are.)
This cannot be solved by pure defers easily. I considered making it
possible to disable an existing defer, so that one could then schedule a
new defer and disable the original. But this forward-shifting of the
defer job would have to take place after every threshold-adjusting
command, which would make it very awkward to schedule these jobs.
- defer_scopes_cleanup(): Pops any unpopped scopes, including the default
one. The selftests that use defer should run this in their exit trap.
This is important to get cleanups of interrupted scripts.
- in_defer_scope(): Sometimes a function would like to introduce a new
defer scope, then run whatever it is that it wants to run, and then pop
the scope to run the deferred cleanups. The helper in_defer_scope() can
be used to run another command within such environment, such that any
scheduled defers run after the command finishes.
The framework is added as a separate file lib/sh/defer.sh so that it can be
used by all bash selftests, including those that do not currently use
lib.sh. lib.sh however includes the file by default, because ideally all
tests would use these helpers instead of hand-rolling their cleanups.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The fix for MAC addresses broke detection of the naming convention
because it gave network devices no random MAC before bind()
was called. This means that the check for the local assignment bit
was always negative as the address was zeroed from allocation,
instead of from overwriting the MAC with a unique hardware address.
The correct check for whether bind() has altered the MAC is
done with is_zero_ether_addr
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: Greg Thelen <gthelen@google.com>
Diagnosed-by: John Sperbeck <jsperbeck@google.com>
Fixes: bab8eb0dd4cb9 ("usbnet: modern method to get random MAC")
Link: https://patch.msgid.link/20241017071849.389636-1-oneukum@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
It is meant to use xa_err() to extract the error encoded in the return
value of xa_store().
Fixes: 44c2fbebe18a ("mlxsw: spectrum_router: Share nexthop counters in resilient groups")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20241017023223.74180-1-yuancan@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Report MDI-X resolved state after link up.
Tested on Linkstreet 88E6193X internal PHYs.
Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241017015026.255224-1-paul.davey@alliedtelesis.co.nz
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently reset state configuration of split header works fine for
non-tagged packets and we see no corruption in payload of any size
We need additional programming sequence with reset configuration to
handle VLAN tagged packets to avoid corruption in payload for packets
of size greater than 256 bytes.
Without this change ping application complains about corruption
in payload when the size of the VLAN packet exceeds 256 bytes.
With this change tagged and non-tagged packets of any size works fine
and there is no corruption seen.
Current configuration which has the issue for VLAN packet
----------------------------------------------------------
Split happens at the position at Layer 3 header
|MAC-DA|MAC-SA|Vlan Tag|Ether type|IP header|IP data|Rest of the payload|
2 bytes ^
|
With the fix we are making sure that the split happens now at
Layer 2 which is end of ethernet header and start of IP payload
Ip traffic split
-----------------
Bits which take care of this are SPLM and SPLOFST
SPLM = Split mode is set to Layer 2
SPLOFST = These bits indicate the value of offset from the beginning
of Length/Type field at which header split should take place when the
appropriate SPLM is selected. Reset value is 2bytes.
Un-tagged data (without VLAN)
|MAC-DA|MAC-SA|Ether type|IP header|IP data|Rest of the payload|
2bytes ^
|
Tagged data (with VLAN)
|MAC-DA|MAC-SA|VLAN Tag|Ether type|IP header|IP data|Rest of the payload|
2bytes ^
|
Non-IP traffic split such AV packet
------------------------------------
Bits which take care of this are
SAVE = Split AV Enable
SAVO = Split AV Offset, similar to SPLOFST but this is for AVTP
packets.
|Preamble|MAC-DA|MAC-SA|VLAN tag|Ether type|IEEE 1722 payload|CRC|
2bytes ^
|
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241016234313.3992214-1-quic_abchauha@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
This patchset contains Netfilter fixes for net:
1) syzkaller managed to triger UaF due to missing reference on netns in
bpf infrastructure, from Florian Westphal.
2) Fix incorrect conversion from NFPROTO_UNSPEC to NFPROTO_{IPV4,IPV6}
in the following xtables targets: MARK and NFLOG. Moreover, add
missing
I have my half share in this mistake, I did not take the necessary time
to review this: For several years I have been struggling to keep working
on Netfilter, juggling a myriad of side consulting projects to stop
burning my own savings.
I have extended the iptables-tests.py test infrastructure to improve the
coverage of ip6tables and detect similar problems in the future.
This is a v2 including a extended PR with one more fix.
netfilter pull request 24-10-21
* tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: xtables: fix typo causing some targets not to load on IPv6
netfilter: bpf: must hold reference on net namespace
====================
Link: https://patch.msgid.link/20241021094536.81487-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Kuniyuki Iwashima says:
====================
rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL.
This is a prep for the next series where we will push RTNL down to
rtnl_{new,del,set}link().
That means, for example, __rtnl_newlink() is always under RTNL, but
rtnl_newlink() has a non-RTNL section.
As a prerequisite for per-netns RTNL, we will move netns validation
(and RTNL-independent validations if possible) to that section.
rtnl_link_ops and rtnl_af_ops will be protected with SRCU not to
depend on RTNL.
Changes:
v2:
* Add Eric's Reviewed-by to patch 1-4,6,8-11, (no tag on 5,7,12-14)
* Patch 7
* Handle error of init_srcu_struct().
* Call cleanup_srcu_struct() after synchronize_srcu().
* Patch 12
* Move put_net() before errorout label
* Patch 13
* Newly added as prep for patch 14
* Patch 14
* Handle error of init_srcu_struct().
* Call cleanup_srcu_struct() after synchronize_srcu().
v1: https://lore.kernel.org/netdev/20241009231656.57830-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20241016185357.83849-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK
even when its module is being unloaded.
Let's use SRCU to protect ops.
rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns
SRCU-protected ops pointer. The caller must call rtnl_af_put()
to release the pointer after the use.
Also, rtnl_af_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_SETLINK requests to
complete.
Note that rtnl_af_ops needs to be protected by its dedicated lock
when RTNL is removed.
Note also that BUG_ON() in do_setlink() is changed to the normal
error handling as a different af_ops might be found after
validate_linkmsg().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The next patch will add init_srcu_struct() in rtnl_af_register(),
then we need to handle its error.
Let's add the error handling in advance to make the following
patch cleaner.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_setlink().
RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink()
to move a dev to a new netns, but the netns needs to be fetched before
holding rtnl_net_lock().
Let's move it to rtnl_setlink() and pass the netns to do_setlink().
Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink())
can pass the prefetched netns to do_setlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_setlink().
Let's unify the error path to make it easy to place rtnl_net_lock().
While at it, keep the variables in reverse xmas order.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_delink().
Let's unify the error path to make it easy to place rtnl_net_lock().
While at it, keep the variables in reverse xmas order.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and
is fetched in rtnl_newlink_create().
This must be done before holding rtnl_net_lock().
Let's move IFLA_LINK_NETNSID processing to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As a prerequisite of per-netns RTNL, we must fetch netns before
looking up dev or moving it to another netns.
rtnl_link_get_net_capable() is called in rtnl_newlink_create() and
do_setlink(), but both of them need to be moved to the RTNL-independent
region, which will be rtnl_newlink().
Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the
netns down to where needed.
Note that the latter two have not passed the nets to do_setlink() yet
but will do so after the remaining rtnl_link_get_net_capable() is moved
to rtnl_setlink() later.
While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to
align with rtnl_{del,set}link().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK
even when its module is being unloaded.
Let's use SRCU to protect ops.
rtnl_link_ops_get() now iterates link_ops under RCU and returns
SRCU-protected ops pointer. The caller must call rtnl_link_ops_put()
to release the pointer after the use.
Also, __rtnl_link_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_NEWLINK requests to
complete.
Note that link_ops needs to be protected by its dedicated lock
when RTNL is removed.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ops->validate() does not require RTNL.
Let's move it to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(),
we release RTNL and redo the whole process after request_module(), which
complicates the logic.
The ops will be RTNL-independent later.
Let's move the ops lookup to rtnl_newlink() and do the retry earlier.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_newlink().
Let's move RTNL-independent validation to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
__rtnl_newlink() got too long to maintain.
For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even
when IFLA_INFO_SLAVE_DATA is not specified.
Let's factorise the single dev do_setlink() path to a separate function.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|