Age | Commit message (Collapse) | Author |
|
Instead of using __clear_extent_bit() we can use clear_extent_bit() since
we pass a NULL value for the changeset argument.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG WITH EXPERIMENTAL LARGE FOLIOS]
When testing the experimental large data folio support with compression,
there are several ASSERT()s triggered from btrfs_decompress_buf2page()
when running fsstress with compress=zstd mount option:
- ASSERT(copy_len) from btrfs_decompress_buf2page()
- VM_BUG_ON(offset + len > PAGE_SIZE) from memcpy_to_page()
[CAUSE]
Inside btrfs_decompress_buf2page(), we need to grab the file offset from
the current bvec.bv_page, to check if we even need to copy data into the
bio.
And since we're using single page bvec, and no large folio, every page
inside the folio should have its index properly setup.
But when large folios are involved, only the first page (aka, the head
page) of a large folio has its index properly initialized.
The other pages inside the large folio will not have their indexes
properly initialized.
Thus the page_offset() call inside btrfs_decompress_buf2page() will
result garbage, and completely screw up the @copy_len calculation.
[FIX]
Instead of using page->index directly, go with page_pgoff(), which can
handle non-head pages correctly.
So introduce a helper, file_offset_from_bvec(), to get the file offset
from a single page bio_vec, so the copy_len calculation can be done
correctly.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Simplify conditionally reading an rb_entry(), there's the
rb_entry_safe() helper that checks the node pointer for NULL so we don't
have to write it explicitly.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Allow get_range_bits() to take an extent state pointer to pointer argument
so that we can cache the first extent state record in the target range, so
that a caller can use it for subsequent operations without doing a full
tree search. Currently the only user is try_release_extent_state(), which
then does a call to __clear_extent_bit() which can use such a cached state
record.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When the release_folio callback (from struct address_space_operations) is
invoked we don't allow the folio to be released if its range is currently
locked in the inode's io_tree, as it may indicate the folio may be needed
by the task that locked the range.
However if the range is locked because an ordered extent is finishing,
then we can safely allow the folio to be released because ordered extent
completion doesn't need to use the folio at all.
When we are under memory pressure, the kernel starts writeback of dirty
pages (folios) with the goal of releasing the pages from the page cache
after writeback completes, however this often is not possible on btrfs
because:
* Once the writeback completes we queue the ordered extent completion;
* Once the ordered extent completion starts, we lock the range in the
inode's io_tree (at btrfs_finish_one_ordered());
* If the release_folio callback is called while the folio's range is
locked in the inode's io_tree, we don't allow the folio to be
released, so the kernel has to try to release memory elsewhere,
which may result in triggering more writeback or releasing other
pages from the page cache which may be more useful to have around
for applications.
In contrast, when the release_folio callback is invoked after writeback
finishes and before ordered extent completion starts or locks the range,
we allow the folio to be released, as well as when the release_folio
callback is invoked after ordered extent completion unlocks the range.
Improve on this by detecting if the range is locked for ordered extent
completion and if it is, allow the folio to be released. This detection
is achieved by adding a new extent flag in the io_tree that is set when
the range is locked during ordered extent completion.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Drop reference to pages from the comment since the function is fully folio
aware and works regardless of how many pages are in the folio. Also while
at it, capitalize the first word and make it more explicit that
release_folio is a callback from struct address_space_operations.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The function btrfs_punch_hole_lock_range() needs to make sure there is
no other folio in the range, thus it goes with filemap_range_has_page(),
which works pretty fine.
But if we have large folios, under the following case
filemap_range_has_page() will always return true, forcing
btrfs_punch_hole_lock_range() to do a very time consuming busy loop:
start end
| |
|//|//|//|//| | | | | | | | |//|//|
\ / \ /
Folio A Folio B
In the above case, folio A and B contain our start/end indexes, and there
are no other folios in the range. Thus we do not need to retry inside
btrfs_punch_hole_lock_range().
To prepare for large data folios, introduce a helper,
check_range_has_page(), which will:
- Shrink the search range towards page boundaries
If the rounded down end (exclusive, otherwise it can underflow when @end
is inside the folio at file offset 0) is no larger than the rounded up
start, it means the range contains no other pages other than the ones
covering @start and @end.
Can return false directly in that case.
- Grab all the folios inside the range
- Skip any large folios that cover the start and end indexes
- If any other folios are found return true
- Otherwise return false
This new helper is going to handle both large folios and regular ones.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This involves the following modifications:
- Set the order flags for __filemap_get_folio() inside
prepare_one_folio()
This will allow __filemap_get_folio() to create a large folio if the
address space supports it.
- Limit the initial @write_bytes inside copy_one_range()
If the largest folio boundary splits the initial write range, there is
no way we can write beyond the largest folio boundary.
This is done by a simple helper calc_write_bytes().
- Release exceeding reserved space if the folio is smaller than expected
Which is doing the same handling when short copy happens.
All the preparations should not change the behavior when the largest
folio order is 0.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are several things not ideal in copy_one_range():
- Unnecessary temporary variables
* block_offset
* reserve_bytes
* dirty_blocks
* num_blocks
* release_bytes
These are utilized to handle short-copy cases.
- Inconsistent handling of btrfs_delalloc_release_extents()
There is a hidden behavior that, after reserving metadata for X bytes
of data write, we have to call btrfs_delalloc_release_extents() with X
once and only once.
Calling btrfs_delalloc_release_extents(X - 4K) and
btrfs_delalloc_release_extents(4K) will cause outstanding extents
accounting to go wrong.
This is because the outstanding extents mechanism is not designed to
handle shrinking of reserved space.
Improve above situations by:
- Use a single @reserved_start and @reserved_len pair
Now we reserve space for the initial range, and if a short copy
happened and we need to shrink the reserved space, we can easily
calculate the new length, and update @reserved_len.
- Introduce helpers to shrink reserved data and metadata space
This is done by two new helpers, shrink_reserved_space() and
btrfs_delalloc_shrink_extents().
The later will do a better calculation if we need to modify the
outstanding extents, and the first one will be utilized inside
copy_one_range().
- Manually unlock, release reserved space and return if no byte is
copied
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The EXTENT_UPTODATE io tree flag is now used only to mark ranges in the
fs_info->excluded_extents as used by super blocks and not available for
extent allocation (to prevent adding those ranges as free space in the
in memory space caches). As we can use any flag for that purpose, and
we are using EXTENT_DIRTY for the pinned extents io tree for example,
remove the EXTENT_UPTODATE flag and use instead EXTENT_DIRTY for the
excluded extents io tree.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_add_new_free_space() we keep searching for ranges in the excluded
extents io tree that have the EXTENT_DIRTY bit set, however we never ever
set that bit for ranges in that tree. That is a leftover from when that
function used the global freed extents trees (fs_info->freed_extents[2]),
where we used both the EXTENT_DIRTY and EXTENT_UPTODATE bits, but those
trees are gone with commit fe119a6eeb67 ("btrfs: switch to per-transaction
pinned extents"), which introduced the fs_info->excluded_extents io tree,
where only EXTENT_UPTODATE is set.
So remove the EXTENT_DIRTY bit search at btrfs_add_new_free_space().
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
After commit 52b029f42751 ("btrfs: remove unnecessary EXTENT_UPTODATE
state in buffered I/O path") we never set EXTENT_UPTODATE in an inode's
io_tree anymore, but we still have some code attempting to clear that
bit from an inode's io_tree. Remove that code as it doesn't do anything
anymore. The sole use of the EXTENT_UPTODATE bit is for the excluded
extents io_tree (fs_info->excluded_extents), which is used to track the
locations of super blocks, so that their ranges are never marked as free,
making them unavailable for extent allocation.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we fsync a file (or directory) that has no more hard links, because
while a process had a file descriptor open on it, the file's last hard
link was removed and then the process did an fsync against the file
descriptor, after a power failure or crash the file still exists after
replaying the log.
This behaviour is incorrect since once an inode has no more hard links
it's not accessible anymore and we insert an orphan item into its
subvolume's tree so that the deletion of all its items is not missed in
case of a power failure or crash.
So after log replay the file shouldn't exist anymore, which is also the
behaviour on ext4, xfs, f2fs and other filesystems.
Fix this by not ignoring inodes with zero hard links at
btrfs_log_inode_parent() and by committing an inode's delayed inode when
we are not doing a fast fsync (either BTRFS_INODE_COPY_EVERYTHING or
BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime flags). This
last step is necessary because when removing the last hard link we don't
delete the corresponding ref (or extref) item, instead we record the
change in the inode's delayed inode with the BTRFS_DELAYED_NODE_DEL_IREF
flag, so that when the delayed inode is committed we delete the ref/extref
item from the inode's subvolume tree - otherwise the logging code will log
the last hard link and therefore upon log replay the inode is not deleted.
The base code for a fstests test case that reproduces this bug is the
following:
. ./common/dmflakey
_require_scratch
_require_dm_target flakey
_require_mknod
_scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed"
_require_metadata_journaling $SCRATCH_DEV
_init_flakey
_mount_flakey
touch $SCRATCH_MNT/foo
# Commit the current transaction and persist the file.
_scratch_sync
# A fifo to communicate with a background xfs_io process that will
# fsync the file after we deleted its hard link while it's open by
# xfs_io.
mkfifo $SCRATCH_MNT/fifo
tail -f $SCRATCH_MNT/fifo | \
$XFS_IO_PROG $SCRATCH_MNT/foo >>$seqres.full &
XFS_IO_PID=$!
# Give some time for the xfs_io process to open a file descriptor for
# the file.
sleep 1
# Now while the file is open by the xfs_io process, delete its only
# hard link.
rm -f $SCRATCH_MNT/foo
# Now that it has no more hard links, make the xfs_io process fsync it.
echo "fsync" > $SCRATCH_MNT/fifo
# Terminate the xfs_io process so that we can unmount.
echo "quit" > $SCRATCH_MNT/fifo
wait $XFS_IO_PID
unset XFS_IO_PID
# Simulate a power failure and then mount again the filesystem to
# replay the journal/log.
_flakey_drop_and_remount
# We don't expect the file to exist anymore, since it was fsynced when
# it had no more hard links.
[ -f $SCRATCH_MNT/foo ] && echo "file foo still exists"
_unmount_flakey
# success, all done
echo "Silence is golden"
status=0
exit
A test case for fstests will be submitted soon.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's an explanation of how space info works at the top of
fs/btrfs/space-info.c, which makes reference to a variable called
bytes_may_reserve. There's nothing called that in the code, and wasn't
at time the comment was written; as far I can tell this is a typo, and
it should actually be bytes_may_use.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This flag is set after inserting the eb to the buffer tree and cleared
on it's removal. It was added in commit 34b41acec1ccc0 ("Btrfs: use a
bit to track if we're in the radix tree") and wanted to make use of it,
faa2dbf004e89e ("Btrfs: add sanity tests for new qgroup accounting
code"). Both are 10+ years old, we can remove the flag.
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This flag is no longer being used. It was added by commit a826d6dcb32d
("Btrfs: check items for correctness as we search") but it's no longer
being used after commit f26c92386028 ("btrfs: remove reada
infrastructure").
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This flag is no longer being used. It was added by commit ab0fff03055d
("btrfs: add READAHEAD extent buffer flag") and used in commits:
79fb65a1f6d9 ("Btrfs: don't call readahead hook until we have read the entire eb")
78e62c02abb9 ("btrfs: Remove extent_io_ops::readpage_io_failed_hook")
371cdc0700c7 ("btrfs: introduce subpage metadata validation check")
Finally all the code using it was removed by commit f26c92386028 ("btrfs: remove
reada infrastructure").
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This flag was added by commit 656f30dba7ab ("Btrfs: be aware of btree
inode write errors to avoid fs corruption") but it stopped being used
after commit 046b562b20a5 ("btrfs: use a separate end_io handler for
read_extent_buffer").
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Inside the main loop of btrfs_buffered_write() we are doing a lot of
heavy lifting inside a while() loop.
This makes it pretty hard to read, factor out the content into a helper,
copy_one_range() to do the work.
This has no functional change, but with some minor variable renames,
e.g. rename all "sector" into "block".
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Inside the main loop of btrfs_buffered_write(), we have a complex data
and metadata space reservation code, which tries to reserve space for
a COW write, if failed then fallback to check if we can do a NOCOW
write.
Factor out that part of code into a dedicated helper, reserve_space(),
to make the main loop a little easier to read.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Inside the main loop of btrfs_buffered_write(), if something wrong
happened, there is a out-of-loop cleanup path to release the reserved
space.
This behavior saves some code lines, but makes it much harder to read,
as we need to check release_bytes to make sure when we need to do the
cleanup.
Factor out the cleanup part into a helper, release_reserved_space(), to
do the cleanup inside the main loop, so that we can move @release_bytes
inside the loop.
This will make later refactoring of the main loop much easier.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit c87c299776e4 ("btrfs: make buffered write to copy one page a
time") changed how the variable @force_page_uptodate was updated.
Before that commit the variable was only initialized to false at the
beginning of the function, and after hitting a short copy, the next
retry on the same folio would force the folio to be read from the disk.
But after the commit, the variable is always initialized to false at the
beginning of the loop's scope, causing prepare_one_folio() never to get a
true value passed in.
The change in behavior is not a huge deal, it only makes a difference
on how we handle short copies:
Old: Allow the buffer to be split
The first short copy will be rejected, that's the same for both
cases.
But for the next retry, we require the folio to be read from disk.
Then even if we hit a short copy again, since the folio is already
uptodate, we do not need to handle partial uptodate range, and can
continue, marking the short copied range as dirty and continue.
This will split the buffer write into the folio as two buffered
writes.
New: Do not allow the buffer to be split
The first short copy will be rejected, that's the same for both
cases.
For the next retry, we do nothing special, thus if the short copy
happened again, we reject it again, until either the short copy is
gone, or we failed to fault in the buffer.
This will mean the buffer write into the folio will either fail or
succeed, no splitting will happen.
To me, either solution is fine, but the new one makes it simpler and
requires no special handling, so I prefer that solution.
And since @force_page_uptodate is always false when passed into
prepare_one_folio(), we can just remove the variable.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 1d2fbb7f1f9e ("btrfs: allow compression even if the range is not
page aligned") introduced the block perfect compression for block size <
page size cases.
Before that commit, if the fs block size is smaller than page size (aka
subpage cases), compressed write is only enabled if the dirty range is
fully page aligned.
This block perfect compression support was introduced in v6.13, and has
been tested for two kernel releases.
I believe it's time to move it out of experimental features so that we
can get more tests in the real world.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for linux 6.15
- fixes for atomic writes (Alan Adamson)
- fixes for polled CQs in nvmet-epf (Damien Le Moal)
- fix for polled CQs in nvme-pci (Keith Busch)
- fix compile on odd configs that need to be forced to inline
(Kees Cook)
- one more quirk (Ilya Guterman)"
* tag 'nvme-6.15-2025-05-15' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
nvme: all namespaces in a subsystem must adhere to a common atomic write size
nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ
nvmet: pci-epf: improve debug message
nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()
nvmet: pci-epf: do not fall back to using INTX if not supported
nvmet: pci-epf: clear completion queue IRQ flag on delete
nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
nvme-pci: make nvme_pci_npages_prp() __always_inline
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi features, notably a rework of the transport configuration
====================
Link: https://patch.msgid.link/MW5PR11MB5810DD2655DE461E98A618DDA390A@MW5PR11MB5810.namprd11.prod.outlook.com/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Merge series from Chen-Yu Tsai <wenst@chromium.org>:
This series is meant as an example on how to use macros and range cases
to shorten the MediaTek audio frontend drivers. The drivers have large
tables describing the registers and register fields for every supported
audio DMA interface. (Some are actually skipped!) There's a lot of
duplication which can be eliminated using macros. This should serve as
a reference for the MT8196 AFE driver that I had commented on.
The three patches tackle separate tables in the driver. The remaining
one that could be tackled is the list of DAIs; but that one has more
differences between each entry, so I haven't done it yet.
|
|
If the 'buf' array received from the user contains an empty string, the
'length' variable will be zero. Accessing the 'buf' array element with
index 'length - 1' will result in a buffer overflow.
Add a check for an empty string.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e8a60aa7404b ("platform/x86: Introduce support for Systems Management Driver over WMI for Dell Systems")
Cc: stable@vger.kernel.org
Signed-off-by: Vladimir Moskovkin <Vladimir.Moskovkin@kaspersky.com>
Link: https://lore.kernel.org/r/39973642a4f24295b4a8fad9109c5b08@kaspersky.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
ISP device specific configuration is not available in ACPI. Add
swnode graph to configure the missing device properties for the
OV05C10 camera device supported on amdisp platform.
Add support to create i2c-client dynamically when amdisp i2c
adapter is available.
Co-developed-by: Benjamin Chan <benjamin.chan@amd.com>
Signed-off-by: Benjamin Chan <benjamin.chan@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Link: https://lore.kernel.org/r/20250514215623.522746-1-pratap.nirujogi@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add documentation to describe die_id attribute.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-6-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
For domains with agents to control cores (compute dies) show matching
Linux CPU die ID. Linux CPU ID is a logical die ID, so this may not match
physical die ID or domain_id. So, a mapping is required to get Linux CPU
die ID. This attribute is only presented when CPUID enumerates die ids.
This attribute can be used by orchestration software like Kubernetes to
target specific dies for uncore frequency control.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-5-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The die ID in the Linux topology sysfs is a logical identifier that
differs from the one presented in CPUID leaf 0x1F or via MSR 0x54.
Introduce an interface that returns the Linux CPU die ID based on a
given package ID and power domain ID. This mapping is stored during the
CPU online callback in an array.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add documentation to describe agent_types attribute.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Currently, users need detailed hardware information to understand the
scope of controls within each uncore domain. Uncore frequency controls
manage subsystems such as core, cache, memory, and I/O. The UFS TPMI
provides this information, which can be used to present the scope more
clearly.
Each uncore domain consists of one or more agent types, with each agent
type controlling one or more uncore hardware subsystems. For example, a
single agent might control both the core and cache.
Introduce a new attribute called "agent_types." This attribute displays
a list of agents, separated by space character.
The string representations for agent types are as follows:
For core agent: core
For cache agent: cache
For memory agent: memory
For I/O agent: io
These agent types are read during probe time for each cluster and stored
as part of the struct uncore_data.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250508230250.1186619-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The S2110 has an additional set of media playback control keys enabled
by a hardware toggle button that switches the keys between "Application"
and "Player" modes. Toggling "Player" mode just shifts the scancode of
each hotkey up by 4.
Add defines for new scancodes, and a keymap and dmi id for the S2110.
Tested on a Fujitsu Lifebook S2110.
Signed-off-by: Valtteri Koskivuori <vkoskiv@gmail.com>
Acked-by: Jonathan Woithe <jwoithe@just42.net>
Link: https://lore.kernel.org/r/20250509184251.713003-1-vkoskiv@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Felix Fietkau says:
===================
mt76 fix for 6.15
- disable napi on driver removal to fix warning
- fix multicast rx regression on mt7925
===================
Link: https://patch.msgid.link/3b526d06-b717-4d47-817c-a9f47b796a31@nbd.name/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Subbaraya Sundeep says:
====================
octeontx2: Improve mailbox tracing
Octeontx2 VF,PF and AF devices communicate using hardware
shared mailbox region where VFs can only to talk to its PFs
and PFs can only talk to AF. AF does the entire resource management
for all PFs and VFs. The shared mbox region is used for synchronous
requests (requests from PF to AF or VF to PF) and async notifications
(notifications from AF to PFs or PF to VFs). Sending a request to AF
from VF involves various stages like
1. VF allocates message in shared region
2. Triggers interrupt to PF
3. PF upon receiving interrupt from VF will copy the message
from VF<->PF region to PF<->AF region
4. Triggers interrupt to AF
5. AF processes it and writes response in PF<->AF region
6. Triggers interrupt to PF
7. PF copies responses from PF<->AF region to VF<->PF region
8. Triggers interrupt to Vf
9. VF reads response in VF<->PF region
Due to various stages involved, Tracepoints are used in mailbox code for
debugging. Existing tracepoints need some improvements so that maximum
information can be inferred from trace logs during an issue.
This patchset tries to enhance existing tracepoints and also adds
a couple of tracepoints.
====================
Link: https://patch.msgid.link/1747136408-30685-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Apart from netdev interface Octeontx2 PF does the following:
1. Sends its own requests to AF and receives responses from AF.
2. Receives async messages from AF.
3. Forwards VF requests to AF, sends respective responses from AF to VFs.
4. Sends async messages to VFs.
This patch adds new tracepoint otx2_msg_status to display the status
of PF wrt mailbox handling.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1747136408-30685-5-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch adds pcifunc which represents PF and VF device to the
tracepoints otx2_msg_alloc, otx2_msg_send, otx2_msg_process so that
it is easier to correlate which device allocated the message, which
device forwarded it and which device processed that message.
Also add message id in otx2_msg_send tracepoint to check which
message is sent at any point of time from a device.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1747136408-30685-4-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Mailbox UP messages and CPT messages names are not being
displayed with their names in trace log files. Add those
messages too in otx2_mbox_id2name.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1747136408-30685-3-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use tracepoint instead of dev_dbg since the entire
mailbox code uses tracepoints for debugging.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1747136408-30685-2-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make sure that n_channels is set after allocating the
struct cfg80211_registered_device::int_scan_req member. Seen with
syzkaller:
UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5
index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
This was missed in the initial conversions because I failed to locate
the allocation likely due to the "sizeof(void *)" not matching the
"channels" array type.
Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Update the for_each_netdev_in_bond_rcu macro to iterate through network
devices in the bond's network namespace instead of always using
init_net. This change is safe because:
1. **Bond-Slave Namespace Relationship**: A bond device and its slaves
must reside in the same network namespace. The bond device's
namespace is established at creation time and cannot change.
2. **Slave Movement Implications**: Any attempt to move a slave device
to a different namespace automatically removes it from the bond, as
per kernel networking stack rules.
This maintains the invariant that slaves must exist in the same
namespace as their bond.
This change is part of an effort to enable Link Aggregation (LAG) to
work properly inside custom network namespaces. Previously, the macro
would only find slave devices in the initial network namespace,
preventing proper bonding functionality in custom namespaces.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250513081922.525716-1-mbloch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Depending on the data offset, skb_to_sgvec_nomark() may use
less scatterlist elements than what was forecasted by the
previous call to skb_cow_data().
It specifically happens when 'skbheadlen(skb) < offset', because
in this case we entirely skip the skb's head, which would have
required its own scatterlist element.
For this reason, it doesn't make sense to check that
skb_to_sgvec_nomark() returns the same value as skb_cow_data(),
but we can rather check for errors only, as it happens in
other parts of the kernel.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
When debugging a 'no route to host' error it can be beneficial
to know the address of the unreachable destination.
Print it along the debugging text.
While at it, add a missing parenthesis in a different debugging
message inside ovpn_peer_endpoints_update().
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
The keepalive worker is cancelled before calling
unregister_netdevice_queue(), therefore it will never
hit a situation where the reg_state can be different
than NETDEV_REGISTERED.
For this reason, checking reg_state is useless and the
condition can be removed.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
To increase code coverage, extend the ovpn selftests with the following
cases:
* connect UDP peers using a mix of IPv6 and IPv4 at the transport layer
* run full test with tunnel MTU equal to transport MTU (exercising
IP layer fragmentation)
* ping "LAN IP" served by VPN peer ("LAN behind a client" test case)
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
ndo_start_xmit is basically expected to always return NETDEV_TX_OK.
However, in case of error, it was currently returning NET_XMIT_DROP,
which is not a valid netdev_tx_t return value, leading to
misinterpretation.
Change ndo_start_xmit to always return NETDEV_TX_OK to signal back
to the caller that the packet was handled (even if dropped).
Effects of this bug can be seen when sending IPv6 packets having
no peer to forward them to:
$ ip netns exec ovpn-server oping -c20 fd00:abcd:220:201::1
PING fd00:abcd:220:201::1 (fd00:abcd:220:201::1) 56 bytes of data.00:abcd:220:201 :1
ping_send failed: No buffer space available
ping_sendto: No buffer space available
ping_send failed: No buffer space available
ping_sendto: No buffer space available
...
Fixes: c2d950c4672a ("ovpn: add basic interface creation/destruction/management routines")
Reported-by: Gert Doering <gert@greenie.muc.de>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/5
Tested-by: Gert Doering <gert@greenie.muc.de>
Acked-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31591.html
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
getaddrinfo() may fail with error code different from EAI_FAIL
or EAI_NONAME, however in this case we still try to free the
results object, thus leading to a crash.
Fix this by bailing out on any possible error.
Fixes: 959bc330a439 ("testing/selftests: add test tool and scripts for ovpn module")
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
When routing a packet to a LAN behind a peer, ovpn needs to
inspect the route entry that brought the packet there in the
first place.
If this packet is truly routable, the route entry provides the
GW to be used when looking up the VPN peer to send the packet to.
However, the route entry is currently dropped before entering
the ovpn xmit function, because the IFF_XMIT_DST_RELEASE priv_flag
is enabled by default.
Clear the IFF_XMIT_DST_RELEASE flag during interface setup to allow
the route entry (skb's dst) to survive and thus be inspected
by the ovpn routing logic.
Fixes: a3aaef8cd173 ("ovpn: implement peer lookup logic")
Reported-by: Gert Doering <gert@greenie.muc.de>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/2
Tested-by: Gert Doering <gert@greenie.muc.de>
Acked-by: Gert Doering <gert@greenie.muc.de> # as a primary user
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31583.html
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|
|
IPv6 user packets (sent over the tunnel) may be larger than
the outgoing interface MTU after encapsulation.
When this happens ovpn should allow the kernel to fragment
them because they are "locally generated".
To achieve the above, we must set skb->ignore_df = 1
so that ip6_fragment() can be made aware of this decision.
Failing to do so will result in ip6_fragment() dropping
the packet thinking it was "routed".
No change is required in the IPv4 path, because when
calling udp_tunnel_xmit_skb() we already pass the
'df' argument set to 0, therefore the resulting datagram
is allowed to be fragmented if need be.
Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)")
Reported-by: Gert Doering <gert@greenie.muc.de>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/3
Tested-by: Gert Doering <gert@greenie.muc.de>
Acked-by: Gert Doering <gert@greenie.muc.de> # as primary user
Link: https://mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31577.html
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
|