Age | Commit message (Collapse) | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We normally can't create a new directory with the case-insensitive
option already set - except when we're creating a snapshot.
And if casefolding is enabled filesystem wide, we should still set it
even though not strictly required, for consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we only ever logged the filesystem UUID.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We have to be able to print superblock sections even if they fail to
validate (for debugging), so we have to calculate the number of entries
from the field size.
Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems btree node scan picked up a partially overwritten btree node,
and corrected the "bset version older than sb version_min" error -
resulting in an invalid superblock with a bad version_min field.
Don't run this check at all when we're in btree node scan, and when we
do run it, do something saner if the bset version is totally crazy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Multiple ioctl handlers individually use a lot of stack space, and clang chooses
to inline them into the bch2_fs_ioctl() function, blowing through the warning
limit:
fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than]
655 | long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
By marking the largest two of them as noinline_for_stack, no indidual code path
ends up using this much, which avoids the warning and reduces the possible
total stack usage in the ioctl handler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() can return a transaction restart if passed a transaction
object - this has always been true when it has to drop locks to prompt
for user input, but we're seeing this more now that we're logging the
error being corrected in the journal.
gc_accounting_done() doesn't call fsck_err() from an actual commit loop,
and it doesn't need to be holding btree locks when it calls fsck_err(),
so the easy fix here for the unhandled transaction restart is to just
not pass it the transaction object. We'll miss out on the fancy new
logging, but that's ok.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a small leak of the superblock 'clean' section.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
PREEMPT_RT redefines how standard spinlocks work, so local_irq_save() +
spin_lock() is no longer equivalent to spin_lock_irqsave(). Fortunately,
we don't strictly need to do it that way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a UAF: we were calling darray_make_room() and retaining a pointer to
the old buffer.
And fix an UBSAN warning: struct bch_sb_field_downgrade_entry uses
__counted_by, so set dst->nr_errors before assigning to the array entry.
Reported-by: syzbot+14c52d86ddbd89bea13e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Object debugging generally needs special provisions for putting said
objects on the stack, which rhashtable does not have.
Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we think we're read-only but the VFS doesn't, fun will ensue.
And now that we know we have to be able to do this safely, just make
nochanges imply ro.
Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/
Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_lost_data() gets called on btree node read error, but the
error might be transient.
btree_node_scan is expensive, and there's no need to run it persistently
(marking it in the superblock as required to run) - check_topology
will run it if required, via bch2_get_scanned_nodes().
Running it non-persistently is fine, to avoid check_topology having to
rewind recovery to run it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_increase_depth() was originally for disaster recovery, to get
some data back from the journal when a btree root was bad.
We don't need it for that purpose anymore; on bad btree root we'll
launch btree node scan and reconstruct all the interior nodes.
If there's a key in the journal for a depth that doesn't exists, and
it's not from check_topology/btree node scan, we should just ignore it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Invalidate pagecache after we write the new superblock and send a
uevent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems excessive forced btree node rewrites can cause interior btree
updates to become wedged during recovery, before we're using the write
buffer for backpointer updates.
Add more flags so we can determine where these are coming from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a deadlock during recovery where interior btree updates became
wedged and all open_buckets were consumed; start adding more
introspection.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the specific error being corrected in the journal when we're
repairing, this helps greatly with 'bcachefs list_journal' analysis.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The next patch will add logging of the specific error being corrected in
repair paths to the journal; this means __bch2_fsck_err() can return
transaction restarts in places that previously weren't expecting them.
check_topology() is old code that doesn't use btree iterators for btree
node locking - it'll have to be rewritten in the future to work online.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Add initial DMR support, which required smarter RAPL probe
- Fix AMD MSR RAPL energy reporting
- Add RAPL power limit configuration output
- Minor fixes
* tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2025.06.08
tools/power turbostat: Add initial support for BartlettLake
tools/power turbostat: Add initial support for DMR
tools/power turbostat: Dump RAPL sysfs info
tools/power turbostat: Avoid probing the same perf counters
tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
tools/power turbostat: Clean up add perf/msr counter logic
tools/power turbostat: Introduce add_msr_counter()
tools/power turbostat: Remove add_msr_perf_counter_()
tools/power turbostat: Remove add_cstate_perf_counter_()
tools/power turbostat: Remove add_rapl_perf_counter_()
tools/power turbostat: Quit early for unsupported RAPL counters
tools/power turbostat: Always check rapl_joules flag
tools/power turbostat: Fix AMD package-energy reporting
tools/power turbostat: Fix RAPL_GFX_ALL typo
tools/power turbostat: Add Android support for MSR device handling
tools/power turbostat.8: pm_domain wording fix
tools/power turbostat.8: fix typo: idle_pct should be pct_idle
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due massive
conflicts against Linux-next. Now that everything is upstream finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"Add the missing seq_file forward declaration in the timer namespace
header"
* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Add struct seq_file forward declaration
|
|
Add initial DMR support, which required smarter RAPL probe
Fix AMD MSR RAPL energy reporting
Add RAPL power limit configuration output
Minor fixes
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add initial support for BartlettLake.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add initial support for DMR.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
for example:
intel-rapl:1: psys 28.0s:100W 976.0us:100W
intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W
intel-rapl:0/intel-rapl:0:0: core disabled
intel-rapl:0/intel-rapl:0:1: uncore disabled
intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W
[lenb: simplified format]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
squish me
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
For the RAPL package energy status counter, Intel and AMD share the same
perf_subsys and perf_name, but with different MSR addresses.
Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are
introduced to describe this counter for different Vendors.
As a result, the perf counter is probed twice, and causes a failure in
in get_rapl_counters() because expected_read_size and actual_read_size
don't match.
Fix the problem by skipping the already probed counter.
Note, this is not a perfect fix. For example, if different
vendors/platforms use the same MSR value for different purpose, the code
can be fooled when it probes a rapl_counter_arch_infos[] entry that does
not belong to the running Vendor/Platform.
In a long run, better to put rapl_counter_arch_infos[] into the
platform_features so that this becomes Vendor/Platform specific.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
cleared
platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.
Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.
With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Increase the code readability by moving the no_perf/no_msr flag and the
cai->perf_name/cai->msr sanity checks into the counter probe functions.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.
Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_msr_perf_counter_(), add_msr_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_msr_perf_counter_() and move all the logic to
add_msr_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.
Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_rapl_perf_counter_() and move all the logic to
add_rapl_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Quit early for unsupported RAPL counters.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
rapl_joules bit should always be checked even if
platform_features->rapl_msrs is not set or no_msr flag is used.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.
However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.
Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.
Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.
Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Fix typo in the currently unused RAPL_GFX_ALL macro definition.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
turbostat.8: clarify that uncore "domains" are Power Management domains,
aka pm_domains.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
idle_pct should be pct_idle
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Thomas Gleixner:
"A single fix for the x86 performance counters on Intel CPUs:
The MSR offset calculations for fixed performance counters are stored
at the wrong index in the configuration array causing the general
purpose counter MSR offset to be overwritten, so both the general
purpose and the fixed counters offsets are incorrect.
Correct the array index calculation to fix that"
* tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the PCI/MSI code:
The conversion to per device MSI domains created a MSI domain with
size 1 instead of sizing it to the maximum possible number of MSI
interrupts for the device. This "worked" as the subsequent allocations
resized the domain, but the recent change to move the prepare() call
into the domain creation path broke this works by chance mechanism.
Size the domain properly at creation time"
* tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Size device MSI domain with the maximum number of vectors
|
|
Pull mount fixes from Al Viro:
"Various mount-related bugfixes:
- split the do_move_mount() checks in subtree-of-our-ns and
entire-anon cases and adapt detached mount propagation selftest for
mount_setattr
- allow clone_private_mount() for a path on real rootfs
- fix a race in call of has_locked_children()
- fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP
- make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right
userns
- avoid false negatives in path_overmount()
- don't leak MNT_LOCKED from parent to child in finish_automount()
- do_change_type(): refuse to operate on unmounted/not ours mounts"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_change_type(): refuse to operate on unmounted/not ours mounts
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
selftests/mount_setattr: adapt detached mount propagation test
do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
fs: allow clone_private_mount() for a path on real rootfs
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
finish_automount(): don't leak MNT_LOCKED from parent to child
path_overmount(): avoid false negatives
fs/fhandle.c: fix a race in call of has_locked_children()
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- multichannel/reconnect fixes
- move smbdirect (smb over RDMA) defines to fs/smb/common so they will
be able to be used in the future more broadly, and a documentation
update explaining setting up smbdirect mounts
- update email address for Paulo
* tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
MAINTAINERS, mailmap: Update Paulo Alcantara's email address
cifs: add documentation for smbdirect setup
cifs: do not disable interface polling on failure
cifs: serialize other channels when query server interfaces is pending
cifs: deal with the channel loading lag while picking channels
smb: client: make use of common smbdirect_socket_parameters
smb: smbdirect: introduce smbdirect_socket_parameters
smb: client: make use of common smbdirect_socket
smb: smbdirect: add smbdirect_socket.h
smb: client: make use of common smbdirect.h
smb: smbdirect: add smbdirect.h with public structures
smb: client: make use of common smbdirect_pdu.h
smb: smbdirect: add smbdirect_pdu.h with protocol definitions
|