summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-24bcachefs: Fix block/btree node size defaultsKent Overstreet
We're fixing option parsing in userspace, it now obeys OPT_SB_FIELD_SECTORS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Add missing smp_rmb()Alan Huang
The smp_rmb() guarantees that reads from reservations.counter occur before accessing cur_entry_u64s. It's paired with the atomic64_try_cmpxchg in journal_entry_open. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Kill JOURNAL_ERRORS()Kent Overstreet
Convert these to standard error codes, which means we can pass them outside the journal code, they're easier to pass to tracepoints, etc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Filesystem discard option now propagates to devicesKent Overstreet
the discard option is special, because it's both a filesystem and a device option. When set at the filesytsem level, it's supposed to propagate to (if set persistently via sysfs) or override (if non persistently as a mount option) the devices - that now works correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Device state is now a runtime optionKent Overstreet
Other options can normally be set at runtime via sysfs, no reason for this one not to be as well - it just doesn't support the degraded flags argument this way, that requires the ioctl. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Setting foreground_target at runtime now triggers rebalanceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Device options now use standard sysfs codeKent Overstreet
Device options now use the common code for sysfs, and can superblock fields (in a struct bch_member). This replaces BCH_DEV_OPT_SETTERS(), which was weird and easy to miss. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Kill BCH_DEV_OPT_SETTERS()Kent Overstreet
Previously, device options had their superblock option field listed separately, which was weird and easy to miss when defining options. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Remove spurious smp_mb()Alan Huang
The smp_mb() is paired with nothing. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Fix incorrect state countAlan Huang
atomic64_read(&j->seq) - j->seq_write_started == JOURNAL_STATE_BUF_NR is the condition in journal_entry_open where we return JOURNAL_ERR_max_open, so journal_cur_seq(j) - seq == JOURNAL_STATE_BUF_NR means that the buf corresponding to seq has started to write. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Fix btree iter flags in data moveKent Overstreet
Rebalance requires a not_extents iterator. This wasn't hit before because all_snapshots disableds is_extents on snapshots btrees - but has no effect on the reflink btree. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Validate bch_sb.offset fieldKent Overstreet
This was missed - but it needs to be correct for the superblock recovery tool that scans the start and end of the device for backup superblocks: we don't want to pick up superblocks that belong to a different partition that starts at a different offset. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: bch2_sb_validate() doesn't need bch_sb_handleKent Overstreet
Minor refactoring, so that bch2_sb_validate() can be used in the new userspace superblock recovery tool. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Add missing random.h includesKent Overstreet
Fix build in userspace, and good hygeine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Better incompat version/feature error messagesKent Overstreet
If we can't mount because of an incompatibility, print what's supported and unsupported - to help solve PEBKAC issues. Reported-by: Roland Vet <vet.roland@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Fix offset_into_extent in data move pathKent Overstreet
Fixes the following: [ 17.607394] kernel BUG at fs/bcachefs/reflink.c:261! [ 17.608316] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 17.608485] CPU: 0 UID: 0 PID: 564 Comm: bch-rebalance/3 Tainted: G OE 6.14.0-rc6-arch1-gfcb0bd9609d2 #7 0efd7a8f4a00afeb2c5fb6e7ecb1aec8ddcbb1e1 [ 17.608616] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 17.608736] Hardware name: Micro-Star International Co., Ltd. MS-7D75/MAG B650 TOMAHAWK WIFI (MS-7D75), BIOS 1.74 08/01/2023 [ 17.608855] RIP: 0010:bch2_lookup_indirect_extent+0x252/0x290 [bcachefs] [ 17.609006] Code: 00 00 00 00 e8 7f 51 f5 ff 89 c3 85 c0 74 52 48 8b 7d b0 4c 89 ee e8 4d 4b f4 ff 48 63 d3 48 89 d0 31 d2 e9 2e ff ff ff 0f 0b <0f> 0b 48 8b 7d b0 4c 89 ee 48 89 55 a8 e8 2c 4b f4 ff 4c 8b 55 a8 [ 17.609136] RSP: 0018:ffffa3714455f850 EFLAGS: 00010246 [ 17.609261] RAX: 0000000000000080 RBX: ffff895891098790 RCX: 0000000000000000 [ 17.609387] RDX: 0000000000000080 RSI: ffffa3714455fa90 RDI: ffff895889550000 [ 17.609511] RBP: ffffa3714455f8c0 R08: ffff895891098790 R09: 0000000000000001 [ 17.609637] R10: ffffa3714455f8d8 R11: ffffa3714455f950 R12: ffffa3714455fa58 [ 17.609763] R13: ffff895891098790 R14: ffffa3714455fa58 R15: ffff895889550000 [ 17.609888] FS: 0000000000000000(0000) GS:ffff896757c00000(0000) knlGS:0000000000000000 [ 17.610015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.610143] CR2: 0000716b8cda2750 CR3: 0000000914e22000 CR4: 0000000000f50ef0 [ 17.610272] PKRU: 55555554 [ 17.610403] Call Trace: [ 17.610535] <TASK> [ 17.610662] ? __die_body.cold+0x19/0x27 [ 17.610791] ? die+0x2e/0x50 [ 17.610918] ? do_trap+0xca/0x110 [ 17.611049] ? do_error_trap+0x6a/0x90 [ 17.611178] ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611331] ? exc_invalid_op+0x50/0x70 [ 17.611468] ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611620] ? asm_exc_invalid_op+0x1a/0x20 [ 17.611757] ? bch2_lookup_indirect_extent+0x252/0x290 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.611911] ? bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612084] bch2_move_data_btree+0x58a/0x6c0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612256] ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612431] ? bch2_move_extent+0x3d7/0x6e0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612607] ? __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612782] __bch2_move_data+0xea/0x200 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.612959] ? __pfx_rebalance_pred+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613149] do_rebalance+0x517/0x8d0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613342] ? local_clock_noinstr+0xd/0xd0 [ 17.613518] ? local_clock+0x15/0x30 [ 17.613693] ? __bch2_trans_get+0x152/0x300 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.613890] ? __pfx_bch2_rebalance_thread+0x10/0x10 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] [ 17.614090] bch2_rebalance_thread+0x66/0xb0 [bcachefs c42b95c23facdfe11d39755520127cd771dddec2] The offset_into_extent bit was copied from the read path, but it's unnecessary here, where we always want to read and move the entire indirect extent, and it causes the assertion pop - because we're using a non-extents iterator, which always points to the end of the reflink pointer. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: use sha256() instead of crypto_shash APIEric Biggers
Just use sha256() instead of the clunky crypto API. This is much simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Remove unnecessary softdeps on crc32c and crc64Eric Biggers
Since bcachefs does not access crc32c and crc64 through the crypto API, there is no need to use module softdeps to ensure they are loaded. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: #if 0 out (enable|disable)_encryption()Kent Overstreet
These weren't hooked up, but they probably should be - add some comments for context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Improve can_write_extent()Kent Overstreet
This fixes another "rebalance spinning and doing no work" issue; rebalance was reading extents it wanted to move, but then failing in bch2_write() -> bch2_alloc_sectors_start() due to being unable to allocate sufficient replicas. This was triggered by a user playing with the durability settings, the foreground device was an NVME device with durability=2, and originally he'd set the background device to durability=2 as well, but changed it back to 1 (the default) after seeing IO errors. That meant that with replicas=2, we want to move data off the NVME device which satisfies that constraint, but with a single durability=1 device on the background target there's no way to move the extent to that target while satisfiying the "required replicas" constraint. The solution for now is for bch2_data_update_init() to check for this, and return an error - before kicking off the read. bch2_data_update_init() already had two different checks for "will we be able to write this extent", with partially duplicated code, so this patch combines and improves that logic. Additionally, we now always bail out and return an error if there's insufficient space on the destination target. Previously, we only did this for BCH_WRITE_alloc_nowait moves, because it might be the case that copygc just needs to free up space on the destination target. But we really shouldn't kick off a move if the destination is full, we can't currently distinguish between "really full" and "just need to wait for copygc", and if we are going to wait on copygc it'd be better to do that before kicking off the move. This will additionally fix "rebalance spinning" issues caused by a filesystem that has more data than can fit in background_target - which is a valid scenario, since we don't exclude foreground/cache devices when calculating filesystem capacity. Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: trace_io_move_write_failKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: Increase blacklist rangeAlan Huang
Now there are 16 journal buffers, 8 is too small to be enough. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: __bch2_read() now takes a btree_transKent Overstreet
Next patch will be checking if the extent we're reading from matches the IO failure we saw before marking the failure. For this to work, __bch2_read() needs to take the same transaction context that bch2_rbio_retry() uses to do that check. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24bcachefs: BCH_READ_data_update -> bch_read_bio.data_updateKent Overstreet
Read flags are codepath dependent and change as they're passed around, while the fields in rbio._state are mostly fixed properties of that particular object. Losing track of BCH_READ_data_update would be bad, and previously it was not obvious if it was always correctly set in the rbio, so this is a safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24Merge branch 'pm-cpuidle'Rafael J. Wysocki
Merge cpuidle updates for 6.15-rc5, including a menu governor update that is reported to improve some benchmark results quite significantly: - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki). - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki). - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai). * pm-cpuidle: cpuidle: Init cpuidle only for present CPUs cpuidle: intel_idle: Update MAINTAINERS intel_idle: introduce 'no_native' module parameter cpuidle: menu: Update documentation after get_typical_interval() changes cpuidle: menu: Avoid discarding useful information cpuidle: menu: Eliminate outliers on both ends of the sample set cpuidle: menu: Tweak threshold use in get_typical_interval() cpuidle: menu: Use one loop for average and variance computations cpuidle: menu: Drop a redundant local variable intel_idle: clean up BYT/CHT auto demotion disable
2025-03-24Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq updates for 6.15-rc1: - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code from drivers (Viresh Kumar). - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, and zuoqian). - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai). - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski). - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng). - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar). - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki). - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan). * pm-cpufreq: (91 commits) dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650 dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1 dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible cpufreq: Init cpufreq only for present CPUs cpufreq: tegra186: Share policy per cluster cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline() cpufreq/amd-pstate: Stop caching EPP cpufreq/amd-pstate: Rework CPPC enabling cpufreq/amd-pstate: Drop debug statements for policy setting cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions cpufreq/amd-pstate: Cache CPPC request in shared mem case too cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks cpufreq/amd-pstate-ut: Adjust variable scope cpufreq/amd-pstate-ut: Run on all of the correct CPUs cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same cpufreq/amd-pstate-ut: Use _free macro to free put policy cpufreq/amd-pstate: Drop `cppc_cap1_cached` ...
2025-03-24Merge branches 'thermal-core' and 'thermal-misc'Rafael J. Wysocki
Merge thermal core updates and miscellaneous updates of the thermal control subsystem for 6.15-rc1: - Delay exposing thermal zone sysfs interface to prevent user space from accessing thermal zones that have not been completely initialized yet (Lucas De Marchi). - Fix a spelling mistake in a comment in the thermal core (Colin Ian King). - Use kcalloc() instead of kzalloc() in some places in the thermal control subsystem (Lukasz Luba, Ethan Carter Edwards). - Clean up variable initialization in int340x_thermal_zone_add() (Christophe JAILLET). * thermal-core: thermal: core: Delay exposing sysfs interface thermal: core: Fix spelling mistake "Occurences" -> "Occurrences" * thermal-misc: thermal: intel: Clean up zone_trips[] initialization in int340x_thermal_zone_add() thermal: hisi: Use kcalloc() instead of kzalloc() with multiplication thermal: int340x: Use kcalloc() instead of kzalloc() with multiplication thermal: k3_j72xx_bandgap: Use kcalloc() instead of kzalloc() thermal/of: Use kcalloc() instead of kzalloc() with multiplication thermal/debugfs: replace kzalloc() with kcalloc() in thermal_debug_tz_add()
2025-03-24Merge branches 'acpi-x86', 'acpi-platform-profile', 'acpi-apei' and 'acpi-misc'Rafael J. Wysocki
Merge an ACPI CPPC update, ACPI platform-profile driver updates, an ACPI APEI update and a MAINTAINERS update related to ACPI for 6.15-rc1: - Add a missing header file include to the x86 arch CPPC code (Mario Limonciello). - Rework the sysfs attributes implementation in the ACPI platform-profile driver and improve the unregistration code in it (Nathan Chancellor, Kurt Borja). - Prevent the ACPI HED driver from being built as a module and change its initcall level to subsys_initcall to avoid initialization ordering issues related to it (Xiaofei Tan). - Update a maintainer email address in the ACPI PMIC entry in MAINTAINERS (Mika Westerberg). * acpi-x86: x86/ACPI: CPPC: Add missing include * acpi-platform-profile: ACPI: platform_profile: Improve platform_profile_unregister() ACPI: platform-profile: Fix CFI violation when accessing sysfs files * acpi-apei: ACPI: HED: Always initialize before evged * acpi-misc: MAINTAINERS: Use my kernel.org address for ACPI PMIC work
2025-03-24Merge branches 'acpi-power', 'acpi-fan', 'acpi-thermal', 'acpi-button' and ↵Rafael J. Wysocki
'acpi-video' Merge five ACPI driver updates for 6.15-rc1: - Use the str_on_off() helper function instead of hard-coded strings in the ACPI power resources handling code (Thorsten Blum). - Add fan speed reporting for ACPI fans that have _FST, but otherwise do not support the entire ACPI 4 fan interface (Joshua Grisham). - Fix a stale comment regarding trip points in acpi_thermal_add() that diverged from the commented code after removing _CRT evaluation from acpi_thermal_get_trip_points() (xueqin Luo). - Make ACPI button driver also subscribe to system events (Mario Limonciello). - Use the str_yes_no() helper function instead of hard-coded strings in the ACPI backlight (video) driver (Thorsten Blum). * acpi-power: ACPI: power: Use str_on_off() helper function * acpi-fan: ACPI: fan: Add fan speed reporting for fans with only _FST * acpi-thermal: ACPI: thermal: Fix stale comment regarding trip points * acpi-button: ACPI: button: Install notifier for system events as well * acpi-video: ACPI: video: Use str_yes_no() helper in acpi_video_bus_add()
2025-03-24ax25: Remove broken autobindMurad Masimov
Binding AX25 socket by using the autobind feature leads to memory leaks in ax25_connect() and also refcount leaks in ax25_release(). Memory leak was detected with kmemleak: ================================================================ unreferenced object 0xffff8880253cd680 (size 96): backtrace: __kmalloc_node_track_caller_noprof (./include/linux/kmemleak.h:43) kmemdup_noprof (mm/util.c:136) ax25_rt_autobind (net/ax25/ax25_route.c:428) ax25_connect (net/ax25/af_ax25.c:1282) __sys_connect_file (net/socket.c:2045) __sys_connect (net/socket.c:2064) __x64_sys_connect (net/socket.c:2067) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) ================================================================ When socket is bound, refcounts must be incremented the way it is done in ax25_bind() and ax25_setsockopt() (SO_BINDTODEVICE). In case of autobind, the refcounts are not incremented. This bug leads to the following issue reported by Syzkaller: ================================================================ ax25_connect(): syz-executor318 uses autobind, please contact jreuter@yaina.de ------------[ cut here ]------------ refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 5317 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 Modules linked in: CPU: 0 UID: 0 PID: 5317 Comm: syz-executor318 Not tainted 6.14.0-rc4-syzkaller-00278-gece144f151ac #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31 ... Call Trace: <TASK> __refcount_dec include/linux/refcount.h:336 [inline] refcount_dec include/linux/refcount.h:351 [inline] ref_tracker_free+0x6af/0x7e0 lib/ref_tracker.c:236 netdev_tracker_free include/linux/netdevice.h:4302 [inline] netdev_put include/linux/netdevice.h:4319 [inline] ax25_release+0x368/0x960 net/ax25/af_ax25.c:1080 __sock_release net/socket.c:647 [inline] sock_close+0xbc/0x240 net/socket.c:1398 __fput+0x3e9/0x9f0 fs/file_table.c:464 __do_sys_close fs/open.c:1580 [inline] __se_sys_close fs/open.c:1565 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1565 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... </TASK> ================================================================ Considering the issues above and the comments left in the code that say: "check if we can remove this feature. It is broken."; "autobinding in this may or may not work"; - it is better to completely remove this feature than to fix it because it is broken and leads to various kinds of memory bugs. Now calling connect() without first binding socket will result in an error (-EINVAL). Userspace software that relies on the autobind feature might get broken. However, this feature does not seem widely used with this specific driver as it was not reliable at any point of time, and it is already broken anyway. E.g. ax25-tools and ax25-apps packages for popular distributions do not use the autobind feature for AF_AX25. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+33841dc6aa3e1d86b78a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=33841dc6aa3e1d86b78a Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-24gpio: TODO: add an item to track reworking the sysfs interfaceBartosz Golaszewski
It seems there really exists the need for a simple sysfs interface that can be easily used from minimal initramfs images that don't contain much more than busybox. However the current interface poses a challenge to the removal of global GPIO numberspace. Add an item that tracks extending the existing ABI with a per-chip export/unexport attribute pair. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-6-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-24gpio: TODO: add an item to track the conversion to the new value settersBartosz Golaszewski
Add an item tracking the treewide conversion of GPIO drivers to using the new line value setter callbacks in struct gpio_chip instead of the old ones that don't allow drivers to signal failures to callers. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-5-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-24gpio: TODO: add delimiters between tasks for better readabilityBartosz Golaszewski
For better readability of the TODO, let's add some graphical delimiters between tasks. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-4-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-24gpio: TODO: remove the pinctrl integration taskBartosz Golaszewski
While there are surely some arguments in favor of integrating the GPIO and pinctrl subsystems into one, I believe this is not the right approach. The GPIO subsystem uses intricate locking with SRCU to handle the fact that both consumers and providers may run in different contexts. Pin-controller drivers are always meant to run in process context. This alone is a huge obstacle to any attempt at integration as evident by many problems we already encountered during the hotplug rework. The current glue code is pretty minimal and for most part already allows GPIO controllers to query pinctrl about the information they need. I suggest to drop this task and keep the subsystems separate even if many pin-controllers implement GPIO functionality in addition to pin functions. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-3-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-24gpio: TODO: remove task duplicationBartosz Golaszewski
The removal of linux/gpio.h is already tracked by the item about converting drivers to using the descriptor-based API. Remove the duplicate. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-2-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-24gpio: TODO: remove the item about the new debugfs interfaceBartosz Golaszewski
The consensus among core GPIO stakeholders seems to be that a new debugfs interface will only increase maintenance burden and will fail to attract users that care about long-term stability of the ABI[1]. Let's not go this way and not add a fourth user-facing interface to the GPIO subsystem. [1] https://lore.kernel.org/all/9d3f1ca4-d865-45af-9032-c38cacc7fe93@pengutronix.de/ Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-1-7b38f07110ee@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-23tracing: Use hashtable.h for event_hashSasha Levin
Convert the event_hash array in trace_output.c to use the generic hashtable implementation from hashtable.h instead of the manually implemented hash table. This simplifies the code and makes it more maintainable by using the standard hashtable API defined in hashtable.h. Rename EVENT_HASHSIZE to EVENT_HASH_BITS to properly reflect its new meaning as the number of bits for the hashtable size. Link: https://lore.kernel.org/20250323132800.3010783-1-sashal@kernel.org Link: https://lore.kernel.org/20250319190545.3058319-1-sashal@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-23tracing: Ensure module defining synth event cannot be unloaded while tracingDouglas Raillard
Currently, using synth_event_delete() will fail if the event is being used (tracing in progress), but that is normally done in the module exit function. At that stage, failing is problematic as returning a non-zero status means the module will become locked (impossible to unload or reload again). Instead, ensure the module exit function does not get called in the first place by increasing the module refcnt when the event is enabled. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 35ca5207c2d11 ("tracing: Add synthetic event command generation functions") Link: https://lore.kernel.org/20250318180906.226841-1-douglas.raillard@arm.com Signed-off-by: Douglas Raillard <douglas.raillard@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-23tracing: fix return value in __ftrace_event_enable_disable for ↵Gabriele Paoloni
TRACE_REG_UNREGISTER When __ftrace_event_enable_disable invokes the class callback to unregister the event, the return value is not reported up to the caller, hence leading to event unregister failures being silently ignored. This patch assigns the ret variable to the invocation of the event unregister callback, so that its return value is stored and reported to the caller, and it raises a warning in case of error. Link: https://lore.kernel.org/20250321170821.101403-1-gpaoloni@redhat.com Signed-off-by: Gabriele Paoloni <gpaoloni@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-23tracing/osnoise: Fix possible recursive locking for cpus_read_lock()Ran Xiaokai
Lockdep reports this deadlock log: osnoise: could not start sampling thread ============================================ WARNING: possible recursive locking detected -------------------------------------------- CPU0 ---- lock(cpu_hotplug_lock); lock(cpu_hotplug_lock); Call Trace: <TASK> print_deadlock_bug+0x282/0x3c0 __lock_acquire+0x1610/0x29a0 lock_acquire+0xcb/0x2d0 cpus_read_lock+0x49/0x120 stop_per_cpu_kthreads+0x7/0x60 start_kthread+0x103/0x120 osnoise_hotplug_workfn+0x5e/0x90 process_one_work+0x44f/0xb30 worker_thread+0x33e/0x5e0 kthread+0x206/0x3b0 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x11/0x20 </TASK> This is the deadlock scenario: osnoise_hotplug_workfn() guard(cpus_read_lock)(); // first lock call start_kthread(cpu) if (IS_ERR(kthread)) { stop_per_cpu_kthreads(); { cpus_read_lock(); // second lock call. Cause the AA deadlock } } It is not necessary to call stop_per_cpu_kthreads() which stops osnoise kthread for every other CPUs in the system if a failure occurs during hotplug of a certain CPU. For start_per_cpu_kthreads(), if the start_kthread() call fails, this function calls stop_per_cpu_kthreads() to handle the error. Therefore, similarly, there is no need to call stop_per_cpu_kthreads() again within start_kthread(). So just remove stop_per_cpu_kthreads() from start_kthread to solve this issue. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250321095249.2739397-1-ranxiaokai627@163.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-23tracing: Align synth event print fmtDouglas Raillard
The vast majority of ftrace event print fmt consist of a space-separated field=value pair. Synthetic event currently use a comma-separated field=value pair, which sticks out from events created via more classical means. Align the format of synth events so they look just like any other event, for better consistency and less headache when doing crude text-based data processing. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250319215028.1680278-1-douglas.raillard@arm.com Signed-off-by: Douglas Raillard <douglas.raillard@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-23dt-bindings: timer: Add SiFive CLINT2Nick Hu
Add compatible string and property for the SiFive CLINT v2. The SiFive CLINT v2 is incompatible with the SiFive CLINT v0 due to differences in their control methods. Signed-off-by: Nick Hu <nick.hu@sifive.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250321083507.25298-1-nick.hu@sifive.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-03-23netfilter: nf_tables: Only use nf_skip_indirect_calls() when ↵WangYuli
MITIGATION_RETPOLINE 1. MITIGATION_RETPOLINE is x86-only (defined in arch/x86/Kconfig), so no need to AND with CONFIG_X86 when checking if enabled. 2. Remove unused declaration of nf_skip_indirect_calls() when MITIGATION_RETPOLINE is disabled to avoid warnings. 3. Declare nf_skip_indirect_calls() and nf_skip_indirect_calls_enable() as inline when MITIGATION_RETPOLINE is enabled, as they are called only once and have simple logic. Fix follow error with clang-21 when W=1e: net/netfilter/nf_tables_core.c:39:20: error: unused function 'nf_skip_indirect_calls' [-Werror,-Wunused-function] 39 | static inline bool nf_skip_indirect_calls(void) { return false; } | ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. make[4]: *** [scripts/Makefile.build:207: net/netfilter/nf_tables_core.o] Error 1 make[3]: *** [scripts/Makefile.build:465: net/netfilter] Error 2 make[3]: *** Waiting for unfinished jobs.... Fixes: d8d760627855 ("netfilter: nf_tables: add static key to skip retpoline workarounds") Co-developed-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-23netfilter: socket: Lookup orig tuple for IPv6 SNATMaxim Mikityanskiy
nf_sk_lookup_slow_v4 does the conntrack lookup for IPv4 packets to restore the original 5-tuple in case of SNAT, to be able to find the right socket (if any). Then socket_match() can correctly check whether the socket was transparent. However, the IPv6 counterpart (nf_sk_lookup_slow_v6) lacks this conntrack lookup, making xt_socket fail to match on the socket when the packet was SNATed. Add the same logic to nf_sk_lookup_slow_v6. IPv6 SNAT is used in Kubernetes clusters for pod-to-world packets, as pods' addresses are in the fd00::/8 ULA subnet and need to be replaced with the node's external address. Cilium leverages Envoy to enforce L7 policies, and Envoy uses transparent sockets. Cilium inserts an iptables prerouting rule that matches on `-m socket --transparent` and redirects the packets to localhost, but it fails to match SNATed IPv6 packets due to that missing conntrack lookup. Closes: https://github.com/cilium/cilium/issues/37932 Fixes: eb31628e37a0 ("netfilter: nf_tables: Add support for IPv6 NAT") Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-23netfilter: xtables: Use strscpy() instead of strscpy_pad()Thorsten Blum
kzalloc() already zero-initializes the destination buffer, making strscpy() sufficient for safely copying the name. The additional NUL- padding performed by strscpy_pad() is unnecessary. The size parameter is optional, and strscpy() automatically determines the size of the destination buffer using sizeof() if the argument is omitted. This makes the explicit sizeof() call unnecessary; remove it. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-23netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation errorChenyuan Yang
It is possible that ctx in nfqnl_build_packet_message() could be used before it is properly initialize, which is only initialized by nfqnl_get_sk_secctx(). This patch corrects this problem by initializing the lsmctx to a safe value when it is declared. This is similar to the commit 35fcac7a7c25 ("audit: Initialize lsmctx to avoid memory allocation error"). Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-22Merge tag 'i2c-for-6.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Fix double free of irq in amd-mp2 driver" * tag 'i2c-for-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: amd-mp2: drop free_irq() of devm_request_irq() allocated irq
2025-03-22Merge tag 'perf-urgent-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf events fix from Ingo Molnar: "Fix an information leak regression in the AMD IBS PMU code" * tag 'perf-urgent-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/amd/ibs: Prevent leaking sensitive data to userspace
2025-03-22Merge tag 'keys-next-6.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull keys fix from Jarkko Sakkinen: "Fix potential use-after-free in key_put()" * tag 'keys-next-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: keys: Fix UAF in key_put()
2025-03-22Merge tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix for the commit that went into your tree yesterday, which exposed an issue with not always clearing notifications. That could cause them to be used more than once" * tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux: io_uring/net: fix sendzc double notif flush