Age | Commit message (Collapse) | Author |
|
Otherwise, tg->scx.weight can go out of sync while scx_cgroup is not enabled
and ops.cgroup_init() may be called with a stale weight value.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
|
|
check_directory_structure runs after check_dirents, so it expects that
it won't see any inodes with missing backpointers - normally.
But online fsck can't run check_dirents yet, or the user might only be
running a specific pass, so we need to be careful that this isn't an
error. If an inode is unreachable, that's handled by a separate pass.
Also, add a new 'bch2_inode_has_backpointer()' helper, since we were
doing this inconsistently.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The obj_event may be loaded immediately after inserted, then if the
list_head is not initialized then we may get a poisonous pointer. This
fixes the crash below:
mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
mlx5_core.sf mlx5_core.sf.4: firmware version: 32.38.3056
mlx5_core 0000:03:00.0 en3f0pf0sf2002: renamed from eth0
mlx5_core.sf mlx5_core.sf.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
IPv6: ADDRCONF(NETDEV_CHANGE): en3f0pf0sf2002: link becomes ready
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000007760fb000
[0000000000000060] pgd=000000076f6d7003, p4d=000000076f6d7003, pud=0000000777841003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
Modules linked in: ipmb_host(OE) act_mirred(E) cls_flower(E) sch_ingress(E) mptcp_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) tcp_diag(E) inet_diag(E) binfmt_misc(E) bonding(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) isofs(E) cdrom(E) mst_pciconf(OE) ib_umad(OE) mlx5_ib(OE) ipmb_dev_int(OE) mlx5_core(OE) kpatch_15237886(OEK) mlxdevm(OE) auxiliary(OE) ib_uverbs(OE) ib_core(OE) psample(E) mlxfw(OE) tls(E) sunrpc(E) vfat(E) fat(E) crct10dif_ce(E) ghash_ce(E) sha1_ce(E) sbsa_gwdt(E) virtio_console(E) ext4(E) mbcache(E) jbd2(E) xfs(E) libcrc32c(E) mmc_block(E) virtio_net(E) net_failover(E) failover(E) sha2_ce(E) sha256_arm64(E) nvme(OE) nvme_core(OE) gpio_mlxbf3(OE) mlx_compat(OE) mlxbf_pmc(OE) i2c_mlxbf(OE) sdhci_of_dwcmshc(OE) pinctrl_mlxbf3(OE) mlxbf_pka(OE) gpio_generic(E) i2c_core(E) mmc_core(E) mlxbf_gige(OE) vitesse(E) pwr_mlxbf(OE) mlxbf_tmfifo(OE) micrel(E) mlxbf_bootctl(OE) virtio_ring(E) virtio(E) ipmi_devintf(E) ipmi_msghandler(E)
[last unloaded: mst_pci]
CPU: 11 PID: 20913 Comm: rte-worker-11 Kdump: loaded Tainted: G OE K 5.10.134-13.1.an8.aarch64 #1
Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.2.2.12968 Oct 26 2023
pstate: a0400089 (NzCv daIf +PAN -UAO -TCO BTYPE=--)
pc : dispatch_event_fd+0x68/0x300 [mlx5_ib]
lr : devx_event_notifier+0xcc/0x228 [mlx5_ib]
sp : ffff80001005bcf0
x29: ffff80001005bcf0 x28: 0000000000000001
x27: ffff244e0740a1d8 x26: ffff244e0740a1d0
x25: ffffda56beff5ae0 x24: ffffda56bf911618
x23: ffff244e0596a480 x22: ffff244e0596a480
x21: ffff244d8312ad90 x20: ffff244e0596a480
x19: fffffffffffffff0 x18: 0000000000000000
x17: 0000000000000000 x16: ffffda56be66d620
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000040 x10: ffffda56bfcafb50
x9 : ffffda5655c25f2c x8 : 0000000000000010
x7 : 0000000000000000 x6 : ffff24545a2e24b8
x5 : 0000000000000003 x4 : ffff80001005bd28
x3 : 0000000000000000 x2 : 0000000000000000
x1 : ffff244e0596a480 x0 : ffff244d8312ad90
Call trace:
dispatch_event_fd+0x68/0x300 [mlx5_ib]
devx_event_notifier+0xcc/0x228 [mlx5_ib]
atomic_notifier_call_chain+0x58/0x80
mlx5_eq_async_int+0x148/0x2b0 [mlx5_core]
atomic_notifier_call_chain+0x58/0x80
irq_int_handler+0x20/0x30 [mlx5_core]
__handle_irq_event_percpu+0x60/0x220
handle_irq_event_percpu+0x3c/0x90
handle_irq_event+0x58/0x158
handle_fasteoi_irq+0xfc/0x188
generic_handle_irq+0x34/0x48
...
Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://patch.msgid.link/r/3ce7f20e0d1a03dc7de6e57494ec4b8eaf1f05c2.1750147949.git.leon@kernel.org
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The GID cache warning messages can flood the kernel log when there are
multiple failed attempts to add GIDs. This can happen when creating many
virtual interfaces without having enough space for their GIDs in the GID
table.
Change pr_warn to pr_warn_ratelimited to prevent log flooding while still
maintaining visibility of the issue.
Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.1750062357.git.leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
__xa_store() and __xa_erase() were used without holding the proper lock,
which led to a lockdep warning due to unsafe RCU usage. This patch
replaces them with xa_store() and xa_erase(), which perform the necessary
locking internally.
=============================
WARNING: suspicious RCPU usage
6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Not tainted
-----------------------------
./include/linux/xarray.h:1211 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by kworker/u136:0/219:
at: process_one_work+0xbe4/0x15f0
process_one_work+0x75c/0x15f0
pagefault_mr+0x9a5/0x1390 [mlx5_ib]
stack backtrace:
CPU: 14 UID: 0 PID: 219 Comm: kworker/u136:0 Not tainted
6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
Call Trace:
dump_stack_lvl+0xa8/0xc0
lockdep_rcu_suspicious+0x1e6/0x260
xas_create+0xb8a/0xee0
xas_store+0x73/0x14c0
__xa_store+0x13c/0x220
? xa_store_range+0x390/0x390
? spin_bug+0x1d0/0x1d0
pagefault_mr+0xcb5/0x1390 [mlx5_ib]
? _raw_spin_unlock+0x1f/0x30
mlx5_ib_eqe_pf_action+0x3be/0x2620 [mlx5_ib]
? lockdep_hardirqs_on_prepare+0x400/0x400
? mlx5_ib_invalidate_range+0xcb0/0xcb0 [mlx5_ib]
process_one_work+0x7db/0x15f0
? pwq_dec_nr_in_flight+0xda0/0xda0
? assign_work+0x168/0x240
worker_thread+0x57d/0xcd0
? rescuer_thread+0xc40/0xc40
kthread+0x3b3/0x800
? kthread_is_per_cpu+0xb0/0xb0
? lock_downgrade+0x680/0x680
? do_raw_spin_lock+0x12d/0x270
? spin_bug+0x1d0/0x1d0
? finish_task_switch.isra.0+0x284/0x9e0
? lockdep_hardirqs_on_prepare+0x284/0x400
? kthread_is_per_cpu+0xb0/0xb0
ret_from_fork+0x2d/0x70
? kthread_is_per_cpu+0xb0/0xb0
ret_from_fork_asm+0x11/0x20
Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free")
Link: https://patch.msgid.link/r/a85ddd16f45c8cb2bc0a188c2b0fcedfce975eb8.1750061791.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
On some systems with Nahum 11 and Nahum 13 the value of the XTAL clock in
the software STRAP is incorrect. This causes the PTP timer to run at the
wrong rate and can lead to synchronization issues.
The STRAP value is configured by the system firmware, and a firmware
update is not always possible. Since the XTAL clock on these systems
always runs at 38.4MHz, the driver may ignore the STRAP and just set
the correct value.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Reviewed-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add simple eswitch mode checker in attaching VF procedure and allocate
required port representor memory structures only in switchdev mode.
The reset flows triggers VF (if present) detach/attach procedure.
It might involve VF port representor(s) re-creation if the device is
configured is switchdev mode (not legacy one).
The memory was blindly allocated in current implementation,
regardless of the mode and not freed if in legacy mode.
Kmemeleak trace:
unreferenced object (percpu) 0x7e3bce5b888458 (size 40):
comm "bash", pid 1784, jiffies 4295743894
hex dump (first 32 bytes on cpu 45):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
pcpu_alloc_noprof+0x4c4/0x7c0
ice_repr_create+0x66/0x130 [ice]
ice_repr_create_vf+0x22/0x70 [ice]
ice_eswitch_attach_vf+0x1b/0xa0 [ice]
ice_reset_all_vfs+0x1dd/0x2f0 [ice]
ice_pci_err_resume+0x3b/0xb0 [ice]
pci_reset_function+0x8f/0x120
reset_store+0x56/0xa0
kernfs_fop_write_iter+0x120/0x1b0
vfs_write+0x31c/0x430
ksys_write+0x61/0xd0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Testing hints (ethX is PF netdev):
- create at least one VF
echo 1 > /sys/class/net/ethX/device/sriov_numvfs
- trigger the reset
echo 1 > /sys/class/net/ethX/device/reset
Fixes: 415db8399d06 ("ice: make representor code generic")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This patch fixes an issue seen in a large-scale deployment under heavy
incoming pkts where the aRFS flow wrongly matches a flow and reprograms the
NIC with wrong settings. That mis-steering causes RX-path latency spikes
and noisy neighbor effects when many connections collide on the same
hash (some of our production servers have 20-30K connections).
set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by
hashing the skb sized by the per rx-queue table size. This results in
multiple connections (even across different rx-queues) getting the same
hash value. The driver steer function modifies the wrong flow to use this
rx-queue, e.g.: Flow#1 is first added:
Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10
Later when a new flow needs to be added:
Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20
The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This
results in both flows getting un-optimized - packets for Flow#1 goes to
q#20, and then reprogrammed back to q#10 later and so on; and Flow #2
programming is never done as Flow#1 is matched first for all misses. Many
flows may wrongly share the same hash and reprogram rules of the original
flow each with their own q#.
Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf
clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s
72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS,
enable aRFS (global value is 144 * rps_flow_cnt).
Test notes about results from ice_rx_flow_steer():
---------------------------------------------------
1. "Skip:" counter increments here:
if (fltr_info->q_index == rxq_idx ||
arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
goto out;
2. "Add:" counter increments here:
ret = arfs_entry->fltr_info.fltr_id;
INIT_HLIST_NODE(&arfs_entry->list_entry);
3. "Update:" counter increments here:
/* update the queue to forward to on an already existing flow */
Runtime comparison: original code vs with the patch for different
rps_flow_cnt values.
+-------------------------------+--------------+--------------+
| rps_flow_cnt | 512 | 2048 |
+-------------------------------+--------------+--------------+
| Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K |
| Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K |
| CPU User | 216 vs 183 | 216 vs 206 |
| CPU System | 1441 vs 1171 | 1447 vs 1320 |
| CPU Softirq | 1245 vs 920 | 1238 vs 961 |
| CPU Total | 29 vs 22.7 | 29 vs 24.9 |
| aRFS Update | 533K vs 59 | 521K vs 32 |
| aRFS Skip | 82M vs 77M | 7.2M vs 4.5M |
+-------------------------------+--------------+--------------+
A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections
showed no performance degradation.
Some points on the patch/aRFS behavior:
1. Enabling full tuple matching ensures flows are always correctly matched,
even with smaller hash sizes.
2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs
and fewer calls to driver for programming on misses.
3. Larger hash tables reduces mis-steering due to more unique flow hashes,
but still has clashes. However, with larger per-device rps_flow_cnt, old
flows take more time to expire and new aRFS flows cannot be added if h/w
limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt
pkts have been processed by this cpu that are not part of the flow).
Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
Signed-off-by: Krishna Kumar <krikku@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The recent change which added READ_ONCE_NOCHECK() to read the nth entry
from the kernel stack incorrectly dropped dereferencing of the stack
pointer in order to read the requested entry.
In result the address of the entry is returned instead of its content.
Dereference the pointer again to fix this.
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/r/20250612163331.GA13384@willie-the-truck
Fixes: d93a855c31b7 ("s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()")
Cc: stable@vger.kernel.org
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Fixes the power regulator retrieval in tcan4x5x_can_probe() by ensuring
the regulator pointer is not set to NULL in the successful return from
devm_regulator_get_optional().
Fixes: 3814ca3a10be ("can: tcan4x5x: tcan4x5x_can_probe(): turn on the power before parsing the config")
Signed-off-by: Brett Werling <brett.werling@garmin.com>
Link: https://patch.msgid.link/20250612191825.3646364-1-brett.werling@garmin.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
btree node scrub was sometimes failing to rewrite nodes with errors;
bch2_btree_node_rewrite() can return a transaction restart and we
weren't checking - the lockrestart_do() needs to wrap the entire
operation.
And there's a better helper it should've been using,
bch2_btree_node_rewrite_key(), which makes all this more convenient.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
LSM hooks such as security_path_mknod() and security_inode_rename() have
access to newly allocated negative dentry, which has NULL d_inode.
Therefore, it is necessary to do the NULL pointer check for d_inode.
Also add selftests that checks the verifier enforces the NULL pointer
check.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250613052857.1992233-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Move the VERW clearing before the MONITOR so that VERW doesn't disarm it
and the machine never enters C1.
Original idea by Kim Phillips <kim.phillips@amd.com>.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Synthesize the TSA CPUID feature bits for guests. Set TSA_{SQ,L1}_NO on
unaffected machines.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Add the required features detection glue to bugs.c et all in order to
support the TSA mitigation.
Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
|
|
PERCPU_MODULE_RESERVE defines the maximum size that can by used for the
per-CPU data size used by modules. This is 8KiB.
Commit 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into
one") restructured the per-CPU memory allocation for the module and
moved the separate alloc_percpu() invocations at module init time to a
static per-CPU variable which is allocated by the module loader.
The size of the per-CPU data section for openvswitch is 6488 bytes which
is ~80% of the available per-CPU memory. Together with a few other
modules it is easy to exhaust the available 8KiB of memory.
Allocate ovs_pcpu_storage dynamically at module init time.
Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com
Fixes: 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A recent commit moved the error handling of sqpoll thread and tctx
failures into the thread itself, as part of fixing an issue. However, it
missed that tctx allocation may also fail, and that
io_sq_offload_create() does its own error handling for the task_struct
in that case.
Remove the manual task putting in io_sq_offload_create(), as
io_sq_thread() will notice that the tctx did not get setup and hence it
should put itself and exit.
Reported-by: syzbot+763e12bbf004fb1062e4@syzkaller.appspotmail.com
Fixes: ac0b8b327a56 ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This function exists in both tctx.h (where it belongs) and in io_uring.h
as a remnant of before the tctx handling code got split out. Remove the
io_uring.h definition and ensure that sqpoll.c includes the tctx.h
header to get the definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix warnings reported by sparse, related to incorrect type:
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: warning: incorrect type in assignment (different base types)
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: expected restricted __virtio32 [usertype] len
drivers/platform/mellanox/mlxbf-tmfifo.c:284:38: got unsigned long
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404040339.S7CUIgf3-lkp@intel.com/
Fixes: 78034cbece79 ("platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20250613214608.2250130-1-davthompson@nvidia.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Without explicitly setting a parent for the watchdog device, the device is
registered with a NULL parent. This causes device_add() (called internally
by devm_watchdog_register_device()) to register the device under
/sys/devices/virtual, since no parent is provided. The result is:
DEVPATH=/devices/virtual/watchdog/watchdog0
To fix this, assign &pdev->dev as the parent of the watchdog device before
calling devm_watchdog_register_device(). This ensures the device is
associated with the Portwell EC platform device and placed correctly in
sysfs as:
DEVPATH=/devices/platform/portwell-ec/watchdog/watchdog0
This aligns the device hierarchy with expectations and avoids misplacement
under the virtual class.
Fixes: 835796753310 ("platform/x86: portwell-ec: Add GPIO and WDT driver for Portwell EC")
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
Link: https://lore.kernel.org/r/20250616074819.63547-1-ivan.hu@canonical.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
There's really no value in the WARN stack trace etc., the reason
for this happening isn't directly related to the calling function
anyway. Also, syzbot has been observing it constantly, and there's
no way we can resolve it there - those systems are just slow.
Instead print an error message (once) and add a comment about what
really causes this message.
Reported-by: syzbot+468656785707b0e995df@syzkaller.appspotmail.com
Reported-by: syzbot+18c783c5cf6a781e3e2c@syzkaller.appspotmail.com
Reported-by: syzbot+d5924d5cffddfccab68e@syzkaller.appspotmail.com
Reported-by: syzbot+7d73d99525d1ff7752ef@syzkaller.appspotmail.com
Reported-by: syzbot+8e6e002c74d1927edaf5@syzkaller.appspotmail.com
Reported-by: syzbot+97254a3b10c541879a65@syzkaller.appspotmail.com
Reported-by: syzbot+dfd1fd46a1960ad9c6ec@syzkaller.appspotmail.com
Reported-by: syzbot+85e0b8d12d9ca877d806@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20250617104902.146e10919be1.I85f352ca4a2dce6f556e5ff45ceaa5f3769cb5ce@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In OCB, don't accept frames from invalid source addresses
(and in particular don't try to create stations for them),
drop the frames instead.
Reported-by: syzbot+8b512026a7ec10dcbdd9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/6788d2d9.050a0220.20d369.0028.GAE@google.com/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: syzbot+8b512026a7ec10dcbdd9@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20250616171838.7433379cab5d.I47444d63c72a0bd58d2e2b67bb99e1fea37eec6f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
All of these are really meant to be variable-length, and
in the case of s1g_beacon it's actually accessed. Make that
one in particular, and a couple of others (that aren't used
as arrays now), actually variable.
Reported-by: syzbot+fd222bb38e916df26fa4@syzkaller.appspotmail.com
Fixes: 1e1f706fc2ce ("wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements")
Link: https://patch.msgid.link/20250614003037.a3e82e882251.I2e8b58e56ff2a9f8b06c66f036578b7c1d4e4685@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When aoe's rexmit_timer() notices that an aoe target fails to respond to
commands for more than aoe_deadsecs, it calls aoedev_downdev() which
cleans the outstanding aoe and block queues. This can involve sleeping,
such as in blk_mq_freeze_queue(), which should not occur in irq context.
This patch defers that aoedev_downdev() call to the aoe device's
workqueue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Link: https://lore.kernel.org/r/20250610170600.869-2-jsanders.devel@gmail.com
Tested-By: Valentin Kleibel <valentin@vrvis.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Link: https://lore.kernel.org/r/20250610170600.869-1-jsanders.devel@gmail.com
Tested-By: Valentin Kleibel <valentin@vrvis.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
We have provided generic .mmap_prepare() equivalents, so update all file
systems that specify these directly in their file_operations structures.
This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2,
jfs, minix, omfs, ramfs and ufs file systems directly.
It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs,
frebxfs, qnx6, efs, romfs, erofs and isofs file systems.
There are remaining file systems which use generic hooks in a less direct
way which we address in a subsequent commit.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
The generic mmap handlers are very simple, so we can very easily convert
these in advance of converting file systems which use them.
This patch does so.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/30622c1f0b98c66840bc8c02668bda276a810b70.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/cba8b29ba5f225df8f63f50182d5f6e0fcf94456.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/5abfe526032a6698fd1bcd074a74165cda7ea57c.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is a prerequisite for adapting those filesystems to use the
.mmap_prepare() hook for mmap()'ing which invoke this check as this hook
does not have access to a VMA pointer.
To effect this, change the signature of daxdev_mapping_supported() and
update its callers (ext4 and xfs mmap()'ing hook code).
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b09de1e8544384074165d92d048e80058d971286.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
Additionally, commit bb666b7c2707 ("mm: add mmap_prepare() compatibility
layer for nested file systems") permits the use of the .mmap_prepare() hook
even in nested filesystems like overlayfs.
There are a number of places where we check only for f_op->mmap - this is
incorrect now mmap_prepare exists, so update all of these to use the
general helper can_mmap_file().
Most notably, this updates the elf logic to allow for the ability to
execute binaries on filesystems which have the .mmap_prepare hook, but
additionally we update nested filesystems.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b68145b609532e62bab603dd9686faa6562046ec.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
Therefore, update the check for file operations supporting mmap() by using
the file_has_valid_mmap_hooks() helper function, which checks for either
f_op->mmap or f_op->mmap_prepare rather than checking only for f_op->mmap
directly.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/5f120b644b5890d1b50202d0f0d4c9f0d6b62873.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The call_mmap() function violates the existing convention in
include/linux/fs.h whereby invocations of virtual file system hooks is
performed by functions prefixed with vfs_xxx().
Correct this by renaming call_mmap() to vfs_mmap(). This also avoids
confusion as to the fact that f_op->mmap_prepare may be invoked here.
Also rename __call_mmap_prepare() function to vfs_mmap_prepare() and adjust
to accept a file parameter, this is useful later for nested file systems.
Finally, fix up the VMA userland tests and ensure the mmap_prepare -> mmap
shim is implemented there.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/8d389f4994fa736aa8f9172bef8533c10a9e9011.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.16-rc2
tegra: fix YAML conversion of device tree bindings
|
|
Asus ROG STRIX B550-F GAMING (WI-FI) motherboard has problems on some
SATA ports with at least one hard drive model (WDC WD20EFAX-68FB5N0)
when LPM is enabled. Disabling LPM solves the issue.
Cc: stable@vger.kernel.org
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type")
Signed-off-by: Mikko Korhonen <mjkorhon@gmail.com>
Link: https://lore.kernel.org/r/20250617062055.784827-1-mjkorhon@gmail.com
[cassel: more detailed comment, make single line comments consistent]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
The second argument to dev_err_probe() is the error value. Pass the
return value of devm_request_threaded_irq() there instead of the irq
number.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Fixes: c47f7ff0fe61 ("gpio: pca953x: Utilise dev_err_probe() where it makes sense")
Link: https://lore.kernel.org/r/20250616134503.1201138-1-s.hauer@pengutronix.de
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
The SBI specification clearly states that SBI HFENCE calls should
return SBI_ERR_NOT_SUPPORTED when one of the target hart doesn’t
support hypervisor extension (aka nested virtualization in-case
of KVM RISC-V).
Fixes: c7fa3c48de86 ("RISC-V: KVM: Treat SBI HFENCE calls as NOPs")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250605061458.196003-3-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
As-per the SBI specification, an SBI remote fence operation applies
to the entire address space if either:
1) start_addr and size are both 0
2) size is equal to 2^XLEN-1
>From the above, only #1 is checked by SBI SFENCE calls so fix the
size parameter check in SBI SFENCE calls to cover #2 as well.
Fixes: 13acfec2dbcc ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250605061458.196003-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
We can only pass negative error codes to bch2_err_str(); if it's a
positive integer it's not an error and we trip an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
active") ensured that active jobs are returned to the pending list when
extending the timeout. However, it didn't use the pending list's lock to
manipulate the list, which causes a race condition as the scheduler's
workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent
a race.
Cc: stable@vger.kernel.org
Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
Reported-by: Philipp Stanner <phasta@kernel.org>
Closes: https://lore.kernel.org/dri-devel/964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org/
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250602132240.93314-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
A path exists in a particular snapshot: we should do the pathwalk in the
snapshot ID of the inode we started from, _not_ change snapshot ID as we
walk inodes and dirents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can easily go from inode number -> path now, which makes for more
useful log messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the inode's new path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we find a directory connectivity problem, we should do the repair
in the oldest snapshot that has the issue - so that we don't end up
duplicating work or making a real mess of things.
Oldest snapshot IDs have the highest integer value, so - just walk
inodes in reverse order.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_subvolume.fs_path_parent needs to be updated as well, it should
match inode.bi_parent_subvol.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The dirent will be in a different snapshot if the inode is a subvolume
root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bch2_subvolume_get_snapshot() call needs to happen before the dirent
lookup - the dirent is in the parent subvolume.
Also, check for loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kthread creation checks for pending signals, which is _very_ annoying if
we have to do a long recovery and don't go rw until we've done
significant work.
Check if we'll be going rw and pre-allocate kthreads/workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|