Age | Commit message (Collapse) | Author |
|
On shutdown when quotas are enabled, the shutdown can deadlock
trying to unpin the dquot buffer buf_log_item like so:
[ 3319.483590] task:kworker/20:0H state:D stack:14360 pid:1962230 tgid:1962230 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 3319.493966] Workqueue: xfs-log/dm-6 xlog_ioend_work
[ 3319.498458] Call Trace:
[ 3319.500800] <TASK>
[ 3319.502809] __schedule+0x699/0xb70
[ 3319.512672] schedule+0x64/0xd0
[ 3319.515573] schedule_timeout+0x30/0xf0
[ 3319.528125] __down_common+0xc3/0x200
[ 3319.531488] __down+0x1d/0x30
[ 3319.534186] down+0x48/0x50
[ 3319.540501] xfs_buf_lock+0x3d/0xe0
[ 3319.543609] xfs_buf_item_unpin+0x85/0x1b0
[ 3319.547248] xlog_cil_committed+0x289/0x570
[ 3319.571411] xlog_cil_process_committed+0x6d/0x90
[ 3319.575590] xlog_state_shutdown_callbacks+0x52/0x110
[ 3319.580017] xlog_force_shutdown+0x169/0x1a0
[ 3319.583780] xlog_ioend_work+0x7c/0xb0
[ 3319.587049] process_scheduled_works+0x1d6/0x400
[ 3319.591127] worker_thread+0x202/0x2e0
[ 3319.594452] kthread+0x20c/0x240
The CIL push has seen the deadlock, so it has aborted the push and
is running CIL checkpoint completion to abort all the items in the
checkpoint. This calls ->iop_unpin(remove = true) to clean up the
log items in the checkpoint.
When a buffer log item is unpined like this, it needs to lock the
buffer to run io completion to correctly fail the buffer and run all
the required completions to fail attached log items as well. In this
case, the attempt to lock the buffer on unpin is hanging because the
buffer is already locked.
I suspected a leaked XFS_BLI_HOLD state because of XFS_BLI_STALE
handling changes I was testing, so I went looking for
pin events on HOLD buffers and unpin events on locked buffer. That
isolated this one buffer with these two events:
xfs_buf_item_pin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 2 pincount 0 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags HOLD|DIRTY|LOGGED liflags DIRTY
....
xfs_buf_item_unpin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 4 pincount 1 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags DIRTY liflags ABORTED
Firstly, bbcount = 0x2, which means it is not a single sector
structure. That rules out every xfs_trans_bhold() case except one:
dquot buffers.
Then hung task dumping gave this trace:
[ 3197.312078] task:fsync-tester state:D stack:12080 pid:2051125 tgid:2051125 ppid:1643233 task_flags:0x400000 flags:0x00004002
[ 3197.323007] Call Trace:
[ 3197.325581] <TASK>
[ 3197.327727] __schedule+0x699/0xb70
[ 3197.334582] schedule+0x64/0xd0
[ 3197.337672] schedule_timeout+0x30/0xf0
[ 3197.350139] wait_for_completion+0xbd/0x180
[ 3197.354235] __flush_workqueue+0xef/0x4e0
[ 3197.362229] xlog_cil_force_seq+0xa0/0x300
[ 3197.374447] xfs_log_force+0x77/0x230
[ 3197.378015] xfs_qm_dqunpin_wait+0x49/0xf0
[ 3197.382010] xfs_qm_dqflush+0x55/0x460
[ 3197.385663] xfs_qm_dquot_isolate+0x29e/0x4d0
[ 3197.389977] __list_lru_walk_one+0x141/0x220
[ 3197.398867] list_lru_walk_one+0x10/0x20
[ 3197.402713] xfs_qm_shrink_scan+0x6a/0x100
[ 3197.406699] do_shrink_slab+0x18a/0x350
[ 3197.410512] shrink_slab+0xf7/0x430
[ 3197.413967] drop_slab+0x97/0xf0
[ 3197.417121] drop_caches_sysctl_handler+0x59/0xc0
[ 3197.421654] proc_sys_call_handler+0x18b/0x280
[ 3197.426050] proc_sys_write+0x13/0x20
[ 3197.429750] vfs_write+0x2b8/0x3e0
[ 3197.438532] ksys_write+0x7e/0xf0
[ 3197.441742] __x64_sys_write+0x1b/0x30
[ 3197.445363] x64_sys_call+0x2c72/0x2f60
[ 3197.449044] do_syscall_64+0x6c/0x140
[ 3197.456341] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Yup, another test run by check-parallel is running drop_caches
concurrently and the dquot shrinker for the hung filesystem is
running. That's trying to flush a dirty dquot from reclaim context,
and it waiting on a log force to complete. xfs_qm_dqflush is called
with the dquot buffer held locked, and so we've called
xfs_log_force() with that buffer locked.
Now the log force is waiting for a workqueue flush to complete, and
that workqueue flush is waiting of CIL checkpoint processing to
finish.
The CIL checkpoint processing is aborting all the log items it has,
and that requires locking aborted buffers to cancel them.
Now, normally this isn't a problem if we are issuing a log force
to unpin an object, because the ->iop_unpin() method wakes pin
waiters first. That results in the pin waiter finishing off whatever
it was doing, dropping the lock and then xfs_buf_item_unpin() can
lock the buffer and fail it.
However, xfs_qm_dqflush() is waiting on the -dquot- unpin event, not
the dquot buffer unpin event, and so it never gets woken and so does
not drop the buffer lock.
Inodes do not have this problem, as they can only be written from
one spot (->iop_push) whilst dquots can be written from multiple
places (memory reclaim, ->iop_push, xfs_dq_dqpurge, and quotacheck).
The reason that the dquot buffer has an attached buffer log item is
that it has been recently allocated. Initialisation of the dquot
buffer logs the buffer directly, thereby pinning it in memory. We
then modify the dquot in a separate operation, and have memory
reclaim racing with a shutdown and we trigger this deadlock.
check-parallel reproduces this reliably on 1kB FSB filesystems with
quota enabled because it does all of these things concurrently
without having to explicitly write tests to exercise these corner
case conditions.
xfs_qm_dquot_logitem_push() doesn't have this deadlock because it
checks if the dquot is pinned before locking the dquot buffer and
skipping it if it is pinned. This means the xfs_qm_dqunpin_wait()
log force in xfs_qm_dqflush() never triggers and we unlock the
buffer safely allowing a concurrent shutdown to fail the buffer
appropriately.
xfs_qm_dqpurge() could have this problem as it is called from
quotacheck and we might have allocated dquot buffers when recording
the quota updates. This can be fixed by calling
xfs_qm_dqunpin_wait() before we lock the dquot buffer. Because we
hold the dquot locked, nothing will be able to add to the pin count
between the unpin_wait and the dqflush callout, so this now makes
xfs_qm_dqpurge() safe against this race.
xfs_qm_dquot_isolate() can also be fixed this same way but, quite
frankly, we shouldn't be doing IO in memory reclaim context. If the
dquot is pinned or dirty, simply rotate it and let memory reclaim
come back to it later, same as we do for inodes.
This then gets rid of the nasty issue in xfs_qm_flush_one() where
quotacheck writeback races with memory reclaim flushing the dquots.
We can lift xfs_qm_dqunpin_wait() up into this code, then get rid of
the "can't get the dqflush lock" buffer write to cycle the dqlfush
lock and enable it to be flushed again. checking if the dquot is
pinned and returning -EAGAIN so that the dquot walk will revisit the
dquot again later.
Finally, with xfs_qm_dqunpin_wait() lifted into all the callers,
we can remove it from the xfs_qm_dqflush() code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There is a race condition that can trigger in dmflakey fstests that
can result in asserts in xfs_ialloc_read_agi() and
xfs_alloc_read_agf() firing. The asserts look like this:
XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440
.....
Call Trace:
<TASK>
xfs_alloc_read_agf+0x2ad/0x3a0
xfs_alloc_fix_freelist+0x280/0x720
xfs_alloc_vextent_prepare_ag+0x42/0x120
xfs_alloc_vextent_iterate_ags+0x67/0x260
xfs_alloc_vextent_start_ag+0xe4/0x1c0
xfs_bmapi_allocate+0x6fe/0xc90
xfs_bmapi_convert_delalloc+0x338/0x560
xfs_map_blocks+0x354/0x580
iomap_writepages+0x52b/0xa70
xfs_vm_writepages+0xd7/0x100
do_writepages+0xe1/0x2c0
__writeback_single_inode+0x44/0x340
writeback_sb_inodes+0x2d0/0x570
__writeback_inodes_wb+0x9c/0xf0
wb_writeback+0x139/0x2d0
wb_workfn+0x23e/0x4c0
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
I've seen the AGI variant from scrub running on the filesysetm
after unmount failed due to systemd interference:
XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804
.....
Call Trace:
<TASK>
xfs_ialloc_read_agi+0xee/0x150
xchk_perag_drain_and_lock+0x7d/0x240
xchk_ag_init+0x34/0x90
xchk_inode_xref+0x7b/0x220
xchk_inode+0x14d/0x180
xfs_scrub_metadata+0x2e2/0x510
xfs_ioc_scrub_metadata+0x62/0xb0
xfs_file_ioctl+0x446/0xbf0
__se_sys_ioctl+0x6f/0xc0
__x64_sys_ioctl+0x1d/0x30
x64_sys_call+0x1879/0x2ee0
do_syscall_64+0x68/0x130
? exc_page_fault+0x62/0xc0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Essentially, it is the same problem. When _flakey_drop_and_remount()
loads the drop-writes table, it makes all writes silently fail. Writes
are reported to the fs as completed successfully, but they are not
issued to the backing store. The filesystem sees the successful
write completion and marks the metadata buffer clean and removes it
from the AIL.
If this happens at the same time as memory pressure is occuring,
the now-clean AGF and/or AGI buffers can be reclaimed from memory.
Shortly afterwards, but before _flakey_drop_and_remount() runs
unmount, background writeback is kicked and it tries to allocate
blocks for the dirty pages in memory. This then tries to access the
AGF buffer we just turfed out of memory. It's not found, so it gets
read in from disk.
This is all fine, except for the fact that the last writeback of the
AGF did not actually reach disk. The AGF on disk is stale compared
to the in-memory state held by the perag, and so they don't match
and the assert fires.
Then other operations on that inode hang because the task was killed
whilst holding inode locks. e.g:
Workqueue: xfs-conv/dm-12 xfs_end_io
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_preempt_disabled+0x15/0x30
rwsem_down_write_slowpath+0x31a/0x5f0
down_write+0x43/0x60
xfs_ilock+0x1a8/0x210
xfs_trans_alloc_inode+0x9c/0x240
xfs_iomap_write_unwritten+0xe3/0x300
xfs_end_ioend+0x90/0x130
xfs_end_io+0xce/0x100
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
and it's all down hill from there.
Memory pressure is one way to trigger this, another is to run "echo
3 > /proc/sys/vm/drop_caches" randomly while tests are running.
Regardless of how it is triggered, this effectively takes down the
system once umount hangs because it's holding a sb->s_umount lock
exclusive and now every sync(1) call gets stuck on it.
Fix this by replacing the asserts with a corruption detection check
and a shutdown.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Lock order of xfs_ifree_cluster() is cluster buffer -> try ILOCK
-> IFLUSHING, except for the last inode in the cluster that is
triggering the free. In that case, the lock order is ILOCK ->
cluster buffer -> IFLUSHING.
xfs_iflush_cluster() uses cluster buffer -> try ILOCK -> IFLUSHING,
so this can safely run concurrently with xfs_ifree_cluster().
xfs_inode_item_precommit() uses ILOCK -> cluster buffer, but this
cannot race with xfs_ifree_cluster() so being in a different order
will not trigger a deadlock.
xfs_reclaim_inode() during a filesystem shutdown uses ILOCK ->
IFLUSHING -> cluster buffer via xfs_iflush_shutdown_abort(), and
this deadlocks against xfs_ifree_cluster() like so:
sysrq: Show Blocked State
task:kworker/10:37 state:D stack:12560 pid:276182 tgid:276182 ppid:2 flags:0x00004000
Workqueue: xfs-inodegc/dm-3 xfs_inodegc_worker
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_timeout+0x8b/0x180
schedule_timeout_uninterruptible+0x1e/0x30
xfs_ifree+0x326/0x730
xfs_inactive_ifree+0xcb/0x230
xfs_inactive+0x2c8/0x380
xfs_inodegc_worker+0xaa/0x180
process_scheduled_works+0x1d4/0x400
worker_thread+0x234/0x2e0
kthread+0x147/0x170
ret_from_fork+0x3e/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
task:fsync-tester state:D stack:12160 pid:2255943 tgid:2255943 ppid:3988702 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x650/0xb10
schedule+0x6d/0xf0
schedule_timeout+0x31/0x180
__down_common+0xbe/0x1f0
__down+0x1d/0x30
down+0x48/0x50
xfs_buf_lock+0x3d/0xe0
xfs_iflush_shutdown_abort+0x51/0x1e0
xfs_icwalk_ag+0x386/0x690
xfs_reclaim_inodes_nr+0x114/0x160
xfs_fs_free_cached_objects+0x19/0x20
super_cache_scan+0x17b/0x1a0
do_shrink_slab+0x180/0x350
shrink_slab+0xf8/0x430
drop_slab+0x97/0xf0
drop_caches_sysctl_handler+0x59/0xc0
proc_sys_call_handler+0x189/0x280
proc_sys_write+0x13/0x20
vfs_write+0x33d/0x3f0
ksys_write+0x7c/0xf0
__x64_sys_write+0x1b/0x30
x64_sys_call+0x271d/0x2ee0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x76/0x7e
We can't change the lock order of xfs_ifree_cluster() - XFS_ISTALE
and XFS_IFLUSHING are serialised through to journal IO completion
by the cluster buffer lock being held.
There's quite a few asserts in the code that check that XFS_ISTALE
does not occur out of sync with buffer locking (e.g. in
xfs_iflush_cluster). There's also a dependency on the inode log item
being removed from the buffer before XFS_IFLUSHING is cleared, also
with asserts that trigger on this.
Further, we don't have a requirement for the inode to be locked when
completing or aborting inode flushing because all the inode state
updates are serialised by holding the cluster buffer lock across the
IO to completion.
We can't check for XFS_IRECLAIM in xfs_ifree_mark_inode_stale() and
skip the inode, because there is no guarantee that the inode will be
reclaimed. Hence it *must* be marked XFS_ISTALE regardless of
whether reclaim is preparing to free that inode. Similarly, we can't
check for IFLUSHING before locking the inode because that would
result in dirty inodes not being marked with ISTALE in the event of
racing with XFS_IRECLAIM.
Hence we have to address this issue from the xfs_reclaim_inode()
side. It is clear that we cannot hold the inode locked here when
calling xfs_iflush_shutdown_abort() because it is the inode->buffer
lock order that causes the deadlock against xfs_ifree_cluster().
Hence we need to drop the ILOCK before aborting the inode in the
shutdown case. Once we've aborted the inode, we can grab the ILOCK
again and then immediately reclaim it as it is now guaranteed to be
clean.
Note that dropping the ILOCK in xfs_reclaim_inode() means that it
can now be locked by xfs_ifree_mark_inode_stale() and seen whilst in
this state. This is safe because we have left the XFS_IFLUSHING flag
on the inode and so xfs_ifree_mark_inode_stale() will simply set
XFS_ISTALE and move to the next inode. An ASSERT check in this path
needs to be tweaked to take into account this new shutdown
interaction.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
It already depends on X86_32, but that's also set for ARCH=um.
Recent changes made UML no longer have IO port access since
it's not needed, but this driver uses it. Build it only for
HAS_IOPORT. This is pretty much the same as depending on X86,
but on the off-chance that HAS_IOPORT will ever be optional
on x86 HAS_IOPORT is the real prerequisite.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
Property "num_cpu" and "feature" are read-only once eiointc is created,
which are set with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL attr group before
device creation.
Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS is to update register
and software state for migration and reset usage, property "num_cpu" and
"feature" can not be update again if it is created already.
Here discard write operation with property "num_cpu" and "feature" in
attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL.
Cc: stable@vger.kernel.org
Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The maximum supported cpu number is EIOINTC_ROUTE_MAX_VCPUS about
irqchip EIOINTC, here add validation about cpu number to avoid array
pointer overflow.
Cc: stable@vger.kernel.org
Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
With EIOINTC interrupt controller, physical CPU ID is set for irq route.
However the function kvm_get_vcpu() is used to get destination vCPU when
delivering irq. With API kvm_get_vcpu(), the logical CPU ID is used.
With API kvm_get_vcpu_by_cpuid(), vCPU ID can be searched from physical
CPU ID.
Cc: stable@vger.kernel.org
Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
With function eiointc_update_sw_coremap(), there is forced assignment
like val = *(u64 *)pvalue. Parameter pvalue may be pointer to char type
or others, there is problem with forced assignment with u64 type.
Here the detailed value is passed rather address pointer.
Cc: stable@vger.kernel.org
Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
IOCSR instruction supports 1/2/4/8 bytes access, the address should be
naturally aligned with its access size. Here address alignment check is
added in the EIOINTC kernel emulation.
Cc: stable@vger.kernel.org
Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host fixes for v6.16-rc4
- imx: fix SMBus protocol compliance during block read
- omap: fix error handling path in probe
- robotfuzz, tiny-usb: prevent zero-length reads
- x86, designware, amdisp: fix build error when modules are
disabled
|
|
Correct spelling as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Airoha AN7583 SoC have 2 dedicated MDIO bus controller in the SCU
register map. To driver register an MDIO controller based on the DT
reg property and access the register by accessing the parent syscon.
The MDIO bus logic is similar to the MT7530 internal MDIO bus but
deviates of some setting and some HW bug.
On Airoha AN7583 the MDIO clock is set to 25MHz by default and needs to
be correctly setup to 2.5MHz to correctly work (by setting the divisor
to 10x).
There seems to be Hardware bug where AN7583_MII_RWDATA
is not wiped in the context of unconnected PHY and the
previous read value is returned.
Example: (only one PHY on the BUS at 0x1f)
- read at 0x1f report at 0x2 0x7500
- read at 0x0 report 0x7500 on every address
To workaround this, we reset the Mdio BUS at every read
to have consistent values on read operation.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Airoha AN7583 SoC have 3 different MDIO Controller. One comes from
the intergated Switch based on MT7530. The other 2 live under the SCU
register and expose 2 dedicated MDIO controller.
Document the schema for the 2 dedicated MDIO controller.
Each MDIO controller can be independently reset with the SoC reset line.
Each MDIO controller have a dedicated clock configured to 2.5MHz by
default to follow MDIO bus IEEE 802.3 standard.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a regression where wp512 can no longer be used with hmac"
* tag 'v6.16-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: wp512 - Use API partial block handling
|
|
Pull bcachefs fixes from Kent Overstreet:
- Lots of small check/repair fixes, primarily in subvol loop and
directory structure loop (when involving snapshots).
- Fix a few 6.16 regressions: rare UAF in the foreground allocator path
when taking a transaction restart from the transaction bump
allocator, and some small fallout from the change to log the error
being corrected in the journal when repairing errors, also some
fallout from the btree node read error logging improvements.
(Alan, Bharadwaj)
- New option: journal_rewind
This lets the entire filesystem be reset to an earlier point in time.
Note that this is only a disaster recovery tool, and right now there
are major caveats to using it (discards should be disabled, in
particular), but it successfully restored the filesystem of one of
the users who was bit by the subvolume deletion bug and didn't have
backups. I'll likely be making some changes to the discard path in
the future to make this a reliable recovery tool.
- Some new btree iterator tracepoints, for tracking down some
livelock-ish behaviour we've been seeing in the main data write path.
* tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs: (51 commits)
bcachefs: Plumb correct ip to trans_relock_fail tracepoint
bcachefs: Ensure we rewind to run recovery passes
bcachefs: Ensure btree node scan runs before checking for scanned nodes
bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix
bcachefs: fix bch2_journal_keys_peek_prev_min() underflow
bcachefs: Use wait_on_allocator() when allocating journal
bcachefs: Check for bad write buffer key when moving from journal
bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked
bcachefs: Fix range in bch2_lookup_indirect_extent() error path
bcachefs: fix spurious error_throw
bcachefs: Add missing bch2_err_class() to fileattr_set()
bcachefs: Add missing key type checks to check_snapshot_exists()
bcachefs: Don't log fsck err in the journal if doing repair elsewhere
bcachefs: Fix *__bch2_trans_subbuf_alloc() error path
bcachefs: Fix missing newlines before ero
bcachefs: fix spurious error in read_btree_roots()
bcachefs: fsck: Fix oops in key_visible_in_snapshot()
bcachefs: fsck: fix unhandled restart in topology repair
bcachefs: fsck: Fix check_directory_structure when no check_dirents
bcachefs: Fix restart handling in btree_node_scrub_work()
...
|
|
Thomas Gleixner says:
====================
ptp: Belated spring cleaning of the chardev driver
When looking into supporting auxiliary clocks in the PTP ioctl, the
inpenetrable ptp_ioctl() letter soup bothered me enough to clean it up.
The code (~400 lines!) is really hard to follow due to a gazillion of
local variables, which are only used in certain case scopes, and a
mixture of gotos, breaks and direct error return paths.
Clean it up by splitting out the IOCTL functionality into seperate
functions, which contain only the required local variables and are trivial
to follow. Complete the cleanup by converting the code to lock guards and
get rid of all gotos.
That reduces the code size by 48 lines and also the binary text size is
80 bytes smaller than the current maze.
The series is split up into one patch per IOCTL command group for easy
review.
v1: https://lore.kernel.org/20250620130144.351492917@linutronix.de
====================
Link: https://patch.msgid.link/20250625114404.102196103@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mixture of gotos and direct return codes is inconsistent and just makes
the code harder to read. Let it consistently return error codes directly and
tidy the code flow up accordingly.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.486953538@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert the various spin_lock_irqsave() protected critical regions to
scoped guards. Use spinlock_irq instead of spinlock_irqsave as all the
functions are invoked in thread context with interrupts enabled.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.425029269@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Finish the ptp_ioctl() cleanup by splitting out the PTP_MASK_EN_SINGLE
ioctl code and removing the remaining local variables and return
statements.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.364422719@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_MASK_CLEAR_ALL ioctl
code into a helper function.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.302755618@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_SETFUNC ioctl
code into a helper function. Convert to lock guard while at it and remove
the pointless memset of the pd::rsv because nothing uses it.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.241503804@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_GETFUNC ioctl
code into a helper function. Convert to lock guard while at it and remove
the pointless memset of the pd::rsv because nothing uses it.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.177265865@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET ioctl
code into a helper function.
Convert it to __free() to avoid gotos.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.113841216@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the
PTP_SYS_OFFSET_EXTENDED ioctl code into a helper function.
Convert it to __free() to avoid gotos.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.050445505@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET_PRECISE
ioctl code into a helper function.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.986897454@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_ENABLE_PPS
ioctl code into a helper function. Convert to a lock guard while at it.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.923803136@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_PEROUT_REQUEST
ioctl code into a helper function. Convert to a lock guard while at it.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.860150473@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Continue the ptp_ioctl() cleanup by splitting out the PTP_EXTTS_REQUEST
ioctl code into a helper function. Convert to a lock guard while at it.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.797588258@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ptp_ioctl() is an inpenetrable letter soup with a gazillion of case (scope)
specific variables defined at the top of the function and pointless breaks
and gotos.
Start cleaning it up by splitting out the PTP_CLOCK_GETCAPS ioctl code into
a helper function. Use a argument pointer with a single sparse compliant
type cast instead of proliferating the type cast all over the place.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.733409073@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
setup_wait() takes an optional argument and then is called from the top
level of the test script. That confuses shellcheck, which thinks that maybe
the intention is to pass $1 of the script to the function, which is never
the case. To avoid having to annotate every single new test with a SC
disable, split the function in two: one that takes a mandatory argument,
and one that takes no argument at all.
Convert the two existing users of that optional argument, both in Spectrum
resource selftest, to use the new form. Clean up vxlan_bridge_1q_mc_ul.sh
to not pass a now-unused argument.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/8e13123236fe3912ae29bc04a1528bdd8551da1f.1750847794.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is unused since commit f04565ddf52e ("dev: use name hash for
dev_seq_ops")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250625102155.483570-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
lwtunnel_build_state() has check validity of encap_type,
so no need to do this before call it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250625022059.3958215-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2025-06-27
We've added 6 non-merge commits during the last 8 day(s) which contain
a total of 6 files changed, 120 insertions(+), 20 deletions(-).
The main changes are:
1) Fix RCU usage in task_cls_state() for BPF programs using helpers like
bpf_get_cgroup_classid_curr() outside of networking, from Charalampos
Mitrodimas.
2) Fix a sockmap race between map_update and a pending workqueue from
an earlier map_delete freeing the old psock where both pointed to the
same psock->sk, from Jiayuan Chen.
3) Fix a data corruption issue when using bpf_msg_pop_data() in kTLS which
failed to recalculate the ciphertext length, also from Jiayuan Chen.
4) Remove xdp_redirect_map{,_err} trace events since they are unused and
also hide XDP trace events under CONFIG_BPF_SYSCALL, from Steven Rostedt.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
xdp: tracing: Hide some xdp events under CONFIG_BPF_SYSCALL
xdp: Remove unused events xdp_redirect_map and xdp_redirect_map_err
net, bpf: Fix RCU usage in task_cls_state() for BPF programs
selftests/bpf: Add test to cover ktls with bpf_msg_pop_data
bpf, ktls: Fix data corruption when using bpf_msg_pop_data() in ktls
bpf, sockmap: Fix psock incorrectly pointing to sk
====================
Link: https://patch.msgid.link/20250626230111.24772-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
airoha_qdma_fill_rx_queue()
Since the page_pool for airoha_eth driver is created with
PP_FLAG_DMA_SYNC_DEV flag, we do not need to sync_for_device each page
received from the pool since it is already done by the page_pool codebase.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250625-airoha-sync-for-device-v1-1-923741deaabf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix build errors when CONFIG_NET_SHAPER is disabled, including:
drivers/net/ethernet/microsoft/mana/mana_en.c:804:10: error:
'const struct net_device_ops' has no member named 'net_shaper_ops'
804 | .net_shaper_ops = &mana_shaper_ops,
drivers/net/ethernet/microsoft/mana/mana_en.c:804:35: error:
initialization of 'int (*)(struct net_device *, struct neigh_parms *)'
from incompatible pointer type 'const struct net_shaper_ops *'
[-Werror=incompatible-pointer-types]
804 | .net_shaper_ops = &mana_shaper_ops,
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Fixes: 75cabb46935b ("net: mana: Add support for net_shaper_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506230625.bfUlqb8o-lkp@intel.com/
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750851355-8067-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fix for stalls during suspend/resume cycles with hid-nintendo (Daniel
J. Ogorchock)
- memory leak and reference count fixes in hid-wacom and in-appletb-kdb
(Qasim Ijaz)
- race condition (leading to kernel crash) fix during device removal in
hid-wacom (Thomas Zeitlhofer)
- fix for missed interrupt in intel-thc-hid (Intel-thc-hid:)
- support for a bunch of new device IDs
* tag 'hid-for-linus-2025062701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: lenovo: Add support for ThinkPad X1 Tablet Thin Keyboard Gen2
HID: appletb-kbd: fix "appletb_backlight" backlight device reference counting
HID: wacom: fix crash in wacom_aes_battery_handler()
HID: intel-ish-hid: ipc: Add Wildcat Lake PCI device ID
hid: intel-ish-hid: Use PCI_DEVICE_DATA() macro for ISH device table
HID: lenovo: Restrict F7/9/11 mode to compact keyboards only
HID: Add IGNORE quirk for SMARTLINKTECHNOLOGY
HID: input: lower message severity of 'No inputs registered, leaving' to debug
HID: quirks: Add quirk for 2 Chicony Electronics HP 5MP Cameras
HID: Intel-thc-hid: Intel-quicki2c: Enhance QuickI2C reset flow
HID: nintendo: avoid bluetooth suspend/resume stalls
HID: wacom: fix kobject reference count leak
HID: wacom: fix memory leak on sysfs attribute creation failure
HID: wacom: fix memory leak on kobject creation failure
|
|
The DT bindings file "renesas,r9a09g057-gbeth.yaml" applies to a whole
family of SoCs, and uses "renesas,rzv2h-gbeth" as a fallback compatible
value. Hence rename it to the more generic "renesas,rzv2h-gbeth.yaml".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/721f6e0e09777e0842ecaca4578bc50c953d2428.1750838954.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
Driver Changes:
- Missing error check (Haoxiang Li)
- Fix xe_hwmon_power_max_write (Karthik)
- Move flushes (Maarten and Matthew Auld)
- Explicitly exit CT safe mode on unwind (Michal)
- Process deferred GGTT node removals on device unwind (Michal)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aF1T6EzzC3xj4K4H@fedora
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix for SNPS PHY HDMI for 1080p@120Hz
- Correct DP AUX DPCD probe address
- Followup build fix for GCOV and AutoFDO enabled config
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://lore.kernel.org/r/aFzsHR9WLYsxg8jy@jlahtine-mobl
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.16-2025-25-25:
amdgpu:
- Cleaner shader support for additional GFX9 GPUs
- MES firmware compatibility fixes
- Discovery error reporting fixes
- SDMA6/7 userq fixes
- Backlight fix
- EDID sanity check
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250625151734.11537-1-alexander.deucher@amd.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Convert altr,uart-1.0 and altr,juart-1.0 to DT schema. These were
applied for nios2, but never sent upstream.
- Fix extra '/' in fsl,ls1028a-reset '$id' path
- Fix warnings in ti,sn65dsi83 schema due to unnecessary $ref.
* tag 'devicetree-fixes-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: serial: Convert altr,uart-1.0 to DT schema
dt-bindings: serial: Convert altr,juart-1.0 to DT schema
dt-bindings: soc: fsl,ls1028a-reset: Drop extra "/" in $id
dt-bindings: drm/bridge: ti-sn65dsi83: drop $ref to fix lvds-vod* warnings
|
|
A previous commit aborted mapping more for a non-incremental ring for
bundle peeking, but depending on where in the process this peeking
happened, it would not necessarily prevent a retry by the user. That can
create gaps in the received/read data.
Add struct buf_sel_arg->partial_map, which can pass this information
back. The networking side can then map that to internal state and use it
to gate retry as well.
Since this necessitates a new flag, change io_sr_msg->retry to a
retry_flags member, and store both the retry and partial map condition
in there.
Cc: stable@vger.kernel.org
Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allow the flexfiles error handling to recognise NFS level errors (as
opposed to RPC level errors) and handle them separately. The main
motivator is the NFSERR_PERM errors that get returned if the NFS client
connects to the data server through a port number that is lower than
1024. In that case, the client should disconnect and retry a READ on a
different data server, or it should retry a WRITE after reconnecting.
Reviewed-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc4).
Conflicts:
Documentation/netlink/specs/mptcp_pm.yaml
9e6dd4c256d0 ("netlink: specs: mptcp: replace underscores with dashes in names")
ec362192aa9e ("netlink: specs: fix up indentation errors")
https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au
Adjacent changes:
Documentation/netlink/specs/fou.yaml
791a9ed0a40d ("netlink: specs: fou: replace underscores with dashes in names")
880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- bridge: fix use-after-free during router port configuration
Current release - new code bugs:
- eth: wangxun: fix the creation of page_pool
Previous releases - regressions:
- netpoll: initialize UDP checksum field before checksumming
- wifi: mac80211: finish link init before RCU publish
- bluetooth: fix use-after-free in vhci_flush()
- eth:
- ionic: fix DMA mapping test
- bnxt: properly flush XDP redirect lists
Previous releases - always broken:
- netlink: specs: enforce strict naming of properties
- unix: don't leave consecutive consumed OOB skbs.
- vsock: fix linux/vm_sockets.h userspace compilation errors
- selftests: fix TCP packet checksum"
* tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: libwx: fix the creation of page_pool
net: selftests: fix TCP packet checksum
atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister().
netlink: specs: enforce strict naming of properties
netlink: specs: tc: replace underscores with dashes in names
netlink: specs: rt-link: replace underscores with dashes in names
netlink: specs: mptcp: replace underscores with dashes in names
netlink: specs: ovs_flow: replace underscores with dashes in names
netlink: specs: devlink: replace underscores with dashes in names
netlink: specs: dpll: replace underscores with dashes in names
netlink: specs: ethtool: replace underscores with dashes in names
netlink: specs: fou: replace underscores with dashes in names
netlink: specs: nfsd: replace underscores with dashes in names
net: enetc: Correct endianness handling in _enetc_rd_reg64
atm: idt77252: Add missing `dma_map_error()`
bnxt: properly flush XDP redirect lists
vsock/uapi: fix linux/vm_sockets.h userspace compilation errors
wifi: mac80211: finish link init before RCU publish
wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version
selftest: af_unix: Add tests for -ECONNRESET.
...
|
|
When performing a file read from RDMA, smbd_recv() prints an "Invalid msg
type 4" error and fails the I/O. This is due to the switch-statement there
not handling the ITER_FOLIOQ handed down from netfslib.
Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into
smbd_recv() and just using copy_to_iter() instead of memcpy(). This
future-proofs the function too, in case more ITER_* types are added.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Talpey <tom@talpey.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The handling of received data in the smbdirect client code involves using
copy_to_iter() to copy data from the smbd_reponse struct's packet trailer
to a folioq buffer provided by netfslib that encapsulates a chunk of
pagecache.
If, however, CONFIG_HARDENED_USERCOPY=y, this will result in the checks
then performed in copy_to_iter() oopsing with something like the following:
CIFS: Attempting to mount //172.31.9.1/test
CIFS: VFS: RDMA transport established
usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:102!
...
RIP: 0010:usercopy_abort+0x6c/0x80
...
Call Trace:
<TASK>
__check_heap_object+0xe3/0x120
__check_object_size+0x4dc/0x6d0
smbd_recv+0x77f/0xfe0 [cifs]
cifs_readv_from_socket+0x276/0x8f0 [cifs]
cifs_read_from_socket+0xcd/0x120 [cifs]
cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
kthread+0x396/0x830
ret_from_fork+0x2b8/0x3b0
ret_from_fork_asm+0x1a/0x30
The problem is that the smbd_response slab's packet field isn't marked as
being permitted for usercopy.
Fix this by passing parameters to kmem_slab_create() to indicate that
copy_to_iter() is permitted from the packet region of the smbd_response
slab objects, less the header space.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/acb7f612-df26-4e2a-a35d-7cd040f513e1@samba.org/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Tested-by: Stefan Metzmacher <metze@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order
and prevent the following deadlock from happening
======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-build2+ #1301 Tainted: G S W
------------------------------------------------------
cifsd/6055 is trying to acquire lock:
ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200
but task is already holding lock:
ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_setup_session+0x81/0x4b0
cifs_get_smb_ses+0x771/0x900
cifs_mount_get_session+0x7e/0x170
cifs_mount+0x92/0x2d0
cifs_smb3_do_mount+0x161/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_match_super+0x101/0x320
sget+0xab/0x270
cifs_smb3_do_mount+0x1e0/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
check_noncircular+0x95/0xc0
check_prev_add+0x115/0x2f0
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_signal_cifsd_for_reconnect+0x134/0x200
__cifs_reconnect+0x8f/0x500
cifs_handle_standard+0x112/0x280
cifs_demultiplex_thread+0x64d/0xbc0
kthread+0x2f7/0x310
ret_from_fork+0x2a/0x230
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ret_buf->chan_lock);
lock(&ret_buf->ses_lock);
lock(&ret_buf->chan_lock);
lock(&tcp_ses->srv_lock);
*** DEADLOCK ***
3 locks held by cifsd/6055:
#0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
#1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
#2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
Cc: linux-cifs@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data")
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The driver currently defaults to the internal oscillator as the clock
source for E825-C hardware. While this clock source is labeled TCXO,
indicating a temperature compensated oscillator, this is only true for some
board designs. Many board designs have a less capable oscillator. The
E825-C hardware may also have its clock source set to the TIME_REF pin.
This pin is connected to the DPLL and is often a more stable clock source.
The choice of the internal oscillator is not suitable for all systems,
especially those which want to enable SyncE support.
There is currently no interface available for users to configure the clock
source. Other variants of the E82x board have the clock source configured
in the NVM, but E825-C lacks this capability, so different board designs
cannot select a different default clock via firmware.
In most setups, the TIME_REF is a suitable default clock source.
Additionally, we now fall back to the internal oscillator automatically if
the TIME_REF clock source cannot be locked.
Change the default clock source for E825-C to TIME_REF. Note that the
driver logs a dev_dbg message upon configuring the TSPLL which includes the
clock source and frequency. This can be enabled to confirm which clock
source is in use.
Longterm a proper interface to dynamically introspect and change the clock
source will be designed (perhaps some extension of the DPLL subsystem?)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Initialize TSPLL after initializing PHC in ice_ptp.c instead of calling
for each product in PHC init in ice_ptp_hw.c.
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|