Age | Commit message (Collapse) | Author |
|
xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with
-ENOMEM if the swiotlb has already been initialized.
Add an explicit check io_tlb_default_mem != NULL at the beginning of
xen_swiotlb_init. If the swiotlb is already initialized print a warning
and return -EEXIST.
On x86, the error propagates.
On ARM, we don't actually need a special swiotlb buffer (yet), any
buffer would do. So ignore the error and continue.
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init,
today dma_direct_map_page returns error if SWIOTLB_NO_FORCE.
For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can
do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it
is going to be required later (e.g. Xen requires it).
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: catalin.marinas@arm.com
CC: will@kernel.org
CC: linux-arm-kernel@lists.infradead.org
Fixes: 2726bf3ff252 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Move xen_swiotlb_detect to a static inline function to make it available
to !CONFIG_XEN builds.
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029)
the commit e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO
differences inline") broke vDSO on x86. The problem appears to be that
VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and
'#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not
a define).
Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead.
Fixes: e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline")
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
|
|
Since recent changes instead of storing a large array of struct
io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and
we should not to so worry about restricting max number of registererd
buffer slots, increase the limit 4 times.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are three types of requests that left disabled for sqpoll, namely
epoll ctx, statx, and resources update. Since SQPOLL task is now closely
mimics a userspace thread, remove the restrictions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Always remove linked timeout on io_link_timeout_fn() from the master
request link list, otherwise we may get use-after-free when first
io_link_timeout_fn() puts linked timeout in the fail path, and then
will be found and put on master's free.
Cc: stable@vger.kernel.org # 5.10+
Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly")
Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some interrupt handlers have an "extra" that saves 1 or 2
registers (r14, r15) in the paca save area and makes them available to
use by the handler.
The change to always save nvgprs in exception handlers lead to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.
Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.
Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
|
|
scv support introduced the notion of code that implicitly soft-masks
irqs due to the instruction addresses. This is required because scv
enters the kernel with MSR[EE]=1.
If a NMI (including soft-NMI) interrupt hits when we are implicitly
soft-masked then its regs->softe does not reflect this because it is
derived from the explicit soft mask state (paca->irq_soft_mask). This
makes arch_irq_disabled_regs(regs) return false.
This can trigger a warning in the soft-NMI watchdog code (shown below).
Fix it by having NMI interrupts set regs->softe to disabled in case of
interrupting an implicit soft-masked region.
------------[ cut here ]------------
WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0
CPU: 41 PID: 1103 Comm: (spawn) Not tainted
NIP: c000000000039534 LR: c000000000039234 CTR: c000000000009a00
REGS: c000007fffbcf940 TRAP: 0700 Not tainted
MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 22042482 XER: 200400ad
CFAR: c000000000039260 IRQMASK: 3
GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003
GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265
GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a
GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029
GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60
NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0
LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0
Call Trace:
[c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable)
[c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4
--- interrupt: 900 at end_real_trampolines+0x0/0x1000
NIP: c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033
REGS: c000007fffbcfd60 TRAP: 0900
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042482 XER: 200400ad
CFAR: 00007ca426946020 IRQMASK: 0
GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035
GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110
GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a
GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b
GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68
NIP [c000000000003000] end_real_trampolines+0x0/0x1000
LR [00007ca426adb03c] 0x7ca426adb03c
--- interrupt: 900
Instruction dump:
60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008
7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8
---[ end trace dc75f67d819779da ]---
Fixes: 118178e62e2e ("powerpc: move NMI entry/exit code into wrapper")
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
|
|
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.
See the previous commit for more details.
Fix it by changing the order in which we patch the instructions.
Note the stf barrier fallback is only used on Power6 or earlier.
Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
|
|
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.
With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.
This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.
On machines with updated firmware the entry flush is a series of special
nops, and it's safe to to execute in a semi-patched state.
However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so leads to us corrputing LR, leading
to an oops, for example:
# echo 0 > /sys/kernel/debug/powerpc/entry_flush
kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002971000
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40
REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce)
MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000
...
NIP 0xc000000002971000
LR 0xc000000002971000
Call Trace:
do_patch_instruction+0xc4/0x340 (unreliable)
do_entry_flush_fixups+0x100/0x3b0
entry_flush_set+0x50/0xe0
simple_attr_write+0x160/0x1a0
full_proxy_write+0x8c/0x110
vfs_write+0xf0/0x340
ksys_write+0x84/0x140
system_call_exception+0x164/0x2d0
system_call_common+0xec/0x278
The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.
Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
|
|
The entry flush mitigation can be enabled/disabled at runtime via a
debugfs file (entry_flush), which causes the kernel to patch itself to
enable/disable the relevant mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20
Shows that we returned to userspace with a corrupted LR that points into
the kernel, due to executing the partially patched call to the fallback
entry flush (ie. we missed the LR restore).
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
|
|
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c
Shows that we returned to userspace without restoring the user r13
value, due to executing the partially patched STF exit code.
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au
|
|
When we move one inode from one directory to another and both the inode
and its previous parent directory were logged before, we are not supposed
to have the dentry for the old parent if we have a power failure after the
log is synced. Only the new dentry is supposed to exist.
Generally this works correctly, however there is a scenario where this is
not currently working, because the old parent of the file/directory that
was moved is not authoritative for a range that includes the dir index and
dir item keys of the old dentry. This case is better explained with the
following example and reproducer:
# The test requires a very specific layout of keys and items in the
# fs/subvolume btree to trigger the bug. So we want to make sure that
# on whatever platform we are, we have the same leaf/node size.
#
# Currently in btrfs the node/leaf size can not be smaller than the page
# size (but it can be greater than the page size). So use the largest
# supported node/leaf size (64K).
$ mkfs.btrfs -f -n 65536 /dev/sdc
$ mount /dev/sdc /mnt
# "testdir" is inode 257.
$ mkdir /mnt/testdir
$ chmod 755 /mnt/testdir
# Create several empty files to have the directory "testdir" with its
# items spread over several leaves (7 in this case).
$ for ((i = 1; i <= 1200; i++)); do
echo -n > /mnt/testdir/file$i
done
# Create our test directory "dira", inode number 1458, which gets all
# its items in leaf 7.
#
# The BTRFS_DIR_ITEM_KEY item for inode 257 ("testdir") that points to
# the entry named "dira" is in leaf 2, while the BTRFS_DIR_INDEX_KEY
# item that points to that entry is in leaf 3.
#
# For this particular filesystem node size (64K), file count and file
# names, we endup with the directory entry items from inode 257 in
# leaves 2 and 3, as previously mentioned - what matters for triggering
# the bug exercised by this test case is that those items are not placed
# in leaf 1, they must be placed in a leaf different from the one
# containing the inode item for inode 257.
#
# The corresponding BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY items for
# the parent inode (257) are the following:
#
# item 460 key (257 DIR_ITEM 3724298081) itemoff 48344 itemsize 34
# location key (1458 INODE_ITEM 0) type DIR
# transid 6 data_len 0 name_len 4
# name: dira
#
# and:
#
# item 771 key (257 DIR_INDEX 1202) itemoff 36673 itemsize 34
# location key (1458 INODE_ITEM 0) type DIR
# transid 6 data_len 0 name_len 4
# name: dira
$ mkdir /mnt/testdir/dira
# Make sure everything done so far is durably persisted.
$ sync
# Now do a change to inode 257 ("testdir") that does not result in
# COWing leaves 2 and 3 - the leaves that contain the directory items
# pointing to inode 1458 (directory "dira").
#
# Changing permissions, the owner/group, updating or adding a xattr,
# etc, will not change (COW) leaves 2 and 3. So for the sake of
# simplicity change the permissions of inode 257, which results in
# updating its inode item and therefore change (COW) only leaf 1.
$ chmod 700 /mnt/testdir
# Now fsync directory inode 257.
#
# Since only the first leaf was changed/COWed, we log the inode item of
# inode 257 and only the dentries found in the first leaf, all have a
# key type of BTRFS_DIR_ITEM_KEY, and no keys of type
# BTRFS_DIR_INDEX_KEY, because they sort after the former type and none
# exist in the first leaf.
#
# We also log 3 items that represent ranges for dir items and dir
# indexes for which the log is authoritative:
#
# 1) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is
# authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset
# in the range [0, 2285968570] (the offset here is the crc32c of the
# dentry's name). The value 2285968570 corresponds to the offset of
# the first key of leaf 2 (which is of type BTRFS_DIR_ITEM_KEY);
#
# 2) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is
# authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset
# in the range [4293818216, (u64)-1] (the offset here is the crc32c
# of the dentry's name). The value 4293818216 corresponds to the
# offset of the highest key of type BTRFS_DIR_ITEM_KEY plus 1
# (4293818215 + 1), which is located in leaf 2;
#
# 3) a key of type BTRFS_DIR_LOG_INDEX_KEY, with an offset of 1203,
# which indicates the log is authoritative for all keys of type
# BTRFS_DIR_INDEX_KEY that have an offset in the range
# [1203, (u64)-1]. The value 1203 corresponds to the offset of the
# last key of type BTRFS_DIR_INDEX_KEY plus 1 (1202 + 1), which is
# located in leaf 3;
#
# Also, because "testdir" is a directory and inode 1458 ("dira") is a
# child directory, we log inode 1458 too.
$ xfs_io -c "fsync" /mnt/testdir
# Now move "dira", inode 1458, to be a child of the root directory
# (inode 256).
#
# Because this inode was previously logged, when "testdir" was fsynced,
# the log is updated so that the old inode reference, referring to inode
# 257 as the parent, is deleted and the new inode reference, referring
# to inode 256 as the parent, is added to the log.
$ mv /mnt/testdir/dira /mnt
# Now change some file and fsync it. This guarantees the log changes
# made by the previous move/rename operation are persisted. We do not
# need to do any special modification to the file, just any change to
# any file and sync the log.
$ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/testdir/file1
# Simulate a power failure and then mount again the filesystem to
# replay the log tree. We want to verify that we are able to mount the
# filesystem, meaning log replay was successful, and that directory
# inode 1458 ("dira") only has inode 256 (the filesystem's root) as
# its parent (and no longer a child of inode 257).
#
# It used to happen that during log replay we would end up having
# inode 1458 (directory "dira") with 2 hard links, being a child of
# inode 257 ("testdir") and inode 256 (the filesystem's root). This
# resulted in the tree checker detecting the issue and causing the
# mount operation to fail (with -EIO).
#
# This happened because in the log we have the new name/parent for
# inode 1458, which results in adding the new dentry with inode 256
# as the parent, but the previous dentry, under inode 257 was never
# removed - this is because the ranges for dir items and dir indexes
# of inode 257 for which the log is authoritative do not include the
# old dir item and dir index for the dentry of inode 257 referring to
# inode 1458:
#
# - for dir items, the log is authoritative for the ranges
# [0, 2285968570] and [4293818216, (u64)-1]. The dir item at inode 257
# pointing to inode 1458 has a key of (257 DIR_ITEM 3724298081), as
# previously mentioned, so the dir item is not deleted when the log
# replay procedure processes the authoritative ranges, as 3724298081
# is outside both ranges;
#
# - for dir indexes, the log is authoritative for the range
# [1203, (u64)-1], and the dir index item of inode 257 pointing to
# inode 1458 has a key of (257 DIR_INDEX 1202), as previously
# mentioned, so the dir index item is not deleted when the log
# replay procedure processes the authoritative range.
<power failure>
$ mount /dev/sdc /mnt
mount: /mnt: can't read superblock on /dev/sdc.
$ dmesg
(...)
[87849.840509] BTRFS info (device sdc): start tree-log replay
[87849.875719] BTRFS critical (device sdc): corrupt leaf: root=5 block=30539776 slot=554 ino=1458, invalid nlink: has 2 expect no more than 1 for dir
[87849.878084] BTRFS info (device sdc): leaf 30539776 gen 7 total ptrs 557 free space 2092 owner 5
[87849.879516] BTRFS info (device sdc): refs 1 lock_owner 0 current 2099108
[87849.880613] item 0 key (1181 1 0) itemoff 65275 itemsize 160
[87849.881544] inode generation 6 size 0 mode 100644
[87849.882692] item 1 key (1181 12 257) itemoff 65258 itemsize 17
(...)
[87850.562549] item 556 key (1458 12 257) itemoff 16017 itemsize 14
[87850.563349] BTRFS error (device dm-0): block=30539776 write time tree block corruption detected
[87850.564386] ------------[ cut here ]------------
[87850.564920] WARNING: CPU: 3 PID: 2099108 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs]
[87850.566129] Modules linked in: btrfs dm_zero dm_snapshot (...)
[87850.573789] CPU: 3 PID: 2099108 Comm: mount Not tainted 5.12.0-rc8-btrfs-next-86 #1
(...)
[87850.587481] Call Trace:
[87850.587768] btree_csum_one_bio+0x244/0x2b0 [btrfs]
[87850.588354] ? btrfs_bio_fits_in_stripe+0xd8/0x110 [btrfs]
[87850.589003] btrfs_submit_metadata_bio+0xb7/0x100 [btrfs]
[87850.589654] submit_one_bio+0x61/0x70 [btrfs]
[87850.590248] submit_extent_page+0x91/0x2f0 [btrfs]
[87850.590842] write_one_eb+0x175/0x440 [btrfs]
[87850.591370] ? find_extent_buffer_nolock+0x1c0/0x1c0 [btrfs]
[87850.592036] btree_write_cache_pages+0x1e6/0x610 [btrfs]
[87850.592665] ? free_debug_processing+0x1d5/0x240
[87850.593209] do_writepages+0x43/0xf0
[87850.593798] ? __filemap_fdatawrite_range+0xa4/0x100
[87850.594391] __filemap_fdatawrite_range+0xc5/0x100
[87850.595196] btrfs_write_marked_extents+0x68/0x160 [btrfs]
[87850.596202] btrfs_write_and_wait_transaction.isra.0+0x4d/0xd0 [btrfs]
[87850.597377] btrfs_commit_transaction+0x794/0xca0 [btrfs]
[87850.598455] ? _raw_spin_unlock_irqrestore+0x32/0x60
[87850.599305] ? kmem_cache_free+0x15a/0x3d0
[87850.600029] btrfs_recover_log_trees+0x346/0x380 [btrfs]
[87850.601021] ? replay_one_extent+0x7d0/0x7d0 [btrfs]
[87850.601988] open_ctree+0x13c9/0x1698 [btrfs]
[87850.602846] btrfs_mount_root.cold+0x13/0xed [btrfs]
[87850.603771] ? kmem_cache_alloc_trace+0x7c9/0x930
[87850.604576] ? vfs_parse_fs_string+0x5d/0xb0
[87850.605293] ? kfree+0x276/0x3f0
[87850.605857] legacy_get_tree+0x30/0x50
[87850.606540] vfs_get_tree+0x28/0xc0
[87850.607163] fc_mount+0xe/0x40
[87850.607695] vfs_kern_mount.part.0+0x71/0x90
[87850.608440] btrfs_mount+0x13b/0x3e0 [btrfs]
(...)
[87850.629477] ---[ end trace 68802022b99a1ea0 ]---
[87850.630849] BTRFS: error (device sdc) in btrfs_commit_transaction:2381: errno=-5 IO failure (Error while writing out transaction)
[87850.632422] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[87850.633416] BTRFS: error (device sdc) in cleanup_transaction:1978: errno=-5 IO failure
[87850.634553] BTRFS: error (device sdc) in btrfs_replay_log:2431: errno=-5 IO failure (Failed to recover log tree)
[87850.637529] BTRFS error (device sdc): open_ctree failed
In this example the inode we moved was a directory, so it was easy to
detect the problem because directories can only have one hard link and
the tree checker immediately detects that. If the moved inode was a file,
then the log replay would succeed and we would end up having both the
new hard link (/mnt/foo) and the old hard link (/mnt/testdir/foo) present,
but only the new one should be present.
Fix this by forcing re-logging of the old parent directory when logging
the new name during a rename operation. This ensures we end up with a log
that is authoritative for a range covering the keys for the old dentry,
therefore causing the old dentry do be deleted when replaying the log.
A test case for fstests will follow up soon.
Fixes: 64d6b281ba4db0 ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
CC: stable@vger.kernel.org # 5.12+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
`xfs_io -c 'fiemap <off> <len>' <file>`
can give surprising results on btrfs that differ from xfs.
btrfs prints out extents trimmed to fit the user input. If the user's
fiemap request has an offset, then rather than returning each whole
extent which intersects that range, we also trim the start extent to not
have start < off.
Documentation in filesystems/fiemap.txt and the xfs_io man page suggests
that returning the whole extent is expected.
Some cases which all yield the same fiemap in xfs, but not btrfs:
dd if=/dev/zero of=$f bs=4k count=1
sudo xfs_io -c 'fiemap 0 1024' $f
0: [0..7]: 26624..26631
sudo xfs_io -c 'fiemap 2048 1024' $f
0: [4..7]: 26628..26631
sudo xfs_io -c 'fiemap 2048 4096' $f
0: [4..7]: 26628..26631
sudo xfs_io -c 'fiemap 3584 512' $f
0: [7..7]: 26631..26631
sudo xfs_io -c 'fiemap 4091 5' $f
0: [7..6]: 26631..26630
I believe this is a consequence of the logic for merging contiguous
extents represented by separate extent items. That logic needs to track
the last offset as it loops through the extent items, which happens to
pick up the start offset on the first iteration, and trim off the
beginning of the full extent. To fix it, start `off` at 0 rather than
`start` so that we keep the iteration/merging intact without cutting off
the start of the extent.
after the fix, all the above commands give:
0: [0..7]: 26624..26631
The merging logic is exercised by fstest generic/483, and I have written
a new fstest for checking we don't have backwards or zero-length fiemaps
for cases like those above.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Generally a delayed iput is added when we might do the final iput, so
usually we'll end up sleeping while processing the delayed iputs
naturally. However there's no guarantee of this, especially for small
files. In production we noticed 5 instances of RCU stalls while testing
a kernel release overnight across 1000 machines, so this is relatively
common:
host count: 5
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: ....: (20998 ticks this GP) idle=59e/1/0x4000000000000002 softirq=12333372/12333372 fqs=3208
(t=21031 jiffies g=27810193 q=41075) NMI backtrace for cpu 1
CPU: 1 PID: 1713 Comm: btrfs-cleaner Kdump: loaded Not tainted 5.6.13-0_fbk12_rc1_5520_gec92bffc1ec9 #1
Call Trace:
<IRQ> dump_stack+0x50/0x70
nmi_cpu_backtrace.cold.6+0x30/0x65
? lapic_can_unplug_cpu.cold.30+0x40/0x40
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x99/0xc7
rcu_sched_clock_irq.cold.90+0x1b2/0x3a3
? trigger_load_balance+0x5c/0x200
? tick_sched_do_timer+0x60/0x60
? tick_sched_do_timer+0x60/0x60
update_process_times+0x24/0x50
tick_sched_timer+0x37/0x70
__hrtimer_run_queues+0xfe/0x270
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x5e/0x120
apic_timer_interrupt+0xf/0x20 </IRQ>
RIP: 0010:queued_spin_lock_slowpath+0x17d/0x1b0
RSP: 0018:ffffc9000da5fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff889fa81d0cd8 RCX: 0000000000000029
RDX: ffff889fff86c0c0 RSI: 0000000000080000 RDI: ffff88bfc2da7200
RBP: ffff888f2dcdd768 R08: 0000000001040000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff82a55560 R12: ffff88bfc2da7200
R13: 0000000000000000 R14: ffff88bff6c2a360 R15: ffffffff814bd870
? kzalloc.constprop.57+0x30/0x30
list_lru_add+0x5a/0x100
inode_lru_list_add+0x20/0x40
iput+0x1c1/0x1f0
run_delayed_iput_locked+0x46/0x90
btrfs_run_delayed_iputs+0x3f/0x60
cleaner_kthread+0xf2/0x120
kthread+0x10b/0x130
Fix this by adding a cond_resched_lock() to the loop processing delayed
iputs so we can avoid these sort of stalls.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 7000babddac6 ("btrfs: assign proper values to a bool variable in
dev_extent_hole_check_zoned") assigned false to the hole_start parameter
of dev_extent_hole_check_zoned().
The hole_start parameter is not boolean and returns the start location of
the found hole.
Fixes: 7000babddac6 ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.13-2021-05-13:
amdgpu:
- Fixes for flexible array conversions
- Fix sysfs attribute init
- Harvesting fixes
- VCN CG/PG fixes for Picasso
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513163228.3963-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Remove an unused function and a MAINTAINERS update.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513133617.xq77wwrehpuh7yn2@hendrix
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These close a coverage gap in the intel_pstate driver and fix runtime
PM child count imbalance related to interactions with system-wide
suspend.
Specifics:
- Make intel_pstate work as expected on systems where the platform
firmware enables HWP even though the HWP EPP support is not
advertised (Rafael Wysocki).
- Fix possible runtime PM child count imbalance that may occur if
other runtime PM functions are called after invoking
pm_runtime_force_suspend() and before pm_runtime_force_resume()
is called (Tony Lindgren)"
* tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an unnecessary revert of an ACPI power management commit,
add a missing device ID to one of the lists and fix a possible memory
leak in an error path.
Specifics:
- Revert a revert of a recent ACPI power management change that does
not need to be reverted after all (Rafael Wysocki).
- Add missing fan device ID to the list of device IDs for which the
devices should not be put into the ACPI PM domain (Sumeet
Pawnikar).
- Fix possible memory leak in an error path in the ACPI device
enumeration code (Christophe JAILLET)"
* tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add ACPI ID of Alder Lake Fan
ACPI: scan: Fix a memory leak in an error handling path
Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
|
|
* acpi-pm:
ACPI: PM: Add ACPI ID of Alder Lake Fan
Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
|
|
* pm-core:
PM: runtime: Fix unpaired parent child_count for force_resume
|
|
If a trace event uses the %*.s notation, the trace_check_vprintf() will
fail and will warn about a bad processing of strings, because it does not
take into account the length field when processing the star (*) part.
Have it handle this case as well.
Link: https://lore.kernel.org/linux-nfs/238C0E2D-C2A4-4578-ADD2-C565B3B99842@oracle.com/
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 9a6944fee68e2 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Merge VT_RESIZEX fixes from Maciej Rozycki:
"I got to the bottom of the issue with VT_RESIZEX recently discussed
and came up with this small patch series, fixing an additional issue
that I originally thought might be broken VGA hardware emulation with
my laptop, which however turned out to be intertwined with the
original problem and also a regression introduced somewhat later.
The fix for that because the first patch, and then to make backporting
feasible I had to put a revert of the offending change from last
September next, followed by a proper fix for the framebuffer issue
that change had tried to address.
See individual change descriptions for details.
These have been verified with true VGA hardware (a Trident TVGA8900
ISA video adapter) using various combinations of `svgatextmode' and
`setfont' command invocations to change both the VT size and the font
size, and also switching between the text console and X11, both by
starting/stopping the X server and by switching between VTs.
All this to ensure bringing the behaviour of VGA text console back to
correct operation as it used to be with Linux 2.6.18"
* emailed patches from Maciej W. Rozycki <macro@orcam.me.uk>:
vt: Fix character height handling with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vgacon: Record video mode changes with VT_RESIZEX
|
|
Restore the original intent of the VT_RESIZEX ioctl's `v_clin' parameter
which is the number of pixel rows per character (cell) rather than the
height of the font used.
For framebuffer devices the two values are always the same, because the
former is inferred from the latter one. For VGA used as a true text
mode device these two parameters are independent from each other: the
number of pixel rows per character is set in the CRT controller, while
font height is in fact hardwired to 32 pixel rows and fonts of heights
below that value are handled by padding their data with blanks when
loaded to hardware for use by the character generator. One can change
the setting in the CRT controller and it will update the screen contents
accordingly regardless of the font loaded.
The `v_clin' parameter is used by the `vgacon' driver to set the height
of the character cell and then the cursor position within. Make the
parameter explicit then, by defining a new `vc_cell_height' struct
member of `vc_data', set it instead of `vc_font.height' from `v_clin' in
the VT_RESIZEX ioctl, and then use it throughout the `vgacon' driver
except where actual font data is accessed which as noted above is
independent from the CRTC setting.
This way the framebuffer console driver is free to ignore the `v_clin'
parameter as irrelevant, as it always should have, avoiding any issues
attempts to give the parameter a meaning there could have caused, such
as one that has led to commit 988d0763361b ("vt_ioctl: make VT_RESIZEX
behave like VT_RESIZE"):
"syzbot is reporting UAF/OOB read at bit_putcs()/soft_cursor() [1][2],
for vt_resizex() from ioctl(VT_RESIZEX) allows setting font height
larger than actual font height calculated by con_font_set() from
ioctl(PIO_FONT). Since fbcon_set_font() from con_font_set() allocates
minimal amount of memory based on actual font height calculated by
con_font_set(), use of vt_resizex() can cause UAF/OOB read for font
data."
The problem first appeared around Linux 2.5.66 which predates our repo
history, but the origin could be identified with the old MIPS/Linux repo
also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>
as commit 9736a3546de7 ("Merge with Linux 2.5.66."), where VT_RESIZEX
code in `vt_ioctl' was updated as follows:
if (clin)
- video_font_height = clin;
+ vc->vc_font.height = clin;
making the parameter apply to framebuffer devices as well, perhaps due
to the use of "font" in the name of the original `video_font_height'
variable. Use "cell" in the new struct member then to avoid ambiguity.
References:
[1] https://syzkaller.appspot.com/bug?id=32577e96d88447ded2d3b76d71254fb855245837
[2] https://syzkaller.appspot.com/bug?id=6b8355d27b2b94fb5cedf4655e3a59162d9e48e3
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org # v2.6.12+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revert the removal of code handling extra VT_RESIZEX ioctl's parameters
beyond those that VT_RESIZE supports, fixing a functional regression
causing `svgatextmode' not to resize the VT anymore.
As a consequence of the reverted change when the video adapter is
reprogrammed from the original say 80x25 text mode using a 9x16
character cell (720x400 pixel resolution) to say 80x37 text mode and the
same character cell (720x592 pixel resolution), the VT geometry does not
get updated and only upper two thirds of the screen are used for the VT,
and the lower part remains blank. The proportions change according to
text mode geometries chosen.
Revert the change verbatim then, bringing back previous VT resizing.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 988d0763361b ("vt_ioctl: make VT_RESIZEX behave like VT_RESIZE")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix an issue with VGA console font size changes made after the initial
video text mode has been changed with a user tool like `svgatextmode'
calling the VT_RESIZEX ioctl. As it stands in that case the original
screen geometry continues being used to validate further VT resizing.
Consequently when the video adapter is firstly reprogrammed from the
original say 80x25 text mode using a 9x16 character cell (720x400 pixel
resolution) to say 80x37 text mode and the same character cell (720x592
pixel resolution), and secondly the CRTC character cell updated to 9x8
(by loading a suitable font with the KD_FONT_OP_SET request of the
KDFONTOP ioctl), the VT geometry does not get further updated from 80x37
and only upper half of the screen is used for the VT, with the lower
half showing rubbish corresponding to whatever happens to be there in
the video memory that maps to that part of the screen. Of course the
proportions change according to text mode geometries and font sizes
chosen.
Address the problem then, by updating the text mode geometry defaults
rather than checking against them whenever the VT is resized via a user
ioctl.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: e400b6ec4ede ("vt/vgacon: Check if screen resize request comes from userspace")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull NVMe fixes from Christoph:
"nvme fix for Linux 5.13
- correct the check for using the inline bio in nvmet
(Chaitanya Kulkarni)
- demote unsupported command warnings (Chaitanya Kulkarni)
- fix corruption due to double initializing ANA state (me, Hou Pu)
- reset ns->file when open fails (Daniel Wagner)
- fix a NULL deref when SEND is completed with error in nvmet-rdma
(Michal Kalderon)"
* tag 'nvme-5.13-2021-05-13' of git://git.infradead.org/nvme:
nvmet: use new ana_log_size instead the old one
nvmet: seset ns->file when open fails
nvmet: demote fabrics cmd parse err msg to debug
nvmet: use helper to remove the duplicate code
nvmet: demote discovery cmd parse err msg to debug
nvmet-rdma: Fix NULL deref when SEND is completed with error
nvmet: fix inline bio check for passthru
nvmet: fix inline bio check for bdev-ns
nvme-multipath: fix double initialization of ANA state
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix bugs/regressions in adm9240, ltc2992, pmbus/fsp-3y, and occ
drivers, plus a minor cleanup in the corsair-psu driver"
* tag 'hwmon-for-v5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (adm9240) Fix writes into inX_max attributes
hwmon: (ltc2992) Put fwnode in error case during ->probe()
hwmon: (pmbus/fsp-3y) Fix FSP-3Y YH-5151E non-compliant vout encoding
hwmon: (occ) Fix poll rate limiting
hwmon: (corsair-psu) Remove unneeded semicolons
|
|
Anshuman suggested this.
Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210513151819.12526-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
update suspend register settings in Non-DPG mode.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
enable vcn mgcg flag for picasso.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
are connected
Screen flickers rapidly when two 4K 60Hz monitors are in use. This issue
doesn't happen when one monitor is 4K 60Hz (pixelclock 594MHz) and
another one is 4K 30Hz (pixelclock 297MHz).
The issue is gone after setting "power_dpm_force_performance_level" to
"high". Following the indication, we found that the issue occurs when
sclk is too low.
So resolve the issue by disabling sclk switching when there are two
monitors requires high pixelclock (> 297MHz).
v2:
- Only apply the fix to Oland.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Update the method of disabling VCN IP for specific SKU for navi1x ASIC,
it will judge whether should add the related IP at the function of
amdgpu_device_ip_block_add().
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Judgement whether to add an sw ip according to the harvest info.
v2: fix indentation (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It is stored in dynamically allocated memory, so sysfs_bin_attr_init() must
be called to initialize it. (Note: "initialization" only sets the .attr.key
member in this struct; it does not change the value of any other members.)
Otherwise, when CONFIG_DEBUG_LOCK_ALLOC=y this message appears during boot:
BUG: key ffff9248900cd148 has not been registered!
Fixes: 9037246bb2da ("drm/amd/display: Add sysfs interface for set/get srm")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1586
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: David Ward <david.ward@gatech.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Create new structure SISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
SISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct SISLANDS_SMC_SWSTATE doesn't allow for code that accesses
the first element of initialState.levels and ACPIState.levels
arrays:
drivers/gpu/drm/amd/pm/powerplay/si_dpm.c:
4820: table->initialState.levels[0].mclk.vDLL_CNTL =
4821: cpu_to_be32(si_pi->clock_registers.dll_cntl);
...
5021: table->ACPIState.levels[0].mclk.vDLL_CNTL =
5022: cpu_to_be32(dll_cntl);
because such element cannot be accessed without previously allocating
enough dynamic memory for it to exist (which never actually happens).
So, there is an out-of-bounds bug in this case.
That's why struct SISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit 0e1aa13ca3ff ("drm/amd/pm: Replace one-element array with
flexible-array in struct SISLANDS_SMC_SWSTATE"), the size of
dpmLevels in struct SISLANDS_SMC_STATETABLE should be fixed to be
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Fixes: 0e1aa13ca3ff ("drm/amd/pm: Replace one-element array with flexible-array in struct SISLANDS_SMC_SWSTATE")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Create new structure SISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
SISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct SISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:
4353 table->initialState.levels[0].mclk.vDLL_CNTL =
4354 cpu_to_be32(si_pi->clock_registers.dll_cntl);
...
4555 table->ACPIState.levels[0].mclk.vDLL_CNTL =
4556 cpu_to_be32(dll_cntl);
because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).
That's why struct SISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit 96e27e8d919e ("drm/radeon/si_dpm: Replace one-element array
with flexible-array in struct SISLANDS_SMC_SWSTATE"), the size of
dpmLevels in struct SISLANDS_SMC_STATETABLE should be fixed to be
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1583
Fixes: 96e27e8d919e ("drm/radeon/si_dpm: Replace one-element array with flexible-array in struct SISLANDS_SMC_SWSTATE")
Cc: stable@vger.kernel.org
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Create new structure NISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
NISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct NISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:
drivers/gpu/drm/radeon/ni_dpm.c:
1690 table->initialState.levels[0].mclk.vMPLL_AD_FUNC_CNTL =
1691 cpu_to_be32(ni_pi->clock_registers.mpll_ad_func_cntl);
...
1903: table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL = cpu_to_be32(mpll_ad_func_cntl);
1904: table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL_2 = cpu_to_be32(mpll_ad_func_cntl_2);
because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).
That's why struct NISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit 434fb1e7444a ("drm/radeon/nislands_smc.h: Replace one-element
array with flexible-array member in struct NISLANDS_SMC_SWSTATE"), the
size of dpmLevels in struct NISLANDS_SMC_STATETABLE should be fixed to
be NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Bug: https://lore.kernel.org/dri-devel/3eedbe78-1fbd-4763-a7f3-ac5665e76a4a@xenosoft.de/
Fixes: 434fb1e7444a ("drm/radeon/nislands_smc.h: Replace one-element array with flexible-array member in struct NISLANDS_SMC_SWSTATE")
Cc: stable@vger.kernel.org
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/dri-devel/9bb5fcbd-daf5-1669-b3e7-b8624b3c36f9@xenosoft.de/
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The new ana_log_size should be used instead of the old one.
Or kernel NULL pointer dereference will happen like below:
[ 38.957849][ T69] BUG: kernel NULL pointer dereference, address: 000000000000003c
[ 38.975550][ T69] #PF: supervisor write access in kernel mode
[ 38.975955][ T69] #PF: error_code(0x0002) - not-present page
[ 38.976905][ T69] PGD 0 P4D 0
[ 38.979388][ T69] Oops: 0002 [#1] SMP NOPTI
[ 38.980488][ T69] CPU: 0 PID: 69 Comm: kworker/0:2 Not tainted 5.12.0+ #54
[ 38.981254][ T69] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 38.982502][ T69] Workqueue: events nvme_loop_execute_work
[ 38.985219][ T69] RIP: 0010:memcpy_orig+0x68/0x10f
[ 38.986203][ T69] Code: 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 4c 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 <4c> 89 47 f8 4c 89 4f f0 4c 89 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2
[ 38.987677][ T69] RSP: 0018:ffffc900001b7d48 EFLAGS: 00000287
[ 38.987996][ T69] RAX: 0000000000000020 RBX: 0000000000000024 RCX: 0000000000000010
[ 38.988327][ T69] RDX: ffffffffffffffe4 RSI: ffff8881084bc004 RDI: 0000000000000044
[ 38.988620][ T69] RBP: 0000000000000024 R08: 0000000100000000 R09: 0000000000000000
[ 38.988991][ T69] R10: 0000000100000000 R11: 0000000000000001 R12: 0000000000000024
[ 38.989289][ T69] R13: ffff8881084bc000 R14: 0000000000000000 R15: 0000000000000024
[ 38.989845][ T69] FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
[ 38.990234][ T69] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 38.990490][ T69] CR2: 000000000000003c CR3: 00000001085b2000 CR4: 00000000000006f0
[ 38.991105][ T69] Call Trace:
[ 38.994157][ T69] sg_copy_buffer+0xb8/0xf0
[ 38.995357][ T69] nvmet_copy_to_sgl+0x48/0x6d
[ 38.995565][ T69] nvmet_execute_get_log_page_ana+0xd4/0x1cb
[ 38.995792][ T69] nvmet_execute_get_log_page+0xc9/0x146
[ 38.995992][ T69] nvme_loop_execute_work+0x3e/0x44
[ 38.996181][ T69] process_one_work+0x1c3/0x3c0
[ 38.996393][ T69] worker_thread+0x44/0x3d0
[ 38.996600][ T69] ? cancel_delayed_work+0x90/0x90
[ 38.996804][ T69] kthread+0xf7/0x130
[ 38.996961][ T69] ? kthread_create_worker_on_cpu+0x70/0x70
[ 38.997171][ T69] ret_from_fork+0x22/0x30
[ 38.997705][ T69] Modules linked in:
[ 38.998741][ T69] CR2: 000000000000003c
[ 39.000104][ T69] ---[ end trace e719927b609d0fa0 ]---
Fixes: 5e1f689913a4 ("nvme-multipath: fix double initialization of ANA state")
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This may happen if the port becomes resume status exactly
when usb_port_resume() gets port status, it still need provide
a TRSMCRY time before access the device.
CC: <stable@vger.kernel.org>
Reported-by: Tianping Fang <tianping.fang@mediatek.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/20210512020738.52961-1-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is a regression introduced by 1373fefc6243 ("usb: typec: tcpm:
Allow slow charging loops to comply to pSnkStby")
When Source advertises Rp-default, tcpm would request 500mA when in
SINK_DISCOVERY, Type-C spec advises the sink to follow BC1.2 current
limits when Rp-default is advertised.
[12750.503381] Requesting mux state 1, usb-role 2, orientation 1
[12750.503837] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS]
[12751.003891] state change SNK_STARTUP -> SNK_DISCOVERY
[12751.003900] Setting voltage/current limit 5000 mV 500 mA
This patch restores the behavior where the tcpm would request 0mA when
Rp-default is advertised by the source.
[ 73.174252] Requesting mux state 1, usb-role 2, orientation 1
[ 73.174749] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS]
[ 73.674800] state change SNK_STARTUP -> SNK_DISCOVERY
[ 73.674808] Setting voltage/current limit 5000 mV 0 mA
During SNK_DISCOVERY, Cap the current limit to PD_P_SNK_STDBY_MW / 5 only
for slow_charger_loop case.
Fixes: 1373fefc6243 ("usb: typec: tcpm: Allow slow charging loops to comply to pSnkStby")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Link: https://lore.kernel.org/r/20210510211756.3346954-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
One of AMD xhci controller require reset on resume.
Occasionally AMD xhci controller does not respond to
Stop endpoint command.
Once the issue happens controller goes into bad state
and in that case controller needs to be reset.
Cc: <stable@vger.kernel.org>
Signed-off-by: Sandeep Singh <sandeep.singh@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210512080816.866037-6-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On some devices (specifically the SC8180x based Surface Pro X with
QCOM04A6) HC halt / xhci_halt() times out during boot. Manually binding
the xhci-hcd driver at some point later does not exhibit this behavior.
To work around this, double XHCI_MAX_HALT_USEC, which also resolves this
issue.
Cc: <stable@vger.kernel.org>
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210512080816.866037-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
'xhci_urb_enqueue()' is passed a 'mem_flags' argument, because "URBs may be
submitted in interrupt context" (see comment related to 'usb_submit_urb()'
in 'drivers/usb/core/urb.c')
So this flag should be used in all the calling chain.
Up to now, 'xhci_check_maxpacket()' which is only called from
'xhci_urb_enqueue()', uses GFP_KERNEL.
Be safe and pass the mem_flags to this function as well.
Fixes: ddba5cd0aeff ("xhci: Use command structures when queuing commands on the command ring")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210512080816.866037-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 9ebf30007858 ("xhci: Fix halted endpoint at stop endpoint command
completion") in 5.12 changes how cancelled URBs are given back.
To cancel a URB xhci driver needs to stop the endpoint first.
To clear a halted endpoint xhci driver needs to reset the endpoint.
In rare cases when an endpoint halt (error) races with a endpoint stop we
need to clear the reset before removing, and giving back the cancelled URB.
The above change in 5.12 takes care of this, but it also relies on the
reset endpoint completion handler to give back the cancelled URBs.
There are cases when driver refuses to queue reset endpoint commands,
for example when a link suddenly goes to an inactive error state.
In this case the cancelled URB is never given back.
Fix this by giving back the URB in the stop endpoint if queuing a reset
endpoint command fails.
Fixes: 9ebf30007858 ("xhci: Fix halted endpoint at stop endpoint command completion")
CC: <stable@vger.kernel.org> # 5.12
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210512080816.866037-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In the same way as Intel Tiger Lake TCSS (Type-C Subsystem) the Alder Lake
TCSS xHCI needs to be runtime suspended whenever possible to allow the
TCSS hardware block to enter D3cold and thus save energy.
Cc: stable@vger.kernel.org
Signed-off-by: Abhijeet Rao <abhijeet.rao@intel.com>
Signed-off-by: Nikunj A. Dadhania <nikunj.dadhania@intel.com>
Signed-off-by: Azhar Shaikh <azhar.shaikh@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210512080816.866037-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
generations
Some AMD Ryzen generations has different calculation method on maximum
performance. 255 is not for all ASICs, some specific generations should use 166
as the maximum performance. Otherwise, it will report incorrect frequency value
like below:
~ → lscpu | grep MHz
CPU MHz: 3400.000
CPU max MHz: 7228.3198
CPU min MHz: 2200.0000
[ mingo: Tidied up whitespace use. ]
[ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ]
Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Fixed-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type
rather than a HEAD or PLAIN type instead, which means its pclustersize
_must_ be 1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:
HEAD HEAD / PLAIN lcluster type
____________ ____________
|_:__________|_________:__| file data (uncompressed)
. .
.____________.
|____________| pcluster data (compressed)
Such on-disk case was explained before [1] but missed to be handled
properly in the runtime implementation.
It can be observed if manually generating 1 lcluster-sized pcluster
with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now.
[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org
Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
|