Age | Commit message (Collapse) | Author |
|
Erhard reported that his G5 was crashing with v6.6-rc kernels:
mpic: Setting up HT PICs workarounds for U3/U4
BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe
Faulting instruction address: 0xc00000000005dc40
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 6.6.0-rc3-PMacGS #1
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP: c00000000005dc40 LR: c000000000066660 CTR: c000000000007730
REGS: c0000000022bf510 TRAP: 0380 Tainted: G T (6.6.0-rc3-PMacGS)
MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44004242 XER: 00000000
IRQMASK: 3
GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac
GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300
GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000
GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8
GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000
GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c
GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000
GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac
NIP hash_page_do_lazy_icache+0x50/0x100
LR __hash_page_4K+0x420/0x590
Call Trace:
hash_page_mm+0x364/0x6f0
do_hash_fault+0x114/0x2b0
data_access_common_virt+0x198/0x1f0
--- interrupt: 300 at mpic_init+0x4bc/0x10c4
NIP: c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000
REGS: c0000000022bf9f0 TRAP: 0300 Tainted: G T (6.6.0-rc3-PMacGS)
MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 24004248 XER: 00000000
DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1
...
NIP mpic_init+0x4bc/0x10c4
LR mpic_init+0x464/0x10c4
--- interrupt: 300
pmac_setup_one_mpic+0x258/0x2dc
pmac_pic_init+0x28c/0x3d8
init_IRQ+0x90/0x140
start_kernel+0x57c/0x78c
start_here_common+0x1c/0x20
A bisect pointed to the breakage beginning with commit 9fee28baa601 ("powerpc:
implement the new page table range API").
Analysis of the oops pointed to a struct page with a corrupted
compound_head being loaded via page_folio() -> _compound_head() in
hash_page_do_lazy_icache().
The access by the mpic code is to an MMIO address, so the expectation
is that the struct page for that address would be initialised by
init_unavailable_range(), as pointed out by Aneesh.
Instrumentation showed that was not the case, which eventually lead to
the realisation that pfn_valid() was returning false for that address,
causing the struct page to not be initialised.
Because the system is using FLATMEM, the version of pfn_valid() in
memory_model.h is used:
static inline int pfn_valid(unsigned long pfn)
{
...
return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}
Which relies on max_mapnr being initialised. Early in boot max_mapnr is
zero meaning no PFNs are valid.
max_mapnr is initialised in mem_init() called via:
start_kernel()
mm_core_init() # init/main.c:928
mem_init()
But that is too late for the usage in init_unavailable_range() called via:
start_kernel()
setup_arch() # init/main.c:893
paging_init()
free_area_init()
init_unavailable_range()
Although max_mapnr is currently set in mem_init(), the value is actually
already available much earlier, as soon as mem_topology_setup() has
completed, which is also before paging_init() is called. So move the
initialisation there, which causes paging_init() to correctly initialise
the struct page and fixes the bug.
This bug seems to have been lurking for years, but went unnoticed
because the pre-folio code was inspecting the uninitialised page->flags
but not dereferencing it.
Thanks to Erhard and Aneesh for help debugging.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au
|
|
yield_cpu is a sample of a preempted lock holder that gets propagated
back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).
The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.
This can result in yielding to a CPU that no longer holds the lock,
which is bad, but particularly if it is currently in H_CEDE (idle),
then it appears to be preempted and some hypervisors (PowerVM) can
cause very long H_CONFER latencies waiting for H_CEDE wakeup. This
results in latency spikes and hard lockups on oversubscribed
partitions with lock contention.
This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out of it is not.
Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.
Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Debugged-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com
|
|
Sachin reported a warning when running the inject-ra-err selftest:
# selftests: powerpc/mce: inject-ra-err
Disabling lock debugging due to kernel taint
MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
MCE: CPU19: Initiator CPU
MCE: CPU19: Unknown
------------[ cut here ]------------
WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
...
NIP radix__tlb_flush+0x160/0x180
LR radix__tlb_flush+0x104/0x180
Call Trace:
radix__tlb_flush+0xf4/0x180 (unreliable)
tlb_finish_mmu+0x15c/0x1e0
exit_mmap+0x1a0/0x510
__mmput+0x60/0x1e0
exit_mm+0xdc/0x170
do_exit+0x2bc/0x5a0
do_group_exit+0x4c/0xc0
sys_exit_group+0x28/0x30
system_call_exception+0x138/0x330
system_call_vectored_common+0x15c/0x2ec
And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
in radix__tlb_flush() if mm->context.copros is still elevated.
However it's possible for the copros count to be elevated if a process
exits without first closing file descriptors that are associated with a
copro, eg. VAS.
If the process exits with a VAS file still open, the release callback
is queued up for exit_task_work() via:
exit_files()
put_files_struct()
close_files()
filp_close()
fput()
And called via:
exit_task_work()
____fput()
__fput()
file->f_op->release(inode, file)
coproc_release()
vas_user_win_ops->close_win()
vas_deallocate_window()
mm_context_remove_vas_window()
mm_context_remove_copro()
But that is after exit_mm() has been called from do_exit() and triggered
the warning.
Fix it by dropping the warning, and always calling __flush_all_mm().
In the normal case of no copros, that will result in a call to
_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
does.
If the copros count is elevated then it will cause a global flush, which
should flush translations from any copros. Note that the process table
entry was cleared in arch_exit_mmap(), so copros should not be able to
fetch any new translations.
Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://msgid.link/20231017121527.1574104-1-mpe@ellerman.id.au
|
|
Christophe reported that the change to ARCH_FORCE_MAX_ORDER to limit the
range to 10 had broken his ability to configure hugepages:
# echo 1 > /sys/kernel/mm/hugepages/hugepages-8192kB/nr_hugepages
sh: write error: Invalid argument
Several of the powerpc defconfigs previously set the
ARCH_FORCE_MAX_ORDER value to 12, via the definition in
arch/powerpc/configs/fsl-emb-nonhw.config, used by:
mpc85xx_defconfig
mpc85xx_smp_defconfig
corenet32_smp_defconfig
corenet64_smp_defconfig
mpc86xx_defconfig
mpc86xx_smp_defconfig
Fix it by increasing the allowed range to 12 to restore the previous
behaviour.
Fixes: 358e526a1648 ("powerpc/mm: Reinstate ARCH_FORCE_MAX_ORDER ranges")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Closes: https://lore.kernel.org/all/8011d806-5b30-bf26-2bfe-a08c39d57e20@csgroup.eu/
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230824122849.942072-1-mpe@ellerman.id.au
|
|
Eddie reported that newer kernels were crashing during boot on his 476
FSP2 system:
kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xb7ee2000
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K FSP-2
Modules linked in:
CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 #1
Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2
NIP: b7ee2000 LR: 8c008000 CTR: 00000000
REGS: bffebd83 TRAP: 0400 Not tainted (6.1.55-d23900f.ppcnf-fs p2)
MSR: 00000030 <IR,DR> CR: 00001000 XER: 20000000
GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000
GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000
GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0
GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0
NIP [b7ee2000] 0xb7ee2000
LR [8c008000] 0x8c008000
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
The problem is in ret_from_syscall where the check for
icache_44x_need_flush is done. When the flush is needed the code jumps
out-of-line to do the flush, and then intends to jump back to continue
the syscall return.
However the branch back to label 1b doesn't return to the correct
location, instead branching back just prior to the return to userspace,
causing bogus register values to be used by the rfi.
The breakage was introduced by commit 6f76a01173cc
("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which
inadvertently removed the "1" label and reused it elsewhere.
Fix it by adding named local labels in the correct locations. Note that
the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n
compiles.
Fixes: 6f76a01173cc ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Eddie James <eajames@linux.ibm.com>
Tested-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au
|
|
Booting mpc85xx_defconfig kernel on QEMU leads to:
Bad trap at PC: fe9bab0, SR: 2d000, vector=800
awk[82]: unhandled trap (5) at 0 nip fe9bab0 lr fe9e01c code 5 in libc-2.27.so[fe5a000+17a000]
awk[82]: code: 3aa00000 3a800010 4bffe03c 9421fff0 7ca62b78 38a00000 93c10008 83c10008
awk[82]: code: 38210010 4bffdec8 9421ffc0 7c0802a6 <fc00048e> d8010008 4815190d 93810030
Trace/breakpoint trap
WARNING: no useful console
This is because allthough CONFIG_MATH_EMULATION is selected,
Exception 800 calls unknown_exception().
Call emulation_assist_interrupt() instead.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/066caa6d9480365da9b8ed83692d7101e10ac5f8.1695657339.git.christophe.leroy@csgroup.eu
|
|
Commit 45201c879469 ("powerpc/nohash: Remove hash related code from
nohash headers.") replaced:
if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
return 0;
By:
if (pte_young(*ptep))
return 0;
But it should be:
if (!pte_young(*ptep))
return 0;
Fix it.
Fixes: 45201c879469 ("powerpc/nohash: Remove hash related code from nohash headers.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8bb7f06494e21adada724ede47a4c3d97e879d40.1695659959.git.christophe.leroy@csgroup.eu
|
|
On 8xx, PAGE_NONE is handled by setting _PAGE_NA instead of clearing
_PAGE_USER.
But then pte_user() returns 1 also for PAGE_NONE.
As _PAGE_NA prevent reads, add a specific version of pte_read()
that returns 0 when _PAGE_NA is set instead of always returning 1.
Fixes: 351750331fc1 ("powerpc/mm: Introduce _PAGE_NA")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/57bcfbe578e43123f9ed73e040229b80f1ad56ec.1695659959.git.christophe.leroy@csgroup.eu
|
|
In the plpar_hcall trace code, currently we use r0
to store the value of r4. But this value is not
used subsequently in the code. Hence remove this unused
save to r0 in plpar_hcall and plpar_hcall9
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230929172337.7906-2-atrajeev@linux.vnet.ibm.com
|
|
In powerpc pseries system, below behaviour is observed while
enabling tracing on hcall:
# cd /sys/kernel/debug/tracing/
# cat events/powerpc/hcall_exit/enable
0
# echo 1 > events/powerpc/hcall_exit/enable
# ls
-bash: fork: Bad address
Above is from power9 lpar with latest kernel. Past this, softlockup
is observed. Initially while attempting via perf_event_open to
use "PERF_TYPE_TRACEPOINT", kernel panic was observed.
perf config used:
================
memset(&pe[1],0,sizeof(struct perf_event_attr));
pe[1].type=PERF_TYPE_TRACEPOINT;
pe[1].size=96;
pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */
pe[1].sample_type=0; /* 0 */
pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */
pe[1].inherit=1;
pe[1].precise_ip=0; /* arbitrary skid */
pe[1].wakeup_events=0;
pe[1].bp_type=HW_BREAKPOINT_EMPTY;
pe[1].config1=0x1ULL;
Kernel panic logs:
==================
Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000008
Faulting instruction address: 0xc0000000004c2814
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1
Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries
NIP page_remove_rmap+0x44/0x320
LR wp_page_copy+0x384/0xec0
Call Trace:
0xc00000001416e400 (unreliable)
wp_page_copy+0x384/0xec0
__handle_mm_fault+0x9d4/0xfb0
handle_mm_fault+0xf0/0x350
___do_page_fault+0x48c/0xc90
hash__do_page_fault+0x30/0x70
do_hash_fault+0x1a4/0x330
data_access_common_virt+0x198/0x1f0
--- interrupt: 300 at 0x7fffae971abc
git bisect tracked this down to below commit:
'commit baa49d81a94b ("powerpc/pseries: hvcall stack frame overhead")'
This commit changed STACK_FRAME_OVERHEAD (112 ) to
STACK_FRAME_MIN_SIZE (32 ) since 32 bytes is the minimum size
for ELFv2 stack. With the latest kernel, when running on ELFv2,
STACK_FRAME_MIN_SIZE is used to allocate stack size.
During plpar_hcall_trace, first call is made to HCALL_INST_PRECALL
which saves the registers and allocates new stack frame. In the
plpar_hcall_trace code, STK_PARAM is accessed at two places.
1. To save r4: std r4,STK_PARAM(R4)(r1)
2. To access r4 back: ld r12,STK_PARAM(R4)(r1)
HCALL_INST_PRECALL precall allocates a new stack frame. So all
the stack parameter access after the precall, needs to be accessed
with +STACK_FRAME_MIN_SIZE. So the store instruction should be:
std r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will
end up with overwriting stack contents and cause corruption.
But instead of updating 'std', we can instead remove it since
HCALL_INST_PRECALL already saves it to the correct location.
similarly load instruction should be:
ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
Fix the load instruction to correctly access the stack parameter
with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the
precall saves it correctly.
Cc: stable@vger.kernel.org # v6.2+
Fixes: baa49d81a94b ("powerpc/pseries: hvcall stack frame overhead")
Co-developed-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230929172337.7906-1-atrajeev@linux.vnet.ibm.com
|
|
In order to use run_kselftest.sh the list of tests must be emitted to
populate kselftest-list.txt.
The powerpc Makefile is written to use EMIT_TESTS. But support for
EMIT_TESTS was dropped in commit d4e59a536f50 ("selftests: Use runner.sh
for emit targets"). Although prior to that commit a548de0fe8e1
("selftests: lib.mk: add test execute bit check to EMIT_TESTS") had
already broken run_kselftest.sh for powerpc due to the executable check
using the wrong path.
It can be fixed by replacing the EMIT_TESTS definitions with actual
emit_tests rules in the powerpc Makefiles. This makes run_kselftest.sh
able to run powerpc tests:
$ cd linux
$ export ARCH=powerpc
$ export CROSS_COMPILE=powerpc64le-linux-gnu-
$ make headers
$ make -j -C tools/testing/selftests install
$ grep -c "^powerpc" tools/testing/selftests/kselftest_install/kselftest-list.txt
182
Fixes: d4e59a536f50 ("selftests: Use runner.sh for emit targets")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230921072623.828772-1-mpe@ellerman.id.au
|
|
The changes to copy_thread() made in commit eed7c420aac7 ("powerpc:
copy_thread differentiate kthreads and user mode threads") inadvertently
broke arch_stack_walk_reliable() because it has knowledge of the stack
layout.
Fix it by changing the condition to match the new logic in
copy_thread(). The changes make the comments about the stack layout
incorrect, rather than rephrasing them just refer the reader to
copy_thread().
Also the comment about the stack backchain is no longer true, since
commit edbd0387f324 ("powerpc: copy_thread add a back chain to the
switch stack frame"), so remove that as well.
Fixes: eed7c420aac7 ("powerpc: copy_thread differentiate kthreads and user mode threads")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230921232441.1181843-1-mpe@ellerman.id.au
|
|
Syzkaller reported a sleep in atomic context bug relating to the HASHCHK
handler logic:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/traps.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 25040, name: syz-executor
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
no locks held by syz-executor/25040.
irq event stamp: 34
hardirqs last enabled at (33): [<c000000000048b38>] prep_irq_for_enabled_exit arch/powerpc/kernel/interrupt.c:56 [inline]
hardirqs last enabled at (33): [<c000000000048b38>] interrupt_exit_user_prepare_main+0x148/0x600 arch/powerpc/kernel/interrupt.c:230
hardirqs last disabled at (34): [<c00000000003e6a4>] interrupt_enter_prepare+0x144/0x4f0 arch/powerpc/include/asm/interrupt.h:176
softirqs last enabled at (0): [<c000000000281954>] copy_process+0x16e4/0x4750 kernel/fork.c:2436
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 15 PID: 25040 Comm: syz-executor Not tainted 6.5.0-rc5-00001-g3ccdff6bb06d #3
Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_021) hv:phyp pSeries
Call Trace:
[c0000000a8247ce0] [c00000000032b0e4] __might_resched+0x3b4/0x400 kernel/sched/core.c:10189
[c0000000a8247d80] [c0000000008c7dc8] __might_fault+0xa8/0x170 mm/memory.c:5853
[c0000000a8247dc0] [c00000000004160c] do_program_check+0x32c/0xb20 arch/powerpc/kernel/traps.c:1518
[c0000000a8247e50] [c000000000009b2c] program_check_common_virt+0x3bc/0x3c0
To determine if a trap was caused by a HASHCHK instruction, we inspect
the user instruction that triggered the trap. However this may sleep
if the page needs to be faulted in (get_user_instr() reaches
__get_user(), which calls might_fault() and triggers the bug message).
Move the HASHCHK handler logic to after we allow IRQs, which is fine
because we are only interested in HASHCHK if it's a user space trap.
Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception")
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230915034604.45393-1-bgray@linux.ibm.com
|
|
It used to be impossible to select CONFIG_CPM2 without selecting
CONFIG_FSL_SOC at the same time because CONFIG_CPM2 was dependent
on CONFIG_8260 and CONFIG_8260 was selecting CONFIG_FSL_SOC.
But after commit eb5aa2137275 ("powerpc/82xx: Remove CONFIG_8260
and CONFIG_8272") CONFIG_CPM2 depends on CONFIG_PPC_82xx instead
but CONFIG_PPC_82xx doesn't directly selects CONFIG_FSL_SOC.
Fix it by forcing CONFIG_PPC_82xx to select CONFIG_FSL_SOC just
like already done by PPC_8xx, PPC_MPC512x, PPC_83xx, PPC_86xx.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: eb5aa2137275 ("powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7ab513546148ebe33ddd4b0ea92c7bfd3cce3ad7.1694705016.git.christophe.leroy@csgroup.eu
|
|
FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
We recently added support for -fpatchable-function-entry and it is
enabled by default on ppc32 (ppc64 needs gcc v13.1.0). When building the
kernel for ppc32 and also enabling CONFIG_LD_DEAD_CODE_DATA_ELIMINATION,
we see the below build error with older gcc versions:
powerpc-linux-gnu-ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
This error is thrown since __patchable_function_entries section would be
garbage collected with --gc-sections since it does not reference any
other kept sections. This has subsequently been fixed with:
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=b7d072167715829eed0622616f6ae0182900de3e
Disable LD_DEAD_CODE_DATA_ELIMINATION for gcc versions before v11.1.0 if
using -fpatchable-function-entry to avoid this bug.
Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for -fpatchable-function-entry")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230913134129.2782088-1-naveen@kernel.org
|
|
It can be easy to miss that the notifier mechanism invokes the callbacks
in an atomic context, so add some comments to that effect on the two
handlers we register here.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-4-bgray@linux.ibm.com
|
|
This is called in an atomic context, so is not allowed to sleep if a
user page needs to be faulted in and has nowhere it can be deferred to.
The pagefault_disabled() function is documented as preventing user
access methods from sleeping.
In practice the page will be mapped in nearly always because we are
reading the instruction that just triggered the watchpoint trap.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-3-bgray@linux.ibm.com
|
|
thread_change_pc() uses CPU local data, so must be protected from
swapping CPUs while it is reading the breakpoint struct.
The error is more noticeable after 1e60f3564bad ("powerpc/watchpoints:
Track perf single step directly on the breakpoint"), which added an
unconditional __this_cpu_read() call in thread_change_pc(). However the
existing __this_cpu_read() that runs if a breakpoint does need to be
re-inserted has the same issue.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-2-bgray@linux.ibm.com
|
|
Valid domain value is in range 1 to HV_PERF_DOMAIN_MAX. Current code has
check for domain value greater than or equal to HV_PERF_DOMAIN_MAX. But
the check for domain value 0 is missing.
Fix this issue by adding check for domain value 0.
Before:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for
event (hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/).
/bin/dmesg | grep -i perf may provide additional information.
Result from dmesg:
[ 37.819387] hv-24x7: hcall failed: [0 0x60040000 0x100 0] => ret
0xfffffffffffffffc (-4) detail=0x2000000 failing ix=0
After:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Warning:
hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ event is not supported by the kernel.
failed to read counter hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/
Fixes: ebd4a5a3ebd9 ("powerpc/perf/hv-24x7: Minor improvements")
Reported-by: Krishan Gopal Sarawast <krishang@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230825055601.360083-1-kjain@linux.ibm.com
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Ingo Molnar:
"Fix a cold functions related false-positive objtool warning that
triggers on Clang"
* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix _THIS_IP_ detection for cold functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WARN fix from Ingo Molnar:
"Fix a missing preempt-enable in the WARN() slowpath"
* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
panic: Reenable preemption in WARN slowpath
|
|
The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function").
Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place. So commit 932aba1e1690 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.
But the helpers remained around, unused.
Get rid of them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull smb client fixes from Steve French:
"Three small SMB3 client fixes, one to improve a null check and two
minor cleanups"
* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix some minor typos and repeated words
smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
smb3: move server check earlier when setting channel sequence number
|
|
Pull smb server fixes from Steve French:
"Two ksmbd server fixes"
* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
ksmbd: fix passing freed memory 'aux_payload_buf'
ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Regression and bug fixes for ext4"
* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix rec_len verify error
ext4: do not let fstrim block system suspend
ext4: move setting of trimmed bit into ext4_try_to_trim_range()
jbd2: Fix memory leak in journal_init_common()
jbd2: Remove page size assumptions
buffer: Make bh_offset() work for compound pages
|
|
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:
$ readelf -S purgatory.ro | grep " .text"
[ 1] .text PROGBITS 0000000000000000 00000040
[ 7] .text.purgatory PROGBITS 0000000000000000 000020e0
[ 9] .text.warn PROGBITS 0000000000000000 000021c0
[13] .text.sha256_upda PROGBITS 0000000000000000 000022f0
[15] .text.sha224_upda PROGBITS 0000000000000000 00002be0
[17] .text.sha256_fina PROGBITS 0000000000000000 00002bf0
[19] .text.sha224_fina PROGBITS 0000000000000000 00002cc0
This causes WARNING from kexec_purgatory_setup_sechdrs():
WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
kexec_load_purgatory+0x37f/0x390
Fix this by disabling LTO for purgatory.
[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.
Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
|
|
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.
The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.
In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.
Update wost-case calculations to include 5-level paging.
To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.
[ Also add a warning, should this memory run low - by Dave Hansen. ]
Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix kernel-devel RPM and linux-headers Deb package
- Fix too long argument list error in 'make modules_install'
* tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid long argument lists in make modules_install
kbuild: fix kernel-devel RPM package and linux-headers Deb package
|
|
Commit 408579cd627a ("mm: Update do_vmi_align_munmap() return
semantics") seems to have updated one of the callers of do_vmi_munmap()
incorrectly: it used to check for the error case (which didn't
change: negative means error).
That commit changed the check to the success case (which did change:
before that commit, 0 was success, and 1 was "success and lock
downgraded". After the change, it's always 0 for success, and the lock
will have been released if requested).
This didn't change any actual VM behavior _except_ for memory accounting
when 'VM_ACCOUNT' was set on the vma. Which made the wrong return value
test fairly subtle, since everything continues to work.
Or rather - it continues to work but the "Committed memory" accounting
goes all wonky (Committed_AS value in /proc/meminfo), and depending on
settings that then causes problems much much later as the VM relies on
bogus statistics for its heuristics.
Revert that one line of the change back to the original logic.
Fixes: 408579cd627a ("mm: Update do_vmi_align_munmap() return semantics")
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"16 small(ish) fixes all in drivers.
The major fixes are in pm8001 (fixes MSI-X issue going back to its
origin), the qla2xxx endianness fix, which fixes a bug on big endian
and the lpfc ones which can cause an oops on module removal without
them"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
scsi: target: core: Fix target_cmd_counter leak
scsi: pm8001: Setup IRQs on resume
scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
scsi: qedf: Add synchronization between I/O completions and abort
scsi: target: Replace strlcpy() with strscpy()
scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
scsi: qla2xxx: Correct endianness for rqstlen and rsplen
scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
scsi: megaraid_sas: Fix deadlock on firmware crashdump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
- Fix link power management transitions to disallow unsupported states
(Niklas)
- A small string handling fix for the sata_mv driver (Christophe)
- Clear port pending interrupts before reset, as per AHCI
specifications (Szuying).
Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
ata_eh_reset() to allow EH to continue on with other actions recorded
with error interrupts triggered before EH completes. And an
additional fix to avoid thawing a port twice in EH (Niklas)
- Small code style fixes in the pata_parport driver to silence the
build bot as it keeps complaining about bad indentation (me)
- A fix for the recent CDL code to avoid fetching sense data for
successful commands when not necessary for correct operation (Niklas)
* tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: fetch sense data for successful commands iff CDL enabled
ata: libata-eh: do not thaw the port twice in ata_eh_reset()
ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
ata: pata_parport: Fix code style issues
ata: libahci: clear pending interrupt status
ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
ata: libata: disallow dev-initiated LPM transitions to unsupported states
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fix from Greg KH:
"Here is a single USB fix for a much-reported regression for 6.6-rc1.
It resolves a crash in the typec debugfs code for many systems. It's
been in linux-next with no reported issues, and many people have
reported it resolving their problem with 6.6-rc1"
* tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix NULL pointer dereference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here is a single driver core fix for a much-reported-by-sysbot issue
that showed up in 6.6-rc1. It's been submitted by many people, all in
the same way, so it obviously fixes things for them all.
Also in here is a single documentation update adding riscv to the
embargoed hardware document in case there are any future issues with
that processor family.
Both of these have been in linux-next with no reported problems"
* tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
driver core: return an error when dev_set_name() hasn't happened
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
comedi subsystem that has ended up being incorrect and caused drivers
that were working for people to be unable to be able to be selected to
build at all.
To fix this, the Kconfig change needs to be reverted and a future set
of fixes for the ioport dependancies will show up in 6.7-rc1 (there's
no rush for them.)
This has been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "comedi: add HAS_IOPORT dependencies"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"The main thing is the removal of 'probe_new' because all i2c client
drivers are converted now. Thanks Uwe, this marks the end of a long
conversion process.
Other than that, we have a few Kconfig updates and driver bugfixes"
* tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Fix the kernel-doc warnings
i2c: aspeed: Reset the i2c controller when timeout occurs
i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
i2c: Make I2C_ATR invisible
i2c: Drop legacy callback .probe_new()
w1: ds2482: Switch back to use struct i2c_driver's .probe()
|
|
Currently, we fetch sense data for a _successful_ command if either:
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.
This means that case 2) will trigger for a non-NCQ command which has
ATA_SENSE bit set, regardless if CDL is enabled or not.
This decision was by design. If the device reports that it has sense data
available, it makes sense to fetch that sense data, since the sk/asc/ascq
could be important information regardless if CDL is enabled or not.
However, the fetching of sense data for a successful command is done via
ATA EH. Considering how intricate the ATA EH is, we really do not want to
invoke ATA EH unless absolutely needed.
Before commit 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL
commands using policy 0xD") we never fetched sense data for successful
commands.
In order to not invoke the ATA EH unless absolutely necessary, even if the
device claims support for sense data reporting, only fetch sense data for
successful (NCQ and non-NCQ commands) commands that are using CDL.
[Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
instead of the device support for CDL, which is implied for commands
using CDL.
Fixes: 3ac873c76d79 ("ata: libata-core: fix when to fetch sense data for successful commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.
Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.
Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().
This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).
If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.
Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.
The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.
The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").
Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2
We can see the following happening:
[ 1.074659] ata3: XXX port freeze
[ 1.074700] ata3: XXX hardresetting link, stopping engine
[ 1.074746] ata3: XXX flipping SControl
[ 1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[ 1.411475] ata3: XXX port freeze
[ 1.420049] ata3: XXX starting engine
[ 1.420096] ata3: XXX rc=0, class=1
[ 1.420142] ata3: XXX clearing IRQs for thawing
[ 1.420188] ata3: XXX port thawed
[ 1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."
Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
So IS.IPS(x) only gets set if PxIS and PxIE is set.
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."
So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.
Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.
However, since commit 737dd811a3db ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
clear eh_info on reset completion") fixed can no longer happen.
Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more kselftest fixes from Shuah Khan
"Fixes to user_events test and ftrace test.
The user_events test was enabled by default in Linux 6.6-rc1. The
following fixes are for bugs found since then:
- add checks for dependencies and skip the test if they aren't met.
The user_events test requires root access, and tracefs and
user_events enabled. It leaves tracefs mounted and a fix is in
progress for that missing piece.
- create user_events test-specific Kconfig fragments
ftrace test fixes:
- unmount tracefs for recovering environment. Fix identified during
the above mentioned user_events dependencies fix.
- adds softlink to latest log directory improving usage"
* tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tracing: Fix to unmount tracefs for recovering environment
selftests: user_events: create test-specific Kconfig fragments
ftrace/selftests: Add softlink to latest log directory
selftests/user_events: Fix failures when user_events is not installed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Use correct order when encoding NFSv4 RENAME change_info
- Fix a potential oops during NFSD shutdown
* tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: fix possible oops when nfsd/pool_stats is closed.
nfsd: fix change_info in NFSv4 RENAME replies
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix the handling of block devices in the test_resume mode of
hibernation (Chen Yu)"
* tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Fix the exclusive get block device in test_resume mode
PM: hibernate: Rename function parameter from snapshot_test to exclusive
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix a thermal core breakage introduced by one of the recent
changes, amend those changes by adding 'const' to a new callback
argument and fix two memory leaks.
Specifics:
- Unbreak disabled trip point check in handle_thermal_trip() that may
cause it to skip enabled trip points (Rafael Wysocki)
- Add missing of_node_put() to of_find_trip_id() and
thermal_of_for_each_cooling_maps() that each break out of a
for_each_child_of_node() loop without dropping the reference to the
child object (Julia Lawall)
- Constify the recently added trip argument of the .get_trend()
thermal zone callback (Rafael Wysocki)"
* tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Fix disabled trip point check in handle_thermal_trip()
thermal: Constify the trip argument of the .get_trend() zone callback
thermal/of: add missing of_node_put()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM core retrieve_deps() UAF race due to missing locking of a DM
table's list of devices that is managed using dm_{get,put}_device.
- Revert DM core's half-baked RCU optimization if IO submitter has set
REQ_NOWAIT. Can be revisited, and properly justified, after
comprehensively auditing all of DM to also pass GFP_NOWAIT for any
allocations if REQ_NOWAIT used.
* tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't attempt to queue IO under RCU protection
dm: fix a race condition in retrieve_deps
|
|
Pull block fixes from Jens Axboe:
- NVMe pull via Keith:
- nvme-tcp iov len fix (Varun)
- nvme-hwmon const qualifier for safety (Krzysztof)
- nvme-fc null pointer checks (Nigel)
- nvme-pci no numa node fix (Pratyush)
- nvme timeout fix for non-compliant controllers (Keith)
- MD pull via Song fixing regressions with both 6.5 and 6.6
- Fix a use-after-free regression in resizing blk-mq tags (Chengming)
* tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux:
nvme: avoid bogus CRTO values
md: Put the right device in md_seq_next
nvme-pci: do not set the NUMA node of device if it has none
blk-mq: fix tags UAF when shrinking q->nr_hw_queues
md/raid1: fix error: ISO C90 forbids mixed declarations
md: fix warning for holder mismatch from export_rdev()
md: don't dereference mddev after export_rdev()
nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
nvme: host: hwmon: constify pointers to hwmon_channel_info
nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
|
|
Pull io_uring fix from Jens Axboe:
"Just a single fix, fixing a regression with poll first, recvmsg, and
using a provided buffer"
* tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux:
io_uring/net: fix iter retargeting for selected buf
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A change applied to v6.5 kernel brings an issue that usual GFP
allocation is done in atomic context under acquired spin-lock. Let us
revert it"
* tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
Revert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"
|
|
Pull drm fixes from Dave Airlie:
"Regular rc2 fixes pull, mostly made up of amdgpu stuff, one i915, and
a bunch of others, one vkms locking violation is reverted.
connector:
- doc fix
exec:
- workaround lockdep issue
tests:
- fix a UAF
vkms:
- revert hrtimer fix
fbdev:
- g364fb: fix build failure with mips
i915:
- Only check eDP HPD when AUX CH is shared.
amdgpu:
- GC 9.4.3 fixes
- Fix white screen issues with S/G display on system with >= 64G of ram
- Replay fixes
- SMU 13.0.6 fixes
- AUX backlight fix
- NBIO 4.3 SR-IOV fixes for HDP
- RAS fixes
- DP MST resume fix
- Fix segfault on systems with no vbios
- DPIA fixes
amdkfd:
- CWSR grace period fix
- Unaligned doorbell fix
- CRIU fix for GFX11
- Add missing TLB flush on gfx10 and newer
radeon:
- make fence wait in suballocator uninterrruptable
gm12u320:
- Fix the timeout usage for usb_bulk_msg()"
* tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/tests: helpers: Avoid a driver uaf
Revert "drm/vkms: Fix race-condition between the hrtimer and the atomic commit"
drm/amdkfd: Insert missing TLB flush on GFX10 and later
drm/i915: Only check eDP HPD when AUX CH is shared
drm/amd/display: Fix 2nd DPIA encoder Assignment
drm/amd/display: Add DPIA Link Encoder Assignment Fix
drm/amd/display: fix replay_mode kernel-doc warning
drm/amdgpu: Handle null atom context in VBIOS info ioctl
drm/amdkfd: Checkpoint and restore queues on GFX11
drm/amd/display: Adjust the MST resume flow
drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
drm/amd/display: Don't check registers, if using AUX BL control
drm/amdgpu: fix retry loop test
drm/amd/display: Add dirty rect support for Replay
Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
drm/amd/display: fix the white screen issue when >= 64GB DRAM
drm/amdkfd: Update CU masking for GFX 9.4.3
drm/amdkfd: Update cache info reporting for GFX v9.4.3
...
|