Age | Commit message (Collapse) | Author |
|
At boot the FSCR is initialised via one of two paths. On most systems
it's set to a hard coded value in __init_FSCR().
On newer skiboot systems we use the device tree CPU features binding,
where firmware can tell Linux what bits to set in FSCR (and HFSCR).
In both cases the value that's configured at boot is not propagated
into the init_task.thread.fscr value prior to the initial fork of init
(pid 1), which means the value is not used by any processes other than
swapper (the idle task).
For the __init_FSCR() case this is OK, because the value in
init_task.thread.fscr is initialised to something sensible. However it
does mean that the value set in __init_FSCR() is not used other than
for swapper, which is odd and confusing.
The bigger problem is for the device tree CPU features case it
prevents firmware from setting (or clearing) FSCR bits for use by user
space. This means all existing kernels can not have features
enabled/disabled by firmware if those features require
setting/clearing FSCR bits.
We can handle both cases by saving the FSCR value into
init_task.thread.fscr after we have initialised it at boot. This fixes
the bug for device tree CPU features, and will allow us to simplify
the initialisation for the __init_FSCR() case in a future patch.
Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200527145843.2761782-3-mpe@ellerman.id.au
|
|
The device tree CPU features binding includes FSCR bit numbers which
Linux is instructed to set by firmware.
Whether that's a good idea or not, in the case of the DSCR the Linux
implementation has a hard requirement that the FSCR_DSCR bit not be
set by default. We use it to track when a process reads/writes to
DSCR, so it must be clear to begin with.
So if firmware tells us to set FSCR_DSCR we must ignore it.
Currently this does not cause a bug in our DSCR handling because the
value of FSCR that the device tree CPU features code establishes is
only used by swapper. All other tasks use the value hard coded in
init_task.thread.fscr.
However we'd like to fix that in a future commit, at which point this
will become necessary.
Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200527145843.2761782-2-mpe@ellerman.id.au
|
|
__init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc:
Add support for context switching the TAR register") (Feb 2013), and
only set FSCR_TAR.
At that point FSCR (Facility Status and Control Register) was not
context switched, so the setting was permanent after boot.
Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit
54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again
that was permanent after boot.
Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on
POWER8") (Aug 2013) added a limited context switch of FSCR, just the
FSCR_DSCR bit was context switched based on thread.dscr_inherit. That
commit said "This clears the H/FSCR DSCR bit initially", but it
didn't, it left the initialisation of FSCR_DSCR in __init_FSCR().
However the initial context switch from init_task to pid 1 would clear
FSCR_DSCR because thread.dscr_inherit was 0.
That commit also introduced the requirement that FSCR_DSCR be clear
for user processes, so that we can take the facility unavailable
interrupt in order to manage dscr_inherit.
Then in commit 152d523e6307 ("powerpc: Create context switch helpers
save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to
thread_struct. However it still wasn't fully context switched, we just
took the existing value and set FSCR_DSCR if the new thread had
dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR |
FSCR_TAR, but that value was not propagated into the thread_struct, so
the initial context switch set FSCR_DSCR back to 0.
Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context
switching") (Jun 2016) added a full context switch of the FSCR, and
added an initialisation of init_task.thread.fscr to FSCR_TAR |
FSCR_EBB, but omitted FSCR_DSCR.
The end result is that swapper runs with FSCR_DSCR set because of the
initialisation in __init_FSCR(), but no other processes do, they use
the value from init_task.thread.fscr.
Having FSCR_DSCR set for swapper allows it to access SPR 3 from
userspace, but swapper never runs userspace, so it has no useful
effect. It's also confusing to have the value initialised in two
places to two different values.
So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the
point where there's a single value of FSCR, even if it's still set in
two places.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Alistair Popple <alistair@popple.id.au>
Link: https://lore.kernel.org/r/20200527145843.2761782-1-mpe@ellerman.id.au
|
|
'thread' doesn't exist in kuap_check() macro.
Use 'current' instead.
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b459e1600b969047a74e34251a84a3d6fdf1f312.1590858925.git.christophe.leroy@csgroup.eu
|
|
Since commit c55d7b5e64265f ("powerpc: Remove STRICT_KERNEL_RWX
incompatibility with RELOCATABLE"), powerpc kernels with
-mprofile-kernel can crash in certain scenarios with a trace like below:
BUG: Unable to handle kernel instruction fetch (NULL pointer?)
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
<snip>
NIP [0000000000000000] 0x0
LR [c0080000102c0048] ext4_iomap_end+0x8/0x30 [ext4]
Call Trace:
iomap_apply+0x20c/0x920 (unreliable)
iomap_bmap+0xfc/0x160
ext4_bmap+0xa4/0x180 [ext4]
bmap+0x4c/0x80
jbd2_journal_init_inode+0x44/0x1a0 [jbd2]
ext4_load_journal+0x440/0x860 [ext4]
ext4_fill_super+0x342c/0x3ab0 [ext4]
mount_bdev+0x25c/0x290
ext4_mount+0x28/0x50 [ext4]
legacy_get_tree+0x4c/0xb0
vfs_get_tree+0x4c/0x130
do_mount+0xa18/0xc50
sys_mount+0x158/0x180
system_call+0x5c/0x68
The NIP points to NULL, or a random location (data even), while the LR
always points to the LEP of a function (with an offset of 8), indicating
that something went wrong with ftrace. However, ftrace is not
necessarily active when such crashes occur.
The kernel OOPS sometimes follows a warning from ftrace indicating that
some module functions could not be patched with a nop. Other times, if a
module is loaded early during boot, instruction patching can fail due to
a separate bug, but the error is not reported due to missing error
reporting.
In all the above cases when instruction patching fails, ftrace will be
disabled but certain kernel module functions will be left with default
calls to _mcount(). This is not a problem with ELFv1. However, with
-mprofile-kernel, the default stub is problematic since it depends on a
valid module TOC in r2. If the kernel (or a different module) calls into
a function that does not use the TOC, the function won't have a prologue
to setup the module TOC. When that function calls into _mcount(), we
will end up in the relocation stub that will use the previous TOC, and
end up trying to jump into a random location. From the above trace:
iomap_apply+0x20c/0x920 [kernel TOC]
|
V
ext4_iomap_end+0x8/0x30 [no GEP == kernel TOC]
|
V
_mcount() stub
[uses kernel TOC -> random entry]
To address this, let's change over to using the special stub that is
used for ftrace_[regs_]caller() for _mcount(). This ensures that we are
not dependent on a valid module TOC in r2 for default _mcount()
handling.
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8affd4298d22099bbd82544fab8185700a6222b1.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
|
|
For -mprofile-kernel, we need special handling when generating stubs for
ftrace calls such as _mcount(). To faciliate this, we check if a
R_PPC64_REL24 relocation is for a symbol named "_mcount()" along with
also checking the instruction sequence. The latter is not really
required since "_mcount()" is an exported symbol and kernel modules
cannot use it. As such, drop the additional checking and simplify the
code. This helps unify stub creation for ftrace stubs with
-mprofile-kernel and aids in code reuse.
Also rename is_mprofile_mcount_callsite() to is_mprofile_ftrace_call()
to reflect the checking being done.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d9c316adfa1fb787ad268bb4691e7e4059ff2d5.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
|
|
module_trampoline_target() is only used by ftrace. Move the prototype
within the appropriate #ifdef in the header. Also, move the function
body to the end of module_64.c so as to consolidate all ftrace code in
one place.
No functional changes.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2527351f65c53c5866068ae130dc34c5d4ee8ad9.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
|
|
Mapping of early shadow area is implemented by using a single static
page table having all entries pointing to the same early shadow page.
The shadow area must therefore occupy full PGD entries.
The shadow area has a size of 128MB starting at 0xf8000000.
With 4k pages, a PGD entry is 4MB
With 16k pages, a PGD entry is 64MB
With 64k pages, a PGD entry is 1GB which is too big.
Until we rework the early shadow mapping, disable KASAN when the page
size is too big.
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7195fcde7314ccbf7a081b356084a69d421b10d4.1590660977.git.christophe.leroy@csgroup.eu
|
|
On book3s/32, KUEP is an heavy process as it requires to
set/unset the NX bit in each of the 12 user segments
everytime the kernel is entered/exited from/to user space.
Don't select KUEP by default on book3s/32.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1492bb150c1aaa53d99a604b49992e60ea20cd5f.1586962582.git.christophe.leroy@c-s.fr
|
|
On book3s/32, KUAP is an heavy process as it requires to
determine which segments are impacted and unlock/lock
each of them.
And since the implementation of user_access_begin/end, it
is even worth for the time being because unlike __get_user(),
user_access_begin doesn't make difference between read and write
and unlocks access also for read allthought that's unneeded
on book3s/32.
As shown by the size of a kernel built with KUAP and one without,
the overhead is 64k bytes of code. As a comparison a similar
build on an 8xx has an overhead of only 8k bytes of code.
text data bss dec hex filename
7230416 1425868 837376 9493660 90dc9c vmlinux.kuap6xx
7165012 1425548 837376 9427936 8fdbe0 vmlinux.nokuap6xx
6519796 1960028 477464 8957288 88ad68 vmlinux.kuap8xx
6511664 1959864 477464 8948992 888d00 vmlinux.nokuap8xx
Until a more optimised KUAP is implemented on book3s/32,
don't select it by default.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/154a99399317b096ac1f04827b9f8d7a9179ddc1.1586962586.git.christophe.leroy@c-s.fr
|
|
To enable/disable kernel access to user space, the 8xx has to
modify the properties of access group 1. This is done by writing
predefined values into SPRN_Mx_AP registers.
As of today, a __put_user() gives:
00000d64 <my_test>:
d64: 3d 20 4f ff lis r9,20479
d68: 61 29 ff ff ori r9,r9,65535
d6c: 7d 3a c3 a6 mtspr 794,r9
d70: 39 20 00 00 li r9,0
d74: 90 83 00 00 stw r4,0(r3)
d78: 3d 20 6f ff lis r9,28671
d7c: 61 29 ff ff ori r9,r9,65535
d80: 7d 3a c3 a6 mtspr 794,r9
d84: 4e 80 00 20 blr
Because only groups 0 and 1 are used, the definition of
groups 2 to 15 doesn't matter.
By setting unused bits to 0 instead on 1, one instruction is
removed for each lock and unlock action:
00000d5c <my_test>:
d5c: 3d 20 40 00 lis r9,16384
d60: 7d 3a c3 a6 mtspr 794,r9
d64: 39 20 00 00 li r9,0
d68: 90 83 00 00 stw r4,0(r3)
d6c: 3d 20 60 00 lis r9,24576
d70: 7d 3a c3 a6 mtspr 794,r9
d74: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/57425c33dd72f292b1a23570244b81419072a7aa.1586945153.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode.
The very last part of exception exits cannot support a trap.
Blacklist them from kprobe.
While we are at it, remove exc_exit_start symbol which is not
used to avoid having to blacklist it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/098b0fd3f6299aa1bd692bd576bd7012c84608de.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode.
The very last part of syscall cannot support a trap.
Add a symbol syscall_exit_finish to identify that part and
blacklist it from kprobe.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/23eddf49abb03d1359fa0be4206998eb3800f42c.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode.
As exception entry points are running with MMU disabled,
blacklist them.
The handling of TLF_NAPPING and TLF_SLEEPING is moved before the
CONFIG_TRACE_IRQFLAGS which contains 'reenable_mmu' because from there
kprobe will be possible as the kernel will run with MMU enabled.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f61ac599855e674ebb592464d0ea32a3ba9c6644.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bf57066d05518644dee0840af69d36ab5086729.1585670437.git.christophe.leroy@c-s.fr
|
|
machine_check_in_rtas() is just a trap.
Do the trap directly in the machine check exception handler.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/78899f40f89cb3c4f69bdff7f04eb6ec7cb753d5.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dabed523c1b8955dd425152ce260b390053e727a.1585670437.git.christophe.leroy@c-s.fr
|
|
In hash_low.S, a lot of named local symbols are used instead of
numbers to ease code readability. However, they don't need to be
visible.
In order to ease blacklisting of functions running with MMU
disabled for kprobe, rename the symbols to .Lsymbols in order
to hide them as if they were numbered labels.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/90c430d9e0f7af772a58aaeaf17bcc6321265340.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaab3bff961c3bfe149f1d0bd3593291ef939dcc.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6316e8883753499073f47301857e4e88b73c3ddd.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3ac4ab8dd7008b9706d9228a60645a1756fa84bf.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5dca36682383577a3c2b2bca4d577e8654944461.1585670437.git.christophe.leroy@c-s.fr
|
|
kprobe does not handle events happening in real mode, all
functions running with MMU disabled have to be blacklisted.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1ae02b6637b87fc5aaa1d5012c3e2cb30e62b4a3.1585670437.git.christophe.leroy@c-s.fr
|
|
In order to avoid Oopses, use probe_address() to read the
instruction at the address where the trap happened.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7f24b5961a6839ff01df792816807f74ff236bf6.1582567319.git.christophe.leroy@c-s.fr
|
|
This gives us OF_PMEM which is useful in mambo.
This adds 153K to the text of ppc64le_defconfig which 0.8% of the
total text.
LIBNVDIMM text data bss dec hex
Without 18574833 5518150 1539240 25632223 1871ddf
With 18727834 5546206 1539368 25813408 189e1a0
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200519043009.3081885-1-mikey@neuling.org
|
|
Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off",ibm,get-xive" and "ibm,set-xive".
On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:
2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.
So, these rtas-calls can be called in a lockless way, if using
a different buffer for each cpu doing such rtas call.
For this, it was suggested to add the buffer (struct rtas_args)
in the PACA struct, so each cpu can have it's own buffer.
The PACA struct received a pointer to rtas buffer, which is
allocated in the memory range available to rtas 32-bit.
Reentrant rtas calls are useful to avoid deadlocks in crashing,
where rtas-calls are needed, but some other thread crashed holding
the rtas.lock.
This is a backtrace of a deadlock from a kdump testing environment:
#0 arch_spin_lock
#1 lock_rtas ()
#2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
#3 ics_rtas_mask_real_irq (hw_irq=4100)
#4 machine_kexec_mask_interrupts
#5 default_machine_crash_shutdown
#6 machine_crash_shutdown
#7 __crash_kexec
#8 crash_kexec
#9 oops_end
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
[mpe: Move under #ifdef PSERIES to avoid build breakage]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com
|
|
In order to get any rtas* struct into other headers, including rtas.h
may cause a lot of errors, regarding include dependency needed for
inline functions.
Create rtas-types.h and move there all type/struct definitions
from rtas.h, then include rtas-types.h into rtas.h.
Also, as suggested by checkpath.pl, replace uint8_t for u8.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200518234245.200672-2-leobras.c@gmail.com
|
|
Currently, if printk lock (logbuf_lock) is held by other thread during
crash, there is a chance of deadlocking the crash on next printk, and
blocking a possibly desired kdump.
At the start of default_machine_crash_shutdown, make printk enter
NMI context, as it will use per-cpu buffers to store the message,
and avoid locking logbuf_lock.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200512214533.93878-1-leobras.c@gmail.com
|
|
While providing guests, it's desirable to resize it's memory on demand.
By now, it's possible to do so by creating a guest with a small base
memory, hot-plugging all the rest, and using 'movable_node' kernel
command-line parameter, which puts all hot-plugged memory in
ZONE_MOVABLE, allowing it to be removed whenever needed.
But there is an issue regarding guest reboot:
If memory is hot-plugged, and then the guest is rebooted, all hot-plugged
memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal.
It usually prevents this memory to be hot-removed from the guest.
It's possible to use device-tree information to fix that behavior, as
it stores flags for LMB ranges on ibm,dynamic-memory-vN.
It involves marking each memblock with the correct flags as hotpluggable
memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if
'movable_node' is passed.
For carrying such information, the new flag DRCONF_MEM_HOTREMOVABLE was
proposed and accepted into Power Architecture documentation.
This flag should be:
- true (b=1) if the hypervisor may want to hot-remove it later, and
- false (b=0) if it does not care.
During boot, guest kernel reads the device-tree, early_init_drmem_lmb()
is called for every added LMBs. Here, checking for this new flag and
marking memblocks as hotplugable memory is enough to get the desirable
behavior.
This should cause no change if 'movable_node' parameter is not passed
in kernel command-line.
Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200402195156.626430-1-leonardo@linux.ibm.com
|
|
Show the address of the tasks regs in the process listing in xmon. The
regs should always be on the stack page that we also print the address
of, but it's still helpful not to have to find them by hand.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520111740.953679-1-mpe@ellerman.id.au
|
|
This adds the CPU or thread number to printk messages. This helps a
lot when deciphering concurrent oopses that have been interleaved.
Example output, of PID1 (T1) triggering a warning:
[ 1.581678][ T1] WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:539 pkcs1pad_verify+0x38/0x140
[ 1.581681][ T1] Modules linked in:
[ 1.581693][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty #1515
[ 1.581700][ T1] NIP: c000000000207d64 LR: c000000000207d3c CTR: c000000000207d2c
[ 1.581708][ T1] REGS: c0000000fd2e7560 TRAP: 0700 Not tainted (5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty)
[ 1.581712][ T1] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44000222 XER: 00040000
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520121257.961112-1-mpe@ellerman.id.au
|
|
Currently when we boot on a big core system, we get this print:
[ 0.040500] Using small cores at SMT level
This is misleading as we've actually detected big cores.
This patch clears up the print to say we've detect big cores but are
using small cores for scheduling.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200528230731.1235752-1-mikey@neuling.org
|
|
If the memory chunk found for reserving memory overshoots the memory
limit imposed, do not proceed with reserving memory. Default behavior
was this until commit 140777a3d8df ("powerpc/fadump: consider reserved
ranges while reserving memory") changed it unwittingly.
Fixes: 140777a3d8df ("powerpc/fadump: consider reserved ranges while reserving memory")
Cc: stable@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/159057266320.22331.6571453892066907320.stgit@hbathini.in.ibm.com
|
|
'mem=" option is an easy way to put high pressure on memory during
some test. Hence after applying the memory limit, instead of total
mem, the actual usable memory should be considered when reserving mem
for crashkernel. Otherwise the boot up may experience OOM issue.
E.g. it would reserve 4G prior to the change and 512M afterward, if
passing
crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G",
and mem=5G on a 256G machine.
This issue is powerpc specific because it puts higher priority on
fadump and kdump reservation than on "mem=". Referring the following
code:
if (fadump_reserve_mem() == 0)
reserve_crashkernel();
...
/* Ensure that total memory size is page-aligned. */
limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
memblock_enforce_memory_limit(limit);
While on other arches, the effect of "mem=" takes a higher priority
and pass through memblock_phys_mem_size() before calling
reserve_crashkernel().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1585749644-4148-1-git-send-email-kernelfans@gmail.com
|
|
kbuild test robot reported some build warnings in the hw_breakpoint
code when compiled with clang[1]. Some of them were introduced by the
recent powerpc change to add arch_reserve_bp_slot() and
arch_release_bp_slot(). Fix them all.
kernel/events/hw_breakpoint.c:71:12: warning: no previous prototype for function 'hw_breakpoint_weight'
kernel/events/hw_breakpoint.c:216:12: warning: no previous prototype for function 'arch_reserve_bp_slot'
kernel/events/hw_breakpoint.c:221:13: warning: no previous prototype for function 'arch_release_bp_slot'
kernel/events/hw_breakpoint.c:228:13: warning: no previous prototype for function 'arch_unregister_hw_breakpoint'
[1]: https://lore.kernel.org/linuxppc-dev/202005192233.oi9CjRtA%25lkp@intel.com/
Fixes: 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
[mpe: Drop extern, flesh out change log, add Fixes tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200602041208.128913-1-ravi.bangoria@linux.ibm.com
|
|
Some SMP platforms don't have CONFIG_IRQ_WORK defined, resulting in a link
error at build time.
Define a stub and clean up the prototype definitions.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"Assorted patches from Miklos.
An interesting part here is /proc/mounts stuff..."
The "/proc/mounts stuff" is using a cursor for keeeping the location
data while traversing the mount listing.
Also probably worth noting is the addition of faccessat2(), which takes
an additional set of flags to specify how the lookup is done
(AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH).
* 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add faccessat2 syscall
vfs: don't parse "silent" option
vfs: don't parse "posixacl" option
vfs: don't parse forbidden flags
statx: add mount_root
statx: add mount ID
statx: don't clear STATX_ATIME on SB_RDONLY
uapi: deprecate STATX_ALL
utimensat: AT_EMPTY_PATH support
vfs: split out access_override_creds()
proc/mounts: add cursor
aio: fix async fsync creds
vfs: allow unprivileged whiteout creation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/coredump updates from Al Viro:
"set_fs() removal in coredump-related area - mostly Christoph's
stuff..."
* 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
binfmt_elf: remove the set_fs in fill_siginfo_note
signal: refactor copy_siginfo_to_user32
powerpc/spufs: simplify spufs core dumping
powerpc/spufs: stop using access_ok
powerpc/spufs: fix copy_to_user while atomic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/__copy_to_user updates from Al Viro:
"Getting rid of __copy_to_user() callers - stuff that doesn't fit into
other series"
* 'uaccess.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dlmfs: convert dlmfs_file_read() to copy_to_user()
esas2r: don't bother with __copy_to_user()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/__copy_from_user updates from Al Viro:
"Getting rid of __copy_from_user() callers - patches that don't fit
into other series"
* 'uaccess.__copy_from_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
pstore: switch to copy_from_user()
firewire: switch ioctl_queue_iso to use of copy_from_user()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/__put-user updates from Al Viro:
"Removal of __put_user() calls - misc patches that don't fit into any
other series"
* 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
pcm_native: result of put_user() needs to be checked
scsi_ioctl.c: switch SCSI_IOCTL_GET_IDLUN to copy_to_user()
compat sysinfo(2): don't bother with field-by-field copyout
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/readdir updates from Al Viro:
"Finishing the conversion of readdir.c to unsafe_... API.
This includes the uaccess_{read,write}_begin series by Christophe
Leroy"
* 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
readdir.c: get rid of the last __put_user(), drop now-useless access_ok()
readdir.c: get compat_filldir() more or less in sync with filldir()
switch readdir(2) to unsafe_copy_dirent_name()
drm/i915/gem: Replace user_access_begin by user_write_access_begin
uaccess: Selectively open read or write user access
uaccess: Add user_read_access_begin/end and user_write_access_begin/end
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/access_ok updates from Al Viro:
"Removals of trivially pointless access_ok() calls.
Note: the fiemap stuff was removed from the series, since they are
duplicates with part of ext4 series carried in Ted's tree"
* 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vmci_host: get rid of pointless access_ok()
hfi1: get rid of pointless access_ok()
usb: get rid of pointless access_ok() calls
lpfc_debugfs: get rid of pointless access_ok()
efi_test: get rid of pointless access_ok()
drm_read(): get rid of pointless access_ok()
via-pmu: don't bother with access_ok()
drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok()
omapfb: get rid of pointless access_ok() calls
amifb: get rid of pointless access_ok() calls
drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok()
drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok()
cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok()
nvram: drop useless access_ok()
n_hdlc_tty_read(): remove pointless access_ok()
tomoyo_write_control(): get rid of pointless access_ok()
btrfs_ioctl_send(): don't bother with access_ok()
fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade...
dlmfs_file_write(): get rid of pointless access_ok()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/csum updates from Al Viro:
"Regularize the sitation with uaccess checksum primitives:
- fold csum_partial_... into csum_and_copy_..._user()
- on x86 collapse several access_ok()/stac()/clac() into
user_access_begin()/user_access_end()"
* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
default csum_and_copy_to_user(): don't bother with access_ok()
take the dummy csum_and_copy_from_user() into net/checksum.h
arm: switch to csum_and_copy_from_user()
sh32: convert to csum_and_copy_from_user()
m68k: convert to csum_and_copy_from_user()
xtensa: switch to providing csum_and_copy_from_user()
sparc: switch to providing csum_and_copy_from_user()
parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
get rid of csum_partial_copy_to_user()
|
|
Convert the i.MX MU binding to DT schema format using json-schema
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Convert the i.MX GPCv2 binding to DT schema format using json-schema
Example is updated based on latest DT file and consumer's example is
removed since it is NOT that useful.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Convert the i.MX GPC binding to DT schema format using json-schema
>From latest reference manual, there is actually ONLY 1 interrupt for
GPC, so the additional GPC interrupt is removed.
Consumer's example is also removed since it is NOT that useful.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-06-01
The following pull-request contains BPF updates for your *net-next* tree.
We've added 55 non-merge commits during the last 1 day(s) which contain
a total of 91 files changed, 4986 insertions(+), 463 deletions(-).
The main changes are:
1) Add rx_queue_mapping to bpf_sock from Amritha.
2) Add BPF ring buffer, from Andrii.
3) Attach and run programs through devmap, from David.
4) Allow SO_BINDTODEVICE opt in bpf_setsockopt, from Ferenc.
5) link based flow_dissector, from Jakub.
6) Use tracing helpers for lsm programs, from Jiri.
7) Several sk_msg fixes and extensions, from John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sparse reports a warning at efx_ef10_try_update_nic_stats_vf()
warning: context imbalance in efx_ef10_try_update_nic_stats_vf()
- unexpected unlock
The root cause is the missing annotation at
efx_ef10_try_update_nic_stats_vf()
Add the missing _must_hold(&efx->stats_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extends support to IPv6 for Inline TLS server.
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
v1->v2:
- cc'd tcp folks.
v2->v3:
- changed EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL()
Signed-off-by: David S. Miller <davem@davemloft.net>
|