Age | Commit message (Collapse) | Author |
|
This commit adds kernel-doc style comments with complete parameter
descriptions for the function smp_startup_cpu().
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-2-yang.lee@linux.alibaba.com
|
|
Fix some function names in kernel-doc comments.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-1-yang.lee@linux.alibaba.com
|
|
Fix the kernel-doc annotation for the 'skip' parameter in the
partial_decompress() function by adding a missing underscore and colon.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408083916.123369-1-yang.lee@linux.alibaba.com
|
|
Commit ab1a517d55b0 ("powerpc/syscall: Rename syscall_64.c into
interrupt.c") missed to update these three lines:
GCOV_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
To restore the original behavior, we could replace them with:
GCOV_PROFILE_interrupt.o := n
KCOV_INSTRUMENT_interrupt.o := n
UBSAN_SANITIZE_interrupt.o := n
However, nobody has noticed the functional change in the past three
years, so they were unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240216135517.2002749-1-masahiroy@kernel.org
|
|
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:
test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
eBPF filter opcode 00d7 (@2) unsupported
jited:0 301 PASS
test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 555 PASS
test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
eBPF filter opcode 00d7 (@2) unsupported
jited:0 268 PASS
test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 269 PASS
test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
eBPF filter opcode 00d7 (@2) unsupported
jited:0 460 PASS
test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 320 PASS
test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
eBPF filter opcode 00d7 (@2) unsupported
jited:0 222 PASS
test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 273 PASS
test_bpf: #344 BPF_LDX_MEMSX | BPF_B
eBPF filter opcode 0091 (@5) unsupported
jited:0 432 PASS
test_bpf: #345 BPF_LDX_MEMSX | BPF_H
eBPF filter opcode 0089 (@5) unsupported
jited:0 381 PASS
test_bpf: #346 BPF_LDX_MEMSX | BPF_W
eBPF filter opcode 0081 (@5) unsupported
jited:0 505 PASS
test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
eBPF filter opcode 0006 (@1) unsupported
jited:0 261 PASS
test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]
Fix them by adding missing processing.
Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
|
|
There is code that builds with calls to IO accessors even when
CONFIG_PCI=n, but the actual calls are guarded by runtime checks.
If not those calls would be faulting, because the page at virtual
address zero is (usually) not mapped into the kernel. As Arnd pointed
out, it is possible a large port value could cause the address to be
above mmap_min_addr which would then access userspace, which would be
a bug.
To avoid any such issues, set _IO_BASE to POISON_POINTER_DELTA. That
is a value chosen to point into unmapped space between the kernel and
userspace, so any access will always fault.
Note that on 32-bit POISON_POINTER_DELTA is 0, so the patch only has an
effect on 64-bit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503075619.394467-2-mpe@ellerman.id.au
|
|
With -Wextra clang warns about pointer arithmetic using a null pointer.
When building with CONFIG_PCI=n, that triggers a warning in the IO
accessors, eg:
In file included from linux/arch/powerpc/include/asm/io.h:672:
linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
591 | #define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port);
| ~~~~~~~~~~~~~~~~~~~~~ ^
That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts
it to void * before the addition with port happens.
Instead the addition can be done first, and then the cast. The resulting
value will be the same, but avoids the warning, and also avoids void
pointer arithmetic which is apparently non-standard.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaaNsA@mail.gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503075619.394467-1-mpe@ellerman.id.au
|
|
Currently, bpf jit code on powerpc assumes all the bpf functions and
helpers to be part of core kernel text. This is false for kfunc case,
as function addresses may not be part of core kernel text area. So,
add support for addresses that are not within core kernel text area
too, to enable kfunc support. Emit instructions based on whether the
function address is within core kernel text address or not, to retain
optimized instruction sequence where possible.
In case of PCREL, as a bpf function that is not within core kernel
text area is likely to go out of range with relative addressing on
kernel base, use PC relative addressing. If that goes out of range,
load the full address with PPC_LI64().
With addresses that are not within core kernel text area supported,
override bpf_jit_supports_kfunc_call() to enable kfunc support. Also,
override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers,
as an address offset can be more than 32-bit long on PPC64.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240502173205.142794-2-hbathini@linux.ibm.com
|
|
With PCREL addressing, there is no kernel TOC. So, it is not setup in
prologue when PCREL addressing is used. But the number of instructions
to skip on a tail call was not adjusted accordingly. That resulted in
not so obvious failures while using tailcalls. 'tailcalls' selftest
crashed the system with the below call trace:
bpf_test_run+0xe8/0x3cc (unreliable)
bpf_prog_test_run_skb+0x348/0x778
__sys_bpf+0xb04/0x2b00
sys_bpf+0x28/0x38
system_call_exception+0x168/0x340
system_call_vectored_common+0x15c/0x2ec
Also, as bpf programs are always module addresses and a bpf helper in
general is a core kernel text address, using PC relative addressing
often fails with "out of range of pcrel address" error. Switch to
using kernel base for relative addressing to handle this better.
Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240502173205.142794-1-hbathini@linux.ibm.com
|
|
Now that we track a DEXCR on a per-task basis, individual tasks are free
to configure it as they like.
The interface is a pair of getter/setter prctl's that work on a single
aspect at a time (multiple aspects at once is more difficult if there
are different rules applied for each aspect, now or in future). The
getter shows the current state of the process config, and the setter
allows setting/clearing the aspect.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Account for PR_RISCV_SET_ICACHE_FLUSH_CTX, shrink some longs lines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-5-bgray@linux.ibm.com
|
|
Inheriting the DEXCR across exec can have security and usability
concerns. If a program is compiled with hash instructions it generally
expects to run with NPHIE enabled. But if the parent process disables
NPHIE then if it's not careful it will be disabled for any children too
and the protection offered by hash checks is basically worthless.
This patch introduces a per-process reset value that new execs in a
particular process tree are initialized with. This enables fine grained
control over what DEXCR value child processes run with by default.
For example, containers running legacy binaries that expect hash
instructions to act as NOPs could configure the reset value of the
container root to control the default reset value for all members of
the container.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Add missing SPDX tag on dexcr.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-4-bgray@linux.ibm.com
|
|
Add capability to make the DEXCR act as a per-process SPR.
We do not yet have an interface for changing the values per task. We
also expect the kernel to use a single DEXCR value across all tasks
while in privileged state, so there is no need to synchronize after
changing it (the userspace aspects will synchronize upon returning to
userspace).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-3-bgray@linux.ibm.com
|
|
The last function to reference module_bug_list went in 2008's
commit b9754568ef17 ("powerpc: Remove dead module_find_bug code")
but I don't think that was called since 2006's
commit 73c9ceab40b1 ("[POWERPC] Generic BUG for powerpc")
Now that the list has gone, I think we can also clean up the bug
entries in mod_arch_specific.
Lightly boot tested.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503002317.183500-1-linux@treblig.org
|
|
The `memory_limit` variable should only be used during boot, enforce
that by marking it initdata.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422115231.1769984-1-mpe@ellerman.id.au
|
|
We noticed the following nuisance messages during boot process:
vio vio: uevent: failed to send synthetic uevent
vio 4000: uevent: failed to send synthetic uevent
vio 4001: uevent: failed to send synthetic uevent
vio 4002: uevent: failedto send synthetic uevent
vio 4004: uevent: failed to send synthetic uevent
It's caused by either vio_register_device_node() failing to set
dev->of_node or the node is missing a "compatible" property. To match
the definition of modalias in modalias_show(), remove the return of
ENODEV in such cases. The failure messages is also suppressed with this
change.
Signed-off-by: Lidong Zhong <lidong.zhong@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com
|
|
plpar_hcall(), plpar_hcall9(), and related functions expect callers to
provide valid result buffers of certain minimum size. Currently this
is communicated only through comments in the code and the compiler has
no idea.
For example, if I write a bug like this:
long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE
plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...);
This compiles with no diagnostics emitted, but likely results in stack
corruption at runtime when plpar_hcall9() stores results past the end
of the array. (To be clear this is a contrived example and I have not
found a real instance yet.)
To make this class of error less likely, we can use explicitly-sized
array parameters instead of pointers in the declarations for the hcall
APIs. When compiled with -Warray-bounds[1], the code above now
provokes a diagnostic like this:
error: array argument is too small;
is of size 32, callee requires at least 72 [-Werror,-Warray-bounds]
60 | plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf,
| ^ ~~~~~~
[1] Enabled for LLVM builds but not GCC for now. See commit
0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and
related changes.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com
|
|
Erhard reported that kmemleak was showing a warning at boot:
kmemleak: Not scanning unknown object at 0xc00000007f000000
CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2
Call Trace:
.dump_stack_lvl+0x7c/0xc4 (unreliable)
.kmemleak_no_scan+0xe0/0x100
.iommu_init_early_dart+0x2f0/0x924
.pmac_probe+0x1b0/0x20c
.setup_arch+0x1b8/0x674
.start_kernel+0xdc/0xb74
start_here_common+0x1c/0x44
DART table allocated at: (____ptrval____)
Which he bisected to a change in kmemleak, commit
23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()").
Because pmac_probe() is called before mem_topology_setup(), the min/
max PFN variables are still zero. That causes kmemleak_alloc_phys() to
ignore the allocation, because the checks against the PFN fail. Then
kmemleak_no_scan() can't find the allocation and prints warning.
Given that kmemleak_alloc_phys() is ignoring the allocation to begin
with, there's no need to call kmemleak_no_scan() at all, which avoids
the warning.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/bug-216156-206035@https.bugzilla.kernel.org%2F/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240419115913.3317575-1-mpe@ellerman.id.au
|
|
When a device is hot removed on powernv, the hotplug driver clears
the device's state. However, on pseries, if a device is removed by
phyp after reaching the error threshold, the kernel remains unaware,
leading to the device not being torn down. This prevents necessary
remediation actions like failover.
Permanently disable the device if the presence check fails.
Also, in eeh_dev_check_failure in we may consider the error as false
positive if the device is hotpluged out as the get_state call returns
EEH_STATE_NOT_SUPPORT and we may end up not clearing the device state,
so log the event if the state is not moved to permanent failure state.
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422075737.1405551-1-ganeshgr@linux.ibm.com
|
|
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.
Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.
However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.
Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-3-sourabhjain@linux.ibm.com
|
|
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.
Memory hotplug or online/offline events is referred as memory add/remove
events in reset of the commit message.
The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.
There are several notable issues associated with re-registering fadump
for every memory add/remove events.
1. Bulk memory add/remove events with udev-based fadump re-registration
can lead to race conditions and, more importantly, it creates a wide
window during which fadump is inactive until all memory add/remove
events are settled.
2. Re-registering fadump for every memory add/remove event is
inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
available during early boot and remains fixed thereafter. However, if
elfcorehdr is later recreated with additional memblock regions, its
size will increase, potentially leading to memory corruption.
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.
At present, the first kernel prepares fadump header and stores it in the
fadump reserved area. The fadump header includes the start address of
the elfcorehdr, crashing CPU details, and other relevant information. In
the event of a crash in the first kernel, the second/fadump boots and
accesses the fadump header prepared by the first kernel. It then
performs the following steps in a platform-specific function
[rtas|opal]_fadump_process:
1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr
Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.
Section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.
To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.
At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.
In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.
1. version:
It stores the fadump header version, which is currently set to 1.
This provides flexibility to update the fadump crash info header in
the future without changing the magic number. For each change in the
fadump header, the version will be increased. This will help the
updated kernel determine how to handle kernel dumps from older
kernels. The magic number remains relevant for checking fadump header
corruption.
2. pt_regs_sz/cpu_mask_sz:
Store size of pt_regs and cpu_mask structure of first kernel. These
attributes are used to prevent dump processing if the sizes of
pt_regs or cpu_mask structure differ between the first and fadump
kernels.
Note: if either first/crashed kernel or second/fadump kernel do not have
the changes introduced here then kernel fail to collect the dump and
prints relevant error message on the console.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
|
|
Couple of Minor fixes:
- hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
parse_ppp_data
- If hcall fails, values set should be at-least zero. It shouldn't be
uninitialized values. Fix that for h_get_mpp and h_get_ppp
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com
|
|
When there are no options specified for lparstat, it is expected to
give reports since LPAR(Logical Partition) boot.
APP(Available Processor Pool) is an indicator of how many cores in the
shared pool are free to use in Shared Processor LPAR(SPLPAR). APP is
derived using pool_idle_time which is obtained using H_PIC call.
The interval based reports show correct APP value while since boot
report shows very high APP values. This happens because in that case APP
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot.
To fix that export boot pool idle time in lparcfg and powerpc-utils will
use this info to derive APP as below for since boot reports.
APP = (pool idle time - boot pool idle time) / (uptime * timebase)
Results:: Observe APP values.
====================== Shared LPAR ================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to close to 37-6=31.
====== 6.9-rc1 and lparstat 1.3.10 =============
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21
=== With this patch and powerpc-utils patch to do the above equation ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21
=====================================================================
Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in powerpc-utils tree.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com
|
|
memory limit value specified by the user are further updated such that
the value is 16MB aligned. This is because hash translation mode use
16MB as direct mapping page size. Make sure we update the global
variable 'memory_limit' with the 16MB aligned value such that all kernel
components will see the new aligned value of the memory limit.
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-3-aneesh.kumar@kernel.org
|
|
If the user specifies the memory limit, the kernel should honor it such
that all allocation and reservations are made within the memory limit
specified. fadump was breaking that rule. Remove the code which updates
the memory limit such that fadump reservations are done within the
limit specified.
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-2-aneesh.kumar@kernel.org
|
|
The value specified for the memory limit is used to set a restriction on
memory usage. It is important to ensure that this restriction is within
the linear map kernel address space range. The hash page table
translation uses a 16MB page size to map the kernel linear map address
space. htab_bolt_mapping() function aligns down the size of the range
while mapping kernel linear address space. Since the memblock limit is
enforced very early during boot, before we can detect the type of memory
translation (radix vs hash), we align the memory limit value specified
as a kernel parameter to 16MB. This alignment value will work for both
hash and radix translations.
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Acked-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-1-aneesh.kumar@kernel.org
|
|
Currently walk_vmemmap() skips the first vmemmap entry pointed to by
vmemmap_list pointer itself. Fix that.
With the fix applied the vmemmap entry at 0xc00c000000000000 for hash is
displayed:
$ cat /sys/kernel/debug/kernel_hash_pagetable
...
0xc00c000000010000: AVPN:cd7bd4e0000 ssize: 1T ...
0xc00c000000000000: AVPN:cd7bd4e0000 ssize: 1T ...
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
[mpe: Tweak change log wording and add example output]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/a19ee3dc2b304d39da364a592d5cd167449f8c4a.1713365940.git.ritesh.list@gmail.com
|
|
nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel
crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
interrupt handler) if percpu allocation comes from vmalloc area.
Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI()
wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when
percpu allocation is from the embedded first chunk. However with
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu
allocation can come from the vmalloc area.
With kernel command line "percpu_alloc=page" we can force percpu allocation
to come from vmalloc area and can see kernel crash in machine_check_early:
[ 1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
[ 1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
[ 1.215719] --- interrupt: 200
[ 1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
[ 1.215722] [c000000fffd731b0] [0000000000000000] 0x0
[ 1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8
Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu
first chunk is not embedded.
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Shirisha Ganta <shirisha@linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240410043006.81577-1-mahesh@linux.ibm.com
|
|
JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_intialized to determine
whether {cpu,mmu}_has_feature() is used before static keys were
initialized. However, {cpu,mmu}_has_feature() should not be used before
setup_feature_keys() is called but static_key_initialized is set well
before this by the call to jump_label_init() in early_init_devtree().
This creates a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG will not
detect misuse and report errors. Add a flag specifically to indicate
when {cpu,mmu}_has_feature() is safe to use.
Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408052358.5030-1-nicholas@linux.ibm.com
|
|
of_gpio.h is deprecated and subject to remove.
The driver doesn't use it directly, replace it
with what is really being used.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240313135645.2066362-1-andriy.shevchenko@linux.intel.com
|
|
When building with LLVM_IAS=1, there is an error because
'-fatal-warnings' is not recognized as a valid flag:
clang: error: unsupported argument '-fatal-warnings' to option '-Wa,'
Use the double hyphen version of the flag, '--fatal-warnings', which
works with both the GNU assembler and LLVM's integrated assembler.
Fixes: 608d4a5ca563 ("powerpc: Error on assembly warnings")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240405-ppc-fix-wa-fatal-warnings-clang-v1-1-bdcd969f2ef0@kernel.org
|
|
The struct ps3_notification_device in the ps3_probe_thread routine
is too large to be on the stack, causing a warning for an
allmodconfig build with clang.
Change the struct ps3_notification_device from a variable on the stack
to a dynamically allocated variable.
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d64f06f4-81ae-4ec5-ab3b-d7f7f091e0ac@infradead.org
|
|
We currently enable -Werror on the arch/powerpc subtree. However this
only catches C warnings. Assembly warnings are logged, but the make
invocation will still succeed. This can allow incorrect syntax such as
ori r3, r4, r5
to be compiled without catching that the assembler is treating r5
as the immediate value 5.
To prevent this in assembly files and inline assembly, add the
-fatal-warnings option to assembler invocations.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240326044420.577031-1-bgray@linux.ibm.com
|
|
vmpic_msi_feature is only used conditionally, which triggers a rare
-Werror=unused-const-variable= warning with gcc:
arch/powerpc/sysdev/fsl_msi.c:567:37: error: 'vmpic_msi_feature' defined but not used [-Werror=unused-const-variable=]
567 | static const struct fsl_msi_feature vmpic_msi_feature =
Hide this one in the same #ifdef as the reference so we can turn on
the warning by default.
Fixes: 305bcf26128e ("powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403080702.3509288-2-arnd@kernel.org
|
|
Fixes the following Coccinelle/coccicheck warning reported by
string_choices.cocci:
opportunity for str_plural(tpc)
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240331222249.107467-2-thorsten.blum@toblux.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
- Handle errors in mark_rodata_ro() and mark_initmem_nx()
- Make struct crash_mem available without CONFIG_CRASH_DUMP
Thanks to Christophe Leroy and Hari Bathini.
* tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 6.9-rc1. Lots
of tiny changes and forward progress to support new hardware and
better support for existing devices. Included in here are:
- Thunderbolt (i.e. USB4) updates for newer hardware and uses as more
people start to use the hardware
- default USB authentication mode Kconfig and documentation update to
make it more obvious what is going on
- USB typec updates and enhancements
- usual dwc3 driver updates
- usual xhci driver updates
- function USB (i.e. gadget) driver updates and additions
- new device ids for lots of drivers
- loads of other small updates, full details in the shortlog
All of these, including a "last minute regression fix" have been in
linux-next with no reported issues"
* tag 'usb-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (185 commits)
usb: usb-acpi: Fix oops due to freeing uninitialized pld pointer
usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin
usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
phy: tegra: xusb: Add API to retrieve the port number of phy
USB: gadget: pxa27x_udc: Remove unused of_gpio.h
usb: gadget/snps_udc_plat: Remove unused of_gpio.h
usb: ohci-pxa27x: Remove unused of_gpio.h
usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
usb: Clarify expected behavior of dev_bin_attrs_are_visible()
xhci: Allow RPM on the USB controller (1022:43f7) by default
usb: isp1760: remove SLAB_MEM_SPREAD flag usage
usb: misc: onboard_hub: use pointer consistently in the probe function
usb: gadget: fsl: Increase size of name buffer for endpoints
usb: gadget: fsl: Add of device table to enable module autoloading
usb: typec: tcpm: add support to set tcpc connector orientatition
usb: typec: tcpci: add generic tcpci fallback compatible
dt-bindings: usb: typec-tcpci: add tcpci fallback binding
usb: gadget: fsl-udc: Replace custom log wrappers by dev_{err,warn,dbg,vdbg}
usb: core: Set connect_type of ports based on DT node
dt-bindings: usb: Add downstream facing ports to realtek binding
...
|
|
Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE
was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was
appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC
and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP
dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP
with !CONFIG_KEXEC_CORE configuration option.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240226103010.589537-4-hbathini@linux.ibm.com
|
|
CONFIG_KEXEC_FILE does not have to select CONFIG_CRASH_DUMP. Move
some code under CONFIG_CRASH_DUMP to support CONFIG_KEXEC_FILE and
!CONFIG_CRASH_DUMP case.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240226103010.589537-3-hbathini@linux.ibm.com
|
|
mark_rodata_ro() and mark_initmem_nx() use functions that can
fail like set_memory_nx() and set_memory_ro(), leading to a not
protected kernel.
In case of failure, panic.
Link: https://github.com/KSPP/linux/issues/7
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/b16329611deb89e1af505d43f0e2a91310584d26.1710587887.git.christophe.leroy@csgroup.eu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use
by glibc
- Add support for recognising the Power11 architected and raw PVRs
- Add support for nr_cpus=n on the command line where the
boot CPU is >= n
- Add ppcxx_allmodconfig targets for all 32-bit sub-arches
- Other small features, cleanups and fixes
Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff
Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan,
Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor,
Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B.
Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe
Kleine-König, Vaibhav Jain, and Wen Xiong.
* tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits)
powerpc/macio: Make remove callback of macio driver void returned
powerpc/83xx: Fix build failure with FPU=n
powerpc/64s: Fix get_hugepd_cache_index() build failure
powerpc/4xx: Fix warp_gpio_leds build failure
powerpc/amigaone: Make several functions static
powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
macintosh/adb: make adb_dev_class constant
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
powerpc/fsl: Fix mfpmr() asm constraint error
powerpc: Remove cpu-as-y completely
powerpc/fsl: Modernise mt/mfpmr
powerpc/fsl: Fix mfpmr build errors with newer binutils
powerpc/64s: Use .machine power4 around dcbt
powerpc/64s: Move dcbt/dcbtst sequence into a macro
powerpc/mm: Code cleanup for __hash_page_thp
powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
powerpc/irq: Allow softirq to hardirq stack transition
powerpc: Stop using of_root
powerpc/machdep: Define 'compatibles' property in ppc_md and use it
of: Reimplement of_machine_is_compatible() using of_machine_compatible_match()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Avoid unnecessary copying in scomp for trivial SG lists
Algorithms:
- Optimise NEON CCM implementation on ARM64
Drivers:
- Add queue stop/query debugfs support in hisilicon/qm
- Intel qat updates and cleanups"
* tag 'v6.9-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (79 commits)
Revert "crypto: remove CONFIG_CRYPTO_STATS"
crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem
crypto: tcrypt - add ffdhe2048(dh) test
crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags
crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags
hwrng: hisi - use dev_err_probe
MAINTAINERS: Remove T Ambarus from few mchp entries
crypto: iaa - Fix comp/decomp delay statistics
crypto: iaa - Fix async_disable descriptor leak
dt-bindings: rng: atmel,at91-trng: add sam9x7 TRNG
dt-bindings: crypto: add sam9x7 in Atmel TDES
dt-bindings: crypto: add sam9x7 in Atmel SHA
dt-bindings: crypto: add sam9x7 in Atmel AES
crypto: remove CONFIG_CRYPTO_STATS
crypto: dh - Make public key test FIPS-only
crypto: rockchip - fix to check return value
crypto: jitter - fix CRYPTO_JITTERENTROPY help text
crypto: qat - make ring to service map common for QAT GEN4
crypto: qat - fix ring to service map for dcc in 420xx
crypto: qat - fix ring to service map for dcc in 4xxx
...
|
|
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
- Kuan-Wei Chiu has also sped up the library sorting code in the series
"lib/sort: Optimize the number of swaps and comparisons".
- Alexey Gladkov has added the ability for code running within an IPC
namespace to alter its IPC and MQ limits. The series is "Allow to
change ipc/mq sysctls inside ipc namespace".
- Geert Uytterhoeven has contributed some dhrystone maintenance work in
the series "lib: dhry: miscellaneous cleanups".
- Ryusuke Konishi continues nilfs2 maintenance work in the series
"nilfs2: eliminate kmap and kmap_atomic calls"
"nilfs2: fix kernel bug at submit_bh_wbc()"
- Nathan Chancellor has updated our build tools requirements in the
series "Bump the minimum supported version of LLVM to 13.0.1".
- Muhammad Usama Anjum continues with the selftests maintenance work in
the series "selftests/mm: Improve run_vmtests.sh".
- Oleg Nesterov has done some maintenance work against the signal code
in the series "get_signal: minor cleanups and fix".
Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.
* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
nilfs2: prevent kernel bug at submit_bh_wbc()
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
ocfs2: enable ocfs2_listxattr for special files
ocfs2: remove SLAB_MEM_SPREAD flag usage
assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
buildid: use kmap_local_page()
watchdog/core: remove sysctl handlers from public header
nilfs2: use div64_ul() instead of do_div()
mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
kexec: copy only happens before uchunk goes to zero
get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
get_signal: don't abuse ksig->info.si_signo and ksig->sig
const_structs.checkpatch: add device_type
Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
list: leverage list_is_head() for list_entry_is_head()
nilfs2: MAINTAINERS: drop unreachable project mirror site
smp: make __smp_processor_id() 0-argument macro
fat: fix uninitialized field in nostale filehandles
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
Pull drm updates from Dave Airlie:
"Highlights are usual, more AMD IP blocks for future hw, i915/xe
changes, Displayport tunnelling support for i915, msm YUV over DP
changes, new tests for ttm, but its mostly a lot of stuff all over the
place from lots of people.
core:
- EDID cleanups
- scheduler error handling fixes
- managed: add drmm_release_action() with tests
- add ratelimited drm debug print
- DPCD PSR early transport macro
- DP tunneling and bandwidth allocation helpers
- remove built-in edids
- dp: Avoid AUX transfers on powered-down displays
- dp: Add VSC SDP helpers
cross drivers:
- use new drm print helpers
- switch to ->read_edid callback
- gem: add stats for shared buffers plus updates to amdgpu, i915, xe
syncobj:
- fixes to waiting and sleeping
ttm:
- add tests
- fix errno codes
- simply busy-placement handling
- fix page decryption
media:
- tc358743: fix v4l device registration
video:
- move all kernel parameters for video behind CONFIG_VIDEO
sound:
- remove <drm/drm_edid.h> include from header
ci:
- add tests for msm
- fix apq8016 runner
efifb:
- use copy of global screen_info state
vesafb:
- use copy of global screen_info state
simplefb:
- fix logging
bridge:
- ite-6505: fix DP link-training bug
- samsung-dsim: fix error checking in probe
- samsung-dsim: add bsh-smm-s2/pro boards
- tc358767: fix regmap usage
- imx: add i.MX8MP HDMI PVI plus DT bindings
- imx: add i.MX8MP HDMI TX plus DT bindings
- sii902x: fix probing and unregistration
- tc358767: limit pixel PLL input range
- switch to new drm_bridge_read_edid() interface
panel:
- ltk050h3146w: error-handling fixes
- panel-edp: support delay between power-on and enable; use put_sync
in unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49
V8.0, BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
- panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
- panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings
- add BOE TH101MB31IG002-28A plus DT bindings
- add EDT ETML1010G3DRA plus DT bindings
- add Novatek NT36672E LCD DSI plus DT bindings
- nt36523: support 120Hz timings, fix includes
- simple: fix display timings on RK32FN48H
- visionox-vtdr6130: fix initialization
- add Powkiddy RGB10MAX3 plus DT bindings
- st7703: support panel rotation plus DT bindings
- add Himax HX83112A plus DT bindings
- ltk500hd1829: add support for ltk101b4029w and admatec 9904370
- simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs
panel-orientation-quirks:
- GPD Win Mini
amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- ATHUB 4.1 support
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- DCN 3.5.1 support
- NBIF 6.3.1 support
- VPE 6.1.1 support
amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates and enhancements
- Fix cache size reporting
- Relocate the trap handler
radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups
xe:
- new query for GuC submission version
- Remove unused persistent exec_queues
- Add vram frequency sysfs attributes
- Add the flag XE_VM_BIND_FLAG_DUMPABLE
- Drop pre-production workarounds
- Drop kunit tests for unsupported platforms
- Start pumbling SR-IOV support with memory based interrupts for VF
- Allow to map BO in GGTT with PAT index corresponding to XE_CACHE_UC
to work with memory based interrupts
- Add GuC Doorbells Manager as prep work SR-IOV
- Implement additional workarounds for xe2 and MTL
- Program a few registers according to perfomance guide spec for Xe2
- Fix remaining 32b build issues and enable it back
- Fix build with CONFIG_DEBUG_FS=n
- Fix warnings from GuC ABI headers
- Introduce Relay Communication for SR-IOV for VF <-> GuC <-> PF
- Release mmap mappings on rpm suspend
- Disable mid-thread preemption when not properly supported by
hardware
- Fix xe_exec by reserving extra fence slot for CPU bind
- Fix xe_exec with full long running exec queue
- Canonicalize addresses where needed for Xe2 and add to devcoredum
- Toggle USM support for Xe2
- Only allow 1 ufence per exec / bind IOCTL
- Add GuC firmware loading for Lunar Lake
- Add XE_VMA_PTE_64K VMA flag
i915:
- Add more ADL-N PCI IDs
- Enable fastboot also on older platforms
- Early transport for panel replay and PSR
- New ARL PCI IDs
- DP TPS4 PHY test pattern support
- Unify and improve VSC SDP for PSR and non-PSR cases
- Refactor memory regions and improve debug logging
- Rework global state serialization
- Remove unused CDCLK divider fields
- Unify HDCP connector logging format
- Use display instead of graphics version in display code
- Move VBT and opregion debugfs next to the implementation
- Abstract opregion interface, use opaque type
- MTL fixes
- HPD handling fixes
- Add GuC submission interface version query
- Atomically invalidate userptr on mmu-notifier
- Update handling of MMIO triggered reports
- Don't make assumptions about intel_wakeref_t type
- Extend driver code of Xe_LPG to Xe_LPG+
- Add flex arrays to struct i915_syncmap
- Allow for very slow HuC loading
- DP tunneling and bandwidth allocation support
msm:
- Correct bindings for MSM8976 and SM8650 platforms
- Start migration of MDP5 platforms to DPU driver
- X1E80100 MDSS support
- DPU:
- Improve DSC allocation, fixing several important corner cases
- Add support for SDM630/SDM660 platforms
- Simplify dpu_encoder_phys_ops
- Apply fixes targeting DSC support with a single DSC encoder
- Apply fixes for HCTL_EN timing configuration
- X1E80100 support
- Add support for YUV420 over DP
- GPU:
- fix sc7180 UBWC config
- fix a7xx LLC config
- new gpu support: a305B, a750, a702
- machine support: SM7150 (different power levels than other a618)
- a7xx devcoredump support
habanalabs:
- configure IRQ affinity according to NUMA node
- move HBM MMU page tables inside the HBM
- improve device reset
- check extended PCIe errors
ivpu:
- updates to firmware API
- refactor BO allocation
imx:
- use devm_ functions during init
hisilicon:
- fix EDID includes
mgag200:
- improve ioremap usage
- convert to struct drm_edid
- Work around PCI write bursts
nouveau:
- disp: use kmemdup()
- fix EDID includes
- documentation fixes
qaic:
- fixes to BO handling
- make use of DRM managed release
- fix order of remove operations
rockchip:
- analogix_dp: get encoder port from DT
- inno_hdmi: support HDMI for RK3128
- lvds: error-handling fixes
ssd130x:
- support SSD133x plus DT bindings
tegra:
- fix error handling
tilcdc:
- make use of DRM managed release
v3d:
- show memory stats in debugfs
- Support display MMU page size
vc4:
- fix error handling in plane prepare_fb
- fix framebuffer test in plane helpers
virtio:
- add venus capset defines
vkms:
- fix OOB access when programming the LUT
- Kconfig improvements
vmwgfx:
- unmap surface before changing plane state
- fix memory leak in error handling
- documentation fixes
- list command SVGA_3D_CMD_DEFINE_GB_SURFACE_V4 as invalid
- fix null-pointer deref in execbuf
- refactor display-mode probing
- fix fencing for creating cursor MOBs
- fix cursor-memory lifetime
xlnx:
- fix live video input for ZynqMP DPSUB
lima:
- fix memory leak
loongson:
- fail if no VRAM present
meson:
- switch to new drm_bridge_read_edid() interface
renesas:
- add RZ/G2L DU support plus DT bindings
mxsfb:
- Use managed mode config
sun4i:
- HDMI: updates to atomic mode setting
mediatek:
- Add display driver for MT8188 VDOSYS1
- DSI driver cleanups
- Filter modes according to hardware capability
- Fix a null pointer crash in mtk_drm_crtc_finish_page_flip
etnaviv:
- enhancements for NPU and MRT support"
* tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel: (1420 commits)
drm/amd/display: Removed redundant @ symbol to fix kernel-doc warnings in -next repo
drm/amd/pm: wait for completion of the EnableGfxImu message
drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.1
drm/amdgpu: add smu 14.0.1 support
drm/amdgpu: add VPE 6.1.1 discovery support
drm/amdgpu/vpe: add VPE 6.1.1 support
drm/amdgpu/vpe: don't emit cond exec command under collaborate mode
drm/amdgpu/vpe: add collaborate mode support for VPE
drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODE
drm/amdgpu/vpe: add multi instance VPE support
drm/amdgpu/discovery: add nbif v6_3_1 ip block
drm/amdgpu: Add nbif v6_3_1 ip block support
drm/amdgpu: Add pcie v6_1_0 ip headers (v5)
drm/amdgpu: Add nbif v6_3_1 ip headers (v5)
arch/powerpc: Remove <linux/fb.h> from backlight code
macintosh/via-pmu-backlight: Include <linux/backlight.h>
fbdev/chipsfb: Include <linux/backlight.h>
drm/etnaviv: Restore some id values
drm/amdkfd: make kfd_class constant
drm/amdgpu: add ring timeout information in devcoredump
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
"Christophe Leroy did most of the work on this release, first with a
few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error
handling for when set_memory_XX() can fail.
This is part of a larger effort to clean up all these callers which
can fail, modules is just part of it"
* tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Don't ignore errors from set_memory_XX()
lib/test_kmod: fix kernel-doc warnings
powerpc: Simplify strict_kernel_rwx_enabled()
modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
init: Declare rodata_enabled and mark_rodata_ro() at all time
module: Change module_enable_{nx/x/ro}() to more explicit names
module: Use set_memory_rox()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As is pretty normal for this tree, there are changes all over the
place, especially for small fixes, selftest improvements, and improved
macro usability.
Some header changes ended up landing via this tree as they depended on
the string header cleanups. Also, a notable set of changes is the work
for the reintroduction of the UBSAN signed integer overflow sanitizer
so that we can continue to make improvements on the compiler side to
make this sanitizer a more viable future security hardening option.
Summary:
- string.h and related header cleanups (Tanzir Hasan, Andy
Shevchenko)
- VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
Harshit Mogalapalli)
- selftests/powerpc: Fix load_unaligned_zeropad build failure
(Michael Ellerman)
- hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
- Handle tail call optimization better in LKDTM (Douglas Anderson)
- Use long form types in overflow.h (Andy Shevchenko)
- Add flags param to string_get_size() (Andy Shevchenko)
- Add Coccinelle script for potential struct_size() use (Jacob
Keller)
- Fix objtool corner case under KCFI (Josh Poimboeuf)
- Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
- Add str_plural() helper (Michal Wajdeczko, Kees Cook)
- Ignore relocations in .notes section
- Add comments to explain how __is_constexpr() works
- Fix m68k stack alignment expectations in stackinit Kunit test
- Convert string selftests to KUnit
- Add KUnit tests for fortified string functions
- Improve reporting during fortified string warnings
- Allow non-type arg to type_max() and type_min()
- Allow strscpy() to be called with only 2 arguments
- Add binary mode to leaking_addresses scanner
- Various small cleanups to leaking_addresses scanner
- Adding wrapping_*() arithmetic helper
- Annotate initial signed integer wrap-around in refcount_t
- Add explicit UBSAN section to MAINTAINERS
- Fix UBSAN self-test warnings
- Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
- Reintroduce UBSAN's signed overflow sanitizer"
* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
selftests/powerpc: Fix load_unaligned_zeropad build failure
string: Convert helpers selftest to KUnit
string: Convert selftest to KUnit
sh: Fix build with CONFIG_UBSAN=y
compiler.h: Explain how __is_constexpr() works
overflow: Allow non-type arg to type_max() and type_min()
VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
lib/string_helpers: Add flags param to string_get_size()
x86, relocs: Ignore relocations in .notes section
objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
overflow: Use POD in check_shl_overflow()
lib: stackinit: Adjust target string to 8 bytes for m68k
sparc: vdso: Disable UBSAN instrumentation
kernel.h: Move lib/cmdline.c prototypes to string.h
leaking_addresses: Provide mechanism to scan binary files
leaking_addresses: Ignore input device status lines
leaking_addresses: Use File::Temp for /tmp files
MAINTAINERS: Update LEAKING_ADDRESSES details
fortify: Improve buffer overflow reporting
fortify: Add KUnit tests for runtime overflows
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Just two small updates this time:
- A series I did to unify the definition of PAGE_SIZE through
Kconfig, intended to help with a vdso rework that needs the
constant but cannot include the normal kernel headers when building
the compat VDSO on arm64 and potentially others
- a patch from Yan Zhao to remove the pfn_to_virt() definitions from
a couple of architectures after finding they were both incorrect
and entirely unused"
* tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: define CONFIG_PAGE_SIZE_*KB on all architectures
arch: simplify architecture specific page size configuration
arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Fix inconsistency in misfit task load-balancing
- Fix CPU isolation bugs in the task-wakeup logic
- Rework and unify the sched_use_asym_prio() and sched_asym_prefer()
logic
- Clean up and simplify ->avg_* accesses
- Misc cleanups and fixes
* tag 'sched-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/topology: Rename SD_SHARE_PKG_RESOURCES to SD_SHARE_LLC
sched/fair: Check the SD_ASYM_PACKING flag in sched_use_asym_prio()
sched/fair: Rework sched_use_asym_prio() and sched_asym_prefer()
sched/fair: Remove unused parameter from sched_asym()
sched/topology: Remove duplicate descriptions from TOPOLOGY_SD_FLAGS
sched/fair: Simplify the update_sd_pick_busiest() logic
sched/fair: Do strict inequality check for busiest misfit task group
sched/fair: Remove unnecessary goto in update_sd_lb_stats()
sched/fair: Take the scheduling domain into account in select_idle_core()
sched/fair: Take the scheduling domain into account in select_idle_smt()
sched/fair: Add READ_ONCE() and use existing helper function to access ->avg_irq
sched/fair: Use existing helper functions to access ->avg_rt and ->avg_dl
sched/core: Simplify code by removing duplicate #ifdefs
|