Age | Commit message (Collapse) | Author |
|
The code that implements the rarely used PID_IN_CONTEXTIDR feature
dereferences the 'task' field of struct thread_info directly, and this
is no longer possible when THREAD_INFO_IN_TASK=y, as the 'task' field is
omitted from the struct definition in that case. Instead, we should just
cast the thread_info pointer to a task_struct pointer, given that the
former is now the first member of the latter.
So use a helper that abstracts this, and provide implementations for
both cases.
Reported by: Arnd Bergmann <arnd@arndb.de>
Fixes: 18ed1c01a7dd ("ARM: smp: Enable THREAD_INFO_IN_TASK")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Mathieu writes:
Coresight changes for v5.16
- A new option to make coresight cpu-debug capabilities available as early
as possible in the kernel boot process.
- Make trace sessions more enduring by coping with scenarios where events
are scheduled on CPUs that can't reach the selected sink.
- A set of improvement to make the TMC-ETR driver more efficient.
- Enhancements to the TRBE driver to correct several errata.
- An enhancement to make the AXI burts size configurable for TMC devices
that can't work with the default value.
- A fix in the CTI module to use the correct device when calling
pm_runtime_put()
- The addition of the Kryo-5xx device to the list of support ETMs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
* tag 'coresight-next-v5.16.v3' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux: (39 commits)
arm64: errata: Enable TRBE workaround for write to out-of-range address
arm64: errata: Enable workaround for TRBE overwrite in FILL mode
coresight: trbe: Work around write to out of range
coresight: trbe: Make sure we have enough space
coresight: trbe: Add a helper to determine the minimum buffer size
coresight: trbe: Workaround TRBE errata overwrite in FILL mode
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Allow driver to choose a different alignment
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Defer the probe on offline CPUs
coresight: trbe: Fix incorrect access of the sink specific data
coresight: etm4x: Add ETM PID for Kryo-5XX
coresight: trbe: Prohibit trace before disabling TRBE
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: Fix handling of spurious interrupts
coresight: trbe: irq handler: Do not disable TRBE if no action is needed
coresight: trbe: Unify the enabling sequence
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three commits fixing some issues introduced with the recent IOMMU
changes we merged.
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
powerpc/pseries/iommu: Check if the default window in use before removing it
powerpc/pseries/iommu: Use correct vfree for it_map
|
|
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted. So change every instance we can to make the code clearer.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Directly calling do_exit with a signal number has the problem that
all of the side effects of the signal don't happen, such as
killing all of the threads of a process instead of just the
calling thread.
So replace do_exit(SIGSYS) with force_fatal_sig(SIGSYS) which
causes the signal handling to take it's normal path and work
as expected.
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20211020174406.17889-17-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Modify the 32bit version of setup_rt_frame and setup_frame to act
similar to the 64bit version of setup_rt_frame and fail with a signal
instead of calling do_exit.
Replacing do_exit(SIGILL) with force_fatal_signal(SIGILL) ensures that
the process will be terminated cleanly when the stack frame is
invalid, instead of just killing off a single thread and leaving the
process is a weird state.
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Link: https://lkml.kernel.org/r/20211020174406.17889-16-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
The function try_to_clear_window_buffer is only called from
rtrap_32.c. After it is called the signal pending state is retested,
and signals are handled if TIF_SIGPENDING is set. This allows
try_to_clear_window_buffer to call force_fatal_signal and then rely on
the signal being delivered to kill the process, without any danger of
returning to userspace, or otherwise using possible corrupt state on
failure.
The functional difference between force_fatal_sig and do_exit is that
do_exit will only terminate a single thread, and will never trigger a
core-dump. A multi-threaded program for which a single thread
terminates unexpectedly is hard to reason about. Calling force_fatal_sig
does not give userspace a chance to catch the signal, but otherwise
is an ordinary fatal signal exit, and it will trigger a coredump
of the offending process if core dumps are enabled.
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Link: https://lkml.kernel.org/r/20211020174406.17889-15-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Reading the history it is unclear why default_trap_handler calls
do_exit. It is not even menthioned in the commit where the change
happened. My best guess is that because it is unknown why the
exception happened it was desired to guarantee the process never
returned to userspace.
Using do_exit(SIGSEGV) has the problem that it will only terminate one
thread of a process, leaving the process in an undefined state.
Use force_sigsegv(SIGSEGV) instead which effectively has the same
behavior except that is uses the ordinary signal mechanism and
terminates all threads of a process and is generally well defined.
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lkml.kernel.org/r/20211020174406.17889-11-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Reduce maintenance burden of DVSEC query implementation by using the
centralized PCI core implementation.
There are two obvious places to simply drop in the new core
implementation. There remains find_dvsec_from_pos() which would benefit
from using a core implementation. As that change is less trivial it is
reserved for later.
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Link: https://lore.kernel.org/r/163379789065.692348.7117946955275586530.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a build-time warning in x86/sm4"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm4 - Fix invalid section entry size
|
|
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.
We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A e5500 machine running a 32-bit kernel sometimes hangs at boot,
seemingly going into an infinite loop of instruction storage interrupts.
The ESR (Exception Syndrome Register) has a value of 0x800000 (store)
when this happens, which is likely set by a previous store. An
instruction TLB miss interrupt would then leave ESR unchanged, and if no
PTE exists it calls directly to the instruction storage interrupt
handler without changing ESR.
access_error() does not cause a segfault due to a store to a read-only
vma because is_exec is true. Most subsequent fault handling does not
check for a write fault on a read-only vma, and might do strange things
like create a writeable PTE or call page_mkwrite on a read only vma or
file. It's not clear what happens here to cause the infinite faulting in
this case, a fault handler failure or low level PTE or TLB handling.
In any case this can be fixed by having the instruction storage
interrupt zero regs->dsisr rather than storing the ESR value to it.
Fixes: a01a3f2ddbcd ("powerpc: remove arguments from fault handler functions")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Jacques de Laval <jacques.delaval@protonmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Jacques de Laval <jacques.delaval@protonmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028133043.4159501-1-npiggin@gmail.com
|
|
Merge for-next/fixes to resolve conflicts in arm64_hugetlb_cma_reserve().
* for-next/fixes:
acpi/arm64: fix next_platform_timer() section mismatch error
arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
|
|
* for-next/vdso:
arm64: vdso32: require CROSS_COMPILE_COMPAT for gcc+bfd
arm64: vdso32: suppress error message for 'make mrproper'
arm64: vdso32: drop test for -march=armv8-a
arm64: vdso32: drop the test for dmb ishld
|
|
* for-next/trbe-errata:
arm64: errata: Add detection for TRBE write to out-of-range
arm64: errata: Add workaround for TSB flush failures
arm64: errata: Add detection for TRBE overwrite in FILL mode
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
|
|
* for-next/sve:
arm64/sve: Fix warnings when SVE is disabled
arm64/sve: Add stub for sve_max_virtualisable_vl()
arm64/sve: Track vector lengths for tasks in an array
arm64/sve: Explicitly load vector length when restoring SVE state
arm64/sve: Put system wide vector length information into structs
arm64/sve: Use accessor functions for vector lengths in thread_struct
arm64/sve: Rename find_supported_vector_length()
arm64/sve: Make access to FFR optional
arm64/sve: Make sve_state_size() static
arm64/sve: Remove sve_load_from_fpsimd_state()
arm64/fp: Reindent fpsimd_save()
|
|
* for-next/pfn-valid:
arm64/mm: drop HAVE_ARCH_PFN_VALID
dma-mapping: remove bogus test for pfn_valid from dma_map_resource
|
|
* for-next/mte:
kasan: Extend KASAN mode kernel parameter
arm64: mte: Add asymmetric mode support
arm64: mte: CPU feature detection for Asymm MTE
arm64: mte: Bitfield definitions for Asymm MTE
kasan: Remove duplicate of kasan_flag_async
arm64: kasan: mte: move GCR_EL1 switch to task switch when KASAN disabled
|
|
* for-next/mm:
arm64: mm: update max_pfn after memory hotplug
arm64/mm: Add pud_sect_supported()
arm64: mm: Drop pointless call to set_max_mapnr()
|
|
* for-next/misc:
arm64: Select POSIX_CPU_TIMERS_TASK_WORK
arm64: Document boot requirements for FEAT_SME_FA64
arm64: ftrace: use function_nocfi for _mcount as well
arm64: asm: setup.h: export common variables
arm64/traps: Avoid unnecessary kernel/user pointer conversion
|
|
* for-next/kexec:
arm64: trans_pgd: remove trans_pgd_map_page()
arm64: kexec: remove cpu-reset.h
arm64: kexec: remove the pre-kexec PoC maintenance
arm64: kexec: keep MMU enabled during kexec relocation
arm64: kexec: install a copy of the linear-map
arm64: kexec: use ld script for relocation function
arm64: kexec: relocate in EL1 mode
arm64: kexec: configure EL2 vectors for kexec
arm64: kexec: pass kimage as the only argument to relocation function
arm64: kexec: Use dcache ops macros instead of open-coding
arm64: kexec: skip relocation code for inplace kexec
arm64: kexec: flush image and lists during kexec load time
arm64: hibernate: abstract ttrb0 setup function
arm64: trans_pgd: hibernate: Add trans_pgd_copy_el2_vectors
arm64: kernel: add helper for booted at EL2 and not VHE
|
|
* for-next/extable:
arm64: vmlinux.lds.S: remove `.fixup` section
arm64: extable: add load_unaligned_zeropad() handler
arm64: extable: add a dedicated uaccess handler
arm64: extable: add `type` and `data` fields
arm64: extable: use `ex` for `exception_table_entry`
arm64: extable: make fixup_exception() return bool
arm64: extable: consolidate definitions
arm64: gpr-num: support W registers
arm64: factor out GPR numbering helpers
arm64: kvm: use kvm_exception_table_entry
arm64: lib: __arch_copy_to_user(): fold fixups into body
arm64: lib: __arch_copy_from_user(): fold fixups into body
arm64: lib: __arch_clear_user(): fold fixups into body
|
|
Merge irqchip updates for Linux 5.16 from Marc Zyngier:
- A large cross-arch rework to move irq_enter()/irq_exit() into
the arch code, and removing it from the generic irq code.
Thanks to Mark Rutland for the huge effort!
- A few irqchip drivers are made modular (broadcom, meson), because
that's apparently a thing...
- A new driver for the Microchip External Interrupt Controller
- The irq_cpu_offline()/irq_cpu_online() API is now deprecated and
can only be selected on the Cavium Octeon platform. Once this
platform is removed, the API will be removed at the same time.
- A sprinkle of devm_* helper, as people seem to love that.
- The usual spattering of small fixes and minor improvements.
* tag 'irqchip-5.16': (912 commits)
h8300: Fix linux/irqchip.h include mess
dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
MIPS: irq: Avoid an unused-variable error
genirq: Hide irq_cpu_{on,off}line() behind a deprecated option
irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
MIPS: loongson64: Drop call to irq_cpu_offline()
irq: remove handle_domain_{irq,nmi}()
irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: riscv: perform irqentry in entry code
irq: openrisc: perform irqentry in entry code
irq: csky: perform irqentry in entry code
irq: arm64: perform irqentry in entry code
irq: arm: perform irqentry in entry code
irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: add generic_handle_arch_irq()
irq: unexport handle_irq_desc()
irq: simplify handle_domain_{irq,nmi}()
irq: mips: simplify do_domain_IRQ()
...
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211029083332.3680101-1-maz@kernel.org
|
|
Using per-cpu storage for @x86_cpu_to_logical_apicid is not optimal.
Broadcast IPI will need at least one cache line per cpu to access this
field.
__x2apic_send_IPI_mask() is using standard bitmask operators.
By converting x86_cpu_to_logical_apicid to an array, we divide by 16x
number of needed cache lines, because we find 16 values per cache
line. CPU prefetcher can kick nicely.
Also move @cluster_masks to READ_MOSTLY section to avoid false sharing.
Tested on a dual socket host with 256 cpus, cost for a full broadcast
is now 11 usec instead of 33 usec.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211007143556.574911-1-eric.dumazet@gmail.com
|
|
Commit 587164cd, introduced new opal message type (OPAL_MSG_PRD2) and
added opal notifier. But I missed to unregister the notifier during
module unload path. This results in below call trace if you try to
unload and load opal_prd module.
Also add new notifier_block for OPAL_MSG_PRD2 message.
Sample calltrace (modprobe -r opal_prd; modprobe opal_prd)
BUG: Unable to handle kernel data access on read at 0xc0080000192200e0
Faulting instruction address: 0xc00000000018d1cc
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G E 5.14.0prd #759
NIP: c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10
REGS: c0000003c4c0f0a0 TRAP: 0300 Tainted: G E (5.14.0prd)
MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24224824 XER: 20040000
CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1
...
NIP notifier_chain_register+0x2c/0xc0
LR atomic_notifier_chain_register+0x48/0x80
Call Trace:
0xc000000002090610 (unreliable)
atomic_notifier_chain_register+0x58/0x80
opal_message_notifier_register+0x7c/0x1e0
opal_prd_probe+0x84/0x150 [opal_prd]
platform_probe+0x78/0x130
really_probe+0x110/0x5d0
__driver_probe_device+0x17c/0x230
driver_probe_device+0x60/0x130
__driver_attach+0xfc/0x220
bus_for_each_dev+0xa8/0x130
driver_attach+0x34/0x50
bus_add_driver+0x1b0/0x300
driver_register+0x98/0x1a0
__platform_driver_register+0x38/0x50
opal_prd_driver_init+0x34/0x50 [opal_prd]
do_one_initcall+0x60/0x2d0
do_init_module+0x7c/0x320
load_module+0x3394/0x3650
__do_sys_finit_module+0xd4/0x160
system_call_exception+0x140/0x290
system_call_common+0xf4/0x258
Fixes: 587164cd593c ("powerpc/powernv: Add new opal message type")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028165716.41300-1-hegdevasant@linux.vnet.ibm.com
|
|
ARCH_SUPPORTS_DEBUG_PAGEALLOC
When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can
still select CONFIG_DEBUG_PAGEALLOC in which case __kernel_map_pages()
is provided by mm/page_poison.c
So only define __kernel_map_pages() when both
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC
are defined.
Fixes: 68b44f94d637 ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu
|
|
Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call, this is sub-optimal and doesn't
match the regular, compiler generated, code.
Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).
Update the x86_64 JIT to emit code similar to the result of compiler
and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags.
Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg),
while doing a THUNK call for RETPOLINE.
This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.614772675@infradead.org
|
|
Take an idea from the 32bit JIT, which uses the multi-pass nature of
the JIT to compute the instruction offsets on a prior pass in order to
compute the relative jump offsets on a later pass.
Application to the x86_64 JIT is slightly more involved because the
offsets depend on program variables (such as callee_regs_used and
stack_depth) and hence the computed offsets need to be kept in the
context of the JIT.
This removes, IMO quite fragile, code that hard-codes the offsets and
tries to compute the length of variable parts of it.
Convert both emit_bpf_tail_call_*() functions which have an out: label
at the end. Additionally emit_bpt_tail_call_direct() also has a poke
table entry, for which it computes the offset from the end (and thus
already relies on the previous pass to have computed addrs[i]), also
convert this to be a forward based offset.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.552304864@infradead.org
|
|
Currently Linux prevents usage of retpoline,amd on !AMD hardware, this
is unfriendly and gets in the way of testing. Remove this restriction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.487348118@infradead.org
|
|
Make sure we can see the text changes when booting with
'debug-alternative'.
Example output:
[ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
[ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
[ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
[ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.422273830@infradead.org
|
|
Try and replace retpoline thunk calls with:
LFENCE
CALL *%\reg
for spectre_v2=retpoline,amd.
Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.
Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.
Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:
Jncc.d8 1f
LFENCE
JMP *%\reg
1:
is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.359986601@infradead.org
|
|
Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:
Jcc __x86_indirect_thunk_\reg
For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:
Jncc.d8 1f
JMP *%\reg
NOP
1:
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.296470217@infradead.org
|
|
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.
This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.
One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.
Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
|
|
Stick all the retpolines in a single symbol and have the individual
thunks as inner labels, this should guarantee thunk order and layout.
Previously there were 16 (or rather 15 without rsp) separate symbols and
a toolchain might reasonably expect it could displace them however it
liked, with disregard for their relative position.
However, now they're part of a larger symbol. Any change to their
relative position would disrupt this larger _array symbol and thus not
be sound.
This is the same reasoning used for data symbols. On their own there
is no guarantee about their relative position wrt to one aonther, but
we're still able to do arrays because an array as a whole is a single
larger symbol.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.169659320@infradead.org
|
|
Because it makes no sense to split the retpoline gunk over multiple
headers.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.106290934@infradead.org
|
|
Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any
subsequent usage to start with #undef, which is rude.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.041792350@infradead.org
|
|
Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.978573921@infradead.org
|
|
Now that objtool no longer creates alternatives, these replacement
symbols are no longer needed, remove them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.915051744@infradead.org
|
|
Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.850007165@infradead.org
|
|
* irq/misc-5.16:
: .
: Misc irqchip fixes for 5.16:
: - MAINTAINERS update for the ARM VIC DT binding
: - Allow drivers using the IRQCHIP_PLATFORM_DRIVER_BEGIN/END
: infrastructure to use COMPILE_TEST without CONFIG_OF
: - DT updates
: - Detangle h8300 linux/irqchip.h inclusion
: .
h8300: Fix linux/irqchip.h include mess
dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
irqchip: Fix compile-testing without CONFIG_OF
MAINTAINERS: update arm,vic.yaml reference
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
h8300 drags linux/irqchip.h from asm/irq.h, which is in general a bad
idea (asm/*.h should avoid dragging linux/*.h, as it is usually supposed
to work the other way around).
Move the inclusion of linux/irqchip.h to the single location where it
actually matters in the arch code.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211028172849.GA701812@roeck-us.net
|
|
Align the FDINITRD line to the FDARGS one with tabs.
[ bp: Commit message. ]
Signed-off-by: Elyes HAOUAS <ehaouas@noos.fr>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028121021.10384-1-ehaouas@noos.fr
|
|
include/net/sock.h
7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable")
4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout")
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")
Adjacent code addition in both cases, keep both.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from WiFi (mac80211), and BPF.
Current release - regressions:
- skb_expand_head: adjust skb->truesize to fix socket memory
accounting
- mptcp: fix corrupt receiver key in MPC + data + checksum
Previous releases - regressions:
- multicast: calculate csum of looped-back and forwarded packets
- cgroup: fix memory leak caused by missing cgroup_bpf_offline
- cfg80211: fix management registrations locking, prevent list
corruption
- cfg80211: correct false positive in bridge/4addr mode check
- tcp_bpf: fix race in the tcp_bpf_send_verdict resulting in reusing
previous verdict
Previous releases - always broken:
- sctp: enhancements for the verification tag, prevent attackers from
killing SCTP sessions
- tipc: fix size validations for the MSG_CRYPTO type
- mac80211: mesh: fix HE operation element length check, prevent out
of bound access
- tls: fix sign of socket errors, prevent positive error codes being
reported from read()/write()
- cfg80211: scan: extend RCU protection in
cfg80211_add_nontrans_list()
- implement ->sock_is_readable() for UDP and AF_UNIX, fix poll() for
sockets in a BPF sockmap
- bpf: fix potential race in tail call compatibility check resulting
in two operations which would make the map incompatible succeeding
- bpf: prevent increasing bpf_jit_limit above max
- bpf: fix error usage of map_fd and fdget() in generic batch update
- phy: ethtool: lock the phy for consistency of results
- prevent infinite while loop in skb_tx_hash() when Tx races with
driver reconfiguring the queue <> traffic class mapping
- usbnet: fixes for bad HW conjured by syzbot
- xen: stop tx queues during live migration, prevent UAF
- net-sysfs: initialize uid and gid before calling
net_ns_get_ownership
- mlxsw: prevent Rx stalls under memory pressure"
* tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
Revert "net: hns3: fix pause config problem after autoneg disabled"
mptcp: fix corrupt receiver key in MPC + data + checksum
riscv, bpf: Fix potential NULL dereference
octeontx2-af: Fix possible null pointer dereference.
octeontx2-af: Display all enabled PF VF rsrc_alloc entries.
octeontx2-af: Check whether ipolicers exists
net: ethernet: microchip: lan743x: Fix skb allocation failure
net/tls: Fix flipped sign in async_wait.err assignment
net/tls: Fix flipped sign in tls_err_abort() calls
net/smc: Correct spelling mistake to TCPF_SYN_RECV
net/smc: Fix smc_link->llc_testlink_time overflow
nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
vmxnet3: do not stop tx queues after netif_device_detach()
r8169: Add device 10ec:8162 to driver r8169
ptp: Document the PTP_CLK_MAGIC ioctl number
usbnet: fix error return code in usbnet_probe()
net: hns3: adjust string spaces of some parameters of tx bd info in debugfs
net: hns3: expand buffer len for some debugfs command
net: hns3: add more string spaces for dumping packets number of queue info in debugfs
net: hns3: fix data endian problem of some functions of debugfs
...
|
|
The bpf_jit_binary_free() function requires a non-NULL argument. When
the RISC-V BPF JIT fails to converge in NR_JIT_ITERATIONS steps,
jit_data->header will be NULL, which triggers a NULL
dereference. Avoid this by checking the argument, prior calling the
function.
Fixes: ca6cb5447cec ("riscv, bpf: Factor common RISC-V JIT code")
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20211028125115.514587-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Xen interrupt injection for event channels relies on accessing the
guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
gfn_to_hva_cache.
This requires the srcu lock to be held, which is mostly the case except
for this code path:
[ 11.822877] WARNING: suspicious RCU usage
[ 11.822965] -----------------------------
[ 11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
[ 11.823131]
[ 11.823131] other info that might help us debug this:
[ 11.823131]
[ 11.823196]
[ 11.823196] rcu_scheduler_active = 2, debug_locks = 1
[ 11.823253] 1 lock held by dom:0/90:
[ 11.823292] #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
[ 11.823379]
[ 11.823379] stack backtrace:
[ 11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
[ 11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 11.823612] Call Trace:
[ 11.823645] dump_stack+0x7a/0xa5
[ 11.823681] lockdep_rcu_suspicious+0xc5/0x100
[ 11.823726] __kvm_xen_has_interrupt+0x179/0x190
[ 11.823773] kvm_cpu_has_extint+0x6d/0x90
[ 11.823813] kvm_cpu_accept_dm_intr+0xd/0x40
[ 11.823853] kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
< post_kvm_run_save() inlined here >
[ 11.823906] kvm_arch_vcpu_ioctl_run+0x135/0x6a0
[ 11.823947] kvm_vcpu_ioctl+0x263/0x680
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Merge a couple of KVM ppc patches we are keeping in a topic branch.
|