Age | Commit message (Collapse) | Author |
|
Convert cmpxchg128() usages to try_cmpxchg128() in order to generate
slightly better code.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With users converted to the standard arch_cmpxchg() variants, remove
the now unused __atomic_cmpxchg() and __atomic_cmpxchg_bool() variants.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use arch_try_cmpxchg() instead of __atomic_cmpxchg_bool() everywhere.
This generates the same code like before, but uses the standard
cmpxchg() implementation instead of a custom __atomic_cmpxchg_bool().
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use arch_try_cmpxchg() instead of __atomic_cmpxchg() in
preempt_count_set() to generate similar or better code,
depending in compiler features.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since gcc 14 flag output operands are supported also for s390.
Provide an arch_atomic try_cmpxchg() implementation so that all
existing atomic_try_cmpxchg() usages generate slightly better code,
if compiled with gcc 14 or newer.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use arch_cmpxchg() instead of __atomic_cmpxchg() for the
arch_atomic_cmpxchg() implementations. arch_cmpxchg() generates
the same code and doesn't need a cast like it is required for
arch_atomic64_cmpxchg().
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Convert the arch_atomic_xchg define to a C function so that proper
type checking is provided.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since gcc 14 flag output operands are supported also for s390.
Provide an arch_try_cmpxchg128() implementation so that all existing
try_cmpxchg128() variants provide slightly better code, if compiled
with gcc 14 or newer.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Just like x86 and arm64 provide a trivial arch_cmpxchg128_local()
implementation by mapping it to arch_cmpxchg128().
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Get rid of the arch_xchg() inline assemblies by converting the inline
assemblies to C functions which make use of arch_try_cmpxchg().
With flag output operand support the generated code is at least as good as
the previous version. Without it is slightly worse, however getting rid of
all the inline assembly code is worth it.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since gcc 14 flag output operands are supported also for s390.
Provide an arch_try_cmpxchg() implementation so that all existing
try_cmpxchg() variants provide slightly better code, if compiled
with gcc 14 or newer.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rewrite __cmpxchg() in order to get rid of the large inline
assemblies. Convert the one and two byte inline assemblies to
C functions.
The generated code of the new implementation is nearly as good or bad as
the old variant, but easier to read.
Note that the new variants are quite close to the generic cmpxchg_emu_u8()
implementation, however a conversion to the generic variant will not follow
since with mm/vmstat.c there is heavy user of one byte cmpxchg(). A not
inlined variant would have a negative performance impact.
Also note that the calls within __arch_cmpxchg() come with rather pointless
"& 0xff..." operations. They exist only to avoid false positive sparse
warnings like "warning: cast truncates bits from constant value ...".
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Dump tools from s390-tools such as zipl need to know the correct dump area
size of the machine they run on in order to be able to create valid
standalone dumper images. Therefore, allow it to be obtained through
the new sysfs read-only attribute /sys/firmware/dump/dump_area_size.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Remove the use of contractions and use proper punctuation in #error
directive messages that discourage the direct inclusion of header files.
Link: https://lkml.kernel.org/r/20241105032231.28833-1-natanielfarzan@gmail.com
Signed-off-by: Nataniel Farzan <natanielfarzan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add gen17 facilities and let KVM_CAP_S390_VECTOR_REGISTERS handle
the enablement of the vector extension facilities.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-4-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-4-brueckner@linux.ibm.com>
|
|
Message-security-assist 11 introduces pckmo subfunctions to encrypt
hmac keys.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-3-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-3-brueckner@linux.ibm.com>
|
|
Adding support for concurrent-functions facility which provides
additional subfunctions.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-2-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-2-brueckner@linux.ibm.com>
|
|
KVM/riscv changes for 6.13
- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side
|
|
Add an API that will allow updates of the direct/linear map for a set of
physically contiguous pages.
It will be used in the following patches.
Link: https://lkml.kernel.org/r/20241023162711.2579610-6-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While pci_iov.h has include guards it doesn't include the necessary
header for struct zpci_dev, pci_bus.h on the other hand lacks even basic
include guards. This isn't only fragile and breaks convention but also
confuses tooling such as clang-analyzer. Add both include guards and the
necessary includes.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add a trival phys_to_target_node() implementation which always returns 0 if
CONFIG_NUMA is enabled, since the s390 NUMA implementation only supports
node 0.
This is similar to memory_add_physaddr_to_nid() in order to avoid runtime
warnings.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
virtio-mem uses memory_add_physaddr_to_nid() to determine the NID to use
for memory it adds.
We currently fallback to the dummy implementation in mm/numa.c with
CONFIG_NUMA, which will end up triggering an undesired pr_info_once():
Unknown online node for memory at 0x100000000, assuming node 0
On s390, we map all cpus and memory to node 0, so let's add a simple
memory_add_physaddr_to_nid() implementation that does exactly that,
but without complaining.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-8-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.")
we've been using a section size of 256 MiB on s390 and 32 MiB on s390.
Before that, we were using a section size of 32 MiB on both
architectures.
Now that we have a new mechanism to expose additional memory to a VM --
virtio-mem -- reduce the section size to 128 MiB to allow for more
flexibility and reduce the metadata overhead when dealing with hot(un)plug
granularity smaller than 256 MiB.
128 MiB has been used by x86-64 since the very beginning. arm64 with 4k
base pages switched to 128 MiB as well: it's just big enough on these
architectures to allows for using a huge page (2 MiB) in the vmemmap in
sane setups with sizeof(struct page) == 64 bytes and a huge page mapping
in the direct mapping, while still allowing for small hot(un)plug
granularity.
For s390, we could even switch to a 64 MiB section size, as our huge page
size is 1 MiB: but the smaller the section size, the more sections we'll
have to manage especially on bigger machines. Making it consistent with
x86-64 and arm64 feels like the right thing for now.
Note that the smallest memory hot(un)plug granularity is also limited by
the memory block size, determined by extracting the memory increment
size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;
therefore, we'll end up with a memory block size of 128 MiB with a
128 MiB section size.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-7-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
devices
To support memory devices under QEMU/KVM, such as virtio-mem,
we have to prepare our kernel virtual address space accordingly and
have to know the highest possible physical memory address we might see
later: the storage limit. The good old SCLP interface is not suitable for
this use case.
In particular, memory owned by memory devices has no relationship to
storage increments, it is always detected using the device driver, and
unaware OSes (no driver) must never try making use of that memory.
Consequently this memory is located outside of the "maximum storage
increment"-indicated memory range.
Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
limit that can exceed the "maximum storage increment", and use the
existing interfaces (i.e., SCLP) to obtain information about the initial
memory that is not owned+managed by memory devices.
If a hypervisor does not support such memory devices, the address exposed
through diag500 STORAGE_LIMIT will correspond to the maximum storage
increment exposed through SCLP.
To teach kdump on s390 to include memory owned by memory devices, there
will be ways to query the relevant memory ranges from the device via a
driver running in special kdump mode (like virtio-mem already implements
to filter /proc/vmcore access so we don't end up reading from unplugged
device blocks).
Update setup_ident_map_size(), to clarify that there can be more than
just online and standby memory.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241025141453.1210600-4-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The program interrupt code has some extra bits that are sometimes set
by hardware for various reasons; those bits should be ignored when the
program interrupt number is needed for interrupt handling.
Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241031120316.25462-1-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The hugepage parameter was deprecated since commit ddc1a5cbc05d
("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), for
PMD-sized THP, it still tries only preferred node if possible in
vma_alloc_folio() by checking the order of the folio allocation.
Link: https://lkml.kernel.org/r/20241010061556.1846751-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
prepare_hugepage_range() performs almost the same checks for all
architectures that define it, with the exception of mips and loongarch
that also check for overflows.
The rest checks for the addr and len to be properly aligned, so we can
move that to hugetlb_get_unmapped_area() and get rid of a fair amount of
duplicated code.
[akpm@linux-foundation.org: remove now-unused local]
Link: https://lore.kernel.org/oe-kbuild-all/202410081210.uNLbf3Jk-lkp@intel.com/
Link: https://lkml.kernel.org/r/20241007075037.267650-10-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
s390 redefines functions that are already defined (and the same) in
include/asm-generic/hugetlb.h.
Do as the other architectures:
1) include include/asm-generic/hugetlb.h
2) drop the already defined functions in the generic hugetlb.h and
3) use the __HAVE_ARCH_HUGE_* macros to define our own.
This gets rid of quite some code.
Link: https://lkml.kernel.org/r/20241007075037.267650-9-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Hugetlb mappings are now handled through normal channels just like any
other mapping, so we no longer need hugetlb_get_unmapped_area* specific
functions.
Link: https://lkml.kernel.org/r/20241007075037.267650-8-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We want to stop special casing hugetlb mappings and make them go through
generic channels, so teach arch_get_unmapped_area{_topdown} to handle
those.
s390 specific hugetlb function does not set info.align_offset, so do the
same here for compatibility.
Link: https://lkml.kernel.org/r/20241007075037.267650-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and
removexattrat(). Those can be used to operate on extended attributes,
especially security related ones, either relative to a pinned directory
or on a file descriptor without read access, avoiding a
/proc/<pid>/fd/<fd> detour, requiring a mounted procfs.
One use case will be setfiles(8) setting SELinux file contexts
("security.selinux") without race conditions and without a file
descriptor opened with read access requiring SELinux read permission.
Use the do_{name}at() pattern from fs/open.c.
Pass the value of the extended attribute, its length, and for
setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added
struct xattr_args to not exceed six syscall arguments and not
merging the AT_* and XATTR_* flags.
[AV: fixes by Christian Brauner folded in, the entire thing rebased on
top of {filename,file}_...xattr() primitives, treatment of empty
pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling
is cheap, so f...(2) can use it]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
CC: x86@kernel.org
CC: linux-alpha@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
CC: linux-ia64@vger.kernel.org
CC: linux-m68k@lists.linux-m68k.org
CC: linux-mips@vger.kernel.org
CC: linux-parisc@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-s390@vger.kernel.org
CC: linux-sh@vger.kernel.org
CC: sparclinux@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: audit@vger.kernel.org
CC: linux-arch@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: linux-security-module@vger.kernel.org
CC: selinux@vger.kernel.org
[brauner: slight tweaks]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The struct arch_vdso_data is only about vdso time data. So rename it to
arch_vdso_time_data to make it obvious.
Non time-related data will be migrated out of these structs soon.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-28-b64f0842d512@linutronix.de
|
|
This constant is always "0", providing no value and making the logic
harder to understand.
Also prepare for a consolidation of the vdso linkerscript logic by
aligning it with other architectures.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-3-b64f0842d512@linutronix.de
|
|
The flags variable was being used uninitialized.
Initialize it to 0 as expected.
For some reason neither gcc nor clang reported a warning.
Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Reported-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20241030161906.85476-1-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Reservation of the PMU hardware is done at first event creation
and is protected by a pair of mutex_lock() and mutex_unlock().
After reservation of the PMU hardware the memory
required for the PMUs the event is to be installed on is
allocated by allocate_buffers() and alloc_sampling_buffer().
This done outside of the mutex protection.
Without mutex protection two or more concurrent invocations of
perf_event_init() may run in parallel.
This can lead to allocation of Sample Data Blocks (SDBs)
multiple times for the same PMU.
Prevent this and protect memory allocation of SDBs by
mutex.
Fixes: 8a6fe8f21ec4 ("s390/cpum_sf: Use refcount_t instead of atomic_t")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add a small PtP driver which allows user space to get
the values of the physical and tod clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To allow specifying the clock source in the upcoming PtP driver,
add a clocksource ID to the s390 TOD clock.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Combine the two VM_FAULT_ERROR checks in do_exception() and move them
to the exit path, similar to x86. Also remove a random blank line.
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With the gmap code gone s390 can be easily converted to
LOCK_MM_AND_FIND_VMA like it has been done for most other
architectures.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-12-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With GMAP_FAULT fault type gone, there are only KERNEL_FAULT and
USER_FAULT fault types left. Therefore there is no need for any fault
type switch statements left.
Rename get_fault_type() into is_kernel_fault() and let it return a
boolean value. Change all switch statements to if statements. This
removes quite a bit of code.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-11-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With the gmap code gone get_fault_type() can be simplified:
- every fault with user_mode(regs) == true must be a fault in user address
space
- every fault with user_mode(regs) == false is only a fault in user
address space if the used address space is the secondary address space
- every other fault is within the kernel address space
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-10-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Remove the gmap pointer from lowcore, since it is not used anymore.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-9-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Remove gmap_enable(), gmap_disable(), and gmap_get_enabled() since they do
not have any users anymore.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-8-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Stop using gmap_enable(), gmap_disable(), gmap_get_enabled().
The correct guest ASCE is passed as a parameter of sie64a(), there is
no need to save the current gmap in lowcore.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-7-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Any program interrupt that happens in the host during the execution of
a KVM guest will now short circuit the fault handler and return to KVM
immediately. Guest fault handling (including pfault) will happen
entirely inside KVM.
When sie64a() returns zero, current->thread.gmap_int_code will contain
the program interrupt number that caused the exit, or zero if the exit
was not caused by a host program interrupt.
KVM will now take care of handling all guest faults in vcpu_post_run().
Since gmap faults will not be visible by the rest of the kernel, remove
GMAP_FAULT, the linux fault handlers for secure execution faults, the
exception table entries for the sie instruction, the nop padding after
the sie instruction, and all other references to guest faults from the
s390 code.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-6-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Errors in fixup_user_fault() were masked and -EFAULT was returned for
any error, including out of memory.
Fix this by returning the correct error code. This means that in many
cases the error code will be propagated all the way to userspace.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-5-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When specifying FAULT_FLAG_RETRY_NOWAIT as flag for gmap_fault(), the
gmap fault will be processed only if it can be resolved quickly and
without sleeping. This will be needed for pfault.
Refactor gmap_fault() to improve readability.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-4-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
kvm_arch_fault_in_page() is a useless wrapper around gmap_fault(); just
use gmap_fault() directly instead.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-3-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Now that the guest ASCE is passed as a parameter to __sie64a(),
_PIF_GUEST_FAULT can be used again to determine whether the fault was a
guest or host fault.
Since the guest ASCE will not be taken from the gmap pointer in lowcore
anymore, __GMAP_ASCE can be removed. For the same reason the guest
ASCE needs now to be saved into the cr1 save area unconditionally.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20241022120601.167009-2-imbrenda@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|