Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Borislav Petkov:
- Correct minor issues after the microcode revision reporting
sanitization
* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Set new revision only after a successful update
x86/microcode/intel: Remove redundant microcode late updated message
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains the work to retrieve detailed information about mounts
via two new system calls. This is hopefully the beginning of the end
of the saga that started with fsinfo() years ago.
The LWN articles in [1] and [2] can serve as a summary so we can avoid
rehashing everything here.
At LSFMM in May 2022 we got into a room and agreed on what we want to
do about fsinfo(). Basically, split it into pieces. This is the first
part of that agreement. Specifically, it is concerned with retrieving
information about mounts. So this only concerns the mount information
retrieval, not the mount table change notification, or the extended
filesystem specific mount option work. That is separate work.
Currently mounts have a 32bit id. Mount ids are already in heavy use
by libmount and other low-level userspace but they can't be relied
upon because they're recycled very quickly. We agreed that mounts
should carry a unique 64bit id by which they can be referenced
directly. This is now implemented as part of this work.
The new 64bit mount id is exposed in statx() through the new
STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
returned. If it is raised and the kernel supports the new 64bit mount
id the flag is raised in the result mask and the new 64bit mount id is
returned. New and old mount ids do not overlap so they cannot be
conflated.
Two new system calls are introduced that operate on the 64bit mount
id: statmount() and listmount(). A summary of the api and usage can be
found on LWN as well (cf. [3]) but of course, I'll provide a summary
here as well.
Both system calls rely on struct mnt_id_req. Which is the request
struct used to pass the 64bit mount id identifying the mount to
operate on. It is extensible to allow for the addition of new
parameters and for future use in other apis that make use of mount
ids.
statmount() mimicks the semantics of statx() and exposes a set flags
that userspace may raise in mnt_id_req to request specific information
to be retrieved. A statmount() call returns a struct statmount filled
in with information about the requested mount. Supported requests are
indicated by raising the request flag passed in struct mnt_id_req in
the @mask argument in struct statmount.
Currently we do support:
- STATMOUNT_SB_BASIC:
Basic filesystem info
- STATMOUNT_MNT_BASIC
Mount information (mount id, parent mount id, mount attributes etc)
- STATMOUNT_PROPAGATE_FROM
Propagation from what mount in current namespace
- STATMOUNT_MNT_ROOT
Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)
- STATMOUNT_MNT_POINT
Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)
- STATMOUNT_FS_TYPE
Name of the filesystem type as the magic number isn't enough due to submounts
The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
are appended to the end of the struct. Userspace can use the offsets
in @fs_type, @mnt_root, and @mnt_point to reference those strings
easily.
The struct statmount reserves quite a bit of space currently for
future extensibility. This isn't really a problem and if this bothers
us we can just send a follow-up pull request during this cycle.
listmount() is given a 64bit mount id via mnt_id_req just as
statmount(). It takes a buffer and a size to return an array of the
64bit ids of the child mounts of the requested mount. Userspace can
thus choose to either retrieve child mounts for a mount in batches or
iterate through the child mounts. For most use-cases it will be
sufficient to just leave space for a few child mounts. But for big
mount tables having an iterator is really helpful. Iterating through a
mount table works by setting @param in mnt_id_req to the mount id of
the last child mount retrieved in the previous listmount() call"
Link: https://lwn.net/Articles/934469 [1]
Link: https://lwn.net/Articles/829212 [2]
Link: https://lwn.net/Articles/950569 [3]
* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
add selftest for statmount/listmount
fs: keep struct mnt_id_req extensible
wire up syscalls for statmount/listmount
add listmount(2) syscall
statmount: simplify string option retrieval
statmount: simplify numeric option retrieval
add statmount(2) syscall
namespace: extract show_path() helper
mounts: keep list of mounts in an rbtree
add unique mount ID
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Add Jan Kara as VFS reviewer
- Show correct device and inode numbers in proc/<pid>/maps for vma
files on stacked filesystems. This is now easily doable thanks to
the backing file work from the last cycles. This comes with
selftests
Cleanups:
- Remove a redundant might_sleep() from wait_on_inode()
- Initialize pointer with NULL, not 0
- Clarify comment on access_override_creds()
- Rework and simplify eventfd_signal() and eventfd_signal_mask()
helpers
- Process aio completions in batches to avoid needless wakeups
- Completely decouple struct mnt_idmap from namespaces. We now only
keep the actual idmapping around and don't stash references to
namespaces
- Reformat maintainer entries to indicate that a given subsystem
belongs to fs/
- Simplify fput() for files that were never opened
- Get rid of various pointless file helpers
- Rename various file helpers
- Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
last cycle
- Make relatime_need_update() return bool
- Use GFP_KERNEL instead of GFP_USER when allocating superblocks
- Replace deprecated ida_simple_*() calls with their current ida_*()
counterparts
Fixes:
- Fix comments on user namespace id mapping helpers. They aren't
kernel doc comments so they shouldn't be using /**
- s/Retuns/Returns/g in various places
- Add missing parameter documentation on can_move_mount_beneath()
- Rename i_mapping->private_data to i_mapping->i_private_data
- Fix a false-positive lockdep warning in pipe_write() for watch
queues
- Improve __fget_files_rcu() code generation to improve performance
- Only notify writer that pipe resizing has finished after setting
pipe->max_usage otherwise writers are never notified that the pipe
has been resized and hang
- Fix some kernel docs in hfsplus
- s/passs/pass/g in various places
- Fix kernel docs in ntfs
- Fix kcalloc() arguments order reported by gcc 14
- Fix uninitialized value in reiserfs"
* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
reiserfs: fix uninit-value in comp_keys
watch_queue: fix kcalloc() arguments order
ntfs: dir.c: fix kernel-doc function parameter warnings
fs: fix doc comment typo fs tree wide
selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
fs/proc: show correct device and inode numbers in /proc/pid/maps
eventfd: Remove usage of the deprecated ida_simple_xx() API
fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
fs/hfsplus: wrapper.c: fix kernel-doc warnings
fs: add Jan Kara as reviewer
fs/inode: Make relatime_need_update return bool
pipe: wakeup wr_wait after setting max_usage
file: remove __receive_fd()
file: stop exposing receive_fd_user()
fs: replace f_rcuhead with f_task_work
file: remove pointless wrapper
file: s/close_fd_get_file()/file_close_fd()/g
Improve __fget_files_rcu() code generation (and thus __fget_light())
file: massage cleanup of files that failed to open
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
...
|
|
for the v6.8 merge window
This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-01-05
We've added 40 non-merge commits during the last 2 day(s) which contain
a total of 73 files changed, 1526 insertions(+), 951 deletions(-).
The main changes are:
1) Fix a memory leak when streaming AF_UNIX sockets were inserted
into multiple sockmap slots/maps, from John Fastabend.
2) Fix gotol in s390 BPF JIT with large offsets, from Ilya Leoshkevich.
3) Fix reattachment branch in bpf_tracing_prog_attach() and reject
the request if there is no valid attach_btf, from Jiri Olsa.
4) Remove deprecated bpfilter kernel leftovers given the project
is developed in user space (https://github.com/facebook/bpfilter),
from Quentin Deslandes.
5) Relax tracing BPF program recursive attach rules given right now
it is not possible to create tracing program call cycles,
from Dmitrii Dolgov.
6) Fix excessive memory consumption for the bpf_global_percpu_ma
for systems with a large number of CPUs, from Yonghong Song.
7) Small x86 BPF JIT cleanup to reuse emit_nops instead of open-coding
memcpy of x86_nops, from Leon Hwang.
8) Follow-up for libbpf to support __arg_ctx global function argument tag
semantics to complement the merged kernel side, from Andrii Nakryiko.
9) Introduce "volatile compare" macros for BPF selftests in order
to make the latter more robust against compiler optimization,
from Alexei Starovoitov.
10) Small simplification in verifier's size checking of helper accesses
along with additional selftests, from Andrei Matei.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (40 commits)
selftests/bpf: Test re-attachment fix for bpf_tracing_prog_attach
bpf: Fix re-attachment branch in bpf_tracing_prog_attach
selftests/bpf: Add test for recursive attachment of tracing progs
bpf: Relax tracing prog recursive attach rules
bpf, x86: Use emit_nops to replace memcpy x86_nops
selftests/bpf: Test gotol with large offsets
selftests/bpf: Double the size of test_loader log
s390/bpf: Fix gotol with large offsets
bpfilter: remove bpfilter
bpf: Remove unnecessary cpu == 0 check in memalloc
selftests/bpf: add __arg_ctx BTF rewrite test
selftests/bpf: add arg:ctx cases to test_global_funcs tests
libbpf: implement __arg_ctx fallback logic
libbpf: move BTF loading step after relocation step
libbpf: move exception callbacks assignment logic into relocation step
libbpf: use stable map placeholder FDs
libbpf: don't rely on map->fd as an indicator of map being created
libbpf: use explicit map reuse flag to skip map creation steps
libbpf: make uniform use of btf__fd() accessor inside libbpf
selftests/bpf: Add a selftest with > 512-byte percpu allocation size
...
====================
Link: https://lore.kernel.org/r/20240105170105.21070-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 84fc86360623 ("ARM: make ARCH_MULTIPLATFORM user-visible") modified
DEBUG_UNCOMPRESS to prevent using it with multiplatform kernels.
Update the help text, remove references to multiplatform.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc mm fixes from Andrew Morton:
"12 hotfixes.
Two are cc:stable and the remainder either address post-6.7 issues or
aren't considered necessary for earlier kernel versions"
* tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()
mailmap: add entries for Mathieu Othacehe
MAINTAINERS: change vmware.com addresses to broadcom.com
arch/mm/fault: fix major fault accounting when retrying under per-VMA lock
mm/mglru: skip special VMAs in lru_gen_look_around()
MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin
MAINTAINERS: remove hugetlb maintainer Mike Kravetz
mm: fix unmap_mapping_range high bits shift bug
mm: memcg: fix split queue list crash when large folio migration
mm: fix arithmetic for max_prop_frac when setting max_ratio
mm: fix arithmetic for bdi min_ratio
mm: align larger anonymous mappings on THP boundaries
|
|
Use SZ_1M macro instead of hardcoded 1<<20 to make code more readable.
Link: https://lkml.kernel.org/r/20240102144905.110047-3-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "crash: Some cleanups and fixes", v2.
This patchset includes two cleanups and one fix.
This patch (of 3):
The image parameter is no longer in use, remove it. Also, tidy up the
code formatting.
Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com
Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add dummy pmd_dirty() for architectures that don't provide it.
This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young()
for architectures not having it").
Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Donet Tom <donettom@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/mglru: Kconfig cleanup", v4.
This series is the result of the following discussion:
https://lore.kernel.org/47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com/
It mainly avoids building the code that walks page tables on CPUs that
use it, i.e., those don't support hardware accessed bit. Specifically,
it introduces a new Kconfig to guard some of functions added by
commit bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
on CPUs like POWER9, on which the series was tested.
This patch (of 5):
Some architectures are able to set the accessed bit in PTEs when PTEs
are used as part of linear address translations.
Add CONFIG_ARCH_HAS_HW_PTE_YOUNG for such architectures to be able to
override arch_has_hw_pte_young().
Link: https://lkml.kernel.org/r/20231227141205.2200125-1-kinseyho@google.com
Link: https://lkml.kernel.org/r/20231227141205.2200125-2-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are two correctness fixes for handing DT input in the
Allwinner (sunxi) SMP startup code"
* tag 'soc-fixes-6.7-3a' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: sun9i: smp: fix return code check of of_property_match_string
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
|
|
Pull kvm fix from Paolo Bonzini:
- Fix boolean logic in intel_guest_get_msrs
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull kprobes/x86 fix from Masami Hiramatsu:
- Fix to emulate indirect call which size is not 5 byte.
Current code expects the indirect call instructions are 5 bytes, but
that is incorrect. Usually indirect call based on register is shorter
than that, thus the emulation causes a kernel crash by accessing
wrong instruction boundary. This uses the instruction size to
calculate the return address correctly.
* tag 'probes-fixes-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into soc/dt
SoCFPGA DTS updates for v6.8
- Fix dtbs_check warnings for nand, usb, FPGA firmware, and pin-controller
- Clean up of DTS for Agilex5
* tag 'socfpga_dts_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
arm64: dts: intel: minor whitespace cleanup around '='
arm64: dts: socfpga: agilex: drop redundant status
arm64: dts: socfpga: agilex: add unit address to soc node
arm64: dts: socfpga: agilex: move firmware out of soc node
arm64: dts: socfpga: agilex: move FPGA region out of soc node
arm64: dts: socfpga: agilex: align pin-controller name with bindings
arm64: dts: socfpga: stratix10_swvp: drop unsupported DW MSHC properties
arm64: dts: socfpga: stratix10_socdk: align NAND chip name with bindings
arm64: dts: socfpga: stratix10: add unit address to soc node
arm64: dts: socfpga: stratix10: move firmware out of soc node
arm64: dts: socfpga: stratix10: move FPGA region out of soc node
arm64: dts: socfpga: stratix10: align pincfg nodes with bindings
arm64: dts: socfpga: stratix10: add clock-names to DWC2 USB
arm64: dts: socfpga: drop unsupported cdns,page-size and cdns,block-size
ARM: dts: socfpga: align NAND controller name with bindings
ARM: dts: socfpga: drop unsupported cdns,page-size and cdns,block-size
Link: https://lore.kernel.org/r/20240104001354.152410-1-dinguyen@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Move emit_nops() before emit_prologue() and replace
memcpy(prog, x86_nops[5], X86_PATCH_SIZE) with emit_nops(&prog, X86_PATCH_SIZE).
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20240104142226.87869-2-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`")
ended up improving the code generation for the IP csum calculations, and
in particular special-casing the 40-byte case that is a hot case for
IPv6 headers.
It then had _another_ special case for the 64-byte unrolled loop, which
did two chains of 32-byte blocks, which allows modern CPU's to improve
performance by doing the chains in parallel thanks to renaming the carry
flag.
This just unifies the special cases and combines them into just one
single helper the 40-byte csum case, and replaces the 64-byte case by a
80-byte case that just does that single helper twice. It avoids having
all these different versions of inline assembly, and actually improved
performance further in my tests.
There was never anything magical about the 64-byte unrolled case, even
though it happens to be a common size (and typically is the cacheline
size).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The special case for odd aligned buffers is unnecessary and mostly
just adds overhead. Aligned buffers is the expectations, and even for
unaligned buffer, the only case that was helped is if the buffer was
1-byte from word aligned which is ~1/7 of the cases. Overall it seems
highly unlikely to be worth to extra branch.
It was left in the previous perf improvement patch because I was
erroneously comparing the exact output of `csum_partial(...)`, but
really we only need `csum_fold(csum_partial(...))` to match so its
safe to remove.
All csum kunit tests pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Laight <david.laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The gotol implementation uses a wrong data type for the offset: it
should be s32, not s16.
Fixes: c690191e23d8 ("s390/bpf: Implement unconditional jump with 32-bit offset")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20240102193531.3169422-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpfilter was supposed to convert iptables filtering rules into
BPF programs on the fly, from the kernel, through a usermode
helper. The base code for the UMH was introduced in 2018, and
couple of attempts (2, 3) tried to introduce the BPF program
generate features but were abandoned.
bpfilter now sits in a kernel tree unused and unusable, occasionally
causing confusion amongst Linux users (4, 5).
As bpfilter is now developed in a dedicated repository on GitHub (6),
it was suggested a couple of times this year (LSFMM/BPF 2023,
LPC 2023) to remove the deprecated kernel part of the project. This
is the purpose of this patch.
[1]: https://lore.kernel.org/lkml/20180522022230.2492505-1-ast@kernel.org/
[2]: https://lore.kernel.org/bpf/20210829183608.2297877-1-me@ubique.spb.ru/#t
[3]: https://lore.kernel.org/lkml/20221224000402.476079-1-qde@naccy.de/
[4]: https://dxuuu.xyz/bpfilter.html
[5]: https://github.com/linuxkit/linuxkit/pull/3904
[6]: https://github.com/facebook/bpfilter
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Link: https://lore.kernel.org/r/20231226130745.465988-1-qde@naccy.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When commit c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE
MSR emulation for extended PEBS") switched the initialization of
cpuc->guest_switch_msrs to use compound literals, it screwed up
the boolean logic:
+ u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
...
- arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
- arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
+ .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
Before the patch, the value of arr[0].guest would have been intel_ctrl &
~cpuc->intel_ctrl_host_mask & ~pebs_mask. The intent is to always treat
PEBS events as host-only because, while the guest runs, there is no way
to tell the processor about the virtual address where to put PEBS records
intended for the host.
Unfortunately, the new expression can be expanded to
(intel_ctrl & ~cpuc->intel_ctrl_host_mask) | (intel_ctrl & ~pebs_mask)
which makes no sense; it includes any bit that isn't *both* marked as
exclude_guest and using PEBS. So, reinstate the old logic. Another
way to write it could be "intel_ctrl & ~(cpuc->intel_ctrl_host_mask |
pebs_mask)", presumably the intention of the author of the faulty.
However, I personally find the repeated application of A AND NOT B to
be a bit more readable.
This shows up as guest failures when running concurrent long-running
perf workloads on the host, and was reported to happen with rcutorture.
All guests on a given host would die simultaneously with something like an
instruction fault or a segmentation violation.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Analyzed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings()
can be updated to use arm64_kernel_unmapped_at_el0() instead of checking
the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly.
* for-next/fixes:
arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
perf/arm-cmn: Fail DTC counter allocation correctly
arm64: Avoid enabling KPTI unnecessarily
|
|
* for-next/sysregs:
arm64/sysreg: Add missing system instruction definitions for FGT
arm64/sysreg: Add missing system register definitions for FGT
arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definition for FPMR
arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64PFR2_EL1
arm64/sysreg: update CPACR_EL1 register
arm64/sysreg: add system register POR_EL{0,1}
arm64/sysreg: Add definition for HAFGRTR_EL2
arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
|
|
* for-next/stacktrace:
arm64: stacktrace: factor out kunwind_stack_walk()
arm64: stacktrace: factor out kernel unwind state
|
|
* for-next/rip-vpipt:
arm64: Rename reserved values for CTR_EL0.L1Ip
arm64: Kill detection of VPIPT i-cache policy
KVM: arm64: Remove VPIPT I-cache handling
|
|
* for-next/perf: (30 commits)
arm: perf: Fix ARCH=arm build with GCC
MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
drivers/perf: add DesignWare PCIe PMU driver
PCI: Move pci_clear_and_set_dword() helper to PCI header
PCI: Add Alibaba Vendor ID to linux/pci_ids.h
docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
Documentation: arm64: Document the PMU event counting threshold feature
arm64: perf: Add support for event counting threshold
arm: pmu: Move error message and -EOPNOTSUPP to individual PMUs
KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h
perf/arm_dmc620: Remove duplicate format attribute #defines
arm: pmu: Share user ABI format mechanism with SPE
arm64: perf: Include threshold control fields in PMEVTYPER mask
arm: perf: Convert remaining fields to use GENMASK
arm: perf: Use GENMASK for PMMIR fields
arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_N
arm: perf: Remove inlines from arm_pmuv3.c
drivers/perf: arm_dsu_pmu: Remove kerneldoc-style comment syntax
drivers/perf: Remove usage of the deprecated ida_simple_xx() API
...
|
|
* for-next/mm:
arm64: irq: set the correct node for shadow call stack
arm64: irq: set the correct node for VMAP stack
|
|
* for-next/misc:
arm64: memory: remove duplicated include
arm64: Delete the zero_za macro
Documentation/arch/arm64: Fix typo
|
|
* for-next/lpa2-prep:
arm64: mm: get rid of kimage_vaddr global variable
arm64: mm: Take potential load offset into account when KASLR is off
arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
arm64: Add ARM64_HAS_LPA2 CPU capability
arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
arm64/mm: Update tlb invalidation routines for FEAT_LPA2
arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
arm64/mm: Modify range-based tlbi to decrement scale
|
|
* for-next/kbuild:
efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
arm64: properly install vmlinuz.efi
arm64: replace <asm-generic/export.h> with <linux/export.h>
arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
|
|
* for-next/fpsimd:
arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
arm64: fpsimd: Drop unneeded 'busy' flag
|
|
* for-next/early-idreg-overrides:
arm64/kernel: Move 'nokaslr' parsing out of early idreg code
arm64: idreg-override: Avoid kstrtou64() to parse a single hex digit
arm64: idreg-override: Avoid sprintf() for simple string concatenation
arm64: idreg-override: avoid strlen() to check for empty strings
arm64: idreg-override: Avoid parameq() and parameqn()
arm64: idreg-override: Prepare for place relative reloc patching
arm64: idreg-override: Omit non-NULL checks for override pointer
|
|
When running the instruction decoder selftest with LLVM=1 and
CONFIG_PVH=y, there is a series of warnings:
arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
arch/x86/tools/insn_decoder_test: warning: ffffffff81000050 ea <unknown>
arch/x86/tools/insn_decoder_test: warning: objdump says 1 bytes, but insn_get_length() says 7
arch/x86/tools/insn_decoder_test: warning: Decoded and checked 7214721 instructions with 1 failures
GNU objdump outputs "(bad)" instead of "<unknown>", which is already
handled in the bad_expr regex, so there is no warning.
$ objdump -d arch/x86/platform/pvh/head.o | grep -E '50:\s+ea'
50: ea (bad)
$ llvm-objdump -d arch/x86/platform/pvh/head.o | grep -E '50:\s+ea'
50: ea <unknown>
Add "<unknown>" to the bad_expr regex to clear up the warning, allowing
the instruction decoder selftest to fully pass with llvm-objdump.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231205-objdump_reformat-awk-handle-llvm-objdump-bad_expr-v1-1-b4a74f39396f@kernel.org
|
|
kprobe_emulate_call_indirect
kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land onto the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:
[ 7.308204][ C1] BUG: unable to handle page fault for address: 000000000002b4d8
[ 7.308883][ C1] #PF: supervisor read access in kernel mode
[ 7.309168][ C1] #PF: error_code(0x0000) - not-present page
[ 7.309461][ C1] PGD 0 P4D 0
[ 7.309652][ C1] Oops: 0000 [#1] SMP
[ 7.309929][ C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
[ 7.310397][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 7.311068][ C1] RIP: 0010:__common_interrupt+0x52/0xc0
[ 7.311349][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[ 7.312512][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[ 7.312899][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[ 7.313334][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[ 7.313702][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[ 7.314146][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[ 7.314509][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[ 7.314951][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 7.315396][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.315691][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[ 7.316153][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.316508][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.316948][ C1] Call Trace:
[ 7.317123][ C1] <IRQ>
[ 7.317279][ C1] ? __die_body+0x64/0xb0
[ 7.317482][ C1] ? page_fault_oops+0x248/0x370
[ 7.317712][ C1] ? __wake_up+0x96/0xb0
[ 7.317964][ C1] ? exc_page_fault+0x62/0x130
[ 7.318211][ C1] ? asm_exc_page_fault+0x22/0x30
[ 7.318444][ C1] ? __cfi_native_send_call_func_single_ipi+0x10/0x10
[ 7.318860][ C1] ? default_idle+0xb/0x10
[ 7.319063][ C1] ? __common_interrupt+0x52/0xc0
[ 7.319330][ C1] common_interrupt+0x78/0x90
[ 7.319546][ C1] </IRQ>
[ 7.319679][ C1] <TASK>
[ 7.319854][ C1] asm_common_interrupt+0x22/0x40
[ 7.320082][ C1] RIP: 0010:default_idle+0xb/0x10
[ 7.320309][ C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
[ 7.321449][ C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
[ 7.321808][ C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
[ 7.322227][ C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
[ 7.322656][ C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
[ 7.323083][ C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
[ 7.323530][ C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
[ 7.323948][ C1] ? __cfi_lapic_next_deadline+0x10/0x10
[ 7.324239][ C1] default_idle_call+0x31/0x50
[ 7.324464][ C1] do_idle+0xd3/0x240
[ 7.324690][ C1] cpu_startup_entry+0x25/0x30
[ 7.324983][ C1] start_secondary+0xb4/0xc0
[ 7.325217][ C1] secondary_startup_64_no_verify+0x179/0x17b
[ 7.325498][ C1] </TASK>
[ 7.325641][ C1] Modules linked in:
[ 7.325906][ C1] CR2: 000000000002b4d8
[ 7.326104][ C1] ---[ end trace 0000000000000000 ]---
[ 7.326354][ C1] RIP: 0010:__common_interrupt+0x52/0xc0
[ 7.326614][ C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[ 7.327570][ C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[ 7.327910][ C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[ 7.328273][ C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[ 7.328632][ C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[ 7.329223][ C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[ 7.329780][ C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[ 7.330193][ C1] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 7.330632][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.331050][ C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[ 7.331454][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.331854][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.332236][ C1] Kernel panic - not syncing: Fatal exception in interrupt
[ 7.332730][ C1] Kernel Offset: disabled
[ 7.333044][ C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
The relevant assembly code is (from objdump, faulting address
highlighted):
ffffffff8102ed9d: 41 ff d3 call *%r11
ffffffff8102eda0: 65 48 <8b> 05 30 c7 ff mov %gs:0x7effc730(%rip),%rax
The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.
Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.
Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/
Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
Cc: stable@vger.kernel.org
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The DTS code coding style expects exactly one space before and after '='
sign.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
New device nodes are enabled by default.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node has ranges with addresses, so it is should have unit
address to fix dtc W=1 warnings like:
socfpga_agilex.dtsi:152.6-674.4: Warning (unit_address_vs_reg): /soc: node has a reg or ranges property, but no unit name
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node is supposed to have only MMIO children, so move the
firmware/svc node to top level to fix dtc W=1 warnings like:
socfpga_agilex.dtsi:663.12-673.5: Warning (simple_bus_reg): /soc@0/firmware: missing or empty reg/ranges property
The node should still be instantiated by drivers/of/platform.c, just
like in all other platforms.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node is supposed to have only MMIO children, so move the FPGA
region node to top level to fix dtc W=1 warnings like:
socfpga_agilex.dtsi:141.20-146.5: Warning (simple_bus_reg): /soc@0/base_fpga_region: missing or empty reg/ranges property
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Use a generic node name for the pin controller node to fix:
/socfpga_agilex_n6000.dtb: pinconf@ffd13100: $nodename:0: 'pinconf@ffd13100' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
altr,dw-mshc-ciu-div and altr,dw-mshc-sdr-timing are neither documented
nor used by Linux, so remove them to fix dtbs_check warnings like:
socfpga_stratix10_swvp.dtb: mmc@ff808000: Unevaluated properties are not allowed
('altr,dw-mshc-ciu-div', 'altr,dw-mshc-sdr-timing' were unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Bindings expect NAND child node name to match certain patterns:
socfpga_agilex_socdk_nand.dtb: nand-controller@ffb90000: Unevaluated properties are not allowed ('flash@0' was unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node has ranges with addresses, so it is should have unit
address to fix dtc W=1 warnings like:
socfpga_stratix10.dtsi:128.6-636.4: Warning (unit_address_vs_reg): /soc: node has a reg or ranges property, but no unit name
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node is supposed to have only MMIO children, so move the
firmware/svc node to top level to fix dtc W=1 warnings like:
socfpga_stratix10.dtsi:625.12-635.5: Warning (simple_bus_reg): /soc@0/firmware: missing or empty reg/ranges property
The node should still be instantiated by drivers/of/platform.c, just
like in all other platforms.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The "soc" node is supposed to have only MMIO children, so move the FPGA
region node to top level to fix dtc W=1 warnings like:
socfpga_stratix10.dtsi:136.20-141.5: Warning (simple_bus_reg): /soc@0/base_fpga_region: missing or empty reg/ranges property
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
pinctrl-single bindings require pin configuration node names to match
certain patterns:
socfpga_stratix10_socdk.dtb: pinctrl@ffd13000: 'i2c1-pmx-func', 'i2c1-pmx-func-gpio'
do not match any of the regexes: '-pins(-[0-9]+)?$|-pin$', 'pinctrl-[0-9]+'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
The DWC2 USB bindings require clock-names property, to provide such to
fix warnings like:
socfpga_stratix10_swvp.dtb: usb@ffb40000: 'clock-names' is a required property
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
cdns,page-size and cdns,block-size are neither documented nor used by
Linux, so remove them to fix dtbs_check warnings like:
socfpga_n5x_socdk.dtb: flash@0: Unevaluated properties are not allowed ('cdns,block-size', 'cdns,page-size' were unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Bindings expect NAND controller node name to match certain patterns:
socfpga_arria10_socdk_nand.dtb: nand@ffb90000: $nodename:0: 'nand@ffb90000' does not match '^nand-controller(@.*)?'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|