summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-12powerpc/secvar: Warn when PAGE_SIZE is smaller than max object sizeAndrew Donnellan
Due to sysfs constraints, when writing to a variable, we can only handle writes of up to PAGE_SIZE. It's possible that the maximum object size is larger than PAGE_SIZE, in which case, print a warning on boot so that the user is aware. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-13-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Allow backend to populate static list of variable namesAndrew Donnellan
Currently, the list of variables is populated by calling secvar_ops->get_next() repeatedly, which is explicitly modelled on the OPAL API (including the keylen parameter). For the upcoming PLPKS backend, we have a static list of variable names. It is messy to fit that into get_next(), so instead, let the backend put a NULL-terminated array of variable names into secvar_ops->var_names, which will be used if get_next() is undefined. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-12-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Extend sysfs to include config varsRussell Currey
The forthcoming pseries consumer of the secvar API wants to expose a number of config variables. Allowing secvar implementations to provide their own sysfs attributes makes it easy for consumers to expose what they need to. This is not being used by the OPAL secvar implementation at present, and the config directory will not be created if no attributes are set. Signed-off-by: Russell Currey <ruscur@russell.cc> Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-11-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Clean up init error messagesAndrew Donnellan
Remove unnecessary prefixes from error messages in secvar_sysfs_init() (the file defines pr_fmt, so putting "secvar:" in every message is unnecessary). Make capitalisation and punctuation more consistent. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-10-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Handle max object size in the consumerRussell Currey
Currently the max object size is handled in the core secvar code with an entirely OPAL-specific implementation, so create a new max_size() op and move the existing implementation into the powernv platform. Should be no functional change. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-9-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Handle format string in the consumerRussell Currey
The code that handles the format string in secvar-sysfs.c is entirely OPAL specific, so create a new "format" op in secvar_operations to make the secvar code more generic. No functional change. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-8-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Use sysfs_emit() instead of sprintf()Russell Currey
The secvar format string and object size sysfs files are both ASCII text, and should use sysfs_emit(). No functional change. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-7-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Warn and error if multiple secvar ops are setRussell Currey
The secvar code only supports one consumer at a time. Multiple consumers aren't possible at this point in time, but we'd want it to be obvious if it ever could happen. Signed-off-by: Russell Currey <ruscur@russell.cc> Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-6-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Use u64 in secvar_operationsMichael Ellerman
There's no reason for secvar_operations to use uint64_t vs the more common kernel type u64. The types are compatible, but they require different printk format strings which can lead to confusion. Change all the secvar related routines to use u64. Reviewed-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-5-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Fix incorrect return in secvar_sysfs_load()Russell Currey
secvar_ops->get_next() returns -ENOENT when there are no more variables to return, which is expected behaviour. Fix this by returning 0 if get_next() returns -ENOENT. This fixes an issue introduced in commit bd5d9c743d38 ("powerpc: expose secure variables to userspace via sysfs"), but the return code of secvar_sysfs_load() was never checked so this issue never mattered. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-4-ajd@linux.ibm.com
2023-02-12powerpc/pseries: Fix alignment of PLPKS structures and buffersAndrew Donnellan
A number of structures and buffers passed to PKS hcalls have alignment requirements, which could on occasion cause problems: - Authorisation structures must be 16-byte aligned and must not cross a page boundary - Label structures must not cross page boundaries - Password output buffers must not cross page boundaries To ensure correct alignment, we adjust the allocation size of each of these structures/buffers to be the closest power of 2 that is at least the size of the structure/buffer (since kmalloc() guarantees that an allocation of a power of 2 size will be aligned to at least that size). Reported-by: Benjamin Gray <bgray@linux.ibm.com> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-3-ajd@linux.ibm.com
2023-02-12powerpc/pseries: Fix handling of PLPKS object flushing timeoutAndrew Donnellan
plpks_confirm_object_flushed() uses the H_PKS_CONFIRM_OBJECT_FLUSHED hcall to check whether changes to an object in the Platform KeyStore have been flushed to non-volatile storage. The hcall returns two output values, the return code and the flush status. plpks_confirm_object_flushed() polls the hcall until either the flush status has updated, the return code is an error, or a timeout has been exceeded. While we're still polling, the hcall is returning H_SUCCESS (0) as the return code. In the timeout case, this means that upon exiting the polling loop, rc is 0, and therefore 0 is returned to the user. Handle the timeout case separately and return ETIMEDOUT if triggered. Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Reported-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Tested-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-2-ajd@linux.ibm.com
2023-02-12Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch to bring in some changes that conflict with upcoming next content.
2023-02-12powerpc/ps3: Refresh ps3_defconfigGeoff Levand
Refresh ps3_defconfig for v6.2. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99e87549b17feca3494e9df6f4def04a9ec7c042.1672767868.git.geoff@infradead.org
2023-02-12powerpc/ps3: Change updateboltedpp() panic to infoGeoff Levand
Commit fdacae8a8402 ("powerpc: Activate CONFIG_STRICT_KERNEL_RWX by default") causes ps3_hpte_updateboltedpp() to be called. The correct fix would be to implement updateboltedpp() for PS3, but it's not clear if that's possible. As a stop-gap, change the panic statment in ps3_hpte_updateboltedpp() to a pr_info statement so that bootup can continue. Signed-off-by: Geoff Levand <geoff@infradead.org> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2df879d982809c05b0dfade57942fe03dbe9e7de.1672767868.git.geoff@infradead.org
2023-02-12powerpc/kcsan: Add KCSAN SupportRohan McLure
Enable HAVE_ARCH_KCSAN for 64-bit Book3S, permitting use of the kernel concurrency sanitiser through the CONFIG_KCSAN_* kconfig options. KCSAN requires compiler builtins __atomic_* 64-bit values, and so only report support on 64-bit. See documentation in Documentation/dev-tools/kcsan.rst for more information. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> [mpe: Limit to Book3S to avoid build failure on Book3E] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-6-rmclure@linux.ibm.com
2023-02-11Merge tag 'spi-fix-v6.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of hopefully final fixes for spi: one driver specific fix for an issue with very large transfers and a fix for an issue with the locking fixes in spidev merged earlier this release cycle which was missed" * tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spidev: fix a recursive locking error spi: dw: Fix wrong FIFO level setting for long xfers
2023-02-11KVM: arm64: nv: Use reg_to_encoding() to get sysreg IDOliver Upton
Avoid open-coding and just use the helper to encode the ID from the sysreg table entry. No functional change intended. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230211190742.49843-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11mmc: omap: drop TPS65010 dependencyArnd Bergmann
The original TPS65010 dependency was only needed for MACH_OMAP_H2, which is now gone, but I messed up the conversion when I removed that symbol. Now the missing TPS65010 causes a boot failure on other machines such as the SX1. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 0d7bb85e9413 ("ARM: omap1: remove unused board files") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-11Merge tag 'x86-urgent-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix a kprobes bug, plus add a new Intel model number to the upstream <asm/intel-family.h> header for drivers to use" * tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Lunar Lake M x86/kprobes: Fix 1 byte conditional jump target
2023-02-11Merge tag 'locking-urgent-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rtmutex missed-wakeup bug" * tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Ensure that the top waiter is always woken up
2023-02-11Merge tag 'cxl-fixes-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "Two fixups for CXL (Compute Express Link) in presence of passthrough decoders. This primarily helps developers using the QEMU CXL emulation, but with the impending arrival of CXL switches these types of topologies will be of interest to end users. - Fix a crash when shutting down regions in the presence of passthrough decoders - Fix region creation to understand passthrough decoders instead of the narrower definition of passthrough ports" * tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix passthrough-decoder detection cxl/region: Fix null pointer dereference for resetting decoder
2023-02-11Merge tag 'libnvdimm-fixes-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A fix for an issue that could causes users to inadvertantly reserve too much capacity when debugging the KMSAN and persistent memory namespace, a lockdep fix, and a kernel-doc build warning: - Resolve the conflict between KMSAN and NVDIMM with respect to reserving pmem namespace / volume capacity for larger sizeof(struct page) - Fix a lockdep warning in the the NFIT code - Fix a kernel-doc build warning" * tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE ACPI: NFIT: fix a potential deadlock during NFIT teardown dax: super.c: fix kernel-doc bad line warning
2023-02-11Merge tag 'fixes-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock revert from Mike Rapoport: "Revert 'mm: Always release pages to the buddy allocator in memblock_free_late()' The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies, which will cause a crash. A proper fix will be more involved so revert this change for the time being" * tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
2023-02-11nfsd: don't destroy global nfs4_file table in per-net shutdownJeff Layton
The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin <jiyin@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-11perf/x86/intel/uncore: Add Meteor Lake supportKan Liang
The uncore subsystem for Meteor Lake is similar to the previous Alder Lake. The main difference is that MTL provides PMU support for different tiles, while ADL only provides PMU support for the whole package. On ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides a fixed counter for clockticks. Also, new MSR addresses are introduced on MTL. The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for Meteor Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
2023-02-11x86/entry: Fix unwinding from kprobe on PUSH/POP instructionJosh Poimboeuf
If a kprobe (INT3) is set on a stack-modifying single-byte instruction, like a single-byte PUSH/POP or a LEAVE, ORC fails to unwind past it: Call Trace: <TASK> dump_stack_lvl+0x57/0x90 handler_pre+0x33/0x40 [kprobe_example] aggr_pre_handler+0x49/0x90 kprobe_int3_handler+0xe3/0x180 do_int3+0x3a/0x80 exc_int3+0x7d/0xc0 asm_exc_int3+0x35/0x40 RIP: 0010:kernel_clone+0xe/0x3a0 Code: cc e8 16 b2 bf 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 cc <53> 48 89 fb 48 83 ec 68 4c 8b 27 65 48 8b 04 25 28 00 00 00 48 89 RSP: 0018:ffffc9000074fda0 EFLAGS: 00000206 RAX: 0000000000808100 RBX: ffff888109de9d80 RCX: 0000000000000000 RDX: 0000000000000011 RSI: ffff888109de9d80 RDI: ffffc9000074fdc8 RBP: ffff8881019543c0 R08: ffffffff81127e30 R09: 00000000e71742a5 R10: ffff888104764a18 R11: 0000000071742a5e R12: ffff888100078800 R13: ffff888100126000 R14: 0000000000000000 R15: ffff888100126005 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ? kernel_clone+0xe/0x3a0 ? user_mode_thread+0x5b/0x80 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ? call_usermodehelper_exec_work+0x77/0xb0 ? process_one_work+0x299/0x5f0 ? worker_thread+0x4f/0x3a0 ? __pfx_worker_thread+0x10/0x10 ? kthread+0xf2/0x120 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x29/0x50 </TASK> The problem is that #BP saves the pointer to the instruction immediately *after* the INT3, rather than to the INT3 itself. The instruction replaced by the INT3 hasn't actually run, but ORC assumes otherwise and expects the wrong stack layout. Fix it by annotating the #BP exception as a non-signal stack frame, which tells the ORC unwinder to decrement the instruction pointer before looking up the corresponding ORC entry. Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/baafcd3cc1abb14cb757fe081fa696012a5265ee.1676068346.git.jpoimboe@kernel.org
2023-02-11x86/unwind/orc: Add 'signal' field to ORC metadataJosh Poimboeuf
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
2023-02-11driver core: cpu: don't hand-override the uevent bus_type callback.Greg Kroah-Hartman
Instead of having to change the uevent bus_type callback by hand at runtime, set it at build time based on the build configuration options, making this much simpler to maintain and understand (and allow to make the structure constant.) Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230210102408.1083177-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-11x86/perf/zhaoxin: Add stepping check for ZXCsilviazhao
Some of Nano series processors will lead GP when accessing PMC fixed counter. Meanwhile, their hardware support for PMC has not announced externally. So exclude Nano CPUs from ZXC by checking stepping information. This is an unambiguous way to differentiate between ZXC and Nano CPUs. Following are Nano and ZXC FMS information: Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D] ZXC FMS: Family=6, Model=F, Stepping=E-F OR Family=6, Model=0x19, Stepping=0-3 Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com> Reported-by: Kevin Brace <kevinbrace@gmx.com> Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
2023-02-11perf/x86/intel/ds: Fix the conversion from TSC to perf timeKan Liang
The time order is incorrect when the TSC in a PEBS record is used. $perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000 $ perf script --show-task-events perf-exec 0 0.000000: PERF_RECORD_COMM: perf-exec:915/915 dd 915 106.479872: PERF_RECORD_COMM exec: dd:915/915 dd 915 106.483270: PERF_RECORD_EXIT(915:915):(914:914) dd 915 106.512429: 1 cycles:upp: ffffffff96c011b7 [unknown] ([unknown]) ... ... The perf time is from sched_clock_cpu(). The current PEBS code unconditionally convert the TSC to native_sched_clock(). There is a shift between the two clocks. If the TSC is stable, the shift is consistent, __sched_clock_offset. If the TSC is unstable, the shift has to be calculated at runtime. This patch doesn't support the conversion when the TSC is unstable. The TSC unstable case is a corner case and very unlikely to happen. If it happens, the TSC in a PEBS record will be dropped and fall back to perf_event_clock(). Fixes: 47a3aeb39e8d ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
2023-02-11sched/rt: pick_next_rt_entity(): check list_entryPietro Borrello
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") removed any path which could make pick_next_rt_entity() return NULL. However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of pick_next_rt_entity()) still checks the error condition, which can never happen, since list_entry() never returns NULL. Remove the BUG_ON check, and instead emit a warning in the only possible error condition here: the queue being empty which should never happen. Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it
2023-02-11sched/deadline: Add more reschedule cases to prio_changed_dl()Valentin Schneider
I've been tracking down an issue on a ~5.17ish kernel where: CPUx CPUy <DL task p0 owns an rtmutex M> <p0 depletes its runtime, gets throttled> <rq switches to the idle task> <DL task p1 blocks on M, boost/replenish p0> <No call to resched_curr() happens here> [idle task keeps running here until *something* accidentally sets TIF_NEED_RESCHED] On that kernel, it is quite easy to trigger using rt-tests's deadline_test [1] with the test running on isolated CPUs (this reduces the chance of something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the issue even more obvious as the hung task detector chimes in). I haven't been able to reproduce this using a mainline kernel, even if I revert 2972e3050e35 ("tracing: Make trace_marker{,_raw} stream-like") which gets rid of the lock involved in the above test, *but* I cannot convince myself the issue isn't there from looking at the code. Make prio_changed_dl() issue a reschedule if the current task isn't a deadline one. While at it, ensure a reschedule is emitted when a queued-but-not-current task gets boosted with an earlier deadline that current's. [1]: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230206140612.701871-1-vschneid@redhat.com
2023-02-11sched/fair: sanitize vruntime of entity being placedZhang Qiao
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Co-developed-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
2023-02-11sched/fair: Remove capacity inversion detectionVincent Guittot
Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org
2023-02-11sched/fair: unlink misfit task from cpu overutilizedVincent Guittot
By taking into account uclamp_min, the 1:1 relation between task misfit and cpu overutilized is no more true as a task with a small util_avg may not fit a high capacity cpu because of uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case that task would fit a CPU except for the uclamp_min hint which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we can use this new value to take additional action to select the best CPU that doesn't match uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org
2023-02-11objtool: mem*() are not uaccess safePeter Zijlstra
For mysterious raisins I listed the new __asan_mem*() functions as being uaccess safe, this is giving objtool fails on KASAN builds because these functions call out to the actual __mem*() functions which are not marked uaccess safe. Removing it doesn't make the robots unhappy. Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Bisected-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230126182302.GA687063@paulmck-ThinkPad-P17-Gen-1
2023-02-11KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changesChristoffer Dall
So far we were flushing almost the entire universe whenever a VM would load/unload the SCTLR_EL1 and the two versions of that register had different MMU enabled settings. This turned out to be so slow that it prevented forward progress for a nested VM, because a scheduler timer tick interrupt would always be pending when we reached the nested VM. To avoid this problem, we consider the SCTLR_EL2 when evaluating if caches are on or off when entering virtual EL2 (because this is the value that we end up shadowing onto the hardware EL1 register). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-19-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Filter out unsupported features from ID regsMarc Zyngier
As there is a number of features that we either can't support, or don't want to support right away with NV, let's add some basic filtering so that we don't advertize silly things to the EL2 guest. Whilst we are at it, advertize FEAT_TTL as well as FEAT_GTG, which the NV implementation will implement. Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-18-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2Jintack Lim
With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2 trap to EL2. Handle those traps just like we do for EL1 registers. One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12 will trap since it's one of the EL12 registers controlled by HCR_EL2.NV bit. Therefore, add a handler for it and don't treat it as a non-trap-registers when preparing a shadow context. These registers, being only a view on their EL1 counterpart, are permanently hidden from userspace. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> [maz: EL12_REG(), register visibility] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Allow a sysreg to be hidden from userspace onlyMarc Zyngier
So far, we never needed to distinguish between registers hidden from userspace and being hidden from a guest (they are always either visible to both, or hidden from both). With NV, we have the ugly case of the EL02 and EL12 registers, which are only a view on the EL0 and EL1 registers. It makes absolutely no sense to expose them to userspace, since it already has the canonical view. Add a new visibility flag (REG_HIDDEN_USER) and a new helper that checks for it and REG_HIDDEN when checking whether to expose a sysreg to userspace. Subsequent patches will make use of it. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisorMarc Zyngier
We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return to the guest and vice versa when taking an exception to the hypervisor, because we emulate virtual EL2 in EL1 and therefore have to translate the mode field from EL2 to EL1 and vice versa. This requires keeping track of the state we enter the guest, for which we transiently use a dedicated flag. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from ↵Jintack Lim
virtual EL2 For the same reason we trap virtual memory register accesses at virtual EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3 introduces the HCR_EL2.NV1 bit to be able to trap on those register accesses in EL1. Do not set this bit until the whole nesting support is completed, which happens further down the line... Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Handle SMCs taken from virtual EL2Jintack Lim
Non-nested guests have used the hvc instruction to initiate SMCCC calls into KVM. This is quite a poor fit for NV as hvc exceptions are always taken to EL2. In other words, KVM needs to unconditionally forward the hvc exception back into vEL2 to uphold the architecture. Instead, treat the smc instruction from vEL2 as we would a guest hypercall, thereby allowing the vEL2 to interact with KVM's hypercall surface. Note that on NV-capable hardware HCR_EL2.TSC causes smc instructions executed in non-secure EL1 to trap to EL2, even if EL3 is not implemented. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-13-maz@kernel.org [Oliver: redo commit message, only handle smc from vEL2] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11x86/tsc: Do feature check as the very first thingBorislav Petkov (AMD)
Do the feature check as the very first thing in the function. Everything else comes after that and is meaningless work if the TSC CPUID bit is not even set. Switch to cpu_feature_enabled() too, while at it. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11x86/tsc: Make recalibrate_cpu_khz() export GPL onlyBorislav Petkov (AMD)
A quick search doesn't reveal any use outside of the kernel - which would be questionable to begin with anyway - so make the export GPL only. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11x86/cacheinfo: Remove unused trace variableBorislav Petkov (AMD)
15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the last use of the trace local var. Remove it too and the useless trace cache case. No functional changes. Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11KVM: arm64: nv: Handle trapped ERET from virtual EL2Christoffer Dall
When a guest hypervisor running virtual EL2 in EL1 executes an ERET instruction, we will have set HCR_EL2.NV which traps ERET to EL2, so that we can emulate the exception return in software. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Inject HVC exceptions to the virtual EL2Jintack Lim
As we expect all PSCI calls from the L1 hypervisor to be performed using SMC when nested virtualization is enabled, it is clear that all HVC instruction from the VM (including from the virtual EL2) are supposed to handled in the virtual EL2. Forward these to EL2 as required. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> [maz: add handling of HCR_EL2.HCD] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Support virtual EL2 exceptionsJintack Lim
Support injecting exceptions and performing exception returns to and from virtual EL2. This must be done entirely in software except when taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE} == {1,1} (a VHE guest hypervisor). [maz: switch to common exception injection framework, illegal exeption return handling] Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>