summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-03-06perf traceevent: Ensure read cmdlines are null terminated.Ian Rogers
Issue detected by address sanitizer. Fixes: cd4ceb63438e9e28 ("perf util: Save pid-cmdline mapping into tracing header") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210226221431.1985458-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf bench numa: Fix the condition checks for max number of NUMA nodesAthira Rajeev
In systems having higher node numbers available like node 255, perf numa bench will fail with SIGABORT. <<>> perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed. Aborted (core dumped) <<>> Snippet from 'numactl -H' below on a powerpc system where the highest node number available is 255: available: 6 nodes (0,8,252-255) node 0 cpus: <cpu-list> node 0 size: 519587 MB node 0 free: 516659 MB node 8 cpus: <cpu-list> node 8 size: 523607 MB node 8 free: 486757 MB node 252 cpus: node 252 size: 0 MB node 252 free: 0 MB node 253 cpus: node 253 size: 0 MB node 253 free: 0 MB node 254 cpus: node 254 size: 0 MB node 254 free: 0 MB node 255 cpus: node 255 size: 0 MB node 255 free: 0 MB node distances: node 0 8 252 253 254 255 Note: <cpu-list> expands to actual cpu list in the original output. These nodes 252-255 are to represent the memory on GPUs and are valid nodes. The perf numa bench init code has a condition check to see if the number of NUMA nodes (nr_nodes) exceeds MAX_NR_NODES. The value of MAX_NR_NODES defined in perf code is 64. And the 'nr_nodes' is the value from numa_max_node() which represents the highest node number available in the system. In some systems where we could have NUMA node 255, this condition check fails and results in SIGABORT. The numa benchmark uses static value of MAX_NR_NODES in the code to represent size of two NUMA node arrays and node bitmask used for setting memory policy. Patch adds a fix to dynamically allocate size for the two arrays and bitmask value based on the node numbers available in the system. With the fix, perf numa benchmark will work with node configuration on any system and thus removes the static MAX_NR_NODES value. Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/1614271802-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf diff: Don't crash on freeing errno-session on the error pathDmitry Safonov
__cmd_diff() sets result of perf_session__new() to d->session. In case of failure, it's errno and perf-diff may crash with: failed to open perf.data: Permission denied Failed to open perf.data Segmentation fault (core dumped) From the coredump: 0 0x00005569a62b5955 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2681 1 0x00005569a626b37d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:295 2 perf_session__delete (session=0xffffffffffffffff) at util/session.c:291 3 0x00005569a618008a in __cmd_diff () at builtin-diff.c:1239 4 cmd_diff (argc=<optimized out>, argv=<optimized out>) at builtin-diff.c:2011 [..] Funny enough, it won't always crash. For me it crashes only if failed file is second in cmd-line: the reason is that cmd_diff() check files for branch-stacks [in check_file_brstack()] and if the first file doesn't have brstacks, it doesn't proceed to try open other files from cmd-line. Check d->session before calling perf_session__delete(). Another solution would be assigning to temporary variable, checking it, but I find it easier to follow with IS_ERR() check in the same function. After some time it's still obvious why the check is needed, and with temp variable it's possible to make the same mistake. Committer testing: $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf diff failed to open perf.data.old: No such file or directory Failed to open perf.data.old $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf diff # Event 'cycles:u' # # Baseline Delta Abs Shared Object Symbol # ........ ......... ................ .......................... # 0.92% +87.66% [unknown] [k] 0xffffffff8825de16 11.39% +0.04% ld-2.32.so [.] __GI___tunables_init 87.70% ld-2.32.so [.] _dl_check_map_versions $ sudo chown root:root perf.data [sudo] password for acme: $ perf diff failed to open perf.data: Permission denied Failed to open perf.data Segmentation fault (core dumped) $ After the patch: $ perf diff failed to open perf.data: Permission denied Failed to open perf.data $ Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dmitry safonov <dima@arista.com> Link: http://lore.kernel.org/lkml/20210302023533.1572231-1-dima@arista.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf tools: Clean 'generated' directory used for creating the syscall table ↵Andreas Wendleder
on x86 Remove generated directory tools/perf/arch/x86/include/generated. Signed-off-by: Andreas Wendleder <andreas.wendleder@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210301185642.163396-1-gonsolo@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf build: Move feature cleanup under tools/buildJiri Olsa
Arnaldo reported issue for following build command: $ rm -rf /tmp/krava; mkdir /tmp/krava; make O=/tmp/krava clean CLEAN config /bin/sh: line 0: cd: /tmp/krava/feature/: No such file or directory ../../scripts/Makefile.include:17: *** output directory "/tmp/krava/feature/" does not exist. Stop. make[1]: *** [Makefile.perf:1010: config-clean] Error 2 make: *** [Makefile:90: clean] Error 2 The problem is that now that we include scripts/Makefile.include in feature's Makefile (which is fine and needed), we need to ensure the OUTPUT directory exists, before executing (out of tree) clean command. Removing the feature's cleanup from perf Makefile and fixing feature's cleanup under build Makefile, so it now checks that there's existing OUTPUT directory before calling the clean. Fixes: 211a741cd3e1 ("tools: Factor Clang, LLC and LLVM utils definitions") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13-git Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210224150831.409639-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf tools: Cast (struct timeval).tv_sec when printingPierre Gondois
The musl-libc [1] defines (struct timeval).tv_sec as a 'long long' for arm and other architectures. The default build having a '-Wformat' flag, not casting the field when printing prevents from building perf. This patch casts the (struct timeval).tv_sec fields to the expected format. [1] git://git.musl-libc.org/musl Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Douglas.raillard@arm.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210224182410.5366-1-Pierre.Gondois@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06tools headers UAPI: Sync kvm.h headers with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: d9a47edabc4f9481 ("KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR") 8d4e7e80838f45d3 ("KVM: x86: declare Xen HVM shared info capability and add test case") 40da8ccd724f7ca2 ("KVM: x86/xen: Add event channel interrupt vector upcall") These new IOCTLs are now supported on 'perf trace': $ tools/perf/trace/beauty/kvm_ioctl.sh > before $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h $ tools/perf/trace/beauty/kvm_ioctl.sh > after $ diff -u before after --- before 2021-02-23 09:55:46.229058308 -0300 +++ after 2021-02-23 09:55:57.509308058 -0300 @@ -91,6 +91,10 @@ [0xc1] = "GET_SUPPORTED_HV_CPUID", [0xc6] = "X86_SET_MSR_FILTER", [0xc7] = "RESET_DIRTY_RINGS", + [0xc8] = "XEN_HVM_GET_ATTR", + [0xc9] = "XEN_HVM_SET_ATTR", + [0xca] = "XEN_VCPU_GET_ATTR", + [0xcb] = "XEN_VCPU_SET_ATTR", [0xe0] = "CREATE_DEVICE", [0xe1] = "SET_DEVICE_ATTR", [0xe2] = "GET_DEVICE_ATTR", $ Addressing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06tools headers UAPI s390: Sync ptrace.h kernel headersArnaldo Carvalho de Melo
To pick up the changes from: 56e62a7370283601 ("s390: convert to generic entry") That only adds two new defines, so shouldn't cause problems when building the BPF selftests. Silencing this perf build warning: Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/ptrace.h' differs from latest version at 'arch/s390/include/uapi/asm/ptrace.h' diff -u tools/arch/s390/include/uapi/asm/ptrace.h arch/s390/include/uapi/asm/ptrace.h Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06perf arch powerpc: Sync powerpc syscall.tbl with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: fbcee2ebe8edbb6a ("powerpc/32: Always save non volatile GPRs at syscall entry") That shouldn't cause any change in tooling, just silences the following tools/perf/ build warning: Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06tools headers UAPI: Sync openat2.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: 99668f618062816c ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED") That don't result in any change in tooling, only silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/openat2.h' differs from latest version at 'include/uapi/linux/openat2.h' diff -u tools/include/uapi/linux/openat2.h include/uapi/linux/openat2.h Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: 8c3b1ba0e7ea9a80 ("drm/i915/gt: Track the overall awake/busy time") 348fb0cb0a79bce0 ("drm/i915/pmu: Deprecate I915_PMU_LAST and optimize state tracking") That don't result in any change in tooling: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/i915_drm.h tools/include/uapi/drm/i915_drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after $ Only silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06tools headers UAPI: Update tools's copy of drm.h headersArnaldo Carvalho de Melo
Picking the changes from: 0e0dc448005583a6 ("drm/doc: demote old doc-comments in drm.h") Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h No changes in tooling as these are just C comment documentation changes. Cc: Simon Ser <contact@emersion.fr> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06io-wq: always track creds for async issueJens Axboe
If we go async with a request, grab the creds that the task currently has assigned and make sure that the async side switches to them. This is handled in the same way that we do for registered personalities. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-06io-wq: fix race in freeing 'wq' and worker accessJens Axboe
Ran into a use-after-free on the main io-wq struct, wq. It has a worker ref and completion event, but the manager itself isn't holding a reference. This can lead to a race where the manager thinks there are no workers and exits, but a worker is being added. That leads to the following trace: BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0 Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425 CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110 Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020 Call Trace: dump_stack+0x90/0xbe print_address_description.constprop.0+0x67/0x28d ? io_wqe_worker+0x4c0/0x5e0 kasan_report.cold+0x7b/0xd4 ? io_wqe_worker+0x4c0/0x5e0 __asan_load8+0x6d/0xa0 io_wqe_worker+0x4c0/0x5e0 ? io_worker_handle_work+0xc00/0xc00 ? recalc_sigpending+0xe5/0x120 ? io_worker_handle_work+0xc00/0xc00 ? io_worker_handle_work+0xc00/0xc00 ret_from_fork+0x1f/0x30 Allocated by task 3080422: kasan_save_stack+0x23/0x60 __kasan_kmalloc+0x80/0xa0 kmem_cache_alloc_node_trace+0xa0/0x480 io_wq_create+0x3b5/0x600 io_uring_alloc_task_context+0x13c/0x380 io_uring_add_task_file+0x109/0x140 __x64_sys_io_uring_enter+0x45f/0x660 do_syscall_64+0x32/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 3080422: kasan_save_stack+0x23/0x60 kasan_set_track+0x20/0x40 kasan_set_free_info+0x24/0x40 __kasan_slab_free+0xe8/0x120 kfree+0xa8/0x400 io_wq_put+0x14a/0x220 io_wq_put_and_exit+0x9a/0xc0 io_uring_clean_tctx+0x101/0x140 __io_uring_files_cancel+0x36e/0x3c0 do_exit+0x169/0x1340 __x64_sys_exit+0x34/0x40 do_syscall_64+0x32/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Have the manager itself hold a reference, and now both drop points drop and complete if we hit zero, and the manager can unconditionally do a wait_for_completion() instead of having a race between reading the ref count and waiting if it was non-zero. Fixes: fb3a1f6c745c ("io-wq: have manager wait for all workers to exit") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-06cifs: ask for more credit on async read/write code pathsAurelien Aptel
When doing a large read or write workload we only very gradually increase the number of credits which can cause problems with parallelizing large i/o (I/O ramps up more slowly than it should for large read/write workloads) especially with multichannel when the number of credits on the secondary channels starts out low (e.g. less than about 130) or when recovering after server throttled back the number of credit. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-06cifs: fix credit accounting for extra channelAurelien Aptel
With multichannel, operations like the queries from "ls -lR" can cause all credits to be used and errors to be returned since max_credits was not being set correctly on the secondary channels and thus the client was requesting 0 credits incorrectly in some cases (which can lead to not having enough credits to perform any operation on that channel). Signed-off-by: Aurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-06iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handlerDinghao Liu
There is one regmap_bulk_read() call in mpu3050_trigger_handler that we have caught its return value bug lack further handling. Check and terminate the execution flow just like the other three regmap_bulk_read() calls in this function. Fixes: 3904b28efb2c7 ("iio: gyro: Add driver for the MPU-3050 gyroscope") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20210301080421.13436-1-dinghao.liu@zju.edu.cn Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-03-06iio: hid-sensor-temperature: Fix issues of timestamp channelYe Xiang
This patch fixes 2 issues of timestamp channel: 1. This patch ensures that there is sufficient space and correct alignment for the timestamp. 2. Correct the timestamp channel scan index. Fixes: 59d0f2da3569 ("iio: hid: Add temperature sensor support") Signed-off-by: Ye Xiang <xiang.ye@intel.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210303063615.12130-4-xiang.ye@intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-03-06iio: hid-sensor-humidity: Fix alignment issue of timestamp channelYe Xiang
This patch ensures that, there is sufficient space and correct alignment for the timestamp. Fixes: d7ed89d5aadf ("iio: hid: Add humidity sensor support") Signed-off-by: Ye Xiang <xiang.ye@intel.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210303063615.12130-2-xiang.ye@intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-03-06ext4: fix bh ref count on error pathsZhaolong Zhang
__ext4_journalled_writepage should drop bhs' ref count on error paths Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-06fs/ext4: fix integer overflow in s_log_groups_per_flexSabyrzhan Tasbolatov
syzbot found UBSAN: shift-out-of-bounds in ext4_mb_init [1], when 1 << sbi->s_es->s_log_groups_per_flex is bigger than UINT_MAX, where sbi->s_mb_prefetch is unsigned integer type. 32 is the maximum allowed power of s_log_groups_per_flex. Following if check will also trigger UBSAN shift-out-of-bound: if (1 << sbi->s_es->s_log_groups_per_flex >= UINT_MAX) { So I'm checking it against the raw number, perhaps there is another way to calculate UINT_MAX max power. Also use min_t as to make sure it's uint type. [1] UBSAN: shift-out-of-bounds in fs/ext4/mballoc.c:2713:24 shift exponent 60 is too large for 32-bit type 'int' Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x137/0x1be lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 ext4_mb_init_backend fs/ext4/mballoc.c:2713 [inline] ext4_mb_init+0x19bc/0x19f0 fs/ext4/mballoc.c:2898 ext4_fill_super+0xc2ec/0xfbe0 fs/ext4/super.c:4983 Reported-by: syzbot+a8b4b0c60155e87e9484@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210224095800.3350002-1-snovitoll@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-06ext4: add reclaim checks to xattr codeJan Kara
Syzbot is reporting that ext4 can enter fs reclaim from kvmalloc() while the transaction is started like: fs_reclaim_acquire+0x117/0x150 mm/page_alloc.c:4340 might_alloc include/linux/sched/mm.h:193 [inline] slab_pre_alloc_hook mm/slab.h:493 [inline] slab_alloc_node mm/slub.c:2817 [inline] __kmalloc_node+0x5f/0x430 mm/slub.c:4015 kmalloc_node include/linux/slab.h:575 [inline] kvmalloc_node+0x61/0xf0 mm/util.c:587 kvmalloc include/linux/mm.h:781 [inline] ext4_xattr_inode_cache_find fs/ext4/xattr.c:1465 [inline] ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1508 [inline] ext4_xattr_set_entry+0x1ce6/0x3780 fs/ext4/xattr.c:1649 ext4_xattr_ibody_set+0x78/0x2b0 fs/ext4/xattr.c:2224 ext4_xattr_set_handle+0x8f4/0x13e0 fs/ext4/xattr.c:2380 ext4_xattr_set+0x13a/0x340 fs/ext4/xattr.c:2493 This should be impossible since transaction start sets PF_MEMALLOC_NOFS. Add some assertions to the code to catch if something isn't working as expected early. Link: https://lore.kernel.org/linux-ext4/000000000000563a0205bafb7970@google.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210222171626.21884-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-06ext4: shrink race window in ext4_should_retry_alloc()Eric Whitney
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it fails at significant rates on the two test scenarios that disable delayed allocation (ext3conv and data_journal) and force actual block allocation for the fallocate and pwrite functions in the test. The failure rate on 5.10 for both ext3conv and data_journal on one test system typically runs about 85%. On 5.11, the failure rate on ext3conv sometimes drops to as low as 1% while the rate on data_journal increases to nearly 100%. The observed failures are largely due to ext4_should_retry_alloc() cutting off block allocation retries when s_mb_free_pending (used to indicate that a transaction in progress will free blocks) is 0. However, free space is usually available when this occurs during runs of generic/371. It appears that a thread attempting to allocate blocks is just missing transaction commits in other threads that increase the free cluster count and reset s_mb_free_pending while the allocating thread isn't running. Explicitly testing for free space availability avoids this race. The current code uses a post-increment operator in the conditional expression that determines whether the retry limit has been exceeded. This means that the conditional expression uses the value of the retry counter before it's increased, resulting in an extra retry cycle. The current code actually retries twice before hitting its retry limit rather than once. Increasing the retry limit to 3 from the current actual maximum retry count of 2 in combination with the change described above reduces the observed failure rate to less that 0.1% on both ext3conv and data_journal with what should be limited impact on users sensitive to the overhead caused by retries. A per filesystem percpu counter exported via sysfs is added to allow users or developers to track the number of times the retry limit is exceeded without resorting to debugging methods. This should provide some insight into worst case retry behavior. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-06counter: stm32-timer-cnt: fix ceiling miss-alignment with reload registerFabrice Gasnier
Ceiling value may be miss-aligned with what's actually configured into the ARR register. This is seen after probe as currently the ARR value is zero, whereas ceiling value is set to the maximum. So: - reading ceiling reports zero - in case the counter gets enabled without any prior configuration, it won't count. - in case the function gets set by the user 1st, (priv->ceiling) is used. Fix it by getting rid of the cached "priv->ceiling" variable. Rather use the ARR register value directly by using regmap read or write when needed. There should be no drawback on performance as priv->ceiling isn't used in performance critical path. There's also no point in writing ARR while setting function (sms), so it can be safely removed. Fixes: ad29937e206f ("counter: Add STM32 Timer quadrature encoder") Suggested-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/1614793789-10346-1-git-send-email-fabrice.gasnier@foss.st.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-03-06counter: stm32-timer-cnt: fix ceiling write max valueFabrice Gasnier
The ceiling value isn't checked before writing it into registers. The user could write a value higher than the counter resolution (e.g. 16 or 32 bits indicated by max_arr). This makes most significant bits to be truncated. Fix it by checking the max_arr to report a range error [1] to the user. [1] https://lkml.org/lkml/2021/2/12/358 Fixes: ad29937e206f ("counter: Add STM32 Timer quadrature encoder") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/1614696235-24088-1-git-send-email-fabrice.gasnier@foss.st.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-03-06m68k: Fix virt_addr_valid() W=1 compiler warningsGeert Uytterhoeven
If CONFIG_DEBUG_SG=y, and CONFIG_MMU=y: include/linux/scatterlist.h: In function ‘sg_set_buf’: arch/m68k/include/asm/page_mm.h:174:49: warning: ordered comparison of pointer with null pointer [-Wextra] 174 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory) | ^~ or CONFIG_MMU=n: include/linux/scatterlist.h: In function ‘sg_set_buf’: arch/m68k/include/asm/page_no.h:33:50: warning: ordered comparison of pointer with null pointer [-Wextra] 33 | #define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \ | ^~ Fix this by doing the comparison in the "unsigned long" instead of the "void *" domain. Note that for now this is only seen when compiling btrfs, due to commit e9aa7c285d20a69c ("btrfs: enable W=1 checks for btrfs"), but as people are doing more W=1 compile testing, it will start to show up elsewhere, too. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20210305084122.4118826-1-geert@linux-m68k.org
2021-03-06x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscallsAndy Lutomirski
On a 32-bit fast syscall that fails to read its arguments from user memory, the kernel currently does syscall exit work but not syscall entry work. This confuses audit and ptrace. For example: $ ./tools/testing/selftests/x86/syscall_arg_fault_32 ... strace: pid 264258: entering, ptrace_syscall_info.op == 2 ... This is a minimal fix intended for ease of backporting. A more complete cleanup is coming. Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
2021-03-06x86/unwind/orc: Silence warnings caused by missing ORC dataJosh Poimboeuf
The ORC unwinder attempts to fall back to frame pointers when ORC data is missing for a given instruction. It sets state->error, but then tries to keep going as a best-effort type of thing. That may result in further warnings if the unwinder gets lost. Until we have some way to register generated code with the unwinder, missing ORC will be expected, and occasionally going off the rails will also be expected. So don't warn about it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ivan Babrou <ivan@cloudflare.com> Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com
2021-03-06x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2Josh Poimboeuf
KASAN reserves "redzone" areas between stack frames in order to detect stack overruns. A read or write to such an area triggers a KASAN "stack-out-of-bounds" BUG. Normally, the ORC unwinder stays in-bounds and doesn't access the redzone. But sometimes it can't find ORC metadata for a given instruction. This can happen for code which is missing ORC metadata, or for generated code. In such cases, the unwinder attempts to fall back to frame pointers, as a best-effort type thing. This fallback often works, but when it doesn't, the unwinder can get confused and go off into the weeds into the KASAN redzone, triggering the aforementioned KASAN BUG. But in this case, the unwinder's confusion is actually harmless and working as designed. It already has checks in place to prevent off-stack accesses, but those checks get short-circuited by the KASAN BUG. And a BUG is a lot more disruptive than a harmless unwinder warning. Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack accesses. This finishes the job started by commit 881125bfe65b ("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only partially fixed the issue. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com
2021-03-06perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBRKan Liang
To supply a PID/TID for large PEBS, it requires flushing the PEBS buffer in a context switch. For normal LBRs, a context switch can flip the address space and LBR entries are not tagged with an identifier, we need to wipe the LBR, even for per-cpu events. For LBR callstack, save/restore the stack is required during a context switch. Set PERF_ATTACH_SCHED_CB for the event with large PEBS & LBR. Fixes: 9c964efa4330 ("perf/x86/intel: Drain the PEBS buffer during context switches") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201130193842.10569-2-kan.liang@linux.intel.com
2021-03-06perf/core: Flush PMU internal buffers for per-CPU eventsKan Liang
Sometimes the PMU internal buffers have to be flushed for per-CPU events during a context switch, e.g., large PEBS. Otherwise, the perf tool may report samples in locations that do not belong to the process where the samples are processed in, because PEBS does not tag samples with PID/TID. The current code only flush the buffers for a per-task event. It doesn't check a per-CPU event. Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the PMU internal buffers have to be flushed for this event during a context switch. Add sched_cb_entry and perf_sched_cb_usages back to track the PMU/cpuctx which is required to be flushed. Only need to invoke the sched_task() for per-CPU events in this patch. The per-task events have been handled in perf_event_context_sched_in/out already. Fixes: 9c964efa4330 ("perf/x86/intel: Drain the PEBS buffer during context switches") Reported-by: Gabriel Marin <gmx@google.com> Originally-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201130193842.10569-1-kan.liang@linux.intel.com
2021-03-06static_call: Fix the module key fixupPeter Zijlstra
Provided the target address of a R_X86_64_PC32 relocation is aligned, the low two bits should be invariant between the relative and absolute value. Turns out the address is not aligned and things go sideways, ensure we transfer the bits in the absolute form when fixing up the key address. Fixes: 73f44fe19d35 ("static_call: Allow module use without exposing static_call_key") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20210225220351.GE4746@worktop.programming.kicks-ass.net
2021-03-06sched/membarrier: fix missing local execution of ipi_sync_rq_state()Mathieu Desnoyers
The function sync_runqueues_membarrier_state() should copy the membarrier state from the @mm received as parameter to each runqueue currently running tasks using that mm. However, the use of smp_call_function_many() skips the current runqueue, which is unintended. Replace by a call to on_each_cpu_mask(). Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load") Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org # 5.4.x+ Link: https://lore.kernel.org/r/74F1E842-4A84-47BF-B6C2-5407DFDD4A4A@gmail.com
2021-03-06sched: Simplify set_affinity_pending refcountsPeter Zijlstra
Now that we have set_affinity_pending::stop_pending to indicate if a stopper is in progress, and we have the guarantee that if that stopper exists, it will (eventually) complete our @pending we can simplify the refcount scheme by no longer counting the stopper thread. Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
2021-03-06sched: Fix affine_move_task() self-concurrencyPeter Zijlstra
Consider: sched_setaffinity(p, X); sched_setaffinity(p, Y); Then the first will install p->migration_pending = &my_pending; and issue stop_one_cpu_nowait(pending); and the second one will read p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending), the _SAME_ @pending. This causes stopper list corruption. Add set_affinity_pending::stop_pending, to indicate if a stopper is in progress. Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
2021-03-06sched: Optimize migration_cpu_stop()Peter Zijlstra
When the purpose of migration_cpu_stop() is to migrate the task to 'any' valid CPU, don't migrate the task when it's already running on a valid CPU. Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
2021-03-06sched: Collate affine_move_task() stoppersPeter Zijlstra
The SCA_MIGRATE_ENABLE and task_running() cases are almost identical, collapse them to avoid further duplication. Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
2021-03-06sched: Simplify migration_cpu_stop()Peter Zijlstra
When affine_move_task() issues a migration_cpu_stop(), the purpose of that function is to complete that @pending, not any random other p->migration_pending that might have gotten installed since. This realization much simplifies migration_cpu_stop() and allows further necessary steps to fix all this as it provides the guarantee that @pending's stopper will complete @pending (and not some random other @pending). Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
2021-03-06sched: Fix migration_cpu_stop() requeueingPeter Zijlstra
When affine_move_task(p) is called on a running task @p, which is not otherwise already changing affinity, we'll first set p->migration_pending and then do: stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg); This then gets us to migration_cpu_stop() running on the CPU that was previously running our victim task @p. If we find that our task is no longer on that runqueue (this can happen because of a concurrent migration due to load-balance etc.), then we'll end up at the: } else if (dest_cpu < 1 || pending) { branch. Which we'll take because we set pending earlier. Here we first check if the task @p has already satisfied the affinity constraints, if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop() onto the CPU that is now hosting our task @p: stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop, &pending->arg, &pending->stop_work); Except, we've never initialized pending->arg, which will be all 0s. This then results in running migration_cpu_stop() on the next CPU with arg->p == NULL, which gives the by now obvious result of fireworks. The cure is to change affine_move_task() to always use pending->arg, furthermore we can use the exact same pattern as the SCA_MIGRATE_ENABLE case, since we'll block on the pending->done completion anyway, no point in adding yet another completion in stop_one_cpu(). This then gives a clear distinction between the two migration_cpu_stop() use cases: - sched_exec() / migrate_task_to() : arg->pending == NULL - affine_move_task() : arg->pending != NULL; And we can have it ignore p->migration_pending when !arg->pending. Any stop work from sched_exec() / migrate_task_to() is in addition to stop works from affine_move_task(), which will be sufficient to issue the completion. Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
2021-03-06ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5Takashi Iwai
The mute and mic-mute LEDs on HP ZBook Studio G5 are controlled via GPIO bits 0x10 and 0x20, respectively, and we need the extra setup for those. As the similar code is already present for other HP models but with different GPIO pins, this patch factors out the common helper code and applies those GPIO values for each model. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211893 Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210306095018.11746-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-03-06KVM: arm64: Fix range alignment when walking page tablesJia He
When walking the page tables at a given level, and if the start address for the range isn't aligned for that level, we propagate the misalignment on each iteration at that level. This results in the walker ignoring a number of entries (depending on the original misalignment) on each subsequent iteration. Properly aligning the address before the next iteration addresses this issue. Cc: stable@vger.kernel.org Reported-by: Howard Zhang <Howard.Zhang@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jia He <justin.he@arm.com> Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure") [maz: rewrite commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com Message-Id: <20210305185254.3730990-9-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibilityMarc Zyngier
It looks like we have broken firmware out there that wrongly advertises a GICv2 compatibility interface, despite the CPUs not being able to deal with it. To work around this, check that the CPU initialising KVM is actually able to switch to MMIO instead of system registers, and use that as a precondition to enable GICv2 compatibility in KVM. Note that the detection happens on a single CPU. If the firmware is lying *and* that the CPUs are asymetric, all hope is lost anyway. Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210305185254.3730990-8-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()Marc Zyngier
As we are about to report a bit more information to the rest of the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more explicit __vgic_v3_get_gic_config(). No functional change. Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210305185254.3730990-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is availableMarc Zyngier
When running under a nesting hypervisor, it isn't guaranteed that the virtual HW will include a PMU. In which case, let's not try to access the PMU registers in the world switch, as that'd be deadly. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210209114844.3278746-3-maz@kernel.org Message-Id: <20210305185254.3730990-6-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static keyMarc Zyngier
We currently find out about the presence of a HW PMU (or the handling of that PMU by perf, which amounts to the same thing) in a fairly roundabout way, by checking the number of counters available to perf. That's good enough for now, but we will soon need to find about about that on paths where perf is out of reach (in the world switch). Instead, let's turn kvm_arm_support_pmu_v3() into a static key. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210209114844.3278746-2-maz@kernel.org Message-Id: <20210305185254.3730990-5-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Fix nVHE hyp panic host context restoreAndrew Scull
When panicking from the nVHE hyp and restoring the host context, x29 is expected to hold a pointer to the host context. This wasn't being done so fix it to make sure there's a valid pointer the host context being used. Rather than passing a boolean indicating whether or not the host context should be restored, instead pass the pointer to the host context. NULL is passed to indicate that no context should be restored. Fixes: a2e102e20fd6 ("KVM: arm64: nVHE: Handle hyp panics") Cc: stable@vger.kernel.org Signed-off-by: Andrew Scull <ascull@google.com> [maz: partial rewrite to fit 5.12-rc1] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210219122406.1337626-1-ascull@google.com Message-Id: <20210305185254.3730990-4-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: Avoid corrupting vCPU context register in guest exitWill Deacon
Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") tracks the currently running vCPU, clearing the pointer to NULL on exit from a guest. Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS code to go off into the weeds when it saves the DISR assuming that the CPU context is embedded in a struct vCPU. Leave x1 alone and use x3 as a temporary register instead when clearing the vCPU on the guest exit path. Cc: Marc Zyngier <maz@kernel.org> Cc: Andrew Scull <ascull@google.com> Cc: <stable@vger.kernel.org> Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210226181211.14542-1-will@kernel.org Message-Id: <20210305185254.3730990-3-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06KVM: arm64: nvhe: Save the SPE context earlySuzuki K Poulose
The nVHE KVM hyp drains and disables the SPE buffer, before entering the guest, as the EL1&0 translation regime is going to be loaded with that of the guest. But this operation is performed way too late, because : - The owning translation regime of the SPE buffer is transferred to EL2. (MDCR_EL2_E2PB == 0) - The guest Stage1 is loaded. Thus the flush could use the host EL1 virtual address, but use the EL2 translations instead of host EL1, for writing out any cached data. Fix this by moving the SPE buffer handling early enough. The restore path is doing the right thing. Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow") Cc: stable@vger.kernel.org Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poulose@arm.com Message-Id: <20210305185254.3730990-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06kvm: x86: use NULL instead of using plain integer as pointerMuhammad Usama Anjum
Sparse warnings removed: warning: Using plain integer as NULL pointer Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305180816.GA488770@LEGION> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-05Linux 5.12-rc2v5.12-rc2Linus Torvalds