summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-29fix infoleak in waitid(2)Al Viro
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first case and waitid(2) rusage should've been copied out exactly in that case, *not* whenever kernel_waitid() has not returned an error. Compat variant shares that braino; none of kernel_wait4() callers do, so the below ought to fix it. Reported-and-tested-by: Alexander Potapenko <glider@google.com> Fixes: ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland") Cc: stable@vger.kernel.org # v4.13 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-29x86/asm: Use register variable to get stack pointer valueAndrey Ryabinin
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") ... we have a stack register variable declared. It can be used instead of current_stack_pointer() function which allows to optimize away some excessive "mov %rsp, %<dst>" instructions: -mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d> +sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a> Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29x86/mm: Disable branch profiling in mem_encrypt.cTom Lendacky
Some routines in mem_encrypt.c are called very early in the boot process, e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined the resulting branch profiling associated with the check to see if SME is active results in a kernel crash. Disable branch profiling for mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any header files. Reported-by: kernel test robot <lkp@01.org> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge tag 'perf-urgent-for-mingo-4.14-20170928' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix syscalltbl build failure (Akemi Yagi) - Fix attr.exclude_kernel setting for default cycles:p, this time for !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo) - Sync kernel ABI headers with tooling headers (Ingo Molnar) - Remove misleading debug messages with --call-graph option (Mengting Zhang) - Revert vmlinux symbol resolution patches for s390x (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29Merge branch 'fixes-v4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys fixes from James Morris: "Notable here is a rewrite of big_key crypto by Jason Donenfeld to address some issues in the original code. From Jason's commit log: "This started out as just replacing the use of crypto/rng with get_random_bytes_wait, so that we wouldn't use bad randomness at boot time. But, upon looking further, it appears that there were even deeper underlying cryptographic problems, and that this seems to have been committed with very little crypto review. So, I rewrote the whole thing, trying to keep to the conventions introduced by the previous author, to fix these cryptographic flaws." There has been positive review of the new code by Eric Biggers and Herbert Xu, and it passes basic testing via the keyutils test suite. Eric also manually tested it. Generally speaking, we likely need to improve the amount of crypto review for kernel crypto users including keys (I'll post a note separately to ksummit-discuss)" * 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/keys: rewrite all of big_key crypto security/keys: properly zero out sensitive key material in big_key KEYS: use kmemdup() in request_key_auth_new() KEYS: restrict /proc/keys by credentials at open time KEYS: reset parent each time before searching key_user_tree KEYS: prevent KEYCTL_READ on negative key KEYS: prevent creating a different user's keyrings KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: fix key refcount leak in keyctl_read_key() KEYS: fix key refcount leak in keyctl_assume_authority() KEYS: don't revoke uninstantiated key in request_key_auth_new() KEYS: fix cred refcount leak in request_key_auth_new()
2017-09-29amdgpu/dc: inline dml_round_to_multipleDave Airlie
turns out to be a win to inline this. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: rename bios get_image symbol to something more searchable.Dave Airlie
This just makes it easier to find. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: set some of the link dp code to static.Dave Airlie
These aren't currently used outside this file. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: inline all the signal_types code.Dave Airlie
This is worth 300 bytes, and one less source file. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: inline fixed31_32 div_intDave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: use the builtin constant p trick on the 31/32 fixed point.Dave Airlie
This only gets us 100 bytes, but may as well be consistent. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: allow inlining constant int to fixed a lot better.Dave Airlie
This uses two things that might be Linux specific, __builtin_constant_p (might be gcc) and BUILD_BUG_ON. (maybe other dm's can have something similiar). This saves 4k in the bw calcs code. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: inline some of the bw_fixed code.Dave Airlie
This results in a ~4.5k code size reduction. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: move some one line dp functions to inlines.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: hide some unused aux/i2c payload apis.Dave Airlie
I've no idea if these are used on other platforms, but they aren't used outside this file here, so make them static. Drops 300 bytes. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: inline a bunch of the fixed 31_32 helpers.Dave Airlie
This decreases code size by a few hundred bytes. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: inline some of the fixed 32_32 fnsDave Airlie
This drops ~400 bytes here. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: fix construct return values on irq service.Dave Airlie
This just removes more unused return/errors paths. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu: fixup construct to void paths on some more dc objects.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: remove pointless return from build_pipe_hw_paramDave Airlie
This never returned anything else. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: another round of dce/dcn construct cleanups.Dave Airlie
This removes any remaining pointless return codepaths from the DCE code. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: cleanup construct returns in gpio.Dave Airlie
This is similiar to previous patches, don't return when we don't need to, also do error checking before allocating memory, makes it simpler to cleanup after. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: remove pointless returns in the i2caux constructor paths. (v2)Dave Airlie
There was lots of return true, and error checking that was never used in these paths. Just remove it all. v2: I missed one return true. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make some audio functions return voidDave Airlie
There is no need to check for these pointers being valid at this level. Check earlier if required. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make program_regamma_pwl return voidDave Airlie
The return value was unused. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make get_audio_clock_info return void.Dave Airlie
This function never returned false under any sane circumstances. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make dce80 timing generator construct return void.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make dce transform constructor voidDave Airlie
This never returns anything but true. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make opp construct void.Dave Airlie
This doesn't return anything except true. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make link encoder construct void.Dave Airlie
This only ever returned true. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make stream encoder constructor return void.Dave Airlie
The checks weren't useful here really. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: make timing generator constructor return void.Dave Airlie
This can't fail as is. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: constify a bunch of dc structs.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dm: constify rgb formats.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dm: constify plane type.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dm: constify yuv_formats.Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: static constify update_surface_trace_levelDave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/dc: drop dc_ver charDave Airlie
This isn't referenced anywhere, and if it was it should be const. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/pp: rewrite fiji pwr virus upload code.Dave Airlie
Along the same lines as rewriting the polaris code, this rewrites the fiji code, and reduces the driver size by ~40k. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/pp: rewrite polaris pwrvirus upload code.Dave Airlie
This reduces the pwrvirus table size by 30k, by moving the sequences of writes to the data register into blocks. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/nbio: use constant nbio_hdp_flush_reg structs.Dave Airlie
This removes the init path as well, since the init path just did some constant init of some structs. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/soc15: make the pcie index/data registers constant.Dave Airlie
These don't seem to change at runtime, and the initialisers are constant data. This could be improved by not selecting the apu/non-apu path on each pcie read/write access. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/pp: constify soft_dummy_pp_table.Dave Airlie
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgpu/pp: use array_size to size the pwrvirus tables.Dave Airlie
This avoids fragile hardcoding of array size. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29amdgfx/gfx: don't use static objects for ce/de meta. (v2)Dave Airlie
This isn't safe if we have multiple GPUs plugged in, since there is only one copy of this struct in the bss, just allocate on stack, it's 40/108 bytes which should be safe. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-09-29arm64: fault: Route pte translation faults via do_translation_faultWill Deacon
We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region). In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context: BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313 in_atomic(): 0, irqs_disabled(): 0, pid: 10290 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144 [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c [<ffffff8e016977f0>] do_page_fault+0x140/0x330 [<ffffff8e01681328>] do_mem_abort+0x54/0xb0 Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0) [...] [<ffffff8e016844fc>] el1_da+0x18/0x78 [<ffffff8e017f399c>] path_parentat+0x44/0x88 [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8 [<ffffff8e017f5044>] filename_create+0x4c/0x128 [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8 [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28 Code: 36380080 d5384100 f9400800 9402566d (d4210000) ---[ end trace 2d01889f2bca9b9f ]--- Fix this by dispatching all translation faults to do_translation_faults, which avoids invoking the page fault logic for faults on kernel addresses. Cc: <stable@vger.kernel.org> Reported-by: Ankit Jain <ankijain@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29arm64: mm: Use READ_ONCE when dereferencing pointer to pte tableWill Deacon
On kernels built with support for transparent huge pages, different CPUs can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk and they must take care to use READ_ONCE to avoid value tearing or caching of stale values by the compiler. Unfortunately, these functions call into our pgtable macros, which don't use READ_ONCE, and compiler caching has been observed to cause the following crash during ext4 writeback: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: [<ffff000008233328>] check_pte+0x20/0x170 [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540 [<ffff000008234adc>] page_mkclean_one+0xac/0x278 [<ffff000008234d98>] rmap_walk_file+0xf0/0x238 [<ffff000008236e74>] rmap_walk+0x64/0xa0 [<ffff0000082370c8>] page_mkclean+0x90/0xa8 [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8 [<ffff00000832f984>] mpage_submit_page+0x34/0x98 [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170 [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8 [<ffff00000833530c>] ext4_writepages+0x484/0xe30 [<ffff0000081f6ab4>] do_writepages+0x44/0xe8 [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110 [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8 [<ffff000008324310>] ext4_sync_file+0x80/0x4b8 [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0 [<ffff0000082332b4>] SyS_msync+0x194/0x1e8 This is because page_vma_mapped_walk loads the PMD twice before calling pte_offset_map: the first time without READ_ONCE (where it gets all zeroes due to a concurrent pmdp_invalidate) and the second time with READ_ONCE (where it sees a valid table pointer due to a concurrent pmd_populate). However, the compiler inlines everything and caches the first value in a register, which is subsequently used in pte_offset_phys which returns a junk pointer that is later dereferenced when attempting to access the relevant pte. This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure that a stale value is not used. Whilst this is a point fix for a known failure (and simple to backport), a full fix moving all of our page table accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in page_vma_mapped_walk is in the works for a future kernel release. Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29kvm/x86: Handle async PF in RCU read-side critical sectionsBoqun Feng
Sasha Levin reported a WARNING: | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329 | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline] | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329 | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458 ... | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS | 1.10.1-1ubuntu1 04/01/2014 | Call Trace: ... | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline] | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458 | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002 | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000 | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410 | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001 | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08 | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040 | __schedule+0x201/0x2240 kernel/sched/core.c:3292 | schedule+0x113/0x460 kernel/sched/core.c:3421 | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158 | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271 | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069 | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996 | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283 | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670 | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140 | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000 | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8 | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000 | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136 | sprintf+0xbe/0xf0 lib/vsprintf.c:2386 | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23 | get_link fs/namei.c:1047 [inline] | link_path_walk+0x1041/0x1490 fs/namei.c:2127 ... This happened when the host hit a page fault, and delivered it as in an async page fault, while the guest was in an RCU read-side critical section. The guest then tries to reschedule in kvm_async_pf_task_wait(), but rcu_preempt_note_context_switch() would treat the reschedule as a sleep in RCU read-side critical section, which is not allowed (even in preemptible RCU). Thus the WARN. To cure this, make kvm_async_pf_task_wait() go to the halt path if the PF happens in a RCU read-side critical section. Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresumeWanpeng Li
------------[ cut here ]------------ WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel] CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G W OE 4.13.0+ #17 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel] Call Trace: ? emulator_read_emulated+0x15/0x20 [kvm] ? segmented_read+0xae/0xf0 [kvm] vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel] ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel] x86_emulate_instruction+0x733/0x810 [kvm] vmx_handle_exit+0x2f4/0xda0 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm] kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 A nested #PF is triggered during L0 emulating instruction for L2. However, it doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes it by queuing the #PF exception instead ,requesting an immediate VM exit from L2 and keeping the exception for L1 pending for a subsequent nested VM exit. This should actually work all the time, making vmx_inject_page_fault_nested totally unnecessary. However, that's not working yet, so this patch can work around the issue in the meanwhile. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29sched/sysctl: Check user input value of sysctl_sched_time_avgEthan Zhao
System will hang if user set sysctl_sched_time_avg to 0: [root@XXX ~]# sysctl kernel.sched_time_avg_ms=0 Stack traceback for pid 0 0xffff883f6406c600 0 0 1 3 R 0xffff883f6406cf50 *swapper/3 ffff883f7ccc3ae8 0000000000000018 ffffffff810c4dd0 0000000000000000 0000000000017800 ffff883f7ccc3d78 0000000000000003 ffff883f7ccc3bf8 ffffffff810c4fc9 ffff883f7ccc3c08 00000000810c5043 ffff883f7ccc3c08 Call Trace: <IRQ> [<ffffffff810c4dd0>] ? update_group_capacity+0x110/0x200 [<ffffffff810c4fc9>] ? update_sd_lb_stats+0x109/0x600 [<ffffffff810c5507>] ? find_busiest_group+0x47/0x530 [<ffffffff810c5b84>] ? load_balance+0x194/0x900 [<ffffffff810ad5ca>] ? update_rq_clock.part.83+0x1a/0xe0 [<ffffffff810c6d42>] ? rebalance_domains+0x152/0x290 [<ffffffff810c6f5c>] ? run_rebalance_domains+0xdc/0x1d0 [<ffffffff8108a75b>] ? __do_softirq+0xfb/0x320 [<ffffffff8108ac85>] ? irq_exit+0x125/0x130 [<ffffffff810b3a17>] ? scheduler_ipi+0x97/0x160 [<ffffffff81052709>] ? smp_reschedule_interrupt+0x29/0x30 [<ffffffff8173a1be>] ? reschedule_interrupt+0x6e/0x80 <EOI> [<ffffffff815bc83c>] ? cpuidle_enter_state+0xcc/0x230 [<ffffffff815bc80c>] ? cpuidle_enter_state+0x9c/0x230 [<ffffffff815bc9d7>] ? cpuidle_enter+0x17/0x20 [<ffffffff810cd6dc>] ? cpu_startup_entry+0x38c/0x420 [<ffffffff81053373>] ? start_secondary+0x173/0x1e0 Because divide-by-zero error happens in function: update_group_capacity() update_cpu_capacity() scale_rt_capacity() { ... total = sched_avg_period() + delta; used = div_u64(avg, total); ... } To fix this issue, check user input value of sysctl_sched_time_avg, keep it unchanged when hitting invalid input, and set the minimum limit of sysctl_sched_time_avg to 1 ms. Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com> Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: efault@gmx.de Cc: ethan.kernel@gmail.com Cc: keescook@chromium.org Cc: mcgrof@kernel.org Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1504504774-18253-1-git-send-email-ethan.zhao@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>