summaryrefslogtreecommitdiff
path: root/arch/x86/entry/common.c
AgeCommit message (Collapse)Author
2016-03-10x86/entry: Call enter_from_user_mode() with IRQs offAndy Lutomirski
Now that slow-path syscalls always enter C before enabling interrupts, it's straightforward to call enter_from_user_mode() before enabling interrupts rather than doing it as part of entry tracing. With this change, we should finally be able to retire exception_enter(). This will also enable optimizations based on knowing that we never change context tracking state with interrupts on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10x86/entry/32: Change INT80 to be an interrupt gateAndy Lutomirski
We want all of the syscall entries to run with interrupts off so that we can efficiently run context tracking before enabling interrupts. This will regress int $0x80 performance on 32-bit kernels by a couple of cycles. This shouldn't matter much -- int $0x80 is not a fast path. This effectively reverts: 657c1eea0019 ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on") ... and fixes the same issue differently. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10x86/entry: Remove TIF_SINGLESTEP entry workAndy Lutomirski
Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall entry code. Remove it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86/entry/compat: Keep TS_COMPAT set during signal deliveryAndy Lutomirski
Signal delivery needs to know the sign of an interrupted syscall's return value in order to detect -ERESTART variants. Normally this works independently of bitness because syscalls internally return long. Under ptrace, however, this can break, and syscall_get_error is supposed to sign-extend regs->ax if needed. We were clearing TS_COMPAT too early, though, and this prevented sign extension, which subtly broke syscall restart under ptrace. Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # 4.3.x- Fixes: c5c46f59e4e7 ("x86/entry: Add new, comprehensible entry and exit handlers written in C") Link: http://lkml.kernel.org/r/cbce3cf545522f64eb37f5478cb59746230db3b5.1455142412.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29x86/entry/64: Migrate the 64-bit syscall slow path to CAndy Lutomirski
This is more complicated than the 32-bit and compat cases because it preserves an asm fast path for the case where the callee-saved regs aren't needed in pt_regs and no entry or exit work needs to be done. This appears to slow down fastpath syscalls by no more than one cycle on my Skylake laptop. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ce2335a4d42dc164b24132ee5e8c7716061f947b.1454022279.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-21x86/entry: Restore traditional SYSENTER calling conventionAndy Lutomirski
It turns out that some Android versions hardcode the SYSENTER calling convention. This is buggy and will cause problems no matter what the kernel does. Nonetheless, we should try to support it. Credit goes to Linus for pointing out a clean way to handle the SYSENTER/SYSCALL clobber differences while preserving straightforward DWARF annotations. I believe that the original offending Android commit was: https://android.googlesource.com/platform%2Fbionic/+/7dc3684d7a2587e43e6d2a8e0e3f39bf759bd535 Reported-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-and-tested-by: Borislav Petkov <bp@alien8.de> Cc: <mark.gross@intel.com> Cc: Su Tao <tao.su@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: <frank.wang@intel.com> Cc: <borun.fu@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Mingwei Shi <mingwei.shi@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-18x86/entry/32: Fix entry_INT80_32() to expect interrupts to be onAndy Lutomirski
When I rewrote entry_INT80_32, I thought that int80 was an interrupt gate. It's a trap gate. *facepalm* Thanks to Brian Gerst for pointing out that it's better to change the entry code than to change the gate type. Suggested-by: Brian Gerst <brgerst@gmail.com> Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 150ac78d63af ("x86/entry/32: Switch INT80 to the new C syscall path") Link: http://lkml.kernel.org/r/dc09d9b574a5c1dcca996847875c73f8341ce0ad.1445035014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Split and inline syscall_return_slowpath()Andy Lutomirski
GCC is unable to properly optimize functions that have a very short likely case and a longer and register-heavier cold part -- it fails to sink all of the register saving and stack frame setup code into the unlikely part. Help it out with syscall_return_slowpath() by splitting it into two parts and inline the hot part. Saves 6 cycles for compat syscalls. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/0f773a894ab15c589ac794c2d34ca6ba9b5335c9.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Split and inline prepare_exit_to_usermode()Andy Lutomirski
GCC is unable to properly optimize functions that have a very short likely case and a longer and register-heavier cold part -- it fails to sink all of the register saving and stack frame setup code into the unlikely part. Help it out with prepare_exit_to_usermode() by splitting it into two parts and inline the hot part. Saves 6-8 cycles for compat syscalls. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/9fc53eda4a5b924070952f12fa4ae3e477640a07.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Use pt_regs_to_thread_info() in syscall entry tracingAndy Lutomirski
It generates simpler and faster code than current_thread_info(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/a3b6633e7dcb9f673c1b619afae602d29d27d2cf.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRYAndy Lutomirski
This shaves a few cycles off the slow paths. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/ce383fa9e129286ce6da6e00b53acd4c9fb5d06a.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Micro-optimize compat fast syscall arg fetchAndy Lutomirski
We're following a 32-bit pointer, and the uaccess code isn't smart enough to figure out that the access_ok() check isn't needed. This saves about three cycles on a cache-hot fast syscall. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/bdff034e2f23c5eb974c760cf494cb5bddce8f29.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Force inlining of 32-bit syscall codeAndy Lutomirski
On systems that support fast syscalls, we only really care about the performance of the fast syscall path. Forcibly inline it and add a likely annotation. This saves 4-6 cycles. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/8472036ff1f4b426b4c4c3e3d0b3bf5264407c0c.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Make irqs_disabled checks in exit code depend on lockdepAndy Lutomirski
These checks are quite slow. Disable them in non-lockdep kernels to reduce the performance hit. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/eccff2a154ae6fb50f40228901003a6e9c24f3d0.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscallsAndy Lutomirski
This is slightly messy, but it eliminates an unnecessary cli;sti pair. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/22f34b1096694a37326f36c53407b8dd90f37948.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry/32: Re-implement SYSENTER using the new C pathAndy Lutomirski
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/5b99659e8be70f3dd10cd8970a5c90293d9ad9a7.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry/compat: Implement opportunistic SYSRETL for compat syscallsAndy Lutomirski
If CS, SS and IP are as expected and FLAGS is compatible with SYSRETL, then return from fast compat syscalls (both SYSCALL and SYSENTER) using SYSRETL. Unlike native 64-bit opportunistic SYSRET, this is not invisible to user code: RCX and R8-R15 end up in a different state than shown saved in pt_regs. To compensate, we only do this when returning to the vDSO fast syscall return path. This won't interfere with syscall restart, as we won't use SYSRETL when returning to the INT80 restart instruction. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/aa15e49db33773eb10b73d73466b6d5466d7856a.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Add C code for fast system call entriesAndy Lutomirski
This handles both SYSENTER and SYSCALL. The asm glue will take care of the differences. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/6041a58a9b8ef6d2522ab4350deb1a1945eb563f.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09x86/entry: Add do_syscall_32(), a C function to do 32-bit syscallsAndy Lutomirski
System calls are really quite simple. Add a helper to call a 32-bit system call. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/a77ed179834c27da436fb4a7fb23c8ee77abc11c.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07x86/entry, locking/lockdep: Move lockdep_sys_exit() to ↵Andy Lutomirski
prepare_exit_to_usermode() Rather than worrying about exactly where LOCKDEP_SYS_EXIT should go in the asm code, add it to prepare_exit_from_usermode() and remove all of the asm calls that are followed by prepare_exit_to_usermode(). LOCKDEP_SYS_EXIT now appears only in the syscall fast paths. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1736ebe948b845e68120b86b89091f3ec27f5e8e.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-05x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masksAndy Lutomirski
They are no longer used. Good riddance! Deleting the TIF_ macros is really nice. It was never clear why there were so many variants. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/22c61682f446628573dde0f1d573ab821677e06da.1438378274.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17x86/entry: Fix _TIF_USER_RETURN_NOTIFY check in prepare_exit_to_usermodeAndy Lutomirski
Linus noticed that the early return check was missing _TIF_USER_RETURN_NOTIFY. If the only work flag was _TIF_USER_RETURN_NOTIFY, we'd skip user return notifiers. Fix it. (This is the only missing bit.) This fixes double faults on a KVM host. It's the same issue as last time, except that this time it's very easy to trigger. Apparently no one uses -next as a KVM host. ( I'm still not quite sure what it is that KVM does that blows up so badly if we miss a user return notifier. My best guess is that KVM lets KERNEL_GS_BASE (i.e. the user's gs base) be negative and fixes it up in a user return notifier. If we actually end up in user mode with a negative gs base, we blow up pretty badly. ) Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c5c46f59e4e7 ("x86/entry: Add new, comprehensible entry and exit handlers written in C") Link: http://lkml.kernel.org/r/3f801104d24ee7a6bb1446408d9950777aa63277.1436995419.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07x86/entry: Add new, comprehensible entry and exit handlers written in CAndy Lutomirski
The current x86 entry and exit code, written in a mixture of assembly and C code, is incomprehensible due to being open-coded in a lot of places without coherent documentation. It appears to work primary by luck and duct tape: i.e. obvious runtime failures were fixed on-demand, without re-thinking the design. Due to those reasons our confidence level in that code is low, and it is very difficult to incrementally improve. Add new code written in C, in preparation for simply deleting the old entry code. prepare_exit_to_usermode() is a new function that will handle all slow path exits to user mode. It is called with IRQs disabled and it leaves us in a state in which it is safe to immediately return to user mode. IRQs must not be re-enabled at any point after prepare_exit_to_usermode() returns and user mode is actually entered. (We can, of course, fail to enter user mode and treat that failure as a fresh entry to kernel mode.) All callers of do_notify_resume() will be migrated to call prepare_exit_to_usermode() instead; prepare_exit_to_usermode() needs to do everything that do_notify_resume() does today, but it also takes care of scheduling and context tracking. Unlike do_notify_resume(), it does not need to be called in a loop. syscall_return_slowpath() is exactly what it sounds like: it will be called on any syscall exit slow path. It will replace syscall_trace_leave() and it calls prepare_exit_to_usermode() on the way out. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/c57c8b87661a4152801d7d3786eac2d1a2f209dd.1435952415.git.luto@kernel.org [ Improved the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07x86/entry: Add enter_from_user_mode() and use it in syscallsAndy Lutomirski
Changing the x86 context tracking hooks is dangerous because there are no good checks that we track our context correctly. Add a helper to check that we're actually in CONTEXT_USER when we enter from user mode and wire it up for syscall entries. Subsequent patches will wire this up for all non-NMI entries as well. NMIs are their own special beast and cannot currently switch overall context tracking state. Instead, they have their own special RCU hooks. This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a branch) and a tiny slowdown if CONFIG_CONTEXT_TRACING (adds a layer of indirection). Eventually, we should fix up the core context tracking code to supply a function that does what we want (and can be much simpler than user_exit), which will enable us to get rid of the extra call. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/853b42420066ec3fb856779cdc223a6dcb5d355b.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07x86/entry: Move C entry and exit code to arch/x86/entry/common.cAndy Lutomirski
The entry and exit C helpers were confusingly scattered between ptrace.c and signal.c, even though they aren't specific to ptrace or signal handling. Move them together in a new file. This change just moves code around. It doesn't change anything. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>