summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2017-11-21ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAEPhilip Derrin
Currently, for ARM kernels with CONFIG_ARM_LPAE and CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the kernel code and rodata are writable. They are marked read-only in a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit is not set (PMD_SECT_AP2). For user mappings, the logic that propagates the software bit to the hardware bit is in set_pmd_at(); but for the kernel, section_update() writes the PMDs directly, skipping this logic. The fix is to set PMD_SECT_AP2 for read-only sections in section_update(), at the same time as L_PMD_SECT_RDONLY. Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error") Signed-off-by: Philip Derrin <philip@cog.systems> Reported-by: Neil Dick <neil@cog.systems> Tested-by: Neil Dick <neil@cog.systems> Tested-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-11-21ARM: 8721/1: mm: dump: check hardware RO bit for LPAEPhilip Derrin
When CONFIG_ARM_LPAE is set, the PMD dump relies on the software read-only bit to determine whether a page is writable. This concealed a bug which left the kernel text section writable (AP2=0) while marked read-only in the software bit. In a kernel with the AP2 bug, the dump looks like this: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M ro x SHD 0xc0600000-0xc0800000 2M ro NX SHD 0xc0800000-0xc4800000 64M RW NX SHD The fix is to check that the software and hardware bits are both set before displaying "ro". The dump then shows the true perms: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M RW x SHD 0xc0600000-0xc0800000 2M RW NX SHD 0xc0800000-0xc4800000 64M RW NX SHD Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE") Signed-off-by: Philip Derrin <philip@cog.systems> Tested-by: Neil Dick <neil@cog.systems> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-11-21ARM: make decompressor debug output user selectableRussell King
Make the decompressor debug output user selectable, otherwise merely enabling DEBUG_LL causes the decompressor to become board specific, thereby preventing a multi-platform kernel from booting. Enabling DEBUG_LL doesn't cause the kernel itself to become platform specific unless EARLY_PRINTK is enabled, or one of the debugging routines is added in a path that results in it being called. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-11-21ARM: fix get_user_pages_fastRussell King
Ensure that get_user_pages_fast() is not able to access memory which has been mapped with PROT_NONE. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-11-21powerpc/vas: Export chip_to_vas_id()Sukadev Bhattiprolu
Export the symbol chip_to_vas_id() to fix a build failure when CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m. Fixes: d4ef61b5e895 ("powerpc/vas, nx-842: Define and use chip_to_vas_id()") Reported-by: Haren Myneni <hbabu@us.ibm.com> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-21x86/umip: Print a warning into the syslog if UMIP-protected instructions are ↵Ricardo Neri
used Print a rate-limited warning when a user-space program attempts to execute any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR and SMSW). This is useful, because when CONFIG_X86_INTEL_UMIP=y is selected and supported by the hardware, user space programs that try to execute such instructions will receive a SIGSEGV signal that they might not expect. In the specific cases for which emulation is provided (instructions SGDT, SIDT and SMSW in protected and virtual-8086 modes), no signal is generated. However, a warning is helpful to encourage updates in such programs to avoid the use of such instructions. Warnings are printed via a customized printk() function that also provides information about the program that attempted to use the affected instructions. Utility macros are defined to wrap umip_printk() for the error and warning kernel log levels. While here, replace an existing call to the generic rate-limited pr_err() with the new umip_pr_err(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1511233476-17088-1-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-20ARM: dts: r8a779x: Add '#reset-cells' in cpg-mssrArnd Bergmann
With the latest dtc, we get many warnings about the missing '#reset-cells' property in these controllers, e.g.: arch/arm/boot/dts/r8a7790-lager.dtb: Warning (resets_property): Missing property '#reset-cells' in node /clock-controller@e6150000 or bad phandle (referred from /can@e6e80000:resets[0]) arch/arm/boot/dts/r8a7792-blanche.dtb: Warning (resets_property): Missing property '#reset-cells' in node /soc/clock-controller@e6150000 or bad phandle (referred from /soc/dma-controller@e6700000:resets[0]) arch/arm/boot/dts/r8a7792-wheat.dtb: Warning (resets_property): Missing property '#reset-cells' in node /soc/clock-controller@e6150000 or bad phandle (referred from /soc/ethernet@e6800000:resets[0]) arch/arm/boot/dts/r8a7793-gose.dtb: Warning (resets_property): Missing property '#reset-cells' in node /clock-controller@e6150000 or bad phandle (referred from /gpio@e6050000:resets[0]) arch/arm/boot/dts/r8a7794-alt.dtb: Warning (resets_property): Missing property '#reset-cells' in node /clock-controller@e6150000 or bad phandle (referred from /i2c@e6500000:resets[0]) arch/arm/boot/dts/r8a7794-silk.dtb: Warning (resets_property): Missing property '#reset-cells' in node /clock-controller@e6150000 or bad phandle (referred from /interrupt-controller@e61c0000:resets[0]) This adds it for the three r8a779x chips that were lacking it. The binding mandates this as <1>, so this is the value I use. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [geert: Add fix for r8a7793.dtsi] Fixes: 34fbd2b12761d111 ("ARM: dts: r8a7790: Add reset control properties") Fixes: 6e11a322f1d7505d ("ARM: dts: r8a7792: Add reset control properties") Fixes: 84fb19e1d201ba86 ("ARM: dts: r8a7793: Add reset control properties") Fixes: 615beb759ca494a4 ("ARM: dts: r8a7794: Add reset control properties") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2017-11-20powerpc/64s/slice: Use addr limit when computing slice maskAneesh Kumar K.V
While computing slice mask for the free area we need make sure we only search in the addr limit applicable for this mmap. We update the slb_addr_limit after we request for a mmap above 128TB. But the following mmap request with hint addr below 128TB should still limit its search to below 128TB. ie. we should not use slb_addr_limit to compute slice mask in this case. Instead, we should derive high addr limit based on the mmap hint addr value. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-20s390/disassembler: remove confusing codeHeiko Carstens
When searching the opcode offset table within find_insn() the check "entry->opcode == 0" was intended to clarify that 1-byte opcodes, the first one being 0, are special. However there is no mnemonic for an illegal opcode starting with 0. Therefore there is also no opcode offset table entry that matches, which again means that the check never is true. Therefore just remove the confusing check, and add a comment which hopefully explains how this works. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-20s390: rework __switch_to() to allow larger task_struct offsetsHeiko Carstens
If GCC_PLUGIN_RANDSTRUCT is enabled the members of task_struct will be shuffled around. The offsets of the "pid" and "stack" members within task_struct may not necessarily fit into 12 bits anymore, which causes compile errors within __switch_to, since instructions are used, which only have a 12 bit displacement field. Therefore rework __switch_to, to allow for larger offsets. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-20s390/topology: fix compile error in file arch/s390/kernel/smp.cThomas Richter
Commit 1887aa07b676 ("s390/topology: add detection of dedicated vs shared CPUs") introduced following compiler error when CONFIG_SCHED_TOPOLOGY is not set. CC arch/s390/kernel/smp.o ... arch/s390/kernel/smp.c: In function ‘smp_start_secondary’: arch/s390/kernel/smp.c:812:6: error: implicit declaration of function ‘topology_cpu_dedicated’; did you mean ‘topology_cpu_init’? This patch fixes the compiler error by adding function topology_cpu_dedicated() to return false when this config option is not defined. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc updates from David Miller: 1) Add missing cmpxchg64() for 32-bit sparc. 2) Timer conversions from Allen Pais and Kees Cook. 3) vDSO support, from Nagarathnam Muthusamy. 4) Fix sparc64 huge page table walks based upon bug report by Al Viro, from Nitin Gupta. 5) Optimized fls() for T4 and above, from Vijay Kumar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix page table walk for PUD hugepages sparc64: Convert timers to user timer_setup() sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t sparc/led: Convert timers to use timer_setup() sparc64: Use sparc optimized fls and __fls for T4 and above sparc64: SPARC optimized __fls function sparc64: SPARC optimized fls function sparc64: Define SPARC default __fls function sparc64: Define SPARC default fls function vDSO for sparc sparc32: Add cmpxchg64(). sbus: char: Move D7S_MINOR to include/linux/miscdevice.h sparc: time: Remove unneeded linux/miscdevice.h include sparc64: mmu_context: Add missing include files
2017-11-18kbuild: remove all dummy assignments to obj-Masahiro Yamada
Now kbuild core scripts create empty built-in.o where necessary. Remove "obj- := dummy.o" tricks. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-11-17Merge tag 'kbuild-v4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: "One of the most remarkable improvements in this cycle is, Kbuild is now able to cache the result of shell commands. Some variables are expensive to compute, for example, $(call cc-option,...) invokes the compiler. It is not efficient to redo this computation every time, even when we are not actually building anything. Kbuild creates a hidden file ".cache.mk" that contains invoked shell commands and their results. The speed-up should be noticeable. Summary: - Fix arch build issues (hexagon, sh) - Clean up various Makefiles and scripts - Fix wrong usage of {CFLAGS,LDFLAGS}_MODULE in arch Makefiles - Cache variables that are expensive to compute - Improve cc-ldopton and ld-option for Clang - Optimize output directory creation" * tag 'kbuild-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits) kbuild: move coccicheck help from scripts/Makefile.help to top Makefile sh: decompressor: add shipped files to .gitignore frv: .gitignore: ignore vmlinux.lds selinux: remove unnecessary assignment to subdir- kbuild: specify FORCE in Makefile.headersinst as .PHONY target kbuild: remove redundant mkdir from ./Kbuild kbuild: optimize object directory creation for incremental build kbuild: create object directories simpler and faster kbuild: filter-out PHONY targets from "targets" kbuild: remove redundant $(wildcard ...) for cmd_files calculation kbuild: create directory for make cache only when necessary sh: select KBUILD_DEFCONFIG depending on ARCH kbuild: fix linker feature test macros when cross compiling with Clang kbuild: shrink .cache.mk when it exceeds 1000 lines kbuild: do not call cc-option before KBUILD_CFLAGS initialization kbuild: Cache a few more calls to the compiler kbuild: Add a cache for generated variables kbuild: add forward declaration of default target to Makefile.asm-generic kbuild: remove KBUILD_SUBDIR_ASFLAGS and KBUILD_SUBDIR_CCFLAGS hexagon/kbuild: replace CFLAGS_MODULE with KBUILD_CFLAGS_MODULE ...
2017-11-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - a bit more MM - procfs updates - dynamic-debug fixes - lib/ updates - checkpatch - epoll - nilfs2 - signals - rapidio - PID management cleanup and optimization - kcov updates - sysvipc updates - quite a few misc things all over the place * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) EXPERT Kconfig menu: fix broken EXPERT menu include/asm-generic/topology.h: remove unused parent_node() macro arch/tile/include/asm/topology.h: remove unused parent_node() macro arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro arch/sh/include/asm/topology.h: remove unused parent_node() macro arch/ia64/include/asm/topology.h: remove unused parent_node() macro drivers/pcmcia/sa1111_badge4.c: avoid unused function warning mm: add infrastructure for get_user_pages_fast() benchmarking sysvipc: make get_maxid O(1) again sysvipc: properly name ipc_addid() limit parameter sysvipc: duplicate lock comments wrt ipc_addid() sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE initramfs: use time64_t timestamps drivers/watchdog: make use of devm_register_reboot_notifier() kernel/reboot.c: add devm_register_reboot_notifier() kcov: update documentation Makefile: support flag -fsanitizer-coverage=trace-cmp kcov: support comparison operands collection kcov: remove pointless current != NULL check kernel/panic.c: add TAINT_AUX ...
2017-11-17arch/tile/include/asm/topology.h: remove unused parent_node() macroDou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removed the last user of parent_node(). The parent_node() macro in tile platform is unnecessary. Remove it for cleanup. Link: http://lkml.kernel.org/r/1504234599-29533-7-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17arch/sparc/include/asm/topology_64.h: remove unused parent_node() macroDou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removed the last user of parent_node(). The parent_node() macro in SPARC64 platform is unnecessary. Remove it for cleanup. Link: http://lkml.kernel.org/r/1504234599-29533-6-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17arch/sh/include/asm/topology.h: remove unused parent_node() macroDou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removed the last user of parent_node(). The parent_node() macro in SUPERH platform is unnecessary. Remove it for cleanup. Link: http://lkml.kernel.org/r/1504234599-29533-5-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17arch/ia64/include/asm/topology.h: remove unused parent_node() macroDou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removed the last user of parent_node(). The parent_node() macro in IA64(Itanium) platform is unnecessary. Remove it for cleanup. Link: http://lkml.kernel.org/r/1504234599-29533-2-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pid: remove pidhashGargi Sharma
pidhash is no longer required as all the information can be looked up from idr tree. nr_hashed represented the number of pids that had been hashed. Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant, it has been renamed to pid_allocated and PIDNS_ADDING respectively. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> [ia64] Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pid: replace pid bitmap implementation with IDR APIGargi Sharma
Patch series "Replacing PID bitmap implementation with IDR API", v4. This series replaces kernel bitmap implementation of PID allocation with IDR API. These patches are written to simplify the kernel by replacing custom code with calls to generic code. The following are the stats for pid and pid_namespace object files before and after the replacement. There is a noteworthy change between the IDR and bitmap implementation. Before text data bss dec hex filename 8447 3894 64 12405 3075 kernel/pid.o After text data bss dec hex filename 3397 304 0 3701 e75 kernel/pid.o Before text data bss dec hex filename 5692 1842 192 7726 1e2e kernel/pid_namespace.o After text data bss dec hex filename 2854 216 16 3086 c0e kernel/pid_namespace.o The following are the stats for ps, pstree and calling readdir on /proc for 10,000 processes. ps: With IDR API With bitmap real 0m1.479s 0m2.319s user 0m0.070s 0m0.060s sys 0m0.289s 0m0.516s pstree: With IDR API With bitmap real 0m1.024s 0m1.794s user 0m0.348s 0m0.612s sys 0m0.184s 0m0.264s proc: With IDR API With bitmap real 0m0.059s 0m0.074s user 0m0.000s 0m0.004s sys 0m0.016s 0m0.016s This patch (of 2): Replace the current bitmap implementation for Process ID allocation. Functions that are no longer required, for example, free_pidmap(), alloc_pidmap(), etc. are removed. The rest of the functions are modified to use the IDR API. The change was made to make the PID allocation less complex by replacing custom code with calls to generic API. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl] Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17bug: define the "cut here" string in a single placeKees Cook
The "cut here" string is used in a few paths. Define it in a single place. Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17sh/boot: add static stack-protector to pre-kernelKees Cook
The sh decompressor code triggers stack-protector code generation when using CONFIG_CC_STACKPROTECTOR_STRONG. As done for arm and mips, add a simple static stack-protector canary. As this wasn't protected before, the risk of using a weak canary is minimized. Once the kernel is actually up, a better canary is chosen. Link: http://lkml.kernel.org/r/1506972007-80614-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <mmarek@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17Merge tag 'pm-fixes-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull two power management fixes from Rafael Wysocki: "This is the change making /proc/cpuinfo on x86 report current CPU frequency in "cpu MHz" again in all cases and an additional one dealing with an overzealous check in one of the helper routines in the runtime PM framework" * tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / runtime: Drop children check from __pm_runtime_set_status() x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
2017-11-17Merge branch 'parisc-4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Highlights: - one important fix from Dave to prevent kernel crash when userspace hands over invalid values to our in-kernel CAS implementation. - added CPU topology support, including multi-core scheduler support on PA8900 CPUs Minor changes: - minor fixes for sparse (from Luc) - drop duplicates for CPU_BIG_ENDIAN from parisc and sparc top Kconfig files (from Babu) - reorganized parisc PDC (firmware-access) header files for usage from userspace. Required for upcoming qemu parisc emulator and SeaBIOS fork to support parisc" * 'parisc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: arch: Fix duplicates in Kconfig for parisc and sparc parisc: Make some PDC structures accessible in uapi headers parisc: Pass endianness info to sparse parisc: Add CPU topology support parisc: Fix validity check of pointer size argument in new CAS implementation
2017-11-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second round of s390 updates from Martin Schwidefsky: - rework of the vdso code to avoid the use of the access register mode - use perf AUX buffers for the transport of diagnostic sample data - add perf_regs and user stack dump support - enable perf call graphs for user space programs - add perf register support for floating-point registers - all remaining s390 related timer_setup conversions - bug fixes and cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (30 commits) s390: remove unused parameter from Makefile zfcp: purely mechanical update using timer API, plus blank lines s390/scsi: Convert timers to use timer_setup() s390/cpum_sf: correctly set the PID and TID in perf samples s390/cpum_sf: load program parameter at sampler enablement s390/perf: add perf register support for floating-point registers s390/perf: extend perf_regs support to include floating-point registers s390/perf: define common DWARF register string table s390/perf: add support for perf_regs and libdw s390/perf: add perf_regs support and user stack dump s390/cpum_sf: do not register PMU if no sampling mode is authorized s390/cpumf: remove raw event support in basic-only sampling mode s390/perf: add callback to perf to enable using AUX buffer s390/cpumf: enable using AUX buffer s390/cpumf: introduce AUX buffer for dump diagnostic sample data s390/disassembler: increase show_code buffer size s390: Remove CONFIG_HARDENED_USERCOPY s390: enable CPU alternatives unconditionally s390/nmi: remove unused code s390/mm: remove unused code ...
2017-11-17Merge tag 'locks-v4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking update from Jeff Layton: "A couple of fixes for a patch that went into v4.14, and the bug report just came in a few days ago.. It passes my (minimal) testing, and has been in linux-next for a few days now. I also would like to get my address changed in MAINTAINERS to clear that hurdle" * tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall fcntl: don't leak fd reference when fixup_compat_flock fails MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
2017-11-17Merge branch 'misc.compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat and uaccess updates from Al Viro: - {get,put}_compat_sigset() series - assorted compat ioctl stuff - more set_fs() elimination - a few more timespec64 conversions - several removals of pointless access_ok() in places where it was followed only by non-__ variants of primitives * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits) coredump: call do_unlinkat directly instead of sys_unlink fs: expose do_unlinkat for built-in callers ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs() ipmi: get rid of pointless access_ok() pi433: sanitize ioctl cxlflash: get rid of pointless access_ok() mtdchar: get rid of pointless access_ok() r128: switch compat ioctls to drm_ioctl_kernel() selection: get rid of field-by-field copyin VT_RESIZEX: get rid of field-by-field copyin i2c compat ioctls: move to ->compat_ioctl() sched_rr_get_interval(): move compat to native, get rid of set_fs() mips: switch to {get,put}_compat_sigset() sparc: switch to {get,put}_compat_sigset() s390: switch to {get,put}_compat_sigset() ppc: switch to {get,put}_compat_sigset() parisc: switch to {get,put}_compat_sigset() get_compat_sigset() get rid of {get,put}_compat_itimerspec() io_getevents: Use timespec64 to represent timeouts ...
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-17x86/smpboot: Fix __max_logical_packages estimatePrarit Bhargava
A system booted with a small number of cores enabled per package panics because the estimate of __max_logical_packages is too low. This occurs when the total number of active cores across all packages is less than the maximum core count for a single package. e.g.: On a 4 package system with 20 cores/package where only 4 cores are enabled on each package, the value of __max_logical_packages is calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4. Calculate __max_logical_packages after the cpu enumeration has completed. Use the boot cpu's data to extrapolate the number of packages. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
2017-11-17x86/topology: Avoid wasting 128k for package id arrayAndi Kleen
Analyzing large early boot allocations unveiled the logical package id storage as a prominent memory waste. Since commit 1f12e32f4cd5 ("x86/topology: Create logical package id") every 64-bit system allocates a 128k array to convert logical package ids. This happens because the array is sized for MAX_LOCAL_APIC which is always 32k on 64bit systems, and it needs 4 bytes for each entry. This is fairly wasteful, especially for the common case of having only one socket, which uses exactly 4 byte out of 128K. There is no user of the package id map which is performance critical, so the lookup is not required to be O(1). Store the logical processor id in cpu_data and use a loop based lookup. To keep the mapping stable accross cpu hotplug operations, add a flag to cpu_data which is set when the CPU is brought up the first time. When the flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on subsequent bringups. [ tglx: Rename the flag to 'initialized', use proper pointers instead of repeated cpu_data(x) evaluation and massage changelog. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
2017-11-17perf/x86/intel/uncore: Cache logical pkg id in uncore driverAndi Kleen
The SNB-EP uncore driver is the only user of topology_phys_to_logical_pkg in a performance critical path. Change it query the logical pkg ID only once at initialization time and then cache it in box structure. This allows to change the logical package management without affecting the performance critical path. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-2-prarit@redhat.com
2017-11-17x86/acpi: Reduce code duplication in mp_override_legacy_irq()Vikas C Sajjan
The new function mp_register_ioapic_irq() is a subset of the code in mp_override_legacy_irq(). Replace the code duplication by invoking mp_register_ioapic_irq() from mp_override_legacy_irq(). Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: linux-pm@vger.kernel.org Cc: kkamagui@gmail.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/1510848825-21965-3-git-send-email-vikas.cha.sajjan@hpe.com
2017-11-17x86/acpi: Handle SCI interrupts above legacy space gracefullyVikas C Sajjan
Platforms which support only IOAPIC mode, pass the SCI information above the legacy space (0-15) via the FADT mechanism and not via MADT. In such cases mp_override_legacy_irq() which is invoked from acpi_sci_ioapic_setup() to register SCI interrupts fails for interrupts greater equal 16, since it is meant to handle only the legacy space and emits error "Invalid bus_irq %u for legacy override". Add a new function to handle SCI interrupts >= 16 and invoke it conditionally in acpi_sci_ioapic_setup(). The code duplication due to this new function will be cleaned up in a separate patch. Co-developed-by: Sunil V L <sunil.vl@hpe.com> Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Signed-off-by: Sunil V L <sunil.vl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Abdul Lateef Attar <abdul-lateef.attar@hpe.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: linux-pm@vger.kernel.org Cc: kkamagui@gmail.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/1510848825-21965-2-git-send-email-vikas.cha.sajjan@hpe.com
2017-11-17x86/boot: Fix boot failure when SMP MP-table is based at 0Tom Lendacky
When crosvm is used to boot a kernel as a VM, the SMP MP-table is found at physical address 0x0. This causes mpf_base to be set to 0 and a subsequent "if (!mpf_base)" check in default_get_smp_config() results in the MP-table not being parsed. Further into the boot this results in an oops when attempting a read_apic_id(). Add a boolean variable that is set to true when the MP-table is found. Use this variable for testing if the MP-table was found so that even a value of 0 for mpf_base will result in continued parsing of the MP-table. Fixes: 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data") Reported-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: regression@leemhuis.info Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdoffice.net
2017-11-17arch: Fix duplicates in Kconfig for parisc and sparcBabu Moger
Fix duplicates for sparc and parisc. This was due these following commits. 1. commit 4c97a0c8fee3 ("arch: define CPU_BIG_ENDIAN for all fixed big endian archs") 2. commit 97d9f969161d ("arch/sparc: Define config parameter CPU_BIG_ENDIAN") 3. commit 74ad3d28af21 ("parisc: Define CONFIG_CPU_BIG_ENDIAN") Remove duplicates. Signed-off-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17parisc: Make some PDC structures accessible in uapi headersHelge Deller
While working on a qemu and SeaBIOS-port to parisc, those PDC structures are useful to have accessible from userspace. Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17parisc: Pass endianness info to sparseLuc Van Oostenryck
parisc is big-endian only but sparse assumes the same endianness as the building machine. This is problematic for code which expect __BYTE_ORDER__ being correctly predefined by the compiler which sparse can then pre-process differently from what gcc would. Fix this by letting sparse know about the architecture endianness. To: James Bottomley <jejb@parisc-linux.org> To: Helge Deller <deller@gmx.de> CC: linux-parisc@vger.kernel.org Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17parisc: Add CPU topology supportHelge Deller
Add topology support, including multi-core scheduler support on PA8800/PA8900 CPUs and enhanced output in /proc/cpuinfo, e.g. lscpu now reports on a single-socket, dual-core machine: Architecture: parisc64 CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 1 CPU family: PA-RISC 2.0 Model name: PA8800 (Mako) Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17parisc: Fix validity check of pointer size argument in new CAS implementationJohn David Anglin
As noted by Christoph Biedl, passing a pointer size of 4 in the new CAS implementation causes a kernel crash. The attached patch corrects the off by one error in the argument validity check. In reviewing the code, I noticed that we only perform word operations with the pointer size argument. The subi instruction intentionally uses a word condition on 64-bit kernels. Nullification was used instead of a cmpib instruction as the branch should never be taken. The shlw pseudo-operation generates a depw,z instruction and it clears the target before doing a shift left word deposit. Thus, we don't need to clip the upper 32 bits of this argument on 64-bit kernels. Tested with a gcc testsuite run with a 64-bit kernel. The gcc atomic code in libgcc is the only direct user of the new CAS implementation that I am aware of. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIPPaolo Bonzini
These bits were not defined until now in common code, but they are now that the kernel supports UMIP too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-17KVM: x86: Fix CPUID function for word 6 (80000001_ECX)Janakarajan Natarajan
The function for CPUID 80000001 ECX is set to 0xc0000001. Set it to 0x80000001. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Fixes: d6321d493319 ("KVM: x86: generalize guest_cpuid_has_ helpers") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was ↵Liran Alon
reinjected to L2 vmx_check_nested_events() should return -EBUSY only in case there is a pending L1 event which requires a VMExit from L2 to L1 but such a VMExit is currently blocked. Such VMExits are blocked either because nested_run_pending=1 or an event was reinjected to L2. vmx_check_nested_events() should return 0 in case there are no pending L1 events which requires a VMExit from L2 to L1 or if a VMExit from L2 to L1 was done internally. However, upstream commit which introduced blocking in case an event was reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix pending events injection")) contains a bug: It returns -EBUSY even if there are no pending L1 events which requires VMExit from L2 to L1. This commit fix this issue. Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: ioapic: Preserve read-only values in the redirection tableNikita Leshenko
According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb ("ioapic: keep RO bits for IOAPIC entry"). Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggeredNikita Leshenko
Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for IOAPICs without an EOI register. They simulate the EOI message manually by changing the trigger mode to edge and then back to level, with the entry being masked during this. QEMU implements this feature in commit ed1263c363c9 ("ioapic: clear remote irr bit for edge-triggered interrupts") As a side effect, this commit removes an incorrect behavior where Remote IRR was cleared when the redirection table entry was rewritten. This is not consistent with the manual and also opens an opportunity for a strange behavior when a redirection table entry is modified from an interrupt handler that handles the same entry: The modification will clear the Remote IRR bit even though the interrupt handler is still running. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irqNikita Leshenko
Remote IRR for level-triggered interrupts was previously checked in ioapic_set_irq, but since we now have a check in ioapic_service we can remove the redundant check from ioapic_set_irq. This commit doesn't change semantics. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: ioapic: Don't fire level irq when Remote IRR setNikita Leshenko
Avoid firing a level-triggered interrupt that has the Remote IRR bit set, because that means that some CPU is already processing it. The Remote IRR bit will be cleared after an EOI and the interrupt will refire if the irq line is still asserted. This behavior is aligned with QEMU's IOAPIC implementation that was introduced by commit f99b86b94987 ("x86: ioapic: ignore level irq during processing") in QEMU. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure raceNikita Leshenko
KVM uses ioapic_handled_vectors to track vectors that need to notify the IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an interrupt with old configuration is pending or running and ioapic_handled_vectors only remembers the newest configuration; thus EOI from the old interrupt is not delievered to the IOAPIC. A previous commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") addressed this issue by adding pending edge-triggered interrupts to ioapic_handled_vectors, fixing this race for edge-triggered interrupts. The commit explicitly ignored level-triggered interrupts, but this race applies to them as well: 1) IOAPIC sends a level triggered interrupt vector to VCPU0 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC to route the vector to VCPU1. The reconfiguration rewrites only the upper 32 bits of the IOREDTBLn register. (Causes KVM to update ioapic_handled_vectors for VCPU0 and it no longer includes the vector.) 3) VCPU0 sends EOI for the vector, but it's not delievered to the IOAPIC because the ioapic_handled_vectors doesn't include the vector. 4) New interrupts are not delievered to VCPU1 because remote_irr bit is set forever. Therefore, the correct behavior is to add all pending and running interrupts to ioapic_handled_vectors. This commit introduces a slight performance hit similar to commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") for the rare case that the vector is reused by a non-IOAPIC source on VCPU0. We prefer to keep solution simple and not handle this case just as the original commit does. Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: inject exceptions produced by x86_decode_insnPaolo Bonzini
Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB. In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If *that* happens, KVM invokes the emulator, and the emulator gets the updated page tables. If the update side had marked the code page as non present, the page table walk then will fail and so will x86_decode_insn. Unfortunately, even though kvm_fetch_guest_virt is correctly returning X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as a fatal error if the instruction cannot simply be reexecuted (as is the case for MMIO). And this in fact happened sometimes when rebooting Windows 2012r2 guests. Just checking ctxt->have_exception and injecting the exception if true is enough to fix the case. Thanks to Eduardo Habkost for helping in the debugging of this issue. Reported-by: Yanan Fu <yfu@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRsEyal Moscovici
Some guests use these unhandled MSRs very frequently. This cause dmesg to be populated with lots of aggregated messages on usage of ignored MSRs. As ignore_msrs=true means that the user is well-aware his guest use ignored MSRs, allow to also disable the prints on their usage. An example of such guest is ESXi which tends to access a lot to MSR 0x34 (MSR_SMI_COUNT) very frequently. In addition, we have observed this to cause unnecessary delays to guest execution. Such an example is ESXi which experience networking delays in it's guests (L2 guests) because of these prints (even when prints are rate-limited). This can easily be reproduced by pinging from one L2 guest to another. Once in a while, a peak in ping RTT will be observed. Removing these unhandled MSR prints solves the issue. Because these prints can help diagnose issues with guests, this commit only suppress them by a module parameter instead of removing them from code entirely. Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [Changed suppress_ignore_msrs_prints to report_ignored_msrs - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>