summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2017-09-20powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Cc: stable@vger.kernel.org # v4.13+ Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sysrq: Fix oops whem ppmu is not registeredRavi Bangoria
Kernel crashes if power pmu is not registered and user tries to dump regs with 'echo p > /proc/sysrq-trigger'. Sample log: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000000d52f0 NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230 LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50 Call Trace: printk+0x38/0x4c (unreliable) __handle_sysrq+0xe4/0x270 write_sysrq_trigger+0x64/0x80 proc_reg_write+0x80/0xd0 __vfs_write+0x40/0x200 vfs_write+0xc8/0x240 SyS_write+0x60/0x110 system_call+0x58/0x6c Fixes: 5f6d0380c640 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/configs: Update for CONFIG_SND changesMichael Ellerman
Commit eb3b705aaed9 ("ALSA: Make CONFIG_SND_OSSEMUL user-selectable") means we need to set CONFIG_SND_OSSEMUL in our configs, otherwise we lose some of the SND symbols. And commit 0181307abc1d ("ALSA: seq: Reorganize kconfig and build") reorganised things, which causes the churn. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: - fix build without CONFIG_HAVE_KVM_IRQ_ROUTING - fix NULL access in x86 CR access - fix race with VMX posted interrups * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt KVM: VMX: do not change SN bit in vmx_update_pi_irte() KVM: x86: Fix the NULL pointer parameter in check_cr_write() Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"
2017-09-19MIPS: PCI: Move map_irq() hooks out of initdataLorenzo Pieralisi
04c81c7293df ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") moved the PCI IRQ fixup to the new host bridge map/swizzle_irq() hooks mechanism. Those hooks can also be called after boot, when all the __init/__initdata/__initconst sections have been freed. Therefore, functions called by them (and the data they refer to) must not be marked as __init/__initdata/__initconst lest compilation trigger section mismatch warnings. Fix all the board files map_irq() hooks by simply removing the respective __init/__initdata/__initconst section markers and by adding another persistent hook IRQ map for the txx9 board files. Fixes: 04c81c7293df ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Steve French <smfrench@gmail.com>
2017-09-19KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interruptHaozhong Zhang
WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt() intends to detect the violation of invariant that VT-d PI notification event is not suppressed when vcpu is in the guest mode. Because the two checks for the target vcpu mode and the target suppress field cannot be performed atomically, the target vcpu mode may change in between. If that does happen, WARN_ON_ONCE() here may raise false alarms. As the previous patch fixed the real invariant breaker, remove this WARN_ON_ONCE() to avoid false alarms, and document the allowed cases instead. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com> Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19KVM: VMX: do not change SN bit in vmx_update_pi_irte()Haozhong Zhang
In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM assumes that PI notification events should not be suppressed when the target vCPU is not blocked. vmx_update_pi_irte() sets the SN field before changing an interrupt from posting to remapping, but it does not check the vCPU mode. Therefore, the change of SN field may break above the assumption. Besides, I don't see reasons to suppress notification events here, so remove the changes of SN field to avoid race condition. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com> Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19KVM: x86: Fix the NULL pointer parameter in check_cr_write()Yu Zhang
Routine check_cr_write() will trigger emulator_get_cpuid()-> kvm_cpuid() to get maxphyaddr, and NULL is passed as values for ebx/ecx/edx. This is problematic because kvm_cpuid() will dereference these pointers. Fixes: d1cd3ce90044 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19s390/mm: fix write access check in gup_huge_pmd()Gerald Schaefer
The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the wrong way around. It must not be set for write==1, and not be checked for write==0. Fix this similar to how it was fixed for ptes long time ago in commit 25591b070336 ("[S390] fix get_user_pages_fast"). One impact of this bug would be unnecessarily using the gup slow path for write==0 on r/w mappings. A potentially more severe impact would be that gup_huge_pmd() will succeed for write==1 on r/o mappings. Cc: <stable@vger.kernel.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-19s390/mm: make pmdp_invalidate() do invalidation onlyGerald Schaefer
Commit 227be799c39a ("s390/mm: uninline pmdp_xxx functions from pgtable.h") inadvertently changed the behavior of pmdp_invalidate(), so that it now clears the pmd instead of just marking it as invalid. Fix this by restoring the original behavior. A possible impact of the misbehaving pmdp_invalidate() would be the MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we should not have any negative impact on the related dirty/young flags, since those flags are not set by the hardware on s390. Fixes: 227be799c39a ("s390/mm: uninline pmdp_xxx functions from pgtable.h") Cc: <stable@vger.kernel.org> # v4.6+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-18arm64: ensure the kernel is compiled for LP64Andrew Pinski
The kernel needs to be compiled as a LP64 binary for ARM64, even when using a compiler that defaults to code-generation for the ILP32 ABI. Consequently, we need to explicitly pass '-mabi=lp64' (supported on gcc-4.9 and newer). Signed-off-by: Andrew Pinski <Andrew.Pinski@caviumnetworks.com> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com> Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-18arm64: relax assembly code alignment from 16 byte to 4 byteMasahiro Yamada
Aarch64 instructions must be word aligned. The current 16 byte alignment is more than enough. Relax it into 4 byte alignment. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-18arm64: efi: Don't include EFI fpsimd save/restore code in non-EFI kernelsDave Martin
__efi_fpsimd_begin()/__efi_fpsimd_end() are for use when making EFI calls only, so using them in non-EFI kernels is not allowed. This patch compiles them out if CONFIG_EFI is not set. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-17arm64/syscalls: Move address limit check in loopThomas Garnier
A bug was reported on ARM where set_fs might be called after it was checked on the work pending function. ARM64 is not affected by this bug but has a similar construct. In order to avoid any similar problems in the future, the addr_limit_user_check function is moved at the beginning of the loop. Fixes: cf7de27ab351 ("arm64/syscalls: Check address limit on user-mode return") Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pratyush Anand <panand@redhat.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Will Drewry <wad@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: David Howells <dhowells@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Yonghong Song <yhs@fb.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1504798247-48833-5-git-send-email-keescook@chromium.org
2017-09-17arm/syscalls: Optimize address limit checkThomas Garnier
Disable the generic address limit check in favor of an architecture specific optimized implementation. The generic implementation using pending work flags did not work well with ARM and alignment faults. The address limit is checked on each syscall return path to user-mode path as well as the irq user-mode return function. If the address limit was changed, a function is called to report data corruption (stopping the kernel or process based on configuration). The address limit check has to be done before any pending work because they can reset the address limit and the process is killed using a SIGKILL signal. For example the lkdtm address limit check does not work because the signal to kill the process will reset the user-mode address limit. Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pratyush Anand <panand@redhat.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Will Drewry <wad@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: David Howells <dhowells@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Yonghong Song <yhs@fb.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1504798247-48833-4-git-send-email-keescook@chromium.org
2017-09-17Revert "arm/syscalls: Check address limit on user-mode return"Thomas Garnier
This reverts commit 73ac5d6a2b6ac3ae8d1e1818f3e9946f97489bc9. The work pending loop can call set_fs after addr_limit_user_check removed the _TIF_FSCHECK flag. This may happen at anytime based on how ARM handles alignment exceptions. It leads to an infinite loop condition. After discussion, it has been agreed that the generic approach is not tailored to the ARM architecture and any fix might not be complete. This patch will be replaced by an architecture specific implementation. The work flag approach will be kept for other architectures. Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pratyush Anand <panand@redhat.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Will Drewry <wad@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: David Howells <dhowells@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Yonghong Song <yhs@fb.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1504798247-48833-3-git-send-email-keescook@chromium.org
2017-09-17x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUsAndy Lutomirski
For unknown historical reasons (i.e. Borislav doesn't recall), 32-bit kernels invoke cpu_init() on secondary CPUs with initial_page_table loaded into CR3. Then they set current->active_mm to &init_mm and call enter_lazy_tlb() before fixing CR3. This means that the x86 TLB code gets invoked while CR3 is inconsistent, and, with the improved PCID sanity checks I added, we warn. Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 72c0098d92ce ("x86/mm: Reinitialize TLB state on hotplug and resume") Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlierAndy Lutomirski
Otherwise we might have the PCID feature bit set during cpu_init(). This is just for robustness. I haven't seen any actual bugs here. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels") Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware codeAndy Lutomirski
Putting the logical ASID into CR3's PCID bits directly means that we have two cases to consider separately: ASID == 0 and ASID != 0. This means that bugs that only hit in one of these cases trigger nondeterministically. There were some bugs like this in the past, and I think there's still one in current kernels. In particular, we have a number of ASID-unware code paths that save CR3, write some special value, and then restore CR3. This includes suspend/resume, hibernate, kexec, EFI, and maybe other things I've missed. This is currently dangerous: if ASID != 0, then this code sequence will leave garbage in the TLB tagged for ASID 0. We could potentially see corruption when switching back to ASID 0. In principle, an initialize_tlbstate_and_flush() call after these sequences would solve the problem, but EFI, at least, does not call this. (And it probably shouldn't -- initialize_tlbstate_and_flush() is rather expensive.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17x86/mm: Factor out CR3-building codeAndy Lutomirski
Current, the code that assembles a value to load into CR3 is open-coded everywhere. Factor it out into helpers build_cr3() and build_cr3_noflush(). This makes one semantic change: __get_current_cr3_fast() was wrong on SME systems. No one noticed because the only caller is in the VMX code, and there are no CPUs with both SME and VMX. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix addressing the missing CP8 feature bit in CPUID for a range of AMD ZEN models/mask revisions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Fix erratum 1076 (CPB bit)
2017-09-16Merge branch 'for-linus-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - minor improvements - fixes for Debian's new gcc defaults (pie enabled by default) - fixes for XSTATE/XSAVE to make UML work again on modern systems * 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: return negative in tuntap_open_tramp() um: remove a stray tab um: Use relative modversions with LD_SCRIPT_DYN um: link vmlinux with -no-pie um: Fix CONFIG_GCOV for modules. Fix minor typos and grammar in UML start_up help um: defconfig: Cleanup from old Kconfig options um: Fix FP register size for XSTATE/XSAVE
2017-09-15Merge branch '4.14-features' of ↵Linus Torvalds
git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "This is the main pull request for 4.14 for MIPS; below a summary of the non-merge commits: CM: - Rename mips_cm_base to mips_gcr_base - Specify register size when generating accessors - Use BIT/GENMASK for register fields, order & drop shifts - Add cluster & block args to mips_cm_lock_other() CPC: - Use common CPS accessor generation macros - Use BIT/GENMASK for register fields, order & drop shifts - Introduce register modify (set/clear/change) accessors - Use change_*, set_* & clear_* where appropriate - Add CM/CPC 3.5 register definitions - Use GlobalNumber macros rather than magic numbers - Have asm/mips-cps.h include CM & CPC headers - Cluster support for topology functions - Detect CPUs in secondary clusters CPS: - Read GIC_VL_IDENT directly, not via irqchip driver DMA: - Consolidate coherent and non-coherent dma_alloc code - Don't use dma_cache_sync to implement fd_cacheflush FPU emulation / FP assist code: - Another series of 14 commits fixing corner cases such as NaN propgagation and other special input values. - Zero bits 32-63 of the result for a CLASS.D instruction. - Enhanced statics via debugfs - Do not use bools for arithmetic. GCC 7.1 moans about this. - Correct user fault_addr type Generic MIPS: - Enhancement of stack backtraces - Cleanup from non-existing options - Handle non word sized instructions when examining frame - Fix detection and decoding of ADDIUSP instruction - Fix decoding of SWSP16 instruction - Refactor handling of stack pointer in get_frame_info - Remove unreachable code from force_fcr31_sig() - Convert to using %pOF instead of full_name - Remove the R6000 support. - Move FP code from *_switch.S to *_fpu.S - Remove unused ST_OFF from r2300_switch.S - Allow platform to specify multiple its.S files - Add #includes to various files to ensure code builds reliable and without warning.. - Remove __invalidate_kernel_vmap_range - Remove plat_timer_setup - Declare various variables & functions static - Abstract CPU core & VP(E) ID access through accessor functions - Store core & VP IDs in GlobalNumber-style variable - Unify checks for sibling CPUs - Add CPU cluster number accessors - Prevent direct use of generic_defconfig - Make CONFIG_MIPS_MT_SMP default y - Add __ioread64_copy - Remove unnecessary inclusions of linux/irqchip/mips-gic.h GIC: - Introduce asm/mips-gic.h with accessor functions - Use new GIC accessor functions in mips-gic-timer - Remove counter access functions from irq-mips-gic.c - Remove gic_read_local_vp_id() from irq-mips-gic.c - Simplify shared interrupt pending/mask reads in irq-mips-gic.c - Simplify gic_local_irq_domain_map() in irq-mips-gic.c - Drop gic_(re)set_mask() functions in irq-mips-gic.c - Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(), gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c. - Convert remaining shared reg access, local int mask access and remaining local reg access to new accessors - Move GIC_LOCAL_INT_* to asm/mips-gic.h - Remove GIC_CPU_INT* macros from irq-mips-gic.c - Move various definitions to the driver - Remove gic_get_usm_range() - Remove __gic_irq_dispatch() forward declaration - Remove gic_init() - Use mips_gic_present() in place of gic_present and remove gic_present - Move gic_get_c0_*_int() to asm/mips-gic.h - Remove linux/irqchip/mips-gic.h - Inline __gic_init() - Inline gic_basic_init() - Make pcpu_masks a per-cpu variable - Use pcpu_masks to avoid reading GIC_SH_MASK* - Clean up mti, reserved-cpu-vectors handling - Use cpumask_first_and() in gic_set_affinity() - Let the core set struct irq_common_data affinity microMIPS: - Fix microMIPS stack unwinding on big endian systems MIPS-GIC: - SYNC after enabling GIC region NUMA: - Remove the unused parent_node() macro R6: - Constify r2_decoder_tables - Add accessor & bit definitions for GlobalNumber SMP: - Constify smp ops - Allow boot_secondary SMP op to return errors VDSO: - Drop gic_get_usm_range() usage - Avoid use of linux/irqchip/mips-gic.h Platform changes: Alchemy: - Add devboard machine type to cpuinfo - update cpu feature overrides - Threaded carddetect irqs for devboards AR7: - allow NULL clock for clk_get_rate BCM63xx: - Fix ENETDMA_6345_MAXBURST_REG offset - Allow NULL clock for clk_get_rate CI20: - Enable GPIO and RTC drivers in defconfig - Add ethernet and fixed-regulator nodes to DTS Generic platform: - Move Boston and NI 169445 FIT image source to their own files - Include asm/bootinfo.h for plat_fdt_relocated() - Include asm/time.h for get_c0_*_int() - Include asm/bootinfo.h for plat_fdt_relocated() - Include asm/time.h for get_c0_*_int() - Allow filtering enabled boards by requirements - Don't explicitly disable CONFIG_USB_SUPPORT - Bump default NR_CPUS to 16 JZ4700: - Probe the jz4740-rtc driver from devicetree Lantiq: - Drop check of boot select from the spi-falcon driver. - Drop check of boot select from the lantiq-flash MTD driver. - Access boot cause register in the watchdog driver through regmap - Add device tree binding documentation for the watchdog driver - Add docs for the RCU DT bindings. - Convert the fpi bus driver to a platform_driver - Remove ltq_reset_cause() and ltq_boot_select( - Switch to a proper reset driver - Switch to a new drivers/soc GPHY driver - Add an USB PHY driver for the Lantiq SoCs using the RCU module - Use of_platform_default_populate instead of __dt_register_buses - Enable MFD_SYSCON to be able to use it for the RCU MFD - Replace ltq_boot_select() with dummy implementation. Loongson 2F: - Allow NULL clock for clk_get_rate Malta: - Use new GIC accessor functions NI 169445: - Add support for NI 169445 board. - Only include in 32r2el kernels Octeon: - Add support for watchdog of 78XX SOCs. - Add support for watchdog of CN68XX SOCs. - Expose support for mips32r1, mips32r2 and mips64r1 - Enable more drivers in config file - Add support for accessing the boot vector. - Remove old boot vector code from watchdog driver - Define watchdog registers for 70xx, 73xx, 78xx, F75xx. - Make CSR functions node aware. - Allow access to CIU3 IRQ domains. - Misc cleanups in the watchdog driver Omega2+: - New board, add support and defconfig Pistachio: - Enable Root FS on NFS in defconfig Ralink: - Add Mediatek MT7628A SoC - Allow NULL clock for clk_get_rate - Explicitly request exclusive reset control in the pci-mt7620 PCI driver. SEAD3: - Only include in 32 bit kernels by default VoCore: - Add VoCore as a vendor t0 dt-bindings - Add defconfig file" * '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits) MIPS: Refactor handling of stack pointer in get_frame_info MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems MIPS: microMIPS: Fix decoding of swsp16 instruction MIPS: microMIPS: Fix decoding of addiusp instruction MIPS: microMIPS: Fix detection of addiusp instruction MIPS: Handle non word sized instructions when examining frame MIPS: ralink: allow NULL clock for clk_get_rate MIPS: Loongson 2F: allow NULL clock for clk_get_rate MIPS: BCM63XX: allow NULL clock for clk_get_rate MIPS: AR7: allow NULL clock for clk_get_rate MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset mips: Save all registers when saving the frame MIPS: Add DWARF unwinding to assembly MIPS: Make SAVE_SOME more standard MIPS: Fix issues in backtraces MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree MIPS: Ci20: Enable RTC driver watchdog: octeon-wdt: Add support for 78XX SOCs. watchdog: octeon-wdt: Add support for cn68XX SOCs. watchdog: octeon-wdt: File cleaning. ...
2017-09-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: - PPC bugfixes - RCU splat fix - swait races fix - pointless userspace-triggerable BUG() fix - misc fixes for KVM_RUN corner cases - nested virt correctness fixes + one host DoS - some cleanups - clang build fix - fix AMD AVIC with default QEMU command line options - x86 bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry kvm,mips: Fix potential swait_active() races kvm,powerpc: Serialize wq active checks in ops->vcpu_kick kvm: Serialize wq active checks in kvm_vcpu_wake_up() kvm,x86: Fix apf_task_wake_one() wq serialization kvm,lapic: Justify use of swait_active() kvm,async_pf: Use swq_has_sleeper() sched/wait: Add swq_has_sleeper() KVM: VMX: Do not BUG() on out-of-bounds guest IRQ KVM: Don't accept obviously wrong gsi values via KVM_IRQFD kvm: nVMX: Don't allow L2 to access the hardware CR8 KVM: trace events: update list of exit reasons KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously KVM: X86: Don't block vCPU if there is pending exception KVM: SVM: Add irqchip_split() checks before enabling AVIC KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() KVM: x86: fix clang build ...
2017-09-15xen: x86: mark xen_find_pt_base as __initArnd Bergmann
gcc-4.6 causes a harmless link-time warning: WARNING: vmlinux.o(.text.unlikely+0x48e): Section mismatch in reference from the function xen_find_pt_base() to the function .init.text:m2p() The function xen_find_pt_base() references the function __init m2p(). This is often because xen_find_pt_base lacks a __init annotation or the annotation of m2p is wrong. Newer compilers inline this function, so it never shows up, but marking it __init is the right way to avoid the warning. Fixes: 70e61199559a ("xen: move p2m list if conflicting with e820 map") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-09-15Merge tag 'nios2-v4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 Pull arch/nios2 update from Ley Foon Tan. * tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: time: Read timer in get_cycles only if initialized nios2: add earlycon support to 3c120 devboard DTS
2017-09-15Merge tag 'powerpc-4.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Just one fix, for the handling of alignment interrupts on dcbz instructions. Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka" * tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix handling of alignment interrupt on dcbz instruction
2017-09-15kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properlyJim Mattson
When emulating a nested VM-entry from L1 to L2, several control field validation checks are deferred to the hardware. Should one of these validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When this happens, the L2 guest state is not loaded (even in part), and execution should continue in L1 with the next instruction after the VMLAUNCH/VMRESUME. The VMCS12 is not modified (except for the VM-instruction error field), the VMCS12 MSR save/load lists are not processed, and the CPU state is not loaded from the VMCS12 host area. Moreover, the vmcs02 exit reason is stale, so it should not be consulted for any reason. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm: vmx: Handle VMLAUNCH/VMRESUME failure properlyJim Mattson
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the VM-instruction error field of the current VMCS), the launch state of the current VMCS is not set to "launched," and the VM-exit information fields of the current VMCS (including IDT-vectoring information and exit reason) are stale. On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit of the exit reason field), the launch state of the current VMCS is not set to "launched," and only two of the VM-exit information fields of the current VMCS are modified (exit reason and exit qualification). The remaining VM-exit information fields of the current VMCS (including IDT-vectoring information, in particular) are stale. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm: nVMX: Remove nested_vmx_succeed after successful VM-entryJim Mattson
After a successful VM-entry, RFLAGS is cleared, with the exception of bit 1, which is always set. This is handled by load_vmcs12_host_state. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm,mips: Fix potential swait_active() racesDavidlohr Bueso
For example, the following could occur, making us miss a wakeup: CPU0 CPU1 kvm_vcpu_block kvm_mips_comparecount_func [L] swait_active(&vcpu->wq) [S] prepare_to_swait(&vcpu->wq) [L] if (!kvm_vcpu_has_pending_timer(vcpu)) schedule() [S] queue_timer_int(vcpu) Ensure that the swait_active() check is not hoisted over the interrupt. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm,powerpc: Serialize wq active checks in ops->vcpu_kickDavidlohr Bueso
Particularly because kvmppc_fast_vcpu_kick_hv() is a callback, ensure that we properly serialize wq active checks in order to avoid potentially missing a wakeup due to racing with the waiter side. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm,x86: Fix apf_task_wake_one() wq serializationDavidlohr Bueso
During code inspection, the following potential race was seen: CPU0 CPU1 kvm_async_pf_task_wait apf_task_wake_one [L] swait_active(&n->wq) [S] prepare_to_swait(&n.wq) [L] if (!hlist_unhahed(&n.link)) schedule() [S] hlist_del_init(&n->link); Properly serialize swait_active() checks such that a wakeup is not missed. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm,lapic: Justify use of swait_active()Davidlohr Bueso
A comment might serve future readers. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15KVM: VMX: Do not BUG() on out-of-bounds guest IRQJan H. Schönherr
The value of the guest_irq argument to vmx_update_pi_irte() is ultimately coming from a KVM_IRQFD API call. Do not BUG() in vmx_update_pi_irte() if the value is out-of bounds. (Especially, since KVM as a whole seems to hang after that.) Instead, print a message only once if we find that we don't have a route for a certain IRQ (which can be out-of-bounds or within the array). This fixes CVE-2017-1000252. Fixes: efc644048ecde54 ("KVM: x86: Update IRTE for posted-interrupts") Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15nios2: time: Read timer in get_cycles only if initializedGuenter Roeck
Mainline crashes as follows when running nios2 images. On node 0 totalpages: 65536 free_area_init_node: node 0, pgdat c8408fa0, node_mem_map c8726000 Normal zone: 512 pages used for memmap Normal zone: 0 pages reserved Normal zone: 65536 pages, LIFO batch:15 Unable to handle kernel NULL pointer dereference at virtual address 00000000 ea = c8003cb0, ra = c81cbf40, cause = 15 Kernel panic - not syncing: Oops Problem is seen because get_cycles() is called before the timer it depends on is initialized. Returning 0 in that situation fixes the problem. Fixes: 33d72f3822d7 ("init/main.c: extract early boot entropy from the ..") Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-09-15nios2: add earlycon support to 3c120 devboard DTSTobias Klauser
Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-09-15kvm: nVMX: Don't allow L2 to access the hardware CR8Jim Mattson
If L1 does not specify the "use TPR shadow" VM-execution control in vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store exiting" VM-execution controls in vmcs02. Failure to do so will give the L2 VM unrestricted read/write access to the hardware CR8. This fixes CVE-2017-12154. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15x86/cpu/AMD: Fix erratum 1076 (CPB bit)Borislav Petkov
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14Merge branch 'work.set_fs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more set_fs removal from Al Viro: "Christoph's 'use kernel_read and friends rather than open-coding set_fs()' series" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: unexport vfs_readv and vfs_writev fs: unexport vfs_read and vfs_write fs: unexport __vfs_read/__vfs_write lustre: switch to kernel_write gadget/f_mass_storage: stop messing with the address limit mconsole: switch to kernel_read btrfs: switch write_buf to kernel_write net/9p: switch p9_fd_read to kernel_write mm/nommu: switch do_mmap_private to kernel_read serial2002: switch serial2002_tty_write to kernel_{read/write} fs: make the buf argument to __kernel_write a void pointer fs: fix kernel_write prototype fs: fix kernel_read prototype fs: move kernel_read to fs/read_write.c fs: move kernel_write to fs/read_write.c autofs4: switch autofs4_write to __kernel_write ashmem: switch to ->read_iter
2017-09-14Merge branch 'work.ipc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ipc compat cleanup and 64-bit time_t from Al Viro: "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa Dinamani" * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: utimes: Make utimes y2038 safe ipc: shm: Make shmid_kernel timestamps y2038 safe ipc: sem: Make sem_array timestamps y2038 safe ipc: msg: Make msg_queue timestamps y2038 safe ipc: mqueue: Replace timespec with timespec64 ipc: Make sys_semtimedop() y2038 safe get rid of SYSVIPC_COMPAT on ia64 semtimedop(): move compat to native shmat(2): move compat to native msgrcv(2), msgsnd(2): move compat to native ipc(2): move compat to native ipc: make use of compat ipc_perm helpers semctl(): move compat to native semctl(): separate all layout-dependent copyin/copyout msgctl(): move compat to native msgctl(): split the actual work from copyin/copyout ipc: move compat shmctl to native shmctl: split the work from copyin/copyout
2017-09-15powerpc: Fix handling of alignment interrupt on dcbz instructionPaul Mackerras
This fixes the emulation of the dcbz instruction in the alignment interrupt handler. The error was that we were comparing just the instruction type field of op.type rather than the whole thing, and therefore the comparison "type != CACHEOP + DCBZ" was always true. Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Tested-by: Michal Sojka <sojkam1@fel.cvut.cz> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-14Merge branch 'dmi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi update from Jean Delvare: "Mark all struct dmi_system_id instances const" * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: dmi: Mark all struct dmi_system_id instances const
2017-09-14KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" ↵Wanpeng Li
exceptions simultaneously qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2 qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0 qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018 kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000 qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018 qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2 After running some memory intensive workload in guest, I catch the kworker which completes the GUP too quickly, and queues an "Page Ready" #PF exception after the "Page not Present" exception before the next vmentry as the above trace which will result in #DF injected to guest. This patch fixes it by clearing the queue for "Page not Present" if "Page Ready" occurs before the next vmentry since the GUP has already got the required page and shadow page table has already been fixed by "Page Ready" handler. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Fixes: 7c90705bf2a3 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.") [Changed indentation and added clearing of injected. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-14Merge branch 'kvm-ppc-fixes' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Bug fixes for stable.
2017-09-14KVM: X86: Don't block vCPU if there is pending exceptionWanpeng Li
Don't block vCPU if there is pending exception. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-14KVM: SVM: Add irqchip_split() checks before enabling AVICSuravee Suthikulpanit
SVM AVIC hardware accelerates guest write to APIC_EOI register (for edge-trigger interrupt), which means it does not trap to KVM. So, only enable SVM AVIC only in split irqchip mode. (e.g. launching qemu w/ option '-machine kernel_irqchip=split'). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support") [Removed pr_debug - Radim.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-14dmi: Mark all struct dmi_system_id instances constChristoph Hellwig
... and __initconst if applicable. Based on similar work for an older kernel in the Grsecurity patch. [JD: fix toshiba-wmi build] [JD: add htcpen] [JD: move __initconst where checkscript wants it] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2017-09-13arm64: stacktrace: avoid listing stacktrace functions in stacktracePrakash Gupta
The stacktraces always begin as follows: [<c00117b4>] save_stack_trace_tsk+0x0/0x98 [<c0011870>] save_stack_trace+0x24/0x28 ... This is because the stack trace code includes the stack frames for itself. This is incorrect behaviour, and also leads to "skip" doing the wrong thing (which is the number of stack frames to avoid recording.) Perversely, it does the right thing when passed a non-current thread. Fix this by ensuring that we have a known constant number of frames above the main stack trace function, and always skip these. This was fixed for arch arm by commit 3683f44c42e9 ("ARM: stacktrace: avoid listing stacktrace functions in stacktrace") Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org Signed-off-by: Prakash Gupta <guptap@codeaurora.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>