summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2016-07-30Merge tag 'devicetree-for-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - remove most of_platform_populate() calls in arch code. Now the DT core code calls it in the default case and platforms only need to call it if they have special needs - use pr_fmt on all the DT core print statements - CoreSight binding doc improvements to block name descriptions - add dt_to_config script which can parse dts files and list corresponding kernel config options - fix memory leak hit with a PowerMac DT - correct a bunch of STMicro compatible strings to use the correct vendor prefix - fix DA9052 PMIC binding doc to match what is actually used in dts files * tag 'devicetree-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (35 commits) documentation: da9052: Update regulator bindings names to match DA9052/53 DTS expectations xtensa: Partially Revert "xtensa: Remove unnecessary of_platform_populate with default match table" xtensa: Fix build error due to missing include file MIPS: ath79: Add missing include file Fix spelling errors in Documentation/devicetree ARM: dts: fix STMicroelectronics compatible strings powerpc/dts: fix STMicroelectronics compatible strings Documentation: dt: i2c: use correct STMicroelectronics vendor prefix scripts/dtc: dt_to_config - kernel config options for a devicetree of: fdt: mark unflattened tree as detached of: overlay: add resolver error prints coresight: document binding acronyms Documentation/devicetree: document cavium-pip rx-delay/tx-delay properties of: use pr_fmt prefix for all console printing of/irq: Mark initialised interrupt controllers as populated of: fix memory leak related to safe_name() Revert "of/platform: export of_default_bus_match_table" of: unittest: use of_platform_default_populate() to populate default bus memory: omap-gpmc: use of_platform_default_populate() to populate default bus bus: uniphier-system-bus: use of_platform_default_populate() to populate default bus ...
2016-07-30Merge tag 'clk-for-linus-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Michael Turquette: "The bulk of the changes are updates and fixes to existing clk provider drivers, along with a pretty standard number of new drivers. The core recieved a small number of updates as well. Core changes of note: - removed CLK_IS_ROOT flag New clk provider drivers: - Renesas r8a7796 clock pulse generator / module standby and software reset - Allwinner sun8i H3 clock controller unit - AmLogic meson8b clock controller (rewritten) - AmLogic gxbb clock controller - support for some new ICs was added by simple changes to static data tables for chips sharing the same family Driver updates of note: - the Allwinner sunxi clock driver infrastucture was rewritten to comform to the state of the art at drivers/clk/sunxi-ng. The old implementation is still supported for backwards compatibility with the DT ABI" * tag 'clk-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (162 commits) clk: Makefile: re-sort and clean up Revert "clk: gxbb: expose CLKID_MMC_PCLK" clk: samsung: Allow modular build of the Audio Subsystem CLKCON driver clk: samsung: make clk-s5pv210-audss explicitly non-modular clk: exynos5433: remove CLK_IGNORE_UNUSED flag from SPI clocks clk: oxnas: Add hardware dependencies clk: imx7d: do not set parent of ethernet time/ref clocks ARM: dt: sun8i: switch the H3 to the new CCU driver clk: sunxi-ng: h3: Fix Kconfig symbol typo clk: sunxi-ng: h3: Fix audio clock divider offset clk: sunxi-ng: Add H3 clocks clk: sunxi-ng: Add N-K-M-P factor clock clk: sunxi-ng: Add N-K-M Factor clock clk: sunxi-ng: Add N-M-factor clock support clk: sunxi-ng: Add N-K-factor clock support clk: sunxi-ng: Add M-P factor clock support clk: sunxi-ng: Add divider clk: sunxi-ng: Add phase clock support clk: sunxi-ng: Add mux clock support clk: sunxi-ng: Add gate clock support ...
2016-07-30Merge branch 'next' of ↵Michael Ellerman
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include more 8xx optimizations, device tree updates, and MVME7100 support."
2016-07-29Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit updates from Paul Moore: "Six audit patches for 4.8. There are a couple of style and minor whitespace tweaks for the logs, as well as a minor fixup to catch errors on user filter rules, however the major improvements are a fix to the s390 syscall argument masking code (reviewed by the nice s390 folks), some consolidation around the exclude filtering (less code, always a win), and a double-fetch fix for recording the execve arguments" * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit: audit: fix a double fetch in audit_log_single_execve_arg() audit: fix whitespace in CWD record audit: add fields to exclude filter by reusing user filter s390: ensure that syscall arguments are properly masked on s390 audit: fix some horrible switch statement style crimes audit: fixup: log on errors from filter user rules
2016-07-29Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
2016-07-29Merge tag 'pm-urgent-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a nasty (and really hard to debug) memory corruption during resume from hibernation on x86-64 (that leads to a kernel panic most of the time) due to the use of a stale stack pointer value in FRAME_BEGIN (Josh Poimboeuf)" * tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/power/64: Fix hibernation return address corruption
2016-07-29Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the next part of the hotplug rework. - Convert all notifiers with a priority assigned - Convert all CPU_STARTING/DYING notifiers The final removal of the STARTING/DYING infrastructure will happen when the merge window closes. Another 700 hundred line of unpenetrable maze gone :)" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) timers/core: Correct callback order during CPU hot plug leds/trigger/cpu: Move from CPU_STARTING to ONLINE level powerpc/numa: Convert to hotplug state machine arm/perf: Fix hotplug state machine conversion irqchip/armada: Avoid unused function warnings ARC/time: Convert to hotplug state machine clocksource/atlas7: Convert to hotplug state machine clocksource/armada-370-xp: Convert to hotplug state machine clocksource/exynos_mct: Convert to hotplug state machine clocksource/arm_global_timer: Convert to hotplug state machine rcu: Convert rcutree to hotplug state machine KVM/arm/arm64/vgic-new: Convert to hotplug state machine smp/cfd: Convert core to hotplug state machine x86/x2apic: Convert to CPU hotplug state machine profile: Convert to hotplug state machine timers/core: Convert to hotplug state machine hrtimer: Convert to hotplug state machine x86/tboot: Convert to hotplug state machine arm64/armv8 deprecated: Convert to hotplug state machine hwtracing/coresight-etm4x: Convert to hotplug state machine ...
2016-07-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc updates from David Miller: 1) Double spin lock bug in sunhv serial driver, from Dan Carpenter. 2) Use correct RSS estimate when determining whether to grow the huge TSB or not, from Mike Kravetz. 3) Don't use full three level page tables for hugepages, PMD level is sufficient. From Nitin Gupta. 4) Mask out extraneous bits from TSB_TAG_ACCESS register, we only want the address bits. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Trim page tables for 8M hugepages sparc64 mm: Fix base TSB sizing when hugetlb pages are used sparc: serial: sunhv: fix a double lock bug sparc32: off by ones in BUG_ON() sparc: Don't leak context bits into thread->fault_address
2016-07-29Merge tag 'arc-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: "Things have been calm here - nothing much except for a few fixes" * tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: mm: don't loose PTE_SPECIAL in pte_modify() ARC: dma: fix address translation in arc_dma_free ARC: typo fix in mm/ioremap.c ARC: fix linux-next build breakage
2016-07-29Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: x86/power/64: Fix hibernation return address corruption
2016-07-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 updates from Hans-Christian Noren Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: avr32: off by one in at32_init_pio() avr32: fixup code style in unistd.h and syscall_table.S avr32: wire up preadv2 and pwritev2 syscalls
2016-07-29Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: "Included in this update are: - Patches from Gregory Clement to fix the coherent DMA cases in our dma-mapping code. - A number of CPU errata updates and fixes. - ARM cpuidle improvements from Jisheng Zhang. - Fix from Kees for the location of _etext. - Cleanups from Masahiro Yamada to avoid duplicated messages during the kernel build, and remove CONFIG_ARCH_HAS_BARRIERS. - Remove a udelay loop limitation, allowing for faster CPUs to calibrate the delay correctly. - Cleanup some left-overs from the SW PAN implementation. - Ensure that a modified address limit is not visible to exception handlers" * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (21 commits) ARM: 8586/1: cpuidle: make arm_cpuidle_suspend() a bit more efficient ARM: 8585/1: cpuidle: fix !cpuidle_ops[cpu].init case during init ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is used ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherent ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421 ARM: 8559/1: errata: Workaround erratum A12 821420 ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423 ARM: save and reset the address limit when entering an exception ARM: 8577/1: Fix Cortex-A15 798181 errata initialization ARM: 8584/1: floppy: avoid gcc-6 warning ARM: 8583/1: mm: fix location of _etext ARM: 8582/1: remove unused CONFIG_ARCH_HAS_BARRIERS ARM: 8306/1: loop_udelay: remove bogomips value limitation ARM: 8581/1: add missing <asm/prom.h> to arch/arm/kernel/devtree.c ARM: 8576/1: avoid duplicating "Kernel: arch/arm/boot/*Image is ready" ARM: 8556/1: on a generic DT system: do not touch l2x0 ARM: uaccess: remove put_user() code duplication ARM: 8580/1: Remove orphaned __addr_ok() definition ARM: get rid of horrible *(unsigned int *)(regs + 1) ARM: introduce svc_pt_regs structure ...
2016-07-29sparc64: Trim page tables for 8M hugepagesNitin Gupta
For PMD aligned (8M) hugepages, we currently allocate all four page table levels which is wasteful. We now allocate till PMD level only which saves memory usage from page tables. Also, when freeing page table for 8M hugepage backed region, make sure we don't try to access non-existent PTE level. Orabug: 22630259 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-29x86/power/64: Fix hibernation return address corruptionJosh Poimboeuf
In kernel bug 150021, a kernel panic was reported when restoring a hibernate image. Only a picture of the oops was reported, so I can't paste the whole thing here. But here are the most interesting parts: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffff8804615cfd78 ... RIP: ffff8804615cfd78 RSP: ffff8804615f0000 RBP: ffff8804615cfdc0 ... Call Trace: do_signal+0x23 exit_to_usermode_loop+0x64 ... The RIP is on the same page as RBP, so it apparently started executing on the stack. The bug was bisected to commit ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S), which in retrospect seems quite dangerous, since that code saves and restores the stack pointer from a global variable ('saved_context'). There are a lot of moving parts in the hibernate save and restore paths, so I don't know exactly what caused the panic. Presumably, a FRAME_END was executed without the corresponding FRAME_BEGIN, or vice versa. That would corrupt the return address on the stack and would be consistent with the details of the above panic. [ rjw: One major problem is that by the time the FRAME_BEGIN in restore_registers() is executed, the stack pointer value may not be valid any more. Namely, the stack area pointed to by it previously may have been overwritten by some image memory contents and that page frame may now be used for whatever different purpose it had been allocated for before hibernation. In that case, the FRAME_BEGIN will corrupt that memory. ] Instead of doing the frame pointer save/restore around the bounds of the affected functions, just do it around the call to swsusp_save(). That has the same effect of ensuring that if swsusp_save() sleeps, the frame pointers will be correct. It's also a much more obviously safe way to do it than the original patch. And objtool still doesn't report any warnings. Fixes: ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S) Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021 Cc: 4.6+ <stable@vger.kernel.org> # 4.6+ Reported-by: Andre Reinke <andre.reinke@mailbox.org> Tested-by: Andre Reinke <andre.reinke@mailbox.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-29avr32: off by one in at32_init_pio()Dan Carpenter
The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be >=. Fixes: 5f97f7f9400d ('[PATCH] avr32 architecture') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2016-07-29avr32: fixup code style in unistd.h and syscall_table.SHans-Christian Noren Egtvedt
This patch swaps the mix of tabs and space for alignment of comment after code to use spaces only. Also document why recvmmsg was defined twice in the syscall_table.S table, but only once in unistd.h. In short, wired in the table by generic arch patch, but forgotten in unistd.h (review slip).
2016-07-29avr32: wire up preadv2 and pwritev2 syscallsHans-Christian Noren Egtvedt
This patch wires up the new preadv2 and pwritev2 syscall on AVR32. On AVR32, all parameters beyond the 5th are passed on the stack. System calls don't use the stack -- they borrow a callee-saved register instead. This means that syscalls that take 6 parameters must be called through a stub that pushes the last parameter on the stack. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
2016-07-29arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFOJames Hogan
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for the VDSO address. This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add the comment above ARCH_DLINFO as found in several other architectures to remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to date. Fixes: f668cd1673aa ("arm64: ELF definitions") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinuxArd Biesheuvel
The linker routines that we rely on to produce a relocatable PIE binary treat it as a shared ELF object in some ways, i.e., it emits symbol based R_AARCH64_ABS64 relocations into the final binary since doing so would be appropriate when linking a shared library that is subject to symbol preemption. (This means that an executable can override certain symbols that are exported by a shared library it is linked with, and that the shared library *must* update all its internal references as well, and point them to the version provided by the executable.) Symbol preemption does not occur for OS hosted PIE executables, let alone for vmlinux, and so we would prefer to get rid of these symbol based relocations. This would allow us to simplify the relocation routines, and to strip the .dynsym, .dynstr and .hash sections from the binary. (Note that these are tiny, and are placed in the .init segment, but they clutter up the vmlinux binary.) Note that these R_AARCH64_ABS64 relocations are only emitted for absolute references to symbols defined in the linker script, all other relocatable quantities are covered by anonymous R_AARCH64_RELATIVE relocations that simply list the offsets to all 64-bit values in the binary that need to be fixed up based on the offset between the link time and run time addresses. Fortunately, GNU ld has a -Bsymbolic option, which is intended for shared libraries to allow them to ignore symbol preemption, and unconditionally bind all internal symbol references to its own definitions. So set it for our PIE binary as well, and get rid of the asoociated sections and the relocation code that processes them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: fixed conflict with __dynsym_offset linker script entry] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTEArd Biesheuvel
Due to the untyped KIMAGE_VADDR constant, the linker may not notice that the __rela_offset and __dynsym_offset expressions are absolute values (i.e., are not subject to relocation). This does not matter for KASLR, but it does confuse kallsyms in relative mode, since it uses the lowest non-absolute symbol address as the anchor point, and expects all other symbol addresses to be within 4 GB of it. Fix this by qualifying these expressions as ABSOLUTE() explicitly. Fixes: 0cd3defe0af4 ("arm64: kernel: perform relocation processing from ID map") Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29MIPS: c-r4k: Use SMP calls for CM indexed cache opsJames Hogan
The MIPS Coherence Manager (CM) can propagate address-based ("hit") cache operations to other cores in the coherent system, alleviating software of the need to use SMP calls, however indexed cache operations are not propagated by hardware since doing so makes no sense for separate caches. Update r4k_op_needs_ipi() to report that only hit cache operations are globalized by the CM, requiring indexed cache operations to be globalized by software via an SMP call. r4k_on_each_cpu() previously had a special case for CONFIG_MIPS_MT_SMP, intended to avoid the SMP calls when the only other CPUs in the system were other VPEs in the same core, and hence sharing the same caches. This was changed by commit cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores") to apparently handle multi-core multi-VPE systems, but it focussed mainly on hit cache ops, so the SMP calls were still disabled entirely for CM systems. This doesn't normally cause problems, but tests can be written to hit these corner cases by using multiple threads, or changing task affinities to force the process to migrate cores. For example the failure of mprotect RW->RX to globally sync icaches (via flush_cache_range) can be detected by modifying and mprotecting a code page on one core, and migrating to a different core to execute from it. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13807/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Avoid small flush_icache_range SMP callsJames Hogan
Avoid SMP calls for flushing small icache ranges. On non-CM platforms, and CM platforms too after we make r4k_on_each_cpu() take the cache op type into account, it will be called on multiple CPUs due to the possibility that local_r4k_flush_icache_range_ipi() could do non-globalized indexed cache ops. This rougly copies the range size check out into r4k_flush_icache_range(), which can disallow indexed cache ops and allow r4k_on_each_cpu() to skip the SMP call. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13805/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Local flush_icache_range cache op overrideJames Hogan
Allow the permitted cache op types used by local_r4k_flush_icache_range_ipi() to be overridden by the SMP caller. This will allow SMP calls to be avoided under certain circumstances, falling back to a single CPU performing globalized hit cache ops only. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13803/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Split r4k_flush_kernel_vmap_range()James Hogan
Split the operation of r4k_flush_kernel_vmap_range() into separate SMP callbacks for the indexed cache flush and hit cache flush cases, since the logic to determine which to use can be determined by the initiating CPU prior to doing any SMP calls. This will help when we change r4k_on_each_cpu() to distinguish indexed and hit cache ops in a later patch, preventing globalized hit cache ops being performed redundantly on multiple CPUs. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13806/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Exclude sibling CPUs in SMP callsJames Hogan
When performing SMP calls to foreign cores, exclude sibling CPUs from the provided map, as we already handle the local core on the current CPU. This prevents an SMP call from for example core 0, VPE 1 to VPE 0 on the same core. In the process the cpu_foreign_map cpumask is turned into an array of cpumasks, so that each CPU has its own version of it which excludes sibling CPUs. r4k_op_needs_ipi() is also updated to reflect that cache management SMP calls are not needed when all CPUs are siblings (i.e. there are no foreign CPUs according to the new cpu_foreign_map[] semantics which exclude siblings). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Felix Fietkau <nbd@nbd.name> Cc: Jayachandran C. <jchandra@broadcom.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13801/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Fix valid ASID optimisationJames Hogan
Several cache operations are optimised to return early from the SMP call handler if the memory map in question has no valid ASID on the current CPU, or any online CPU in the case of MIPS_MT_SMP. The idea is that if a memory map has never been used on a CPU it shouldn't have cache lines in need of flushing. However this doesn't cover all cases when ASIDs for other CPUs need to be checked: - Offline VPEs may have recently been online and brought lines into the (shared) cache, so they should also be checked, rather than only online CPUs. - SMP systems with a Coherence Manager (CM), but with MT disabled still have globalized hit cache ops, but don't use SMP calls, so all present CPUs should be taken into account. - R6 systems have a different multithreading implementation, so MIPS_MT_SMP won't be set, but as above may still have a CM which globalizes hit cache ops. Additionally for non-globalized cache operations where an SMP call to a single VPE in each foreign core is used, it is not necessary to check every CPU in the system, only sibling CPUs sharing the same first level cache. Fix this by making has_valid_asid() take a cache op type argument like r4k_on_each_cpu(), so it can determine whether r4k_on_each_cpu() will have done SMP calls to other cores. It can then determine which set of CPUs to check the ASIDs of based on that, excluding foreign CPUs if an SMP call will have been performed. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13804/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Add r4k_on_each_cpu cache op type argJames Hogan
The r4k_on_each_cpu() function calls the specified cache flush helper on other CPUs if deemed necessary due to the cache ops not being globalized by hardware. However this really depends on the cache op addressing type, as the MIPS Coherence Manager (CM) if present will globalize "hit" cache ops (addressed by virtual address), but not "index" cache ops (addressed by cache index). This results in index cache ops only being performed on a single CPU when CM is present. Most (but not all) of the functions called by r4k_on_each_cpu() perform cache operations exclusively with a single cache op type, so add a type argument and modify the callers to pass in some combination of R4K_HIT (global kernel virtual addressing or user virtual addressing conditional upon matching active_mm) and R4K_INDEX (index into cache). This will allow r4k_on_each_cpu() to later distinguish these cases and decide whether to perform an SMP call based on it. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13798/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Avoid dcache flush for sigtrampsJames Hogan
Avoid the dcache and scache flush in local_r4k_flush_cache_sigtramp() if the icache fills straight from the dcache. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13802/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Fix sigtramp SMP call to use kmapJames Hogan
Fix r4k_flush_cache_sigtramp() and local_r4k_flush_cache_sigtramp() to flush the delay slot emulation trampoline cacheline through a kmap rather than directly when the active_mm doesn't match that of the task initiating the flush, a bit like local_r4k_flush_cache_page() does. This would fix a corner case on SMP systems without hardware globalized hit cache ops, where a migration to another CPU after the flush, where that CPU did not have the same mm active at the time of the flush, could result in stale icache content being executed instead of the trampoline, e.g. from a previous delay slot emulation with a similar stack pointer. This case was artificially triggered by replacing the icache flush with a full indexed flush (not globalized on CM systems) and forcing the SMP call to take place, with a test program that alternated two FPU delay slots with a parent process repeatedly changing scheduler affinity. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13797/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: c-r4k: Fix protected_writeback_scache_line for EVAJames Hogan
The protected_writeback_scache_line() function is used by local_r4k_flush_cache_sigtramp() to flush an FPU delay slot emulation trampoline on the userland stack from the caches so it is visible to subsequent instruction fetches. Commit de8974e3f76c ("MIPS: asm: r4kcache: Add EVA cache flushing functions") updated some protected_ cache flush functions to use EVA CACHEE instructions via protected_cachee_op(), and commit 83fd43449baa ("MIPS: r4kcache: Add EVA case for protected_writeback_dcache_line") did the same thing for protected_writeback_dcache_line(), but protected_writeback_scache_line() never got updated. Lets fix that now to flush the right user address from the secondary cache rather than some arbitrary kernel unmapped address. This issue was spotted through code inspection, and it seems unlikely to be possible to hit this in practice. It theoretically affect EVA kernels on EVA capable cores with an L2 cache, where the icache fetches straight from RAM (cpu_icache_snoops_remote_store == 0), running a hard float userland with FPU disabled (nofpu). That both Malta and Boston platforms override cpu_icache_snoops_remote_store to 1 suggests that all MIPS cores fetch instructions into icache straight from L2 rather than RAM. Fixes: de8974e3f76c ("MIPS: asm: r4kcache: Add EVA cache flushing functions") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13800/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: SMP: Drop stop_this_cpu() cpu_foreign_map hackJames Hogan
Commit cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores") added the cpu_foreign_map cpumask containing a single VPE from each online core, and recalculated it when secondary CPUs are brought up. stop_this_cpu() was also updated to recalculate cpu_foreign_map, but with an additional hack before marking the CPU as offline to copy cpu_online_mask into cpu_foreign_map and perform an SMP memory barrier. This appears to have been intended to prevent cache management IPIs being missed when the VPE representing the core in cpu_foreign_map is taken offline while other VPEs remain online. Unfortunately there is nothing in this hack to prevent r4k_on_each_cpu() from reading the old cpu_foreign_map, and smp_call_function_many() from reading that new cpu_online_mask with the core's representative VPE marked offline. It then wouldn't send an IPI to any online VPEs of that core. stop_this_cpu() is only actually called in panic and system shutdown / halt / reboot situations, in which case all CPUs are going down and we don't really need to care about cache management, so drop this hack. Note that the __cpu_disable() case for CPU hotplug is handled in the previous commit, and no synchronisation is needed there due to the use of stop_machine() which prevents hotplug from taking place while any CPU has disabled preemption (as r4k_on_each_cpu() does). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13796/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: SMP: Update cpu_foreign_map on CPU disableJames Hogan
When a CPU is disabled via CPU hotplug, cpu_foreign_map is not updated. This could result in cache management SMP calls being sent to offline CPUs instead of online siblings in the same core. Add a call to calculate_cpu_foreign_map() in the various MIPS cpu disable callbacks after set_cpu_online(). All cases are updated for consistency and to keep cpu_foreign_map strictly up to date, not just those which may support hardware multithreading. Fixes: cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: David Daney <david.daney@cavium.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: Hongliang Tao <taohl@lemote.com> Cc: Hua Yan <yanh@lemote.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13799/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29MIPS: SMP: Clear ASID without confusing has_valid_asid()James Hogan
The SMP flush_tlb_*() functions may clear the memory map's ASIDs for other CPUs if the mm has only a single user (the current CPU) in order to avoid SMP calls. However this makes it appear to has_valid_asid(), which is used by various cache flush functions, as if the CPUs have never run in the mm, and therefore can't have cached any of its memory. For flush_tlb_mm() this doesn't sound unreasonable. flush_tlb_range() corresponds to flush_cache_range() which does do full indexed cache flushes, but only on the icache if the specified mapping is executable, otherwise it doesn't guarantee that there are no cache contents left for the mm. flush_tlb_page() corresponds to flush_cache_page(), which will perform address based cache ops on the specified page only, and also only touches the icache if the page is executable. It does not guarantee that there are no cache contents left for the mm. For example, this affects flush_cache_range() which uses the has_valid_asid() optimisation. It is required to flush the icache when mappings are made executable (e.g. using mprotect) so they are immediately usable. If some code is changed to non executable in order to be modified then it will not be flushed from the icache during that time, but the ASID on other CPUs may still be cleared for TLB flushing. When the code is changed back to executable, flush_cache_range() will assume the code hasn't run on those other CPUs due to the zero ASID, and won't invalidate the icache on them. This is fixed by clearing the other CPUs ASIDs to 1 instead of 0 for the above two flush_tlb_*() functions when the corresponding cache flushes are likely to be incomplete (non executable range flush, or any page flush). This ASID appears valid to has_valid_asid(), but still triggers ASID regeneration due to the upper ASID version bits being 0, which is less than the minimum ASID version of 1 and so always treated as stale. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13795/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28sparc64 mm: Fix base TSB sizing when hugetlb pages are usedMike Kravetz
do_sparc64_fault() calculates both the base and huge page RSS sizes and uses this information in calls to tsb_grow(). The calculation for base page TSB size is not correct if the task uses hugetlb pages. hugetlb pages are not accounted for in RSS, therefore the call to get_mm_rss(mm) does not include hugetlb pages. However, the number of pages based on huge_pte_count (which does include hugetlb pages) is subtracted from this value. This will result in an artificially small and often negative RSS calculation. The base TSB size is then often set to max_tsb_size as the passed RSS is unsigned, so a negative value looks really big. THP pages are also accounted for in huge_pte_count, and THP pages are accounted for in RSS so the calculation in do_sparc64_fault() is correct if a task only uses THP pages. A single huge_pte_count is not sufficient for TSB sizing if both hugetlb and THP pages can be used. Instead of a single counter, use two: one for hugetlb and one for THP. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-28Merge tag 'libnvdimm-for-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...
2016-07-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "The rest of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits) mm, compaction: simplify contended compaction handling mm, compaction: introduce direct compaction priority mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations mm, page_alloc: make THP-specific decisions more generic mm, page_alloc: restructure direct compaction handling in slowpath mm, page_alloc: don't retry initial attempt in slowpath mm, page_alloc: set alloc_flags only once in slowpath lib/stackdepot.c: use __GFP_NOWARN for stack allocations mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB mm, kasan: account for object redzone in SLUB's nearest_obj() mm: fix use-after-free if memory allocation failed in vma_adjust() zsmalloc: Delete an unnecessary check before the function call "iput" mm/memblock.c: fix index adjustment error in __next_mem_range_rev() mem-hotplug: alloc new page from a nearest neighbor node when mem-offline mm: optimize copy_page_to/from_iter_iovec mm: add cond_resched() to generic_swapfile_activate() Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode mm: hwpoison: remove incorrect comments make __section_nr() more efficient ...
2016-07-28arm64:acpi: fix the acpi alignment exception when 'mem=' specifiedDennis Chen
When booting an ACPI enabled kernel with 'mem=x', there is the possibility that ACPI data regions from the firmware will lie above the memory limit. Ordinarily these will be removed by memblock_enforce_memory_limit(.). Unfortunately, this means that these regions will then be mapped by acpi_os_ioremap(.) as device memory (instead of normal) thus unaligned accessess will then provoke alignment faults. In this patch we adopt memblock_mem_limit_remove_map instead, and this preserves these ACPI data regions (marked NOMAP) thus ensuring that these regions are not mapped as device memory. For example, below is an alignment exception observed on ARM platform when booting the kernel with 'acpi=on mem=8G': ... Unable to handle kernel paging request at virtual address ffff0000080521e7 pgd = ffff000008aa0000 [ffff0000080521e7] *pgd=000000801fffe003, *pud=000000801fffd003, *pmd=000000801fffc003, *pte=00e80083ff1c1707 Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172 Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016 task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000 PC is at acpi_ns_lookup+0x520/0x734 LR is at acpi_ns_lookup+0x4a4/0x734 pc : [<ffff0000083b8b10>] lr : [<ffff0000083b8a94>] pstate: 60000045 sp : ffff800001efb8b0 x29: ffff800001efb8c0 x28: 000000000000001b x27: 0000000000000001 x26: 0000000000000000 x25: ffff800001efb9e8 x24: ffff000008a10000 x23: 0000000000000001 x22: 0000000000000001 x21: ffff000008724000 x20: 000000000000001b x19: ffff0000080521e7 x18: 000000000000000d x17: 00000000000038ff x16: 0000000000000002 x15: 0000000000000007 x14: 0000000000007fff x13: ffffff0000000000 x12: 0000000000000018 x11: 000000001fffd200 x10: 00000000ffffff76 x9 : 000000000000005f x8 : ffff000008725fa8 x7 : ffff000008a8df70 x6 : ffff000008a8df70 x5 : ffff000008a8d000 x4 : 0000000000000010 x3 : 0000000000000010 x2 : 000000000000000c x1 : 0000000000000006 x0 : 0000000000000000 ... acpi_ns_lookup+0x520/0x734 acpi_ds_load1_begin_op+0x174/0x4fc acpi_ps_build_named_op+0xf8/0x220 acpi_ps_create_op+0x208/0x33c acpi_ps_parse_loop+0x204/0x838 acpi_ps_parse_aml+0x1bc/0x42c acpi_ns_one_complete_parse+0x1e8/0x22c acpi_ns_parse_table+0x8c/0x128 acpi_ns_load_table+0xc0/0x1e8 acpi_tb_load_namespace+0xf8/0x2e8 acpi_load_tables+0x7c/0x110 acpi_init+0x90/0x2c0 do_one_initcall+0x38/0x12c kernel_init_freeable+0x148/0x1ec kernel_init+0x10/0xec ret_from_fork+0x10/0x40 Code: b9009fbc 2a00037b 36380057 3219037b (b9400260) ---[ end trace 03381e5eb0a24de4 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b With 'efi=debug', we can see those ACPI regions loaded by firmware on that board as: efi: 0x0083ff185000-0x0083ff1b4fff [Reserved | | | | | | | | |WB|WT|WC|UC]* efi: 0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]* efi: 0x0083ff223000-0x0083ff224fff [ACPI Memory NVS | | | | | | | | |WB|WT|WC|UC]* Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Dennis Chen <dennis.chen@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Kaly Xin <kaly.xin@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28mm: move most file-based accounting to the nodeMel Gorman
There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28mm: move page mapped accounting to the nodeMel Gorman
Reclaim makes decisions based on the number of pages that are mapped but it's mixing node and zone information. Account NR_FILE_MAPPED and NR_ANON_PAGES pages on the node. Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28mm, vmscan: move LRU lists to nodeMel Gorman
This moves the LRU lists from the zone to the node and related data such as counters, tracing, congestion tracking and writeback tracking. Unfortunately, due to reclaim and compaction retry logic, it is necessary to account for the number of LRU pages on both zone and node logic. Most reclaim logic is based on the node counters but the retry logic uses the zone counters which do not distinguish inactive and active sizes. It would be possible to leave the LRU counters on a per-zone basis but it's a heavier calculation across multiple cache lines that is much more frequent than the retry checks. Other than the LRU counters, this is mostly a mechanical patch but note that it introduces a number of anomalies. For example, the scans are per-zone but using per-node counters. We also mark a node as congested when a zone is congested. This causes weird problems that are fixed later but is easier to review. In the event that there is excessive overhead on 32-bit systems due to the nodes being on LRU then there are two potential solutions 1. Long-term isolation of highmem pages when reclaim is lowmem When pages are skipped, they are immediately added back onto the LRU list. If lowmem reclaim persisted for long periods of time, the same highmem pages get continually scanned. The idea would be that lowmem keeps those pages on a separate list until a reclaim for highmem pages arrives that splices the highmem pages back onto the LRU. It potentially could be implemented similar to the UNEVICTABLE list. That would reduce the skip rate with the potential corner case is that highmem pages have to be scanned and reclaimed to free lowmem slab pages. 2. Linear scan lowmem pages if the initial LRU shrink fails This will break LRU ordering but may be preferable and faster during memory pressure than skipping LRU pages. Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: fat: fix error message for bogus number of directory entries fat: fix typo s/supeblock/superblock/ ASoC: max9877: Remove unused function declaration dw2102: don't output spurious blank lines to the kernel log init: fix Kconfig text ARM: io: fix comment grammar ocfs: fix ocfs2_xattr_user_get() argument name scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()
2016-07-28ARC: mm: don't loose PTE_SPECIAL in pte_modify()Vineet Gupta
LTP madvise05 was generating mm splat | [ARCLinux]# /sd/ltp/testcases/bin/madvise05 | BUG: Bad page map in process madvise05 pte:80e08211 pmd:9f7d4000 | page:9fdcfc90 count:1 mapcount:-1 mapping: (null) index:0x0 flags: 0x404(referenced|reserved) | page dumped because: bad pte | addr:200b8000 vm_flags:00000070 anon_vma: (null) mapping: (null) index:1005c | file: (null) fault: (null) mmap: (null) readpage: (null) | CPU: 2 PID: 6707 Comm: madvise05 And for newer kernels, the system was rendered unusable afterwards. The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is set to identify the special zero page wired to the pte). When pte was finally unmapped, special casing for zero page was not done, and instead it was treated as a "normal" page, tripping on the map counts etc. This fixes ARC STAR 9001053308 Cc: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-07-28Merge branches 'cpuidle', 'fixes' and 'misc' into for-linusRussell King
2016-07-28MIPS: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFOJames Hogan
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined for MIPS at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for the VDSO address. This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for AT_BASE_PLATFORM which MIPS doesn't use, but lets define it now and add the comment above ARCH_DLINFO as found in several other architectures to remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to date. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13823/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28MIPS: Octeon: Improve USB reset code for OCTEON II.Steven J. Hill
At boot time, do a better job of resetting the USB host controller to make the frequency "eye" diagram more compliant with the USB standard while making the controller more reliable. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13831/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28MIPS: Octeon: Put restrictions on DMA descriptors.Steven J. Hill
Set the DMA mask such that all descriptors stay in the lower 4GB of memory. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13830/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28MIPS: Octeon: Remove forced mappings of USB interrupts.Steven J. Hill
Get rid of unnecessary forced interrupt mappings for the USB host controller on OCTEON II. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13824/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspaceDavid Howells
MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than calling sys_keyctl. The latter will work in a lot of cases, thereby hiding the issue. Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: keyrings@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13832/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28MIPS: Print segment physical address when EU=1James Hogan
Currently the debugfs interface to print the segment configuration refuses to print the physical address of mapped segments. However if the EU bit is set these become unmapped at error level (when CP0_Status.ERL=1), so the physical address is still relevant. Update the logic to print the physical address of mapped segments when the EU bit is set, while still hiding the Cache Coherency Attribute (since EU overrides that to uncached when ERL=1 too). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13833/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28KVM: PPC: Book3S HV: Save/restore TM state in H_CEDEPaul Mackerras
It turns out that if the guest does a H_CEDE while the CPU is in a transactional state, and the H_CEDE does a nap, and the nap loses the architected state of the CPU (which is is allowed to do), then we lose the checkpointed state of the virtual CPU. In addition, the transactional-memory state recorded in the MSR gets reset back to non-transactional, and when we try to return to the guest, we take a TM bad thing type of program interrupt because we are trying to transition from non-transactional to transactional with a hrfid instruction, which is not permitted. The result of the program interrupt occurring at that point is that the host CPU will hang in an infinite loop with interrupts disabled. Thus this is a denial of service vulnerability in the host which can be triggered by any guest (and depending on the guest kernel, it can potentially triggered by unprivileged userspace in the guest). This vulnerability has been assigned the ID CVE-2016-5412. To fix this, we save the TM state before napping and restore it on exit from the nap, when handling a H_CEDE in real mode. The case where H_CEDE exits to host virtual mode is already OK (as are other hcalls which exit to host virtual mode) because the exit path saves the TM state. Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>