summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-05-20Merge tag 'devicetree-for-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: - Rewrite of the unflattening code to avoid recursion and lessen the stack usage. - Rewrite of the phandle args parsing code to get rid of the fixed args size. This is needed for IOMMU code. - Sync to latest dtc which adds more dts style checking. These warnings are enabled with "W=1" compiles. - Tegra documentation updates related to the above warnings. - A bunch of spelling and other doc fixes. - Various vendor prefix additions. * tag 'devicetree-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (52 commits) devicetree: Add Creative Technology vendor id gpio: dt-bindings: add ibm,ppc4xx-gpio binding of/unittest: Remove unnecessary module.h header inclusion drivers/of: Fix build warning in populate_node() drivers/of: Fix depth when unflattening devicetree of: dynamic: changeset prop-update revert fix drivers/of: Export of_detach_node() drivers/of: Return allocated memory from of_fdt_unflatten_tree() drivers/of: Specify parent node in of_fdt_unflatten_tree() drivers/of: Rename unflatten_dt_node() drivers/of: Avoid recursively calling unflatten_dt_node() drivers/of: Split unflatten_dt_node() of: include errno.h in of_graph.h of: document refcount incrementation of of_get_cpu_node() Documentation: dt: soc: fix spelling mistakes Documentation: dt: power: fix spelling mistake Documentation: dt: pinctrl: fix spelling mistake Documentation: dt: opp: fix spelling mistake Documentation: dt: net: fix spelling mistakes Documentation: dt: mtd: fix spelling mistake ...
2016-05-20Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits) IB/mlx5: Fire the CQ completion handler from tasklet net/mlx5_core: Use tasklet for user-space CQ completion events IB/core: Do not require CAP_NET_ADMIN for packet sniffing IB/mlx4: Fix unaligned access in send_reply_to_slave IB/mlx5: Report Scatter FCS device capability when supported IB/mlx5: Add Scatter FCS support for Raw Packet QP IB/core: Add Scatter FCS create flag IB/core: Add Raw Scatter FCS device capability IB/core: Add extended device capability flags i40iw: pass hw_stats by reference rather than by value i40iw: Remove unnecessary synchronize_irq() before free_irq() i40iw: constify i40iw_vf_cqp_ops structure IB/mlx5: Add UARs write-combining and non-cached mapping IB/mlx5: Allow mapping the free running counter on PROT_EXEC IB/mlx4: Use list_for_each_entry_safe IB/SA: Use correct free function IB/core: Fix a potential array overrun in CMA and SA agent IB/core: Remove unnecessary check in ibnl_rcv_msg IB/IWPM: Fix a potential skb leak RDMA/nes: replace custom print_hex_dump() ...
2016-05-21drm: Nuke ->vblank_disable_allowedDaniel Vetter
This was added in commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241 Author: Jesse Barnes <jbarnes@virtuousgeek.org> Date: Tue Sep 30 12:14:26 2008 -0700 drm: Rework vblank-wait handling to allow interrupt reduction. to stay backwards-compatible with old UMS code that didn't even tell the kernel when it did a modeset, so that the kernel could save/restore vblank counters. At worst this means vblanks will be somewhat funky on a setup that very likely no one still runs. So let's just nuke it. Plan B would be to set it unconditionally in drm_vblank_init for kms drivers, instead of in each driver separately. So if this patch breaks anything please only restore the hunks in drmP.h and drm_irq.c, plus add a check for DRIVER_MODESET in drm_vblank_init. Stumbled over this in a discussion on irc with Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Eric Anholt <eric@anholt.net> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Mark Yao <mark.yao@rock-chips.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Liviu Dudau <Liviu.Dudau@arm.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-20mlx5: avoid unused variable warningArnd Bergmann
When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5 ethernet driver because the tc_for_each_action() loop never references the iterator: mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower': mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' [-Werror=unused-variable] struct tc_action *a; This changes the dummy tc_for_each_action() macro by adding a cast to void, letting the compiler know that the variable is intentionally declared but not used here. I could not come up with a nicer workaround, but this seems to do the trick. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics support") Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef") Acked-By: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20Merge tag 'mfd-for-linus-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "New Drivers: - Add new driver for MAXIM MAX77620/MAX20024 PMIC - Add new driver for Hisilicon HI665X PMIC New Device Support: - Add support for AXP809 in axp20x-rsb - Add support for Power Supply in axp20x New core features: - devm_mfd_* managed resources Fix-ups: - Remove unused code (da9063-irq, wm8400-core, tps6105x, smsc-ece1099, twl4030-power) - Improve clean-up in error path (intel_quark_i2c_gpio) - Explicitly include headers (syscon.h) - Allow building as modules (max77693) - Use IS_ENABLED() instead of rolling your own (dm355evm_msp, wm8400-core) - DT adaptions (axp20x, hi655x, arizona, max77620) - Remove CLK_IS_ROOT flag (intel-lpss, intel_quark) - Move to gpiochip API (asic3, dm355evm_msp, htc-egpio, htc-i2cpld, sm501, tc6393xb, tps65010, ucb1x00, vexpress) - Make use of devm_mfd_* calls (act8945a, as3711, atmel-hlcdc, bcm590xx, hi6421-pmic-core, lp3943, menf21bmc, mt6397, rdc321x, rk808, rn5t618, rt5033, sky81452, stw481x, tps6507x, tps65217, wm8400) Bug Fixes" - Fix ACPI child matching (mfd-core) - Fix start-up ordering issues (mt6397-core, arizona-core) - Fix forgotten register state on resume (intel-lpss) - Fix Clock related issues (twl6040) - Fix scheduling whilst atomic (omap-usb-tll) - Kconfig changes (vexpress)" * tag 'mfd-for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (73 commits) mfd: hi655x: Add MFD driver for hi655x mfd: ab8500-debugfs: Trivial fix of spelling mistake on "between" mfd: vexpress: Add !ARCH_USES_GETTIMEOFFSET dependency mfd: Add device-tree binding doc for PMIC MAX77620/MAX20024 mfd: max77620: Add core driver for MAX77620/MAX20024 mfd: arizona: Add defines for GPSW values that can be used from DT mfd: omap-usb-tll: Fix scheduling while atomic BUG mfd: wm5110: ARIZONA_CLOCK_CONTROL should be volatile mfd: axp20x: Add a cell for the ac power_supply part of the axp20x PMICs mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table correctly mfd: wl1273-core: Use devm_mfd_add_devices() for mfd_device registration mfd: tps65910: Use devm_mfd_add_devices and devm_regmap_add_irq_chip mfd: sec: Use devm_mfd_add_devices and devm_regmap_add_irq_chip mfd: rc5t583: Use devm_mfd_add_devices and devm_request_threaded_irq mfd: max77686: Use devm_mfd_add_devices and devm_regmap_add_irq_chip mfd: as3722: Use devm_mfd_add_devices and devm_regmap_add_irq_chip mfd: twl4030-power: Remove driver path in file comment MAINTAINERS: Add entry for X-Powers AXP family PMIC drivers mfd: smsc-ece1099: Remove unnecessarily remove callback mfd: Use IS_ENABLED(CONFIG_FOO) instead of checking FOO || FOO_MODULE ...
2016-05-20Merge tag 'fbdev-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev updates from Tomi Valkeinen: - imxfb: fix lcd power up - small fixes and cleanups * tag 'fbdev-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: fbdev: Use IS_ENABLED() instead of checking for built-in or module efifb: Don't show the mapping VA video: AMBA CLCD: Remove unncessary include in amba-clcd.c fbdev: ssd1307fb: Fix charge pump setting Documentation: fb: fix spelling mistakes fbdev: fbmem: implement error handling in fbmem_init() fbdev: sh_mipi_dsi: remove driver video: fbdev: imxfb: add some error handling video: fbdev: imxfb: fix semantic of .get_power and .set_power video: fbdev: omap2: Remove deprecated regulator_can_change_voltage() usage
2016-05-20Merge tag 'powerpc-4.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ...
2016-05-20KVM: arm/arm64: vgic-new: implement mapped IRQ handlingAndre Przywara
We now store the mapped hardware IRQ number in our struct, so we don't need the irq_phys_map for the new VGIC. Implement the hardware IRQ mapping on top of the reworked arch timer interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: vgic_init: implement map_resourcesEric Auger
map_resources is the last initialization step. It is executed on first VCPU run. At that stage the code checks that userspace has provided the base addresses for the relevant VGIC regions, which depend on the type of VGIC that is exposed to the guest. Also we check if the two regions overlap. If the checks succeeded, we register the respective register frames with the kvm_io_bus framework. If we emulate a GICv2, the function also forces vgic_init execution if it has not been executed yet. Also we map the virtual GIC CPU interface onto the guest's CPU interface. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: vgic_init: implement vgic_initEric Auger
This patch allocates and initializes the data structures used to model the vgic distributor and virtual cpu interfaces. At that stage the number of IRQs and number of virtual CPUs is frozen. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: vgic_init: implement vgic_createEric Auger
This patch implements the vgic_creation function which is called on CREATE_IRQCHIP VM IOCTL (v2 only) or KVM_CREATE_DEVICE Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_initEric Auger
Implements kvm_vgic_hyp_init and vgic_probe function. This uses the new firmware independent VGIC probing to support both ACPI and DT based systems (code from Marc Zyngier). The vgic_global struct is enriched with new fields populated by those functions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: vgic_kvm_device: implement kvm_vgic_addrEric Auger
kvm_vgic_addr is used by the userspace to set the base address of the following register regions, as seen by the guest: - distributor(v2 and v3), - re-distributors (v3), - CPU interface (v2). Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv3 SGI system register trap handlerAndre Przywara
In contrast to GICv2 SGIs in a GICv3 implementation are not triggered by a MMIO write, but with a system register write. KVM knows about that register already, we just need to implement the handler and wire it up to the core KVM/ARM code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add CTLR, TYPER and IIDR handlersMarc Zyngier
Those three registers are v2 emulation specific, so their implementation lives entirely in vgic-mmio-v2.c. Also they are handled in one function, as their implementation is pretty simple. When the guest enables the distributor, we kick all VCPUs to get potentially pending interrupts serviced. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add MMIO handling frameworkMarc Zyngier
Add an MMIO handling framework to the VGIC emulation: Each register is described by its offset, size (or number of bits per IRQ, if applicable) and the read/write handler functions. We provide initialization macros to describe each GIC register later easily. Separate dispatch functions for read and write accesses are connected to the kvm_io_bus framework and binary-search for the responsible register handler based on the offset address within the region. We convert the incoming data (referenced by a pointer) to the host's endianess and use pass-by-value to hand the data over to the actual handler functions. The register handler prototype and the endianess conversion are courtesy of Christoffer Dall. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Implement kvm_vgic_vcpu_pending_irqEric Auger
Tell KVM whether a particular VCPU has an IRQ that needs handling in the guest. This is used to decide whether a VCPU is runnable. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv3 world switch backendMarc Zyngier
As the GICv3 virtual interface registers differ from their GICv2 siblings, we need different handlers for processing maintenance interrupts and reading/writing to the LRs. Implement the respective handler functions and connect them to existing code to be called if the host is using a GICv3. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add GICv2 world switch backendMarc Zyngier
Processing maintenance interrupts and accessing the list registers are dependent on the host's GIC version. Introduce vgic-v2.c to contain GICv2 specific functions. Implement the GICv2 specific code for syncing the emulation state into the VGIC registers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Add IRQ sync/flush frameworkMarc Zyngier
Implement the framework for syncing IRQs between our emulation and the list registers, which represent the guest's view of IRQs. This is done in kvm_vgic_flush_hwstate and kvm_vgic_sync_hwstate, which gets called on guest entry and exit. The code talking to the actual GICv2/v3 hardware is added in the following patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: Implement virtual IRQ injectionChristoffer Dall
Provide a vgic_queue_irq_unlock() function which decides whether a given IRQ needs to be queued to a VCPU's ap_list. This should be called whenever an IRQ becomes pending or enabled, either as a result of userspace injection, from in-kernel emulated devices like the architected timer or from MMIO accesses to the distributor emulation. Also provides the necessary functions to allow userland to inject an IRQ to a guest. Since this is the first code that starts using our locking mechanism, we add some (hopefully) clear documentation of our locking strategy and requirements along with this patch. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic-new: Add data structure definitionsChristoffer Dall
Add a new header file for the new and improved GIC implementation. The big change is that we now have a struct vgic_irq per IRQ instead of spreading all the information over various bitmaps. We include this new header conditionally from within the old header file for the time being to avoid touching all the users. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: move GICv2 emulation defines into arm-gic-v3.hAndre Przywara
As (some) GICv3 hosts can emulate a GICv2, some GICv2 specific masks for the list register definition also apply to GICv3 LRs. At the moment we have those definitions in the KVM VGICv3 implementation, so let's move them into the GICv3 header file to have them automatically defined. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-20KVM: arm/arm64: pmu: abstract access to number of SPIsAndre Przywara
Currently the PMU uses a member of the struct vgic_dist directly, which not only breaks abstraction, but will fail with the new VGIC. Abstract this access in the VGIC header file and refactor the validity check in the PMU code. Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Get rid of vgic_cpu->nr_lrChristoffer Dall
The number of list registers is a property of the underlying system, not of emulated VGIC CPU interface. As we are about to move this variable to global state in the new vgic for clarity, move it from the legacy implementation as well to make the merge of the new code easier. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Move timer IRQ map to latest possible timeChristoffer Dall
We are about to modify the VGIC to allocate all data structures dynamically and store mapped IRQ information on a per-IRQ struct, which is indeed allocated dynamically at init time. Therefore, we cannot record the mapped IRQ info from the timer at timer reset time like it's done now, because VCPU reset happens before timer init. A possible later time to do this is on the first run of a per VCPU, it just requires us to move the enable state to be a per-VCPU state and do the lookup of the physical IRQ number when we are about to run the VCPU. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: vgic: Remove irq_phys_map from interfaceAndre Przywara
Now that the virtual arch timer does not care about the irq_phys_map anymore, let's rework kvm_vgic_map_phys_irq() to return an error value instead. Any reference to that mapping can later be done by passing the correct combination of VCPU and virtual IRQ number. This makes the irq_phys_map handling completely private to the VGIC code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: arch_timer: Remove irq_phys_mapAndre Przywara
Now that the interface between the arch timer and the VGIC does not require passing the irq_phys_map entry pointer anymore, let's remove it from the virtual arch timer and use the virtual IRQ number instead directly. The remaining pointer returned by kvm_vgic_map_phys_irq() will be removed in the following patch. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: Remove the IRQ field from struct irq_phys_mapChristoffer Dall
The communication of a Linux IRQ number from outside the VGIC to the vgic was a leftover from the day when the vgic code cared about how a particular device injects virtual interrupts mapped to a physical interrupt. We can safely remove this notion, leaving all physical IRQ handling to be done in the device driver (the arch timer in this case), which makes room for a saner API for the new VGIC. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_unmap_phys_irq()Andre Przywara
kvm_vgic_unmap_phys_irq() only needs the virtual IRQ number, so let's just pass that between the arch timer and the VGIC to get rid of the irq_phys_map pointer. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_map_is_active()Andre Przywara
For getting the active state of a mapped IRQ, we actually only need the virtual IRQ number, not the pointer to the mapping entry. Pass the virtual IRQ number from the arch timer to the VGIC directly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic: avoid map in kvm_vgic_inject_mapped_irq()Andre Przywara
When we want to inject a hardware mapped IRQ into a guest, we actually only need the virtual IRQ number from the irq_phys_map. So let's pass this number directly from the arch timer to the VGIC to avoid using the map as a parameter. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20Merge tag 'perf-core-for-mingo-20160516' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Honour the kernel.perf_event_max_stack knob more precisely by not counting PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo) - Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim) - Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim) - Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim) - Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen) - Store vdso buildid unconditionally, as it appears in callchains and we're not checking those when creating the build-id table, so we end up not being able to resolve VDSO symbols when doing analysis on a different machine than the one where recording was done, possibly of a different arch even (arm -> x86_64) (He Kuang) Infrastructure changes: - Generalize max_stack sysctl handler, will be used for configuring multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo) Cleanups: - Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using open coded strings (Masami Hiramatsu) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - fsnotify fix - poll() timeout fix - a few scripts/ tweaks - debugobjects updates - the (small) ocfs2 queue - Minor fixes to kernel/padata.c - Maybe half of the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm, page_alloc: restore the original nodemask if the fast path allocation failed mm, page_alloc: uninline the bad page part of check_new_page() mm, page_alloc: don't duplicate code in free_pcp_prepare mm, page_alloc: defer debugging checks of pages allocated from the PCP mm, page_alloc: defer debugging checks of freed pages until a PCP drain cpuset: use static key better and convert to new API mm, page_alloc: inline pageblock lookup in page free fast paths mm, page_alloc: remove unnecessary variable from free_pcppages_bulk mm, page_alloc: pull out side effects from free_pages_check mm, page_alloc: un-inline the bad part of free_pages_check mm, page_alloc: check multiple page fields with a single branch mm, page_alloc: remove field from alloc_context mm, page_alloc: avoid looking up the first zone in a zonelist twice mm, page_alloc: shortcut watermark checks for order-0 pages mm, page_alloc: reduce cost of fair zone allocation policy retry mm, page_alloc: shorten the page allocator fast path mm, page_alloc: check once if a zone has isolated pageblocks mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath mm, page_alloc: simplify last cpupid reset mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() ...
2016-05-19cpuset: use static key better and convert to new APIVlastimil Babka
An important function for cpusets is cpuset_node_allowed(), which optimizes on the fact if there's a single root CPU set, it must be trivially allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key static key the right way where static keys eliminate branching overhead with jump labels. This patch converts it so that static key is used properly. It's also switched to the new static key API and the checking functions are converted to return bool instead of int. We also provide a new variant __cpuset_zone_allowed() which expects that the static key check was already done and they key was enabled. This is needed for get_page_from_freelist() where we want to also avoid the relatively slower check when ALLOC_CPUSET is not set in alloc_flags. The impact on the page allocator microbenchmark is less than expected but the cleanup in itself is worthwhile. 4.6.0-rc2 4.6.0-rc2 multcheck-v1r20 cpuset-v1r20 Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%) Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%) Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%) Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%) Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%) Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%) Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%) Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%) Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%) Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%) Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%) Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%) Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%) Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%) Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%) Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Zefan Li <lizefan@huawei.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: inline pageblock lookup in page free fast pathsMel Gorman
The function call overhead of get_pfnblock_flags_mask() is measurable in the page free paths. This patch uses an inlined version that is faster. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: avoid looking up the first zone in a zonelist twiceMel Gorman
The allocator fast path looks up the first usable zone in a zonelist and then get_page_from_freelist does the same job in the zonelist iterator. This patch preserves the necessary information. 4.6.0-rc2 4.6.0-rc2 fastmark-v1r20 initonce-v1r20 Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%) Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%) Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%) Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%) Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%) Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%) Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%) Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%) Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%) Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%) Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%) Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%) Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%) Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%) Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%) The benefit is negligible and the results are within the noise but each cycle counts. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: simplify last cpupid resetMel Gorman
The current reset unnecessarily clears flags and makes pointless calculations. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: convert alloc_flags to unsignedMel Gorman
alloc_flags is a bitmask of flags but it is signed which does not necessarily generate the best code depending on the compiler. Even without an impact, it makes more sense that this be unsigned. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: inline the fast path of the zonelist iteratorMel Gorman
The page allocator iterates through a zonelist for zones that match the addressing limitations and nodemask of the caller but many allocations will not be restricted. Despite this, there is always functional call overhead which builds up. This patch inlines the optimistic basic case and only calls the iterator function for the complex case. A hindrance was the fact that cpuset_current_mems_allowed is used in the fastpath as the allowed nodemask even though all nodes are allowed on most systems. The patch handles this by only considering cpuset_current_mems_allowed if a cpuset exists. As well as being faster in the fast-path, this removes some junk in the slowpath. The performance difference on a page allocator microbenchmark is; 4.6.0-rc2 4.6.0-rc2 statinline-v1r20 optiter-v1r20 Min alloc-odr0-1 412.00 ( 0.00%) 382.00 ( 7.28%) Min alloc-odr0-2 301.00 ( 0.00%) 282.00 ( 6.31%) Min alloc-odr0-4 247.00 ( 0.00%) 233.00 ( 5.67%) Min alloc-odr0-8 215.00 ( 0.00%) 203.00 ( 5.58%) Min alloc-odr0-16 199.00 ( 0.00%) 188.00 ( 5.53%) Min alloc-odr0-32 191.00 ( 0.00%) 182.00 ( 4.71%) Min alloc-odr0-64 187.00 ( 0.00%) 177.00 ( 5.35%) Min alloc-odr0-128 185.00 ( 0.00%) 175.00 ( 5.41%) Min alloc-odr0-256 193.00 ( 0.00%) 184.00 ( 4.66%) Min alloc-odr0-512 207.00 ( 0.00%) 197.00 ( 4.83%) Min alloc-odr0-1024 213.00 ( 0.00%) 203.00 ( 4.69%) Min alloc-odr0-2048 220.00 ( 0.00%) 209.00 ( 5.00%) Min alloc-odr0-4096 226.00 ( 0.00%) 214.00 ( 5.31%) Min alloc-odr0-8192 229.00 ( 0.00%) 218.00 ( 4.80%) Min alloc-odr0-16384 229.00 ( 0.00%) 219.00 ( 4.37%) perf indicated that next_zones_zonelist disappeared in the profile and __next_zones_zonelist did not appear. This is expected as the micro-benchmark would hit the inlined fast-path every time. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: inline zone_statisticsMel Gorman
zone_statistics has one call-site but it's a public function. Make it static and inline. The performance difference on a page allocator microbenchmark is; 4.6.0-rc2 4.6.0-rc2 statbranch-v1r20 statinline-v1r20 Min alloc-odr0-1 419.00 ( 0.00%) 412.00 ( 1.67%) Min alloc-odr0-2 305.00 ( 0.00%) 301.00 ( 1.31%) Min alloc-odr0-4 250.00 ( 0.00%) 247.00 ( 1.20%) Min alloc-odr0-8 219.00 ( 0.00%) 215.00 ( 1.83%) Min alloc-odr0-16 203.00 ( 0.00%) 199.00 ( 1.97%) Min alloc-odr0-32 195.00 ( 0.00%) 191.00 ( 2.05%) Min alloc-odr0-64 191.00 ( 0.00%) 187.00 ( 2.09%) Min alloc-odr0-128 189.00 ( 0.00%) 185.00 ( 2.12%) Min alloc-odr0-256 198.00 ( 0.00%) 193.00 ( 2.53%) Min alloc-odr0-512 210.00 ( 0.00%) 207.00 ( 1.43%) Min alloc-odr0-1024 216.00 ( 0.00%) 213.00 ( 1.39%) Min alloc-odr0-2048 221.00 ( 0.00%) 220.00 ( 0.45%) Min alloc-odr0-4096 227.00 ( 0.00%) 226.00 ( 0.44%) Min alloc-odr0-8192 232.00 ( 0.00%) 229.00 ( 1.29%) Min alloc-odr0-16384 232.00 ( 0.00%) 229.00 ( 1.29%) Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm, page_alloc: use new PageAnonHead helper in the free page fast pathMel Gorman
The PageAnon check always checks for compound_head but this is a relatively expensive check if the caller already knows the page is a head page. This patch creates a helper and uses it in the page free path which only operates on head pages. With this patch and "Only check PageCompound for high-order pages", the performance difference on a page allocator microbenchmark is; 4.6.0-rc2 4.6.0-rc2 vanilla nocompound-v1r20 Min alloc-odr0-1 425.00 ( 0.00%) 417.00 ( 1.88%) Min alloc-odr0-2 313.00 ( 0.00%) 308.00 ( 1.60%) Min alloc-odr0-4 257.00 ( 0.00%) 253.00 ( 1.56%) Min alloc-odr0-8 224.00 ( 0.00%) 221.00 ( 1.34%) Min alloc-odr0-16 208.00 ( 0.00%) 205.00 ( 1.44%) Min alloc-odr0-32 199.00 ( 0.00%) 199.00 ( 0.00%) Min alloc-odr0-64 195.00 ( 0.00%) 193.00 ( 1.03%) Min alloc-odr0-128 192.00 ( 0.00%) 191.00 ( 0.52%) Min alloc-odr0-256 204.00 ( 0.00%) 200.00 ( 1.96%) Min alloc-odr0-512 213.00 ( 0.00%) 212.00 ( 0.47%) Min alloc-odr0-1024 219.00 ( 0.00%) 219.00 ( 0.00%) Min alloc-odr0-2048 225.00 ( 0.00%) 225.00 ( 0.00%) Min alloc-odr0-4096 230.00 ( 0.00%) 231.00 ( -0.43%) Min alloc-odr0-8192 235.00 ( 0.00%) 234.00 ( 0.43%) Min alloc-odr0-16384 235.00 ( 0.00%) 234.00 ( 0.43%) Min free-odr0-1 215.00 ( 0.00%) 191.00 ( 11.16%) Min free-odr0-2 152.00 ( 0.00%) 136.00 ( 10.53%) Min free-odr0-4 119.00 ( 0.00%) 107.00 ( 10.08%) Min free-odr0-8 106.00 ( 0.00%) 96.00 ( 9.43%) Min free-odr0-16 97.00 ( 0.00%) 87.00 ( 10.31%) Min free-odr0-32 91.00 ( 0.00%) 83.00 ( 8.79%) Min free-odr0-64 89.00 ( 0.00%) 81.00 ( 8.99%) Min free-odr0-128 88.00 ( 0.00%) 80.00 ( 9.09%) Min free-odr0-256 106.00 ( 0.00%) 95.00 ( 10.38%) Min free-odr0-512 116.00 ( 0.00%) 111.00 ( 4.31%) Min free-odr0-1024 125.00 ( 0.00%) 118.00 ( 5.60%) Min free-odr0-2048 133.00 ( 0.00%) 126.00 ( 5.26%) Min free-odr0-4096 136.00 ( 0.00%) 130.00 ( 4.41%) Min free-odr0-8192 138.00 ( 0.00%) 130.00 ( 5.80%) Min free-odr0-16384 137.00 ( 0.00%) 130.00 ( 5.11%) There is a sizable boost to the free allocator performance. While there is an apparent boost on the allocation side, it's likely a co-incidence or due to the patches slightly reducing cache footprint. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19oom, oom_reaper: try to reap tasks which skip regular OOM killer pathMichal Hocko
If either the current task is already killed or PF_EXITING or a selected task is PF_EXITING then the oom killer is suppressed and so is the oom reaper. This patch adds try_oom_reaper which checks the given task and queues it for the oom reaper if that is safe to be done meaning that the task doesn't share the mm with an alive process. This might help to release the memory pressure while the task tries to exit. [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Raushaniya Maksudova <rmaksudova@parallels.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19arch: fix has_transparent_hugepage()Hugh Dickins
I've just discovered that the useful-sounding has_transparent_hugepage() is actually an architecture-dependent minefield: on some arches it only builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when not, but on some of those (arm and arm64) it then gives the wrong answer; and on mips alone it's marked __init, which would crash if called later (but so far it has not been called later). Straighten this out: make it available to all configs, with a sensible default in asm-generic/pgtable.h, removing its definitions from those arches (arc, arm, arm64, sparc, tile) which are served by the default, adding #define has_transparent_hugepage has_transparent_hugepage to those (mips, powerpc, s390, x86) which need to override the default at runtime, and removing the __init from mips (but maybe that kind of code should be avoided after init: set a static variable the first time it's called). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390] Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19huge mm: move_huge_pmd does not need new_vmaHugh Dickins
Remove move_huge_pmd()'s redundant new_vma arg: all it was used for was a VM_NOHUGEPAGE check on new_vma flags, but the new_vma is cloned from the old vma, so a trans_huge_pmd in the new_vma will be as acceptable as it was in the old vma, alignment and size permitting. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: /proc/sys/vm/stat_refresh to force vmstat updateHugh Dickins
Provide /proc/sys/vm/stat_refresh to force an immediate update of per-cpu into global vmstats: useful to avoid a sleep(2) or whatever before checking counts when testing. Originally added to work around a bug which left counts stranded indefinitely on a cpu going idle (an inaccuracy magnified when small below-batch numbers represent "huge" amounts of memory), but I believe that bug is now fixed: nonetheless, this is still a useful knob. Its schedule_on_each_cpu() is probably too expensive just to fold into reading /proc/meminfo itself: give this mode 0600 to prevent abuse. Allow a write or a read to do the same: nothing to read, but "grep -h Shmem /proc/sys/vm/stat_refresh /proc/meminfo" is convenient. Oh, and since global_page_state() itself is careful to disguise any underflow as 0, hack in an "Invalid argument" and pr_warn() if a counter is negative after the refresh - this helped to fix a misaccounting of NR_ISOLATED_FILE in my migration code. But on recent kernels, I find that NR_ALLOC_BATCH and NR_PAGES_SCANNED often go negative some of the time. I have not yet worked out why, but have no evidence that it's actually harmful. Punt for the moment by just ignoring the anomaly on those. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19tmpfs: preliminary minor tidyupsHugh Dickins
Make a few cleanups in mm/shmem.c, before going on to complicate it. shmem_alloc_page() will become more complicated: we can't afford to to have that complication duplicated between a CONFIG_NUMA version and a !CONFIG_NUMA version, so rearrange the #ifdef'ery there to yield a single shmem_swapin() and a single shmem_alloc_page(). Yes, it's a shame to inflict the horrid pseudo-vma on non-NUMA configurations, but eliminating it is a larger cleanup: I have an alloc_pages_mpol() patchset not yet ready - mpol handling is subtle and bug-prone, and changed yet again since my last version. Move __SetPageLocked, __SetPageSwapBacked from shmem_getpage_gfp() to shmem_alloc_page(): that SwapBacked flag will be useful in future, to help to distinguish different cases appropriately. And the SGP_DIRTY variant of SGP_CACHE is hard to understand and of little use (IIRC it dates back to when shmem_getpage() returned the page unlocked): kill it and do the necessary in shmem_file_read_iter(). But an arm64 build then complained that info may be uninitialized (where shmem_getpage_gfp() deletes a freshly alloced page beyond eof), and advancing to an "sgp <= SGP_CACHE" test jogged it back to reality. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: update_lru_size do the __mod_zone_page_stateHugh Dickins
Konstantin Khlebnikov pointed out (nearly four years ago, when lumpy reclaim was removed) that lru_size can be updated by -nr_taken once per call to isolate_lru_pages(), instead of page by page. Update it inside isolate_lru_pages(), or at its two callsites? I chose to update it at the callsites, rearranging and grouping the updates by nr_taken and nr_scanned together in both. With one exception, mem_cgroup_update_lru_size(,lru,) is then used where __mod_zone_page_state(,NR_LRU_BASE+lru,) is used; and we shall be adding some more calls in a future commit. Make the code a little smaller and simpler by incorporating stat update in lru_size update. The exception was move_active_pages_to_lru(), which aggregated the pgmoved stat update separately from the individual lru_size updates; but I still think this a simplification worth making. However, the __mod_zone_page_state is not peculiar to mem_cgroups: so better use the name update_lru_size, calls mem_cgroup_update_lru_size when CONFIG_MEMCG. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: update_lru_size warn and reset bad lru_sizeHugh Dickins
Though debug kernels have a VM_BUG_ON to help protect from misaccounting lru_size, non-debug kernels are liable to wrap it around: and then the vast unsigned long size draws page reclaim into a loop of repeatedly doing nothing on an empty list, without even a cond_resched(). That soft lockup looks confusingly like an over-busy reclaim scenario, with lots of contention on the lru_lock in shrink_inactive_list(): yet has a totally different origin. Help differentiate with a custom warning in mem_cgroup_update_lru_size(), even in non-debug kernels; and reset the size to avoid the lockup. But the particular bug which suggested this change was mine alone, and since fixed. Make it a WARN_ONCE: the first occurrence is the most informative, a flurry may follow, yet even when rate-limited little more is learnt. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19mm: uninline page_mapped()Andrew Morton
It's huge. Uninlining it saves 206 bytes per callsite. Shaves 4924 bytes from the x86_64 allmodconfig vmlinux. [akpm@linux-foundation.org: coding-style fixes] Cc: Steve Capper <steve.capper@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>