summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-02-01Merge tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This seems to have been a comparatively quieter merge window, I assume due to holidays etc. The "biggest" change is AMD header cleanups, which merge/remove a bunch of them. The AMD gpu scheduler is now being made generic with the etnaviv driver wanting to reuse the code, hopefully other drivers can go in the same direction. Otherwise it's the usual lots of stuff in i915/amdgpu, not so much stuff elsewhere. Core: - Add .last_close and .output_poll_changed helpers to reduce driver footprints - Fix plane clipping - Improved debug printing support - Add panel orientation property - Update edid derived properties at edid setting - Reduction in fbdev driver footprint - Move amdgpu scheduler into core for other drivers to use. i915: - Selftest and IGT improvements - Fast boot prep work on IPS, pipe config - HW workarounds for Cannonlake, Geminilake - Cannonlake clock and HDMI2.0 fixes - GPU cache invalidation and context switch improvements - Display planes cleanup - New PMU interface for perf queries - New firmware support for KBL/SKL - Geminilake HW workaround for perforamce - Coffeelake stolen memory improvements - GPU reset robustness work - Cannonlake horizontal plane flipping - GVT work amdgpu/radeon: - RV and Vega header file cleanups (lots of lines gone!) - TTM operation context support - 48-bit GPUVM support for Vega/RV - ECC support for Vega - Resizeable BAR support - Multi-display sync support - Enable swapout for reserved BOs during allocation - S3 fixes on Raven - GPU reset cleanup and fixes - 2+1 level GPU page table amdkfd: - GFX7/8 SDMA user queues support - Hardware scheduling for multiple processes - dGPU prep work rcar: - Added R8A7743/5 support - System suspend/resume support sun4i: - Multi-plane support for YUV formats - A83T and LVDS support msm: - Devfreq support for GPU tegra: - Prep work for adding Tegra186 support - Tegra186 HDMI support - HDMI2.0 and zpos support by using generic helpers tilcdc: - Misc fixes omapdrm: - Support memory bandwidth limits - DSI command mode panel cleanups - DMM error handling exynos: - drop the old IPP subdriver. etnaviv: - Occlusion query fixes - Job handling fixes - Prep work for hooking in gpu scheduler armada: - Move closer to atomic modesetting - Allow disabling primary plane if overlay is full screen imx: - Format modifier support - Add tile prefetch to PRE - Runtime PM support for PRG ast: - fix LUT loading" * tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux: (1471 commits) drm/ast: Load lut in crtc_commit drm: Check for lessee in DROP_MASTER ioctl drm: fix gpu scheduler link order drm/amd/display: Demote error print to debug print when ATOM impl missing dma-buf: fix reservation_object_wait_timeout_rcu once more v2 drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) drm/amd/amdgpu: Add Polaris version check drm/amdgpu: Reenable manual GPU reset from sysfs drm/amdgpu: disable MMHUB power gating on raven drm/ttm: Don't unreserve swapped BOs that were previously reserved drm/ttm: Don't add swapped BOs to swap-LRU list drm/amdgpu: only check for ECC on Vega10 drm/amd/powerplay: Fix smu_table_entry.handle type drm/ttm: add VADDR_FLAG_UPDATED_COUNT to correctly update dma_page global count drm: Fix PANEL_ORIENTATION_QUIRKS breaking the Kconfig DRM menuconfig drm/radeon: fill in rb backend map on evergreen/ni. drm/amdgpu/gfx9: fix ngg enablement to clear gds reserved memory (v2) drm/ttm: only free pages rather than update global memory count together drm/amdgpu: fix CPU based VM updates drm/amdgpu: fix typo in amdgpu_vce_validate_bo ...
2018-02-01Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "The core framework has a handful of patches this time around, mostly due to the clk rate protection support added by Jerome Brunet. This feature will allow consumers to lock in a certain rate on the output of a clk so that things like audio playback don't hear pops when the clk frequency changes due to shared parent clks changing rates. Currently the clk API doesn't guarantee the rate of a clk stays at the rate you request after clk_set_rate() is called, so this new API will allow drivers to express that requirement. Beyond this, the core got some debugfs pretty printing patches and a couple minor non-critical fixes. Looking outside of the core framework diff we have some new driver additions and the removal of a legacy TI clk driver. Both of these hit high in the dirstat. Also, the removal of the asm-generic/clkdev.h file causes small one-liners in all the architecture Kbuild files. Overall, the driver diff seems to be the normal stuff that comes all the time to fix little problems here and there and to support new hardware. Summary: Core: - Clk rate protection - Symbolic clk flags in debugfs output - Clk registration enabled clks while doing bookkeeping updates New Drivers: - Spreadtrum SC9860 - HiSilicon hi3660 stub - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS - Amlogic Meson-AXG - ASPEED BMC Removed Drivers: - TI OMAP 3xxx legacy clk (non-DT) support - asm*/clkdev.h got removed (not really a driver) Updates: - Renesas FDP1-0 module clock on R-Car M3-W - Renesas LVDS module clock on R-Car V3M - Misc fixes to pr_err() prints - Qualcomm MSM8916 audio fixes - Qualcomm IPQ8074 rounded out support for more peripherals - Qualcomm Alpha PLL variants - Divider code was using container_of() on bad pointers - Allwinner DE2 clks on H3 - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED - Mediatek clk driver compile test support - AT91 PMC clk suspend/resume restoration support - PLL issues fixed on si5351 - Broadcom IProc PLL calculation updates - DVFS support for Armada mvebu CPU clks - Allwinner fixed post-divider support - TI clkctrl fixes and support for newer SoCs" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits) clk: aspeed: Handle inverse polarity of USB port 1 clock gate clk: aspeed: Fix return value check in aspeed_cc_init() clk: aspeed: Add reset controller clk: aspeed: Register gated clocks clk: aspeed: Add platform driver and register PLLs clk: aspeed: Register core clocks clk: Add clock driver for ASPEED BMC SoCs clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built clk: fix reentrancy of clk_enable() on UP systems clk: meson-axg: fix potential NULL dereference in axg_clkc_probe() clk: Simplify debugfs registration clk: Fix debugfs_create_*() usage clk: Show symbolic clock flags in debugfs clk: renesas: r8a7796: Add FDP clock clk: Move __clk_{get,put}() into private clk.h API clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks clk: Improve flags doc for of_clk_detect_critical() arch: Remove clkdev.h asm-generic from Kbuild clk: sunxi-ng: a83t: Add M divider to TCON1 clock clk: Prepare to remove asm-generic/clkdev.h ...
2018-02-01Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "A number of new drivers get added this time, along with many low-priority bugfixes. The most interesting changes by subsystem are: bus drivers: - Updates to the Broadcom bus interface driver to support newer SoC types - The TI OMAP sysc driver now supports updated DT bindings memory controllers: - A new driver for Tegra186 gets added - A new driver for the ti-emif sram, to allow relocating suspend/resume handlers there SoC specific: - A new driver for Qualcomm QMI, the interface to the modem on MSM SoCs - A new driver for power domains on the actions S700 SoC - A driver for the Xilinx Zynq VCU logicoreIP reset controllers: - A new driver for Amlogic Meson-AGX - various bug fixes tee subsystem: - A new user interface got added to enable asynchronous communication with the TEE supplicant. - A new method of using user space memory for communication with the TEE is added" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (84 commits) of: platform: fix OF node refcount leak soc: fsl: guts: Add a NULL check for devm_kasprintf() bus: ti-sysc: Fix smartreflex sysc mask psci: add CPU_IDLE dependency soc: xilinx: Fix Kconfig alignment soc: xilinx: xlnx_vcu: Use bitwise & rather than logical && on clkoutdiv soc: xilinx: xlnx_vcu: Depends on HAS_IOMEM for xlnx_vcu soc: bcm: brcmstb: Be multi-platform compatible soc: brcmstb: biuctrl: exit without warning on non brcmstb platforms Revert "soc: brcmstb: Only register SoC device on STB platforms" bus: omap: add MODULE_LICENSE tags soc: brcmstb: Only register SoC device on STB platforms tee: shm: Potential NULL dereference calling tee_shm_register() soc: xilinx: xlnx_vcu: Add Xilinx ZYNQMP VCU logicoreIP init driver dt-bindings: soc: xilinx: Add DT bindings to xlnx_vcu driver soc: xilinx: Create folder structure for soc specific drivers of: platform: populate /firmware/ node from of_platform_default_populate_init() soc: samsung: Add SPDX license identifiers soc: qcom: smp2p: Use common error handling code in qcom_smp2p_probe() tee: shm: don't put_page on null shm->pages ...
2018-02-01Merge tag 'armsoc-soc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "These are mostly minor bugfixes, cleanup and many defconfig updates to support added drivers. In particular OMAP and PXA keep cleaning up the legacy code base, as usual. Nvidia adds some more SoC support code for Tegra 186. For the first time on years, we are actually adding a non-DT platform for the EP93xx based Liebherr controller BK3.1. It's a minor variation of the EP93xx reference design and in active use, while EP93xx apparently doesn't have enough new development to have any device tree support" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (73 commits) ARM: omap: hwmod: fix section mismatch warnings ARM: pxa/tosa-bt: add MODULE_LICENSE tag arm64: defconfig: enable CONFIG_ACPI_APEI_EINJ arm64: defconfig: enable EDAC GHES option arm64: defconfig: enable CONFIG_ACPI_APEI_MEMORY_FAILURE ARM: imx_v6_v7_defconfig: enable CONFIG_CPU_FREQ_STAT Wind down ARM/TANGO port ARM: davinci: constify gpio_led ARM: davinci: drop unneeded newline soc: Add SoC driver for Gemini ARM: SAMSUNG: Add SPDX license identifiers ARM: S5PV210: Add SPDX license identifiers ARM: S3C64XX: Add SPDX license identifiers ARM: S3C24XX: Add SPDX license identifiers ARM: EXYNOS: Add SPDX license identifiers ARM: imx: remove unused imx3 pm definitions ARM: imx: don't abort MMDC probe if power saving status doesn't match ARM: imx_v6_v7_defconfig: enable RTC_DRV_MXC_V2 ARM: imx_v6_v7_defconfig: Add missing config for DART-MX6 SoM ARM: davinci: Use PTR_ERR_OR_ZERO() ...
2018-02-01Merge tag 'armsoc-dt' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC device tree updates from Arnd Bergmann: "We get a moderate number of new machines this time, and only one new SoC variant (Actions S700): Actions: - S700 Soc and CubieBoard7 development board - Allo.com Sparky Single-board-computer Allwinner: - Orange Pi R1 development board - Libre Computer Board ALL-H3-CC H3 single-board computer ASpeed ast2x00: - Witherspoon: OpenPower Power9 server manufactured by IBM that uses the ASPEED ast2500 - Zaius: OpenPower Power9 server manufactured by Invatech that uses the ASPEED ast2500 - Q71L: Intel Xeon server manufactured by Qanta that uses the ASPEED ast2400 AT91: - Axentia Nattis/Natte digital signage - sama5d2 PTC-ek Evaluation board Freescale/NXP i.MX: - SolidRun Humminboard2 development board - Variscite DART-MX6 SoM and Carrier-board - Technologic TS-4600 and TS-7970 development board - Toradex Colibri iMX7D SoM board - v1.5 variant of Solidrun Cubox-i and Hummingboard Freescale/NXP Layerscape: - Moxa UC-8410A Series industrial computer Gemini: - D-Link DNS-313 NAS enclosure OMAP: - LogicPD OMAP35xx SOM-LV devkit - LogicPD OMAP35xx Torpedo devkit Renesas: - r8a77970 (V3M) Starter Kit board - r8a7795 (M3-W) Salvator-XS board We finally managed to get the dtc warnings under control, with no more build-time warnings for bad device tree files. This includes fixes for the majority of platforms, including nomadik, samsung, lpc32xx, STi, spear, mediatek, freescale, qcom, realview, keystone, omap, kirkwood, renesas, hisilicon, and broadcom. Files get rearranged on a few platforms, in particular the Marvell Armada 7K/8K device tree files are changed in preparation for future SoC support, based on more than two of the same chips in one package, and some boards get renamed for oxnas for consistency. Finally, many existing SoCs gain descriptions for additional on-chip devices that we can now support with kernel drivers: - Allwinner A83t (drm, ethernet, i2c, ...), H3/H5 (USB-OTG) - Amlogic AXG family (clk, pinctrl, pwm, ...), and others (vpu, hdmi) - Aspeed clk controller support - Freescale LS1088A, LS1021A device support - Gemini Ethernet, PCI, TVE, panel - Keystone gpio, qspi, more uarts - Mediatek cpufreq, regulator, clock, reset - Marvell thermal, cpufreq, nand - Renesas SMP, thermal, timer, PWM, sound, phy, ipmmu - Rockchip Mipi, GPU, display - Samsung Exynos5433 PMU, power domain, nfc - Spreadtrum: sc9860 clocks - Tegra TX2 PSDI, HDMI, I2C,SMMU, display, fuse, ..." * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (690 commits) arm64: dts: stratix10: fix SPI settings ARM: dts: socfpga: add i2c reset signals arm64: dts: stratix10: add USB ECC reset bit arm64: dts: stratix10: enable USB on the devkit ARM: dts: socfpga: disable over-current for Arria10 USB devkit ARM: dts: Nokia N9: add support for up/down keys in the dts ARM: dts: nomadik: add interrupt-parent for clcd ARM: dts: Add ethernet to a bunch of platforms ARM: dts: Add ethernet to the Gemini SoC ARM: dts: rename oxnas dts files ARM: dts: s5pv210: add interrupt-parent for ohci ARM: lpc3250: fix uda1380 gpio numbers ARM: dts: STi: Add gpio polarity for "hdmi,hpd-gpio" property ARM: dts: dra7: Reduce shut down temperature of non-cpu thermal zones ARM: dts: n900: Add aliases for lcd and tvout displays ARM: dts: Update ti-sysc data for existing users ARM: dts: Fix smartreflex compatible for omap3 shared mpu-iva instance arm64: dts: marvell: armada-80x0: Fix pinctrl compatible string arm: spear13xx: Fix spics gpio controller's warning arm: spear13xx: Fix dmas cells ...
2018-02-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add a console_msg_format command line option: The value "default" keeps the old "[time stamp] text\n" format. The value "syslog" allows to see the syslog-like "<log level>[timestamp] text" format. This feature was requested by people doing regression tests, for example, 0day robot. They want to have both filtered and full logs at hands. - Reduce the risk of softlockup: Pass the console owner in a busy loop. This is a new approach to the old problem. It was first proposed by Steven Rostedt on Kernel Summit 2017. It marks a context in which the console_lock owner calls console drivers and could not sleep. On the other side, printk() callers could detect this state and use a busy wait instead of a simple console_trylock(). Finally, the console_lock owner checks if there is a busy waiter at the end of the special context and eventually passes the console_lock to the waiter. The hand-off works surprisingly well and helps in many situations. Well, there is still a possibility of the softlockup, for example, when the flood of messages stops and the last owner still has too much to flush. There is increasing number of people having problems with printk-related softlockups. We might eventually need to get better solution. Anyway, this looks like a good start and promising direction. - Do not allow to schedule in console_unlock() called from printk(): This reverts an older controversial commit. The reschedule helped to avoid softlockups. But it also slowed down the console output. This patch is obsoleted by the new console waiter logic described above. In fact, the reschedule made the hand-off less effective. - Deprecate "%pf" and "%pF" format specifier: It was needed on ia64, ppc64 and parisc64 to dereference function descriptors and show the real function address. It is done transparently by "%ps" and "pS" format specifier now. Sergey Senozhatsky found that all the function descriptors were in a special elf section and could be easily detected. - Remove printk_symbol() API: It has been obsoleted by "%pS" format specifier, and this change helped to remove few continuous lines and a less intuitive old API. - Remove redundant memsets: Sergey removed unnecessary memset when processing printk.devkmsg command line option. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits) printk: drop redundant devkmsg_log_str memsets printk: Never set console_may_schedule in console_trylock() printk: Hide console waiter logic into helpers printk: Add console owner and waiter logic to load balance console writes kallsyms: remove print_symbol() function checkpatch: add pF/pf deprecation warning symbol lookup: introduce dereference_symbol_descriptor() parisc64: Add .opd based function descriptor dereference powerpc64: Add .opd based function descriptor dereference ia64: Add .opd based function descriptor dereference sections: split dereference_function_descriptor() openrisc: Fix conflicting types for _exext and _stext lib: do not use print_symbol() irq debug: do not use print_symbol() sysfs: do not use print_symbol() drivers: do not use print_symbol() x86: do not use print_symbol() unicore32: do not use print_symbol() sh: do not use print_symbol() mn10300: do not use print_symbol() ...
2018-02-01Merge tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Mask INTx from user if pdev->irq is zero (Alexey Kardashevskiy) - Capability helper cleanup (Alex Williamson) - Allow mmaps overlapping MSI-X vector table with region capability exposing this feature (Alexey Kardashevskiy) - mdev static cleanups (Xiongwei Song) * tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio: vfio: mdev: make a couple of functions and structure vfio_mdev_driver static vfio-pci: Allow mapping MSIX BAR vfio: Simplify capability helper vfio-pci: Mask INTx if a device is not capabable of enabling it
2018-02-01Merge branch 'KASAN-read_word_at_a_time'Linus Torvalds
Merge KASAN word-at-a-time fixups from Andrey Ryabinin. The word-at-a-time optimizations have caused headaches for KASAN, since the whole point is that we access byte streams in bigger chunks, and KASAN can be unhappy about the potential extra access at the end of the string. We used to have a horrible hack in dcache, and then people got complaints from the strscpy() case. This fixes it all up properly, by adding an explicit helper for the "access byte stream one word at a time" case. * emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>: fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports" fs/dcache: Use read_word_at_a_time() in dentry_string_cmp() lib/strscpy: Shut up KASAN false-positives in strscpy() compiler.h: Add read_word_at_a_time() function. compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
2018-02-01compiler.h: Add read_word_at_a_time() function.Andrey Ryabinin
Sometimes we know that it's safe to do potentially out-of-bounds access because we know it won't cross a page boundary. Still, KASAN will report this as a bug. Add read_word_at_a_time() function which is supposed to be used in such cases. In read_word_at_a_time() KASAN performs relaxed check - only the first byte of access is validated. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()Andrey Ryabinin
Instead of having two identical __read_once_size_nocheck() functions with different attributes, consolidate all the difference in new macro __no_kasan_or_inline and use it. No functional changes. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01Merge tag 'devicetree-for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - Convert to use memblock_virt_alloc in DT code which supports bootmem arches. With this we can remove the arch specific early_init_dt_alloc_memory_arch() functions. - Enable running the DT unittests on UML - Use SPDX license tags on DT files - Fix early FDT kconfig ifdef logic - Clean-up unittest Makefile - Fix function comment for of_irq_parse_raw - Add missing documentation for linux,initrd-{start,end} properties - Clean-up of binding examples using uppercase hex - Add trivial devices W83773G and Infineon TLV493D-A1B6 - Add missing STM32 SoC bindings - Various small binding doc fixes * tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits) xtensa: remove arch specific early DT functions x86: remove arch specific early_init_dt_alloc_memory_arch nios2: remove arch specific early_init_dt_alloc_memory_arch mips: remove arch specific early_init_dt_alloc_memory_arch metag: remove arch specific early DT functions cris: remove arch specific early DT functions libfdt: remove unnecessary include directive from <linux/libfdt.h> of: unittest: refactor Makefile of/fdt: use memblock_virt_alloc for early alloc of: Use SPDX license tag for DT files of/fdt: Fix #ifdef dependency of early flattree declarations dt-bindings: h8300 clocksource: correct spelling of pulse dt-bindings: imx6q-pcie: Add required property for i.MX6SX mmc: Don't reference Linux-specific OF_GPIO_ACTIVE_LOW flag in DT binding dt-bindings: Use lower case hex in unit-addresses dt-bindings: display: panel: Fix compatible string for Toshiba LT089AC29000 dt-bindings: Add Infineon TLV493D-A1B6 dt-bindings: mailbox: ti,message-manager: Fix interrupt name error dt-bindings: chosen: Document linux,initrd-{start,end} dt-bindings: arm: document supported STM32 SoC family ...
2018-02-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer updates from Dmitry Torokhov: - evdev interface has been adjusted to extend the life of timestamps on 32 bit systems to the year of 2108 - Synaptics RMI4 driver's PS/2 guest handling ha beed updated to improve chances of detecting trackpoints on the pass-through port - mms114 touchcsreen controller driver has been updated to support generic device properties and work with mms152 cntrollers - Goodix driver now supports generic touchscreen properties - couple of drivers for AVR32 architecture are gone as the architecture support has been removed from the kernel - gpio-tilt driver has been removed as there are no mainline users and the driver itself is using legacy APIs and relies on platform data - MODULE_LINECSE/MODULE_VERSION cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits) Input: goodix - use generic touchscreen_properties Input: mms114 - fix typo in definition Input: mms114 - use BIT() macro instead of explicit shifting Input: mms114 - replace mdelay with msleep Input: mms114 - add support for mms152 Input: mms114 - drop platform data and use generic APIs Input: mms114 - mark as direct input device Input: mms114 - do not clobber interrupt trigger Input: edt-ft5x06 - fix error handling for factory mode on non-M06 Input: stmfts - set IRQ_NOAUTOEN to the irq flag Input: auo-pixcir-ts - delete an unnecessary return statement Input: auo-pixcir-ts - remove custom log for a failed memory allocation Input: da9052_tsi - remove unused mutex Input: docs - use PROPERTY_ENTRY_U32() directly Input: synaptics-rmi4 - log when we create a guest serio port Input: synaptics-rmi4 - unmask F03 interrupts when port is opened Input: synaptics-rmi4 - do not delete interrupt memory too early Input: ad7877 - use managed resource allocations Input: stmfts,s6sy671 - add SPDX identifier Input: remove atmel-wm97xx touchscreen driver ...
2018-02-01Merge tag 'char-misc-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big pull request for char/misc drivers for 4.16-rc1. There's a lot of stuff in here. Three new driver subsystems were added for various types of hardware busses: - siox - slimbus - soundwire as well as a new vboxguest subsystem for the VirtualBox hypervisor drivers. There's also big updates from the FPGA subsystem, lots of Android binder fixes, the usual handful of hyper-v updates, and lots of other smaller driver updates. All of these have been in linux-next for a long time, with no reported issues" * tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits) char: lp: use true or false for boolean values android: binder: use VM_ALLOC to get vm area android: binder: Use true and false for boolean values lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN EISA: Delete error message for a failed memory allocation in eisa_probe() EISA: Whitespace cleanup misc: remove AVR32 dependencies virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES soundwire: Fix a signedness bug uio_hv_generic: fix new type mismatch warnings uio_hv_generic: fix type mismatch warnings auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE uio_hv_generic: add rescind support uio_hv_generic: check that host supports monitor page uio_hv_generic: create send and receive buffers uio: document uio_hv_generic regions doc: fix documentation about uio_hv_generic vmbus: add monitor_id and subchannel_id to sysfs per channel vmbus: fix ABI documentation uio_hv_generic: use ISR callback method ...
2018-02-01Merge tag 'driver-core-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of "big" driver core patches for 4.16-rc1. The majority of the work here is in the firmware subsystem, with reworks to try to attempt to make the code easier to handle in the long run, but no functional change. There's also some tree-wide sysfs attribute fixups with lots of acks from the various subsystem maintainers, as well as a handful of other normal fixes and changes. And finally, some license cleanups for the driver core and sysfs code. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits) device property: Define type of PROPERTY_ENRTY_*() macros device property: Reuse property_entry_free_data() device property: Move property_entry_free_data() upper firmware: Fix up docs referring to FIRMWARE_IN_KERNEL firmware: Drop FIRMWARE_IN_KERNEL Kconfig option USB: serial: keyspan: Drop firmware Kconfig options sysfs: remove DEBUG defines sysfs: use SPDX identifiers drivers: base: add coredump driver ops sysfs: add attribute specification for /sysfs/devices/.../coredump test_firmware: fix missing unlock on error in config_num_requests_store() test_firmware: make local symbol test_fw_config static sysfs: turn WARN() into pr_warn() firmware: Fix a typo in fallback-mechanisms.rst treewide: Use DEVICE_ATTR_WO treewide: Use DEVICE_ATTR_RO treewide: Use DEVICE_ATTR_RW sysfs.h: Use octal permissions component: add debugfs support bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate ...
2018-02-01Merge tag 'staging-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big Staging and IIO driver patches for 4.16-rc1. There is the normal amount of new IIO drivers added, like all releases. The networking IPX and the ncpfs filesystem are moved into the staging tree, as they are on their way out of the kernel due to lack of use anymore. The visorbus subsystem finall has started moving out of the staging tree to the "real" part of the kernel, and the most and fsl-mc codebases are almost ready to move out, that will probably happen for 4.17-rc1 if all goes well. Other than that, there is a bunch of license header cleanups in the tree, along with the normal amount of coding style churn that we all know and love for this codebase. I also got frustrated at the Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting huge chunks of it that were never even being used. Full details of everything is in the shortlog. All of these patches have been in linux-next for a while with no reported issues" * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits) staging: rtlwifi: remove redundant initialization of 'cfg_cmd' staging: rtl8723bs: remove a couple of redundant initializations staging: comedi: reformat lines to 80 chars or less staging: lustre: separate a connection destroy from free struct kib_conn Staging: rtl8723bs: Use !x instead of NULL comparison Staging: rtl8723bs: Remove dead code Staging: rtl8723bs: Change names to conform to the kernel code staging: ccree: Fix missing blank line after declaration staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd' staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST staging: fbtft: remove unused FB_TFT_SSD1325 kconfig staging: comedi: dt2811: remove redundant initialization of 'ns' staging: wilc1000: fix alignments to match open parenthesis staging: wilc1000: removed unnecessary defined enums typedef staging: wilc1000: remove unnecessary use of parentheses staging: rtl8192u: remove redundant initialization of 'timeout' staging: sm750fb: fix CamelCase for dispSet var staging: lustre: lnet/selftest: fix compile error on UP build staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons staging: rts5208: Fix "seg_no" calculation in reset_ms_card() ...
2018-02-01Merge tag 'tty-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/staging driver updates from Greg KH: "Here is the big tty/serial driver update for 4.16-rc1. The usual number of various serial driver fixes and updates to try to get them to work with crazy hardware configurations (seriously, how many different ways are hardware engineers going to come up with to hook up a simple UART?) There is also some serdev bugfixes and updates, as well as a smattering of other small fixes in here. All have been in the linux-next tree for a while, with no reported issues" * tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits) tty: serial: exar: Relocate sleep wake-up handling tty: fix data race between tty_init_dev and flush of buf serial: imx: fix endless loop during suspend serial: core: mark port as initialized after successful IRQ change serdev: only match serdev devices serdev: do not generate modaliases for controllers serial: mxs-auart: don't use GPIOF_* with gpiod_get_direction serial: 8250_dw: Revert "Improve clock rate setting" MAINTAINERS: Add myself as designated reviewer for 8250_dw gpio: serial: max310x: Support open-drain configuration for GPIOs serdev: Fix serdev_uevent failure on ACPI enumerated serdev-controllers serial: 8250_ingenic: Parse earlycon options serial: 8250_ingenic: Add support for the JZ4770 SoC serial: core: Make uart_parse_options take const char* argument serial: 8250_of: fix return code when probe function fails to get reset serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS serial: 8250_uniphier: fix error return code in uniphier_uart_probe() tty: n_gsm: Allow ADM response in addition to UA for control dlci tty: omap-serial: Fix initial on-boot RTS GPIO level tty: serial: jsm: Add one check against NULL pointer dereference ...
2018-02-01Merge tag 'usb-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY updates from Greg KH: "Here is the big USB and PHY driver update for 4.16-rc1. Along with the normally expected XHCI, MUSB, and Gadget driver patches, there are some PHY driver fixes, license cleanups, sysfs attribute cleanups, usbip changes, and a raft of other smaller fixes and additions. Full details are in the shortlog. All of these have been in the linux-next tree for a long time with no reported issues" * tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (137 commits) USB: serial: pl2303: new device id for Chilitag USB: misc: fix up some remaining DEVICE_ATTR() usages USB: musb: fix up one odd DEVICE_ATTR() usage USB: atm: fix up some remaining DEVICE_ATTR() usage USB: move many drivers to use DEVICE_ATTR_WO USB: move many drivers to use DEVICE_ATTR_RO USB: move many drivers to use DEVICE_ATTR_RW USB: misc: chaoskey: Use true and false for boolean values USB: storage: remove old wording about how to submit a change USB: storage: remove invalid URL from drivers usb: ehci-omap: don't complain on -EPROBE_DEFER when no PHY found usbip: list: don't list devices attached to vhci_hcd usbip: prevent bind loops on devices attached to vhci_hcd USB: serial: remove redundant initializations of 'mos_parport' usb/gadget: Fix "high bandwidth" check in usb_gadget_ep_match_desc() usb: gadget: compress return logic into one line usbip: vhci_hcd: update 'status' file header and format USB: serial: simple: add Motorola Tetra driver CDC-ACM: apply quirk for card reader usb: option: Add support for FS040U modem ...
2018-01-31Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-01-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - misc fixes - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) mm: remove PG_highmem description tools, vm: new option to specify kpageflags file mm/swap.c: make functions and their kernel-doc agree mm, memory_hotplug: fix memmap initialization mm: correct comments regarding do_fault_around() mm: numa: do not trap faults on shared data section pages. hugetlb, mbind: fall back to default policy if vma is NULL hugetlb, mempolicy: fix the mbind hugetlb migration mm, hugetlb: further simplify hugetlb allocation API mm, hugetlb: get rid of surplus page accounting tricks mm, hugetlb: do not rely on overcommit limit during migration mm, hugetlb: integrate giga hugetlb more naturally to the allocation path mm, hugetlb: unify core page allocation accounting and initialization mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes mm/memcontrol.c: make local symbol static mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd() include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer mm/compaction.c: fix comment for try_to_compact_pages() mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it zsmalloc: use U suffix for negative literals being shifted ...
2018-01-31mm: remove PG_highmem descriptionMiles Chen
Commit cbe37d093707 ("[PATCH] mm: remove PG_highmem") removed PG_highmem to save a page flag. So the description of PG_highmem is no longer needed. Link: http://lkml.kernel.org/r/1517391212-2950-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb, mbind: fall back to default policy if vma is NULLMichal Hocko
Dan Carpenter has noticed that mbind migration callback (new_page) can get a NULL vma pointer and choke on it inside alloc_huge_page_vma which relies on the VMA to get the hstate. We used to BUG_ON this case but the BUG_+ON has been removed recently by "hugetlb, mempolicy: fix the mbind hugetlb migration". The proper way to handle this is to get the hstate from the migrated page and rely on huge_node (resp. get_vma_policy) do the right thing with null VMA. We are currently falling back to the default mempolicy in that case which is in line what THP path is doing here. Link: http://lkml.kernel.org/r/20180110104712.GR1732@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb, mempolicy: fix the mbind hugetlb migrationMichal Hocko
do_mbind migration code relies on alloc_huge_page_noerr for hugetlb pages. alloc_huge_page_noerr uses alloc_huge_page which is a highlevel allocation function which has to take care of reserves, overcommit or hugetlb cgroup accounting. None of that is really required for the page migration because the new page is only temporal and either will replace the original page or it will be dropped. This is essentially as for other migration call paths and there shouldn't be any reason to handle mbind in a special way. The current implementation is even suboptimal because the migration might fail just because the hugetlb cgroup limit is reached, or the overcommit is saturated. Fix this by making mbind like other hugetlb migration paths. Add a new migration helper alloc_huge_page_vma as a wrapper around alloc_huge_page_nodemask with additional mempolicy handling. alloc_huge_page_noerr has no more users and it can go. Link: http://lkml.kernel.org/r/20180103093213.26329-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm, hugetlb: do not rely on overcommit limit during migrationMichal Hocko
hugepage migration relies on __alloc_buddy_huge_page to get a new page. This has 2 main disadvantages. 1) it doesn't allow to migrate any huge page if the pool is used completely which is not an exceptional case as the pool is static and unused memory is just wasted. 2) it leads to a weird semantic when migration between two numa nodes might increase the pool size of the destination NUMA node while the page is in use. The issue is caused by per NUMA node surplus pages tracking (see free_huge_page). Address both issues by changing the way how we allocate and account pages allocated for migration. Those should temporal by definition. So we mark them that way (we will abuse page flags in the 3rd page) and update free_huge_page to free such pages to the page allocator. Page migration path then just transfers the temporal status from the new page to the old one which will be freed on the last reference. The global surplus count will never change during this path but we still have to be careful when migrating a per-node suprlus page. This is now handled in move_hugetlb_state which is called from the migration path and it copies the hugetlb specific page state and fixes up the accounting when needed Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better reflect its purpose. The new allocation routine for the migration path is __alloc_migrate_huge_page. The user visible effect of this patch is that migrated pages are really temporal and they travel between NUMA nodes as per the migration request: Before migration /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0 After /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0 with the previous implementation, both nodes would have nr_hugepages:1 until the page is freed. Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM ↵Petr Tesarik
mem_map pointer The comment is confusing. On the one hand, it refers to 32-bit alignment (struct page alignment on 32-bit platforms), but this would only guarantee that the 2 lowest bits must be zero. On the other hand, it claims that at least 3 bits are available, and 3 bits are actually used. This is not broken, because there is a stronger alignment guarantee, just less obvious. Let's fix the comment to make it clear how many bits are available and why. Although memmap arrays are allocated in various places, the resulting pointer is encoded eventually, so I am adding a BUG_ON() here to enforce at runtime that all expected bits are indeed available. I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is sufficient, because this part of the calculation can be easily checked at build time. [ptesarik@suse.com: v2] Link: http://lkml.kernel.org/r/20180125100516.589ea6af@ezekiel.suse.cz Link: http://lkml.kernel.org/r/20180119080908.3a662e6f@ezekiel.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemi Wang <kemi.wang@intel.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31zswap: only save zswap header when necessaryYu Zhao
We waste sizeof(swp_entry_t) for zswap header when using zsmalloc as zpool driver because zsmalloc doesn't support eviction. Add zpool_evictable() to detect if zpool is potentially evictable, and use it in zswap to avoid waste memory for zswap header. [yuzhao@google.com: The zpool->" prefix is a result of copy & paste] Link: http://lkml.kernel.org/r/20180110225626.110330-1-yuzhao@google.com Link: http://lkml.kernel.org/r/20180110224741.83751-1-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb: implement memfd sealingMarc-André Lureau
Implements memfd sealing, similar to shmem: - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in memfd_add_seals(). write() doesn't exist for hugetlbfs. - SHRINK: added similar check as shmem_setattr() - GROW: added similar check as shmem_setattr() & shmem_fallocate() Except write() operation that doesn't exist with hugetlbfs, that should make sealing as close as it can be to shmem support. Link: http://lkml.kernel.org/r/20171107122800.25517-5-marcandre.lureau@redhat.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31hugetlb: expose hugetlbfs_inode_info in headerMarc-André Lureau
hugetlbfs inode information will need to be accessed by code in mm/shmem.c for file sealing operations. Move inode information definition from .c file to header for needed access. Link: http://lkml.kernel.org/r/20171107122800.25517-4-marcandre.lureau@redhat.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31shmem: rename functions that are memfd-relatedMarc-André Lureau
Those functions are called for memfd files, backed by shmem or hugetlb (the next patches will handle hugetlb). Link: http://lkml.kernel.org/r/20171107122800.25517-3-marcandre.lureau@redhat.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31shmem: unexport shmem_add_seals()/shmem_get_seals()Marc-André Lureau
Patch series "memfd: add sealing to hugetlb-backed memory", v3. Recently, Mike Kravetz added hugetlbfs support to memfd. However, he didn't add sealing support. One of the reasons to use memfd is to have shared memory sealing when doing IPC or sharing memory with another process with some extra safety. qemu uses shared memory & hugetables with vhost-user (used by dpdk), so it is reasonable to use memfd now instead for convenience and security reasons. This patch (of 9): The functions are called through shmem_fcntl() only. And no danger in removing the EXPORTs as the routines only work with shmem file structs. Link: http://lkml.kernel.org/r/20171107122800.25517-2-marcandre.lureau@redhat.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: remove reference to PG_buddyMatthew Wilcox
PG_buddy doesn't exist any more. It's called PageBuddy now. Link: http://lkml.kernel.org/r/20171220155552.15884-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: document how to use struct pageMatthew Wilcox
Be really explicit about what bits / bytes are reserved for users that want to store extra information about the pages they allocate. Link: http://lkml.kernel.org/r/20171220155552.15884-8-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: store compound_dtor / compound_order as bytesMatthew Wilcox
Neither of these values get even close to 256; compound_dtor is currently at a maximum of 3, and compound_order can't be over 64. No machine has inefficient access to bytes since EV5, and while those are still supported, we don't optimise for them any more. This does not shrink struct page, but it removes an ifdef and frees up 2-6 bytes for future use. diff of pahole output: struct callback_head callback_head; /* 32 16 */ struct { long unsigned int compound_head; /* 32 8 */ - unsigned int compound_dtor; /* 40 4 */ - unsigned int compound_order; /* 44 4 */ + unsigned char compound_dtor; /* 40 1 */ + unsigned char compound_order; /* 41 1 */ }; /* 32 16 */ }; /* 32 16 */ union { [mawilcox@microsoft.com: add comment] Link: http://lkml.kernel.org/r/20171221000144.GB2980@bombadil.infradead.org Link: http://lkml.kernel.org/r/20171220155552.15884-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: introduce _slub_counter_tMatthew Wilcox
Instead of putting the ifdef in the middle of the definition of struct page, pull it forward to the rest of the ifdeffery around the SLUB cmpxchg_double optimisation. Link: http://lkml.kernel.org/r/20171220155552.15884-6-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: improve comment on page->mappingMatthew Wilcox
The comment on page->mapping is terse, and out of date (it does not mention the possibility of PAGE_MAPPING_MOVABLE). Instead, point the interested reader to page-flags.h where there is a much better comment. Link: http://lkml.kernel.org/r/20171220155552.15884-5-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: remove misleading alignment claimsMatthew Wilcox
The "third double word block" isn't on 32-bit systems. The layout looks like this: unsigned long flags; struct address_space *mapping pgoff_t index; atomic_t _mapcount; atomic_t _refcount; which is 32 bytes on 64-bit, but 20 bytes on 32-bit. Nobody is trying to use the fact that it's double-word aligned today, so just remove the misleading claims. Link: http://lkml.kernel.org/r/20171220155552.15884-4-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: de-indent struct pageMatthew Wilcox
I found the struct { union { struct { union { struct { } } } } } layout rather confusing. Fortunately, there is an easier way to write this. The innermost union is of four things which are the size of an int, so the ones which are used by slab/slob/slub can be pulled up two levels to be in the outermost union with 'counters'. That leaves us with struct { union { struct { atomic_t; atomic_t; } } } which has the same layout, but is easier to read. Output from the current git version of pahole, diffed with -uw to ignore the whitespace changes from the indentation: }; /* 16 8 */ union { long unsigned int counters; /* 24 8 */ - struct { - union { - atomic_t _mapcount; /* 24 4 */ unsigned int active; /* 24 4 */ struct { unsigned int inuse:16; /* 24:16 4 */ @@ -21,7 +18,8 @@ unsigned int frozen:1; /* 24: 0 4 */ }; /* 24 4 */ int units; /* 24 4 */ - }; /* 24 4 */ + struct { + atomic_t _mapcount; /* 24 4 */ atomic_t _refcount; /* 28 4 */ }; /* 24 8 */ }; /* 24 8 */ Link: http://lkml.kernel.org/r/20171220155552.15884-3-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: align struct page more aestheticallyMatthew Wilcox
Patch series "Restructure struct page", v2. This series does not attempt any grand restructuring. Instead, it cures the worst of the indentitis, fixes the documentation and reduces the ifdeffery. The only layout change is compound_dtor and compound_order are each reduced to one byte. This patch (of 8): Instead of an ifdef block at the end of the struct, which needed its own comment, define _struct_page_alignment up at the top where it fits nicely with the existing comment. Link: http://lkml.kernel.org/r/20171220155552.15884-2-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacksDavid Rientjes
Commit 4d4bbd8526a8 ("mm, oom_reaper: skip mm structs with mmu notifiers") prevented the oom reaper from unmapping private anonymous memory with the oom reaper when the oom victim mm had mmu notifiers registered. The rationale is that doing mmu_notifier_invalidate_range_{start,end}() around the unmap_page_range(), which is needed, can block and the oom killer will stall forever waiting for the victim to exit, which may not be possible without reaping. That concern is real, but only true for mmu notifiers that have blockable invalidate_range_{start,end}() callbacks. This patch adds a "flags" field to mmu notifier ops that can set a bit to indicate that these callbacks do not block. The implementation is steered toward an expensive slowpath, such as after the oom reaper has grabbed mm->mmap_sem of a still alive oom victim. [rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com [akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Joerg Roedel <joro@8bytes.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm/thp: remove pmd_huge_split_prepare()Aneesh Kumar K.V
Instead of marking the pmd ready for split, invalidate the pmd. This should take care of powerpc requirement. Only side effect is that we mark the pmd invalid early. This can result in us blocking access to the page a bit longer if we race against a thp split. [kirill.shutemov@linux.intel.com: rebased, dirty THP once] Link: http://lkml.kernel.org/r/20171213105756.69879-13-kirill.shutemov@linux.intel.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: do not lose dirty and accessed bits in pmdp_invalidate()Kirill A. Shutemov
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose dirty and access bits if CPU sets them after pmdp dereference, but before set_pmd_at(). The patch change pmdp_invalidate() to make the entry non-present atomically and return previous value of the entry. This value can be used to check if CPU set dirty/accessed bits under us. The race window is very small and I haven't seen any reports that can be attributed to the bug. For this reason, I don't think backporting to stable trees needed. Link: http://lkml.kernel.org/r/20171213105756.69879-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31asm-generic: provide generic_pmdp_establish()Kirill A. Shutemov
Patch series "Do not lose dirty bit on THP pages", v4. Vlastimil noted that pmdp_invalidate() is not atomic and we can lose dirty and access bits if CPU sets them after pmdp dereference, but before set_pmd_at(). The bug can lead to data loss, but the race window is tiny and I haven't seen any reports that suggested that it happens in reality. So I don't think it worth sending it to stable. Unfortunately, there's no way to address the issue in a generic way. We need to fix all architectures that support THP one-by-one. All architectures that have THP supported have to provide atomic pmdp_invalidate() that returns previous value. If generic implementation of pmdp_invalidate() is used, architecture needs to provide atomic pmdp_estabish(). pmdp_estabish() is not used out-side generic implementation of pmdp_invalidate() so far, but I think this can change in the future. This patch (of 12): This is an implementation of pmdp_establish() that is only suitable for an architecture that doesn't have hardware dirty/accessed bits. In this case we can't race with CPU which sets these bits and non-atomic approach is fine. Link: http://lkml.kernel.org/r/20171213105756.69879-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: get 7% more pages in a pagevecMatthew Wilcox
We don't have to use an entire 'long' for the number of elements in the pagevec; we know it's a number between 0 and 14 (now 15). So we can store it in a char, and then the bool packs next to it and we still have two or six bytes of padding for more elements in the header. That gives us space to cram in an extra page. Link: http://lkml.kernel.org/r/20171206022521.GM26021@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: add unmap_mapping_pages()Matthew Wilcox
Several users of unmap_mapping_range() would prefer to express their range in pages rather than bytes. Unfortuately, on a 32-bit kernel, you have to remember to cast your page number to a 64-bit type before shifting it, and four places in the current tree didn't remember to do that. That's a sign of a bad interface. Conveniently, unmap_mapping_range() actually converts from bytes into pages, so hoist the guts of unmap_mapping_range() into a new function unmap_mapping_pages() and convert the callers which want to use pages. Link: http://lkml.kernel.org/r/20171206142627.GD32044@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: "zhangyi (F)" <yi.zhang@huawei.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm, hugetlb: remove hugepages_treat_as_movable sysctlMichal Hocko
hugepages_treat_as_movable has been introduced by 396faf0303d2 ("Allow huge page allocations to use GFP_HIGH_MOVABLE") to allow hugetlb allocations from ZONE_MOVABLE even when hugetlb pages were not migrateable. The purpose of the movable zone was different at the time. It aimed at reducing memory fragmentation and hugetlb pages being long lived and large werre not contributing to the fragmentation so it was acceptable to use the zone back then. Things have changed though and the primary purpose of the zone became migratability guarantee. If we allow non migrateable hugetlb pages to be in ZONE_MOVABLE memory hotplug might fail to offline the memory. Remove the knob and only rely on hugepage_migration_supported to allow movable zones. Mel said: : Primarily it was aimed at allowing the hugetlb pool to safely shrink with : the ability to grow it again. The use case was for batched jobs, some of : which needed huge pages and others that did not but didn't want the memory : useless pinned in the huge pages pool. : : I suspect that more users rely on THP than hugetlbfs for flexible use of : huge pages with fallback options so I think that removing the option : should be ok. Link: http://lkml.kernel.org/r/20171003072619.8654-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Alexandru Moise <00moses.alexander00@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: remove unused pgdat_reclaimable_pages()Jan Kara
Remove unused function pgdat_reclaimable_pages() and node_page_state_snapshot() which becomes unused as well. Link: http://lkml.kernel.org/r/20171122094416.26019-1-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: memcontrol: fix excessive complexity in memory.stat reportingJohannes Weiner
We've seen memory.stat reads in top-level cgroups take up to fourteen seconds during a userspace bug that created tens of thousands of ghost cgroups pinned by lingering page cache. Even with a more reasonable number of cgroups, aggregating memory.stat is unnecessarily heavy. The complexity is this: nr_cgroups * nr_stat_items * nr_possible_cpus where the stat items are ~70 at this point. With 128 cgroups and 128 CPUs - decent, not enormous setups - reading the top-level memory.stat has to aggregate over a million per-cpu counters. This doesn't scale. Instead of spreading the source of truth across all CPUs, use the per-cpu counters merely to batch updates to shared atomic counters. This is the same as the per-cpu stocks we use for charging memory to the shared atomic page_counters, and also the way the global vmstat counters are implemented. Vmstat has elaborate spilling thresholds that depend on the number of CPUs, amount of memory, and memory pressure - carefully balancing the cost of counter updates with the amount of per-cpu error. That's because the vmstat counters are system-wide, but also used for decisions inside the kernel (e.g. NR_FREE_PAGES in the allocator). Neither is true for the memory controller. Use the same static batch size we already use for page_counter updates during charging. The per-cpu error in the stats will be 128k, which is an acceptable ratio of cores to memory accounting granularity. [hannes@cmpxchg.org: fix warning in __this_cpu_xchg() calls] Link: http://lkml.kernel.org/r/20171201135750.GB8097@cmpxchg.org Link: http://lkml.kernel.org/r/20171103153336.24044-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: memcontrol: implement lruvec stat functions on top of each otherJohannes Weiner
The implementation of the lruvec stat functions and their variants for accounting through a page, or accounting from a preemptible context, are mostly identical and needlessly repetitive. Implement the lruvec_page functions by looking up the page's lruvec and then using the lruvec function. Implement the functions for preemptible contexts by disabling preemption before calling the atomic context functions. Link: http://lkml.kernel.org/r/20171103153336.24044-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: memcontrol: eliminate raw access to stat and event countersJohannes Weiner
Replace all raw 'this_cpu_' modifications of the stat and event per-cpu counters with API functions such as mod_memcg_state(). This makes the code easier to read, but is also in preparation for the next patch, which changes the per-cpu implementation of those counters. Link: http://lkml.kernel.org/r/20171103153336.24044-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: use sc->priority for slab shrink targetsJosef Bacik
Previously we were using the ratio of the number of lru pages scanned to the number of eligible lru pages to determine the number of slab objects to scan. The problem with this is that these two things have nothing to do with each other, so in slab heavy work loads where there is little to no page cache we can end up with the pages scanned being a very low number. This means that we reclaim next to no slab pages and waste a lot of time reclaiming small amounts of space. Consider the following scenario, where we have the following values and the rest of the memory usage is in slab Active: 58840 kB Inactive: 46860 kB Every time we do a get_scan_count() we do this scan = size >> sc->priority where sc->priority starts at DEF_PRIORITY, which is 12. The first loop through reclaim would result in a scan target of 2 pages to 11715 total inactive pages, and 3 pages to 14710 total active pages. This is a really really small target for a system that is entirely slab pages. And this is super optimistic, this assumes we even get to scan these pages. We don't increment sc->nr_scanned unless we 1) isolate the page, which assumes it's not in use, and 2) can lock the page. Under pressure these numbers could probably go down, I'm sure there's some random pages from daemons that aren't actually in use, so the targets get even smaller. Instead use sc->priority in the same way we use it to determine scan amounts for the lru's. This generally equates to pages. Consider the following slab_pages = (nr_objects * object_size) / PAGE_SIZE What we would like to do is scan = slab_pages >> sc->priority but we don't know the number of slab pages each shrinker controls, only the objects. However say that theoretically we knew how many pages a shrinker controlled, we'd still have to convert this to objects, which would look like the following scan = shrinker_pages >> sc->priority scan_objects = (PAGE_SIZE / object_size) * scan or written another way scan_objects = (shrinker_pages >> sc->priority) * (PAGE_SIZE / object_size) which can thus be written scan_objects = ((shrinker_pages * PAGE_SIZE) / object_size) >> sc->priority which is just scan_objects = nr_objects >> sc->priority We don't need to know exactly how many pages each shrinker represents, it's objects are all the information we need. Making this change allows us to place an appropriate amount of pressure on the shrinker pools for their relative size. Link: http://lkml.kernel.org/r/1510780549-6812-1-git-send-email-josef@toxicpanda.com Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Dave Chinner <david@fromorbit.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31mm: drop hotplug lock from lru_add_drain_all()Michal Hocko
Pulling cpu hotplug locks inside the mm core function like lru_add_drain_all just asks for problems and the recent lockdep splat [1] just proves this. While the usage in that particular case might be wrong we should avoid the locking as lru_add_drain_all() is used in many places. It seems that this is not all that hard to achieve actually. We have done the same thing for drain_all_pages which is analogous by commit a459eeb7b852 ("mm, page_alloc: do not depend on cpu hotplug locks inside the allocator"). All we have to care about is to handle - the work item might be executed on a different cpu in worker from unbound pool so it doesn't run on pinned on the cpu - we have to make sure that we do not race with page_alloc_cpu_dead calling lru_add_drain_cpu the first part is already handled because the worker calls lru_add_drain which disables preemption when calling lru_add_drain_cpu on the local cpu it is draining. The later is true because page_alloc_cpu_dead is called on the controlling CPU after the hotplugged CPU vanished completely. [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com [add a cpu hotplug locking interaction as per tglx] Link: http://lkml.kernel.org/r/20171116120535.23765-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>