summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-03-07Merge branch 'next-general' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: - Extend LSM stacking to allow sharing of cred, file, ipc, inode, and task blobs. This paves the way for more full-featured LSMs to be merged, and is specifically aimed at LandLock and SARA LSMs. This work is from Casey and Kees. - There's a new LSM from Micah Morton: "SafeSetID gates the setid family of syscalls to restrict UID/GID transitions from a given UID/GID to only those approved by a system-wide whitelist." This feature is currently shipping in ChromeOS. * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (62 commits) keys: fix missing __user in KEYCTL_PKEY_QUERY LSM: Update list of SECURITYFS users in Kconfig LSM: Ignore "security=" when "lsm=" is specified LSM: Update function documentation for cap_capable security: mark expected switch fall-throughs and add a missing break tomoyo: Bump version. LSM: fix return value check in safesetid_init_securityfs() LSM: SafeSetID: add selftest LSM: SafeSetID: remove unused include LSM: SafeSetID: 'depend' on CONFIG_SECURITY LSM: Add 'name' field for SafeSetID in DEFINE_LSM LSM: add SafeSetID module that gates setid calls LSM: add SafeSetID module that gates setid calls tomoyo: Allow multiple use_group lines. tomoyo: Coding style fix. tomoyo: Swicth from cred->security to task_struct->security. security: keys: annotate implicit fall throughs security: keys: annotate implicit fall throughs security: keys: annotate implicit fall through capabilities:: annotate implicit fall through ...
2019-03-07Merge branch 'for-5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Oleg's pids controller accounting update which gets rid of rcu delay in pids accounting updates - rstat (cgroup hierarchical stat collection mechanism) optimization - Doc updates * 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: remove unused task_has_mempolicy() cgroup, rstat: Don't flush subtree root unless necessary cgroup: add documentation for pids.events file Documentation: cgroup-v2: eliminate markup warnings MAINTAINERS: Update cgroup entry cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
2019-03-07Merge tag 'fsnotify_for_v5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify updates from Jan Kara: "Support for fanotify directory events and changes to make waiting for fanotify permission event response killable" * tag 'fsnotify_for_v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (25 commits) fanotify: Make waits for fanotify events only killable fanotify: Use interruptible wait when waiting for permission events fanotify: Track permission event state fanotify: Simplify cleaning of access_list fsnotify: Create function to remove event from notification list fanotify: Move locking inside get_one_event() fanotify: Fold dequeue_event() into process_access_response() fanotify: Select EXPORTFS fanotify: report FAN_ONDIR to listener with FAN_REPORT_FID fanotify: add support for create/attrib/move/delete events fanotify: support events with data type FSNOTIFY_EVENT_INODE fanotify: check FS_ISDIR flag instead of d_is_dir() fsnotify: report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events fanotify: use vfs_get_fsid() helper instead of vfs_statfs() vfs: add vfs_get_fsid() helper fanotify: cache fsid in fsnotify_mark_connector fanotify: enable FAN_REPORT_FID init flag fanotify: copy event fid info to user fanotify: encode file identifier for FAN_REPORT_FID fanotify: open code fill_event_metadata() ...
2019-03-07Merge tag 'dtype_for_v5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull dtype handling cleanups from Jan Kara: "A reworked dtype cleanup patches based on your feedback to the previous version of these. Again the series includes only the generic code and ext2 cleanup as a sample. The plan is to push cleanups for other filesystems separately through respective trees once the generic code lands to reduce the number of conflicts" * tag 'dtype_for_v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: use common file type conversion fs: common implementation of file type
2019-03-06Merge tag 'usb-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY updates from Greg KH: "Here is the big USB/PHY driver pull request for 5.1-rc1. The usual set of gadget driver updates, phy driver updates, xhci updates, and typec additions. Also included in here are a lot of small cleanups and fixes and driver updates where needed. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (167 commits) wusb: Remove unnecessary static function ckhdid_printf usb: core: make default autosuspend delay configurable usb: core: Fix typo in description of "authorized_default" usb: chipidea: Refactor USB PHY selection and keep a single PHY usb: chipidea: Grab the (legacy) USB PHY by phandle first usb: chipidea: imx: set power polarity dt-bindings: usb: ci-hdrc-usb2: add property power-active-high usb: chipidea: imx: remove unused header files usb: chipidea: tegra: Fix missed ci_hdrc_remove_device() usb: core: add option of only authorizing internal devices usb: typec: tps6598x: handle block writes separately with plain-I2C adapters usb: xhci: Fix for Enabling USB ROLE SWITCH QUIRK on INTEL_SUNRISEPOINT_LP_XHCI usb: xhci: fix build warning - missing prototype usb: xhci: dbc: Fixing typo error. usb: xhci: remove unused member 'parent' in xhci_regset struct xhci: tegra: Prevent error pointer dereference USB: serial: option: add Telit ME910 ECM composition usb: core: Replace hardcoded check with inline function from usb.h usb: core: skip interfaces disabled in devicetree usb: typec: mux: remove redundant check on variable match ...
2019-03-06Merge tag 'tty-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the "big" patchset for the tty/serial driver layer for 5.1-rc1. It's really not all that big, nothing major here. There are a lot of tiny driver fixes and updates, combined with other cleanups for different serial drivers and the vt layer. Full details are in the shortlog. All of these have been in linux-next with no reported issues" * tag 'tty-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (70 commits) tty: xilinx_uartps: Correct return value in probe serial: sprd: Modify the baud rate calculation formula dt-bindings: serial: Add Milbeaut serial driver description serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart serial: 8250_pxa: honor the port number from devicetree tty: hvc_xen: Mark expected switch fall-through tty: n_gsm: Mark expected switch fall-throughs tty: serial: msm_serial: Remove __init from msm_console_setup() tty: serial: samsung: Enable baud clock during initialisation serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO tty: serial: remove redundant likely annotation tty/n_hdlc: mark expected switch fall-through serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup() serial: 8250_pci: Fix number of ports for ACCES serial cards vt: perform safe console erase in the right order tty/nozomi: use pci_iomap instead of ioremap_nocache tty/synclink: remove ISA support serial: 8250_pci: Replace custom code with pci_match_id() serial: max310x: Correction of the initial setting of the MODE1 bits for various supported ICs. serial: mps2-uart: Add parentheses around conditional in mps2_uart_shutdown ...
2019-03-06Merge tag 'staging-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big staging/iio driver pull request for 5.1-rc1. Lots of good IIO driver updates and cleanups in here as always. Combined with the removal of the xgifb driver, we have a net "loss" of over 9000 lines in the pull request, always a nice thing. As the outreachy application process is currently happening, there are loads of tiny checkpatch cleanup fixes all over the staging tree, which accounts for the majority of the fixups" * tag 'staging-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (341 commits) staging: mt7621-dma: remove license boilerplate text staging: mt7621-dma: add SPDX GPL-2.0+ license identifier Staging: ks7010: Replace typecast to int Staging: vt6655: Align a static function declaration staging: speakup: fix line over 80 characters. staging: mt7621-eth: Remove license boilerplate text staging: mt7621-eth: Add SPDX license identifier staging: ks7010: removed custom Michael MIC implementation. staging: rtl8192e: Fix space and suspect issue Staging: vt6655: Modify comment style of SPDX License Identifier Staging: vt6655: Modify comment style for SPDX-License-Identifier Staging: vt6655: Align a function declaration Staging: vt6655: Alignment of function declaration staging: rtl8712: Fix indentation issue staging: wilc1000: fix incorrent type in initializer staging: rtl8188eu: remove unused P2P_PRIVATE_IOCTL_SET_LEN staging: rtl8188eu: remove unused enum P2P_PROTO_WK_ID staging: rtl8723bs: Remove duplicated include from drv_types.h Staging: vt6655: Alignment should match open parenthesis staging: erofs: fix mis-acted TAIL merging behavior ...
2019-03-06Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-03-06Merge tag 'char-misc-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver patch pull request for 5.1-rc1. The largest thing by far is the new habanalabs driver for their AI accelerator chip. For now it is in the drivers/misc directory but will probably move to a new directory soon along with other drivers of this type. Other than that, just the usual set of individual driver updates and fixes. There's an "odd" merge in here from the DRM tree that they asked me to do as the MEI driver is starting to interact with the i915 driver, and it needed some coordination. All of those patches have been properly acked by the relevant subsystem maintainers. All of these have been in linux-next with no reported issues, most for quite some time" * tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (219 commits) habanalabs: adjust Kconfig to fix build errors habanalabs: use %px instead of %p in error print habanalabs: use do_div for 64-bit divisions intel_th: gth: Fix an off-by-one in output unassigning habanalabs: fix little-endian<->cpu conversion warnings habanalabs: use NULL to initialize array of pointers habanalabs: fix little-endian<->cpu conversion warnings habanalabs: soft-reset device if context-switch fails habanalabs: print pointer using %p habanalabs: fix memory leak with CBs with unaligned size habanalabs: return correct error code on MMU mapping failure habanalabs: add comments in uapi/misc/habanalabs.h habanalabs: extend QMAN0 job timeout habanalabs: set DMA0 completion to SOB 1007 habanalabs: fix validation of WREG32 to DMA completion habanalabs: fix mmu cache registers init habanalabs: disable CPU access on timeouts habanalabs: add MMU DRAM default page mapping habanalabs: Dissociate RAZWI info from event types misc/habanalabs: adjust Kconfig to fix build errors ...
2019-03-06Merge tag 'sound-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "We had again a busy development cycle with many new drivers as well as lots of core improvements / cleanups. Let's go for highlights: ALSA core: - PCM locking scheme was refactored for reducing a global rwlock - PCM suspend is handled in the device type PM ops now; lots of explicit calls were reduced by this action - Cleanups about PCM buffer preallocation calls - Kill NULL device object in memory allocations - Lots of procfs API cleanups ASoC core: - Support for only powering up channels that are actively being used - Cleanups / fixes of topology API ASoC drivers: - MediaTek BTCVSD for a Bluetooth radio chip, which is the first such driver we've had upstream! - Quite a few improvements to simplify the generic card drivers, especially the merge of the SCU cards into the main generic drivers - Lots of fixes for probing on Intel systems to follow more standard styles - A big refresh and cleanup of the Samsung drivers - New drivers: Asahi Kasei Microdevices AK4497, Cirrus Logic CS4341 and CS35L26, Google ChromeOS embedded controllers, Ingenic JZ4725B, MediaTek BTCVSD, MT8183 and MT6358, NXP MICFIL, Rockchip RK3328, Spreadtrum DMA controllers, Qualcomm WCD9335, Xilinx S/PDIF and PCM formatters ALSA drivers: - Improvements of Tegra HD-audio controller driver for supporting new chips - HD-audio codec quirks for ALC294 S4 resume, ASUS laptop, Chrome headset button support and Dell workstations - Improved DSD support on USB-audio - Quirk for MOTU MicroBook II USB-audio - Support for Fireface UCX support and Solid State Logic Duende Classic/Mini" * tag 'sound-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (461 commits) ALSA: usb-audio: Add quirk for MOTU MicroBook II ASoC: stm32: i2s: skip useless write in slave mode ASoC: stm32: i2s: fix race condition in irq handler ASoC: stm32: i2s: remove useless callback ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix stream count management ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing ASoC: qcom: Kconfig: fix dependency for sdm845 ASoC: Intel: Boards: Add Maxim98373 support ASoC: rsnd: gen: fix SSI9 4/5/6/7 busif related register address ALSA: firewire-motu: fix construction of PCM frame for capture direction ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56 ALSA: hda: Extend i915 component bind timeout ASoC: wm_adsp: Improve logging messages ASoC: wm_adsp: Add support for multiple compressed buffers ASoC: wm_adsp: Refactor compress stream initialisation ASoC: wm_adsp: Reorder some functions for improved clarity ASoC: wm_adsp: Factor out stripping padding from ADSP data ASoC: cs35l36: Fix an IS_ERR() vs NULL checking bug ...
2019-03-06Merge tag 'devprop-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework updates from Rafael Wysocki: "Fix the length value used in the PROPERTY_ENTRY_STRING() macro and make software nodes use the get_named_child_node() fwnode callback (Heikki Krogerus)" * tag 'devprop-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: Implement get_named_child_node fwnode callback device property: Fix the length used in PROPERTY_ENTRY_STRING()
2019-03-06Merge tag 'acpi-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These are ACPICA updates including ACPI 6.3 support among other things, APEI updates including the ARM Software Delegated Exception Interface (SDEI) support, ACPI EC driver fixes and cleanups and other assorted improvements. Specifics: - Update the ACPICA code in the kernel to upstream revision 20190215 including ACPI 6.3 support and more: * New predefined methods: _NBS, _NCH, _NIC, _NIH, and _NIG (Erik Schmauss). * Update of the PCC Identifier structure in PDTT (Erik Schmauss). * Support for new Generic Affinity Structure subtable in SRAT (Erik Schmauss). * New PCC operation region support (Erik Schmauss). * Support for GICC statistical profiling for MADT (Erik Schmauss). * New Error Disconnect Recover notification support (Erik Schmauss). * New PPTT Processor Structure Flags fields support (Erik Schmauss). * ACPI 6.3 HMAT updates (Erik Schmauss). * GTDT Revision 3 support (Erik Schmauss). * Legacy module-level code (MLC) support removal (Erik Schmauss). * Update/clarification of messages for control method failures (Bob Moore). * Warning on creation of a zero-length opregion (Bob Moore). * acpiexec option to dump extra info for memory leaks (Bob Moore). * More ACPI error to firmware error conversions (Bob Moore). * Debugger fix (Bob Moore). * Copyrights update (Bob Moore) - Clean up sleep states support code in ACPICA (Christoph Hellwig) - Rework in_nmi() handling in the APEI code and add suppor for the ARM Software Delegated Exception Interface (SDEI) to it (James Morse) - Fix possible out-of-bounds accesses in BERT-related core (Ross Lagerwall) - Fix the APEI code parsing HEST that includes a Deferred Machine Check subtable (Yazen Ghannam) - Use DEFINE_DEBUGFS_ATTRIBUTE for APEI-related debugfs files (YueHaibing) - Switch the APEI ERST code to the new generic UUID API (Andy Shevchenko) - Update the MAINTAINERS entry for APEI (Borislav Petkov) - Fix and clean up the ACPI EC driver (Rafael Wysocki, Zhang Rui) - Fix DMI checks handling in the ACPI backlight driver and add the "Lunch Box" chassis-type check to it (Hans de Goede) - Add support for using ACPI table overrides included in built-in initrd images (Shunyong Yang) - Update ACPI device enumeration to treat the PWM2 device as "always present" on Lenovo Yoga Book (Yauhen Kharuzhy) - Fix up the enumeration of device objects with the PRP0001 device ID (Andy Shevchenko) - Clean up PPTT parsing error messages (John Garry) - Clean up debugfs files creation handling (Greg Kroah-Hartman, Rafael Wysocki) - Clean up the ACPI DPTF Makefile (Masahiro Yamada)" * tag 'acpi-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) ACPI / bus: Respect PRP0001 when retrieving device match data ACPICA: Update version to 20190215 ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting ACPICA: ACPI 6.3: add GTDT Revision 3 support ACPICA: ACPI 6.3: HMAT updates ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter efi: cper: Fix possible out-of-bounds access ACPI: APEI: Fix possible out-of-bounds access to BERT region ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG ACPICA: Update/clarify messages for control method failures ACPICA: Debugger: Fix possible fault with the "test objects" command ACPICA: Interpreter: Emit warning for creation of a zero-length op region ACPICA: Remove legacy module-level code support ACPI / x86: Make PWM2 device always present at Lenovo Yoga Book ACPI / video: Extend chassis-type detection with a "Lunch Box" check ..
2019-03-06Merge tag 'pm-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are PM-runtime framework changes to use ktime instead of jiffies for accounting, new PM core flag to mark devices that don't need any form of power management, cpuidle updates including driver API documentation and a new governor, cpufreq updates including a new driver for Armada 8K, thermal cleanups and more, some energy-aware scheduling (EAS) enabling changes, new chips support in the intel_idle and RAPL drivers and assorted cleanups in some other places. Specifics: - Update the PM-runtime framework to use ktime instead of jiffies for accounting (Thara Gopinath, Vincent Guittot) - Optimize the autosuspend code in the PM-runtime framework somewhat (Ladislav Michl) - Add a PM core flag to mark devices that don't need any form of power management (Sudeep Holla) - Introduce driver API documentation for cpuidle and add a new cpuidle governor for tickless systems (Rafael Wysocki) - Add Jacobsville support to the intel_idle driver (Zhang Rui) - Clean up a cpuidle core header file and the cpuidle-dt and ACPI processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam) - Add new cpufreq driver for Armada 8K (Gregory Clement) - Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar, Amit Kucheria) - Add support for light-weight tear-down and bring-up of CPUs to the cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar) - Fix cpu_cooling Kconfig dependencies, add support for CPU cooling auto-registration to the cpufreq core and use it in multiple cpufreq drivers (Amit Kucheria) - Fix some minor issues and do some cleanups in the davinci, e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers (Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel, Taniya Das, Viresh Kumar) - Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng Wang) - Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu, Rafael Wysocki) - Clean up multiple cpufreq drivers (Yangtao Li) - Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas Bulwahn) - Add support for exposing the Energy Model via debugfs and make multiple cpufreq drivers register an Energy Model to support energy-aware scheduling (Quentin Perret, Dietmar Eggemann, Matthias Kaehlcke) - Add Ice Lake mobile and Jacobsville support to the Intel RAPL power-capping driver (Gayatri Kammela, Zhang Rui) - Add a power estimation helper to the operating performance points (OPP) framework and clean up a core function in it (Quentin Perret, Viresh Kumar) - Make minor improvements in the generic power domains (genpd), OPP and system suspend frameworks and in the PM core (Aditya Pakki, Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li)" * tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (80 commits) cpufreq: kryo: Release OPP tables on module removal cpufreq: ap806: add missing of_node_put after of_device_is_available cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies cpufreq: Pass updated policy to driver ->setpolicy() callback cpufreq: Fix two debug messages in cpufreq_set_policy() cpufreq: Reorder and simplify cpufreq_update_policy() cpufreq: Add kerneldoc comments for two core functions PM / core: Add support to skip power management in device/driver model cpufreq: intel_pstate: Rework iowait boosting to be less aggressive cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate() cpufreq: intel_pstate: Avoid redundant initialization of local vars powercap/intel_rapl: add Ice Lake mobile ACPI / processor: Set P_LVL{2,3} idle state descriptions cpufreq / cppc: Work around for Hisilicon CPPC cpufreq ACPI / CPPC: Add a helper to get desired performance cpufreq: davinci: move configuration to include/linux/platform_data cpufreq: speedstep: convert BUG() to BUG_ON() cpufreq: powernv: fix missing check of return value in init_powernv_pstates() cpufreq: longhaul: remove unneeded semicolon cpufreq: pcc-cpufreq: remove unneeded semicolon ..
2019-03-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits) tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include proc: more robust bulk read test proc: test /proc/*/maps, smaps, smaps_rollup, statm proc: use seq_puts() everywhere proc: read kernel cpu stat pointer once proc: remove unused argument in proc_pid_lookup() fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() fs/proc/self.c: code cleanup for proc_setup_self() proc: return exit code 4 for skipped tests mm,mremap: bail out earlier in mremap_to under map pressure mm/sparse: fix a bad comparison mm/memory.c: do_fault: avoid usage of stale vm_area_struct writeback: fix inode cgroup switching comment mm/huge_memory.c: fix "orig_pud" set but not used mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC mm/memcontrol.c: fix bad line in comment mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm/compaction: pass pgdat to too_many_isolated() instead of zone mm: remove zone_lru_lock() function, access ->lru_lock directly ...
2019-03-06Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC late updates from Arnd Bergmann: "Here are two branches that came relatively late during the linux-5.0 development cycle and have dependencies on the other branches: - On the TI OMAP platform, the CPSW Ethernet PHY mode selection driver is being replaced, this puts the final pieces in place - On the DaVinci platform, the interrupt handling code in arch/arm gets moved into a regular device driver in drivers/irqchip. Since they both had some time in linux-next after the 5.0-rc8 release, I'm sending them along with the other updates" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (38 commits) net: ethernet: ti: cpsw: deprecate cpsw-phy-sel driver ARM: davinci: remove intc related fields from davinci_soc_info irqchip: davinci-cp-intc: move the driver to drivers/irqchip ARM: davinci: cp-intc: remove redundant comments ARM: davinci: cp-intc: drop GPL license boilerplate ARM: davinci: cp-intc: use readl/writel_relaxed() ARM: davinci: cp-intc: unify error handling ARM: davinci: cp-intc: improve coding style ARM: davinci: cp-intc: request the memory region before remapping it ARM: davinci: cp-intc: use the new-style config structure ARM: davinci: cp-intc: convert all hex numbers to lowercase ARM: davinci: cp-intc: use a common prefix for all symbols ARM: davinci: cp-intc: add the new config structures for da8xx SoCs irqchip: davinci-cp-intc: add a new config structure ARM: davinci: cp-intc: add a wrapper around cp_intc_init() ARM: davinci: cp-intc: remove cp_intc.h irqchip: davinci-aintc: move the driver to drivers/irqchip ARM: davinci: aintc: remove unnecessary includes ARM: davinci: aintc: remove the timer-specific irq_set_handler() ARM: davinci: aintc: request memory region before remapping it ...
2019-03-06Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "As usual, the drivers/tee and drivers/reset subsystems get merged here, with the expected set of smaller updates and some new hardware support. The tee subsystem now supports device drivers to be attached to a tee, the first example here is a random number driver with its implementation in the secure world. Three new power domain drivers get added for specific chip families: - Broadcom BCM283x chips (used in Raspberry Pi) - Qualcomm Snapdragon phone chips - Xilinx ZynqMP FPGA SoCs One new driver is added to talk to the BPMP firmware on NVIDIA Tegra210 Existing drivers are extended for new SoC variants from NXP, NVIDIA, Amlogic and Qualcomm" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (113 commits) tee: optee: update optee_msg.h and optee_smc.h to dual license tee: add cancellation support to client interface dpaa2-eth: configure the cache stashing amount on a queue soc: fsl: dpio: configure cache stashing destination soc: fsl: dpio: enable frame data cache stashing per software portal soc: fsl: guts: make fsl_guts_get_svr() static hwrng: make symbol 'optee_rng_id_table' static tee: optee: Fix unsigned comparison with less than zero hwrng: Fix unsigned comparison with less than zero tee: fix possible error pointer ctx dereferencing hwrng: optee: Initialize some structs using memset instead of braces tee: optee: Initialize some structs using memset instead of braces soc: fsl: dpio: fix memory leak of a struct qbman on error exit path clk: tegra: dfll: Make symbol 'tegra210_cpu_cvb_tables' static soc: qcom: llcc-slice: Fix typos qcom: soc: llcc-slice: Consolidate some code qcom: soc: llcc-slice: Clear the global drv_data pointer on error drivers: soc: xilinx: Add ZynqMP power domain driver firmware: xilinx: Add APIs to control node status/power dt-bindings: power: Add ZynqMP power domain bindings ...
2019-03-06Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC platform updates from Arnd Bergmann: "The APM X-Gene platform is now maintained by folks from Ampere computing that took over the product line a while ago, this gets reflected in the MAINTAINERS file. Cleanups continue on the older mach-davinci and mach-pxa platform, to get them to be more like the modern ones. For pxa, we now remove the Raumfeld platform code as it now works with device tree based booting. i.MX adds a couple new features for the i.MX7ULP SoC Mediatek gains support for a new SoC: MT7629 is a new wireless router platform, following MT7623. Aside from those, there are the usual minor cleanups and bugfixes across several platforms" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (49 commits) MAINTAINERS: Update Ampere email address usb: ohci-da8xx: remove unused callbacks from platform data ARM: davinci: da830-evm: remove legacy usb helpers ARM: davinci: omapl138-hawk: remove legacy usb helpers usb: ohci-da8xx: add vbus and overcurrent gpios ARM: davinci: da830-evm: use gpio lookup entries for usb gpios ARM: davinci: omapl138-hawk: use gpio lookup entries for usb gpios usb: ohci-da8xx: add a helper pointer to &pdev->dev usb: ohci-da8xx: add a new line after local variables arm64: meson: enable g12a clock controller MAINTAINERS: Add entry for uDPU board ARM: davinci: da850-evm: use GPIO hogs instead of the legacy API arm: mediatek: add MT7629 smp bring up code Revert "ARM: mediatek: add MT7623a smp bringup code" dt-bindings: soc: fix typo of MT8173 power dt-bindings ARM: meson: remove COMMON_CLK_AMLOGIC selection arm64: meson: remove COMMON_CLK_AMLOGIC selection ARM: lpc32xx: remove platform data of ARM PL111 LCD controller ARM: lpc32xx: remove platform data of ARM PL180 SD/MMC controller ARM: lpc32xx: Use kmemdup to replace duplicating its implementation ...
2019-03-06Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - refcount conversions - Solve the rq->leaf_cfs_rq_list can of worms for real. - improve power-aware scheduling - add sysctl knob for Energy Aware Scheduling - documentation updates - misc other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) kthread: Do not use TIMER_IRQSAFE kthread: Convert worker lock to raw spinlock sched/fair: Use non-atomic cpumask_{set,clear}_cpu() sched/fair: Remove unused 'sd' parameter from select_idle_smt() sched/wait: Use freezable_schedule() when possible sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block sched/fair: Explain LLC nohz kick condition sched/fair: Simplify nohz_balancer_kick() sched/topology: Fix percpu data types in struct sd_data & struct s_data sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument sched/fair: Fix O(nr_cgroups) in the load balancing path sched/fair: Optimize update_blocked_averages() sched/fair: Fix insertion in rq->leaf_cfs_rq_list sched/fair: Add tmp_alone_branch assertion sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity sched/fair: Update scale invariance of PELT sched/fair: Move the rq_of() helper function sched/core: Convert task_struct.stack_refcount to refcount_t ...
2019-03-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of tooling updates - too many to list, here's a few highlights: - Various subcommand updates to 'perf trace', 'perf report', 'perf record', 'perf annotate', 'perf script', 'perf test', etc. - CPU and NUMA topology and affinity handling improvements, - HW tracing and HW support updates: - Intel PT updates - ARM CoreSight updates - vendor HW event updates - BPF updates - Tons of infrastructure updates, both on the build system and the library support side - Documentation updates. - ... and lots of other changes, see the changelog for details. Kernel side updates: - Tighten up kprobes blacklist handling, reduce the number of places where developers can install a kprobe and hang/crash the system. - Fix/enhance vma address filter handling. - Various PMU driver updates, small fixes and additions. - refcount_t conversions - BPF updates - error code propagation enhancements - misc other changes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits) perf script python: Add Python3 support to syscall-counts-by-pid.py perf script python: Add Python3 support to syscall-counts.py perf script python: Add Python3 support to stat-cpi.py perf script python: Add Python3 support to stackcollapse.py perf script python: Add Python3 support to sctop.py perf script python: Add Python3 support to powerpc-hcalls.py perf script python: Add Python3 support to net_dropmonitor.py perf script python: Add Python3 support to mem-phys-addr.py perf script python: Add Python3 support to failed-syscalls-by-pid.py perf script python: Add Python3 support to netdev-times.py perf tools: Add perf_exe() helper to find perf binary perf script: Handle missing fields with -F +.. perf data: Add perf_data__open_dir_data function perf data: Add perf_data__(create_dir|close_dir) functions perf data: Fail check_backup in case of error perf data: Make check_backup work over directories perf tools: Add rm_rf_perf_data function perf tools: Add pattern name checking to rm_rf perf tools: Add depth checking to rm_rf perf data: Add global path holder ...
2019-03-06Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The biggest part of this tree is the new auto-generated atomics API wrappers by Mark Rutland. The primary motivation was to allow instrumentation without uglifying the primary source code. The linecount increase comes from adding the auto-generated files to the Git space as well: include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++-- include/asm-generic/atomic-long.h | 1174 ++++++++++--- include/linux/atomic-fallback.h | 2295 +++++++++++++++++++++++++ include/linux/atomic.h | 1241 +------------ I preferred this approach, so that the full call stack of the (already complex) locking APIs is still fully visible in 'git grep'. But if this is excessive we could certainly hide them. There's a separate build-time mechanism to determine whether the headers are out of date (they should never be stale if we do our job right). Anyway, nothing from this should be visible to regular kernel developers. Other changes: - Add support for dynamic keys, which removes a source of false positives in the workqueue code, among other things (Bart Van Assche) - Updates to tools/memory-model (Andrea Parri, Paul E. McKenney) - qspinlock, wake_q and lockdep micro-optimizations (Waiman Long) - misc other updates and enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/lockdep: Shrink struct lock_class_key locking/lockdep: Add module_param to enable consistency checks lockdep/lib/tests: Test dynamic key registration lockdep/lib/tests: Fix run_tests.sh kernel/workqueue: Use dynamic lockdep keys for workqueues locking/lockdep: Add support for dynamic keys locking/lockdep: Verify whether lock objects are small enough to be used as class keys locking/lockdep: Check data structure consistency locking/lockdep: Reuse lock chains that have been freed locking/lockdep: Fix a comment in add_chain_cache() locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count() locking/lockdep: Reuse list entries that are no longer in use locking/lockdep: Free lock classes that are no longer in use locking/lockdep: Update two outdated comments locking/lockdep: Make it easy to detect whether or not inside a selftest locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock() locking/lockdep: Initialize the locks_before and locks_after lists earlier locking/lockdep: Make zap_class() remove all matching lock order entries locking/lockdep: Reorder struct lock_class members locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache ...
2019-03-06Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main EFI changes in this cycle were: - Use 32-bit alignment for efi_guid_t - Allow the SetVirtualAddressMap() call to be omitted - Implement earlycon=efifb based on existing earlyprintk code - Various minor fixes and code cleanups from Sai, Ard and me" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Fix build error due to enum collision between efi.h and ima.h efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted efi: Replace GPL license boilerplate with SPDX headers efi/fdt: Apply more cleanups efi: Use 32-bit alignment for efi_guid_t efi/memattr: Don't bail on zero VA if it equals the region's PA x86/efi: Mark can_free_region() as an __init function
2019-03-05writeback: fix inode cgroup switching commentGreg Thelen
Commit 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") refers to inode_switch_wb_work_fn() which never got merged. Switch the comments to inode_switch_wbs_work_fn(). Link: http://lkml.kernel.org/r/20190305004617.142590-1-gthelen@google.com Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: remove zone_lru_lock() function, access ->lru_lock directlyAndrey Ryabinin
We have common pattern to access lru_lock from a page pointer: zone_lru_lock(page_zone(page)) Which is silly, because it unfolds to this: &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]->zone_pgdat->lru_lock while we can simply do &NODE_DATA(page_to_nid(page))->lru_lock Remove zone_lru_lock() function, since it's only complicate things. Use 'page_pgdat(page)->lru_lock' pattern instead. [aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()] Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/workingset: remove unused @mapping argument in workingset_eviction()Andrey Ryabinin
workingset_eviction() doesn't use and never did use the @mapping argument. Remove it. Link: http://lkml.kernel.org/r/20190228083329.31892-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05include/linux/compaction.h: fix potential build errorYu Zhao
Declaration of struct node is required regardless. On UMA systems, including compaction.h without preceding node.h shouldn't cause a build error. Link: http://lkml.kernel.org/r/20190208080437.253322-1-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: page_cache_add_speculative(): refactor out some code duplicationjohn.hubbard@gmail.com
From: John Hubbard <jhubbard@nvidia.com> This combines the common elements of these routines: page_cache_get_speculative() page_cache_add_speculative() This was anticipated by the original author, as shown by the comment in commit ce0ad7f095258 ("powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3)"): "Same as above, but add instead of inc (could just be merged)" There is no intention to introduce any behavioral change, but there is a small risk of that, due to slightly differing ways of expressing the TINY_RCU and related configurations. This also removes the VM_BUG_ON(in_interrupt()) that was in page_cache_add_speculative(), but not in page_cache_get_speculative(). This provides slightly less detection of such bugs, but it given that it was only there on the "add" path anyway, we can likely do without it just fine. And it removes the VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page); that page_cache_add_speculative() had. Link: http://lkml.kernel.org/r/20190206231016.22734-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/page_poison: update comment after code movedMichael S. Tsirkin
mm/debug-pagealloc.c is no more, so of course header now needs to be updated. This seems like something checkpatch should be able to catch - worth looking into? Link: http://lkml.kernel.org/r/20190207191113.14039-1-mst@redhat.com Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate option") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05numa: make "nr_online_nodes" unsigned intAlexey Dobriyan
Number of online NUMA nodes can't be negative as well. This doesn't save space as the variable is used only in 32-bit context, but do it anyway for consistency. Link: http://lkml.kernel.org/r/20190201223151.GB15820@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05numa: make "nr_node_ids" unsigned intAlexey Dobriyan
Number of NUMA nodes can't be negative. This saves a few bytes on x86_64: add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238) Function old new delta hv_synic_alloc.cold 88 110 +22 prealloc_shrinker 260 262 +2 bootstrap 249 251 +2 sched_init_numa 1566 1567 +1 show_slab_objects 778 777 -1 s_show 1201 1200 -1 kmem_cache_init 346 345 -1 __alloc_workqueue_key 1146 1145 -1 mem_cgroup_css_alloc 1614 1612 -2 __do_sys_swapon 4702 4699 -3 __list_lru_init 655 651 -4 nic_probe 2379 2374 -5 store_user_store 118 111 -7 red_zone_store 106 99 -7 poison_store 106 99 -7 wq_numa_init 348 338 -10 __kmem_cache_empty 75 65 -10 task_numa_free 186 173 -13 merge_across_nodes_store 351 336 -15 irq_create_affinity_masks 1261 1246 -15 do_numa_crng_init 343 321 -22 task_numa_fault 4760 4737 -23 swapfile_init 179 156 -23 hv_synic_alloc 536 492 -44 apply_wqattrs_prepare 746 695 -51 Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: swap: use mem_cgroup_is_root() instead of deferencing css->parentYang Shi
mem_cgroup_is_root() is the preferred API to check if memcg is root or not. Use it instead of deferencing css->parent. Link: http://lkml.kernel.org/r/1547232913-118148-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: update get_user_pages_longterm to migrate pages allocated from CMA regionAneesh Kumar K.V
This patch updates get_user_pages_longterm to migrate pages allocated out of CMA region. This makes sure that we don't keep non-movable pages (due to page reference count) in the CMA area. This will be used by ppc64 in a later patch to avoid pinning pages in the CMA region. ppc64 uses CMA region for allocation of the hardware page table (hash page table) and not able to migrate pages out of CMA region results in page table allocation failures. One case where we hit this easy is when a guest using a VFIO passthrough device. VFIO locks all the guest's memory and if the guest memory is backed by CMA region, it becomes unmovable resulting in fragmenting the CMA and possibly preventing other guests from allocation a large enough hash page table. NOTE: We allocate the new page without using __GFP_THISNODE Link: http://lkml.kernel.org/r/20190114095438.32470-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/cma: add PF flag to force non cma allocAneesh Kumar K.V
Patch series "mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region", v8. ppc64 uses the CMA area for the allocation of guest page table (hash page table). We won't be able to start guest if we fail to allocate hash page table. We have observed hash table allocation failure because we failed to migrate pages out of CMA region because they were pinned. This happen when we are using VFIO. VFIO on ppc64 pins the entire guest RAM. If the guest RAM pages get allocated out of CMA region, we won't be able to migrate those pages. The pages are also pinned for the lifetime of the guest. Currently we support migration of non-compound pages. With THP and with the addition of hugetlb migration we can end up allocating compound pages from CMA region. This patch series add support for migrating compound pages. This patch (of 4): Add PF_MEMALLOC_NOCMA which make sure any allocation in that context is marked non-movable and hence cannot be satisfied by CMA region. This is useful with get_user_pages_longterm where we want to take a page pin by migrating pages from CMA region. Marking the section PF_MEMALLOC_NOCMA ensures that we avoid unnecessary page migration later. Link: http://lkml.kernel.org/r/20190114095438.32470-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: better document PG_reservedDavid Hildenbrand
The usage of PG_reserved and how PG_reserved pages are to be treated is buried deep down in different parts of the kernel. Let's shine some light onto these details by documenting current users and expected behavior. Especially, clarify on the "Some of them might not even exist" case. These are physical memory gaps that will never be dumped as they are not marked as IORESOURCE_SYSRAM. PG_reserved does in general not hinder anybody from dumping or swapping. In some cases, these pages will not be stored in the hibernation image. Link: http://lkml.kernel.org/r/20190114125903.24845-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: <yi.z.zhang@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: rid swapoff of quadratic complexityVineeth Remanan Pillai
This patch was initially posted by Kelley Nielsen. Reposting the patch with all review comments addressed and with minor modifications and optimizations. Also, folding in the fixes offered by Hugh Dickins and Huang Ying. Tests were rerun and commit message updated with new results. try_to_unuse() is of quadratic complexity, with a lot of wasted effort. It unuses swap entries one by one, potentially iterating over all the page tables for all the processes in the system for each one. This new proposed implementation of try_to_unuse simplifies its complexity to linear. It iterates over the system's mms once, unusing all the affected entries as it walks each set of page tables. It also makes similar changes to shmem_unuse. Improvement swapoff was called on a swap partition containing about 6G of data, in a VM(8cpu, 16G RAM), and calls to unuse_pte_range() were counted. Present implementation....about 1200M calls(8min, avg 80% cpu util). Prototype.................about 9.0K calls(3min, avg 5% cpu util). Details In shmem_unuse(), iterate over the shmem_swaplist and, for each shmem_inode_info that contains a swap entry, pass it to shmem_unuse_inode(), along with the swap type. In shmem_unuse_inode(), iterate over its associated xarray, and store the index and value of each swap entry in an array for passing to shmem_swapin_page() outside of the RCU critical section. In try_to_unuse(), instead of iterating over the entries in the type and unusing them one by one, perhaps walking all the page tables for all the processes for each one, iterate over the mmlist, making one pass. Pass each mm to unuse_mm() to begin its page table walk, and during the walk, unuse all the ptes that have backing store in the swap type received by try_to_unuse(). After the walk, check the type for orphaned swap entries with find_next_to_unuse(), and remove them from the swap cache. If find_next_to_unuse() starts over at the beginning of the type, repeat the check of the shmem_swaplist and the walk a maximum of three times. Change unuse_mm() and the intervening walk functions down to unuse_pte_range() to take the type as a parameter, and to iterate over their entire range, calling the next function down on every iteration. In unuse_pte_range(), make a swap entry from each pte in the range using the passed in type. If it has backing store in the type, call swapin_readahead() to retrieve the page and pass it to unuse_pte(). Pass the count of pages_to_unuse down the page table walks in try_to_unuse(), and return from the walk when the desired number of pages has been swapped back in. Link: http://lkml.kernel.org/r/20190114153129.4852-2-vpillai@digitalocean.com Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com> Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/hugetlb: add prot_modify_start/commit sequence for hugetlb updateAneesh Kumar K.V
Architectures like ppc64 require to do a conditional tlb flush based on the old and new value of pte. Follow the regular pte change protection sequence for hugetlb too. This allows the architectures to override the update sequence. Link: http://lkml.kernel.org/r/20190116085035.29729-5-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: fix some typos in mm directoryWei Yang
No functional change. Link: http://lkml.kernel.org/r/20190118235123.27843-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm, memcg: create mem_cgroup_from_seqChris Down
This is the start of a series of patches similar to my earlier DEFINE_MEMCG_MAX_OR_VAL work, but with less Macro Magic(tm). There are a bunch of places we go from seq_file to mem_cgroup, which currently requires manually getting the css, then getting the mem_cgroup from the css. It's in enough places now that having mem_cgroup_from_seq makes sense (and also makes the next patch a bit nicer). Link: http://lkml.kernel.org/r/20190124194050.GA31341@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05kernel: cgroup: add poll file operationJohannes Weiner
Cgroup has a standardized poll/notification mechanism for waking all pollers on all fds when a filesystem node changes. To allow polling for custom events, add a .poll callback that can override the default. This is in preparation for pollable cgroup pressure files which have per-fd trigger configurations. Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05fs: kernfs: add poll file operationJohannes Weiner
Patch series "psi: pressure stall monitors", v3. Android is adopting psi to detect and remedy memory pressure that results in stuttering and decreased responsiveness on mobile devices. Psi gives us the stall information, but because we're dealing with latencies in the millisecond range, periodically reading the pressure files to detect stalls in a timely fashion is not feasible. Psi also doesn't aggregate its averages at a high enough frequency right now. This patch series extends the psi interface such that users can configure sensitive latency thresholds and use poll() and friends to be notified when these are breached. As high-frequency aggregation is costly, it implements an aggregation method that is optimized for fast, short-interval averaging, and makes the aggregation frequency adaptive, such that high-frequency updates only happen while monitored stall events are actively occurring. With these patches applied, Android can monitor for, and ward off, mounting memory shortages before they cause problems for the user. For example, using memory stall monitors in userspace low memory killer daemon (lmkd) we can detect mounting pressure and kill less important processes before device becomes visibly sluggish. In our memory stress testing psi memory monitors produce roughly 10x less false positives compared to vmpressure signals. Having ability to specify multiple triggers for the same psi metric allows other parts of Android framework to monitor memory state of the device and act accordingly. The new interface is straightforward. The user opens one of the pressure files for writing and writes a trigger description into the file descriptor that defines the stall state - some or full, and the maximum stall time over a given window of time. E.g.: /* Signal when stall time exceeds 100ms of a 1s window */ char trigger[] = "full 100000 1000000"; fd = open("/proc/pressure/memory"); write(fd, trigger, sizeof(trigger)); while (poll() >= 0) { ... } close(fd); When the monitored stall state is entered, psi adapts its aggregation frequency according to what the configured time window requires in order to emit event signals in a timely fashion. Once the stalling subsides, aggregation reverts back to normal. The trigger is associated with the open file descriptor. To stop monitoring, the user only needs to close the file descriptor and the trigger is discarded. Patches 1-4 prepare the psi code for polling support. Patch 5 implements the adaptive polling logic, the pressure growth detection optimized for short intervals, and hooks up write() and poll() on the pressure files. The patches were developed in collaboration with Johannes Weiner. This patch (of 5): Kernfs has a standardized poll/notification mechanism for waking all pollers on all fds when a filesystem node changes. To allow polling for custom events, add a .poll callback that can override the default. This is in preparation for pollable cgroup pressure files which have per-fd trigger configurations. Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm, compaction: capture a page under direct compactionMel Gorman
Compaction is inherently race-prone as a suitable page freed during compaction can be allocated by any parallel task. This patch uses a capture_control structure to isolate a page immediately when it is freed by a direct compactor in the slow path of the page allocator. The intent is to avoid redundant scanning. 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%) Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%) Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%) Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%) Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%) Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%* Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%) Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%) Latency is only moderately affected but the devil is in the details. A closer examination indicates that base page fault latency is reduced but latency of huge pages is increased as it takes creater care to succeed. Part of the "problem" is that allocation success rates are close to 100% even when under pressure and compaction gets harder 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%) Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%) Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%) Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%) Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%) Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%) Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%) Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%) And scan rates are reduced as expected by 6% for the migration scanner and 29% for the free scanner indicating that there is less redundant work. Compaction migrate scanned 20815362 19573286 Compaction free scanned 16352612 11510663 [mgorman@techsingularity.net: remove redundant check] Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm, compaction: be selective about what pageblocks to clear skip hintsMel Gorman
Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> [cai@lca.pw: no stuck in __reset_isolation_pfn()] Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm, compaction: use free lists to quickly locate a migration sourceMel Gorman
The migration scanner is a linear scan of a zone with a potentiall large search space. Furthermore, many pageblocks are unusable such as those filled with reserved pages or partially filled with pages that cannot migrate. These still get scanned in the common case of allocating a THP and the cost accumulates. The patch uses a partial search of the free lists to locate a migration source candidate that is marked as MOVABLE when allocating a THP. It prefers picking a block with a larger number of free pages already on the basis that there are fewer pages to migrate to free the entire block. The lowest PFN found during searches is tracked as the basis of the start for the linear search after the first search of the free list fails. After the search, the free list is shuffled so that the next search will not encounter the same page. If the search fails then the subsequent searches will be shorter and the linear scanner is used. If this search fails, or if the request is for a small or unmovable/reclaimable allocation then the linear scanner is still used. It is somewhat pointless to use the list search in those cases. Small free pages must be used for the search and there is no guarantee that movable pages are located within that block that are contiguous. 5.0.0-rc1 5.0.0-rc1 noboost-v3r10 findmig-v3r15 Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%) Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%) Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%) Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%) Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%) Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%* Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%* Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%* 5.0.0-rc1 5.0.0-rc1 noboost-v3r10 findmig-v3r15 Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%) Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%) Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%) Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%) Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%) Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%) Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%) Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%) This shows an improvement in allocation latencies with similar allocation success rates. While not presented, there was a 31% reduction in migration scanning and a 8% reduction on system CPU usage. A 2-socket machine showed similar benefits. [mgorman@techsingularity.net: several fixes] Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net [vbabka@suse.cz: migrate block that was found-fast, some optimisations] Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <Vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: shuffle GFP_* flagsAlexey Dobriyan
GFP_KERNEL is one of the most used constant but on archs like arm with fixed length instruction some constants are more equal than the others. Constants with tightly packed bits can be injected directly into instruction stream: 0: e3a00d33 mov r0, #3264 ; 0xcc0 Others require multiple instructions or even loading out of instruction stream: 0: e3a000c0 mov r0, #192 ; 0xc0 4: e3400060 movt r0, #96 ; 0x60 Shuffle GFP_* flags so that GFP_KERNEL/GFP_ATOMIC + __GFP_ZERO bits are close to each other. Savings on arm configs are ~0.1%. Link: http://lkml.kernel.org/r/20190109201838.GA9140@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/hugetlb: enable arch specific huge page size support for migrationAnshuman Khandual
Architectures like arm64 have HugeTLB page sizes which are different than generic sizes at PMD, PUD, PGD level and implemented via contiguous bits. At present these special size HugeTLB pages cannot be identified through macros like (PMD|PUD|PGDIR)_SHIFT and hence chosen not be migrated. Enabling migration support for these special HugeTLB page sizes along with the generic ones (PMD|PUD|PGD) would require identifying all of them on a given platform. A platform specific hook can precisely enumerate all huge page sizes supported for migration. Instead of comparing against standard huge page orders let hugetlb_migration_support() function call a platform hook arch_hugetlb_migration_support(). Default definition for the platform hook maintains existing semantics which checks standard huge page order. But an architecture can choose to override the default and provide support for a comprehensive set of huge page sizes. Link: http://lkml.kernel.org/r/1545121450-1663-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/hugetlb: enable PUD level huge page migrationAnshuman Khandual
Architectures like arm64 have PUD level HugeTLB pages for certain configs (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be enabled for migration. It can be achieved through checking for PUD_SHIFT order based HugeTLB pages during migration. Link: http://lkml.kernel.org/r/1545121450-1663-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/hugetlb: distinguish between migratability and movabilityAnshuman Khandual
Patch series "arm64/mm: Enable HugeTLB migration", v4. This patch series enables HugeTLB migration support for all supported huge page sizes at all levels including contiguous bit implementation. Following HugeTLB migration support matrix has been enabled with this patch series. All permutations have been tested except for the 16GB. CONT PTE PMD CONT PMD PUD -------- --- -------- --- 4K: 64K 2M 32M 1G 16K: 2M 32M 1G 64K: 2M 512M 16G First the series adds migration support for PUD based huge pages. It then adds a platform specific hook to query an architecture if a given huge page size is supported for migration while also providing a default fallback option preserving the existing semantics which just checks for (PMD|PUD|PGDIR)_SHIFT macros. The last two patches enables HugeTLB migration on arm64 and subscribe to this new platform specific hook by defining an override. The second patch differentiates between movability and migratability aspects of huge pages and implements hugepage_movable_supported() which can then be used during allocation to decide whether to place the huge page in movable zone or not. This patch (of 5): During huge page allocation it's migratability is checked to determine if it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the movability aspect of the huge page could depend on other factors than just migratability. Movability in itself is a distinct property which should not be tied with migratability alone. This differentiates these two and implements an enhanced movability check which also considers huge page size to determine if it is feasible to be placed under a movable zone. At present it just checks for gigantic pages but going forward it can incorporate other enhanced checks. Link: http://lkml.kernel.org/r/1545121450-1663-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: remove sysctl_extfrag_handler()Matthew Wilcox
sysctl_extfrag_handler() neglects to propagate the return value from proc_dointvec_minmax() to its caller. It's a wrapper that doesn't need to exist, so just use proc_dointvec_minmax() directly. Link: http://lkml.kernel.org/r/20190104032557.3056-1-willy@infradead.org Signed-off-by: Matthew Wilcox <willy@infradead.org> Reported-by: Aditya Pakki <pakki001@umn.edu> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05memcg: localize memcg_kmem_enabled() checkShakeel Butt
Move the memcg_kmem_enabled() checks into memcg kmem charge/uncharge functions, so, the users don't have to explicitly check that condition. This is purely code cleanup patch without any functional change. Only the order of checks in memcg_charge_slab() can potentially be changed but the functionally it will be same. This should not matter as memcg_charge_slab() is not in the hot path. Link: http://lkml.kernel.org/r/20190103161203.162375-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: reuse only-pte-mapped KSM page in do_wp_page()Kirill Tkhai
Add an optimization for KSM pages almost in the same way that we have for ordinary anonymous pages. If there is a write fault in a page, which is mapped to an only pte, and it is not related to swap cache; the page may be reused without copying its content. [ Note that we do not consider PageSwapCache() pages at least for now, since we don't want to complicate __get_ksm_page(), which has nice optimization based on this (for the migration case). Currenly it is spinning on PageSwapCache() pages, waiting for when they have unfreezed counters (i.e., for the migration finish). But we don't want to make it also spinning on swap cache pages, which we try to reuse, since there is not a very high probability to reuse them. So, for now we do not consider PageSwapCache() pages at all. ] So in reuse_ksm_page() we check for 1) PageSwapCache() and 2) page_stable_node(), to skip a page, which KSM is currently trying to link to stable tree. Then we do page_ref_freeze() to prohibit KSM to merge one more page into the page, we are reusing. After that, nobody can refer to the reusing page: KSM skips !PageSwapCache() pages with zero refcount; and the protection against of all other participants is the same as for reused ordinary anon pages pte lock, page lock and mmap_sem. [akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s] Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>