Arguments are supposed to be ordered high then low.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
The PCIe PHY initialization requires the attached device to be present,
which is primarily achieved by the PCI controller driver. So move the
logic from init/exit to power_on/power_off.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
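As a hedged illustration (not the patch itself), such a move in a generic PHY driver is mostly a matter of rewiring the phy_ops callbacks; the helper names below are hypothetical:
	/* Minimal sketch, assuming hypothetical qcom_pcie_phy_start/stop
	 * helpers holding the logic formerly run at init/exit time.
	 */
	static const struct phy_ops qcom_pcie_phy_ops = {
		/* .init/.exit intentionally dropped: the attached device is
		 * only guaranteed present once the controller powers us on.
		 */
		.power_on  = qcom_pcie_phy_start,	/* was .init */
		.power_off = qcom_pcie_phy_stop,	/* was .exit */
		.owner     = THIS_MODULE,
	};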
|
|
Since this phy is shared by multiple devices including USB and PCIe,
it is necessary to determine which device uses the phy.
This patch adds SoC-dependent functions to determine the device using
the phy.
When the 'socionext,syscon' property is present in the pcie-phy node,
the driver calls the SoC-dependent function instead of checking
.has_syscon in the SoC-dependent data. The function configures the
system controller to use the phy for PCIe.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
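A hedged sketch of what such a SoC-dependent function could look like, using the syscon regmap APIs; the register name, mask, and value are hypothetical stand-ins for the SoC-dependent data:
	#include <linux/mfd/syscon.h>
	#include <linux/regmap.h>

	static int uniphier_pciephy_set_mode(struct device *dev)
	{
		struct regmap *regmap;

		/* resolve the 'socionext,syscon' phandle to a regmap */
		regmap = syscon_regmap_lookup_by_phandle(dev->of_node,
							 "socionext,syscon");
		if (IS_ERR(regmap))
			return PTR_ERR(regmap);

		/* route the shared PHY to PCIe instead of USB */
		return regmap_update_bits(regmap, SG_USB_PCIE_SEL,
					  SG_PCIE_MODE_MASK, SG_PCIE_MODE);
	}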
|
|
Add support for legacy SoCs that need to manage the gio clock and reset
and to skip setting unimplemented phy parameters. This supports Pro5.
Only one-port use is specified, because Pro5 doesn't set this in the
power-on sequence.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
With the default parameters, communication failures might occur in rare
cases. Set the Rx sync mode parameter to avoid the issue.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add support for legacy SoCs that need to manage the gio clock and reset.
This supports Pro5.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
The Pro5 SoC has the same USB3 ss-phy scheme as Pro4, so the data for
Pro5 is equivalent to Pro4's.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
This adds a compatible string for the Pro5 SoC, which needs to manage
the gio clock and reset. The Pro4 SoC uses a USB2 PHY instead of a USB3
HS-PHY, so this also removes the Pro4 description from usb3-hsphy.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Use devm_platform_ioremap_resource() to simplify the code.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
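For reference, this is the usual before/after shape of such a conversion (a sketch, not the exact diff):
	/* before: two steps and an extra struct resource local */
	res  = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	base = devm_ioremap_resource(&pdev->dev, res);

	/* after: one call, same IS_ERR()-based error handling */
	base = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(base))
		return PTR_ERR(base);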
|
|
The support for the 14nm MSM8996 UFS PHY is currently handled by the
UFS-specific 14nm QMP driver, due to the earlier need for additional
operations beyond the standard PHY API.
Add support for this PHY to the common QMP driver, to allow us to remove
the old driver.
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Implement single-link subnode support in the phy driver.
Add reset support, including PHY reset and individual lane resets.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add platform dependent initialization data for Torrent PHY used in TI's
J721E SoC.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Use regmap to read and write DPTX specific PHY registers.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Use regmap for accessing Torrent PHY registers. Modify register offsets
as defined in Torrent PHY user guide. Abstract address calculation
using regmap APIs.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
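A minimal sketch of the idea, assuming one MMIO regmap per register block so user-guide offsets can be used verbatim; the config values and TORRENT_COMMON_CDB_OFFSET are illustrative, not taken from the patch:
	#include <linux/regmap.h>

	static const struct regmap_config cdns_torrent_regmap_cfg = {
		.reg_bits = 32,		/* illustrative widths */
		.val_bits = 16,
	};

	/* one regmap per block: the address math hides behind the regmap */
	static struct regmap *cdns_torrent_init_regmap(struct device *dev,
						       void __iomem *base)
	{
		return devm_regmap_init_mmio(dev,
					     base + TORRENT_COMMON_CDB_OFFSET,
					     &cdns_torrent_regmap_cfg);
	}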
|
|
Add support for PHY configuration APIs. These will mainly reconfigure
link rate, number of lanes, voltage swing and pre-emphasis values.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
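From the consumer side, these reconfigurations go through the generic PHY configure/validate API; a hedged usage sketch (the values are examples only):
	#include <linux/phy/phy.h>
	#include <linux/phy/phy-dp.h>

	union phy_configure_opts opts = { };
	int ret;

	opts.dp.link_rate = 2700;	/* Mb/s per lane */
	opts.dp.lanes     = 4;
	opts.dp.set_rate  = 1;
	opts.dp.set_lanes = 1;

	/* check that the PHY can do it, then apply */
	ret = phy_validate(phy, PHY_MODE_DP, 0, &opts);
	if (!ret)
		ret = phy_configure(phy, &opts);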
|
|
Add configuration functions for 19.2 MHz refclock support.
Add register configurations for SSC support.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add a separate function to set the different power state values.
Use a uniform polling timeout value, and check the return values
of functions for proper error handling.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add wrapper functions to read and write DisplayPort-specific PHY
registers to improve code readability.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add a wrapper function to write Torrent PHY registers to improve
code readability.
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
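Such a wrapper is typically a thin writel() shim over the PHY's base address; a sketch (the struct layout here is assumed):
	static void cdns_torrent_phy_write(struct cdns_torrent_phy *cdns_phy,
					   u32 offset, u32 val)
	{
		writel(val, cdns_phy->base + offset);	/* ioremapped regs */
	}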
|
|
- Change private data struct cdns_dp_phy to cdns_torrent_phy
- Change module description and registration accordingly
- Generic torrent functions have prefix cdns_torrent_phy_*
- Functions specific to Torrent phy for DisplayPort are prefixed as
cdns_torrent_dp_*
Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com>
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Rename the Cadence DP PHY driver from phy-cadence-dp to
phy-cadence-torrent to make it more generic for future use. Modify the
Makefile and Kconfig accordingly. Also, change the driver compatible
from "cdns,dp-phy" to "cdns,torrent-phy". This will not affect the ABI,
as the driver has never been functional and therefore does not exist in
any active use case.
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
- Add Cadence MHDP PHY bindings in YAML format.
- Add Torrent PHY reference clock bindings.
- Add sub-node bindings for each group of PHY lanes based on PHY type.
Each sub-node includes properties such as master lane number, link reset,
phy type, number of lanes etc.
- Add reset support including PHY reset and individual lane reset.
- Add a new compatible string used for TI SoCs using Torrent PHY.
This will not affect the ABI, as the driver has never been functional
and therefore does not exist in any active use case.
Signed-off-by: Yuti Amonkar <yamonkar@cadence.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
Add touchscreen info for the Chuwi Vi8 Plus tablet. This tablet uses a
Chipone ICN8505 touchscreen controller, with the firmware used by the
touchscreen embedded in the EFI firmware.
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-11-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
So far we have been unable to get permission from the vendors to put the
firmware for the touchscreens listed in touchscreen_dmi into linux-firmware.
Some of the tablets with such a touchscreen have a touchscreen driver, and
thus a copy of the firmware, as part of their EFI code.
This commit adds the necessary info for the new EFI embedded-firmware code
to extract these firmwares, making the touchscreen work OOTB without the
user needing to manually add the firmware.
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-10-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Unfortunately, so far we have been unable to get permission to
redistribute icn8505 touchscreen firmwares in linux-firmware. This means
that people need to find and install the firmware themselves before the
touchscreen will work.
Some UEFI/x86 tablets with an icn8505 touchscreen have a copy of the fw
embedded in their UEFI boot-services code.
This commit makes the icn8505 driver use the new firmware_request_platform
function, which will fallback to looking for such an embedded copy when
direct filesystem lookup fails. This will make the touchscreen work OOTB
on devices where there is a fw copy embedded in the UEFI code.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-9-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Unfortunately, so far we have been unable to get permission to
redistribute Silead touchscreen firmwares in linux-firmware. This means
that people need to find and install the firmware themselves before the
touchscreen will work.
Some UEFI/x86 tablets with a Silead touchscreen have a copy of the fw
embedded in their UEFI boot-services code.
This commit makes the silead driver use the new firmware_request_platform
function, which will fallback to looking for such an embedded copy when
direct filesystem lookup fails. This will make the touchscreen work OOTB
on devices where there is a fw copy embedded in the UEFI code.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-8-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add test cases for checking the new firmware_request_platform API.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200115163554.101315-7-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for testing firmware_request_platform through a new
trigger_request_platform trigger.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200115163554.101315-6-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In some cases the platform's main firmware (e.g. the UEFI fw) may contain
an embedded copy of device firmware which needs to be (re)loaded into the
peripheral. Normally such firmware would be part of linux-firmware, but in
some cases this is not feasible, for 2 reasons:
1) The firmware is customized for a specific use-case of the chipset / use
with a specific hardware model, so we cannot have a single firmware file
for the chipset. E.g. touchscreen controller firmwares are compiled
specifically for the hardware model they are used with, as they are
calibrated for a specific model digitizer.
2) Despite repeated attempts we have failed to get permission to
redistribute the firmware. This is especially a problem with customized
firmwares; these are created by the chip vendor for a specific ODM, and
the copyright may partially belong to the ODM, so the chip vendor cannot
give blanket permission to distribute them.
This commit adds a new platform fallback mechanism to the firmware loader
which will try to look up a device fw copy embedded in the platform's
main firmware if direct filesystem lookup fails.
Drivers which need such embedded fw copies can enable this fallback
mechanism by using the new firmware_request_platform() function.
Note that for now this is only supported on EFI platforms and even on
these platforms firmware_fallback_platform() only works if
CONFIG_EFI_EMBEDDED_FIRMWARE is enabled (this gets selected by drivers
which need this), in all other cases firmware_fallback_platform() simply
always returns -ENOENT.
Reported-by: Dave Olsthoorn <dave@bewaar.me>
Suggested-by: Peter Jones <pjones@redhat.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-5-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
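From a driver's point of view, the new entry point mirrors request_firmware(); a hedged usage sketch (the firmware name is just an example):
	#include <linux/firmware.h>

	const struct firmware *fw;
	int ret;

	/* filesystem lookup first, then the EFI-embedded fallback */
	ret = firmware_request_platform(&fw, "silead/mssl1680.fw", dev);
	if (ret)
		return ret;	/* -ENOENT if no copy was found anywhere */

	/* ... upload fw->data (fw->size bytes) to the device ... */
	release_firmware(fw);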
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into driver-core-next
Ard writes:
Stable shared branch between EFI and driver tree
Stable shared branch to ease the integration of Hans's series to support
device firmware loaded from EFI boot service memory regions.
[PATCH v12 00/10] efi/firmware/platform-x86: Add EFI embedded fw support
https://lore.kernel.org/linux-efi/20200115163554.101315-1-hdegoede@redhat.com/
* tag 'stable-shared-branch-for-driver-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Add embedded peripheral firmware support
efi: Export boot-services code and data as debugfs-blobs
|
|
This feature should not be enabled in a release build, but it can be
useful for developers who need to monitor register accesses at specific
places. It helped me identify a bug in U-Boot, by comparing the register
accesses of the Linux driver with those of its U-Boot variant.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20200320065058.891221-1-tudor.ambarus@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
nmi_enter() does lockdep_off() and hence lockdep ignores everything.
And the NMI context makes it impossible to do full IN-NMI tracking like
we do IN-HARDIRQ; that could result in graph_lock recursion.
However, since look_up_lock_class() is lockless, we can find the class
of a lock that has prior use and detect IN-NMI after USED, just not
USED after IN-NMI.
NOTE: By shifting the lockdep_off() recursion count to bit-16, we can
easily differentiate between actual recursion and off.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20200221134215.090538203@infradead.org
|
|
A few sites want to assert that we own the graph_lock/lockdep_lock, so
provide a more conventional lock interface for it, with a number of
trivial debug checks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313102107.GX12561@hirez.programming.kicks-ass.net
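A hedged sketch of such an interface: a thin wrapper around the existing arch spinlock that records the owner so callers can assert ownership:
	static arch_spinlock_t __lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
	static struct task_struct *__owner;

	static inline void lockdep_lock(void)
	{
		arch_spin_lock(&__lock);
		__owner = current;
	}

	static inline void lockdep_unlock(void)
	{
		if (debug_locks && DEBUG_LOCKS_WARN_ON(__owner != current))
			return;
		__owner = NULL;
		arch_spin_unlock(&__lock);
	}

	static inline void lockdep_assert_locked(void)
	{
		WARN_ON_ONCE(__owner != current);
	}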
|
|
There were two patterns for lockdep_recursion:
Pattern-A:
	if (current->lockdep_recursion)
		return;
	current->lockdep_recursion = 1;
	/* do stuff */
	current->lockdep_recursion = 0;
Pattern-B:
	current->lockdep_recursion++;
	/* do stuff */
	current->lockdep_recursion--;
But a third pattern has emerged:
Pattern-C:
	current->lockdep_recursion = 1;
	/* do stuff */
	current->lockdep_recursion = 0;
And while this isn't broken per se, it is highly dangerous because it
doesn't nest properly.
Get rid of all Pattern-C instances and shore up Pattern-A with a
warning.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
|
|
Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:
[ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
...
[ ] Call Trace:
[ ] lock_is_held_type+0x5d/0x150
[ ] ? rcu_lockdep_current_cpu_online+0x64/0x80
[ ] rcu_read_lock_any_held+0xac/0x100
[ ] ? rcu_read_lock_held+0xc0/0xc0
[ ] ? __slab_free+0x421/0x540
[ ] ? kasan_kmalloc+0x9/0x10
[ ] ? __kmalloc_node+0x1d7/0x320
[ ] ? kvmalloc_node+0x6f/0x80
[ ] __bfs+0x28a/0x3c0
[ ] ? class_equal+0x30/0x30
[ ] lockdep_count_forward_deps+0x11a/0x1a0
The warning got triggered because lockdep_count_forward_deps() calls
__bfs() without current->lockdep_recursion being set; as a result a
lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.
Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency; therefore add a
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps().
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
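A hedged sketch of the resulting shape of lockdep_count_forward_deps() (details simplified; the backward variant is symmetric):
	unsigned long lockdep_count_forward_deps(struct lock_class *class)
	{
		unsigned long ret, flags;

		raw_local_irq_save(flags);
		current->lockdep_recursion++;	/* hide __bfs() from lockdep */
		arch_spin_lock(&lockdep_lock);
		ret = __lockdep_count_forward_deps(class);
		arch_spin_unlock(&lockdep_lock);
		current->lockdep_recursion--;
		raw_local_irq_restore(flags);

		return ret;
	}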
|
|
The IMC uncore unit in the Ice Lake server can only be accessed by MMIO,
which is similar to Snow Ridge.
Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake
server in the following patch.
No functional changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.com
|
|
The offset between uncore boxes of free-running counters varies, e.g.
IIO free-running counters on Ice Lake server.
Add box_offsets, an array of offsets between adjacent uncore boxes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.com
|
|
This NULL check is reversed so it leads to a Smatch warning and
presumably a NULL dereference.
kernel/events/core.c:1598 perf_event_groups_less()
error: we previously assumed 'right->cgrp->css.cgroup' could be null
(see line 1590)
Fixes: 95ed6c707f26 ("perf/cgroup: Order events in RB tree by cgroup id")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312105637.GA8960@mwanda
|
|
Kan and Andi reported that we fail to kill rotation when the flexible
events go empty, but the context does not. XXX moar
Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
Reported-by: Andi Kleen <ak@linux.intel.com>
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass.net
|
|
While looking at an objtool UACCESS warning, it suddenly occurred to me
that it is entirely possible to have an OPTPROBE right in the middle of
an UACCESS region.
In this case we must of course clear FLAGS.AC while running the KPROBE.
Luckily the trampoline already saves/restores [ER]FLAGS, so all we need
to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the
trampoline text, so we have to frob that manually.
Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
|
|
In update_sg_wakeup_stats(), the comment says:
	Computing avg_load makes sense only when group is fully
	busy or overloaded.
But the code below this comment does not perform that check.
From reading the avg_load code in other functions, I confirm that
avg_load should only be calculated in the fully busy or overloaded case.
The comment is correct and the checking condition is wrong, so change
that condition.
Fixes: 57abff067a08 ("sched/fair: Rework find_idlest_group()")
Signed-off-by: Tao Zhou <ouwen210@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/Message-ID:
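A hedged sketch of the corrected condition, matching the comment quoted above:
	/* Computing avg_load makes sense only when the group is fully
	 * busy or overloaded:
	 */
	if (sgs->group_type == group_fully_busy ||
	    sgs->group_type == group_overloaded)
		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
				sgs->group_capacity;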
|
|
If we fail to find a fitting CPU in cpupri_find(), we only fall back
to the level at which we found a hit.
But Steve suggested falling back to a second full scan instead, as this
could be a better effort.
https://lore.kernel.org/lkml/20200304135404.146c56eb@gandalf.local.home/
We trigger the 2nd search unconditionally, since the argument for
triggering a full search is that the recorded fallback level might have
become empty by then, which means storing any data about what happened
would be meaningless and stale.
I had a humble try at timing it, and it seemed okay for the small 6-CPU
system I was running on:
https://lore.kernel.org/lkml/20200305124324.42x6ehjxbnjkklnh@e107158-lin.cambridge.arm.com/
On large systems this second full scan could be expensive. But there are
no users outside capacity awareness for this fitness function at the
moment. Heterogeneous systems tend to be small, with 8 cores in total.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200310142219.syxzn5ljpdxqtbgx@e107158-lin.cambridge.arm.com
|
|
When we create a kthread with kthread_create_on_cpu(), the child thread
entry is kthread.c:kthread(), which will be preempted by the parent after
calling complete(done) while schedule() has not been called yet. The
parent will then call wait_task_inactive(child), but the child is still
on the runqueue, so the parent will schedule_hrtimeout() for 1 jiffy.
This wastes a lot of time, especially on startup.
	parent                          child
	kthread_create_on_cpu()
	  wait_for_completion(&done) -----> kthread.c:kthread()
	                              |---- complete(done); -- wakeup and
	                              |        preempted by parent
	  kthread_bind() <------------| |-> schedule(); -- dequeue here
	  wait_task_inactive(child)     |
	  schedule_hrtimeout(1 jiffy) --|
So we want the child to just wake up the parent without being preempted
by it, since the child is going to call schedule() soon anyway; then the
parent will not call schedule_hrtimeout(1 jiffy), as the child has
already been dequeued.
The same issue applies to kthread_park() && kthread_parkme().
This patch can save 120ms on rk312x startup with CONFIG_HZ=300.
Signed-off-by: Liang Chen <cl@rock-chips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com
|
|
During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balancer
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balancer will fail to pull a task and the spreading of the load
will be delayed.
This situation is quite transient, but it's possible to highlight the
effect with a short run of the sysbench test, where the time to spread
tasks impacts the global result significantly.
Below are the average results for 15 iterations on an arm64 octo core:
	sysbench --test=cpu --num-threads=8 --max-requests=1000 run

	                         tip/sched/core   +patchset
	total time:              172ms            158ms
	per-request statistics:
	  avg:                   1.337ms          1.244ms
	  max:                   21.191ms         10.753ms
The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.
Other factors like waiting for an idle load balance or cache hotness
can delay the spreading of the tasks, which explains why we can still
have up to 21ms with the patch.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
|
|
During our testing, we found a case where shares no longer work
correctly. The cgroup topology is like:
	/sys/fs/cgroup/cpu/A		(shares=102400)
	/sys/fs/cgroup/cpu/A/B		(shares=2)
	/sys/fs/cgroup/cpu/A/B/C	(shares=1024)
	/sys/fs/cgroup/cpu/D		(shares=1024)
	/sys/fs/cgroup/cpu/D/E		(shares=1024)
	/sys/fs/cgroup/cpu/D/E/F	(shares=1024)
The same benchmark runs in groups C and F, no other tasks are running,
and the benchmark is capable of consuming all the CPUs.
We would expect group C to win more CPU resources, since it can enjoy
all the shares of group A, but it is F that wins much more.
The reason is that group B has shares of 2; since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
A->cfs_rq.load.weight becomes very small.
And in calc_group_shares() we calculate shares as:
	load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
	shares = (tg_shares * load) / tg_weight;
Since 'cfs_rq->load.weight' is too small, the load becomes 0 after the
scale-down; although 'tg_shares' is 102400, the shares of the se that
stands for group A on the root cfs_rq become 2.
Meanwhile the se of D on the root cfs_rq is far bigger than 2, so it
wins the battle.
Thus when scale_load_down() scales the real weight down to 0, it no
longer tells the real story; the caller gets the wrong information and
the calculation becomes buggy.
This patch adds a check in scale_load_down() so that the real weight
will be >= MIN_SHARES after scaling; with it applied, group C wins as
expected.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
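A hedged sketch of the check: keep a non-zero weight from collapsing below MIN_SHARES when shifted down:
	/* Sketch: 2UL is MIN_SHARES; a non-zero weight must never scale
	 * down to 0, or group shares are computed from a lie.
	 */
	#define scale_load_down(w)					\
	({								\
		unsigned long __w = (w);				\
		if (__w)						\
			__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT);	\
		__w;							\
	})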
|
|
task->flags is a 32-bit field, of which 31 bits have already been
consumed, so it is hard to introduce another new per-process flag.
Currently there is still enough space in the bit-field section of
task_struct, so we can define the memstall state as a single bit in
task_struct instead.
This patch also removes an out-of-date comment pointed out by Matthew.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/1584408485-1921-1-git-send-email-laoar.shao@gmail.com
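A hedged sketch of the shape of the change: a one-bit field in task_struct's bit-field section replaces the PF_* flag, and call sites become plain assignments:
	/* in struct task_struct's bit-field section: */
	unsigned			in_memstall:1;

	/* call sites, before and after: */
	current->flags |= PF_MEMSTALL;	/* before */
	current->in_memstall = 1;	/* after  */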
|
|
Add a maintainer section for psi, as it's a user-visible, configurable
kernel feature.
The patches are still routed through the scheduler tree due to the
close integration with that code, but get_maintainers.pl does the
right thing and makes sure everybody gets CCd:
$ ./scripts/get_maintainer.pl -f kernel/sched/psi.c
Johannes Weiner <hannes@cmpxchg.org> (maintainer:PRESSURE STALL INFORMATION (PSI))
Ingo Molnar <mingo@redhat.com> (maintainer:SCHEDULER)
Peter Zijlstra <peterz@infradead.org> (maintainer:SCHEDULER)
...
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-4-hannes@cmpxchg.org
|
|
When switching tasks running on a CPU, the psi state of a cgroup
containing both of these tasks does not change. Right now, we don't
exploit that, and can perform many unnecessary state changes in nested
hierarchies, especially when most activity comes from one leaf cgroup.
This patch implements an optimization where we only update cgroups
whose state actually changes during a task switch. These are all
cgroups that contain one task but not the other, up to the first
shared ancestor. When both tasks are in the same group, we don't need
to update anything at all.
We can identify the first shared ancestor by walking the groups of the
incoming task until we see TSK_ONCPU set on the local CPU; that's the
first group that also contains the outgoing task.
The new psi_task_switch() is similar to psi_task_change(). To allow
code reuse, move the task flag maintenance code into a new function
and the poll/avg worker wakeups into the shared psi_group_change().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-3-hannes@cmpxchg.org
|
|
For simplicity, cpu pressure is defined as having more than one
runnable task on a given CPU. This works at the system level, but it
has limitations in a cgrouped reality: When cpu.max is in use, it
doesn't capture the time in which a task is not executing on the CPU
doesn't capture the time in which a task is not executing on the CPU
due to throttling. Likewise, it doesn't capture the time in which a
competing cgroup is occupying the CPU - meaning it only reflects
cgroup-internal competitive pressure, not outside pressure.
Enable tracking of currently executing tasks, and then change the
definition of cpu pressure in a cgroup from
NR_RUNNING > 1
to
NR_RUNNING > ON_CPU
which will capture the effects of cpu.max as well as competition from
outside the cgroup.
After this patch, a cgroup running `stress -c 1` with a cpu.max
setting of 5000 10000 shows ~50% continuous CPU pressure.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-2-hannes@cmpxchg.org
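A hedged sketch of the changed test in psi's state machine (other states elided):
	static bool test_state(unsigned int *tasks, enum psi_states state)
	{
		switch (state) {
		case PSI_CPU_SOME:
			/* was: tasks[NR_RUNNING] > 1 */
			return unlikely(tasks[NR_RUNNING] > tasks[NR_ONCPU]);
		/* ... other states unchanged ... */
		default:
			return false;
		}
	}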
|
|
Currently, when updating the affinity of tasks via either cpusets.cpus
or sched_setaffinity(), tasks not currently running within the newly
specified mask will be arbitrarily assigned to the first CPU within the
mask.
This (particularly in the case that we are restricting masks) can
result in many tasks being assigned to the first CPUs of their new
masks.
This:
1) Can induce scheduling delays while the load-balancer has a chance to
spread them between their new CPUs.
2) Can antagonize a poor load-balancer behavior where it has a
difficult time recognizing that a cross-socket imbalance has been
forced by an affinity mask.
This change adds a new cpumask interface to allow iterated calls to
distribute within the intersection of the provided masks.
The cases that this mainly affects are:
- modifying cpuset.cpus
- when tasks join a cpuset
- when modifying a task's affinity via sched_setaffinity(2)
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Tested-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
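A hedged sketch of such an iterated-distribution helper: remember the previous pick per CPU and continue the search after it, so repeated calls spread across the masks' intersection:
	static DEFINE_PER_CPU(int, distribute_cpu_mask_prev);

	int cpumask_any_and_distribute(const struct cpumask *src1p,
				       const struct cpumask *src2p)
	{
		int next, prev;

		/* NOTE: our first selection will skip 0 */
		prev = __this_cpu_read(distribute_cpu_mask_prev);

		next = cpumask_next_and(prev, src1p, src2p);
		if (next >= nr_cpu_ids)
			next = cpumask_first_and(src1p, src2p);

		if (next < nr_cpu_ids)
			__this_cpu_write(distribute_cpu_mask_prev, next);

		return next;
	}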
|