linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-03-20	phy: cadence-torrent: Use regmap to read and write DPTX PHY registers	Swapnil Jakhade
	Use regmap to read and write DPTX specific PHY registers. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Use regmap to read and write Torrent PHY registers	Swapnil Jakhade
	Use regmap for accessing Torrent PHY registers. Modify register offsets as defined in Torrent PHY user guide. Abstract address calculation using regmap APIs. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Implement PHY configure APIs	Swapnil Jakhade
	Add support for PHY configuration APIs. These will mainly reconfigure link rate, number of lanes, voltage swing and pre-emphasis values. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Add 19.2 MHz reference clock support	Swapnil Jakhade
	Add configuration functions for 19.2 MHz refclock support. Add register configurations for SSC support. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Refactor code for reusability	Swapnil Jakhade
	Add a separate function to set different power state values. Use uniform polling timeout value. Also check return values of functions for proper error handling. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Add wrapper for DPTX register access	Swapnil Jakhade
	Add wrapper functions to read, write DisplayPort specific PHY registers to improve code readability. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Add wrapper for PHY register access	Swapnil Jakhade
	Add a wrapper function to write Torrent PHY registers to improve code readability. Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-torrent: Adopt Torrent nomenclature	Swapnil Jakhade
	- Change private data struct cdns_dp_phy to cdns_torrent_phy - Change module description and registration accordingly - Generic torrent functions have prefix cdns_torrent_phy_* - Functions specific to Torrent phy for DisplayPort are prefixed as cdns_torrent_dp_* Signed-off-by: Swapnil Jakhade <sjakhade@cadence.com> Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	phy: cadence-dp: Rename to phy-cadence-torrent	Yuti Amonkar
	Rename Cadence DP PHY driver from phy-cadence-dp to phy-cadence-torrent to make it more generic for future use. Modifiy Makefile and Kconfig accordingly. Also, change driver compatible from "cdns,dp-phy" to "cdns,torrent-phy".This will not affect ABI as the driver has never been functional, and therefore do not exist in any active use case. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	dt-bindings: phy: Add Cadence MHDP PHY bindings in YAML format.	Yuti Amonkar
	- Add Cadence MHDP PHY bindings in YAML format. - Add Torrent PHY reference clock bindings. - Add sub-node bindings for each group of PHY lanes based on PHY type. Each sub-node includes properties such as master lane number, link reset, phy type, number of lanes etc. - Add reset support including PHY reset and individual lane reset. - Add a new compatible string used for TI SoCs using Torrent PHY. This will not affect ABI as the driver has never been functional, and therefore do not exist in any active use case. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-03-20	platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet	Hans de Goede
	Add touchscreen info for the Chuwi Vi8 Plus tablet. This tablet uses a Chipone ICN8505 touchscreen controller, with the firmware used by the touchscreen embedded in the EFI firmware. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-11-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	platform/x86: touchscreen_dmi: Add EFI embedded firmware info support	Hans de Goede
	Sofar we have been unable to get permission from the vendors to put the firmware for touchscreens listed in touchscreen_dmi in linux-firmware. Some of the tablets with such a touchscreen have a touchscreen driver, and thus a copy of the firmware, as part of their EFI code. This commit adds the necessary info for the new EFI embedded-firmware code to extract these firmwares, making the touchscreen work OOTB without the user needing to manually add the firmware. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-10-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	Input: icn8505 - Switch to firmware_request_platform for retreiving the fw	Hans de Goede
	Unfortunately sofar we have been unable to get permission to redistribute icn8505 touchscreen firmwares in linux-firmware. This means that people need to find and install the firmware themselves before the touchscreen will work Some UEFI/x86 tablets with an icn8505 touchscreen have a copy of the fw embedded in their UEFI boot-services code. This commit makes the icn8505 driver use the new firmware_request_platform function, which will fallback to looking for such an embedded copy when direct filesystem lookup fails. This will make the touchscreen work OOTB on devices where there is a fw copy embedded in the UEFI code. Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-9-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	Input: silead - Switch to firmware_request_platform for retreiving the fw	Hans de Goede
	Unfortunately sofar we have been unable to get permission to redistribute Silead touchscreen firmwares in linux-firmware. This means that people need to find and install the firmware themselves before the touchscreen will work Some UEFI/x86 tablets with a Silead touchscreen have a copy of the fw embedded in their UEFI boot-services code. This commit makes the silead driver use the new firmware_request_platform function, which will fallback to looking for such an embedded copy when direct filesystem lookup fails. This will make the touchscreen work OOTB on devices where there is a fw copy embedded in the UEFI code. Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-8-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	selftests: firmware: Add firmware_request_platform tests	Hans de Goede
	Add tests cases for checking the new firmware_request_platform api. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20200115163554.101315-7-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	test_firmware: add support for firmware_request_platform	Hans de Goede
	Add support for testing firmware_request_platform through a new trigger_request_platform trigger. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20200115163554.101315-6-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	firmware: Add new platform fallback mechanism and firmware_request_platform()	Hans de Goede
	In some cases the platform's main firmware (e.g. the UEFI fw) may contain an embedded copy of device firmware which needs to be (re)loaded into the peripheral. Normally such firmware would be part of linux-firmware, but in some cases this is not feasible, for 2 reasons: 1) The firmware is customized for a specific use-case of the chipset / use with a specific hardware model, so we cannot have a single firmware file for the chipset. E.g. touchscreen controller firmwares are compiled specifically for the hardware model they are used with, as they are calibrated for a specific model digitizer. 2) Despite repeated attempts we have failed to get permission to redistribute the firmware. This is especially a problem with customized firmwares, these get created by the chip vendor for a specific ODM and the copyright may partially belong with the ODM, so the chip vendor cannot give a blanket permission to distribute these. This commit adds a new platform fallback mechanism to the firmware loader which will try to lookup a device fw copy embedded in the platform's main firmware if direct filesystem lookup fails. Drivers which need such embedded fw copies can enable this fallback mechanism by using the new firmware_request_platform() function. Note that for now this is only supported on EFI platforms and even on these platforms firmware_fallback_platform() only works if CONFIG_EFI_EMBEDDED_FIRMWARE is enabled (this gets selected by drivers which need this), in all other cases firmware_fallback_platform() simply always returns -ENOENT. Reported-by: Dave Olsthoorn <dave@bewaar.me> Suggested-by: Peter Jones <pjones@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-5-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20	Merge tag 'stable-shared-branch-for-driver-tree' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into driver-core-next Ard writes: Stable shared branch between EFI and driver tree Stable shared branch to ease the integration of Hans's series to support device firmware loaded from EFI boot service memory regions. [PATCH v12 00/10] efi/firmware/platform-x86: Add EFI embedded fw support https://lore.kernel.org/linux-efi/20200115163554.101315-1-hdegoede@redhat.com/ * tag 'stable-shared-branch-for-driver-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Add embedded peripheral firmware support efi: Export boot-services code and data as debugfs-blobs
2020-03-20	spi: atmel-quadspi: Add verbose debug facilities to monitor register accesses	Tudor Ambarus
	This feature should not be enabled in release but can be useful for developers who need to monitor register accesses at some specific places. Helped me identify a bug in u-boot, by comparing the register accesses from the linux driver with the ones from its u-boot variant. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20200320065058.891221-1-tudor.ambarus@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-03-20	lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions	Peter Zijlstra
	nmi_enter() does lockdep_off() and hence lockdep ignores everything. And NMI context makes it impossible to do full IN-NMI tracking like we do IN-HARDIRQ, that could result in graph_lock recursion. However, since look_up_lock_class() is lockless, we can find the class of a lock that has prior use and detect IN-NMI after USED, just not USED after IN-NMI. NOTE: By shifting the lockdep_off() recursion count to bit-16, we can easily differentiate between actual recursion and off. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lkml.kernel.org/r/20200221134215.090538203@infradead.org
2020-03-20	locking/lockdep: Rework lockdep_lock	Peter Zijlstra
	A few sites want to assert we own the graph_lock/lockdep_lock, provide a more conventional lock interface for it with a number of trivial debug checks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313102107.GX12561@hirez.programming.kicks-ass.net
2020-03-20	locking/lockdep: Fix bad recursion pattern	Peter Zijlstra
	There were two patterns for lockdep_recursion: Pattern-A: if (current->lockdep_recursion) return current->lockdep_recursion = 1; /* do stuff / current->lockdep_recursion = 0; Pattern-B: current->lockdep_recursion++; / do stuff / current->lockdep_recursion--; But a third pattern has emerged: Pattern-C: current->lockdep_recursion = 1; / do stuff */ current->lockdep_recursion = 0; And while this isn't broken per-se, it is highly dangerous because it doesn't nest properly. Get rid of all Pattern-C instances and shore up Pattern-A with a warning. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
2020-03-20	locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()	Boqun Feng
	Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep triggered a warning: [ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) ... [ ] Call Trace: [ ] lock_is_held_type+0x5d/0x150 [ ] ? rcu_lockdep_current_cpu_online+0x64/0x80 [ ] rcu_read_lock_any_held+0xac/0x100 [ ] ? rcu_read_lock_held+0xc0/0xc0 [ ] ? __slab_free+0x421/0x540 [ ] ? kasan_kmalloc+0x9/0x10 [ ] ? __kmalloc_node+0x1d7/0x320 [ ] ? kvmalloc_node+0x6f/0x80 [ ] __bfs+0x28a/0x3c0 [ ] ? class_equal+0x30/0x30 [ ] lockdep_count_forward_deps+0x11a/0x1a0 The warning got triggered because lockdep_count_forward_deps() call __bfs() without current->lockdep_recursion being set, as a result a lockdep internal function (__bfs()) is checked by lockdep, which is unexpected, and the inconsistency between the irq-off state and the state traced by lockdep caused the warning. Apart from this warning, lockdep internal functions like __bfs() should always be protected by current->lockdep_recursion to avoid potential deadlocks and data inconsistency, therefore add the current->lockdep_recursion on-and-off section to protect __bfs() in both lockdep_count_forward_deps() and lockdep_count_backward_deps() Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
2020-03-20	perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box	Kan Liang
	The IMC uncore unit in Ice Lake server can only be accessed by MMIO, which is similar as Snow Ridge. Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake server in the following patch. No functional changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.com
2020-03-20	perf/x86/intel/uncore: Add box_offsets for free-running counters	Kan Liang
	The offset between uncore boxes of free-running counters varies, e.g. IIO free-running counters on Ice Lake server. Add box_offsets, an array of offsets between adjacent uncore boxes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.com
2020-03-20	perf/core: Fix reversed NULL check in perf_event_groups_less()	Dan Carpenter
	This NULL check is reversed so it leads to a Smatch warning and presumably a NULL dereference. kernel/events/core.c:1598 perf_event_groups_less() error: we previously assumed 'right->cgrp->css.cgroup' could be null (see line 1590) Fixes: 95ed6c707f26 ("perf/cgroup: Order events in RB tree by cgroup id") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200312105637.GA8960@mwanda
2020-03-20	perf/core: Fix endless multiplex timer	Peter Zijlstra
	Kan and Andi reported that we fail to kill rotation when the flexible events go empty, but the context does not. XXX moar Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily") Reported-by: Andi Kleen <ak@linux.intel.com> Reported-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass.net
2020-03-20	x86/optprobe: Fix OPTPROBE vs UACCESS	Peter Zijlstra
	While looking at an objtool UACCESS warning, it suddenly occurred to me that it is entirely possible to have an OPTPROBE right in the middle of an UACCESS region. In this case we must of course clear FLAGS.AC while running the KPROBE. Luckily the trampoline already saves/restores [ER]FLAGS, so all we need to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the trampoline text, so we have to frob that manually. Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
2020-03-20	sched/fair: Fix condition of avg_load calculation	Tao Zhou
	In update_sg_wakeup_stats(), the comment says: Computing avg_load makes sense only when group is fully busy or overloaded. But, the code below this comment does not check like this. From reading the code about avg_load in other functions, I confirm that avg_load should be calculated in fully busy or overloaded case. The comment is correct and the checking condition is wrong. So, change that condition. Fixes: 57abff067a08 ("sched/fair: Rework find_idlest_group()") Signed-off-by: Tao Zhou <ouwen210@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/Message-ID:
2020-03-20	sched/rt: cpupri_find: Trigger a full search as fallback	Qais Yousef
	If we failed to find a fitting CPU, in cpupri_find(), we only fallback to the level we found a hit at. But Steve suggested to fallback to a second full scan instead as this could be a better effort. https://lore.kernel.org/lkml/20200304135404.146c56eb@gandalf.local.home/ We trigger the 2nd search unconditionally since the argument about triggering a full search is that the recorded fall back level might have become empty by then. Which means storing any data about what happened would be meaningless and stale. I had a humble try at timing it and it seemed okay for the small 6 CPUs system I was running on https://lore.kernel.org/lkml/20200305124324.42x6ehjxbnjkklnh@e107158-lin.cambridge.arm.com/ On large system this second full scan could be expensive. But there are no users outside capacity awareness for this fitness function at the moment. Heterogeneous systems tend to be small with 8cores in total. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20200310142219.syxzn5ljpdxqtbgx@e107158-lin.cambridge.arm.com
2020-03-20	kthread: Do not preempt current task if it is going to call schedule()	Liang Chen
	when we create a kthread with ktrhead_create_on_cpu(),the child thread entry is ktread.c:ktrhead() which will be preempted by the parent after call complete(done) while schedule() is not called yet,then the parent will call wait_task_inactive(child) but the child is still on the runqueue, so the parent will schedule_hrtimeout() for 1 jiffy,it will waste a lot of time,especially on startup. parent child ktrhead_create_on_cpu() wait_fo_completion(&done) -----> ktread.c:ktrhead() \|----- complete(done);--wakeup and preempted by parent kthread_bind() <------------\| \|-> schedule();--dequeue here wait_task_inactive(child) \| schedule_hrtimeout(1 jiffy) -\| So we hope the child just wakeup parent but not preempted by parent, and the child is going to call schedule() soon,then the parent will not call schedule_hrtimeout(1 jiffy) as the child is already dequeue. The same issue for ktrhead_park()&&kthread_parkme(). This patch can save 120ms on rk312x startup with CONFIG_HZ=300. Signed-off-by: Liang Chen <cl@rock-chips.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com
2020-03-20	sched/fair: Improve spreading of utilization	Vincent Guittot
	During load_balancing, a group with spare capacity will try to pull some utilizations from an overloaded group. In such case, the load balance looks for the runqueue with the highest utilization. Nevertheless, it should also ensure that there are some pending tasks to pull otherwise the load balance will fail to pull a task and the spread of the load will be delayed. This situation is quite transient but it's possible to highlight the effect with a short run of sysbench test so the time to spread task impacts the global result significantly. Below are the average results for 15 iterations on an arm64 octo core: sysbench --test=cpu --num-threads=8 --max-requests=1000 run tip/sched/core +patchset total time: 172ms 158ms per-request statistics: avg: 1.337ms 1.244ms max: 21.191ms 10.753ms The average max doesn't fully reflect the wide spread of the value which ranges from 1.350ms to more than 41ms for the tip/sched/core and from 1.350ms to 21ms with the patch. Other factors like waiting for an idle load balance or cache hotness can delay the spreading of the tasks which explains why we can still have up to 21ms with the patch. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
2020-03-20	sched: Avoid scale real weight down to zero	Michael Wang
	During our testing, we found a case that shares no longer working correctly, the cgroup topology is like: /sys/fs/cgroup/cpu/A (shares=102400) /sys/fs/cgroup/cpu/A/B (shares=2) /sys/fs/cgroup/cpu/A/B/C (shares=1024) /sys/fs/cgroup/cpu/D (shares=1024) /sys/fs/cgroup/cpu/D/E (shares=1024) /sys/fs/cgroup/cpu/D/E/F (shares=1024) The same benchmark is running in group C & F, no other tasks are running, the benchmark is capable to consumed all the CPUs. We suppose the group C will win more CPU resources since it could enjoy all the shares of group A, but it's F who wins much more. The reason is because we have group B with shares as 2, since A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus, so A->cfs_rq.load.weight become very small. And in calc_group_shares() we calculate shares as: load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg); shares = (tg_shares * load) / tg_weight; Since the 'cfs_rq->load.weight' is too small, the load become 0 after scale down, although 'tg_shares' is 102400, shares of the se which stand for group A on root cfs_rq become 2. While the se of D on root cfs_rq is far more bigger than 2, so it wins the battle. Thus when scale_load_down() scale real weight down to 0, it's no longer telling the real story, the caller will have the wrong information and the calculation will be buggy. This patch add check in scale_load_down(), so the real weight will be >= MIN_SHARES after scale, after applied the group C wins as expected. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
2020-03-20	psi: Move PF_MEMSTALL out of task->flags	Yafang Shao
	The task->flags is a 32-bits flag, in which 31 bits have already been consumed. So it is hardly to introduce other new per process flag. Currently there're still enough spaces in the bit-field section of task_struct, so we can define the memstall state as a single bit in task_struct instead. This patch also removes an out-of-date comment pointed by Matthew. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/1584408485-1921-1-git-send-email-laoar.shao@gmail.com
2020-03-20	MAINTAINERS: Add maintenance information for psi	Johannes Weiner
	Add a maintainer section for psi, as it's a user-visible, configurable kernel feature. The patches are still routed through the scheduler tree due to the close integration with that code, but get_maintainers.pl does the right thing and makes sure everybody gets CCd: $ ./scripts/get_maintainer.pl -f kernel/sched/psi.c Johannes Weiner <hannes@cmpxchg.org> (maintainer:PRESSURE STALL INFORMATION (PSI)) Ingo Molnar <mingo@redhat.com> (maintainer:SCHEDULER) Peter Zijlstra <peterz@infradead.org> (maintainer:SCHEDULER) ... Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200316191333.115523-4-hannes@cmpxchg.org
2020-03-20	psi: Optimize switching tasks inside shared cgroups	Johannes Weiner
	When switching tasks running on a CPU, the psi state of a cgroup containing both of these tasks does not change. Right now, we don't exploit that, and can perform many unnecessary state changes in nested hierarchies, especially when most activity comes from one leaf cgroup. This patch implements an optimization where we only update cgroups whose state actually changes during a task switch. These are all cgroups that contain one task but not the other, up to the first shared ancestor. When both tasks are in the same group, we don't need to update anything at all. We can identify the first shared ancestor by walking the groups of the incoming task until we see TSK_ONCPU set on the local CPU; that's the first group that also contains the outgoing task. The new psi_task_switch() is similar to psi_task_change(). To allow code reuse, move the task flag maintenance code into a new function and the poll/avg worker wakeups into the shared psi_group_change(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200316191333.115523-3-hannes@cmpxchg.org
2020-03-20	psi: Fix cpu.pressure for cpu.max and competing cgroups	Johannes Weiner
	For simplicity, cpu pressure is defined as having more than one runnable task on a given CPU. This works on the system-level, but it has limitations in a cgrouped reality: When cpu.max is in use, it doesn't capture the time in which a task is not executing on the CPU due to throttling. Likewise, it doesn't capture the time in which a competing cgroup is occupying the CPU - meaning it only reflects cgroup-internal competitive pressure, not outside pressure. Enable tracking of currently executing tasks, and then change the definition of cpu pressure in a cgroup from NR_RUNNING > 1 to NR_RUNNING > ON_CPU which will capture the effects of cpu.max as well as competition from outside the cgroup. After this patch, a cgroup running `stress -c 1` with a cpu.max setting of 5000 10000 shows ~50% continuous CPU pressure. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200316191333.115523-2-hannes@cmpxchg.org
2020-03-20	sched/core: Distribute tasks within affinity masks	Paul Turner
	Currently, when updating the affinity of tasks via either cpusets.cpus, or, sched_setaffinity(); tasks not currently running within the newly specified mask will be arbitrarily assigned to the first CPU within the mask. This (particularly in the case that we are restricting masks) can result in many tasks being assigned to the first CPUs of their new masks. This: 1) Can induce scheduling delays while the load-balancer has a chance to spread them between their new CPUs. 2) Can antogonize a poor load-balancer behavior where it has a difficult time recognizing that a cross-socket imbalance has been forced by an affinity mask. This change adds a new cpumask interface to allow iterated calls to distribute within the intersection of the provided masks. The cases that this mainly affects are: - modifying cpuset.cpus - when tasks join a cpuset - when modifying a task's affinity via sched_setaffinity(2) Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Qais Yousef <qais.yousef@arm.com> Tested-by: Qais Yousef <qais.yousef@arm.com> Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
2020-03-20	sched/fair: Fix enqueue_task_fair warning	Vincent Guittot
	When a cfs rq is throttled, the latter and its child are removed from the leaf list but their nr_running is not changed which includes staying higher than 1. When a task is enqueued in this throttled branch, the cfs rqs must be added back in order to ensure correct ordering in the list but this can only happens if nr_running == 1. When cfs bandwidth is used, we call unconditionnaly list_add_leaf_cfs_rq() when enqueuing an entity to make sure that the complete branch will be added. Similarly unthrottle_cfs_rq() can stop adding cfs in the list when a parent is throttled. Iterate the remaining entity to ensure that the complete branch will be added in the list. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org #v5.1+ Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.org
2020-03-20	arm64: alternative: fix build with clang integrated assembler	Ilie Halip
	Building an arm64 defconfig with clang's integrated assembler, this error occurs: <instantiation>:2:2: error: unrecognized instruction mnemonic _ASM_EXTABLE 9999b, 9f ^ arch/arm64/mm/cache.S:50:1: note: while in macro instantiation user_alt 9f, "dc cvau, x4", "dc civac, x4", 0 ^ While GNU as seems fine with case-sensitive macro instantiations, clang doesn't, so use the actual macro name (_asm_extable) as in the rest of the file. Also checked that the generated assembly matches the GCC output. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Fixes: 290622efc76e ("arm64: fix "dc cvau" cache operation on errata-affected core") Link: https://github.com/ClangBuiltLinux/linux/issues/924 Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-20	media: allegro: create new struct for channel parameters	Michael Tretter
	Add a new struct for the channel parameters that is contained in the CREATE_CHANNEL message. This is in preparation for newer firmwares that pass the channel parameters in a dedicated buffer instead of embedding the parameters into the CREATE_CHANNEL message. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: move mail definitions to separate file	Michael Tretter
	Move the mail definitions from the driver core to a dedicated file. The mails that are exchanged between driver and firmware are not stable across firmware versions. This is in preparation to make the driver able to handle multiple firmware version by having dedicated code for handling mails. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: pass buffers through firmware	Michael Tretter
	As we know which buffers are processed by the codec from the address in the ENCODE_FRAME response, we can queue multiple buffers in the firmware and retrieve the buffer from the response of the firmware. This enables the firmware to use the internal scheduling the codec and avoids round trips through the driver when fetching the next frame. Remove buffers that have been passed to the firmware from the m2m buffer queue and put them into a shadow queue for tracking the buffer in the driver. When we receive a ENCODE_FRAME response from the firmware, get the buffer from the shadow queue and finish the buffer. Furthermore, it is necessary to finish the job straight after passing the buffer to the firmware to allow the V4L2 framework to send further buffers to the driver. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: verify source and destination buffer in VCU response	Michael Tretter
	The PUT_STREAM_BUFFER and ENCODE_FRAME request have fields that allow to pass arbitrary 64 bit values through the firmware to the ENCODE_FRAME response. Use these values to verify that the buffers when finishing the frame are actually the same buffers that have been sent for encoding a frame. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: handle dependency of bitrate and bitrate_peak	Michael Tretter
	The peak bitrate must not be smaller than the configured bitrate. Update the other control whenever one of the controls changes to reflect this dependency. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: read bitrate mode directly from control	Michael Tretter
	There is no need to copy the bitrate mode to a field in the channel and the value can be read directly from the control. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: make QP configurable	Michael Tretter
	The V4L2_CID_MPEG_VIDEO_FRAME_RC_ENABLE control allows to enable/disable rate control on a channel. When rate control is disabled, the driver shall use constant QP, which are set by the application. Also implement the controls for configuring the QP. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: make frame rate configurable	Michael Tretter
	The allegro dvt codec adjust the encoding speed according to a configured frame rate. Furthermore, the frame rate is written into the coded stream. Ensure that the coded video data has the correct frame rate by implementing s_parm for setting the frame rate from user space. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: skip filler data if possible	Michael Tretter
	The driver instructs the firmware to leave some space space in front of the coded frame data for SPS/PPS data. If the driver receives an IDR, it writes the SPS/PPS into that free space and fills the rest with filler data. However, if there is no additional data, the driver can use the plane offset to skip this space instead of adding filler data. As the size of the SPS/PPS is only available after writing it, keep the filler data between the SPS/PPS and the coded frame data. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-03-20	media: allegro: warn if response message has an unexpected size	Michael Tretter
	The driver uses structs to parse the responses from the VCU and expects a certain size of the responses. However, the size and format of the mails is not stable across firmware versions. Therefore, print a warning if the size does not match the expected size to warn the user that strange things might happen. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>