summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-17KVM: x86/mmu: Factor out allocating memslot rmapBen Gardon
Small refactor to facilitate allocating rmaps for all memslots at once. No functional change expected. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86/mmu: Deduplicate rmap freeingBen Gardon
Small code deduplication. No functional change expected. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Do not write protect huge page in initially-all-set modeKeqian Zhu
Currently, when dirty logging is started in initially-all-set mode, we write protect huge pages to prepare for splitting them into 4K pages, and leave normal pages untouched as the logging will be enabled lazily as dirty bits are cleared. However, enabling dirty logging lazily is also feasible for huge pages. This not only reduces the time of start dirty logging, but it also greatly reduces side-effect on guest when there is high dirty rate. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210429034115.35560-3-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Support write protecting only large pagesKeqian Zhu
Prepare for write protecting large page lazily during dirty log tracking, for which we will only need to write protect gfns at large page granularity. No functional or performance change expected. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210429034115.35560-2-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: hyper-v: Advertise support for fast XMM hypercallsSiddharth Chandrasekaran
Now that kvm_hv_flush_tlb() has been patched to support XMM hypercall inputs, we can start advertising this feature to guests. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <e63fc1c61dd2efecbefef239f4f0a598bd552750.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: kvm_hv_flush_tlb use inputs from XMM registersSiddharth Chandrasekaran
Hyper-V supports the use of XMM registers to perform fast hypercalls. This allows guests to take advantage of the improved performance of the fast hypercall interface even though a hypercall may require more than (the current maximum of) two input registers. The XMM fast hypercall interface uses six additional XMM registers (XMM0 to XMM5) to allow the guest to pass an input parameter block of up to 112 bytes. Add framework to read from XMM registers in kvm_hv_hypercall() and use the additional hypercall inputs from XMM registers in kvm_hv_flush_tlb() when possible. Cc: Alexander Graf <graf@amazon.com> Co-developed-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <fc62edad33f1920fe5c74dde47d7d0b4275a9012.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: hyper-v: Collect hypercall params into structSiddharth Chandrasekaran
As of now there are 7 parameters (and flags) that are used in various hyper-v hypercall handlers. There are 6 more input/output parameters passed from XMM registers which are to be added in an upcoming patch. To make passing arguments to the handlers more readable, capture all these parameters into a single structure. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <273f7ed510a1f6ba177e61b73a5c7bfbee4a4a87.1622019133.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Move FPU register accessors into fpu.hSiddharth Chandrasekaran
Hyper-v XMM fast hypercalls use XMM registers to pass input/output parameters. To access these, hyperv.c can reuse some FPU register accessors defined in emulator.c. Move them to a common location so both can access them. While at it, reorder the parameters of these accessor methods to make them more readable. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <01a85a6560714d4d3637d3d86e5eba65073318fa.1622019133.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86/mmu: Make is_nx_huge_page_enabled an inline functionShaokun Zhang
Function 'is_nx_huge_page_enabled' is called only by kvm/mmu, so make it as inline fucntion and remove the unnecessary declaration. Cc: Ben Gardon <bgardon@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Message-Id: <1622102271-63107-1-git-send-email-zhangshaokun@hisilicon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: selftests: Fix kvm_check_cap() assertionFuad Tabba
KVM_CHECK_EXTENSION ioctl can return any negative value on error, and not necessarily -1. Change the assertion to reflect that. Signed-off-by: Fuad Tabba <tabba@google.com> Message-Id: <20210615150443.1183365-1-tabba@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17arm64: suspend: Use cpuidle context helpers in cpu_suspend()Marc Zyngier
Use cpuidle context helpers to switch to using DAIF.IF instead of PMR to mask interrupts, ensuring that we suspend with interrupts being able to reach the CPU interface. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20210615111227.2454465-5-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-06-17PSCI: Use cpuidle context helpers in psci_cpu_suspend_enter()Marc Zyngier
The PSCI CPU suspend code isn't aware of the PMR vs DAIF game, resulting in a system that locks up if entering CPU suspend with GICv3 pNMI enabled. To save the day, teach the suspend code about our new cpuidle context helpers, which will do everything that's required just like the usual WFI cpuidle code. This fixes my Altra system, which would otherwise lock-up at boot time when booted with irqchip.gicv3_pseudo_nmi=1. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20210615111227.2454465-4-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-06-17arm64: Convert cpu_do_idle() to using cpuidle context helpersMarc Zyngier
Now that we have helpers that are aware of the pseudo-NMI feature, introduce them to cpu_do_idle(). This allows for some nice cleanup. No functional change intended. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210615111227.2454465-3-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-06-17arm64: Add cpuidle context save/restore helpersMarc Zyngier
As we need to start doing some additional work on all idle paths, let's introduce a set of macros that will perform the work related to the GICv3 pseudo-NMI idle entry exit. Stubs are introduced to 32bit ARM for compatibility. As these helpers are currently unused, there is no functional change. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210615111227.2454465-2-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-06-17Merge tag 'fixes_for_v5.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and fanotify fixes from Jan Kara: "A fixup finishing disabling of quotactl_path() syscall (I've missed archs using different way to declare syscalls) and a fix of an fd leak in error handling path of fanotify" * tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: finish disable quotactl_path syscall fanotify: fix copy_event_to_user() fid error clean up
2021-06-17usb: core: hub: Disable autosuspend for Cypress CY7C65632Andrew Lunn
The Cypress CY7C65632 appears to have an issue with auto suspend and detecting devices, not too dissimilar to the SMSC 5534B hub. It is easiest to reproduce by connecting multiple mass storage devices to the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts result in the devices not being detected. It is however possible to make them appear using lsusb -v. Disabling autosuspend for this hub resolves the issue. Fixes: 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub") Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210614155524.2228800-1-andrew@lunn.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-17Merge tag 'irqchip-fixes-5.13-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Fix GICv3 NMI handling where an IRQ could be mistakenly handled as a NMI, with disatrous effects Link: https://lore.kernel.org/r/20210610171127.2404752-1-maz@kernel.org
2021-06-17spi: xilinx: convert to yamlNobuhiro Iwamatsu
Convert SPI for Xilinx bindings documentation to YAML schemas. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210605002931.858031-1-iwamatsu@nigauri.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-17spi: convert Cadence SPI bindings to YAMLNobuhiro Iwamatsu
Convert spi for Cadence SPI bindings documentation to YAML. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210605003811.858676-1-iwamatsu@nigauri.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-17perf/x86: Reset the dirty counter to prevent the leak for an RDPMC taskKan Liang
The counter value of a perf task may leak to another RDPMC task. For example, a perf stat task as below is running on CPU 0. perf stat -e 'branches,cycles' -- taskset -c 0 ./workload In the meantime, an RDPMC task, which is also running on CPU 0, may read the GP counters periodically. (The RDPMC task creates a fixed event, but read four GP counters.) $./rdpmc_read_all_counters index 0x0 value 0x8001e5970f99 index 0x1 value 0x8005d750edb6 index 0x2 value 0x0 index 0x3 value 0x0 index 0x0 value 0x8002358e48a5 index 0x1 value 0x8006bd1e3bc9 index 0x2 value 0x0 index 0x3 value 0x0 It is a potential security issue. Once the attacker knows what the other thread is counting. The PerfMon counter can be used as a side-channel to attack cryptosystems. The counter value of the perf stat task leaks to the RDPMC task because perf never clears the counter when it's stopped. Three methods were considered to address the issue. - Unconditionally reset the counter in x86_pmu_del(). It can bring extra overhead even when there is no RDPMC task running. - Only reset the un-assigned dirty counters when the RDPMC task is scheduled in via sched_task(). It fails for the below case. Thread A Thread B clone(CLONE_THREAD) ---> set_affine(0) set_affine(1) while (!event-enabled) ; event = perf_event_open() mmap(event) ioctl(event, IOC_ENABLE); ---> RDPMC Counters are still leaked to the thread B. - Only reset the un-assigned dirty counters before updating the CR4.PCE bit. The method is implemented here. The dirty counter is a counter, on which the assigned event has been deleted, but the counter is not reset. To track the dirty counters, add a 'dirty' variable in the struct cpu_hw_events. The security issue can only be found with an RDPMC task. To enable the RDMPC, the CR4.PCE bit has to be updated. Add a perf_clear_dirty_counters() right before updating the CR4.PCE bit to clear the existing dirty counters. Only the current un-assigned dirty counters are reset, because the RDPMC assigned dirty counters will be updated soon. After applying the patch, $ ./rdpmc_read_all_counters index 0x0 value 0x0 index 0x1 value 0x0 index 0x2 value 0x0 index 0x3 value 0x0 index 0x0 value 0x0 index 0x1 value 0x0 index 0x2 value 0x0 index 0x3 value 0x0 Performance The performance of a context switch only be impacted when there are two or more perf users and one of the users must be an RDPMC user. In other cases, there is no performance impact. The worst-case occurs when there are two users: the RDPMC user only uses one counter; while the other user uses all available counters. When the RDPMC task is scheduled in, all the counters, other than the RDPMC assigned one, have to be reset. Test results for the worst-case, using a modified lat_ctx as measured on an Ice Lake platform, which has 8 GP and 3 FP counters (ignoring SLOTS). lat_ctx -s 128K -N 1000 processes 2 Without the patch: The context switch time is 4.97 us With the patch: The context switch time is 5.16 us There is ~4% performance drop for the context switching time in the worst-case. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1623693582-187370-1-git-send-email-kan.liang@linux.intel.com
2021-06-17sched/fair: Age the average idle timePeter Zijlstra
This is a partial forward-port of Peter Ziljstra's work first posted at: https://lore.kernel.org/lkml/20180530142236.667774973@infradead.org/ Currently select_idle_cpu()'s proportional scheme uses the average idle time *for when we are idle*, that is temporally challenged. When a CPU is not at all idle, we'll happily continue using whatever value we did see when the CPU goes idle. To fix this, introduce a separate average idle and age it (the existing value still makes sense for things like new-idle balancing, which happens when we do go idle). The overall goal is to not spend more time scanning for idle CPUs than we're idle for. Otherwise we're inhibiting work. This means that we need to consider the cost over all the wake-ups between consecutive idle periods. To track this, the scan cost is subtracted from the estimated average idle time. The impact of this patch is related to workloads that have domains that are fully busy or overloaded. Without the patch, the scan depth may be too high because a CPU is not reaching idle. Due to the nature of the patch, this is a regression magnet. It potentially wins when domains are almost fully busy or overloaded -- at that point searches are likely to fail but idle is not being aged as CPUs are active so search depth is too large and useless. It will potentially show regressions when there are idle CPUs and a deep search is beneficial. This tbench result on a 2-socket broadwell machine partially illustates the problem 5.13.0-rc2 5.13.0-rc2 vanilla sched-avgidle-v1r5 Hmean 1 445.02 ( 0.00%) 451.36 * 1.42%* Hmean 2 830.69 ( 0.00%) 846.03 * 1.85%* Hmean 4 1350.80 ( 0.00%) 1505.56 * 11.46%* Hmean 8 2888.88 ( 0.00%) 2586.40 * -10.47%* Hmean 16 5248.18 ( 0.00%) 5305.26 * 1.09%* Hmean 32 8914.03 ( 0.00%) 9191.35 * 3.11%* Hmean 64 10663.10 ( 0.00%) 10192.65 * -4.41%* Hmean 128 18043.89 ( 0.00%) 18478.92 * 2.41%* Hmean 256 16530.89 ( 0.00%) 17637.16 * 6.69%* Hmean 320 16451.13 ( 0.00%) 17270.97 * 4.98%* Note that 8 was a regression point where a deeper search would have helped but it gains for high thread counts when searches are useless. Hackbench is a more extreme example although not perfect as the tasks idle rapidly hackbench-process-pipes 5.13.0-rc2 5.13.0-rc2 vanilla sched-avgidle-v1r5 Amean 1 0.3950 ( 0.00%) 0.3887 ( 1.60%) Amean 4 0.9450 ( 0.00%) 0.9677 ( -2.40%) Amean 7 1.4737 ( 0.00%) 1.4890 ( -1.04%) Amean 12 2.3507 ( 0.00%) 2.3360 * 0.62%* Amean 21 4.0807 ( 0.00%) 4.0993 * -0.46%* Amean 30 5.6820 ( 0.00%) 5.7510 * -1.21%* Amean 48 8.7913 ( 0.00%) 8.7383 ( 0.60%) Amean 79 14.3880 ( 0.00%) 13.9343 * 3.15%* Amean 110 21.2233 ( 0.00%) 19.4263 * 8.47%* Amean 141 28.2930 ( 0.00%) 25.1003 * 11.28%* Amean 172 34.7570 ( 0.00%) 30.7527 * 11.52%* Amean 203 41.0083 ( 0.00%) 36.4267 * 11.17%* Amean 234 47.7133 ( 0.00%) 42.0623 * 11.84%* Amean 265 53.0353 ( 0.00%) 47.7720 * 9.92%* Amean 296 60.0170 ( 0.00%) 53.4273 * 10.98%* Stddev 1 0.0052 ( 0.00%) 0.0025 ( 51.57%) Stddev 4 0.0357 ( 0.00%) 0.0370 ( -3.75%) Stddev 7 0.0190 ( 0.00%) 0.0298 ( -56.64%) Stddev 12 0.0064 ( 0.00%) 0.0095 ( -48.38%) Stddev 21 0.0065 ( 0.00%) 0.0097 ( -49.28%) Stddev 30 0.0185 ( 0.00%) 0.0295 ( -59.54%) Stddev 48 0.0559 ( 0.00%) 0.0168 ( 69.92%) Stddev 79 0.1559 ( 0.00%) 0.0278 ( 82.17%) Stddev 110 1.1728 ( 0.00%) 0.0532 ( 95.47%) Stddev 141 0.7867 ( 0.00%) 0.0968 ( 87.69%) Stddev 172 1.0255 ( 0.00%) 0.0420 ( 95.91%) Stddev 203 0.8106 ( 0.00%) 0.1384 ( 82.92%) Stddev 234 1.1949 ( 0.00%) 0.1328 ( 88.89%) Stddev 265 0.9231 ( 0.00%) 0.0820 ( 91.11%) Stddev 296 1.0456 ( 0.00%) 0.1327 ( 87.31%) Again, higher thread counts benefit and the standard deviation shows that results are also a lot more stable when the idle time is aged. The patch potentially matters when a socket was multiple LLCs as the maximum search depth is lower. However, some of the test results were suspiciously good (e.g. specjbb2005 gaining 50% on a Zen1 machine) and other results were not dramatically different to other mcahines. Given the nature of the patch, Peter's full series is not being forward ported as each part should stand on its own. Preferably they would be merged at different times to reduce the risk of false bisections. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210615111611.GH30378@techsingularity.net
2021-06-17sched/cpufreq: Consider reduced CPU capacity in energy calculationLukasz Luba
Energy Aware Scheduling (EAS) needs to predict the decisions made by SchedUtil. The map_util_freq() exists to do that. There are corner cases where the max allowed frequency might be reduced (due to thermal). SchedUtil as a CPUFreq governor, is aware of that but EAS is not. This patch aims to address it. SchedUtil stores the maximum allowed frequency in 'sugov_policy::next_freq' field. EAS has to predict that value, which is the real used frequency. That value is made after a call to cpufreq_driver_resolve_freq() which clamps to the CPUFreq policy limits. In the existing code EAS is not able to predict that real frequency. This leads to energy estimation errors. To avoid wrong energy estimation in EAS (due to frequency miss prediction) make sure that the step which calculates Performance Domain frequency, is also aware of the allowed CPU capacity. Furthermore, modify map_util_freq() to not extend the frequency value. Instead, use map_util_perf() to extend the util value in both places: SchedUtil and EAS, but for EAS clamp it to max allowed CPU capacity. In the end, we achieve the same desirable behavior for both subsystems and alignment in regards to the real CPU frequency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part) Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com
2021-06-17sched/fair: Take thermal pressure into account while estimating energyLukasz Luba
Energy Aware Scheduling (EAS) needs to be able to predict the frequency requests made by the SchedUtil governor to properly estimate energy used in the future. It has to take into account CPUs utilization and forecast Performance Domain (PD) frequency. There is a corner case when the max allowed frequency might be reduced due to thermal. SchedUtil is aware of that reduced frequency, so it should be taken into account also in EAS estimations. SchedUtil, as a CPUFreq governor, knows the maximum allowed frequency of a CPU, thanks to cpufreq_driver_resolve_freq() and internal clamping to 'policy::max'. SchedUtil is responsible to respect that upper limit while setting the frequency through CPUFreq drivers. This effective frequency is stored internally in 'sugov_policy::next_freq' and EAS has to predict that value. In the existing code the raw value of arch_scale_cpu_capacity() is used for clamping the returned CPU utilization from effective_cpu_util(). This patch fixes issue with too big single CPU utilization, by introducing clamping to the allowed CPU capacity. The allowed CPU capacity is a CPU capacity reduced by thermal pressure raw value. Thanks to knowledge about allowed CPU capacity, we don't get too big value for a single CPU utilization, which is then added to the util sum. The util sum is used as a source of information for estimating whole PD energy. To avoid wrong energy estimation in EAS (due to capped frequency), make sure that the calculation of util sum is aware of allowed CPU capacity. This thermal pressure might be visible in scenarios where the CPUs are not heavily loaded, but some other component (like GPU) drastically reduced available power budget and increased the SoC temperature. Thus, we still use EAS for task placement and CPUs are not over-utilized. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210614191128.22735-1-lukasz.luba@arm.com
2021-06-17thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressureLukasz Luba
The thermal pressure signal gives information to the scheduler about reduced CPU capacity due to thermal. It is based on a value stored in a per-cpu 'thermal_pressure' variable. The online CPUs will get the new value there, while the offline won't. Unfortunately, when the CPU is back online, the value read from per-cpu variable might be wrong (stale data). This might affect the scheduler decisions, since it sees the CPU capacity differently than what is actually available. Fix it by making sure that all online+offline CPUs would get the proper value in their per-cpu variable when thermal framework sets capping. Fixes: f12e4f66ab6a3 ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping") Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
2021-06-17sched/fair: Return early from update_tg_cfs_load() if delta == 0Dietmar Eggemann
In case the _avg delta is 0 there is no need to update se's _avg (level n) nor cfs_rq's _avg (level n-1). These values stay the same. Since cfs_rq's _avg isn't changed, i.e. no load is propagated down, cfs_rq's _sum should stay the same as well. So bail out after se's _sum has been updated. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20210601083616.804229-1-dietmar.eggemann@arm.com
2021-06-17sched/pelt: Check that *_avg are null when *_sum areVincent Guittot
Check that we never break the rule that pelt's avg values are null if pelt's sum are. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Odin Ugedal <odin@uged.al> Link: https://lore.kernel.org/r/20210601155328.19487-1-vincent.guittot@linaro.org
2021-06-17hwmon: (ntc_thermistor) Drop unused headers.Jonathan Cameron
The IIO usage in this driver is purely consumer so it should only be including linux/iio/consumer.h Whilst here drop pm_runtime.h as there is no runtime power management in the driver. Found using include-what-you-use and manual inspection of the suggestions. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210611142257.103094-1-jic23@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17MAINTAINERS: Add Delta DPS920AB PSU driverRobert Marko
Add maintainers entry for the Delta DPS920AB PSU driver. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17dt-bindings: trivial-devices: Add Delta DPS920ABRobert Marko
Add trivial device entry for Delta DPS920AB PSU. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://lore.kernel.org/r/20210607103431.2039073-2-robert.marko@sartura.hr Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Add driver for Delta DPS-920AB PSURobert Marko
This adds support for the Delta DPS-920AB PSU. Only missing feature is fan control which the PSU supports. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://lore.kernel.org/r/20210607103431.2039073-1-robert.marko@sartura.hr [groeck: Add MODULE_IMPORT_NS(PMBUS);] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus/pim4328) Add documentation for the pim4328 PMBus driverErik Rosen
Add documentation and index link for pim4328 PMBus driver. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus/pim4328) Add PMBus driver for PIM4006, PIM4328 and PIM4820Erik Rosen
Add hardware monitoring support for Flex power interface modules PIM4006, PIM4328 and PIM4820. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Allow phase function even if it's not on pageErik Rosen
Allow the use of a phase function even if it does not exist on the associated page. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Add support for reading direct mode coefficientsErik Rosen
Add support for reading and decoding direct format coefficients to the PMBus core driver. If the new flag PMBUS_USE_COEFFICIENTS_CMD is set, the driver will use the COEFFICIENTS register together with the information in the pmbus_sensor_attr structs to initialize relevant coefficients for the direct mode format. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> [groeck: Initialize ret with -EINVAL in pmbus_init_coefficients()] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Add new pmbus flag NO_WRITE_PROTECTErik Rosen
Some PMBus chips respond with invalid data when reading the WRITE_PROTECT register. For such chips, this flag should be set so that the PMBus core driver doesn't use the WRITE_PROTECT command to determine its behavior. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17docs: hwmon: adm1177.rst: avoid using ReSt :doc:`foo` markupMauro Carvalho Chehab
The :doc:`foo` tag is auto-generated via automarkup.py. So, use the filename at the sources, instead of :doc:`foo`. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/32b0db7e79a3ed0e817213113c607a1b819e3867.1622898327.git.mchehab+huawei@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus_core) Check adapter PEC supportMadhava Reddy Siddareddygari
Currently, for Packet Error Checking (PEC) only the controller is checked for support. This causes problems on the cisco-8000 platform where a SMBUS transaction errors are observed. This is because PEC has to be enabled only if both controller and adapter support it. Added code to check PEC capability for adapter and enable it only if both controller and adapter supports PEC. Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com> [Upstream from SONiC https://github.com/Azure/sonic-linux-kernel/pull/215] Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/r/20210605052700.541455-1-pmenzel@molgen.mpg.de [groeck: Dropped unnecessary continuation line] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (ina3221) use CVRF only for single-shot conversionNinad Malwade
As per current logic the wait time per conversion is arouns 430ms for 512 samples and around 860ms for 1024 samples for 3 channels considering 140us as the bus voltage and shunt voltage sampling conversion time. This waiting time is a lot for the continuous mode and even for the single shot mode. For continuous mode when moving average is considered the waiting for CVRF bit is not required and the data from the previous conversion is sufficuent. As mentioned in the datasheet the conversion ready bit is provided to help coordinate single-shot conversions, we can restrict the use to single-shot mode only. Also, the conversion time is for the averaged samples, the wait time for the polling can omit the number of samples consideration. Signed-off-by: Ninad Malwade <nmalwade@nvidia.com> Link: https://lore.kernel.org/r/1622789683-30931-1-git-send-email-nmalwade@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (max31790) Detect and report zero fan speedGuenter Roeck
If a fan is not running or not connected, of if fan monitoring is disabled, the fan count register returns a fixed value of 0xffe0. So far this is then translated to a RPM value larger than 0. Since this is misleading and does not really make much sense, report a fan RPM of 0 in this situation. Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Václav Kubernát <kubernat@cesnet.cz> Link: https://lore.kernel.org/r/20210526154022.3223012-7-linux@roeck-us.net
2021-06-17hwmon: (max31790) Clear fan fault after reporting itGuenter Roeck
Fault bits in MAX31790 are sticky and have to be cleared explicitly. A write operation into either the 'Target Duty Cycle' register or the 'Target Count' register is necessary to clear a fault. At the same time, we can never clear cached fault status values before reading them because the companion fault status for any given fan is cleared as well when clearing a fault. Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Václav Kubernát <kubernat@cesnet.cz> Link: https://lore.kernel.org/r/20210526154022.3223012-6-linux@roeck-us.net
2021-06-17hwmon: (max31790) Fix pwmX_enable attributesGuenter Roeck
pwmX_enable supports three possible values: 0: Fan control disabled. Duty cycle is fixed to 0% 1: Fan control enabled, pwm mode. Duty cycle is determined by values written into Target Duty Cycle registers. 2: Fan control enabled, rpm mode Duty cycle is adjusted such that fan speed matches the values in Target Count registers The current code does not do this; instead, it mixes pwm control configuration with fan speed monitoring configuration. Worse, it reports that pwm control would be disabled (pwmX_enable==0) when it is in fact enabled in pwm mode. Part of the problem may be that the chip sets the "TACH input enable" bit on its own whenever the mode bit is set to RPM mode, but that doesn't mean that "TACH input enable" accurately reflects the pwm mode. Fix it up and only handle pwm control with the pwmX_enable attributes. In the documentation, clarify that disabling pwm control (pwmX_enable=0) sets the pwm duty cycle to 0%. In the code, explain why TACH_INPUT_EN is set together with RPM_MODE. While at it, only update the configuration register if the configuration has changed, and only update the cached configuration if updating the chip configuration was successful. Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Václav Kubernát <kubernat@cesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Link: https://lore.kernel.org/r/20210526154022.3223012-4-linux@roeck-us.net
2021-06-17hwmon: (max31790) Report correct current pwm duty cyclesGuenter Roeck
The MAX31790 has two sets of registers for pwm duty cycles, one to request a duty cycle and one to read the actual current duty cycle. Both do not have to be the same. When reporting the pwm duty cycle to the user, the actual pwm duty cycle from pwm duty cycle registers needs to be reported. When setting it, the pwm target duty cycle needs to be written. Since we don't know the actual pwm duty cycle after a target pwm duty cycle has been written, set the valid flag to false to indicate that actual pwm duty cycle should be read from the chip instead of using cached values. Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Václav Kubernát <kubernat@ceesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Link: https://lore.kernel.org/r/20210526154022.3223012-3-linux@roeck-us.net
2021-06-17hwmon: (max31790) Fix fan speed reporting for fan7..12Guenter Roeck
Fans 7..12 do not have their own set of configuration registers. So far the code ignored that and read beyond the end of the configuration register range to get the tachometer period. This resulted in more or less random fan speed values for those fans. The datasheet is quite vague when it comes to defining the tachometer period for fans 7..12. Experiments confirm that the period is the same for both fans associated with a given set of configuration registers. Fixes: 54187ff9d766 ("hwmon: (max31790) Convert to use new hwmon registration API") Fixes: 195a4b4298a7 ("hwmon: Driver for Maxim MAX31790") Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210526154022.3223012-2-linux@roeck-us.net
2021-06-17hwmon: (sht4x) Fix sht4x_read_values return valueJoe Perches
Kernel doc for sht4x_read_values() shows 0 on success, 1 on failure but the return value on success is actually always positive as it is set to SHT4X_RESPONSE_LENGTH by a successful call to i2c_master_recv(). Miscellanea: o Update the kernel doc for sht4x_read_values to 0 for success or -ERRNO o Remove incorrectly used kernel doc /** header for other _read functions o Typo fix succesfull->successful o Reverse a test to unindent a block and use goto unlock o Declare cmd[SHT4X_CMD_LEN] rather than cmd[] At least for gcc 10.2, object size is reduced a tiny bit. $ size drivers/hwmon/sht4x.o* text data bss dec hex filename 1752 404 256 2412 96c drivers/hwmon/sht4x.o.new 1825 404 256 2485 9b5 drivers/hwmon/sht4x.o.old Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/60eedce497137eb34448c0c77e01ec9d9c972ad7.camel@perches.com Reviewed by: Navin Sankar Velliangiri <navin@linumiz.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: Add sht4x Temperature and Humidity Sensor DriverNavin Sankar Velliangiri
This patch adds a hwmon driver for the SHT4x Temperature and Humidity sensor. Signed-off-by: Navin Sankar Velliangiri <navin@linumiz.com> [groeck: dropped unnecessary empty line and continuation lines] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17docs: hwmon: Add an entry for mp2888Fabio Estevam
The entry for mp2888 is missing and it causes the following 'make htmldocs' build warning: Documentation/hwmon/mp2888.rst: WARNING: document isn't included in any toctree Add the mp2888 entry. Signed-off-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20210521172218.37592-1-festevam@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (adm1275) enable adm1272 temperature reportingChu Lin
adm1272 supports temperature reporting but it is disabled by default. Tested: ls temp1_* temp1_crit temp1_highest temp1_max temp1_crit_alarm temp1_input temp1_max_alarm cat temp1_input 26642 Signed-off-by: Chu Lin <linchuyuan@google.com> Link: https://lore.kernel.org/r/20210512171043.2433694-1-linchuyuan@google.com [groeck: Updated subject to reflect correct driver] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17dt-bindings: Add MP2888 voltage regulator deviceVadim Pasternak
Monolithic Power Systems, Inc. (MPS) dual-loop, digital, multi-phase controller. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210511055619.118104-4-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Add support for MPS Multi-phase mp2888 controllerVadim Pasternak
Add support for mp2888 device from Monolithic Power Systems, Inc. (MPS) vendor. This is a digital, multi-phase, pulse-width modulation controller. This device supports: - One power rail. - Programmable Multi-Phase up to 10 Phases. - PWM-VID Interface - One pages 0 for telemetry. - Programmable pins for PMBus Address. - Built-In EEPROM to Store Custom Configurations. - Can configured VOUT readout in direct or VID format and allows setting of different formats on rails 1 and 2. For VID the following protocols are available: VR13 mode with 5-mV DAC; VR13 mode with 10-mV DAC, IMVP9 mode with 5-mV DAC. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20210511055619.118104-3-vadimp@nvidia.com [groeck: Add MODULE_IMPORT_NS] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-17hwmon: (pmbus) Increase maximum number of phases per pageVadim Pasternak
Increase maximum number of phases from 8 to 10 to support multi-phase devices allowing up to 10 phases. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20210511055619.118104-2-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>