Age | Commit message (Collapse) | Author |
|
Debt reduction was blocked if any iocg was short on budget in the past
period to avoid reducing debts while some iocgs are saturated. However, this
ends up unnecessarily blocking debt reduction due to temporary local
imbalances when the device is generally being underutilized, while also
failing to block when the underlying device is overwhelmed and the usage
becomes low from high latency.
Given that debt accumulation mostly happens with swapout bursts which can
significantly deteriorate the underlying device's latency response, the
current logic is not great.
Let's replace it with ioc->busy_level based condition so that we block debt
reduction when the underlying device is being saturated. ioc_forgive_debts()
call is moved after busy_level determination.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Debt reduction logic is going to be improved and expanded. Factor it out
into ioc_forgive_debts() and generalize the comment a bit. No functional
change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Pull devfreq updates for 5.9-rc7 from Chanwoo Choi:
"1. Update devfreq core
- Add missing timer type to devfreq_summary debugfs node.
2. Fix devfreq device driver
- Fix the exception handling about clock on tegra30-devfreq.c"
* tag 'devfreq-fixes-for-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: tegra30: Disable clock on error in probe
PM / devfreq: Add timer type to devfreq_summary debugfs
|
|
Add DM target feature flag DM_TARGET_NOWAIT which advertises that
target works with REQ_NOWAIT bios.
Add dm_table_supports_nowait() and update dm_table_set_restrictions()
to set/clear QUEUE_FLAG_NOWAIT accordingly.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where
applicable.
Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT. Also
update submit_bio_checks() to verify it is set for REQ_NOWAIT bios.
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No need to go through the hd_struct to find the partition number.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No need to go through the hd_struct to find the partition number.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bd_contains is never NULL for an open block device. In addition ibd_bd
is always set to a block device that was exclusively opened by the
target code, so the holder is guranteed to be ib_dev as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The ->bd_contains field is set by __blkdev_get and drivers have no
business manipulating it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bd_disk is set on all block devices, including those for partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bd_disk is set on all block devices, including those for partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To check for partitions of the same disk bd_contains works as well, but
bd_disk is way more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a littler helper to make the somewhat arcane bd_contains checks a
little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bd_contains is an implementation detail and should not be mentioned in
a userspace API documentation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
commit 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") removed the
REQ_NOWAIT_INLINE related code, but the diff wasn't applied to
blk_types.h somehow.
Then commit 2771cefeac49 ("block: remove the REQ_NOWAIT_INLINE flag")
removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still
remains.
Fixes: 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.10/drivers
Pull MD updates from Song.
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid10: improve discard request for far layout
md/raid10: improve raid10 discard request
md/raid10: pull codes that wait for blocked dev into one function
md/raid10: extend r10bio devs to raid disks
md: add md_submit_discard_bio() for submitting discard bio
md: Simplify code with existing definition RESYNC_SECTORS in raid10.c
md/raid5: reallocate page array after setting new stripe_size
md/raid5: resize stripe_head when reshape array
md/raid5: let multiple devices of stripe_head share page
md/raid6: let async recovery function support different page offset
md/raid6: let syndrome computor support different page offset
md/raid5: convert to new xor compution interface
md/raid5: add new xor function to support different page offset
md/raid5: make async_copy_data() to support different page offset
md/raid5: add a new member of offset into r5dev
md: only calculate blocksize once and use i_blocksize()
|
|
If we cancel these requests, we'll leak the memory associated with the
filename. Add them to the table of ops that need cleaning, if
REQ_F_NEED_CLEANUP is set.
Cc: stable@vger.kernel.org
Fixes: e62753e4e292 ("io_uring: call statx directly")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
toggled inadvertantly skipped the MMU context reset due to the mask
of bits that triggers PDPTR loads also being used to trigger MMU context
resets.
Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson <jmattson@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Based on Google-internal RSEQ work done by Paul Turner and Andrew
Hunter.
This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
The test quite often fails without the previous patch in this
patchset, but consistently passes with it.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-3-posk@google.com
|
|
This patch adds rseq_offset_deref_addv() function to
tools/testing/selftests/rseq/rseq-x86.h, to be used in a selftest in
the next patch in the patchset.
Once an architecture adds support for this function they should define
"RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV".
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-2-posk@google.com
|
|
This patchset is based on Google-internal RSEQ work done by Paul
Turner and Andrew Hunter.
When working with per-CPU RSEQ-based memory allocations, it is
sometimes important to make sure that a global memory location is no
longer accessed from RSEQ critical sections. For example, there can be
two per-CPU lists, one is "active" and accessed per-CPU, while another
one is inactive and worked on asynchronously "off CPU" (e.g. garbage
collection is performed). Then at some point the two lists are
swapped, and a fast RCU-like mechanism is required to make sure that
the previously active list is no longer accessed.
This patch introduces such a mechanism: in short, membarrier() syscall
issues an IPI to a CPU, restarting a potentially active RSEQ critical
section on the CPU.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-1-posk@google.com
|
|
Barry Song noted the following
Something is wrong. In find_busiest_group(), we are checking if
src has higher load, however, in task_numa_find_cpu(), we are
checking if dst will have higher load after balancing. It seems
it is not sensible to check src.
It maybe cause wrong imbalance value, for example,
if dst_running = env->dst_stats.nr_running + 1 results in 3 or
above, and src_running = env->src_stats.nr_running - 1 results
in 1;
The current code is thinking imbalance as 0 since src_running is
smaller than 2. This is inconsistent with load balancer.
Basically, in find_busiest_group(), the NUMA imbalance is ignored if moving
a task "from an almost idle domain" to a "domain with spare capacity". This
patch forbids movement "from a misplaced domain" to "an almost idle domain"
as that is closer to what the CPU load balancer expects.
This patch is not a universal win. The old behaviour was intended to allow
a task from an almost idle NUMA node to migrate to its preferred node if
the destination had capacity but there are corner cases. For example,
a NAS compute load could be parallelised to use 1/3rd of available CPUs
but not all those potential tasks are active at all times allowing this
logic to trigger. An obvious example is specjbb 2005 running various
numbers of warehouses on a 2 socket box with 80 cpus.
specjbb
5.9.0-rc4 5.9.0-rc4
vanilla dstbalance-v1r1
Hmean tput-1 46425.00 ( 0.00%) 43394.00 * -6.53%*
Hmean tput-2 98416.00 ( 0.00%) 96031.00 * -2.42%*
Hmean tput-3 150184.00 ( 0.00%) 148783.00 * -0.93%*
Hmean tput-4 200683.00 ( 0.00%) 197906.00 * -1.38%*
Hmean tput-5 236305.00 ( 0.00%) 245549.00 * 3.91%*
Hmean tput-6 281559.00 ( 0.00%) 285692.00 * 1.47%*
Hmean tput-7 338558.00 ( 0.00%) 334467.00 * -1.21%*
Hmean tput-8 340745.00 ( 0.00%) 372501.00 * 9.32%*
Hmean tput-9 424343.00 ( 0.00%) 413006.00 * -2.67%*
Hmean tput-10 421854.00 ( 0.00%) 434261.00 * 2.94%*
Hmean tput-11 493256.00 ( 0.00%) 485330.00 * -1.61%*
Hmean tput-12 549573.00 ( 0.00%) 529959.00 * -3.57%*
Hmean tput-13 593183.00 ( 0.00%) 555010.00 * -6.44%*
Hmean tput-14 588252.00 ( 0.00%) 599166.00 * 1.86%*
Hmean tput-15 623065.00 ( 0.00%) 642713.00 * 3.15%*
Hmean tput-16 703924.00 ( 0.00%) 660758.00 * -6.13%*
Hmean tput-17 666023.00 ( 0.00%) 697675.00 * 4.75%*
Hmean tput-18 761502.00 ( 0.00%) 758360.00 * -0.41%*
Hmean tput-19 796088.00 ( 0.00%) 798368.00 * 0.29%*
Hmean tput-20 733564.00 ( 0.00%) 823086.00 * 12.20%*
Hmean tput-21 840980.00 ( 0.00%) 856711.00 * 1.87%*
Hmean tput-22 804285.00 ( 0.00%) 872238.00 * 8.45%*
Hmean tput-23 795208.00 ( 0.00%) 889374.00 * 11.84%*
Hmean tput-24 848619.00 ( 0.00%) 966783.00 * 13.92%*
Hmean tput-25 750848.00 ( 0.00%) 903790.00 * 20.37%*
Hmean tput-26 780523.00 ( 0.00%) 962254.00 * 23.28%*
Hmean tput-27 1042245.00 ( 0.00%) 991544.00 * -4.86%*
Hmean tput-28 1090580.00 ( 0.00%) 1035926.00 * -5.01%*
Hmean tput-29 999483.00 ( 0.00%) 1082948.00 * 8.35%*
Hmean tput-30 1098663.00 ( 0.00%) 1113427.00 * 1.34%*
Hmean tput-31 1125671.00 ( 0.00%) 1134175.00 * 0.76%*
Hmean tput-32 968167.00 ( 0.00%) 1250286.00 * 29.14%*
Hmean tput-33 1077676.00 ( 0.00%) 1060893.00 * -1.56%*
Hmean tput-34 1090538.00 ( 0.00%) 1090933.00 * 0.04%*
Hmean tput-35 967058.00 ( 0.00%) 1107421.00 * 14.51%*
Hmean tput-36 1051745.00 ( 0.00%) 1210663.00 * 15.11%*
Hmean tput-37 1019465.00 ( 0.00%) 1351446.00 * 32.56%*
Hmean tput-38 1083102.00 ( 0.00%) 1064541.00 * -1.71%*
Hmean tput-39 1232990.00 ( 0.00%) 1303623.00 * 5.73%*
Hmean tput-40 1175542.00 ( 0.00%) 1340943.00 * 14.07%*
Hmean tput-41 1127826.00 ( 0.00%) 1339492.00 * 18.77%*
Hmean tput-42 1198313.00 ( 0.00%) 1411023.00 * 17.75%*
Hmean tput-43 1163733.00 ( 0.00%) 1228253.00 * 5.54%*
Hmean tput-44 1305562.00 ( 0.00%) 1357886.00 * 4.01%*
Hmean tput-45 1326752.00 ( 0.00%) 1406061.00 * 5.98%*
Hmean tput-46 1339424.00 ( 0.00%) 1418451.00 * 5.90%*
Hmean tput-47 1415057.00 ( 0.00%) 1381570.00 * -2.37%*
Hmean tput-48 1392003.00 ( 0.00%) 1421167.00 * 2.10%*
Hmean tput-49 1408374.00 ( 0.00%) 1418659.00 * 0.73%*
Hmean tput-50 1359822.00 ( 0.00%) 1391070.00 * 2.30%*
Hmean tput-51 1414246.00 ( 0.00%) 1392679.00 * -1.52%*
Hmean tput-52 1432352.00 ( 0.00%) 1354020.00 * -5.47%*
Hmean tput-53 1387563.00 ( 0.00%) 1409563.00 * 1.59%*
Hmean tput-54 1406420.00 ( 0.00%) 1388711.00 * -1.26%*
Hmean tput-55 1438804.00 ( 0.00%) 1387472.00 * -3.57%*
Hmean tput-56 1399465.00 ( 0.00%) 1400296.00 * 0.06%*
Hmean tput-57 1428132.00 ( 0.00%) 1396399.00 * -2.22%*
Hmean tput-58 1432385.00 ( 0.00%) 1386253.00 * -3.22%*
Hmean tput-59 1421612.00 ( 0.00%) 1371416.00 * -3.53%*
Hmean tput-60 1429423.00 ( 0.00%) 1389412.00 * -2.80%*
Hmean tput-61 1396230.00 ( 0.00%) 1351122.00 * -3.23%*
Hmean tput-62 1418396.00 ( 0.00%) 1383098.00 * -2.49%*
Hmean tput-63 1409918.00 ( 0.00%) 1374662.00 * -2.50%*
Hmean tput-64 1410236.00 ( 0.00%) 1376216.00 * -2.41%*
Hmean tput-65 1396405.00 ( 0.00%) 1364418.00 * -2.29%*
Hmean tput-66 1395975.00 ( 0.00%) 1357326.00 * -2.77%*
Hmean tput-67 1392986.00 ( 0.00%) 1349642.00 * -3.11%*
Hmean tput-68 1386541.00 ( 0.00%) 1343261.00 * -3.12%*
Hmean tput-69 1374407.00 ( 0.00%) 1342588.00 * -2.32%*
Hmean tput-70 1377513.00 ( 0.00%) 1334654.00 * -3.11%*
Hmean tput-71 1369319.00 ( 0.00%) 1334952.00 * -2.51%*
Hmean tput-72 1354635.00 ( 0.00%) 1329005.00 * -1.89%*
Hmean tput-73 1350933.00 ( 0.00%) 1318942.00 * -2.37%*
Hmean tput-74 1351714.00 ( 0.00%) 1316347.00 * -2.62%*
Hmean tput-75 1352198.00 ( 0.00%) 1309974.00 * -3.12%*
Hmean tput-76 1349490.00 ( 0.00%) 1286064.00 * -4.70%*
Hmean tput-77 1336131.00 ( 0.00%) 1303684.00 * -2.43%*
Hmean tput-78 1308896.00 ( 0.00%) 1271024.00 * -2.89%*
Hmean tput-79 1326703.00 ( 0.00%) 1290862.00 * -2.70%*
Hmean tput-80 1336199.00 ( 0.00%) 1291629.00 * -3.34%*
The performance at the mid-point is better but not universally better. The
patch is a mixed bag depending on the workload, machine and overall
levels of utilisation. Sometimes it's better (sometimes much better),
other times it is worse (sometimes much worse). Given that there isn't a
universally good decision in this section and more people seem to prefer
the patch then it may be best to keep the LB decisions consistent and
revisit imbalance handling when the load balancer code changes settle down.
Jirka Hladky added the following observation.
Our results are mostly in line with what you see. We observe
big gains (20-50%) when the system is loaded to 1/3 of the
maximum capacity and mixed results at the full load - some
workloads benefit from the patch at the full load, others not,
but performance changes at the full load are mostly within the
noise of results (+/-5%). Overall, we think this patch is helpful.
[mgorman@techsingularity.net: Rewrote changelog]
Fixes: fb86f5b211 ("sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200921221849.GI3179@techsingularity.net
|
|
The busy_factor, which increases load balance interval when a cpu is busy,
is set to 32 by default. This value generates some huge LB interval on
large system like the THX2 made of 2 node x 28 cores x 4 threads.
For such system, the interval increases from 112ms to 3584ms at MC level.
And from 228ms to 7168ms at NUMA level.
Even on smaller system, a lower busy factor has shown improvement on the
fair distribution of the running time so let reduce it for all.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-5-vincent.guittot@linaro.org
|
|
sched domains tend to trigger simultaneously the load balance loop but
the larger domains often need more time to collect statistics. This
slowness makes the larger domain trying to detach tasks from a rq whereas
tasks already migrated somewhere else at a sub-domain level. This is not
a real problem for idle LB because the period of smaller domains will
increase with its CPUs being busy and this will let time for higher ones
to pulled tasks. But this becomes a problem when all CPUs are already busy
because all domains stay synced when they trigger their LB.
A simple way to minimize simultaneous LB of all domains is to decrement the
the busy interval by 1 jiffies. Because of the busy_factor, the interval of
larger domain will not be a multiple of smaller ones anymore.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-4-vincent.guittot@linaro.org
|
|
The 25% default imbalance threshold for DIE and NUMA domain is large
enough to generate significant unfairness between threads. A typical
example is the case of 11 threads running on 2x4 CPUs. The imbalance of
20% between the 2 groups of 4 cores is just low enough to not trigger
the load balance between the 2 groups. We will have always the same 6
threads on one group of 4 CPUs and the other 5 threads on the other
group of CPUS. With a fair time sharing in each group, we ends up with
+20% running time for the group of 5 threads.
Consider decreasing the imbalance threshold for overloaded case where we
use the load to balance task and to ensure fair time sharing.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Hillf Danton <hdanton@sina.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-3-vincent.guittot@linaro.org
|
|
Some UCs like 9 always running tasks on 8 CPUs can't be balanced and the
load balancer currently migrates the waiting task between the CPUs in an
almost random manner. The success of a rq pulling a task depends of the
value of nr_balance_failed of its domains and its ability to be faster
than others to detach it. This behavior results in an unfair distribution
of the running time between tasks because some CPUs will run most of the
time, if not always, the same task whereas others will share their time
between several tasks.
Instead of using nr_balance_failed as a boolean to relax the condition
for detaching task, the LB will use nr_balanced_failed to relax the
threshold between the tasks'load and the imbalance. This mecanism
prevents the same rq or domain to always win the load balance fight.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-2-vincent.guittot@linaro.org
|
|
In the file fair.c, sometims update_tg_load_avg(cfs_rq, 0) is used,
sometimes update_tg_load_avg(cfs_rq, false) is used.
update_tg_load_avg() has the parameter force, but in current code,
it never set 1 or true to it, so remove the force parameter.
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200924014755.36253-1-tian.xianting@h3c.com
|
|
We've met problems that occasionally tasks with full cpumask
(e.g. by putting it into a cpuset or setting to full affinity)
were migrated to our isolated cpus in production environment.
After some analysis, we found that it is due to the current
select_idle_smt() not considering the sched_domain mask.
Steps to reproduce on my 31-CPU hyperthreads machine:
1. with boot parameter: "isolcpus=domain,2-31"
(thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.
Fix it by checking the valid domain mask in select_idle_smt().
Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings())
Reported-by: Wetp Zhang <wetp.zy@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com
|
|
There is no caller in tree, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20200922132410.48440-1-yuehaibing@huawei.com
|
|
The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
between CPUs, allowing a CPU to run a real-time task up to 100% of the
time while leaving more space for non-real-time tasks to run on the CPU
that lend rt_runtime.
The problem is that a CPU can easily borrow enough rt_runtime to allow
a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
which are non-real-time by design.
This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
The feature will still be present for users that want to enable it,
though.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Wei Wang <wvw@google.com>
Link: https://lkml.kernel.org/r/b776ab46817e3db5d8ef79175fa0d71073c051c7.1600697903.git.bristot@redhat.com
|
|
When a boosted task gets throttled, what normally happens is that it's
immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
runtime and clears the dl_throttled flag. There is a special case however:
if the throttling happened on sched-out and the task has been deboosted in
the meantime, the replenish is skipped as the task will return to its
normal scheduling class. This leaves the task with the dl_throttled flag
set.
Now if the task gets boosted up to the deadline scheduling class again
while it is sleeping, it's still in the throttled state. The normal wakeup
however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
actually place it on the rq. Thus we end up with a task that is runnable,
but not actually on the rq and neither a immediate replenishment happens,
nor is the replenishment timer set up, so the task is stuck in
forever-throttled limbo.
Clear the dl_throttled flag before dropping back to the normal scheduling
class to fix this issue.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200831110719.2126930-1-l.stach@pengutronix.de
|
|
Use runnable_avg to classify numa node state similarly to what is done for
normal load balancer. This helps to ensure that numa and normal balancers
use the same view of the state of the system.
Large arm64system: 2 nodes / 224 CPUs:
hackbench -l (256000/#grp) -g #grp
grp tip/sched/core +patchset improvement
1 14,008(+/- 4,99 %) 13,800(+/- 3.88 %) 1,48 %
4 4,340(+/- 5.35 %) 4.283(+/- 4.85 %) 1,33 %
16 3,357(+/- 0.55 %) 3.359(+/- 0.54 %) -0,06 %
32 3,050(+/- 0.94 %) 3.039(+/- 1,06 %) 0,38 %
64 2.968(+/- 1,85 %) 3.006(+/- 2.92 %) -1.27 %
128 3,290(+/-12.61 %) 3,108(+/- 5.97 %) 5.51 %
256 3.235(+/- 3.95 %) 3,188(+/- 2.83 %) 1.45 %
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200921072959.16317-1-vincent.guittot@linaro.org
|
|
The struct of_device_id is not defined with !CONFIG_OF so its forward
declaration should be hidden to as well. This should address clang
compile warning:
drivers/mmc/host/sdhci-s3c.c:464:34: warning: tentative array definition assumed to have one element
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200925072532.10272-1-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Fix inconsistent indenting, reported by Smatch:
drivers/mmc/host/sdhci-esdhc-imx.c:1380 sdhci_esdhc_imx_hwinit() warn: inconsistent indenting
drivers/mmc/host/sdhci-sprd.c:390 sdhci_sprd_request_done() warn: inconsistent indenting
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200923153739.30327-2-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The 'struct mmc_host *mmc' comes from drvdata set at the end of probe,
so it cannot be NULL. The code already dereferences it few lines before
the check with mmc_priv(). This also fixes smatch warning:
drivers/mmc/host/moxart-mmc.c:692 moxart_remove() warn: variable dereferenced before check 'mmc' (see line 688)
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200923153739.30327-1-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The MMC core has now a generic check if some tuning is in progress. Its
protected area is a bit larger than the custom one in this driver but we
concluded that this works equally well for the intended case. So, drop
the local flag and switch to the generic one.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200922172253.4458-1-wsa@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Simplify the return expression.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Link: https://lore.kernel.org/r/20200921131042.92340-1-miaoqinglang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Add documentation for mmc_hw_reset to make sure the intended use case is
clear.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20200918215446.65654-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
fbcon_get_font() is reading out-of-bounds. A malicious user may resize
`vc->vc_font.height` to a large value, causing fbcon_get_font() to
read out of `fontdata`.
fbcon_get_font() handles both built-in and user-provided fonts.
Fortunately, recently we have added FONT_EXTRA_WORDS support for built-in
fonts, so fix it by adding range checks using FNTSIZE().
This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS
macros into linux/font.h", and patch "Fonts: Support FONT_EXTRA_WORDS
macros for built-in fonts".
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+29d4ed7f3bdedf2aa2fd@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecfd
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/b34544687a1a09d6de630659eb7a773f4953238b.1600953813.git.yepeilin.cs@gmail.com
|
|
syzbot has reported an issue in the framebuffer layer, where a malicious
user may overflow our built-in font data buffers.
In order to perform a reliable range check, subsystems need to know
`FONTDATAMAX` for each built-in font. Unfortunately, our font descriptor,
`struct console_font` does not contain `FONTDATAMAX`, and is part of the
UAPI, making it infeasible to modify it.
For user-provided fonts, the framebuffer layer resolves this issue by
reserving four extra words at the beginning of data buffers. Later,
whenever a function needs to access them, it simply uses the following
macros:
Recently we have gathered all the above macros to <linux/font.h>. Let us
do the same thing for built-in fonts, prepend four extra words (including
`FONTDATAMAX`) to their data buffers, so that subsystems can use these
macros for all fonts, no matter built-in or user-provided.
This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS
macros into linux/font.h".
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecfd
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/ef18af00c35fb3cc826048a5f70924ed6ddce95b.1600953813.git.yepeilin.cs@gmail.com
|
|
drivers/video/console/newport_con.c is borrowing FONT_EXTRA_WORDS macros
from drivers/video/fbdev/core/fbcon.h. To keep things simple, move all
definitions into <linux/font.h>.
Since newport_con now uses four extra words, initialize the fourth word in
newport_set_font() properly.
Cc: stable@vger.kernel.org
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/7fb8bc9b0abc676ada6b7ac0e0bd443499357267.1600953813.git.yepeilin.cs@gmail.com
|
|
The struct flowi must never be interpreted by itself as its size
depends on the address family. Therefore it must always be grouped
with its original family value.
In this particular instance, the original family value is lost in
the function xfrm_state_find. Therefore we get a bogus read when
it's coupled with the wrong family which would occur with inter-
family xfrm states.
This patch fixes it by keeping the original family value.
Note that the same bug could potentially occur in LSM through
the xfrm_state_pol_flow_match hook. I checked the current code
there and it seems to be safe for now as only secid is used which
is part of struct flowi_common. But that API should be changed
so that so that we don't get new bugs in the future. We could
do that by replacing fl with just secid or adding a family field.
Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com
Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Asymmetric digsig supports SM2-with-SM3 algorithm combination,
so that IMA can also verify SM2's signature data.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The digital certificate format based on SM2 crypto algorithm as
specified in GM/T 0015-2012. It was published by State Encryption
Management Bureau, China.
The method of generating Other User Information is defined as
ZA=H256(ENTLA || IDA || a || b || xG || yG || xA || yA), it also
specified in https://tools.ietf.org/html/draft-shen-sm2-ecdsa-02.
The x509 certificate supports SM2-with-SM3 type certificate
verification. Because certificate verification requires ZA
in addition to tbs data, ZA also depends on elliptic curve
parameters and public key data, so you need to access tbs in sig
and calculate ZA. Finally calculate the digest of the
signature and complete the verification work. The calculation
process of ZA is declared in specifications GM/T 0009-2012
and GM/T 0003.2-2012.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The digital certificate format based on SM2 crypto algorithm as
specified in GM/T 0015-2012. It was published by State Encryption
Management Bureau, China.
This patch adds the OID object identifier defined by OSCCA. The
x509 certificate supports SM2-with-SM3 type certificate parsing.
It uses the standard elliptic curve public key, and the sm2
algorithm signs the hash generated by sm3.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add testmgr test vectors for SM2 algorithm. These vectors come
from `openssl pkeyutl -sign` and libgcrypt.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When the 'key' allocation fails, the 'req' will not be released,
which will cause memory leakage on this path. This patch adds a
'free_req' tag used to solve this problem, and two new err values
are added to reflect the real reason of the error.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Some asymmetric algorithms will get different ciphertext after
each encryption, such as SM2, and let testmgr support the testing
of such algorithms.
In struct akcipher_testvec, set c and c_size to be empty, skip
the comparison of the ciphertext, and compare the decrypted
plaintext with m to achieve the test purpose.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This new module implement the SM2 public key algorithm. It was
published by State Encryption Management Bureau, China.
List of specifications for SM2 elliptic curve public key cryptography:
* GM/T 0003.1-2012
* GM/T 0003.2-2012
* GM/T 0003.3-2012
* GM/T 0003.4-2012
* GM/T 0003.5-2012
IETF: https://tools.ietf.org/html/draft-shen-sm2-ecdsa-02
oscca: http://www.oscca.gov.cn/sca/xxgk/2010-12/17/content_1002386.shtml
scctc: http://www.gmbz.org.cn/main/bzlb.html
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The implementation of EC is introduced from libgcrypt as the
basic algorithm of elliptic curve, which can be more perfectly
integrated with MPI implementation.
Some other algorithms will be developed based on mpi ecc, such as SM2.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Xufeng Zhang <yunbo.xufeng@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|