2021-06-21dt-bindings: ipmi: Convert ASPEED KCS binding to schemaAndrew Jeffery
Given the deprecated binding, improve the ability to detect issues in the platform devicetrees. Further, a subsequent patch will introduce a new interrupts property for specifying SerIRQ behaviour, so convert before we do any further additions. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-13-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Add serio adaptorAndrew Jeffery
kcs_bmc_serio acts as a bridge between the KCS drivers in the IPMI subsystem and the existing userspace interfaces available through the serio subsystem. This is useful when userspace would like to make use of the BMC KCS devices for purposes that aren't IPMI. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210608104757.582199-12-andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Enable IBF on openAndrew Jeffery
This way devices don't get IRQs delivered when no one is interested. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210608104757.582199-11-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Allow clients to control KCS IRQ stateAndrew Jeffery
Add a mechanism for controlling whether the client associated with a KCS device will receive Input Buffer Full (IBF) and Output Buffer Empty (OBE) events. This enables an abstract implementation of poll() for KCS devices. A wart in the implementation is that the ASPEED KCS devices don't support an OBE interrupt for the BMC. Instead we pretend they have one by polling the status register waiting for the Output Buffer Full (OBF) bit to clear, and generating an event when OBE is observed. Cc: CS20 KWLiu <KWLIU@nuvoton.com> Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-10-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
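To illustrate the OBE emulation described above, here is a minimal sketch of a poll-based approach, where a timer re-reads the status register until the OBF bit clears and then synthesises the OBE event; the structure, field and callback names are placeholders, not the actual ASPEED driver symbols:
  #include <linux/bits.h>
  #include <linux/jiffies.h>
  #include <linux/timer.h>
  #include <linux/types.h>

  #define KCS_STATUS_OBF  BIT(0)  /* host has not read the output buffer yet */

  /* Hypothetical per-device poll state, armed when a client asks for OBE events. */
  struct kcs_obe_poll {
          struct timer_list timer;
          u8 (*read_status)(struct kcs_obe_poll *poll);   /* device-specific register read */
          void (*notify_obe)(struct kcs_obe_poll *poll);  /* raise the OBE event to the client */
  };

  static void kcs_obe_poll_fn(struct timer_list *t)
  {
          struct kcs_obe_poll *poll = from_timer(poll, t, timer);

          if (poll->read_status(poll) & KCS_STATUS_OBF) {
                  /* Host still hasn't consumed the byte; check again shortly. */
                  mod_timer(&poll->timer, jiffies + msecs_to_jiffies(1));
                  return;
          }

          /* Output buffer is now empty: synthesise the OBE event. */
          poll->notify_obe(poll);
  }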
2021-06-21ipmi: kcs_bmc: Decouple the IPMI chardev from the coreAndrew Jeffery
Now that we have untangled the data-structures, split the userspace interface out into its own module. Userspace interfaces and drivers are registered to the KCS BMC core to support arbitrary binding of either. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210608104757.582199-9-andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Strip private client data from struct kcs_bmcAndrew Jeffery
Move all client-private data out of `struct kcs_bmc` into the KCS client implementation. With this change the KCS BMC core code now only concerns itself with abstract `struct kcs_bmc` and `struct kcs_bmc_client` types, achieving expected separation of concerns. Further, the change clears the path for implementation of alternative userspace interfaces. The chardev data-structures are rearranged in the same manner applied to the KCS device driver data-structures in an earlier patch - `struct kcs_bmc_client` is embedded in the client's private data and we exploit container_of() to translate as required. Finally, now that it is free of client data, `struct kcs_bmc` is renamed to `struct kcs_bmc_device` to contrast `struct kcs_bmc_client`. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-8-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
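For illustration, a minimal sketch of the embed-and-container_of() arrangement described above (the same pattern the series applies on the device-driver side); the structures are reduced placeholders, not the actual kcs_bmc definitions:
  #include <linux/kernel.h>       /* container_of() */

  struct kcs_bmc_device;          /* device-side object owned by the KCS core */
  struct kcs_bmc_client_ops;      /* callbacks the core invokes on events */

  struct kcs_bmc_client {
          const struct kcs_bmc_client_ops *ops;
          struct kcs_bmc_device *dev;
  };

  /* Hypothetical chardev client: its private data embeds the generic client
   * object instead of the core allocating client data on its behalf. */
  struct kcs_bmc_ipmi_sketch {
          struct kcs_bmc_client client;
          /* chardev-specific state lives here, invisible to the core */
  };

  /* The core only ever hands back a struct kcs_bmc_client *; the client
   * translates to its enclosing private structure with container_of(). */
  static struct kcs_bmc_ipmi_sketch *client_to_ipmi(struct kcs_bmc_client *client)
  {
          return container_of(client, struct kcs_bmc_ipmi_sketch, client);
  }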
2021-06-21ipmi: kcs_bmc: Split headers into device and clientAndrew Jeffery
Strengthen the distinction between code that abstracts the implementation of the KCS behaviours (device drivers) and code that exploits KCS behaviours (clients). Neither needs to know about the APIs required by the other, so provide separate headers. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210608104757.582199-7-andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Turn the driver data-structures inside-outAndrew Jeffery
Make the KCS device drivers responsible for allocating their own memory. Until now the private data for the device driver was allocated internal to the private data for the chardev interface. This coupling required the slightly awkward API of passing through the struct size for the driver private data to the chardev constructor, and then retrieving a pointer to the driver private data from the allocated chardev memory. In addition to being awkward, the arrangement prevents the implementation of alternative userspace interfaces as the device driver private data is not independent. Peel a layer off the onion and turn the data-structures inside out by exploiting container_of() and embedding `struct kcs_device` in the driver private data. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-6-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Split out kcs_bmc_cdev_ipmiAndrew Jeffery
Take steps towards defining a coherent API to separate the KCS device drivers from the userspace interface. Decreasing the coupling will improve the separation of concerns and enable the introduction of alternative userspace interfaces. For now, simply split the chardev logic out to a separate file. The code continues to build into the same module. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-5-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Rename {read,write}_{status,data}() functionsAndrew Jeffery
Rename the functions in preparation for separating the IPMI chardev out from the KCS BMC core. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-4-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21ipmi: kcs_bmc: Make status update atomicAndrew Jeffery
Enable more efficient implementation of read-modify-write sequences. Both device drivers for the KCS BMC stack use regmaps. The new callback allows us to exploit regmap_update_bits(). Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Message-Id: <20210608104757.582199-3-andrew@aj.id.au> Signed-off-by: Corey Minyard <cminyard@mvista.com>
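The read-modify-write saving the commit refers to reduces to a single regmap_update_bits() call; a hedged sketch of what such an update callback might look like (the function and register names are illustrative):
  #include <linux/regmap.h>
  #include <linux/types.h>

  /* Atomically update selected bits of a KCS status register. The
   * read-modify-write happens under the regmap's own locking, so the
   * caller needs no additional lock. */
  static int kcs_update_status_sketch(struct regmap *map, unsigned int str_reg,
                                      u8 mask, u8 val)
  {
          return regmap_update_bits(map, str_reg, mask, val);
  }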
2021-06-21ipmi: kcs_bmc_aspeed: Use of match data to extract KCS propertiesAndrew Jeffery
Unpack and remove the aspeed_kcs_probe_of_v[12]() functions to aid rearranging how the private device-driver memory is allocated. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210608104757.582199-2-andrew@aj.id.au> Reviewed-by: Zev Weiss <zweiss@equinix.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-06-21Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."Yifan Zhang
This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20. Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may cause some APUs to fail to enter gfxoff in certain user cases. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-06-21Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full ↵Yifan Zhang
doorbell." This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3. Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may cause some APUs fail to enter gfxoff in certain user cases. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-06-21drm/amdgpu: Call drm_framebuffer_init last for framebuffer initMichel Dänzer
Once drm_framebuffer_init has returned 0, the framebuffer is hooked up to the reference counting machinery and can no longer be destroyed with a simple kfree. Therefore, it must be called last. If drm_framebuffer_init returns 0 but its caller then returns non-0, there will likely be memory corruption fireworks down the road. The following led me to this fix:
[ 12.891228] kernel BUG at lib/list_debug.c:25!
[...]
[ 12.891263] RIP: 0010:__list_add_valid+0x4b/0x70
[...]
[ 12.891324] Call Trace:
[ 12.891330] drm_framebuffer_init+0xb5/0x100 [drm]
[ 12.891378] amdgpu_display_gem_fb_verify_and_init+0x47/0x120 [amdgpu]
[ 12.891592] ? amdgpu_display_user_framebuffer_create+0x10d/0x1f0 [amdgpu]
[ 12.891794] amdgpu_display_user_framebuffer_create+0x126/0x1f0 [amdgpu]
[ 12.891995] drm_internal_framebuffer_create+0x378/0x3f0 [drm]
[ 12.892036] ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[ 12.892075] drm_mode_addfb2+0x34/0xd0 [drm]
[ 12.892115] ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[ 12.892153] drm_ioctl_kernel+0xe2/0x150 [drm]
[ 12.892193] drm_ioctl+0x3da/0x460 [drm]
[ 12.892232] ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[ 12.892274] amdgpu_drm_ioctl+0x43/0x80 [amdgpu]
[ 12.892475] __se_sys_ioctl+0x72/0xc0
[ 12.892483] do_syscall_64+0x33/0x40
[ 12.892491] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: f258907fdd835e "drm/amdgpu: Verify bo size can fit framebuffer size on init." Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
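The fix boils down to the ordering sketched here: perform every step that can fail first, and call drm_framebuffer_init() only as the final step, so earlier error paths may still free the object directly while later ones must go through the reference-counting machinery. This is a simplified sketch, not the exact amdgpu code; example_verify_and_setup() is a stand-in for the driver's real checks:
  #include <drm/drm_device.h>
  #include <drm/drm_framebuffer.h>

  static int example_verify_and_setup(struct drm_framebuffer *fb)
  {
          return 0;       /* stand-in for size/format validation etc. */
  }

  static int example_fb_init(struct drm_device *dev, struct drm_framebuffer *fb,
                             const struct drm_framebuffer_funcs *funcs)
  {
          int ret;

          /* 1. Everything that can fail goes first; the framebuffer is not yet
           *    published, so the caller may still kfree() it on error. */
          ret = example_verify_and_setup(fb);
          if (ret)
                  return ret;

          /* 2. Hook into the refcounting machinery last; from here on the
           *    object must be released via drm_framebuffer_put(), never kfree(). */
          return drm_framebuffer_init(dev, fb, funcs);
  }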
2021-06-21block/partitions/msdos: Fix typo inidicator -> indicatorThomas Bracht Laumann Jespersen
Just a fix for a small typo in msdos_partition(). Signed-off-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz> Link: https://lore.kernel.org/r/20210619195130.19348-1-t@laumann.xyz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: reset waker pointer with shared queuesPaolo Valente
Commit 85686d0dc194 ("block, bfq: keep shared queues out of the waker mechanism") leaves shared bfq_queues out of the waker-detection mechanism. It attains this goal by not updating the pointer last_completed_rq_bfqq, if the last request completed belongs to a shared bfq_queue (so that the pointer will not point to the shared bfq_queue). Yet this has a side effect: the pointer last_completed_rq_bfqq keeps pointing, deceptively, to a bfq_queue that actually is not the last one to have had a request completed. As a consequence, such a bfq_queue may deceptively be considered as a waker of some bfq_queue, even of some shared bfq_queue. To address this issue, reset last_completed_rq_bfqq if the last request completed belongs to a shared queue. Fixes: 85686d0dc194 ("block, bfq: keep shared queues out of the waker mechanism") Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-8-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: check waker only for queues with no in-flight I/OPaolo Valente
Consider two bfq_queues, say Q1 and Q2, with Q2 empty. If a request of Q1 gets completed shortly before a new request arrives for Q2, then BFQ flags Q1 as a candidate waker for Q2. Yet the arrival of this new request may have a different cause in the following case: if Q2 also has requests in flight while waiting for the arrival of a new request, then the completion of its own requests may be the actual cause of the awakening of the process that sends I/O to Q2. So Q1 may be wrongly flagged as a candidate waker. This commit avoids this deceptive flagging by disabling candidate-waker flagging for Q2 if Q2 has in-flight I/O. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-7-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: avoid delayed merge of async queuesPaolo Valente
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created queues"), BFQ may schedule a merge between a newly created sync bfq_queue, say Q2, and the last sync bfq_queue created, say Q1. To this goal, BFQ stores the address of Q1 in the field bic->stable_merge_bfqq of the bic associated with Q2. So, when the time for the possible merge arrives, BFQ knows which bfq_queue to merge Q2 with. In particular, BFQ checks for possible merges on request arrivals. Yet the same bic may also be associated with an async bfq_queue, say Q3. So, if a request for Q3 arrives, then the above check may happen to be executed while the bfq_queue at hand is Q3, instead of Q2. In this case, Q1 happens to be merged with an async bfq_queue. This is not only a conceptual mistake, because async queues are to be kept out of queue merging, but also a bug that leads to inconsistent states. This commit simply filters async queues out of delayed merges. Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues") Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-6-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: boost throughput by extending queue-merging timesPietro Pedroni
One of the methods with which bfq boosts throughput is by merging queues. One of the merging variants in bfq is the stable merge. This mechanism is activated between two queues only if they are created within a certain maximum time T1 from each other. Merging can happen soon or be delayed. In the second case, before merging, bfq needs to evaluate a throughput-boost parameter that indicates whether the queue generates a high throughput when served alone. Merging occurs when this throughput-boost is not high enough. In particular, this parameter is evaluated and late merging may occur only after at least a time T2 from the creation of the queue. Currently T1 and T2 are set to 180ms and 200ms, respectively. With these values the merging mechanism rarely triggers, because the time windows are too short. This results in a noticeable lowering of the overall throughput with some workloads (see the example below). This commit introduces two constants bfq_activation_stable_merging and bfq_late_stable_merging in order to increase the duration of T1 and T2. Both the stable merging activation time and the late merging time are set to 600ms. This value has been experimentally evaluated using the sqlite benchmark in the Phoronix Test Suite on an HDD. The duration of the benchmark before this fix was 111.02s, while now it has reached 97.02s, a better result than that of all the other schedulers. Signed-off-by: Pietro Pedroni <pedroni.pietro.96@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-5-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: consider also creation time in delayed stable mergePaolo Valente
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created queues"), BFQ may schedule a merge between a newly created sync bfq_queue and the last sync bfq_queue created. Such a merging is not performed immediately, because BFQ needs first to find out whether the newly created queue actually reaches a higher throughput if not merged at all (and in that case BFQ will not perform any stable merging). To check that, a little time must be waited after the creation of the new queue, so that some I/O can flow in the queue, and statistics on such I/O can be computed. Yet, to evaluate the above waiting time, the last split time is considered as start time, instead of the creation time of the queue. This is a mistake, because considering the split time is correct only in the following scenario. The queue undergoes a non-stable merge on the arrival of its very first I/O request, due to close I/O with some other queue. While the queue is merged for close I/O, stable merging is not considered. Yet the queue may then happen to be split, if the close I/O finishes (or happens to be a false positive). From this time on, the queue can again be considered for stable merging. But, again, a little time must elapse, to let some new I/O flow in the queue and to get updated statistics. To wait for this time, the split time is to be taken into account. Yet, if the queue does not undergo a non-stable merge on the arrival of its very first request, then BFQ immediately checks whether the stable merge is to be performed. It happens because the split time for a queue is initialized to minus infinity when the queue is created. This commit fixes this mistake by adding the missing condition. Now the check for delayed stable-merge is performed after a little time has elapsed not only from the last queue split time, but also from the creation time of the queue. Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues") Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-4-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block, bfq: fix delayed stable merge checkLuca Mariotti
When attempting to schedule a merge of a given bfq_queue with the currently in-service bfq_queue or with a cooperating bfq_queue among the scheduled bfq_queues, delayed stable merge is checked for rotational or non-queueing devs. For this stable merge to be performed, some conditions must be met. If the current bfq_queue underwent some split from some merged bfq_queue, one of these conditions is that two hundred milliseconds must elapse from split, otherwise this condition is always met. Unfortunately, by mistake, time_is_after_jiffies() was written instead of time_is_before_jiffies() for this check, verifying that less than two hundred milliseconds have elapsed instead of verifying that at least two hundred milliseconds have elapsed. Fix this issue by replacing time_is_after_jiffies() with time_is_before_jiffies(). Signed-off-by: Luca Mariotti <mariottiluca1@hotmail.it> Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: Pietro Pedroni <pedroni.pietro.96@gmail.com> Link: https://lore.kernel.org/r/20210619140948.98712-3-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
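For reference, the jiffies helpers behind both of the fixes above: time_is_before_jiffies(t) is true when t already lies in the past (i.e. at least that much time has elapsed), while time_is_after_jiffies(t) is true when t is still in the future. A hedged sketch of the corrected gating condition, with the parameter names and the 200ms threshold treated as illustrative:
  #include <linux/jiffies.h>
  #include <linux/types.h>

  /* True once at least @ms milliseconds have passed since @stamp. */
  static bool elapsed_at_least(unsigned long stamp, unsigned int ms)
  {
          return time_is_before_jiffies(stamp + msecs_to_jiffies(ms));
  }

  /* Delayed stable merge may be considered only once enough time has
   * passed since both the last split and the creation of the queue. */
  static bool stable_merge_window_open(unsigned long split_time,
                                       unsigned long creation_time)
  {
          return elapsed_at_least(split_time, 200) &&
                 elapsed_at_least(creation_time, 200);
  }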
2021-06-21block, bfq: let also stably merged queues enjoy weight raisingPaolo Valente
Merged bfq_queues are kept out of weight-raising (low-latency) mechanisms. The reason is that these queues are usually created for non-interactive and non-soft-real-time tasks. Yet this is not the case for stably-merged queues. These queues are merged just because they are created shortly after each other. So they may easily serve the I/O of an interactive or soft-real-time application, if the application happens to spawn multiple processes. To address this issue, this commit lets stably-merged queues also enjoy weight raising. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20210619140948.98712-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21blk-wbt: make sure throttle is enabled properlyZhang Yi
After commit a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt"), if the throttle was disabled by wbt_disable_default(), it could not be enabled again. Fix this by setting enable_state back to WBT_STATE_ON_DEFAULT. Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210619093700.920393-3-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()Zhang Yi
Currently wbt is disabled by simply zeroing out rwb->wb_normal in wbt_disable_default() when switching the elevator to bfq, but this is not safe: the check becomes a false positive if the queue depth is changed. If it turns into a false positive between wbt_wait() and wbt_track() while submitting a write request, rqw->inflight is dropped to -1 in wbt_done(), which ends up triggering an IO hang. Fix this issue by introducing a new state that means wbt has been disabled. Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
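A minimal sketch of the idea behind the two blk-wbt fixes above: track an explicit enable state rather than inferring "disabled" from wb_normal being zero, so that recomputing limits (for example after a queue-depth change) cannot make a disabled instance look enabled again. The enum names follow the commit messages; the surrounding structure is illustrative:
  #include <linux/types.h>

  enum wbt_state_sketch {
          WBT_STATE_ON_DEFAULT  = 1,      /* enabled by default */
          WBT_STATE_ON_MANUAL   = 2,      /* enabled via sysfs */
          WBT_STATE_OFF_DEFAULT = 3,      /* disabled by wbt_disable_default() */
  };

  struct rq_wb_sketch {
          enum wbt_state_sketch enable_state;
          unsigned int wb_normal;
  };

  static bool rwb_enabled_sketch(const struct rq_wb_sketch *rwb)
  {
          /* An explicitly disabled instance stays disabled even if the
           * throttling limits are later recomputed to non-zero values. */
          return rwb->enable_state != WBT_STATE_OFF_DEFAULT && rwb->wb_normal != 0;
  }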
2021-06-21block/mq-deadline: Prioritize high-priority requestsBart Van Assche
While one or more requests with a certain I/O priority are pending, do not dispatch lower priority requests. Dispatch lower priority requests anyway after the "aging" time has expired. This patch has been tested as follows: modprobe scsi_debug ndelay=1000000 max_queue=16 && sd='' && while [ -z "$sd" ]; do sd=/dev/$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*) done && echo $((100*1000)) > /sys/block/$sd/queue/iosched/aging_expire && cd /sys/fs/cgroup/blkio/ && echo $$ >cgroup.procs && echo restrict-to-be >blkio.prio.class && mkdir -p hipri && cd hipri && echo none-to-rt >blkio.prio.class && { max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/low-pri.txt & } && echo $$ >cgroup.procs && max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/hi-pri.txt Result: * 11000 IOPS for the high-priority job * 40 IOPS for the low-priority job If the aging expiry time is changed from 100s into 0, the IOPS results change into 6712 and 6796 IOPS. The max-iops script is a script that runs fio with the following arguments: --bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60 --norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j} --iodepth=${arg_d} --iodepth_batch_submit=${arg_a} --iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1} --filename=${positional_argument_1} Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-17-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
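Conceptually the dispatch path becomes a scan over the I/O priority classes: serve the highest class with pending work, except that a request starved past the aging limit is dispatched regardless of class. A simplified sketch of that loop follows; the helper functions and the deadline_sketch type are placeholders, not the real mq-deadline code:
  struct request;
  struct deadline_sketch;

  /* Placeholders standing in for the real FIFO/starvation bookkeeping. */
  struct request *dd_find_aged_request(struct deadline_sketch *dd);
  struct request *dd_dispatch_from_prio(struct deadline_sketch *dd, int prio);

  enum dd_prio_sketch { DD_RT_PRIO, DD_BE_PRIO, DD_IDLE_PRIO, DD_PRIO_COUNT };

  static struct request *dd_dispatch_sketch(struct deadline_sketch *dd)
  {
          struct request *rq;
          int prio;

          /* Anti-starvation: anything older than aging_expire goes out first,
           * whatever its priority class. */
          rq = dd_find_aged_request(dd);
          if (rq)
                  return rq;

          /* Otherwise serve strictly by class: RT before BE before IDLE. */
          for (prio = DD_RT_PRIO; prio < DD_PRIO_COUNT; prio++) {
                  rq = dd_dispatch_from_prio(dd, prio);
                  if (rq)
                          return rq;
          }
          return NULL;
  }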
2021-06-21block/mq-deadline: Add cgroup supportBart Van Assche
Maintain statistics per cgroup and export these to user space. These statistics are essential for verifying whether the proper I/O priorities have been assigned to requests. An example of the statistics data with this patch applied: $ cat /sys/fs/cgroup/io.stat 11:2 rbytes=0 wbytes=0 rios=3 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0 8:32 rbytes=2142720 wbytes=0 rios=105 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0 Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-16-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Track I/O statisticsBart Van Assche
Track I/O statistics per I/O priority and export these statistics to debugfs. These statistics help developers of the deadline scheduler. Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-15-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Add I/O priority supportBart Van Assche
Maintain one dispatch list and one FIFO list per I/O priority class: RT, BE and IDLE. Maintain statistics for each priority level. Split the debugfs attributes per priority level as follows: $ ls /sys/kernel/debug/block/.../sched/ async_depth dispatch2 read_next_rq write2_fifo_list batching read0_fifo_list starved write_next_rq dispatch0 read1_fifo_list write0_fifo_list dispatch1 read2_fifo_list write1_fifo_list Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-14-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Micro-optimize the batching algorithmBart Van Assche
When dispatching the first request of a batch, the deadline_move_request() call clears .next_rq[] for the opposite data direction. .next_rq[] is not restored when changing data direction. Fix this by not clearing .next_rq[] and by keeping track of the data direction of a batch in a variable instead. This patch is a micro-optimization because: - The number of deadline_next_request() calls for the read direction is halved. - The number of times that deadline_next_request() returns NULL is reduced. Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-13-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Reserve 25% of scheduler tags for synchronous requestsBart Van Assche
For interactive workloads it is important that synchronous requests are not delayed. Hence reserve 25% of scheduler tags for synchronous requests. This patch still allows asynchronous requests to fill the hardware queues since blk_mq_init_sched() makes sure that the number of scheduler requests is the double of the hardware queue depth. From blk_mq_init_sched(): q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth, BLKDEV_MAX_RQ); Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-12-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
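A hedged sketch of how such a reservation is typically wired through the elevator's limit_depth hook: asynchronous requests are capped at roughly 75% of the scheduler tags, which leaves about 25% for synchronous ones. The function below is illustrative, not the exact mq-deadline hook:
  #include <linux/blk_types.h>
  #include <linux/kernel.h>
  #include "blk-mq.h"     /* block-layer internal header defining struct blk_mq_alloc_data */

  /* Limit the tag depth seen by async requests to ~75% of nr_requests. */
  static void dd_limit_depth_sketch(unsigned int op, struct blk_mq_alloc_data *data,
                                    unsigned int nr_requests)
  {
          /* Synchronous reads keep the full depth. */
          if (op_is_sync(op) && !op_is_write(op))
                  return;

          /* A shallow_depth of 0 means "no limit", so keep at least 1. */
          data->shallow_depth = max(1U, 3 * nr_requests / 4);
  }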
2021-06-21block/mq-deadline: Improve the sysfs show and store macrosBart Van Assche
Define separate macros for integers and jiffies to improve readability. Use sysfs_emit() and kstrtoint() instead of sprintf() and simple_strtol(). The former functions are the recommended functions. Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-11-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
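For reference, the kind of per-type macro pair the commit describes, with sysfs_emit() on the show side and kstrtoint() on the store side; a trimmed sketch rather than the exact mq-deadline macros (usage would look like SHOW_INT(deadline_front_merges_show, dd->front_merges)):
  #include <linux/elevator.h>
  #include <linux/kernel.h>
  #include <linux/sysfs.h>

  #define SHOW_INT(__FUNC, __VAR) \
  static ssize_t __FUNC(struct elevator_queue *e, char *page) \
  { \
          struct deadline_data *dd = e->elevator_data; \
   \
          return sysfs_emit(page, "%d\n", __VAR); \
  }

  #define STORE_INT(__FUNC, __PTR, __MIN, __MAX) \
  static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count) \
  { \
          struct deadline_data *dd = e->elevator_data; \
          int __data; \
          int __ret = kstrtoint(page, 0, &__data); \
   \
          if (__ret) \
                  return __ret; \
          *(__PTR) = clamp_t(int, __data, (__MIN), (__MAX)); \
          return count; \
  }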
2021-06-21block/mq-deadline: Improve compile-time argument checkingBart Van Assche
Modern compilers complain if an out-of-range value is passed to a function argument that has an enumeration type. Let the compiler detect out-of-range data direction arguments instead of verifying the data_dir argument at runtime. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-10-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
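The technique is simply to give the data direction its own enumeration type, so that a caller passing an arbitrary integer trips a compiler diagnostic instead of being checked (or missed) at runtime; a small sketch with a placeholder structure:
  #include <linux/kernel.h>       /* READ / WRITE */

  struct request;

  enum dd_data_dir_sketch {
          DD_READ  = READ,
          DD_WRITE = WRITE,
  };

  struct dd_sketch {
          struct request *next_rq[2];
  };

  /* Taking an enum instead of an int lets the compiler flag out-of-range
   * or mixed-up direction arguments at build time. */
  static struct request *dd_next_request_sketch(struct dd_sketch *dd,
                                                enum dd_data_dir_sketch data_dir)
  {
          return dd->next_rq[data_dir];
  }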
2021-06-21block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()Bart Van Assche
Change "queue" into "sched" to make the function names reflect better the purpose of these functions. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-9-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Remove two local variablesBart Van Assche
Make __dd_dispatch_request() easier to read by removing two local variables. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-8-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Add two lockdep_assert_held() statementsBart Van Assche
Document the locking strategy by adding two lockdep_assert_held() statements. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-7-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/mq-deadline: Add several commentsBart Van Assche
Make the code easier to read by adding more comments. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-6-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block: Introduce the ioprio rq-qos policyBart Van Assche
Introduce an rq-qos policy that assigns an I/O priority to requests based on blk-cgroup configuration settings. This policy has the following advantages over the ioprio_set() system call: - This policy is cgroup based so it has all the advantages of cgroups. - While ioprio_set() does not affect page cache writeback I/O, this rq-qos controller affects page cache writeback I/O for filesystems that support associating a cgroup with writeback I/O. See also Documentation/admin-guide/cgroup-v2.rst. Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210618004456.7280-5-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/blk-rq-qos: Move a function from a header file into a C fileBart Van Assche
rq_qos_id_to_name() is only used in blk-mq-debugfs.c so move that function into blk-mq-debugfs.c. Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20210618004456.7280-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() callsBart Van Assche
Before adding more calls in this function, simplify the error path. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210618004456.7280-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21block/Kconfig: Make the BLK_WBT and BLK_WBT_MQ entries consecutiveBart Van Assche
These entries were consecutive at the time of their introduction but are no longer consecutive. Make them consecutive again. Additionally, modify the help text since it refers to blk-mq and since the legacy block layer has been removed. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20210618004456.7280-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-21netfs: fix test for whether we can skip read when writing beyond EOFJeff Layton
It's not sufficient to skip reading when the pos is beyond the EOF. There may be data at the head of the page that we need to fill in before the write. Add a new helper function that corrects and clarifies the logic of when we can skip reads, and have it only zero out the part of the page that won't have data copied in for the write. Finally, don't set the page Uptodate after zeroing. It's not up to date since the write data won't have been copied in yet. [DH made the following changes: - Prefixed the new function with "netfs_". - Don't call zero_user_segments() for a full-page write. - Altered the beyond-last-page check to avoid a DIV instruction and got rid of then-redundant zero-length file check. ] Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper") Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> cc: ceph-devel@vger.kernel.org Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/ Link: https://lore.kernel.org/r/162367683365.460125.4467036947364047314.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162391826758.1173366.11794946719301590013.stgit@warthog.procyon.org.uk/ # v2
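A condensed approximation of the decision the new helper makes: the read may be skipped for a full-page write, when the page lies wholly beyond EOF, or when the write starts at the top of the page and reaches (or passes) EOF; in the latter two cases only the bytes the copy-in will not supply are zeroed, and the page is deliberately not marked uptodate. Treat the following as a sketch of the logic, not the exact upstream helper:
  #include <linux/fs.h>
  #include <linux/highmem.h>
  #include <linux/mm.h>

  static bool netfs_skip_read_sketch(struct page *page, loff_t pos, size_t len)
  {
          loff_t i_size = i_size_read(page->mapping->host);
          size_t offset = offset_in_page(pos);

          /* Full-page write: nothing to read, nothing to zero. */
          if (offset == 0 && len >= PAGE_SIZE)
                  return true;

          /* Page entirely beyond EOF, or a write that starts at the top of the
           * page and runs to (or past) EOF: there is no existing data to read. */
          if (pos - offset >= i_size ||
              (offset == 0 && pos + len >= i_size)) {
                  /* Zero only what the copy-in won't overwrite, and do NOT set
                   * PageUptodate here; the write data hasn't been copied yet. */
                  zero_user_segments(page, 0, offset, offset + len, PAGE_SIZE);
                  return true;
          }

          return false;
  }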
2021-06-21afs: Fix afs_write_end() to handle short writesDavid Howells
Fix afs_write_end() to correctly handle a short copy into the intended write region of the page. Two things are necessary: (1) If the page is not up to date, then we should just return 0 (ie. indicating a zero-length copy). The loop in generic_perform_write() will go around again, possibly breaking up the iterator into discrete chunks[1]. This is analogous to commit b9de313cf05fe08fa59efaf19756ec5283af672a for ceph. (2) The page should not have been set uptodate if it wasn't completely set up by netfs_write_begin() (this will be fixed in the next patch), so we need to set uptodate here in such a case. Also remove the assertion that was checking that the page was set uptodate since it's now set uptodate if it wasn't already a few lines above. The assertion was from when uptodate was set elsewhere. Changes: v3: Remove the handling of len exceeding the end of the page. Fixes: 3003bbd0697b ("afs: Use the netfs_write_begin() helper") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Al Viro <viro@zeniv.linux.org.uk> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YMwVp268KTzTf8cN@zeniv-ca.linux.org.uk/ [1] Link: https://lore.kernel.org/r/162367682522.460125.5652091227576721609.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162391825688.1173366.3437507255136307904.stgit@warthog.procyon.org.uk/ # v2
2021-06-21gpio: mxc: Fix disabled interrupt wake-up supportLoic Poulain
A disabled/masked interrupt marked as wakeup source must be re-enabled and unmasked in order to be able to wake up the host. That can be done by flagging the irqchip with IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND. Note: It 'sometimes' works without that change, but only thanks to the lazy generic interrupt disabling (keeping the interrupt unmasked). Reported-by: Michal Koziel <michal.koziel@emlogic.no> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
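The fix is essentially one flag on the generic irqchip, which tells the core to re-enable and unmask a lazily-disabled wakeup interrupt across suspend; a minimal sketch with the surrounding generic-chip setup abridged:
  #include <linux/irq.h>

  static void mxc_gpio_wakeup_setup_sketch(struct irq_chip_generic *gc)
  {
          struct irq_chip_type *ct = gc->chip_types;

          /* Without this flag, a disabled IRQ marked as a wakeup source may
           * only wake the system by luck of lazy disabling (still unmasked). */
          ct->chip.flags |= IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND;
  }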
2021-06-21Merge series "regulator: qcom,rpmh-regulator: Add support for pmic available ↵Mark Brown
on SA8155p-adp board" from Bhupesh Sharma <bhupesh.sharma@linaro.org>: Changes since v2: ----------------- - v2 series can be found here: https://lore.kernel.org/linux-arm-msm/20210615074543.26700-1-bhupesh.sharma@linaro.org/T/#m8303d27d561b30133992da88198abb78ea833e21 - Addressed review comments from Bjorn and Mark. - As per suggestion from Bjorn, seperated the patches in different patchsets (specific to each subsystem) to ease review and patch application. Changes since v1: ----------------- - v1 series can be found here: https://lore.kernel.org/linux-arm-msm/20210607113840.15435-1-bhupesh.sharma@linaro.org/T/#mc524fe82798d4c4fb75dd0333318955e0406ad18 - Addressed review comments from Bjorn and Vinod received on the v1 series. This series adds the regulator support code for SA8155p-adp board which is based on Qualcomm snapdragon sa8155p SoC which in turn is simiar to the sm8150 SoC. This board supports a new PMIC PMM8155AU. While at it, also make some cosmetic changes to the regulator driver and dt-bindings to make sure the compatibles are alphabetical and also fix issues with extra comma(s) at the end of terminator line(s). Cc: Mark Brown <broonie@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Bhupesh Sharma (5): dt-bindings: regulator: qcom,rpmh-regulator: Arrange compatibles alphabetically dt-bindings: regulator: qcom,rpmh-regulator: Add compatible for SA8155p-adp board pmic regulator: qcom-rpmh: Cleanup terminator line commas regulator: qcom-rpmh: Add terminator at the end of pm7325x_vreg_data[] array regulator: qcom-rpmh: Add new regulator found on SA8155p adp board .../regulator/qcom,rpmh-regulator.yaml | 17 ++--- drivers/regulator/qcom-rpmh-regulator.c | 62 +++++++++++++++---- 2 files changed, 59 insertions(+), 20 deletions(-) -- 2.31.1
2021-06-21Merge series "Extend regulator notification support" from Matti Vaittinen ↵Mark Brown
<matti.vaittinen@fi.rohmeurope.com>: Extend regulator notification support This series extends the regulator notification and error flag support. Initial discussion on the topic can be found here: https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.camel@fi.rohmeurope.com/ In a nutshell - the series adds: 1. WARNING level events/error flags. (Patch 3) Current regulator 'ERROR' event notifications for over/under voltage, over current and over temperature are used to indicate condition where monitored entity is so badly "off" that it actually indicates a hardware error which can not be recovered. The most typical hanling for that is believed to be a (graceful) system-shutdown. Here we add set of 'WARNING' level flags to allow sending notifications to consumers before things are 'that badly off' so that consumer drivers can implement recovery-actions. 2. Device-tree properties for specifying limit values. (Patches 1, 5) Add limits for above mentioned 'ERROR' and 'WARNING' levels (which send notifications to consumers) and also for a 'PROTECTION' level (which will be used to immediately shut-down the regulator(s) W/O informing consumer drivers. Typically implemented by hardware). Property parsing is implemented in regulator core which then calls callback operations for limit setting from the IC drivers. A warning is emitted if protection is requested by device tree but the underlying IC does not support configuring requested protection. 3. Helpers which can be registered by IC. (Patch 4) Target is to avoid implementing IRQ handling and IRQ storm protection in each IC driver. (Many of the ICs implementin these IRQs do not allow masking or acking the IRQ but keep the IRQ asserted for the whole duration of problem keeping the processor in IRQ handling loop). 4. Emergency poweroff function (refactored out of the thermal_core to kernel/reboot.c) which is called if IC fires error IRQs but IC reading fails and given retry-count is exceeded. (Patches 2, 4) Please note that the mutex in the emergency shutdown was replaced by a simple atomic in order to allow call from any context. The helper was attempted to be done so it could be used to implement roughly same logic as is used in qcom-labibb regulator. This means amongst other things a safety shut-down if IC registers are not readable. Using these shut-down retry counters are optional. The idea is that the helper could be also used by simpler ICs which do not provide status register(s) which can be used to check if error is still active. ICs which do not have such status register can simply omit the 'renable' callback (and retry-counts etc) - and helper assumes the situation is Ok and re-enables IRQ after given time period. If problem persists the handler is ran again and another notification is sent - but at least the delay allows processor to avoid IRQ loop. Patch 7 takes this notification support in use at BD9576MUF. Patch 8 is related to MFD change which is not really related to the RFC here. It was added to this series in order to avoid potential conflicts. Patch 9 adds a maintainers entry. Changelog v10-RESEND: - rebased on v5.13-rc4 Changelog v10: - rebased on v5.13-rc2 - Move rdev_*() print macros to the internal.h and use rdev_dbg() from irq_helpers.c - Export rdev_get_name() and move it from coupler.h to driver.h for others to use. 
(It was already in coupler.h but not exported - usage was limited and coupler.h does not sound like optimal place as rdev_name is not only used by coupled regulators) - Send all regulator notifications from irq_helpers.c at one OR'd event for the sake of simplicity. For BD9576 this does not matter as it has own IRQ for each event case. Header defining events says they may be OR'd. - Change WARN() at protection shutdown to pr_emerg as suggested by Petr. Changelog v9: - rebases on v5.13-rc1 - Update thermal documentation - Fix regulator notification event number Changelog v8: - split shutdown API adding and thermal core taking it in use to own patches. - replace the spinlock with atomic when ensuring the emergency shutdown is only called once. Changelog v7: general: - rebased on v5.12-rc7 - new patch for refactoring the hw-failure reboot logic out of thermal_core.c for others to use. notification helpers: - fix regulator error_flags query - grammar/typos - do not BUG() but attempt to shut-down the system - use BITS_PER_TYPE() Changelog v6: Add MAINTAINERS entry Changes to IRQ notifiers - move devm functions to drivers/regulator/devres.c - drop irq validity check - use devm_add_action_or_reset() - fix styling issues - fix kerneldocs Changelog v5: - Fix the badly formatted pr_emerg() call. Changelog v4: - rebased on v5.12-rc6 - dropped RFC - fix external FET DT-binding. - improve prints for cases when expecting HW failure. - styling and typos Changelog v3: Regulator core: - Fix dangling pointer access at regulator_irq_helper() stpmic1_regulator: - fix function prototype (compile error) bd9576-regulator: - Update over current limits to what was given in new data-sheet (REV00K) - Allow over-current monitoring without external FET. Set limits to values given in data-sheet (REV00K). Changelog v2: Generic: - rebase on v5.12-rc2 + BD9576 series - Split devm variant of delayed wq to own series Regulator framework: - Provide non devm variant of IRQ notification helpers - shorten dt-property names as suggested by Rob - unconditionally call map_event in IRQ handling and require it to be populated BD9576 regulators: - change the FET resistance property to micro-ohms - fix voltage computation in OC limit setting
2021-06-21arm64/mm: Rename ARM64_SWAPPER_USES_SECTION_MAPSAnshuman Khandual
ARM64_SWAPPER_USES_SECTION_MAPS implies that PMD-level huge page mappings are used for swapper, idmap and vmemmap. Let's make the PMD part explicit, removing any possible confusion with generic memory sections, and also make the name a bit more generic since it applies to the idmap and vmemmap mappings as well. Hence rename it to ARM64_KERNEL_USES_PMD_MAPS instead. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1623991622-24294-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-21KVM: nVMX: Dynamically compute max VMCS index for vmcs12Sean Christopherson
Calculate the max VMCS index for vmcs12 by walking the array to find the actual max index. Hardcoding the index is prone to bitrot, and the calculation is only done on KVM bringup (albeit on every CPU, but there aren't _that_ many null entries in the array). Fixes: 3c0f99366e34 ("KVM: nVMX: Add a TSC multiplier field in VMCS12") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210618214658.2700765-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-21KVM: VMX: Skip #PF(RSVD) intercepts when emulating smaller maxphyaddrJim Mattson
As part of smaller maxphyaddr emulation, kvm needs to intercept present page faults to see if it needs to add the RSVD flag (bit 3) to the error code. However, there is no need to intercept page faults that already have the RSVD flag set. When setting up the page fault intercept, add the RSVD flag into the #PF error code mask field (but not the #PF error code match field) to skip the intercept when the RSVD flag is already set. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210618235941.1041604-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
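For context, a #PF causes a VM exit (with the #PF bit set in the exception bitmap) only if (error_code & PFEC_MASK) == PFEC_MATCH, so putting RSVD in the mask but not in the match lets faults that already carry RSVD skip the intercept. A hedged sketch of programming the two VMCS fields; the constants come from the x86/KVM headers, but the surrounding logic is simplified:
  /* vmcs_write32(), PAGE_FAULT_ERROR_CODE_* and PFERR_* come from KVM's
   * VMX and x86 headers; this helper is illustrative only. */
  static void vmx_setup_pf_intercept_sketch(void)
  {
          /* Intercept present page faults so the RSVD bit can be injected
           * when emulating a smaller MAXPHYADDR ... */
          vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK,
                       PFERR_PRESENT_MASK | PFERR_RSVD_MASK);
          /* ... but faults whose error code already has RSVD set fail the
           * match and are not intercepted. */
          vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, PFERR_PRESENT_MASK);
  }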
2021-06-21Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: - fix gcc 10 compiler regression with cpu_init() * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9081/1: fix gcc-10 thumb2-kernel regression