summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-04-08perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to ↵Yunlong Song
the unexpected change of pid_max The current memory allocation of struct task_desc *pid_to_task[MAX_PID] is in a permanent and preset way, and it has two problems: Problem 1: If the pid_max, which is the max number of pids in the system, is much smaller than MAX_PID (1024*1000), then it causes a waste of stack memory. This may happen in the case where the number of cpu cores is much smaller than 1000. Problem 2: If the pid_max is changed from the default value to a value larger than MAX_PID, then it will cause assertion failure problem. The maximum value of pid_max can be set to pid_max_max (see pidmap_init defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This value is much larger than MAX_PID, and will take up 32768 Kbytes (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much larger than the default 8192 Kbytes of the stack size of calling process. Due to these two problems, we use calloc to allocate the memory of pid_to_task dynamically. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ echo 1025000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 1025000 Run some applications until the pid of some process is greater than the value of MAX_PID (1024*1000). Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55480 nsecs the run test took 1000008 nsecs the sleep test took 1063151 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55435 nsecs the run test took 1000004 nsecs the sleep test took 1059312 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08perf sched replay: Increase the MAX_PID value to fix assertion failure problemYunlong Song
Current MAX_PID is only 65536, which will cause assertion failure problem when CPU cores are more than 64 in x86_64. This is because the pid_max value in x86_64 is at least PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in include/linux/threads.h). Thus for MAX_PID = 65536, the correspoinding CPU cores are 65536/1024=64. This is obviously not enough at all for x86_64, and will cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the codes. We increase MAX_PID value from 65536 to 1024*1000, which can be used in x86_64 with 1000 cores. This number is finally decided according to the limitation of stack size of calling process. Use 'ulimit -a', the result shows the stack size of any process is 8192 Kbytes, which is defined in include/uapi/linux/resource.h (#define _STK_LIM (8*1024*1024)). Thus we choose a large enough value for MAX_PID, and make it satisfy to the limitation of the stack size, i.e., making the perf process take up a memory space just smaller than 8192 Kbytes. We have calculated and tested that 1024*1000 is OK for MAX_PID. This means perf sched replay can now be used with at most 1000 cores in x86_64 without any assertion failure problem. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 Before this patch: $ perf sched replay run measurement overhead: 240 nsecs sleep measurement overhead: 55379 nsecs the run test took 1000004 nsecs the sleep test took 1059424 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55397 nsecs the run test took 999920 nsecs the sleep test took 1053313 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08perf sched replay: Use struct task_desc instead of struct task_task for ↵Yunlong Song
correct meaning There is no struct task_task at all, thus it is a typo error in the old commits, now fix it to what it should be in order to avoid unnecessary misunderstanding. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08perf kmem: Respect -i optionJiri Olsa
Currently the perf kmem does not respect -i option. Initializing the file.path properly after options get parsed. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08tools lib traceevent: Honor operator priorityNamhyung Kim
Currently it ignores operator priority and just sets processed args as a right operand. But it could result in priority inversion in case that the right operand is also a operator arg and its priority is lower. For example, following print format is from new kmem events. "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0) But this was treated as below: REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0) In this case, the right arg was '?' operator which has lower priority. But it just sets the whole arg so making the output confusing - page was always 0 or 1 since that's the result of logical operation. With this patch, it can handle it properly like following: ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-10-git-send-email-namhyung@kernel.org [ Replaced 'swap' with 'rotate' in a comment as requested by Steve and agreed by Namhyung ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08perf kmaps: Check kmaps to make code more robustWang Nan
This patch add checks in places where map__kmap is used to get kmaps from struct kmap. Error messages are added at map__kmap to warn invalid accessing of kmap (for the case of !map->dso->kernel, kmap(map) does not exists at all). Also, introduces map__kmaps() to warn uninitialized kmaps. Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: pi3orama@163.com Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1428394966-131044-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08perf evlist: Fix inverted logic in perf_mmap__emptyHe Kuang
perf_evlist__mmap_consume() uses perf_mmap__empty() to judge whether perf_mmap is empty and can be released. But the result is inverted so fix it. Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1428399071-7141-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08ALSA: hda - Work around races of power up/down with runtime PMTakashi Iwai
Currently, snd_hdac_power_up()/down() helpers checks whether the codec is being in pm (suspend/resume), and skips the call of runtime get/put during it. This is needed as there are lots of power up/down sequences called in the paths that are also used in the PM itself. An example is found in hda_codec.c::codec_exec_verb(), where this can power up the codec while it may be called again in its power up sequence, too. The above works in most cases, but sometimes we really want to wait for the real power up. For example, the control element get/put may want explicit power up so that the value change is assured to reach to the hardware. Using the current snd_hdac_power_up(), however, results in a race, e.g. when it's called during the runtime suspend is being performed. In the worst case, as found in patch_ca0132.c, it can even lead to the deadlock because the code assumes the power up while it was skipped due to the check above. For dealing with such cases, this patch makes snd_hdac_power_up() and _down() to two variants: with and without in_pm flag check. The version with pm flag check is named as snd_hdac_power_up_pm() while the version without pm flag check is still kept as snd_hdac_power_up(). (Just because the usage of the former is fewer.) Then finally, the patch replaces each call potentially done in PM with the new _pm() variant. In theory, we can implement a unified version -- if we can distinguish the current context whether it's in the pm path. But such an implementation is cumbersome, so leave the code like this a bit messy way for now... Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96271 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08regulator: qcom: Tidy up probe()Bjorn Andersson
Tidy up error reporting and move rpm reference retrieval out of the for loop for improved readability. Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08regulator: qcom: Rework to single platform deviceBjorn Andersson
Modeling the individual RPM resources as platform devices consumes at least 12-15kb of RAM, just to hold the platform_device structs. Rework this to instead have one device per pmic exposed by the RPM. With this representation we can more accurately define the input pins on the pmic and have the supply description match the data sheet. Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08regulator: qcom: Refactor of-parsing codeBjorn Andersson
Refactor out all custom property parsing code from the probe function into a function suitable for regulator_desc->of_parse_cb usage. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08regulator: qcom: Don't enable DRMS in driverBjorn Andersson
The driver itself should not flag regulators as being DRMS compatible, this should come from board or dt files. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08kconfig: Simplify MakefileMichal Marek
Use a single rule for targets handled directly by the conf program. Signed-off-by: Michal Marek <mmarek@suse.cz>
2015-04-08regulator: max8660: fix assignment of pdata to data that becomes deadColin Ian King
pdata is assigned to &pdata_of, however, pdata_of becomes dead (when it goes out of scope) so pdata effectively becomes a dead pointer to the out of scope object. This is detected by static analysis: [drivers/regulator/max8660.c:411]: (error) Dead pointer usage. Pointer 'pdata' is dead if it has been assigned '&pdata_of' at line 404. Move declaration of pdata_of so it is always in scope. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08spi: img-spfi: Setup TRANSACTION register before CONTROL registerSifan Naeem
Setting the transfer length in the TRANSACTION register after the CONTROL register is programmed causes intermittent timeout issues in SPFI transfers when using the SPI framework to control the CS GPIO lines. To avoid this issue, set transfer length before programming the CONTROL register. Signed-off-by: Sifan Naeem <sifan.naeem@imgtec.com> Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08ASoC: Intel: Fix a buffer overflow issueJie Yang
0day robot reported a buffer overflow issue: ... sound/soc/intel/haswell/sst-haswell-pcm.c:1107 hsw_pcm_probe() error: buffer\ overflow 'hsw_dais' 4 <= 4 sound/soc/intel/haswell/sst-haswell-pcm.c:1109 hsw_pcm_probe() error: buffer\ overflow 'hsw_dais' 4 <= 4 ... Fix it by initializing the index(i) to correct value. Signed-off-by: Jie Yang <yang.jie@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-04-08mmc: core: Convert the error field in struct mmc_command|data into an intUlf Hansson
Everybody expects the error field in the struct mmc_command|data to be and int but it's actually an unsigned int. Let's convert it into an int to meet the expectations. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-04-08mmc: sdhci-of-arasan: Call OF parsing for MMCMichal Simek
Also check MMC OF properties. The controller supports MMC too. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-04-08mmc: sdhci-pci: fix 64 BIT DMA quirks for rtsxMicky Ching
rts5250 chip failed handle 64 bit ADMA for address below 4G. Add 64 BIT quirks to disable this feature. Signed-off-by: Micky Ching <micky_ching@realsil.com.cn> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-04-08ALSA: hda - Create AFG sysfs node at lastTakashi Iwai
... so that user-space can know that the whole nodes have been created. Unfortunately, this can't be implemented easily in race-free way, so it's a kind of compromise. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08ALSA: hda/realtek - Support Dell headset mode for ALC288Kailang Yang
Dell create new platform with ALC288 codec. This patch will enable headset mode for Dino platform. [slight code refactoring and compile fix by tiwai] Signed-off-by: Kailang Yang <kailang@realtek.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08ALSA: hda/realtek - Support headset mode for ALC286/288Kailang Yang
Support headset mode for ALC286 and ALC288 platforms. Signed-off-by: Kailang Yang <kailang@realtek.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08Merge branch 'for-linus' into for-nextTakashi Iwai
Back merge HD-audio quirks to for-next branch, so that we can apply a couple of more quirks. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08ALSA: hda/realtek - Make more stable to get pin sense for ALC283Kailang Yang
Pin sense will active when power pin is wake up. Power pin will not wake up immediately during resume state. Add some delay to wait for power pin activated. Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-04-08drm/rockchip: use for_each_endpoint_of_node macro, drop endpoint reference ↵Philipp Zabel
on break Using the for_each_... macro should make the code a bit shorter and easier to read. Also, when breaking out of the loop, the endpoint node reference count needs to be decremented. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
2015-04-08drm/rcar-du: use for_each_endpoint_of_node macroPhilipp Zabel
Using the for_each_... macro should make the code a bit shorter and easier to read. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2015-04-08drm/imx: use for_each_endpoint_of_node macro in imx_drm_encoder_get_mux_idPhilipp Zabel
Using the for_each_... macro should make the code bit shorter and easier to read. This patch also properly decrements the endpoint node reference count before returning out of the loop. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2015-04-08drm: use for_each_endpoint_of_node macro in drm_of_find_possible_crtcsPhilipp Zabel
Using the for_each_... macro should make the code a bit shorter and easier to read. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2015-04-08kvm: mmu: lazy collapse small sptes into large sptesWanpeng Li
Dirty logging tracks sptes in 4k granularity, meaning that large sptes have to be split. If live migration is successful, the guest in the source machine will be destroyed and large sptes will be created in the destination. However, the guest continues to run in the source machine (for example if live migration fails), small sptes will remain around and cause bad performance. This patch introduce lazy collapsing of small sptes into large sptes. The rmap will be scanned in ioctl context when dirty logging is stopped, dropping those sptes which can be collapsed into a single large-page spte. Later page faults will create the large-page sptes. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: Clear CR2 on VCPU resetNadav Amit
CR2 is not cleared as it should after reset. See Intel SDM table named "IA-32 Processor States Following Power-up, Reset, or INIT". Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-5-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: DR0-DR3 are not clear on resetNadav Amit
DR0-DR3 are not cleared as they should during reset and when they are set from userspace. It appears to be caused by c77fb5fe6f03 ("KVM: x86: Allow the guest to run with dirty debug registers"). Force their reload on these situations. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: BSP in MSR_IA32_APICBASE is writableNadav Amit
After reset, the CPU can change the BSP, which will be used upon INIT. Reset should return the BSP which QEMU asked for, and therefore handled accordingly. To quote: "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: simplify kvm_apic_mapRadim Krčmář
recalculate_apic_map() uses two passes over all VCPUs. This is a relic from time when we selected a global mode in the first pass and set up the optimized table in the second pass (to have a consistent mode). Recent changes made mixed mode unoptimized and we can do it in one pass. Format of logical MDA is a function of the mode, so we encode it in apic_logical_id() and drop obsoleted variables from the struct. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-5-git-send-email-rkrcmar@redhat.com> [Add lid_bits temporary in apic_logical_id. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: avoid logical_map when it is invalidRadim Krčmář
We want to support mixed modes and the easiest solution is to avoid optimizing those weird and unlikely scenarios. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-4-git-send-email-rkrcmar@redhat.com> [Add comment above KVM_APIC_MODE_* defines. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: fix mixed APIC mode broadcastRadim Krčmář
Broadcast allowed only one global APIC mode, but mixed modes are theoretically possible. x2APIC IPI doesn't mean 0xff as broadcast, the rest does. x2APIC broadcasts are accepted by xAPIC. If we take SDM to be logical, even addreses beginning with 0xff should be accepted, but real hardware disagrees. This patch aims for simple code by considering most of real behavior as undefined. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-3-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: use MDA for interrupt matchingRadim Krčmář
In mixed modes, we musn't deliver xAPIC IPIs like x2APIC and vice versa. Instead of preserving the information in apic_send_ipi(), we regain it by converting all destinations into correct MDA in the slow path. This allows easier reasoning about subsequent matching. Our kvm_apic_broadcast() had an interesting design decision: it didn't consider IOxAPIC 0xff as broadcast in x2APIC mode ... everything worked because IOxAPIC can't set that in physical mode and logical mode considered it as a message for first 8 VCPUs. This patch interprets IOxAPIC 0xff as x2APIC broadcast. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-2-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08kvm/ppc/mpic: drop unused IRQ_testbitArseny Solokha
Drop unused static procedure which doesn't have callers within its translation unit. It had been already removed independently in QEMU[1] from the OpenPIC implementation borrowed from the kernel. [1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Cc: Alexander Graf <agraf@suse.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1424768706-23150-3-git-send-email-asolokha@kb.kras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: nVMX: remove unnecessary double caching of MAXPHYADDREugene Korenevsky
After speed-up of cpuid_maxphyaddr() it can be called frequently: instead of heavyweight enumeration of CPUID entries it returns a cached pre-computed value. It is also inlined now. So caching its result became unnecessary and can be removed. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205644.GA1258@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entryEugene Korenevsky
On each VM-entry CPU should check the following VMCS fields for zero bits beyond physical address width: - APIC-access address - virtual-APIC address - posted-interrupt descriptor address This patch adds these checks required by Intel SDM. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205627.GA1244@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpuEugene Korenevsky
cpuid_maxphyaddr(), which performs lot of memory accesses is called extensively across KVM, especially in nVMX code. This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the pressure onto CPU cache and simplify the code of cpuid_maxphyaddr() callers. The cached value is initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is updated by usermode. It is obvious that these reloads occur infrequently. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205612.GA1223@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: vmx: pass error code with internal error #2Radim Krčmář
Exposing the on-stack error code with internal error is cheap and potentially useful. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428001865-32280-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08x86: vdso: fix pvclock races with task migrationRadim Krčmář
If we were migrated right after __getcpu, but before reading the migration_count, we wouldn't notice that we read TSC of a different VCPU, nor that KVM's bug made pvti invalid, as only migration_count on source VCPU is increased. Change vdso instead of updating migration_count on destination. Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"") Message-Id: <1428000263-11892-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: remove kvm_read_hva and kvm_read_hva_atomicPaolo Bonzini
The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cffb498 (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
2015-04-08KVM: x86: optimize delivery of TSC deadline timer interruptPaolo Bonzini
The newly-added tracepoint shows the following results on the tscdeadline_latency test: qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns and so on. This is because we need to go through kvm_vcpu_block again after the timer IRQ is injected. Avoid it by polling once before entering kvm_vcpu_block. On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%) from the latency of the TSC deadline timer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: extract blocking logic from __vcpu_runPaolo Bonzini
Rename the old __vcpu_run to vcpu_run, and extract part of it to a new function vcpu_block. The next patch will add a new condition in vcpu_block, avoid extra indentation. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08kvm: x86: fix x86 eflags fixed bitWanpeng Li
Guest can't be booted w/ ept=0, there is a message dumped as below: If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956 ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68 EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA] CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA] SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f6a80 00000037 IDT= 000f6abe 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8X X86 eflags bit 1 is fixed set, which means that 1 << 1 is set instead of 1, this patch fix it. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428473294-6633-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08gpio: arrange Kconfig symbols alphabeticallyLinus Walleij
This rearranges the GPIO drivers Kconfig symbols alphabetically as the top comment in the file already states they should be. No functional changes whatsoever. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-04-08gpio: ich: Implement get_direction functionAaron Sierra
Allow the kernel to query the driver for a GPIO's pin direction. Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-04-08gpio: use (!foo) instead of (foo == NULL)Varka Bhadram
Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-04-08gpio: arizona: drop owner assignment from platform_driversVarka Bhadram
This driver no need to set the owner field, it will be populated by driver core. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>