linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-03-08	PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED	Bjorn Helgaas
	A shadow copy of an option ROM is placed by the BIOS as a fixed address. Set IORESOURCE_PCI_FIXED to indicate that we can't move the shadow copy. This prevents warnings like the following when we assign resources: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment This warning is emitted by pdev_sort_resources(), which already ignores IORESOURCE_PCI_FIXED resources. Link: http://lkml.kernel.org/r/CA+55aFyVMfTBB0oz_yx8+eQOEJnzGtCsYSj9QuhEpdZ9BHdq5A@mail.gmail.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08	x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs	Bjorn Helgaas
	The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register where a BAR should be. We don't know what the side effects of sizing the "BAR" would be, and we don't know what address space the "BAR" might appear to describe. Mark these devices as having non-compliant BARs so the PCI core doesn't touch them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> CC: stable@vger.kernel.org
2016-03-08	Merge tag 'sound-4.5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "It's always an ambivalent feeling to send a large pull request at the late stage like this, especially when most of patches came from me. Anyway, this is a collection of lots of small fixes that slipped from the previous pull request. All fixes are about ASoC, and the majority of changes are corrections of the wrong access types in ALSA ctl enum items. They are mostly harmless on 32bit architectures, but actually buggy on 64bit. So we addressed all these now in a shot. The rest are various small ASoC driver fixes. Among them, only two changes have been done to ASoC core, and both of them are trivial. The rest are all device-specific. So overall, they should be safe to apply" * tag 'sound-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits) ASoC: wm_adsp: Fix enum ctl accesses in a wrong type ASoC: wm9081: Fix enum ctl accesses in a wrong type ASoC: wm8996: Fix enum ctl accesses in a wrong type ASoC: wm8994: Fix enum ctl accesses in a wrong type ASoC: wm8985: Fix enum ctl accesses in a wrong type ASoC: wm8983: Fix enum ctl accesses in a wrong type ASoC: wm8958: Fix enum ctl accesses in a wrong type ASoC: wm8904: Fix enum ctl accesses in a wrong type ASoC: wm8753: Fix enum ctl accesses in a wrong type ASoC: wl1273: Fix enum ctl accesses in a wrong type ASoC: tlv320dac33: Fix enum ctl accesses in a wrong type ASoC: max98095: Fix enum ctl accesses in a wrong type ASoC: max98088: Fix enum ctl accesses in a wrong type ASoC: ab8500: Fix enum ctl accesses in a wrong type ASoC: da732x: Fix enum ctl accesses in a wrong type ASoC: cs42l51: Fix enum ctl accesses in a wrong type ASoC: intel: mfld: Fix enum ctl accesses in a wrong type ASoC: omap: rx51: Fix enum ctl accesses in a wrong type ASoC: omap: n810: Fix enum ctl accesses in a wrong type ASoC: pxa: tosa: Fix enum ctl accesses in a wrong type ...
2016-03-08	unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition	Bjorn Helgaas
	HAVE_ARCH_PCI_SET_DMA_MASK is unused, so remove the #define for it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: GUAN Xuetao <gxt@mprc.pku.edu.cn>
2016-03-08	Merge tag 'edac_fix_for_4.5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fix from Borislav Petkov: "Last minute fix for sb_edac which fixes DIMM detection on certain Xeon Phi configurations: A single fix to the Xeon Phi section of sb_edac. The issue was introduced during this merge window" * tag 'edac_fix_for_4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi
2016-03-08	x86/mm, x86/mce: Add memcpy_mcsafe()	Tony Luck
	Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. Prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. Return type is a "bool". True means that we copied OK, false means that it didn't. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@gmail.com> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	cgroup: implement cgroup_subsys->implicit_on_dfl	Tejun Heo
	Some controllers, perf_event for now and possibly freezer in the future, don't really make sense to control explicitly through "cgroup.subtree_control". For example, the primary role of perf_event is identifying the cgroups of tasks; however, because the controller also keeps a small amount of state per cgroup, it can't be replaced with simple cgroup membership tests. This patch implements cgroup_subsys->implicit_on_dfl flag. When set, the controller is implicitly enabled on all cgroups on the v2 hierarchy so that utility type controllers such as perf_event can be enabled and function transparently. An implicit controller doesn't show up in "cgroup.controllers" or "cgroup.subtree_control", is exempt from no internal process rule and can be stolen from the default hierarchy even if there are non-root csses. v2: Reimplemented on top of the recent updates to css handling and subsystem rebinding. Rebinding implicit subsystems is now a simple matter of exempting it from the busy subsystem check. Signed-off-by: Tejun Heo <tj@kernel.org>
2016-03-08	cgroup: use css_set->mg_dst_cgrp for the migration target cgroup	Tejun Heo
	Migration can be multi-target on the default hierarchy when a controller is enabled - processes belonging to each child cgroup have to be moved to the child cgroup itself to refresh css association. This isn't a problem for cgroup_migrate_add_src() as each source css_set still maps to single source and target cgroups; however, cgroup_migrate_prepare_dst() is called once after all source css_sets are added and thus might not have a single destination cgroup. This is currently worked around by specifying NULL for @dst_cgrp and using the source's default cgroup as destination as the only multi-target migration in use is self-targetting. While this works, it's subtle and clunky. As all taget cgroups are already specified while preparing the source css_sets, this clunkiness can easily be removed by recording the target cgroup in each source css_set. This patch adds css_set->mg_dst_cgrp which is recorded on cgroup_migrate_src() and used by cgroup_migrate_prepare_dst(). This also makes migration code ready for arbitrary multi-target migration. Signed-off-by: Tejun Heo <tj@kernel.org>
2016-03-08	cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup	Tejun Heo
	On the default hierarchy, a migration can be multi-source and/or multi-destination. cgroup_taskest_migrate() used to incorrectly assume single destination cgroup but the bug has been fixed by 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling"). Since the commit, @dst_cgrp to cgroup[_taskset]_migrate() is only used to determine which subsystems are affected or which cgroup_root the migration is taking place in. As such, @dst_cgrp is misleading. This patch replaces @dst_cgrp with @root. Signed-off-by: Tejun Heo <tj@kernel.org>
2016-03-08	cgroup: move migration destination verification out of ↵	Tejun Heo
	cgroup_migrate_prepare_dst() cgroup_migrate_prepare_dst() verifies whether the destination cgroup is allowable; however, the test doesn't really belong there. It's too deep and common in the stack and as a result the test itself is gated by another test. Separate the test out into cgroup_may_migrate_to() and update cgroup_attach_task() and cgroup_transfer_tasks() to perform the test directly. This doesn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org>
2016-03-08	cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()	Tejun Heo
	cgroup_update_dfl_csses() should move each task in the subtree to self; however, it was incorrectly calling cgroup_migrate_add_src() with the root of the subtree as @dst_cgrp. Fortunately, cgroup_migrate_add_src() currently uses @dst_cgrp only to determine the hierarchy and the bug doesn't cause any actual breakages. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org>
2016-03-08	perf/x86/intel/rapl: Simplify quirk handling even more	Borislav Petkov
	Drop the quirk() function pointer in favor of a simple boolean which says whether the quirk should be applied or not. Update comment while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160308164041.GF16568@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	drm/amdgpu/dp: add back special handling for NUTMEG	Alex Deucher
	When I fixed the dp rate selection in: 3b73b168cffd9c392584d3f665021fa2190f8612 drm/amdgpu: fix dp link rate selection (v2) I accidently dropped the special handling for NUTMEG DP bridge chips. They require a fixed link rate. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-08	drm/radeon/dp: add back special handling for NUTMEG	Alex Deucher
	When I fixed the dp rate selection in: 092c96a8ab9d1bd60ada2ed385cc364ce084180e drm/radeon: fix dp link rate selection (v2) I accidently dropped the special handling for NUTMEG DP bridge chips. They require a fixed link rate. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Ken Moffat <zarniwhoop@ntlworld.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-03-08	bcache: fix cache_set_flush() NULL pointer dereference on OOM	Eric Wheeler
	When bch_cache_set_alloc() fails to kzalloc the cache_set, the asyncronous closure handling tries to dereference a cache_set that hadn't yet been allocated inside of cache_set_flush() which is called by __cache_set_unregister() during cleanup. This appears to happen only during an OOM condition on bcache_register. Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: stable@vger.kernel.org
2016-03-08	bcache: cleaned up error handling around register_cache()	Eric Wheeler
	Fix null pointer dereference by changing register_cache() to return an int instead of being void. This allows it to return -ENOMEM or -ENODEV and enables upper layers to handle the OOM case without NULL pointer issues. See this thread: http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521 Fixes this error: gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register bcache: register_cache() error opening sdh2: cannot allocate memory BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8 IP: [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache] PGD 120dff067 PUD 1119a3067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: veth ip6table_filter ip6_tables (...) CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3 Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 Workqueue: events cache_set_flush [bcache] task: ffff88020d5dc280 ti: ffff88020b6f8000 task.ti: ffff88020b6f8000 RIP: 0010:[<ffffffffc05a7e8d>] [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache] Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Marc MERLIN <marc@merlins.org> Cc: <stable@vger.kernel.org>
2016-03-08	bcache: fix race of writeback thread starting before complete initialization	Eric Wheeler
	The bch_writeback_thread might BUG_ON in read_dirty() if dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed its related initialization. This patch downs the dc->writeback_lock until after initialization is complete, thus preventing bch_writeback_thread from proceeding prematurely. See this thread: http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453 Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Marc MERLIN <marc@merlins.org> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08	futex: Replace barrier() in unqueue_me() with READ_ONCE()	Jianyu Zhan
	Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier() in unqueue_me() to prevent the compiler from rereading the lock pointer which might change after a check for NULL. Replace the barrier() with a READ_ONCE() for the following reasons: 1) READ_ONCE() is a weaker form of barrier() that affects only the specific load operation, while barrier() is a general compiler level memory barrier. READ_ONCE() was not available at the time when the barrier was added. 2) Aside of that READ_ONCE() is descriptive and self explainatory while a barrier without comment is not clear to the casual reader. No functional change. [ tglx: Massaged changelog ] Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: dave@stgolabs.net Cc: peterz@infradead.org Cc: linux@rasmusvillemoes.dk Cc: akpm@linux-foundation.org Cc: fengguang.wu@intel.com Cc: bigeasy@linutronix.de Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-08	NVMe: Create discard zero quirk white list	Keith Busch
	The NVMe specification does not require discarded blocks return zeroes on read, but provides that behavior as a possibility. Some applications more efficiently use an SSD if reads on discarded blocks were deterministically zero, based on the "discard_zeroes_data" queue attribute. There is no specification defined way to determine device behavior on discarded blocks, so the driver always left the queue setting disabled. We can only know behavior based on individual device models, so this patch adds a flag to the NVMe "quirk" list that vendors may set if they know their controller works that way. The patch also sets the new flag for one such known device. Signed-off-by: Keith Busch <keith.busch@intel.com> Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08	s390: Fix misspellings in comments	Adam Buchbinder
	Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08	s390/mm: split arch/s390/mm/pgtable.c	Martin Schwidefsky
	The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08	s390/mm: uninline pmdp_xxx functions from pgtable.h	Martin Schwidefsky
	The pmdp_xxx function are smaller than their ptep_xxx counterparts but to keep things symmetrical unline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08	s390/mm: uninline ptep_xxx functions from pgtable.h	Martin Schwidefsky
	The code in the various ptep_xxx functions has grown quite large, consolidate them to four out-of-line functions: ptep_xchg_direct to exchange a pte with another with immediate flushing ptep_xchg_lazy to exchange a pte with another in a batched update ptep_modify_prot_start to begin a protection flags update ptep_modify_prot_commit to commit a protection flags update Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08	Merge branch 'linus' into x86/fpu, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	x86/asm-offsets: Remove PARAVIRT_enabled	Andy Lutomirski
	It no longer has any users. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled	Andy Lutomirski
	x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward. x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is: if (regs->ss & 0x4) setup_espfix; On 32-bit kernels, the logic is: if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 && (regs->flags & X86_EFLAGS_VM) == 0) setup_espfix; The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32. This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do. This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	KVM: s390: allocate only one DMA page per VM	David Hildenbrand
	We can fit the 2k for the STFLE interpretation and the crypto control block into one DMA page. As we now only have to allocate one DMA page, we can clean up the code a bit. As a nice side effect, this also fixes a problem with crycbd alignment in case special allocation debug options are enabled, debugged by Sascha Silbe. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: enable STFLE interpretation only if enabled for the guest	David Hildenbrand
	Not setting the facility list designation disables STFLE interpretation, this is what we want if the guest was told to not have it. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: wake up when the VCPU cpu timer expires	David Hildenbrand
	When the VCPU cpu timer expires, we have to wake up just like when the ckc triggers. For now, setting up a cpu timer in the guest and going into enabled wait will never lead to a wakeup. This patch fixes this problem. Just as for the ckc, we have to take care of waking up too early. We have to recalculate the sleep time and go back to sleep. Please note that the timer callback calls kvm_s390_get_cpu_timer() from interrupt context. As the timer is canceled when leaving handle_wait(), and we don't do any VCPU cpu timer writes/updates in that function, we can be sure that we will never try to read the VCPU cpu timer from the same cpu that is currentyl updating the timer (deadlock). Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Tested-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: step the VCPU timer while in enabled wait	David Hildenbrand
	The cpu timer is a mean to measure task execution time. We want to account everything for a VCPU for which it is responsible. Therefore, if the VCPU wants to sleep, it shall be accounted for it. We can easily get this done by not disabling cpu timer accounting when scheduled out while sleeping because of enabled wait. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: protect VCPU cpu timer with a seqcount	David Hildenbrand
	For now, only the owning VCPU thread (that has loaded the VCPU) can get a consistent cpu timer value when calculating the delta. However, other threads might also be interested in a more recent, consistent value. Of special interest will be the timer callback of a VCPU that executes without having the VCPU loaded and could run in parallel with the VCPU thread. The cpu timer has a nice property: it is only updated by the owning VCPU thread. And speaking about accounting, a consistent value can only be calculated by looking at cputm_start and the cpu timer itself in one shot, otherwise the result might be wrong. As we only have one writing thread at a time (owning VCPU thread), we can use a seqcount instead of a seqlock and retry if the VCPU refreshed its cpu timer. This avoids any heavy locking and only introduces a counter update/check plus a handful of smp_wmb(). The owning VCPU thread should never have to retry on reads, and also for other threads this might be a very rare scenario. Please note that we have to use the raw_* variants for locking the seqcount as lockdep will produce false warnings otherwise. The rq->lock held during vcpu_load/put is also acquired from hardirq context. Lockdep cannot know that we avoid potential deadlocks by disabling preemption and thereby disable concurrent write locking attempts (via vcpu_put/load). Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: step VCPU cpu timer during kvm_run ioctl	David Hildenbrand
	Architecturally we should only provide steal time if we are scheduled away, and not if the host interprets a guest exit. We have to step the guest CPU timer in these cases. In the first shot, we will step the VCPU timer only during the kvm_run ioctl. Therefore all time spent e.g. in interception handlers or on irq delivery will be accounted for that VCPU. We have to take care of a few special cases: - Other VCPUs can test for pending irqs. We can only report a consistent value for the VCPU thread itself when adding the delta. - We have to take care of STP sync, therefore we have to extend kvm_clock_sync() and disable preemption accordingly - During any call to disable/enable/start/stop we could get premeempted and therefore get start/stop calls. Therefore we have to make sure we don't get into an inconsistent state. Whenever a VCPU is scheduled out, sleeping, in user space or just about to enter the SIE, the guest cpu timer isn't stepped. Please note that all primitives are prepared to be called from both environments (cpu timer accounting enabled or not), although not completely used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called while cpu timer accounting is enabled). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: abstract access to the VCPU cpu timer	David Hildenbrand
	We want to manually step the cpu timer in certain scenarios in the future. Let's abstract any access to the cpu timer, so we can hide the complexity internally. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: store cpu id in vcpu->cpu when scheduled in	David Hildenbrand
	By storing the cpu id, we have a way to verify if the current cpu is owning a VCPU. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	KVM: s390: Add diag "watchdog functions" to trace event decoding	Alexander Yarygin
	DIAG 0x288 may occur now. Let's add its code to the diag table in sie.h. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08	Merge branch 'timers/core-v9' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz Pull nohz enhancements from Frederic Weisbecker: "Currently in nohz full configs, the tick dependency is checked asynchronously by nohz code from interrupt and context switch for each concerned subsystem with a set of function provided by these. Such functions are made of many conditions and details that can be heavyweight as they are called on fastpath: sched_can_stop_tick(), posix_cpu_timer_can_stop_tick(), perf_event_can_stop_tick()... Thomas suggested a few months ago to make that tick dependency check synchronous. Instead of checking subsystems details from each interrupt to guess if the tick can be stopped, every subsystem that may have a tick dependency should set itself a flag specifying the state of that dependency. This way we can verify if we can stop the tick with a single lightweight mask check on fast path. This conversion from a pull to a push model to implement tick dependency is the core feature of this patchset that is split into: * Nohz wide kick simplification * Improve nohz tracing * Introduce tick dependency mask * Migrate scheduler, posix timers, perf events and sched clock tick dependencies to the tick dependency mask." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	hwmon: Define binding for the nsa320-hwmon driver	Adam Baker
	Define a binding for the hardware monitoring chip present in the Zyxel NSA-320 and some of the other Zyxel NAS devices. Signed-off-by: Adam Baker <linux@baker-net.org.uk> Acked-by: Rob Herring <robh@kernel.org> [groeck: Fixed whitespace error] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-03-08	x86/nmi: Mark 'ignore_nmis' as __read_mostly	Kostenzer Felix
	ignore_nmis is used in two distinct places: 1. modified through {stop,restart}_nmi by alternative_instructions 2. read by do_nmi to determine if default_do_nmi should be called or not thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath. Signed-off-by: Kostenzer Felix <fkostenzer@live.at> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08	KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS	David Hildenbrand
	With MACHINE_HAS_VX, we convert the floating point registers from the vector registeres when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13. This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status(). Fixes: 9abc2a08a7d6 (KVM: s390: fix memory overwrites when vx is disabled) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	Merge branch 'kvm-ppc-fixes' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
2016-03-08	KVM: VMX: disable PEBS before a guest entry	Radim Krčmář
	Linux guests on Haswell (and also SandyBridge and Broadwell, at least) would crash if you decided to run a host command that uses PEBS, like perf record -e 'cpu/mem-stores/pp' -a This happens because KVM is using VMX MSR switching to disable PEBS, but SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it isn't safe: When software needs to reconfigure PEBS facilities, it should allow a quiescent period between stopping the prior event counting and setting up a new PEBS event. The quiescent period is to allow any latent residual PEBS records to complete its capture at their previously specified buffer address (provided by IA32_DS_AREA). There might not be a quiescent period after the MSR switch, so a CPU ends up using host's MSR_IA32_DS_AREA to access an area in guest's memory. (Or MSR switching is just buggy on some models.) The guest can learn something about the host this way: If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results in #PF where we leak host's MSR_IA32_DS_AREA through CR2. After that, a malicious guest can map and configure memory where MSR_IA32_DS_AREA is pointing and can therefore get an output from host's tracing. This is not a critical leak as the host must initiate with PEBS tracing and I have not been able to get a record from more than one instruction before vmentry in vmx_vcpu_run() (that place has most registers already overwritten with guest's). We could disable PEBS just few instructions before vmentry, but disabling it earlier shouldn't affect host tracing too much. We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that optimization isn't worth its code, IMO. (If you are implementing PEBS for guests, be sure to handle the case where both host and guest enable PEBS, because this patch doesn't.) Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") Cc: <stable@vger.kernel.org> Reported-by: Jiří Olša <jolsa@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: micro-optimize gpte_access	Paolo Bonzini
	Avoid AND-NOT, most x86 processor lack an instruction for it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: simplify last_pte_bitmap	Paolo Bonzini
	Branch-free code is fun and everybody knows how much Avi loves it, but last_pte_bitmap takes it a bit to the extreme. Since the code is simply doing a range check, like (level == 1 \|\| ((gpte & PT_PAGE_SIZE_MASK) && level < N) we can make it branch-free without storing the entire truth table; it is enough to cache N. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: coalesce more page zapping in mmu_sync_children	Paolo Bonzini
	mmu_sync_children can only process up to 16 pages at a time. Check if we need to reschedule, and do not bother zapping the pages until that happens. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: move zap/flush to kvm_mmu_get_page	Paolo Bonzini
	kvm_mmu_get_page is the only caller of kvm_sync_page_transient and kvm_sync_pages. Moving the handling of the invalid_list there removes the need for the underdocumented kvm_sync_page_transient function. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: invert return value of mmu.sync_page and kvm_sync_page	Paolo Bonzini
	Return true if the page was synced (and the TLB must be flushed) and false if the page was zapped. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: cleanup __kvm_sync_page and its callers	Paolo Bonzini
	Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes things unnecessarily tricky. If kvm_mmu_prepare_zap_page is called, it will call kvm_unlink_unsync_page too. So kvm_unlink_unsync_page can be called just as well at the beginning or the end of __kvm_sync_page... which means that we might do it in kvm_sync_page too and remove the parameter. kvm_sync_page ends up being the same code that kvm_sync_pages used to have before the previous patch. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: use kvm_sync_page in kvm_sync_pages	Paolo Bonzini
	If the last argument is true, kvm_unlink_unsync_page is called anyway in __kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page itself). Therefore, kvm_sync_pages can just call kvm_sync_page, instead of going through kvm_unlink_unsync_page+__kvm_sync_page. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: move TLB flush out of __kvm_sync_page	Paolo Bonzini
	By doing this, kvm_sync_pages can use __kvm_sync_page instead of reinventing it. Because of kvm_mmu_flush_or_zap, the code does not end up being more complex than before, and more cleanups to kvm_sync_pages will come in the next patches. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08	KVM: MMU: introduce kvm_mmu_flush_or_zap	Paolo Bonzini
	This is a generalization of mmu_pte_write_flush_tlb, that also takes care of calling kvm_mmu_commit_zap_page. The next patches will introduce more uses. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>