summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-15Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should make it into this release. This contains: - io_uring: - The timeout command assumes sequence == 0 means that we want one completion, but this kind of overloading is unfortunate as it prevents users from doing a pure time based wait. Since this operation was introduced in this cycle, let's correct it now, while we can. (me) - One-liner to fix an issue with dependent links and fixed buffer reads. The actual IO completed fine, but the link got severed since we stored the wrong expected value. (me) - Add TIMEOUT to list of opcodes that don't need a file. (Pavel) - rsxx missing workqueue destry calls. Old bug. (Chuhong) - Fix blk-iocost active list check (Jiufei) - Fix impossible-to-hit overflow merge condition, that still hit some folks very rarely (Junichi) - Fix bfq hang issue from 5.3. This didn't get marked for stable, but will go into stable post this merge (Paolo)" * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block: rsxx: add missed destroy_workqueue calls in remove iocost: check active_list of all the ancestors in iocg_activate() block, bfq: deschedule empty bfq_queues not referred by any process io_uring: ensure registered buffer import returns the IO length io_uring: Fix getting file for timeout block: check bi_size overflow before merge io_uring: make timeout sequence == 0 mean no sequence
2019-11-15i2c: core: fix use after free in of_i2c_notifyWen Yang
We can't use "adap->dev" after it has been freed. Fixes: 5bf4fa7daea6 ("i2c: break out OF support into separate file") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-11-15i2c: acpi: Force bus speed to 400KHz if a Silead touchscreen is presentHans de Goede
Many cheap devices use Silead touchscreen controllers. Testing has shown repeatedly that these touchscreen controllers work fine at 400KHz, but for unknown reasons do not work properly at 100KHz. This has been seen on both ARM and x86 devices using totally different i2c controllers. On some devices the ACPI tables list another device at the same I2C-bus as only being capable of 100KHz, testing has shown that these other devices work fine at 400KHz (as can be expected of any recent I2C hw). This commit makes i2c_acpi_find_bus_speed() always return 400KHz if a Silead touchscreen controller is present, fixing the touchscreen not working on devices which ACPI tables' wrongly list another device on the same bus as only being capable of 100KHz. Specifically this fixes the touchscreen on the Jumper EZpad 6 m4 not working. Reported-by: youling 257 <youling257@gmail.com> Tested-by: youling 257 <youling257@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> [wsa: rewording warning a little] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2019-11-15Merge branch 'ptp-Validate-the-ancillary-ioctl-flags-more-carefully'David S. Miller
Richard Cochran says: ==================== ptp: Validate the ancillary ioctl flags more carefully. The flags passed to the ioctls for periodic output signals and time stamping of external signals were never checked, and thus formed a useless ABI inadvertently. More recently, a version 2 of the ioctls was introduced in order make the flags meaningful. This series tightens up the checks on the new ioctl flags. - Patch 1 ensures at least one edge flag is set for the new ioctl. - Patches 2-7 are Jacob's recent checks, picking up the tags. - Patch 8 introduces a "strict" flag for passing to the drivers when the new ioctl is used. - Patches 9-12 implement the "strict" checking in the drivers. - Patch 13 extends the test program to exercise combinations of flags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Extend the test program to check the external time stamp flags.Richard Cochran
Because each driver and hardware has different capabilities, the test cannot provide a simple pass/fail result, but it can at least show what combinations of flags are supported. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: Reject requests to enable time stamping on both edges.Richard Cochran
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15igb: Reject requests that fail to enable time stamping on both edges.Richard Cochran
This hardware always time stamps rising and falling edges, and so this patch validates that the request does contains both edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15dp83640: Reject requests to enable time stamping on both edges.Richard Cochran
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mv88e6xxx: Reject requests to enable time stamping on both edges.Richard Cochran
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Introduce strict checking of external time stamp options.Richard Cochran
User space may request time stamps on rising edges, falling edges, or both. However, the particular mode may or may not be supported in the hardware or in the driver. This patch adds a "strict" flag that tells drivers to ensure that the requested mode will be honored. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15renesas: reject unsupported external timestamp flagsJacob Keller
Fix the renesas PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: reject unsupported external timestamp flagsJacob Keller
Fix the mlx5 core PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. [ RC: I'm not 100% sure what this driver does, but if I'm not wrong it follows the dp83640: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge ] Cc: Feras Daoud <ferasda@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15igb: reject unsupported external timestamp flagsJacob Keller
Fix the igb PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. This HW always time stamps both edges: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp both edges Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15dp83640: reject unsupported external timestamp flagsJacob Keller
Fix the dp83640 PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. For the record, the semantics of this driver are: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge Cc: Stefan Sørensen <stefan.sorensen@spectralink.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mv88e6xxx: reject unsupported external timestamp flagsJacob Keller
Fix the mv88e6xxx PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. For the record, the semantics of this driver are: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp rising edge Cc: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: reject PTP periodic output requests with unsupported flagsJacob Keller
Commit 823eb2a3c4c7 ("PTP: add support for one-shot output") introduced a new flag for the PTP periodic output request ioctl. This flag is not currently supported by any driver. Fix all drivers which implement the periodic output request ioctl to explicitly reject any request with flags they do not understand. This ensures that the driver does not accidentally misinterpret the PTP_PEROUT_ONE_SHOT flag, or any new flag introduced in the future. This is important for forward compatibility: if a new flag is introduced, the driver should reject requests to enable the flag until the driver has actually been modified to support the flag in question. Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Christopher Hall <christopher.s.hall@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Validate requests to enable time stamping of external signals.Richard Cochran
Commit 415606588c61 ("PTP: introduce new versions of IOCTLs") introduced a new external time stamp ioctl that validates the flags. This patch extends the validation to ensure that at least one rising or falling edge flag is set when enabling external time stamps. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: ep93xx_eth: fix mismatch of request_mem_region in removeChuhong Yuan
The driver calls release_resource in remove to match request_mem_region in probe, which is incorrect. Fix it by using the right one, release_mem_region. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ax88172a: fix information leak on short answersOliver Neukum
If a malicious device gives a short MAC it can elicit up to 5 bytes of leaked memory out of the driver. We need to check for ETH_ALEN instead. Reported-by: syzbot+a8d4acdad35e6bbca308@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15selftests: mlxsw: Adjust test to recent changesIdo Schimmel
mlxsw does not support VXLAN devices with a physical device attached and vetoes such configurations upon enslavement to an offloaded bridge. Commit 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") changed the VXLAN device to be an upper of the physical device which causes mlxsw to veto the creation of the VXLAN device with "Unknown upper device type". This is OK as this configuration is not supported, but it prevents us from testing bad flows involving the enslavement of VXLAN devices with a physical device to a bridge, regardless if the physical device is an mlxsw netdev or not. Adjust the test to use a dummy device as a physical device instead of a mlxsw netdev. Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15dm integrity: fix excessive alignment of metadata runsMikulas Patocka
Metadata runs are supposed to be aligned on 4k boundary (so that they work efficiently with disks with 4k sectors). However, there was a programming bug that makes them aligned on 128k boundary instead. The unused space is wasted. Fix this bug by providing a proper 4k alignment. In order to keep existing volumes working, we introduce a new flag SB_FLAG_FIXED_PADDING - when the flag is clear, we calculate the padding the old way. In order to make sure that the old version cannot mount the volume created by the new version, we increase superblock version to 4. Also in order to not break with old integritysetup, we fix alignment only if the parameter "fix_padding" is present when formatting the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-15Input: synaptics-rmi4 - destroy F54 poller workqueue when removingChuhong Yuan
The driver forgets to destroy workqueue in remove() similarly to what is done when probe() fails. Add a call to destroy_workqueue() to fix it. Since unregistration will wait for the work to finish, we do not need to cancel/flush the work instance in remove(). Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191114023405.31477-1-hslester96@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-11-15Input: ff-memless - kill timer in destroy()Oliver Neukum
No timer must be left running when the device goes away. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reported-and-tested-by: syzbot+b6c55daa701fc389e286@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1573726121.17351.3.camel@suse.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-11-15Merge tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two fixes for the buffered reads and O_DIRECT writes serialization patch that went into -rc1 and a fixup for a bogus warning on older gcc versions" * tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client: rbd: silence bogus uninitialized warning in rbd_object_map_update_finish() ceph: increment/decrement dio counter on async requests ceph: take the inode lock before acquiring cap refs
2019-11-15afs: Fix race in commit bulk status fetchDavid Howells
When a lookup is done, the afs filesystem will perform a bulk status-fetch operation on the requested vnode (file) plus the next 49 other vnodes from the directory list (in AFS, directory contents are downloaded as blobs and parsed locally). When the results are received, it will speculatively populate the inode cache from the extra data. However, if the lookup races with another lookup on the same directory, but for a different file - one that's in the 49 extra fetches, then if the bulk status-fetch operation finishes first, it will try and update the inode from the other lookup. If this other inode is still in the throes of being created, however, this will cause an assertion failure in afs_apply_status(): BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags)); on or about fs/afs/inode.c:175 because it expects data to be there already that it can compare to. Fix this by skipping the update if the inode is being created as the creator will presumably set up the inode with the same information. Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "One trivial fix for -rc8/final that ensures that the script used to detect RELR relocation support in the toolchain works correctly when $CC contains quotes. Although it fails safely (by failing to detect the support when it exists), it would be nice to have this fixed in 5.4 given that it was only introduced in the last merge window. Summary: - Handle CC variables containing quotes in tools-support-relr.sh script" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: scripts/tools-support-relr.sh: un-quote variables
2019-11-15Merge tag 'mips_fixes_5.4_4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A fix and simplification for SGI IP27 exception handlers, and a small MAINTAINERS update for Broadcom MIPS systems" * tag 'mips_fixes_5.4_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MAINTAINERS: Remove Kevin as maintainer of BMIPS generic platforms MIPS: SGI-IP27: fix exception handler replication
2019-11-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM fixes from Paolo Bonzini: - fixes for CONFIG_KVM_COMPAT=n - two updates to the IFU erratum - selftests build fix - brown paper bag fix * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Add a comment describing the /dev/kvm no_compat handling KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast() KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=n KVM: X86: Reset the three MSR list number variables to 0 in kvm_init_msr_list() selftests: kvm: fix build with glibc >= 2.30 kvm: x86: disable shattered huge page recovery for PREEMPT_RT.
2019-11-15Merge tag 'mmc-v5.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fix from Ulf Hansson: "Don't overwrite quirk flags in sdhci-of-at91 host driver" * tag 'mmc-v5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-of-at91: fix quirk2 overwrite
2019-11-15Merge tag 'sound-5.4-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A few small last-minute fixes for USB-audio and HD-audio as well as for PCM core: - A race fix for PCM core between stopping and closing a stream - USB-audio regressions in the recent descriptor validation code and relevant changes - A read of uninitialized value in USB-audio spotted by fuzzer - A fix for USB-audio race at stopping a stream - Intel HD-audio platform fixes" * tag 'sound-5.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Fix incorrect size check for processing/extension units ALSA: usb-audio: Fix incorrect NULL check in create_yamaha_midi_quirk() ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed() ALSA: usb-audio: not submit urb for stopped endpoint ALSA: hda: hdmi - fix pin setup on Tigerlake ALSA: hda: Add Cometlake-S PCI ID ALSA: usb-audio: Fix missing error check at mixer resolution test
2019-11-15Merge tag 'drm-fixes-2019-11-15' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Here is this weeks non-intel hw vuln fixes pull. Three drivers, all small fixes. i915: - MOCS table fixes for EHL and TGL - Update Display's rawclock on resume - GVT's dmabuf reference drop fix amdgpu: - Fix a potential crash in firmware parsing sun4i: - One fix to the dotclock dividers range for sun4i" * tag 'drm-fixes-2019-11-15' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: fix null pointer deref in firmware header printing drm/i915/tgl: MOCS table update Revert "drm/i915/ehl: Update MOCS table for EHL" drm/sun4i: tcon: Set min division of TCON0_DCLK to 1. drm/i915: update rawclk also on resume drm/i915/gvt: fix dropping obj reference twice
2019-11-15Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs fixes from Al Viro: "Assorted fixes all over the place; some of that is -stable fodder, some regressions from the last window" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable ecryptfs: fix unlink and rmdir in face of underlying fs modifications audit_get_nd(): don't unlock parent too early exportfs_decode_fh(): negative pinned may become positive without the parent locked cgroup: don't put ERR_PTR() into fc->root autofs: fix a leak in autofs_expire_indirect() aio: Fix io_pgetevents() struct __compat_aio_sigset layout fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
2019-11-15KVM: x86: deliver KVM IOAPIC scan request to target vCPUsNitesh Narayan Lal
In IOAPIC fixed delivery mode instead of flushing the scan requests to all vCPUs, we should only send the requests to vCPUs specified within the destination field. This patch introduces kvm_get_dest_vcpus_mask() API which retrieves an array of target vCPUs by using kvm_apic_map_get_dest_lapic() and then based on the vcpus_idx, it sets the bit in a bitmap. However, if the above fails kvm_get_dest_vcpus_mask() finds the target vCPUs by traversing all available vCPUs. Followed by setting the bits in the bitmap. If we had different vCPUs in the previous request for the same redirection table entry then bits corresponding to these vCPUs are also set. This to done to keep ioapic_handled_vectors synchronized. This bitmap is then eventually passed on to kvm_make_vcpus_request_mask() to generate a masked request only for the target vCPUs. This would enable us to reduce the latency overhead on isolated vCPUs caused by the IPI to process due to KVM_REQ_IOAPIC_SCAN. Suggested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: remember position in kvm->vcpus arrayRadim Krčmář
Fetching an index for any vcpu in kvm->vcpus array by traversing the entire array everytime is costly. This patch remembers the position of each vcpu in kvm->vcpus array by storing it in vcpus_idx under kvm_vcpu structure. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Add support for capturing highest observable L2 TSCAaron Lewis
The L1 hypervisor may include the IA32_TIME_STAMP_COUNTER MSR in the vmcs12 MSR VM-exit MSR-store area as a way of determining the highest TSC value that might have been observed by L2 prior to VM-exit. The current implementation does not capture a very tight bound on this value. To tighten the bound, add the IA32_TIME_STAMP_COUNTER MSR to the vmcs02 VM-exit MSR-store area whenever it appears in the vmcs12 VM-exit MSR-store area. When L0 processes the vmcs12 VM-exit MSR-store area during the emulation of an L2->L1 VM-exit, special-case the IA32_TIME_STAMP_COUNTER MSR, using the value stored in the vmcs02 VM-exit MSR-store area to derive the value to be stored in the vmcs12 VM-exit MSR-store area. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15kvm: vmx: Rename function find_msr() to vmx_find_msr_index()Aaron Lewis
Rename function find_msr() to vmx_find_msr_index() in preparation for an upcoming patch where we export it and use it in nested.c. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15kvm: vmx: Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRSAaron Lewis
Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS. This needs to be done due to the addition of the MSR-autostore area that will be added in a future patch. After that the name AUTOLOAD will no longer make sense. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15kvm: nested: Introduce read_and_check_msr_entry()Aaron Lewis
Add the function read_and_check_msr_entry() which just pulls some code out of nested_vmx_store_msr(). This will be useful as reusable code in upcoming patches. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: mark functions in the header as "static inline"Paolo Bonzini
Correct a small inaccuracy in the shattering of vmx.c, which becomes visible now that pmu_intel.c includes nested.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} controlOliver Upton
The "load IA32_PERF_GLOBAL_CTRL" bit for VM-entry and VM-exit should only be exposed to the guest if IA32_PERF_GLOBAL_CTRL is a valid MSR. Create a new helper to allow pmu_refresh() to update the VM-Entry and VM-Exit controls to ensure PMU values are initialized when performing the is_valid_msr() check. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-EntryOliver Upton
Add condition to prepare_vmcs02 which loads IA32_PERF_GLOBAL_CTRL on VM-entry if the "load IA32_PERF_GLOBAL_CTRL" bit on the VM-entry control is set. Use SET_MSR_OR_WARN() rather than directly writing to the field to avoid overwrite by atomic_switch_perf_msrs(). Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Use kvm_set_msr to load IA32_PERF_GLOBAL_CTRL on VM-ExitOliver Upton
The existing implementation for loading the IA32_PERF_GLOBAL_CTRL MSR on VM-exit was incorrect, as the next call to atomic_switch_perf_msrs() could cause this value to be overwritten. Instead, call kvm_set_msr() which will allow atomic_switch_perf_msrs() to correctly set the values. Define a macro, SET_MSR_OR_WARN(), to set the MSR with kvm_set_msr() and WARN on failure. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Check HOST_IA32_PERF_GLOBAL_CTRL on VM-EntryOliver Upton
Add a consistency check on nested vm-entry for host's IA32_PERF_GLOBAL_CTRL from vmcs12. Per Intel's SDM Vol 3 26.2.2: If the "load IA32_PERF_GLOBAL_CTRL" VM-exit control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register" Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: nVMX: Check GUEST_IA32_PERF_GLOBAL_CTRL on VM-EntryOliver Upton
Add condition to nested_vmx_check_guest_state() to check the validity of GUEST_IA32_PERF_GLOBAL_CTRL. Per Intel's SDM Vol 3 26.3.1.1: If the "load IA32_PERF_GLOBAL_CTRL" VM-entry control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: VMX: Add helper to check reserved bits in IA32_PERF_GLOBAL_CTRLOliver Upton
Create a helper function to check the validity of a proposed value for IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr(). Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be cleared for the corresponding host/guest state fields. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15selftests: kvm: Simplify loop in kvm_create_max_vcpus testWainer dos Santos Moschetta
On kvm_create_max_vcpus test remove unneeded local variable in the loop that add vcpus to the VM. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: x86: Optimization: Requst TLB flush in fast_cr3_switch() instead of do ↵Liran Alon
it directly When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches mmu root page. If nEPT is used, this will happen from kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3() and otherwise it will happpen from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). Either case, __kvm_mmu_new_cr3() will use fast_cr3_switch() in attempt to switch to a previously cached root page. In case fast_cr3_switch() finds a matching cached root page, it will set it in mmu->root_hpa and request KVM_REQ_LOAD_CR3 such that on next entry to guest, KVM will set root HPA in appropriate hardware fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls kvm_x86_ops->tlb_flush() in order to flush TLB as MMU root page was replaced. This works as mmu->root_hpa, which vmx_flush_tlb() use, was already replaced in cached_root_available(). However, this may result in unnecessary INVEPT execution because a KVM_REQ_TLB_FLUSH may have already been requested. For example, by prepare_vmcs02() in case L1 don't use VPID. Therefore, change fast_cr3_switch() to just request TLB flush on next entry to guest. Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMCLike Xu
Currently, a host perf_event is created for a vPMC functionality emulation. It’s unpredictable to determine if a disabled perf_event will be reused. If they are disabled and are not reused for a considerable period of time, those obsolete perf_events would increase host context switch overhead that could have been avoided. If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu sched time slice, and its independent enable bit of the vPMC isn't set, we can predict that the guest has finished the use of this vPMC, and then do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in. This lazy mechanism delays the event release time to the beginning of the next scheduled time slice if vPMC's MSRs aren't changed during this time slice. If guest comes back to use this vPMC in next time slice, a new perf event would be re-created via perf_event_create_kernel_counter() as usual. Suggested-by: Wei Wang <wei.w.wang@intel.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counterLike Xu
The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is a heavyweight and high-frequency operation, especially when host disables the watchdog (maximum 21000000 ns) which leads to an unacceptable latency of the guest NMI handler. It limits the use of vPMUs in the guest. When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop and release its existing perf_event (if any) every time EVEN in most cases almost the same requested perf_event will be created and configured again. For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl' for fixed) is the same as its current config AND a new sample period based on pmc->counter is accepted by host perf interface, the current event could be reused safely as a new created one does. Otherwise, do release the undesirable perf_event and reprogram a new one as usual. It's light-weight to call pmc_pause_counter (disable, read and reset event) and pmc_resume_counter (recalibrate period and re-enable event) as guest expects instead of release-and-create again on any condition. Compared to use the filterable event->attr or hw.config, a new 'u64 current_config' field is added to save the last original programed config for each vPMC. Based on this implementation, the number of calls to pmc_reprogram_counter is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event. In the usage of multiplexing perf sampling mode, the average latency of the guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up). If host disables watchdog, the minimum latecy of guest NMI handler could be speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average. Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callbackLike Xu
Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*, and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) || .is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>