Age | Commit message (Collapse) | Author |
|
|
|
|
|
If the device is of type VFL_TYPE_SUBDEV then vdev->ioctl_ops
is NULL so the 'if (!ops->vidioc_query_ext_ctrl)' check would crash.
Add a test for !ops to the condition.
All sub-devices that have controls will use the control framework,
so they do not have an equivalent to ops->vidioc_query_ext_ctrl.
Returning false if ops is NULL is the correct thing to do here.
Fixes: b8c601e8af ("v4l2-compat-ioctl32.c: fix ctrl_is_pointer")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: <stable@vger.kernel.org> # for v4.15 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Currently kexec() will crash when switching into a 5-level paging
enabled kernel.
I missed that we need to change relocate_kernel() to set CR4.LA57
flag if the kernel has 5-level paging enabled.
I avoided using #ifdef CONFIG_X86_5LEVEL here and inferred if we need to
enable 5-level paging from previous CR4 value. This way the code is
ready for boot-time switching between paging modes.
With this patch applied, in addition to kexec 4-to-4 which always worked,
we can kexec 4-to-5 and 5-to-5 - while 5-to-4 will need more work.
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Baoquan He <bhe@redhat.com>
Cc: <stable@vger.kernel.org> # v4.14+
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Link: http://lkml.kernel.org/r/20180129110845.26633-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.
Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.
If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:
1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();
2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
completed via blk_mq_sched_restart()
3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.
One invariant is that queue will be rerun if SCHED_RESTART is set.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
satp is the name used by the current privileged spec 1.10, use it
instead of the old name. The most recent release binutils release
(2.29) doesn't know about the satp name yet, so stick to the name from
the previous privileged ISA release and comment on the fact.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
init_mm.pgd (aka swapped_pgd) gets relocated like all other kernel
symbols by the elf loader, so there is no need to reload it from satp.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
This patch allows devices that require memory that can be addressed
using 32-bit addresses to work easily on RISC-V systems. The newly
improved dma-direct ops will tap into this pool automatically for
32-bit addressing.
Based on an earlier patch from Wesley W. Terpstra.
CC: Wesley W. Terpstra <terpstra@sifive.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
RISC-V systems perform TLB shootdows via the SBI, which currently
performs an IPI to each of the remote harts which then performs a local
TLB flush. This process is a bit on the slow side, but we can at least
speed it up for some common cases by restricting the set of harts to
shoot down to the actual set of harts that are currently participating
in the given mm context, as opposed to the entire system.
This should provide a measurable performance increase, but we haven't
measured it. Regardless, it seems like obviously the right thing to do
here.
Signed-off-by: Andrew Waterman <andrew@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The SUM bit is enabled at the beginning of the copy_{to,from}_user and
{get,put}_user routines, and cleared before they return. But these user
copy helper can be interrupted by exceptions, in which case the SUM bit
will remain set, which leads to elevated privileges for the code running
in exception context, as that can now access userspace address space
unconditionally. This frequently happens when the user copy routines
access freshly allocated user memory that hasn't been faulted in, and a
pagefault needs to be taken before the user copy routines can continue.
Fix this by unconditionally clearing SUM when the exception handler is
called - the restore code will automatically restore it based on the
saved value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
IS_ERR_VALUE() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
The __ARCH_HAVE_MMU define is (and was) used nowhere in the tree and
also doesn't appear to be used by any libc.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
This patch contains basic ftrace support for RV64I platform.
Specifically, function tracer (HAVE_FUNCTION_TRACER), function graph
tracer (HAVE_FUNCTION_GRAPH_TRACER), and a frame pointer test
(HAVE_FUNCTION_GRAPH_FP_TEST) are implemented following the
instructions in Documentation/trace/ftrace-design.txt.
Note that the functions in both ftrace.c and setup.c should not be
hooked with the compiler's -pg option: to prevent infinite self-
referencing for the former, and to ignore early setup stuff for the
latter.
Signed-off-by: Alan Kao <alankao@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
This is just some cruft left over from before the port converted to
device tree. The right way to handle memory regions is to specify them
in the device tree, which BBL (our simplest bootloader) is already
capable of doing. This patch simply removes the cruft.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
builtin_cmdline handling is present in drivers/of/fdt.c so the
duplicate logic in arch/riscv/setup.c results in duplication of
the builtin command line. e.g. CONFIG_CMDLINE="root=/dev/vda ro"
gets appended twice and gives "root=/dev/vda ro root=/dev/vda ro"
Before this patch:
[ 0.000000] Kernel command line: root=/dev/vda ro root=/dev/vda ro
After this patch:
[ 0.000000] Kernel command line: root=/dev/vda ro
Signed-off-by: Michael Clark <mjc@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
renameat has been deprecated in favor of renameat2 for new ports. This
allows the audit tests to build on RISC-V.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've followed up to support some generic features such
as cgroup, block reservation, linking fscrypt_ops, delivering
write_hints, and some ioctls. And, we could fix some corner cases in
terms of power-cut recovery and subtle deadlocks.
Enhancements:
- bitmap operations to handle NAT blocks
- readahead to improve readdir speed
- switch to use fscrypt_*
- apply write hints for direct IO
- add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
- modify b_avail and b_free to consider root reserved blocks
- support cgroup writeback
- support FIEMAP_FLAG_XATTR for fibmap
- add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
- add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
- support inode creation time
Bug fixs:
- sysfile-based quota operations
- memory footprint accounting
- allow to write data on partial preallocation case
- fix deadlock case on fallocate
- fix to handle fill_super errors
- fix missing inode updates of fsync'ed file
- recover renamed file which was fsycn'ed before
- drop inmemory pages in corner error case
- keep last_disk_size correctly
- recover missing i_inline flags during roll-forward
Various clean-up patches were added as well"
* tag 'f2fs-for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (72 commits)
f2fs: support inode creation time
f2fs: rebuild sit page from sit info in mem
f2fs: stop issuing discard if fs is readonly
f2fs: clean up duplicated assignment in init_discard_policy
f2fs: use GFP_F2FS_ZERO for cleanup
f2fs: allow to recover node blocks given updated checkpoint
f2fs: recover some i_inline flags
f2fs: correct removexattr behavior for null valued extended attribute
f2fs: drop page cache after fs shutdown
f2fs: stop gc/discard thread after fs shutdown
f2fs: hanlde error case in f2fs_ioc_shutdown
f2fs: split need_inplace_update
f2fs: fix to update last_disk_size correctly
f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
f2fs: clean up error path of fill_super
f2fs: avoid hungtask when GC encrypted block if io_bits is set
f2fs: allow quota to use reserved blocks
f2fs: fix to drop all inmem pages correctly
f2fs: speed up defragment on sparse file
f2fs: support F2FS_IOC_PRECACHE_EXTENTS
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable bugfixes:
- Fix breakages in the nfsstat utility due to the inclusion of the
NFSv4 LOOKUPP operation
- Fix a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall()
due to nfs_idmap_legacy_upcall() being called without an 'aux'
parameter
- Fix a refcount leak in the standard O_DIRECT error path
- Fix a refcount leak in the pNFS O_DIRECT fallback to MDS path
- Fix CPU latency issues with nfs_commit_release_pages()
- Fix the LAYOUTUNAVAILABLE error case in the file layout type
- NFS: Fix a race between mmap() and O_DIRECT
Features:
- Support the statx() mask and query flags to enable optimisations
when the user is requesting only attributes that are already up to
date in the inode cache, or is specifying the AT_STATX_DONT_SYNC
flag
- Add a module alias for the SCSI pNFS layout type
Bugfixes:
- Automounting when resolving a NFSv4 referral should preserve the
RDMA transport protocol settings
- Various other RDMA bugfixes from Chuck
- pNFS block layout fixes
- Always set NFS_LOCK_LOST when a lock is lost"
* tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
NFS: Fix a race between mmap() and O_DIRECT
NFS: Remove a redundant call to unmap_mapping_range()
pnfs/blocklayout: Ensure disk address in block device map
pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors
lockd: Fix server refcounting
SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint
SUNRPC: Micro-optimize __rpc_execute
SUNRPC: task_run_action should display tk_callback
sunrpc: Format RPC events consistently for display
SUNRPC: Trace xprt_timer events
xprtrdma: Correct some documenting comments
xprtrdma: Fix "bytes registered" accounting
xprtrdma: Instrument allocation/release of rpcrdma_req/rep objects
xprtrdma: Add trace points to instrument QP and CQ access upcalls
xprtrdma: Add trace points in the client-side backchannel code paths
xprtrdma: Add trace points for connect events
xprtrdma: Add trace points to instrument MR allocation and recovery
xprtrdma: Add trace points to instrument memory invalidation
xprtrdma: Add trace points in reply decoder path
xprtrdma: Add trace points to instrument memory registration
..
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull kern_recvmsg reduction from Al Viro:
"kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In
all but one case that is not needed - use of ITER_KVEC for ->msg_iter
takes care of the data and does not care about set_fs(). The only
exception is svc_udp_recvfrom() where we want cmsg to be store into
kernel object; everything else can just use sock_recvmsg() and be done
with that.
A followup converting svc_udp_recvfrom() away from set_fs() (and
killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what
netdev folks think of the approach proposed in that followup)"
* 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
tipc: switch to sock_recvmsg()
smc: switch to sock_recvmsg()
ipvs: switch to sock_recvmsg()
mISDN: switch to sock_recvmsg()
drbd: switch to sock_recvmsg()
lustre lnet_sock_read(): switch to sock_recvmsg()
cfs2: switch to sock_recvmsg()
ncpfs: switch to sock_recvmsg()
dlm: switch to sock_recvmsg()
svc_recvfrom(): switch to sock_recvmsg()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mqueue/bpf vfs cleanups from Al Viro:
"mqueue and bpf go through rather painful and similar contortions to
create objects in their dentry trees. Provide a primitive for doing
that without abusing ->mknod(), switch bpf and mqueue to it.
Another mqueue-related thing that has ended up in that branch is
on-demand creation of internal mount (based upon the work of Giuseppe
Scrivano)"
* 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mqueue: switch to on-demand creation of internal mount
tidy do_mq_open() up a bit
mqueue: clean prepare_open() up
do_mq_open(): move all work prior to dentry_open() into a helper
mqueue: fold mq_attr_ok() into mqueue_get_inode()
move dentry_open() calls up into do_mq_open()
mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata
bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()
new primitive: vfs_mkobj()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
|
|
last kicked event index must be updated unconditionally:
even if we don't need to kick, we do not want to re-check
the same entry for events.
Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
|
|
last kicked event index must be updated unconditionally:
even if we don't need to kick, we do not want to re-check
the same entry for events.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
3647 608 0 4255 109f drivers/virtio/virtio_mmio.o
File size after constify virtio_mmio_match.
text data bss dec hex filename
4063 192 0 4255 109f drivers/virtio/virtio_mmio.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Fix ptr_ret.cocci warnings:
drivers/firmware/efi/efi.c:610:8-14: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
Fix ptr_ret.cocci warnings:
drivers/virtio/virtio_mmio.c:653:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Replace the specification of four data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add a new field VIRTIO_BALLOON_S_CACHES to virtio_balloon memory
statistics protocol. The value represents all disk/file caches.
In this case it corresponds to the sum of values
Buffers+Cached+SwapCached from /proc/meminfo.
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too interesting. Documentation updates and trivial changes;
however, this pull request does containt he previusly discussed
dropping of __must_check from strscpy()"
* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Documentation: Fix 'file_mapped' -> 'mapped_file'
string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
cgroup, docs: document the root cgroup behavior of cpu and io controllers
cgroup-v2.txt: fix typos
cgroup: Update documentation reference
Documentation/cgroup-v1: fix outdated programming details
cgroup, docs: document cgroup v2 device controller
|
|
When running nested KVM on Hyper-V guests its required to update
masterclocks for all guests when L1 migrates to a host with different TSC
frequency.
Implement the procedure in the following way:
- Pause all guests.
- Tell the host (Hyper-V) to stop emulating TSC accesses.
- Update the gtod copy, recompute clocks.
- Unpause all guests.
This is somewhat similar to cpufreq but there are two important differences:
- TSC emulation can only be disabled globally (on all CPUs)
- The new TSC frequency is not known until emulation is turned off so
there is no way to 'prepare' for the event upfront.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-8-vkuznets@redhat.com
|
|
Currently, KVM is able to work in 'masterclock' mode passing
PVCLOCK_TSC_STABLE_BIT to guests when the clocksource which is used on the
host is TSC.
When running nested on Hyper-V the guest normally uses a different one: TSC
page which is resistant to TSC frequency changes on events like L1
migration. Add support for it in KVM.
The only non-trivial change is in vgettsc(): when updating the gtod copy
both the clock readout and tsc value have to be updated now.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-7-vkuznets@redhat.com
|
|
Hyper-V reenlightenment interrupts arrive when the VM is migrated, While
they are not interesting in general it's important when L2 nested guests
are running.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com
|
|
It is very unlikely for CPUs to get offlined when running on Hyper-V as
there is a protection in the vmbus module which prevents it when the guest
has any VMBus devices assigned. This, however, may change in future if an
option to reassign an already active channel will be added. It is also
possible to run without any Hyper-V devices or to have a CPU with no
assigned channels.
Reassign reenlightenment notifications to some other active CPU when the
CPU which is assigned to them goes offline.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-5-vkuznets@redhat.com
|
|
Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when a VM is migrated to a host with
different TSC frequency for some short period the host emulates the
accesses to TSC and sends an interrupt to notify about the event. When the
guest is done updating everything it can disable TSC emulation and
everything will start working fast again.
These notifications weren't required until now as Hyper-V guests are not
supposed to use TSC as a clocksource: in Linux the TSC is even marked as
unstable on boot. Guests normally use 'tsc page' clocksource and host
updates its values on migrations automatically.
Things change when with nested virtualization: even when the PV
clocksources (kvm-clock or tsc page) are passed through to the nested
guests the TSC frequency and frequency changes need to be know..
Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
the fix in on the way.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
|
|
This is going to be used from KVM code where both TSC and TSC page value
are needed.
Nothing is supposed to use the function when Hyper-V code is compiled out,
just BUG().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com
|
|
In hyperv_init() its presumed that it always has access to VP index and
hypercall MSRs while according to the specification it should be checked if
it's allowed to access the corresponding MSRs before accessing them.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu update from Tejun Heo:
"One trivial patch to convert the return type from int to bool"
* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: percpu_counter_initialized can be boolean
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
"Nothing too interesting. Several patches to convert mdelay() to
usleep_range(), removal of unused pata_at32, and other low level
driver specific changes"
* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: pata_pdc2027x: Replace mdelay with msleep
ata: pata_it821x: Replace mdelay with usleep_range in it821x_firmware_command
ata: sata_mv: Replace mdelay with usleep_range in mv_reset_channel
ata: remove pata_at32
phy: brcm-sata: remove unused variable
phy: brcm-sata: fix semicolon.cocci warnings
ata: ahci_brcm: Recover from failures to identify devices
phy: brcm-sata: Implement calibrate callback
ahci: Add Intel Cannon Lake PCH-H PCI ID
ata_piix: constify pci_bits
libata:pata_atiixp: Don't use unconnected secondary port on SB600
ata: ahci_brcm: Avoid clobbering SATA_TOP_CTRL_BUS_CTRL
ahci: Allow setting a default LPM policy for mobile chipsets
ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI
ahci: Annotate PCI ids for mobile Intel chipsets as such
|
|
Pull workqueue updates from Tejun Heo:
"Workqueue has an early init trick where workqueues can be created and
work items queued on them before the workqueue subsystem is online.
This helps simplifying early init and operation of low level
subsystems which use workqueues for managerial things which aren't
depended upon early during boot.
Out of laziness, the early init didn't cover workqueues with
WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the
flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too.
This was originally brought up for RCU but RCU didn't actually need
this. I still think it's a good idea to cover it"
* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: allow WQ_MEM_RECLAIM on early init workqueues
workqueue: separate out init_rescuer()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns updates from Eric Biederman:
"Between the holidays and other distractions only a small amount of
namespace work made it into my tree this time.
Just a final cleanup from a revert several kernels ago and a small
typo fix from Wolffhardt Schwabe"
* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fix typo in assignment of fs default overflow gid
autofs4: Modify autofs_wait to use current_uid() and current_gid()
userns: Don't fail follow_automount based on s_user_ns
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
|
|
Flush indirect branches when switching into a process that marked itself
non dumpable. This protects high value processes like gpg better,
without having too high performance overhead.
If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:
process A -> idle -> process A
In such scenario, we do not have to do IBPB here even though the process
is non-dumpable, as we are switching back to the same process after a
hiatus.
To avoid the redundant IBPB, which is expensive, we track the last mm
user context ID. The cost is to have an extra u64 mm context id to track
the last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.
For those cases where tlb_defer_switch_to_init_mm() returns true (non
PCID), lazy tlb will defer switch to init_mm, so we will not be changing
the mm for the process A -> idle -> process A switch. So IBPB will be
skipped for this case.
Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: linux@dominikbrodowski.net
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: pbonzini@redhat.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"The main theme of this pull request is security covering variants 2
and 3 for arm64. I expect to send additional patches next week
covering an improved firmware interface (requires firmware changes)
for variant 2 and way for KPTI to be disabled on unaffected CPUs
(Cavium's ThunderX doesn't work properly with KPTI enabled because of
a hardware erratum).
Summary:
- Security mitigations:
- variant 2: invalidate the branch predictor with a call to
secure firmware
- variant 3: implement KPTI for arm64
- 52-bit physical address support for arm64 (ARMv8.2)
- arm64 support for RAS (firmware first only) and SDEI (software
delegated exception interface; allows firmware to inject a RAS
error into the OS)
- perf support for the ARM DynamIQ Shared Unit PMU
- CPUID and HWCAP bits updated for new floating point multiplication
instructions in ARMv8.4
- remove some virtual memory layout printks during boot
- fix initial page table creation to cope with larger than 32M kernel
images when 16K pages are enabled"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits)
arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm
arm64: Turn on KPTI only on CPUs that need it
arm64: Branch predictor hardening for Cavium ThunderX2
arm64: Run enable method for errata work arounds on late CPUs
arm64: Move BP hardening to check_and_switch_context
arm64: mm: ignore memory above supported physical address size
arm64: kpti: Fix the interaction between ASID switching and software PAN
KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
KVM: arm64: Handle RAS SErrors from EL2 on guest exit
KVM: arm64: Handle RAS SErrors from EL1 on guest exit
KVM: arm64: Save ESR_EL2 on guest SError
KVM: arm64: Save/Restore guest DISR_EL1
KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
KVM: arm/arm64: mask/unmask daif around VHE guests
arm64: kernel: Prepare for a DISR user
arm64: Unconditionally enable IESB on exception entry/return for firmware-first
arm64: kernel: Survive corrected RAS errors notified by SError
arm64: cpufeature: Detect CPU RAS Extentions
arm64: sysreg: Move to use definitions for all the SCTLR bits
arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
...
|
|
In commit f8350daf7af0 ("dm cache: tune migration throttling") the
value for DEFAULT_MIGRATION_THRESHOLD was decreased from 204800 to
2048. Edit device-mapper/cache.txt to reflect the correct default
value for migration_threshold.
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Despite the fact that all the other code there seems to be doing it, just
using set_cpu_cap() in early_intel_init() doesn't actually work.
For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
c->c_init() has set those feature bits. That resets those bits back to what
was queried from the hardware.
Turning the bits off for bad microcode is easy to fix. That can just use
setup_clear_cpu_cap() to force them off for all CPUs.
I was less keen on forcing the feature bits *on* that way, just in case
of inconsistencies. I appreciate that the kernel is going to get this
utterly wrong if CPU features are not consistent, because it has already
applied alternatives by the time secondary CPUs are brought up.
But at least if setup_force_cpu_cap() isn't being used, we might have a
chance of *detecting* the lack of the corresponding bit and either
panicking or refusing to bring the offending CPU online.
So ensure that the appropriate feature bits are set within get_cpu_cap()
regardless of how many extra times it's called.
Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: karahmed@amazon.de
Cc: peterz@infradead.org
Cc: bp@alien8.de
Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
|
|
Reformat DPC register definitions to follow the convention that register
field masks indicate the register width, e.g., a field of a 16-bit register
uses a mask of 4 hex digits, with leading zeros included as needed.
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
|
|
Add definitions for DPC Status register fields and use them in the code.
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
|
|
dpc_process_rp_pio_error() only calls dpc_rp_pio_get_info(), so squash them
together. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sinan Kaya <okaya@codeaurora.org>
|