2014-12-09  Merge branch 'amd-xgbe'  (David S. Miller)
Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-12-02

The following series of patches includes two bug fixes. Unfortunately, the first patch will create a conflict when eventually merged into net-next but should be very easy to resolve.

 - Do not clear the interrupt bit in the xgbe_ring_data structure
 - Associate a Tx SKB with the proper xgbe_ring_data structure

This patch series is based on net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09  amd-xgbe: Associate Tx SKB with proper ring descriptor  (Lendacky, Thomas)
The SKB for a Tx packet is associated with an xgbe_ring_data structure in the xgbe_map_tx_skb function. However, it was being saved in the structure that follows the last structure used when the SKB is mapped. Use the last used structure to save the SKB value.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09  amd-xgbe: Do not clear interrupt indicator  (Lendacky, Thomas)
The interrupt value within the xgbe_ring_data structure is used as an indicator of which Rx descriptor should have the INTE bit set to generate an interrupt when that Rx descriptor is used. This bit was mistakenly cleared in the xgbe_unmap_rdata function, effectively nullifying the ethtool rx-frames support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09  amd-xgbe: IRQ names require allocated memory  (Lendacky, Thomas)
When requesting an irq, the name passed in must be (part of) allocated memory. The irq name was a local variable and resulted in random characters when listing /proc/interrupts. Add a character field to the xgbe_channel structure to hold the irq name and use that.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
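A hedged kernel-style sketch of the pattern (the field size and "-TxRx-" naming are illustrative, not necessarily the patch's exact code): the name lives in memory owned by the channel, so it stays valid for as long as the irq is requested.

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    struct xgbe_channel {
            /* ... */
            unsigned int dma_irq;
            char dma_irq_name[IFNAMSIZ + 32];  /* must outlive request_irq() */
    };

    /* Build the name in channel-owned storage, not on the stack. */
    snprintf(channel->dma_irq_name, sizeof(channel->dma_irq_name),
             "%s-TxRx-%u", netdev_name(netdev), channel->queue_index);
    ret = devm_request_irq(pdata->dev, channel->dma_irq, xgbe_dma_isr,
                           0, channel->dma_irq_name, channel);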
2014-12-09  sunrpc/cache: convert to use string_escape_str()  (Andy Shevchenko)
There is a nice kernel helper to escape a given string according to provided rules. Let's use it instead of a custom approach.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bfields@redhat.com: fix length calculation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
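For flavor, a hedged usage sketch of the helper from <linux/string_helpers.h>; note the signature and return semantics of string_escape_str() changed across kernel versions, so treat the details below as the modern form, not necessarily the 3.19-era one:

    #include <linux/string_helpers.h>

    /* Escape unprintable bytes in 'src' into 'dst' as octal sequences.
     * The return value is the length the escaped output needs, so a
     * result >= dst_size signals truncation. */
    int n = string_escape_str(src, dst, dst_size, ESCAPE_OCTAL, NULL);
    if (n >= dst_size)
            /* output was truncated; caller must handle it */;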
2014-12-09  sunrpc: only call test_bit once in svc_xprt_received  (Jeff Layton)
...move the WARN_ON_ONCE inside the following if block since they use the same condition.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  fs: nfsd: Fix signedness bug in compare_blob  (Rasmus Villemoes)
Bugs similar to the one in acbbe6fbb240 (kcmp: fix standard comparison bug) are in rich supply.

In this variant, the problem is that struct xdr_netobj::len has type unsigned int, so the expression o1->len - o2->len _also_ has type unsigned int; it has completely well-defined semantics, and the result is some non-negative integer, which is always representable in a long long. But this means that if the conditional triggers, we are guaranteed to return a positive value from compare_blob.

In this case it could be fixed by

  -	res = o1->len - o2->len;
  +	res = (long long)o1->len - (long long)o2->len;

but I'd rather eliminate the usually broken 'return a - b;' idiom.

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
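To illustrate the replacement idiom the commit hints at, here is a hedged sketch (helper name is mine) of a comparison that can never produce the wrong sign, regardless of operand types:

    /* Broken: the subtraction happens in unsigned int, so the result is
     * never negative, even when a < b. */
    /* return a - b; */

    /* Safe: yields exactly -1, 0, or 1 with no overflow or truncation. */
    static int cmp_uint(unsigned int a, unsigned int b)
    {
            return (a > b) - (a < b);
    }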
2014-12-09  sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt  (Jeff Layton)
These were useful when I was tracking down a race condition between svc_xprt_do_enqueue and svc_get_next_xprt.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: convert to lockless lookup of queued server threads  (Jeff Layton)
Testing has shown that the pool->sp_lock can be a bottleneck on a busy server. Every time data is received on a socket, the server must take that lock in order to dequeue a thread from the sp_threads list.

Address this problem by eliminating the sp_threads list (which contains threads that are currently idle) and replacing it with a RQ_BUSY flag in svc_rqst. This allows us to walk the sp_all_threads list under the rcu_read_lock and find a suitable thread for the xprt by doing a test_and_set_bit.

Note that we do still have a potential atomicity problem with this approach, however. We don't want svc_xprt_do_enqueue to set the rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned zero (which indicates that the thread was idle). But, by the time we check that, the bit could be flipped by a waking thread. To address this, we acquire a new per-rqst spinlock (rq_lock) and take that before doing the test_and_set_bit. If that returns false, then we can set rq_xprt and drop the spinlock. Then, when the thread wakes up, it must set the bit under the same spinlock and can trust that if it was already set then the rq_xprt is also properly set.

With this scheme, the case where we have an idle thread no longer needs to take the highly contended pool->sp_lock at all, and that removes the bottleneck. A sketch of the fast path follows below.

That still leaves one issue: what of the case where we walk the whole sp_all_threads list and don't find an idle thread? Because the search is lockless, it's possible for the queueing to race with a thread that is going to sleep. To address that, we queue the xprt and then search again. If we find an idle thread at that point, we can't attach the xprt to it directly since that might race with a different thread waking up and finding it. All we can do is wake the idle thread back up and let it attempt to find the now-queued xprt.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
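A hedged sketch of the enqueue fast path described above (simplified; the re-search fallback and error handling are omitted):

    rcu_read_lock();
    list_for_each_entry_rcu(rqstp, &pool->sp_all_threads, rq_all) {
            spin_lock_bh(&rqstp->rq_lock);
            if (test_and_set_bit(RQ_BUSY, &rqstp->rq_flags)) {
                    /* Thread is not idle; keep scanning. */
                    spin_unlock_bh(&rqstp->rq_lock);
                    continue;
            }
            /* We won the RQ_BUSY race under rq_lock, so setting
             * rq_xprt here cannot collide with a waking thread. */
            rqstp->rq_xprt = xprt;
            spin_unlock_bh(&rqstp->rq_lock);
            wake_up_process(rqstp->rq_task);
            break;
    }
    rcu_read_unlock();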
2014-12-09  sunrpc: fix potential races in pool_stats collection  (Jeff Layton)
In a later patch, we'll be removing some spinlocking around the socket and thread queueing code in order to fix some contention problems. At that point, the stats counters will no longer be protected by the sp_lock.

Change the counters to atomic_long_t fields, except for the "sockets_queued" counter which will still be manipulated under a spinlock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
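The converted counters then look roughly like this (a sketch; field names follow the existing svc_pool_stats but the exact layout is an assumption):

    struct svc_pool_stats {
            atomic_long_t   packets;
            unsigned long   sockets_queued;   /* still under a spinlock */
            atomic_long_t   threads_woken;
            atomic_long_t   threads_timedout;
    };

    /* Bumped locklessly from the hot path, read with atomic_long_read()
     * when /proc stats are formatted. */
    atomic_long_inc(&pool->sp_stats.packets);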
2014-12-09  sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it  (Jeff Layton)
...also make the manipulation of the sp_all_threads list use RCU-friendly functions.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
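A minimal sketch of the resulting teardown, assuming the new field is called rq_rcu_head: unlink with the RCU list helper, then defer the kfree past a grace period so lockless walkers of sp_all_threads never touch freed memory.

    struct svc_rqst {
            /* ... */
            struct list_head    rq_all;         /* on pool->sp_all_threads */
            struct rcu_head     rq_rcu_head;    /* for deferred free */
    };

    list_del_rcu(&rqstp->rq_all);
    kfree_rcu(rqstp, rq_rcu_head);   /* freed after all RCU readers finish */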
2014-12-09  sunrpc: require svc_create callers to pass in meaningful shutdown routine  (Jeff Layton)
Currently all svc_create callers pass in NULL for the shutdown parm, which then gets fixed up to be svc_rpcb_cleanup if the service uses rpcbind. Simplify this by instead having the only caller that requires it (lockd) pass in svc_rpcb_cleanup, and get rid of the special casing.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: have svc_wake_up only deal with pool 0  (Jeff Layton)
The way that svc_wake_up works is a bit inefficient. It walks all of the available pools for a service and either wakes up a task in each one or sets the SP_TASK_PENDING flag in each one.

When svc_wake_up is called, there is no need to wake up more than one thread to do this work. In practice, only lockd currently uses this function and it's single threaded anyway. Thus, this just boils down to doing a wake up of a thread in pool 0 or setting a single flag.

Eliminate the for loop in this function and change it to just operate on pool 0. Also update the comments that sit above it and get rid of some code that has been commented out for years now.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: convert sp_task_pending flag to use atomic bitops  (Jeff Layton)
In a later patch, we'll want to be able to handle this flag without holding the sp_lock. Change this field to an unsigned long flags field, and declare a new flag in it that can be managed with atomic bitops.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
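The conversion looks roughly like this (the bit number is illustrative):

    /* was: a plain flag protected by pool->sp_lock */
    unsigned long   sp_flags;
    #define SP_TASK_PENDING (0)   /* still work to do even if no xprt is queued */

    set_bit(SP_TASK_PENDING, &pool->sp_flags);              /* no lock needed */
    if (test_and_clear_bit(SP_TASK_PENDING, &pool->sp_flags))
            /* handle the deferred work */;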
2014-12-09  sunrpc: move rq_cachetype field to better optimize space  (Jeff Layton)
There are a couple of holes in struct svc_rqst on x86_64. Move the rq_cachetype field to a different location to eliminate both of them.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
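A hypothetical illustration (not the actual svc_rqst layout) of the kind of padding hole that field reordering removes on x86_64:

    struct padded {                 /* 32 bytes */
            u8      small_a;        /* 1 byte, then a 7-byte hole */
            void    *ptr_a;         /* 8 bytes */
            u8      small_b;        /* 1 byte, then a 7-byte hole */
            void    *ptr_b;         /* 8 bytes */
    };

    struct packed_better {          /* 24 bytes */
            void    *ptr_a;
            void    *ptr_b;
            u8      small_a;        /* the two 7-byte holes collapse */
            u8      small_b;        /* into one 6-byte tail pad */
    };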
2014-12-09  sunrpc: move rq_splice_ok flag into rq_flags  (Jeff Layton)
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: move rq_dropme flag into rq_flags  (Jeff Layton)
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: move rq_usedeferral flag to rq_flags  (Jeff Layton)
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: move rq_local field to rq_flags  (Jeff Layton)
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09  sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it  (Jeff Layton)
In a later patch, we're going to need some atomic bit flags. Since that field will need to be an unsigned long, we mitigate that space consumption by migrating some other bitflags to the new field. Start with the rq_secure flag.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
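Roughly, the new field and its first occupant (the bit number and the from_secure_port condition are illustrative):

    #define RQ_SECURE   (0)            /* request came in on a secure port */

    unsigned long   rq_flags;          /* home for atomic bit flags */

    /* was: rqstp->rq_secure = ...;  now: */
    if (from_secure_port)
            set_bit(RQ_SECURE, &rqstp->rq_flags);
    /* ... */
    if (test_bit(RQ_SECURE, &rqstp->rq_flags))
            /* allow the privileged operation */;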
2014-12-09  Merge tag 'nfs-for-3.19-1' into nfsd for-3.19 branch  (J. Bruce Fields)
Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in svc_rqst handling functions", which subsequent server rpc patches from jlayton depend on. I'm merging this later tag on the assumption that it's more likely to be a tested and stable point.
2014-12-09  blk-mq: Use all available hardware queues  (Bart Van Assche)
Suppose that a system has two CPU sockets, three cores per socket, that it does not support hyperthreading and that four hardware queues are provided by a block driver. With the current algorithm this will lead to the following assignment of CPU cores to hardware queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:

 - All four hardware queues are used instead of only three.
 - CPU cores are spread more evenly over hardware queues. For the above example the range of the number of CPU cores associated with a single HWQ is reduced from [0..2] to [1..2].
 - If the number of HWQ's is a multiple of the number of CPU sockets it is now guaranteed that all CPU cores associated with a single HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
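One way to produce this spread (a hedged sketch, not necessarily the patch's exact code) is to scale the CPU index by the queue count; for 6 cores and 4 queues it reproduces the mapping above exactly:

    /* Map cpu 0..nr_cpus-1 onto queue 0..nr_queues-1 as evenly as
     * possible: cpus 0,1 -> HWQ 0; 2 -> HWQ 1; 3,4 -> HWQ 2; 5 -> HWQ 3. */
    static unsigned int cpu_to_queue(unsigned int cpu,
                                     unsigned int nr_cpus,
                                     unsigned int nr_queues)
    {
            return (cpu * nr_queues) / nr_cpus;
    }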
2014-12-09  blk-mq: Micro-optimize bt_get()  (Bart Van Assche)
Remove a superfluous finish_wait() call. Convert the two bt_wait_ptr() calls into a single call.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-12-09  blk-mq: Fix a race between bt_clear_tag() and bt_get()  (Bart Van Assche)
What we need is the following two guarantees:

 * Any thread that observes the effect of the test_and_set_bit() by __bt_get_word() also observes the preceding addition of 'current' to the appropriate wait list. This is guaranteed by the semantics of the spin_unlock() operation performed by prepare_to_wait(). Hence the conversion of test_and_set_bit_lock() into test_and_set_bit().

 * The wait lists are examined by bt_clear() after the tag bit has been cleared. clear_bit_unlock() guarantees that any thread that observes that the bit has been cleared also observes the store operations preceding clear_bit_unlock(). However, clear_bit_unlock() does not prevent the wait lists from being examined before the tag bit is cleared. Hence the addition of a memory barrier between clear_bit() and the wait list examination.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
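A hedged sketch of the second guarantee on the release side (the barrier primitive shown is the era's smp_mb__after_atomic(); the exact code in bt_clear_tag() may differ):

    clear_bit(tag, &bm->word);
    /* Order the bit clear before the wait list examination; otherwise a
     * waiter added after we looked, but before our clear became visible,
     * could sleep forever. */
    smp_mb__after_atomic();
    if (waitqueue_active(&bs->wait))
            wake_up(&bs->wait);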
2014-12-09  blk-mq: Avoid that __bt_get_word() wraps multiple times  (Bart Van Assche)
If __bt_get_word() is called with last_tag != 0, the following sequence can occur: the first find_next_zero_bit() fails; after the wrap-around, the test_and_set_bit() call fails but find_next_zero_bit() succeeds; the next test_and_set_bit() call fails and the subsequent find_next_zero_bit() does not find a zero bit. At that point another wrap-around will occur. Avoid this by introducing an additional local variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
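A hedged sketch of the bounded search, with the additional local variable the fix describes (function name is mine; the real __bt_get_word() is structured differently):

    static int sketch_get_tag(unsigned long *word, unsigned int depth,
                              unsigned int last_tag)
    {
            unsigned int tag = last_tag;
            bool wrapped = false;

            for (;;) {
                    tag = find_next_zero_bit(word, depth, tag);
                    if (tag >= depth) {
                            if (wrapped || last_tag == 0)
                                    return -1;   /* everything searched once */
                            wrapped = true;      /* allow exactly one wrap */
                            tag = 0;
                            continue;
                    }
                    if (!test_and_set_bit(tag, word))
                            return tag;          /* got the tag */
                    tag++;                       /* lost a race; keep going */
            }
    }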
2014-12-09  blk-mq: Fix a use-after-free  (Bart Van Assche)
blk-mq users are allowed to free the memory request_queue.tag_set points at after blk_cleanup_queue() has finished but before blk_release_queue() has started. This can happen e.g. in the SCSI core. The SCSI core namely embeds the tag_set structure in a SCSI host structure. The SCSI host structure is freed by scsi_host_dev_release(). This function is called after blk_cleanup_queue() finished but can be called before blk_release_queue().

This means that it is not safe to access request_queue.tag_set from inside blk_release_queue(). Hence remove the blk_sync_queue() call from blk_release_queue(). This call is not necessary - outstanding requests must have finished before blk_release_queue() is called. Additionally, move the blk_mq_free_queue() call from blk_release_queue() to blk_cleanup_queue() to avoid that struct request_queue.tag_set gets accessed after it has been freed.

This patch avoids that the following kernel oops can be triggered when deleting a SCSI host for which scsi-mq was enabled:

Call Trace:
 [<ffffffff8109a7c4>] lock_acquire+0xc4/0x270
 [<ffffffff814ce111>] mutex_lock_nested+0x61/0x380
 [<ffffffff812575f0>] blk_mq_free_queue+0x30/0x180
 [<ffffffff8124d654>] blk_release_queue+0x84/0xd0
 [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
 [<ffffffff8126c140>] kobject_put+0x30/0x70
 [<ffffffff81245895>] blk_put_queue+0x15/0x20
 [<ffffffff8125c409>] disk_release+0x99/0xd0
 [<ffffffff8133d056>] device_release+0x36/0xb0
 [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
 [<ffffffff8126c140>] kobject_put+0x30/0x70
 [<ffffffff8125a78a>] put_disk+0x1a/0x20
 [<ffffffff811d4cb5>] __blkdev_put+0x135/0x1b0
 [<ffffffff811d56a0>] blkdev_put+0x50/0x160
 [<ffffffff81199eb4>] kill_block_super+0x44/0x70
 [<ffffffff8119a2a4>] deactivate_locked_super+0x44/0x60
 [<ffffffff8119a87e>] deactivate_super+0x4e/0x70
 [<ffffffff811b9833>] cleanup_mnt+0x43/0x90
 [<ffffffff811b98d2>] __cleanup_mnt+0x12/0x20
 [<ffffffff8107252c>] task_work_run+0xac/0xe0
 [<ffffffff81002c01>] do_notify_resume+0x61/0xa0
 [<ffffffff814d2c58>] int_signal+0x12/0x17

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-12-09  virtio: allow finalize_features to fail  (Michael S. Tsirkin)
This will make it easy for transports to validate features and return failure.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  virtio_ccw: legacy: don't negotiate rev 1/features  (Michael S. Tsirkin)
The legacy balloon device doesn't pretend to support revision 1 or 64 bit features. But just in case someone implements a broken one that does, let's not even try to drive legacy-only devices using revision 1, and let's not give them a chance to say they support VIRTIO_F_VERSION_1, by not reading or writing the high feature bits.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09  virtio: add API to detect legacy devices  (Michael S. Tsirkin)
Transports need to be able to detect legacy-only devices (ATM balloon only) to use the legacy path to drive them. Add a core API to do just that. The implementation just blacklists balloon: not too pretty, but let's not over-engineer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
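Given that description, the core API is likely little more than this sketch:

    #include <linux/virtio.h>
    #include <linux/virtio_ids.h>

    /* Balloon is the only legacy-only device for now, so a simple
     * blacklist is enough; revisit if the list ever grows. */
    bool virtio_device_is_legacy_only(struct virtio_device_id id)
    {
            return id.device == VIRTIO_ID_BALLOON;
    }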
2014-12-09  virtio_console: fix sparse warnings  (Michael S. Tsirkin)
  CHECK   drivers/char/virtio_console.c
  drivers/char/virtio_console.c:687:36: warning: incorrect type in argument 1 (different address spaces)
  drivers/char/virtio_console.c:687:36:    expected void [noderef] <asn:1>*to
  drivers/char/virtio_console.c:687:36:    got char *out_buf
  drivers/char/virtio_console.c:790:35: warning: incorrect type in argument 2 (different address spaces)
  drivers/char/virtio_console.c:790:35:    expected char *out_buf
  drivers/char/virtio_console.c:790:35:    got char [noderef] <asn:1>*ubuf

fill_readbuf is reused with both kernel and userspace pointers, depending on value of to_user flag. Tag address parameter as __user, and cast to/from regular pointer type when we know it's safe.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
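A hedged sketch of the dual-use parameter pattern (body heavily abridged):

    /* out_buf is nominally a user pointer; when to_user is false the
     * caller really passed a kernel buffer, so cast it back with __force
     * to tell sparse the address-space change is deliberate. */
    static ssize_t fill_readbuf(struct port *port, char __user *out_buf,
                                size_t out_count, bool to_user)
    {
            /* ... */
            if (to_user) {
                    if (copy_to_user(out_buf, buf->buf + buf->offset,
                                     out_count))
                            return -EFAULT;
            } else {
                    memcpy((__force char *)out_buf,
                           buf->buf + buf->offset, out_count);
            }
            /* ... */
    }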
2014-12-09  vhost: remove unnecessary forward declarations in vhost.h  (Jason Wang)
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  virtio: drop VIRTIO_F_VERSION_1 from drivers  (Michael S. Tsirkin)
Core activates this bit automatically now, drop it from drivers that set it explicitly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  virtio: make VIRTIO_F_VERSION_1 a transport bit  (Michael S. Tsirkin)
Activate VIRTIO_F_VERSION_1 automatically unless legacy_only is set.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  virtio_balloon: add legacy_only flag  (Michael S. Tsirkin)
We have no plans to support virtio 1.0 in the balloon driver. Add an explicit flag to mark it legacy only. This will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  virtio_console: virtio 1.0 support  (Michael S. Tsirkin)
Pretty straight-forward, just use accessors for all fields.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  af_packet: virtio 1.0 stubs  (Michael S. Tsirkin)
This merely fixes sparse warnings, without actually adding support for the new APIs. Still working out the best way to enable the new functionality.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  vhost/scsi: partial virtio 1.0 support  (Michael S. Tsirkin)
Include all endian conversions as required by virtio 1.0. Don't set virtio 1.0 yet, since that requires ANY_LAYOUT which we don't yet support.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-09  virtio_scsi: export to userspace  (Michael S. Tsirkin)
Replace uXX by __uXX and _packed by __attribute((packed)) as seems to be the norm for userspace headers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-09  virtio_scsi: move to uapi  (Michael S. Tsirkin)
Guests need to use the virtio scsi API, so export it to uapi. This is nice for e.g. qemu, and it will help us remember that this file affects the ABI.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-09  virtio_scsi: v1.0 support  (Michael S. Tsirkin)
Note: for consistency, and to avoid sparse errors, convert all fields, even those no longer in use for virtio v1.0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-09  macvtap: TUN_VNET_LE support  (Michael S. Tsirkin)
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09  tun: TUN_VNET_LE support, fix sparse warnings for virtio headers  (Michael S. Tsirkin)
Pretty straight-forward: convert all fields to/from virtio endian-ness.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09  tun: add VNET_LE flag  (Michael S. Tsirkin)
virtio 1.0 modified the virtio net header format, making all fields little endian. Users can tweak the header format before submitting it to tun, but this means more data copies where none were necessary. And if the iovec is in read-only memory, we might need to split the iovec, which also means we might in theory overflow the iovec max size.

This patch adds a simpler way for applications to handle this, using a new "little endian" flag in tun. As a result, tun simply byte-swaps header fields as appropriate. This is a NOP on LE architectures.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
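The byte swap then reduces to helpers along these lines (a sketch built on the generic __virtio16 conversion helpers; exact placement in tun.c may differ):

    #include <linux/virtio_byteorder.h>

    /* A no-op on little-endian hosts; swaps on big-endian hosts when
     * userspace requested a little-endian vnet header via TUN_VNET_LE. */
    static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
    {
            return __virtio16_to_cpu(tun->flags & TUN_VNET_LE, val);
    }

    static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val)
    {
            return __cpu_to_virtio16(tun->flags & TUN_VNET_LE, val);
    }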
2014-12-09  tun: drop most type defines  (Michael S. Tsirkin)
It's just as easy to use IFF_ flags directly, there's no point in adding our own defines.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  tun: move internal flag defines out of uapi  (Michael S. Tsirkin)
TUN_ flags are internal and never exposed to userspace. Any application using it is almost certainly buggy. Move them out to tun.c.

Note: we remove these completely in follow-up patches, this code movement is split out for ease of review.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  vhost/net: enable virtio 1.0  (Michael S. Tsirkin)
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  vhost/net: larger header for virtio 1.0  (Michael S. Tsirkin)
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09  vhost/net: virtio 1.0 byte swap  (Michael S. Tsirkin)
I had to add an explicit tag to suppress a compiler warning: gcc isn't smart enough to notice that len is always initialized, since the function is called with size > 0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09  vhost: virtio 1.0 endian-ness support  (Michael S. Tsirkin)
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09  vhost: switch to __get/__put_user exclusively  (Michael S. Tsirkin)
Most places in vhost can use __get/__put_user rather than get/put_user since addresses are pre-validated. This should be good for performance, but it will also help make the code sparse-clean: the get/put_user macros don't play well with __virtioXX bitwise tags.

Switch get/put_user to the __ variants everywhere in vhost. There's one exception - for consistency switch that as well, and add an explicit access_ok check.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
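The rule of thumb being applied, as a sketch (access_ok() took a VERIFY_* argument in kernels of this era; it was dropped later):

    /* Validate the whole range once up front... */
    if (!access_ok(VERIFY_READ, uaddr, sizeof(*uaddr)))
            return -EFAULT;
    /* ...then the unchecked variant skips the per-access range check. */
    if (__get_user(val, uaddr))
            return -EFAULT;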