summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-24cifs: don't leak -EAGAIN for stat() during reconnectRonnie Sahlberg
If from cifs_revalidate_dentry_attr() the SMB2/QUERY_INFO call fails with an error, such as STATUS_SESSION_EXPIRED, causing the session to be reconnected it is possible we will leak -EAGAIN back to the application even for system calls such as stat() where this is not a valid error. Fix this by re-trying the operation from within cifs_revalidate_dentry_attr() if cifs_get_inode_info*() returns -EAGAIN. This fixes stat() and possibly also other system calls that uses cifs_revalidate_dentry*(). Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
2020-02-24scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four ↵Adam Williamson
appropriate places Arnd Bergmann inadvertently typoed these in d320a9551e394 and 64cbfa96551a; they seem to be the cause of https://bugzilla.redhat.com/show_bug.cgi?id=1801353 , invalid SCSI commands when udev tries to query a DVD drive. [arnd] Found another instance of the same bug, also introduced in my compat_ioctl series. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1801353 Link: https://lore.kernel.org/r/20200219165139.3467320-1-arnd@arndb.de Fixes: c103d6ee69f9 ("compat_ioctl: ide: floppy: add handler") Fixes: 64cbfa96551a ("compat_ioctl: move cdrom commands into cdrom.c") Fixes: d320a9551e39 ("compat_ioctl: scsi: move ioctl handling into drivers") Bisected-by: Chris Murphy <bugzilla@colorremedies.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-24Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a Kconfig-related build error and an integer overflow in chacha20poly1305" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: chacha20poly1305 - prevent integer overflow on large input tee: amdtee: amdtee depends on CRYPTO_DEV_CCP_DD
2020-02-24Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull tmpfs fix from Al Viro: "Regression from fs_parse series this cycle..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tmpfs: deny and force are not huge mount options
2020-02-24floppy: check FDC index for errors before assigning itLinus Torvalds
Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in wait_til_ready(). Which on the face of it can't happen, since as Willy Tarreau points out, the function does no particular memory access. Except through the FDCS macro, which just indexes a static allocation through teh current fdc, which is always checked against N_FDC. Except the checking happens after we've already assigned the value. The floppy driver is a disgrace (a lot of it going back to my original horrd "design"), and has no real maintainer. Nobody has the hardware, and nobody really cares. But it still gets used in virtual environment because it's one of those things that everybody supports. The whole thing should be re-written, or at least parts of it should be seriously cleaned up. The 'current fdc' index, which is used by the FDCS macro, and which is often shadowed by a local 'fdc' variable, is a prime example of how not to write code. But because nobody has the hardware or the motivation, let's just fix up the immediate problem with a nasty band-aid: test the fdc index before actually assigning it to the static 'fdc' variable. Reported-by: Jordy Zomer <jordy@simplyhacker.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-24net: bridge: fix stale eth hdr pointer in br_dev_xmitNikolay Aleksandrov
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but if the packet has the vlan header inside (e.g. bridge with disabled tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag() to extract the vid before filtering which in turn calls pskb_may_pull() and we may end up with a stale eth pointer. Moreover the cached eth header pointer will generally be wrong after that operation. Remove the eth header caching and just use eth_hdr() directly, the compiler does the right thing and calculates it only once so we don't lose anything. Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24Merge branch 'net-ll_temac-Bugfixes'David S. Miller
Esben Haabendal says: ==================== net: ll_temac: Bugfixes Fix a number of bugs which have been present since the first commit. The bugs fixed in patch 1,2 and 4 have all been observed in real systems, and was relatively easy to reproduce given an appropriate stress setup. Changes since v1: - Changed error handling of of dma_map_single() in temac_start_xmit() to drop packet instead of returning NETDEV_TX_BUSY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Handle DMA halt condition caused by buffer underrunEsben Haabendal
The SDMA engine used by TEMAC halts operation when it has finished processing of the last buffer descriptor in the buffer ring. Unfortunately, no interrupt event is generated when this happens, so we need to setup another mechanism to make sure DMA operation is restarted when enough buffers have been added to the ring. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressureEsben Haabendal
Failures caused by GFP_ATOMIC memory pressure have been observed, and due to the missing error handling, results in kernel crash such as [1876998.350133] kernel BUG at mm/slub.c:3952! [1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI [1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1 [1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017 [1876998.350160] RIP: 0010:kfree+0x1ca/0x220 [1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24 5b [1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246 [1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372 [1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000 [1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a [1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00 [1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870 [1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000 [1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 s#1 Part1 [1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0 [1876998.350210] Call Trace: [1876998.350215] <IRQ> [1876998.350224] ? __netif_receive_skb_core+0x70a/0x920 [1876998.350229] kfree_skb+0x32/0xb0 [1876998.350234] __netif_receive_skb_core+0x70a/0x920 [1876998.350240] __netif_receive_skb_one_core+0x36/0x80 [1876998.350245] process_backlog+0x8b/0x150 [1876998.350250] net_rx_action+0xf7/0x340 [1876998.350255] __do_softirq+0x10f/0x353 [1876998.350262] irq_exit+0xb2/0xc0 [1876998.350265] do_IRQ+0x77/0xd0 [1876998.350271] common_interrupt+0xf/0xf [1876998.350274] </IRQ> In order to handle such failures more graceful, this change splits the receive loop into one for consuming the received buffers, and one for allocating new buffers. When GFP_ATOMIC allocations fail, the receive will continue with the buffers that is still there, and with the expectation that the allocations will succeed in a later call to receive. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Add more error handling of dma_map_single() callsEsben Haabendal
This adds error handling to the remaining dma_map_single() calls, so that behavior is well defined if/when we run out of DMA memory. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix race condition causing TX hangEsben Haabendal
It is possible that the interrupt handler fires and frees up space in the TX ring in between checking for sufficient TX ring space and stopping the TX queue in temac_start_xmit. If this happens, the queue wake from the interrupt handler will occur before the queue is stopped, causing a lost wakeup and the adapter's transmit hanging. To avoid this, after stopping the queue, check again whether there is sufficient space in the TX ring. If so, wake up the queue again. This is a port of the similar fix in axienet driver, commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang"). Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24KVM: s390: rstify new ioctls in api.rstChristian Borntraeger
We also need to rstify the new ioctls that we added in parallel to the rstification of the kvm docs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24scsi: zfcp: fix wrong data and display format of SFP+ temperatureBenjamin Block
When implementing support for retrieval of local diagnostic data from the FCP channel, the wrong data format was assumed for the temperature of the local SFP+ connector. The Fibre Channel Link Services (FC-LS-3) specification is not clear on the format of the stored integer, and only after consulting the SNIA specification SFF-8472 did we realize it is stored as two's complement. Thus, the used data and display format is wrong, and highly misleading for users when the temperature should drop below 0°C (however unlikely that may be). To fix this, change the data format in `struct fsf_qtcb_bottom_port` from unsigned to signed, and change the printf format string used to generate `zfcp_sysfs_adapter_diag_sfp_temperature_show()` from `%hu` to `%hd`. Link: https://lore.kernel.org/r/d6e3be5428da5c9490cfff4df7cae868bc9f1a7e.1582039501.git.bblock@linux.ibm.com Fixes: a10a61e807b0 ("scsi: zfcp: support retrieval of SFP Data via Exchange Port Data") Fixes: 6028f7c4cd87 ("scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver") Cc: <stable@vger.kernel.org> # 5.5+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-24scsi: sd_sbc: Fix sd_zbc_report_zones()Damien Le Moal
The block layer generic blk_revalidate_disk_zones() checks the validity of zone descriptors reported by a disk using the blk_revalidate_zone_cb() callback function executed for each zone descriptor. If a ZBC disk reports invalid zone descriptors, blk_revalidate_disk_zones() returns an error and sd_zbc_read_zones() changes the disk capacity to 0, which in turn results in the gendisk structure capacity to be set to 0. This all works well for the first revalidate pass on a disk and the block layer detects the capactiy change. On the second revalidate pass, blk_revalidate_disk_zones() is called again and sd_zbc_report_zones() executed to check the zones a second time. However, for this second pass, the gendisk capacity is now 0, which results in sd_zbc_report_zones() to do nothing and to report success and no zones. blk_revalidate_disk_zones() in turn returns success and sets the disk queue chunk_sectors limit with zero as no zones were checked, causing a oops to trigger on the BUG_ON(!is_power_of_2(chunk_sectors)) in blk_queue_chunk_sectors(). Fix this by using the sdkp capacity field rather than the gendisk capacity for the report zones loop in sd_zbc_report_zones(). Also add a check to return immediately an error if the sdkp capacity is 0. With this fix, invalid/buggy ZBC disk scan does not trigger a oops and are exposed with a 0 capacity. This change also preserve the chance for the disk to be correctly revalidated on the second revalidate pass as the scsi disk structure capacity field is always set to the disk reported value when sd_zbc_report_zones() is called. Link: https://lore.kernel.org/r/20200219063800.880834-1-damien.lemoal@wdc.com Fixes: d41003513e61 ("block: rework zone reporting") Cc: Cc: <stable@vger.kernel.org> # v5.5 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-24io_uring: fix personality idr leakJens Axboe
We somehow never free the idr, even though we init it for every ctx. Free it when the rest of the ring data is freed. Fixes: 071698e13ac6 ("io_uring: allow registering credentials") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-24sched/numa: Stop an exhastive search if a reasonable swap candidate or idle ↵Mel Gorman
CPU is found When domains are imbalanced or overloaded a search of all CPUs on the target domain is searched and compared with task_numa_compare. In some circumstances, a candidate is found that is an obvious win. o A task can move to an idle CPU and an idle CPU is found o A swap candidate is found that would move to its preferred domain This patch terminates the search when either condition is met. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-14-mgorman@techsingularity.net
2020-02-24sched/numa: Bias swapping tasks based on their preferred nodeMel Gorman
When swapping tasks for NUMA balancing, it is preferred that tasks move to or remain on their preferred node. When considering an imbalance, encourage tasks to move to their preferred node and discourage tasks from moving away from their preferred node. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-13-mgorman@techsingularity.net
2020-02-24sched/numa: Find an alternative idle CPU if the CPU is part of an active ↵Mel Gorman
NUMA balance Multiple tasks can attempt to select and idle CPU but fail because numa_migrate_on is already set and the migration fails. Instead of failing, scan for an alternative idle CPU. select_idle_sibling is not used because it requires IRQs to be disabled and it ignores numa_migrate_on allowing multiple tasks to stack. This scan may still fail if there are idle candidate CPUs due to races but if this occurs, it's best that a task stay on an available CPU that move to a contended one. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-12-mgorman@techsingularity.net
2020-02-24sched/numa: Prefer using an idle CPU as a migration target instead of ↵Mel Gorman
comparing tasks task_numa_find_cpu() can scan a node multiple times. Minimally it scans to gather statistics and later to find a suitable target. In some cases, the second scan will simply pick an idle CPU if the load is not imbalanced. This patch caches information on an idle core while gathering statistics and uses it immediately if load is not imbalanced to avoid a second scan of the node runqueues. Preference is given to an idle core rather than an idle SMT sibling to avoid packing HT siblings due to linearly scanning the node cpumask. As a side-effect, even when the second scan is necessary, the importance of using select_idle_sibling is much reduced because information on idle CPUs is cached and can be reused. Note that this patch actually makes is harder to move to an idle CPU as multiple tasks can race for the same idle CPU due to a race checking numa_migrate_on. This is addressed in the next patch. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-11-mgorman@techsingularity.net
2020-02-24sched/fair: Take into account runnable_avg to classify groupVincent Guittot
Take into account the new runnable_avg signal to classify a group and to mitigate the volatility of util_avg in face of intensive migration or new task with random utilization. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-10-mgorman@techsingularity.net
2020-02-24sched/pelt: Add a new runnable average signalVincent Guittot
Now that runnable_load_avg has been removed, we can replace it by a new signal that will highlight the runnable pressure on a cfs_rq. This signal track the waiting time of tasks on rq and can help to better define the state of rqs. At now, only util_avg is used to define the state of a rq: A rq with more that around 80% of utilization and more than 1 tasks is considered as overloaded. But the util_avg signal of a rq can become temporaly low after that a task migrated onto another rq which can bias the classification of the rq. When tasks compete for the same rq, their runnable average signal will be higher than util_avg as it will include the waiting time and we can use this signal to better classify cfs_rqs. The new runnable_avg will track the runnable time of a task which simply adds the waiting time to the running time. The runnable _avg of cfs_rq will be the /Sum of se's runnable_avg and the runnable_avg of group entity will follow the one of the rq similarly to util_avg. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-9-mgorman@techsingularity.net
2020-02-24sched/pelt: Remove unused runnable load averageVincent Guittot
Now that runnable_load_avg is no more used, we can remove it to make space for a new signal. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-8-mgorman@techsingularity.net
2020-02-24sched/numa: Use similar logic to the load balancer for moving between ↵Mel Gorman
domains with spare capacity The standard load balancer generally tries to keep the number of running tasks or idle CPUs balanced between NUMA domains. The NUMA balancer allows tasks to move if there is spare capacity but this causes a conflict and utilisation between NUMA nodes gets badly skewed. This patch uses similar logic between the NUMA balancer and load balancer when deciding if a task migrating to its preferred node can use an idle CPU. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-7-mgorman@techsingularity.net
2020-02-24sched/numa: Replace runnable_load_avg by load_avgVincent Guittot
Similarly to what has been done for the normal load balancer, we can replace runnable_load_avg by load_avg in numa load balancing and track the other statistics like the utilization and the number of running tasks to get to better view of the current state of a node. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-6-mgorman@techsingularity.net
2020-02-24sched/fair: Reorder enqueue/dequeue_task_fair pathVincent Guittot
The walk through the cgroup hierarchy during the enqueue/dequeue of a task is split in 2 distinct parts for throttled cfs_rq without any added value but making code less readable. Change the code ordering such that everything related to a cfs_rq (throttled or not) will be done in the same loop. In addition, the same steps ordering is used when updating a cfs_rq: - update_load_avg - update_cfs_group - update *h_nr_running This reordering enables the use of h_nr_running in PELT algorithm. No functional and performance changes are expected and have been noticed during tests. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: "Dietmar Eggemann <dietmar.eggemann@arm.com>" Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-5-mgorman@techsingularity.net
2020-02-24sched/numa: Distinguish between the different task_numa_migrate() failure casesMel Gorman
sched:sched_stick_numa is meant to fire when a task is unable to migrate to the preferred node but from the trace, it's possibile to tell the difference between "no CPU found", "migration to idle CPU failed" and "tasks could not be swapped". Extend the tracepoint accordingly. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-4-mgorman@techsingularity.net
2020-02-24sched/numa: Trace when no candidate CPU was found on the preferred nodeMel Gorman
sched:sched_stick_numa is meant to fire when a task is unable to migrate to the preferred node. The case where no candidate CPU could be found is not traced which is an important gap. The tracepoint is not fired when the task is not allowed to run on any CPU on the preferred node or the task is already running on the target CPU but neither are interesting corner cases. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/20200224095223.13361-3-mgorman@techsingularity.net
2020-02-24Merge tag 'v5.6-rc3' into sched/core, to pick up fixes and dependent patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-24drm/i915/gvt: Separate display reset from ALL_ENGINES resetTina Zhang
ALL_ENGINES reset doesn't clobber display with the current gvt-g supported platforms. Thus ALL_ENGINES reset shouldn't reset the display engine registers emulated by gvt-g. This fixes guest warning like [ 14.622026] [drm] Initialized i915 1.6.0 20200114 for 0000:00:03.0 on minor 0 [ 14.967917] fbcon: i915drmfb (fb0) is primary device [ 25.100188] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] E RROR [CRTC:51:pipe A] flip_done timed out [ 25.100860] -----------[ cut here ]----------- [ 25.100861] pll on state mismatch (expected 0, found 1) [ 25.101024] WARNING: CPU: 1 PID: 30 at drivers/gpu/drm/i915/display/intel_dis play.c:14382 verify_single_dpll_state.isra.115+0x28f/0x320 [i915] [ 25.101025] Modules linked in: intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 aesni_intel cr ypto_simd cryptd glue_helper cec rc_core video drm_kms_helper joydev drm input_l eds i2c_algo_bit serio_raw fb_sys_fops syscopyarea sysfillrect sysimgblt mac_hid qemu_fw_cfg sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 e1000 psmouse i2c_piix4 pata_acpi floppy [ 25.101052] CPU: 1 PID: 30 Comm: kworker/u4:1 Not tainted 5.5.0+ #1 [ 25.101053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1 .12.1-0-ga5cab58 04/01/2014 [ 25.101055] Workqueue: events_unbound async_run_entry_fn [ 25.101092] RIP: 0010:verify_single_dpll_state.isra.115+0x28f/0x320 [i915] [ 25.101093] Code: e0 d9 ff e9 a3 fe ff ff 80 3d e9 c2 11 00 00 44 89 f6 48 c7 c7 c0 9d 88 c0 75 3b e8 eb df d9 ff e9 c7 fe ff ff e8 d1 e0 ae c4 <0f> 0b e9 7a fe ff ff 80 3d c0 c2 11 00 00 8d 71 41 89 c2 48 c7 c7 [ 25.101093] RSP: 0018:ffffb1de80107878 EFLAGS: 00010286 [ 25.101094] RAX: 0000000000000000 RBX: ffffb1de80107884 RCX: 0000000000000007 [ 25.101095] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff94fdfdd19740 [ 25.101095] RBP: ffffb1de80107938 R08: 0000000d6bfdc7b4 R09: 000000000000002b [ 25.101096] R10: ffff94fdf82dc000 R11: 0000000000000225 R12: 00000000000001f8 [ 25.101096] R13: ffff94fdb3ca6a90 R14: ffff94fdb3ca0000 R15: 0000000000000000 [ 25.101097] FS: 0000000000000000(0000) GS:ffff94fdfdd00000(0000) knlGS:00000 00000000000 [ 25.101098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.101098] CR2: 00007fbc3e2be9c8 CR3: 000000003339a003 CR4: 0000000000360ee0 [ 25.101101] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 25.101101] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 25.101102] Call Trace: [ 25.101139] intel_atomic_commit_tail+0xde4/0x1520 [i915] [ 25.101141] ? flush_workqueue_prep_pwqs+0xfa/0x130 [ 25.101142] ? flush_workqueue+0x198/0x3c0 [ 25.101174] intel_atomic_commit+0x2ad/0x320 [i915] [ 25.101209] drm_atomic_commit+0x4a/0x50 [drm] [ 25.101220] drm_client_modeset_commit_atomic+0x1c4/0x200 [drm] [ 25.101231] drm_client_modeset_commit_force+0x47/0x170 [drm] [ 25.101250] drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_hel per] [ 25.101255] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper] [ 25.101287] intel_fbdev_set_par+0x1a/0x40 [i915] [ 25.101289] ? con_is_visible+0x2e/0x60 [ 25.101290] fbcon_init+0x378/0x600 [ 25.101292] visual_init+0xd5/0x130 [ 25.101296] do_bind_con_driver+0x217/0x430 [ 25.101297] do_take_over_console+0x7d/0x1b0 [ 25.101298] do_fbcon_takeover+0x5c/0xb0 [ 25.101299] fbcon_fb_registered+0x199/0x1a0 [ 25.101301] register_framebuffer+0x22c/0x330 [ 25.101306] __drm_fb_helper_initial_config_and_unlock+0x31a/0x520 [drm_kms_h elper] [ 25.101311] drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper] [ 25.101341] intel_fbdev_initial_config+0x18/0x30 [i915] [ 25.101342] async_run_entry_fn+0x3c/0x150 [ 25.101343] process_one_work+0x1fd/0x3f0 [ 25.101344] worker_thread+0x34/0x410 [ 25.101346] kthread+0x121/0x140 [ 25.101346] ? process_one_work+0x3f0/0x3f0 [ 25.101347] ? kthread_park+0x90/0x90 [ 25.101350] ret_from_fork+0x35/0x40 [ 25.101351] --[ end trace b5b47d44cd998ba1 ]-- Fixes: 6294b61ba769 ("drm/i915/gvt: add missing display part reset for vGPU reset") Signed-off-by: Tina Zhang <tina.zhang@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200221023234.28635-1-tina.zhang@intel.com
2020-02-24mac80211: rx: avoid RCU list traversal under mutexMadhuparna Bhowmik
local->sta_mtx is held in __ieee80211_check_fast_rx_iface(). No need to use list_for_each_entry_rcu() as it also requires a cond argument to avoid false lockdep warnings when not used in RCU read-side section (with CONFIG_PROVE_RCU_LIST). Therefore use list_for_each_entry(); Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: explicitly include if_vlan.hJohannes Berg
We use that here, and do seem to get it through some recursive include, but better include it explicitly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-23net: core: devlink.c: Hold devlink->lock from the beginning of ↵Madhuparna Bhowmik
devlink_dpipe_table_register() devlink_dpipe_table_find() should be called under either rcu_read_lock() or devlink->lock. devlink_dpipe_table_register() calls devlink_dpipe_table_find() without holding the lock and acquires it later. Therefore hold the devlink->lock from the beginning of devlink_dpipe_table_register(). Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: phy: Avoid multiple suspendsFlorian Fainelli
It is currently possible for a PHY device to be suspended as part of a network device driver's suspend call while it is still being attached to that net_device, either via phy_suspend() or implicitly via phy_stop(). Later on, when the MDIO bus controller get suspended, we would attempt to suspend again the PHY because it is still attached to a network device. This is both a waste of time and creates an opportunity for improper clock/power management bugs to creep in. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: ks8851-ml: Fix IRQ handling and lockingMarek Vasut
The KS8851 requires that packet RX and TX are mutually exclusive. Currently, the driver hopes to achieve this by disabling interrupt from the card by writing the card registers and by disabling the interrupt on the interrupt controller. This however is racy on SMP. Replace this approach by expanding the spinlock used around the ks_start_xmit() TX path to ks_irq() RX path to assure true mutual exclusion and remove the interrupt enabling/disabling, which is now not needed anymore. Furthermore, disable interrupts also in ks_net_stop(), which was missing before. Note that a massive improvement here would be to re-use the KS8851 driver approach, which is to move the TX path into a worker thread, interrupt handling to threaded interrupt, and synchronize everything with mutexes, but that would be a much bigger rework, for a separate patch. Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Stetiar <ynezz@true.cz> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23docs: networking: phy: Rephrase paragraph for clarityJonathan Neuschäfer
Let's make it a little easier to read. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23io_uring: handle multiple personalities in link chainsJens Axboe
If we have a chain of requests and they don't all use the same credentials, then the head of the chain will be issued with the credentails of the tail of the chain. Ensure __io_queue_sqe() overrides the credentials, if they are different. Once we do that, we can clean up the creds handling as well, by only having io_submit_sqe() do the lookup of a personality. It doesn't need to assign it, since __io_queue_sqe() now always does the right thing. Fixes: 75c6a03904e0 ("io_uring: support using a registered personality for commands") Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-24Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"Orson Zhai
This reverts commit 4585fbcb5331fc910b7e553ad3efd0dd7b320d14. The name changing as devfreq(X) breaks some user space applications, such as Android HAL from Unisoc and Hikey [1]. The device name will be changed unexpectly after every boot depending on module init sequence. It will make trouble to setup some system configuration like selinux for Android. So we'd like to revert it back to old naming rule before any better way being found. [1] https://lkml.org/lkml/2018/5/8/1042 Cc: John Stultz <john.stultz@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Orson Zhai <orson.unisoc@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-02-23tcp: fix TFO SYNACK undo to avoid double-timestamp-undoNeal Cardwell
In a rare corner case the new logic for undo of SYNACK RTO could result in triggering the warning in tcp_fastretrans_alert() that says: WARN_ON(tp->retrans_out != 0); The warning looked like: WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270 The sequence that tickles this bug is: - Fast Open server receives TFO SYN with data, sends SYNACK - (client receives SYNACK and sends ACK, but ACK is lost) - server app sends some data packets - (N of the first data packets are lost) - server receives client ACK that has a TS ECR matching first SYNACK, and also SACKs suggesting the first N data packets were lost - server performs TS undo of SYNACK RTO, then immediately enters recovery - buggy behavior then performed a *second* undo that caused the connection to be in CA_Open with retrans_out != 0 Basically, the incoming ACK packet with SACK blocks causes us to first undo the cwnd reduction from the SYNACK RTO, but then immediately enters fast recovery, which then makes us eligible for undo again. And then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo using a "mash-up" of state from two different loss recovery phases: it uses the timestamp info from the ACK of the original SYNACK, and the undo_marker from the fast recovery. This fix refines the logic to only invoke the tcp_try_undo_loss() inside tcp_rcv_synrecv_state_fastopen() if the connection is still in CA_Loss. If peer SACKs triggered fast recovery, then tcp_rcv_synrecv_state_fastopen() can't safely undo. Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23hv_netvsc: Fix unwanted wakeup in netvsc_attach()Haiyang Zhang
When netvsc_attach() is called by operations like changing MTU, etc., an extra wakeup may happen while netvsc_attach() calling rndis_filter_device_add() which sends rndis messages when queue is stopped in netvsc_detach(). The completion message will wake up queue 0. We can reproduce the issue by changing MTU etc., then the wake_queue counter from "ethtool -S" will increase beyond stop_queue counter: stop_queue: 0 wake_queue: 1 The issue causes queue wake up, and counter increment, no other ill effects in current code. So we didn't see any network problem for now. To fix this, initialize tx_disable to true, and set it to false when the NIC is ready to be attached or registered. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23Linux 5.6-rc3v5.6-rc3Linus Torvalds
2020-02-23net: usb: qmi_wwan: restore mtu min/max values after raw_ip switchDaniele Palmas
usbnet creates network interfaces with min_mtu = 0 and max_mtu = ETH_MAX_MTU. These values are not modified by qmi_wwan when the network interface is created initially, allowing, for example, to set mtu greater than 1500. When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu values for the network interface are set through ether_setup, with min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, not allowing anymore to set mtu greater than 1500 (error: mtu greater than device maximum). The patch restores the original min/max mtu values set by usbnet after a raw_ip switch. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23Merge tag 'for-5.6-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "These are fixes that were found during testing with help of error injection, plus some other stable material. There's a fixup to patch added to rc1 causing locking in wrong context warnings, tests found one more deadlock scenario. The patches are tagged for stable, two of them now in the queue but we'd like all three released at the same time. I'm not happy about fixes to fixes in such a fast succession during rcs, but I hope we found all the fallouts of commit 28553fa992cb ('Btrfs: fix race between shrinking truncate and fiemap')" * tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents btrfs: fix bytes_may_use underflow in prealloc error condtition btrfs: handle logged extent failure properly btrfs: do not check delayed items are empty for single transaction cleanup btrfs: reset fs_root to NULL on error in open_ctree btrfs: destroy qgroup extent records on transaction abort
2020-02-23Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "More miscellaneous ext4 bug fixes (all stable fodder)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix mount failure with quota configured as module jbd2: fix ocfs2 corrupt when clearing block group bits ext4: fix race between writepages and enabling EXT4_EXTENTS_FL ext4: rename s_journal_flag_rwsem to s_writepages_rwsem ext4: fix potential race between s_flex_groups online resizing and access ext4: fix potential race between s_group_info online resizing and access ext4: fix potential race between online resizing and write operations ext4: add cond_resched() to __ext4_find_entry() ext4: fix a data race in EXT4_I(inode)->i_disksize
2020-02-23Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linuxLinus Torvalds
Pull csky updates from Guo Ren: "Sorry, I missed 5.6-rc1 merge window, but in this pull request the most are the fixes and the rests are between fixes and features. The only outside modification is the MAINTAINERS file update with our mailing list. - cache flush implementation fixes - ftrace modify panic fix - CONFIG_SMP boot problem fix - fix pt_regs saving for atomic.S - fix fixaddr_init without highmem. - fix stack protector support - fix fake Tightly-Coupled Memory code compile and use - fix some typos and coding convention" * tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits) csky: Replace <linux/clk-provider.h> by <linux/of_clk.h> csky: Implement copy_thread_tls csky: Add PCI support csky: Minimize defconfig to support buildroot config.fragment csky: Add setup_initrd check code csky: Cleanup old Kconfig options arch/csky: fix some Kconfig typos csky: Fixup compile warning for three unimplemented syscalls csky: Remove unused cache implementation csky: Fixup ftrace modify panic csky: Add flush_icache_mm to defer flush icache all csky: Optimize abiv2 copy_to_user_page with VM_EXEC csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860) csky: Remove unnecessary flush_icache_* implementation csky: Support icache flush without specific instructions csky/Kconfig: Add Kconfig.platforms to support some drivers csky/smp: Fixup boot failed when CONFIG_SMP csky: Set regs->usp to kernel sp, when the exception is from kernel csky/mm: Fixup export invalid_pte_table symbol csky: Separate fixaddr_init from highmem ...
2020-02-23KVM: nVMX: Check IO instruction VM-exit conditionsOliver Upton
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Refactor IO bitmap checks into helper functionOliver Upton
Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>