summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-11-22spi: imx: add lpspi bus driverGao Pan
This patch adds lpspi driver to support new i.MX products which use lpspi instead of ecspi. The lpspi can continue operating in stop mode when an appropriate clock is available. It is also designed for low CPU overhead with DMA offloading of FIFO register accesses. Signed-off-by: Gao Pan <pandy.gao@nxp.com> Reviewed-by: Fugang Duan <B38611@freescale.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-11-22KVM: s390: handle floating point registers in the run ioctl not in vcpu_put/loadChristian Borntraeger
Right now we switch the host fprs/vrs in kvm_arch_vcpu_load and switch back in kvm_arch_vcpu_put. This process is already optimized since commit 9977e886cbbc7 ("s390/kernel: lazy restore fpu registers") avoiding double save/restores on schedule. We still reload the pointers and test the guest fpc on each context switch, though. We can minimize the cost of vcpu_load/put by doing the test in the VCPU_RUN ioctl itself. As most VCPU threads almost never exit to userspace in the common fast path, this allows to avoid this overhead for the common case (eventfd driven I/O, all exits including sleep handled in the kernel) - making kvm_arch_vcpu_load/put basically disappear in perf top. Also adapt the fpu get/set ioctls. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-11-22KVM: s390: handle access registers in the run ioctl not in vcpu_put/loadChristian Borntraeger
Right now we save the host access registers in kvm_arch_vcpu_load and load them in kvm_arch_vcpu_put. Vice versa for the guest access registers. On schedule this means, that we load/save access registers multiple times. e.g. VCPU_RUN with just one reschedule and then return does [from user space via VCPU_RUN] - save the host registers in kvm_arch_vcpu_load (via ioctl) - load the guest registers in kvm_arch_vcpu_load (via ioctl) - do guest stuff - decide to schedule/sleep - save the guest registers in kvm_arch_vcpu_put (via sched) - load the host registers in kvm_arch_vcpu_put (via sched) - save the host registers in switch_to (via sched) - schedule - return - load the host registers in switch_to (via sched) - save the host registers in kvm_arch_vcpu_load (via sched) - load the guest registers in kvm_arch_vcpu_load (via sched) - do guest stuff - decide to go to userspace - save the guest registers in kvm_arch_vcpu_put (via ioctl) - load the host registers in kvm_arch_vcpu_put (via ioctl) [back to user space] As the kernel does not use access registers, we can avoid this reloading and simply piggy back on switch_to (let it save the guest values instead of host values in thread.acrs) by moving the host/guest switch into the VCPU_RUN ioctl function. We now do [from user space via VCPU_RUN] - save the host registers in kvm_arch_vcpu_ioctl_run - load the guest registers in kvm_arch_vcpu_ioctl_run - do guest stuff - decide to schedule/sleep - save the guest registers in switch_to - schedule - return - load the guest registers in switch_to (via sched) - do guest stuff - decide to go to userspace - save the guest registers in kvm_arch_vcpu_ioctl_run - load the host registers in kvm_arch_vcpu_ioctl_run This seems to save about 10% of the vcpu_put/load functions according to perf. As vcpu_load no longer switches the acrs, We can also loading the acrs in kvm_arch_vcpu_ioctl_set_sregs. Suggested-by: Fan Zhang <zhangfan@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method becuase those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failue. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22kvm: x86: don't print warning messages for unimplemented msrsBandan Das
Change unimplemented msrs messages to use pr_debug. If CONFIG_DYNAMIC_DEBUG is set, then these messages can be enabled at run time or else -DDEBUG can be used at compile time to enable them. These messages will still be printed if ignore_msrs=1. Signed-off-by: Bandan Das <bsd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22parisc: Fix printk continuations in system detectionHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2016-11-22spi: spi-fsl-dspi: Fix continuous selection formatSanchayan Maity
Current DMA implementation was not handling the continuous selection format viz. SPI chip select would be deasserted even between sequential serial transfers. Use existing dspi_data_to_pushr function to restructure the transmit code path and set or reset the CONT bit on same lines as code path in EOQ mode does. This correctly implements continuous selection format while also correcting and cleaning up the transmit code path. Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-11-22spi: spi-fsl-dspi: Fix incorrect DMA setupSanchayan Maity
Currently dmaengine_prep_slave_single was being called with length set to the complete DMA buffer size. This resulted in unwanted bytes being transferred to the SPI register leading to clock and MOSI lines having unwanted data even after chip select got deasserted and the required bytes having been transferred. While at it also clean up the use of curr_xfer_len which is central to the DMA setup, from bytes to DMA transfers for every use. Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com> Reviewed-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-11-22spi: spi-fsl-dspi: Fix incorrect freeing of DMA allocated buffersSanchayan Maity
Buffers allocated with a call to dma_alloc_coherent should be freed with dma_free_coherent instead of the currently used devm_kfree. Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-11-22spi: use sg_next for walking through the allocated scatterlist tableJuan Gutierrez
A null dereference or Oops exception might occurs when reading at once the whole content of an spi-nor of big enough size that requires an scatterlist table that does not fit into one single page. The spi_map_buf function is ignoring the chained sg case by dereferenceing the scatterlist elements in an array fashion. This wrongly assumes that the allocation of the scatterlist elements are contiguous. This is true as long as the scatterlist table fits within a PAGE_SIZE. However, for allocation where the scatter table is bigger than that, the pages allocated by sg_alloc might not be contigous. The sg table can be properly walked by sg_next instead of using an array. Signed-off-by: Juan Gutierrez <juan.gutierrez@nxp.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-11-22KVM: nVMX: invvpid handling improvementsJan Dakinevich
- Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22KVM: VMX: clean up declaration of VPID/EPT invalidation typesJan Dakinevich
- Remove VMX_EPT_EXTENT_INDIVIDUAL_ADDR, since there is no such type of EPT invalidation - Add missing VPID types names Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcgTejun Heo
blkcg allocates some per-cgroup data structures with GFP_NOWAIT and when that fails falls back to operations which aren't specific to the cgroup. Occassional failures are expected under pressure and falling back to non-cgroup operation is the right thing to do. Unfortunately, I forgot to add __GFP_NOWARN to these allocations and these expected failures end up creating a lot of noise. Add __GFP_NOWARN. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Marc MERLIN <marc@merlins.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: remove unnecesary checkMing Lei
The check on bio->bi_vcnt doesn't make sense in erase_end_io(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in do_erase()Ming Lei
Also code gets simplified a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in __bdev_writeseg()Ming Lei
Also this patch simplify the code a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: convert to bio_add_page() in sync_request()Ming Lei
Always bio_add_page() is the standard and preferred way to do the task. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22bcache: debug: avoid accessing .bi_io_vec directlyMing Lei
Instead we use standard iterator way to do that. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22target: avoid accessing .bi_vcnt directlyMing Lei
When the bio is full, bio_add_pc_page() will return zero, so use this information tell when the bio is full. Also replace access to .bi_vcnt for pr_debug() with bio_segments(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block: floppy: use bio_add_page()Ming Lei
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block: drbd: remove impossible failure handlingMing Lei
For a non-cloned bio, bio_add_page() only returns failure when the io vec table is full, but in that case, bio->bi_vcnt can't be zero at all. So remove the impossible failure handling. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block: bio: pass bvec table to bio_init()Ming Lei
Some drivers often use external bvec table, so introduce this helper for this case. It is always safe to access the bio->bi_io_vec in this way for this case. After converting to this usage, it will becomes a bit easier to evaluate the remaining direct access to bio->bi_io_vec, so it can help to prepare for the following multipage bvec support. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up the new O_DIRECT cases. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block_dev: get rid of blksize bits calculationJens Axboe
We store the bits in the bdev sector size locally, but we don't use the calculation anymore. All we do with it is shift it back up to the bdev sector size. So let's just use that directly and kill the variable and bits calculation. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22vgaarb: use valid dev pointer in vgaarb_info()Arnd Bergmann
We now pass the device to the debug messages, but on non-x86, this is an invalid pointer in vga_arb_device_init: drivers/gpu/vga/vgaarb.c: In function 'vga_arb_device_init': drivers/gpu/vga/vgaarb.c:1467:4: error: 'dev' may be used uninitialized in this function [-Werror=maybe-uninitialized] This moves the initialization of the dev pointer outside of the architecture #ifdef. Fixes: a75d68f62106 ("vgaarb: Use dev_printk() when possible") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161122143445.1896558-1-arnd@arndb.de
2016-11-22block_dev: Fixed direct I/O bio sector calculationDamien Le Moal
A direct I/O alignment must be always checked against the device blocks size, but the I/O offset (bio->bi_iter.bi_sector must always use 512B sector unit, and not the actual logical block size. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22marvell: mark mvneta and mvpp2 32-bit onlyArnd Bergmann
Both of these drivers won't work on 64-bit architectures unless they are redesigned, since they store a virtual address pointer in a 32-bit field of the descriptors: drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct': drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init': drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow] This limits the COMPILE_TEST option for the two drivers again to only build them on 32-bit. This seems nicer than shutting up the warnings, in case we ever actually want to use them on 64-bit, as the warnings indicate which parts of the driver are currently broken there. Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22Merge branch 'mlxsw-thermal-zone'David S. Miller
Jiri Pirko says: ==================== mlxsw: core: Implement thermal zone Implement thermal zone for mlxsw based HW. The first patch is just a register dependency for the second patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22mlxsw: core: Implement thermal zoneIvan Vecera
Implement thermal zone for mlxsw based HW. It uses temperature sensor provided by ASIC (the same as mlxsw hwmon interface) to report current temp to thermal core. The ASIC's PWM is then used to control speed of system fans registered as cooling devices. Signed-off-by: Ivan Vecera <cera@cera.cz> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22mlxsw: reg: Add Management Fan Speed Limit registerJiri Pirko
The MFSL register is used to configure the fan speed event / interrupt notification mechanism. Fan speed threshold are defined for both under-speed and over-speed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22Merge branch 'mv88e6390-initial-support'David S. Miller
Andrew Lunn says: ==================== Start adding support for mv88e6390 This is the first patchset implementing support for the mv88e6390 family. This is a new generation of switch devices and has numerous incompatible changes to the registers. These patches allow the switch to the detected during probe, and makes the statistics unit work. These patches are insufficient to make the mv88e6390 functional. More patches will follow. v2: Move stats code into global1 Change DT compatible string to mv88e6190 Fixed mv88e6351 stats which v1 had broken ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Move g1 stats code in global1.[ch]Andrew Lunn
Move the stats functions which access global 1 registers into global1.c. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Implement mv88e6390 get_statsAndrew Lunn
The mv88e6390 uses a different bit to select between bank0 and bank1 of the statistics. So implement an ops function for this, and pass the selector bit to the generic stats read function. Also, the histogram selection has moved for the mv88e6390, so abstract its selection as well. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add stats_get_stats to ops structureAndrew Lunn
Different families have different sets of statistics. Abstract this using a stats_get_stats op. The mv88e6390 needs a different implementation, which will be added later. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add stats_get_sset_count|string to ops structureAndrew Lunn
Different families have different sets of statistics. Abstract this using a stats_get_sset_count and stats_get_strings op. Each stat has a bitmap, and the ops implementer uses a bit map mask to count the statistics which apply for the family, or return the list of strings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v2: Rename functions to avoid _ prefix. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add mv88e6390 statistics unit initAndrew Lunn
The statistics unit on the mv88e6390 needs the histogram mode to be configured in a different register compared to other devices. Add an ops to do this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v2: Rename to mv88e6390_g1_stats_set_histogram Move into global1.c Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add mv88e6390 stats snapshot operationAndrew Lunn
The MV88E6390 has a control register for what the histogram statistics actually contain. This means the stat_snapshot method should not set this information. So implement the 6390 stats_snapshot function without these bits. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add comment about family a device belongs toAndrew Lunn
Knowing the family of device belongs to helps with picking the ops implementation which is appropriate to the device. So add a comment to each structure of ops. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Abstract stats_snapshot into ops structureAndrew Lunn
Taking a stats snapshot differs between same families. Abstract this into an ops member. At the same time, move the code into global1.[ch], since the registers are in the global1 range. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Add the mv88e6390 familyAndrew Lunn
With the devices added to the tables, the probe will recognize the switch. This however is not sufficient to make it work properly, other changes are needed because of incompatibilities. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Fix unused variable warning by using variableAndrew Lunn
_mv88e6xxx_stats_wait() did not check the return value from mv88e6xxx_g1_read(), so the compiler complained about set but unused err. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22net: dsa: mv88e6xxx: Take switch out of reset before probeAndrew Lunn
The switch needs to be taken out of reset before we can read its ID register on the MDIO bus. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22drm/arm: hdlcd: fix plane base address updateRussell King
While testing HDMI with Xorg on the Juno board, I find that when Xorg starts up or shuts down, the display is shifted significantly to the right and wrapped in the active region. (No sync bars are visible.) The timings are correct, it behaves as if the start address has been shifted many pixels _into_ the framebuffer. This occurs whenever the display mode size is changed - using xrandr in Xorg shows that changing the resolution triggers the problem almost every time, but changing the refresh rate does not. Using devmem2 to disable and re-enable the HDLCD resolves the issue, and repeated disable/enable cycles do not make the issue re-appear. Further debugging shows that we try to update the controller configuration while enabled. Alwys ensure that the HDLCD is disabled prior to updating the controller timings, and use drm_crtc_vblank_off()/drm_crtc_vblank_on() so that DRM knows whether it can expect vblank interrupts. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
2016-11-22ARM: gr8: Rename the DTSI and relevant DTSMaxime Ripard
Reviews have found that sun5i was a better prefix after all for the GR8. Rename the relevant device trees before it's too late. Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2016-11-22kvm: x86: CPUID.01H:EDX.APIC[bit 9] should mirror IA32_APIC_BASE[11]Jim Mattson
From the Intel SDM, volume 3, section 10.4.3, "Enabling or Disabling the Local APIC," When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC. The CPUID feature flag for the APIC (see Section 10.4.2, "Presence of the Local APIC") is also set to 0. Signed-off-by: Jim Mattson <jmattson@google.com> [Changed subject tag from nVMX to x86.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22Merge tag 'ib-mfd-arm-leds-v4.10' of ↵Jacek Anaszewski
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into linux-leds/for-next Pull PM8XXX namespace cleanup from Lee Jones. * tag 'ib-mfd-arm-leds-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: qcom-pm8xxx: Clean up PM8XXX namespace
2016-11-22x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()Peter Zijlstra
Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted() when a paravirt spinlock enabled kernel is ran on native hardware. Do this by patching out the CALL instruction with "XOR %RAX,%RAX" which has the same effect (0 return value). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22locking/mutex: Break out of expensive busy-loop on ↵Pan Xinhui
{mutex,rwsem}_spin_on_owner() when owner vCPU is preempted An over-committed guest with more vCPUs than pCPUs has a heavy overload in the two spin_on_owner. This blames on the lock holder preemption issue. Break out of the loop if the vCPU is preempted: if vcpu_is_preempted(cpu) is true. test-case: perf record -a perf bench sched messaging -g 400 -p && perf report before patch: 20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock 4.12% sched-messaging [kernel.vmlinux] [k] system_call 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock after patch: 9.99% sched-messaging [kernel.vmlinux] [k] mutex_unlock 5.28% sched-messaging [unknown] [H] 0xc0000000000768e0 4.27% sched-messaging [kernel.vmlinux] [k] __copy_tofrom_user_power7 3.77% sched-messaging [kernel.vmlinux] [k] copypage_power7 3.24% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq 3.02% sched-messaging [kernel.vmlinux] [k] system_call 2.69% sched-messaging [kernel.vmlinux] [k] wait_consider_task Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-4-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU ↵Pan Xinhui
in osq_lock() An over-committed guest with more vCPUs than pCPUs has a heavy overload in osq_lock(). This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for vCPU-A to run and unlock the osq lock. Use the new vcpu_is_preempted(cpu) interface to detect if a vCPU is currently running or not, and break out of the spin-loop if so. test case: $ perf record -a perf bench sched messaging -g 400 -p && perf report before patch: 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is 2.49% sched-messaging [kernel.vmlinux] [k] system_call after patch: 20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock 4.12% sched-messaging [kernel.vmlinux] [k] system_call 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock Suggested-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Translated to English. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22Documentation/virtual/kvm: Support the vCPU preemption checkPan Xinhui
Commit ("x86/kvm: support vCPU preemption check") added a new struct kvm_steal_time::preempted field. This field tells us if a vCPU is running or not. It is zero if some old KVM does not support this field or if the vCPU is not preempted. Other values means the vCPU has been preempted. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-12-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Various typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22x86/xen: Support the vCPU preemption checkJuergen Gross
Support the vcpu_is_preempted() functionality under Xen. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. A quick test (4 vCPUs on 1 physical CPU doing a parallel build job with "make -j 8") reduced system time by about 5% with this patch. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-11-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>