Age | Commit message (Collapse) | Author |
|
No point to have the sysfs files around before the cpu is online and no
point to have them around until the cpu is dead. Get rid of the explicit
state.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
|
|
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linuxtronix.de
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161117183541.8588-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
We accidentally allocate sizeof(u32) instead of sizeof(struct
be_cmd_get_session_resp).
Fixes: 50a4b824be9e ("scsi: be2iscsi: Fix to make boot discovery non-blocking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.
Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.
[mkp: clarified patch description]
Cc: <stable@vger.kernel.org> # v4.4+
Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Older controllers use SCSI target id '0' for the first internal disk. As
the controllers are now placed on the same bus as the internal disks
this leads to a clash with the SCSI target id of controller. This patch
checks the SCSI revision, and moves older controller to bus '3' to be
compatible with older releases and avoid this problem.
[mkp: fixed uninitialized variable]
Fixes: 09371d623c9 ("hpsa: Change SAS transport devices to bus 0.")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fix from Zhang Rui:
"We only have one urgent fix this time.
Commit 3105f234e0ab ("thermal/powerclamp: correct cpu support check"),
which is shipped in 4.9-rc3, fixed a problem introduced by commit
b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist").
But unfortunately, it broke intel_powerclamp driver module auto-
loading at the same time. Thus we need this change to add back module
auto-loading for 4.9"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/powerclamp: add back module device table
|
|
The hci_get_route() API is used to look up local HCI devices, however
so far it has been incapable of dealing with anything else than the
public address of HCI devices. This completely breaks with LE-only HCI
devices that do not come with a public address, but use a static
random address instead.
This patch exteds the hci_get_route() API with a src_type parameter
that's used for comparing with the right address of each HCI device.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes.
One prevents timeouts on mpt3sas when trying to use the secure erase
protocol which causes the erase protocol to be aborted. The second is
a regression in a prior fix which causes all commands to abort during
PCI extended error recovery, which is incorrect because PCI EEH is
independent from what's happening on the FC transport"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
scsi: mpt3sas: Fix secure erase premature termination
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of driver fixes.
The sunxi fixes are for an incorrect clk tree configuration and a bad
frequency calculation. The other two are fixes for passing the wrong
pointer in drivers recently converted to clk_hw style registration"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: efm32gg: Pass correct type to hw provider registration
clk: berlin: Pass correct type to hw provider registration
clk: sunxi: Fix M factor computation for APB1
clk: sunxi-ng: sun6i-a31: Force AHB1 clock to use PLL6 as parent
|
|
A correct bugfix introduced a harmless warning that shows up with gcc-7:
fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.
The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.
The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.
Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes for autogroup scheduling, for races when turning the feature
on/off via /proc/sys/kernel/sched_autogroup_enabled"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/autogroup: Do not use autogroup->tg in zombie threads
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two fixes to make (very) old Intel CPUs boot reliably
- fix the intel-mid driver and rename it
- two KASAN false positive fixes
- an FPU fix
- two sysfb fixes
- two build fixes related to new toolchain versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
x86/platform/intel-mid: Register watchdog device after SCU
x86/fpu: Fix invalid FPU ptrace state after execve()
x86/boot: Fail the boot if !M486 and CPUID is missing
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
x86/dumpstack: Prevent KASAN false positive warnings
x86/unwind: Prevent KASAN false positive warnings in guess unwinder
x86/boot: Avoid warning for zero-filling .bss
x86/sysfb: Fix lfb_size calculation
x86/sysfb: Add support for 64bit EFI lfb_base
|
|
Andre Noll reported panics after my recent fix (commit 34fad54c2537
"net: __skb_flow_dissect() must cap its return value")
After some more headaches, Alexander root caused the problem to
init_default_flow_dissectors() being called too late, in case
a network driver like IGB is not a module and receives DHCP message
very early.
Fix is to call init_default_flow_dissectors() much earlier,
as it is a core infrastructure and does not depend on another
kernel service.
Fixes: 06635a35d13d4 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andre Noll <maan@tuebingen.mpg.de>
Diagnosed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.
That driver has a change_mtu method explicitly for sending
a message to the hardware. If that fails it returns an
error.
Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.
However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Both of these drivers won't work on 64-bit architectures unless they
are redesigned, since they store a virtual address pointer in a 32-bit
field of the descriptors:
drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
This limits the COMPILE_TEST option for the two drivers again to
only build them on 32-bit. This seems nicer than shutting up the
warnings, in case we ever actually want to use them on 64-bit,
as the warnings indicate which parts of the driver are currently
broken there.
Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jiri Pirko says:
====================
mlxsw: core: Implement thermal zone
Implement thermal zone for mlxsw based HW.
The first patch is just a register dependency for the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed threshold are defined for both
under-speed and over-speed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Lunn says:
====================
Start adding support for mv88e6390
This is the first patchset implementing support for the mv88e6390
family. This is a new generation of switch devices and has numerous
incompatible changes to the registers. These patches allow the switch
to the detected during probe, and makes the statistics unit work.
These patches are insufficient to make the mv88e6390 functional. More
patches will follow.
v2:
Move stats code into global1
Change DT compatible string to mv88e6190
Fixed mv88e6351 stats which v1 had broken
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the stats functions which access global 1 registers into
global1.c.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mv88e6390 uses a different bit to select between bank0 and bank1
of the statistics. So implement an ops function for this, and pass the
selector bit to the generic stats read function. Also, the histogram
selection has moved for the mv88e6390, so abstract its selection as
well.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Different families have different sets of statistics. Abstract this
using a stats_get_stats op. The mv88e6390 needs a different
implementation, which will be added later.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Different families have different sets of statistics. Abstract this
using a stats_get_sset_count and stats_get_strings op. Each stat has a
bitmap, and the ops implementer uses a bit map mask to count the
statistics which apply for the family, or return the list of strings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
Rename functions to avoid _ prefix.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The statistics unit on the mv88e6390 needs the histogram mode to be
configured in a different register compared to other devices. Add an
ops to do this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
Rename to mv88e6390_g1_stats_set_histogram
Move into global1.c
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MV88E6390 has a control register for what the histogram statistics
actually contain. This means the stat_snapshot method should not set
this information. So implement the 6390 stats_snapshot function without
these bits.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Knowing the family of device belongs to helps with picking the ops
implementation which is appropriate to the device. So add a comment to
each structure of ops.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Taking a stats snapshot differs between same families. Abstract this
into an ops member. At the same time, move the code into global1.[ch],
since the registers are in the global1 range.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the devices added to the tables, the probe will recognize the
switch. This however is not sufficient to make it work properly, other
changes are needed because of incompatibilities.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
_mv88e6xxx_stats_wait() did not check the return value from
mv88e6xxx_g1_read(), so the compiler complained about set but unused
err.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The switch needs to be taken out of reset before we can read its ID
register on the MDIO bus.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While testing HDMI with Xorg on the Juno board, I find that when Xorg
starts up or shuts down, the display is shifted significantly to the
right and wrapped in the active region. (No sync bars are visible.)
The timings are correct, it behaves as if the start address has been
shifted many pixels _into_ the framebuffer.
This occurs whenever the display mode size is changed - using xrandr
in Xorg shows that changing the resolution triggers the problem
almost every time, but changing the refresh rate does not.
Using devmem2 to disable and re-enable the HDLCD resolves the issue,
and repeated disable/enable cycles do not make the issue re-appear.
Further debugging shows that we try to update the controller
configuration while enabled.
Alwys ensure that the HDLCD is disabled prior to updating the
controller timings, and use drm_crtc_vblank_off()/drm_crtc_vblank_on()
so that DRM knows whether it can expect vblank interrupts.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
Reviews have found that sun5i was a better prefix after all for the GR8.
Rename the relevant device trees before it's too late.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
|
|
Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted()
when a paravirt spinlock enabled kernel is ran on native hardware.
Do this by patching out the CALL instruction with "XOR %RAX,%RAX"
which has the same effect (0 return value).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
{mutex,rwsem}_spin_on_owner() when owner vCPU is preempted
An over-committed guest with more vCPUs than pCPUs has a heavy overload
in the two spin_on_owner. This blames on the lock holder preemption
issue.
Break out of the loop if the vCPU is preempted: if vcpu_is_preempted(cpu)
is true.
test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report
before patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
4.12% sched-messaging [kernel.vmlinux] [k] system_call
3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
2.00% sched-messaging [kernel.vmlinux] [k] osq_lock
after patch:
9.99% sched-messaging [kernel.vmlinux] [k] mutex_unlock
5.28% sched-messaging [unknown] [H] 0xc0000000000768e0
4.27% sched-messaging [kernel.vmlinux] [k] __copy_tofrom_user_power7
3.77% sched-messaging [kernel.vmlinux] [k] copypage_power7
3.24% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.02% sched-messaging [kernel.vmlinux] [k] system_call
2.69% sched-messaging [kernel.vmlinux] [k] wait_consider_task
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-4-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
in osq_lock()
An over-committed guest with more vCPUs than pCPUs has a heavy overload
in osq_lock().
This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends
up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for
vCPU-A to run and unlock the osq lock.
Use the new vcpu_is_preempted(cpu) interface to detect if a vCPU is
currently running or not, and break out of the spin-loop if so.
test case:
$ perf record -a perf bench sched messaging -g 400 -p && perf report
before patch:
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
2.49% sched-messaging [kernel.vmlinux] [k] system_call
after patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
4.12% sched-messaging [kernel.vmlinux] [k] system_call
3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
2.00% sched-messaging [kernel.vmlinux] [k] osq_lock
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Translated to English. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit ("x86/kvm: support vCPU preemption check") added a new
struct kvm_steal_time::preempted field. This field tells us if
a vCPU is running or not.
It is zero if some old KVM does not support this field or if the vCPU
is not preempted. Other values means the vCPU has been preempted.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-12-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Various typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Support the vcpu_is_preempted() functionality under Xen. This will
enhance lock performance on overcommitted hosts (more runnable vCPUs
than physical CPUs in the system) as doing busy waits for preempted
vCPUs will hurt system performance far worse than early yielding.
A quick test (4 vCPUs on 1 physical CPU doing a parallel build job
with "make -j 8") reduced system time by about 5% with this patch.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-11-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Support the vcpu_is_preempted() functionality under KVM. This will
enhance lock performance on overcommitted hosts (more runnable vCPUs
than physical CPUs in the system) as doing busy waits for preempted
vCPUs will hurt system performance far worse than early yielding.
struct kvm_steal_time::preempted indicates that if one vCPU is running or
not after commit "x86, kvm/x86.c: support vCPU preempted check".
unix benchmark result:
host: kernel 4.8.1, i5-4570, 4 cpus
guest: kernel 4.8.1, 8 vcpus
test-case after-patch before-patch
Execl Throughput | 18307.9 lps | 11701.6 lps
File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
Pipe Throughput | 11872208.7 lps | 11855628.9 lps
Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
Process Creation | 29881.2 lps | 28572.8 lps
Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
System Call Overhead | 10385653.0 lps | 10419979.0 lps
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-10-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Support the vcpu_is_preempted() functionality under KVM. This will
enhance lock performance on overcommitted hosts (more runnable vCPUs
than physical CPUs in the system) as doing busy waits for preempted
vCPUs will hurt system performance far worse than early yielding.
Use struct kvm_steal_time::preempted to indicate that if a vCPU
is running or not.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-9-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It allows us to update some status or field of a structure partially.
We can also save a kvm_read_guest_cached() call if we just update one
fild of the struct regardless of its current value.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
guests
Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu)
function on KVM and Xen platforms.
Extend the pv_lock_ops interface accordingly and implement the callbacks
on KVM and Xen.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Translated to English. ]
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This implements the s390 version for vcpu_is_preempted(cpu),
by reworking the existing smp_vcpu_scheduled() function into
arch_vcpu_is_preempted().
We can then also get rid of the local cpu_is_preempted()
function by moving the CIF_ENABLED_WAIT test into
arch_vcpu_is_preempted().
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-6-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu)
function on pSeries. We do not support PowerNV.
All this can be achieved by using lppaca->yield_count, which is zero on PowerNV.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-5-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch is the first step to add support to improve lock holder
preemption beaviour.
vcpu_is_preempted(cpu) does the obvious thing: it tells us whether a
vCPU is preempted or not.
Defaults to false on architectures that don't support it.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Translated the changelog to English. ]
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-2-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Group validation expects all events to be of the same PMU; however
is_uncore_pmu() is too wide, it matches _all_ uncore events, even
across PMUs.
This triggers failure when we group different events from different
uncore PMUs, like:
perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1
Fix is_uncore_pmu() by only matching events to the box at hand.
Note that generic code; ran after this step; will disallow this
mixture of PMU events.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
unwinds sometimes do 'weird' things. In particular, we seemed to be
ending up unwinding from random places on the NMI stack.
While it was somewhat expected that the event record BP,SP would not
match the interrupt BP,SP in that the interrupt is strictly later than
the record event, it was overlooked that it could be on an already
overwritten stack.
Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
when we need stack unwinds.
Note that its still possible the unwind doesn't full match the actual
event, as its entirely possible to have done an (I)RET between record
and interrupt, but on average it should still point in the general
direction of where the event came from. Also, it's the best we can do,
considering.
The particular scenario that triggered the bogus NMI stack unwind was
a PEBS event with very short period, upon enabling the event at the
tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
triggers a record (while still on the NMI stack) which in turn
triggers the next PMI. This then causes back-to-back NMIs and we'll
try and unwind the stack-frame from the last NMI, which obviously is
now overwritten by our own.
Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk>
Cc: dvyukov@google.com <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: ca037701a025 ("perf, x86: Add PEBS infrastructure")
Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The following commit:
75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
access_ok() check.
Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
whereas the access_ok() uses whatever the current address limit of the task is.
We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
then see vmalloc faults when we access what looks like pointers into vmalloc
space:
[] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
[] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1
[] Call Trace:
[] <NMI> [<ffffffff814717d1>] dump_stack+0x4d/0x6c
[] [<ffffffff81076e43>] __warn+0xd3/0xf0
[] [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
[] [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
[] [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
[] [<ffffffff8104b70c>] do_page_fault+0xc/0x10
[] [<ffffffff81794e82>] page_fault+0x22/0x30
[] [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
[] [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
[] [<ffffffff811512c7>] perf_callchain+0x67/0x80
[] [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
[] [<ffffffff8114e840>] perf_event_output+0x20/0x60
[] [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
[] [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
[] [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
[] [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
[] [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
[] [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
[] [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
[] [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
[] [<ffffffff8101ea31>] nmi_handle+0x61/0x110
[] [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
[] [<ffffffff8101f13b>] do_nmi+0xdb/0x150
[] [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] <<EOE>> <IRQ> [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0
Fix this by moving the valid_user_frame() check to before the uaccess
that loads the return address and the pointer to the next frame.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|