Age | Commit message (Collapse) | Author |
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Almost all architectures include it. Add a ARCH_NO_PREEMPT symbol to
disable preempt support for alpha, hexagon, non-coldfire m68k and
user mode Linux.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Move the source of lib/Kconfig.debug and arch/$(ARCH)/Kconfig.debug to
the top-level Kconfig. For two architectures that means moving their
arch-specific symbols in that menu into a new arch Kconfig.debug file,
and for a few more creating a dummy file so that we can include it
unconditionally.
Also move the actual 'Kernel hacking' menu to lib/Kconfig.debug, where
it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Instead of duplicating the source statements in every architecture just
do it once in the toplevel Kconfig file.
Note that with this the inclusion of arch/$(SRCARCH/Kconfig moves out of
the top-level Kconfig into arch/Kconfig so that don't violate ordering
constraits while keeping a sensible menu structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
With the optimizations for TLB invalidation from commit 0cef77c7798a
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.
When calling mm_context_remove_copro(), the 'copros' counter is
decremented first before flushing. It may have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations in this case. Thus breaking any nMMU user in a bad and
unpredictable way.
Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.
Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)
Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not. This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.
Correct this by zeroing the resource flags for IOV BARs when they
can't be configured (this is the same method used by sriov_init() and
__pci_read_base()).
VFs cleared this way can't be enabled later, because that requires
another device tree property, "ibm,number-of-configurable-vfs" as well
as support for the RTAS function "ibm_map_pes". These are all part of
hypervisor support for IOV and it seems unlikely that a hypervisor
would ever partially, but not fully, support it. (None are currently
provided by QEMU/KVM.)
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
There is nothing arch-specific about PCI or dma-debug, so call
dma_debug_add_bus() from the PCI core just after registering the bus type.
Most of dma-debug is already generic; this just adds reporting of pending
dma-allocations on driver unload for arches other than powerpc, sh, and
x86.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
|
|
mmu_init_secondary() calls ppc44x_pin_tlb() which is marked __init,
leading to a warning:
The function mmu_init_secondary() references
the function __init ppc44x_pin_tlb().
There's no CPU hotplug support on 44x so mmu_init_secondary() will
only be called at boot. Therefore we should mark it as __init.
Signed-off-by: Alexey Spirkov <alexeis@astrosoft.ru>
[mpe: Flesh out change log details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Paul Menzel reported that kmemleak was producing reports such as:
unreferenced object 0xc0000000f8b80000 (size 16384):
comm "init", pid 1, jiffies 4294937416 (age 312.240s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d997deb7>] __pud_alloc+0x80/0x190
[<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
[<00000000091e51c2>] shift_arg_pages+0xc0/0x210
[<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
[<0000000060871529>] load_elf_binary+0x41c/0x1648
[<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
[<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
[<000000005f953a6e>] sys_execve+0x58/0x70
[<000000009700a858>] system_call+0x5c/0x70
Indicating that a PUD was being leaked.
However what's really happening is that kmemleak is not able to
recognise the references from the PGD to the PUD, because they are not
fully qualified pointers.
We can confirm that in xmon, eg:
Find the task struct for pid 1 "init":
0:mon> P
task_struct ->thread.ksp PID PPID S P CMD
c0000001fe7c0000 c0000001fe803960 1 0 S 13 systemd
Dump virtual address 0 to find the PGD:
0:mon> dv 0 c0000001fe7c0000
pgd @ 0xc0000000f8b01000
Dump the memory of the PGD:
0:mon> d c0000000f8b01000
c0000000f8b01000 00000000f8b90000 0000000000000000 |................|
c0000000f8b01010 0000000000000000 0000000000000000 |................|
c0000000f8b01020 0000000000000000 0000000000000000 |................|
c0000000f8b01030 0000000000000000 00000000f8b80000 |................|
^^^^^^^^^^^^^^^^
There we can see the reference to our supposedly leaked PUD. But
because it's missing the leading 0xc, kmemleak won't recognise it.
We can confirm it's still in use by translating an address that is
mapped via it:
0:mon> dv 7fff94000000 c0000001fe7c0000
pgd @ 0xc0000000f8b01000
pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
Maps physical address = 0x00000001d5ce0000
Flags = Accessed Dirty Read Write
The fix is fairly simple. We need to tell kmemleak to ignore PUD
allocations and never report them as leaks. We can also tell it not to
scan the PGD, because it will never find pointers in there. However it
will still notice if we allocate a PGD and then leak it.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Split asm/tlbflush.h into:
asm/nohash/tlbflush.h
asm/book3s/32/tlbflush.h
asm/book3s/64/tlbflush.h (already existing)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT are replaced by CONFIG_PPC_XX_PAGES
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Remove superflous includes and add missing ones
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
set_breakpoint() is only used in process.c so make it static
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Files not using fixmap consts or functions don't need asm/fixmap.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Only include linux/stringify.h is files using __stringify()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Files not using cpu_has_feature() don't need cpu_has_feature.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
page.h doesn't need kdump.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_smt_mode
before any VCPUs are created. However, userspace can change
kvm->arch.emul_smt_mode at any time up until the first VCPU is created.
Hence it is (theoretically) possible for the check in
kvmppc_core_vcpu_create_hv() to race with another userspace thread
changing kvm->arch.emul_smt_mode.
This fixes it by moving the test that uses kvm->arch.emul_smt_mode into
the block where kvm->lock is held.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
VCPU ID space", 2018-07-25) allowed use of VCPU IDs up to
KVM_MAX_VCPU_ID on POWER9 in all guest SMT modes and guest emulated
hardware SMT modes. However, with the current definition of
KVM_MAX_VCPU_ID, a guest SMT mode of 1 and an emulated SMT mode of 8,
it is only possible to create KVM_MAX_VCPUS / 2 VCPUS, because
threads_per_subcore is 4 on POWER9 CPUs. (Using an emulated SMT mode
of 8 is useful when migrating VMs to or from POWER8 hosts.)
This increases KVM_MAX_VCPU_ID to 8 * KVM_MAX_VCPUS when HV KVM is
configured in, so that a full complement of KVM_MAX_VCPUS VCPUs can
be created on POWER9 in all guest SMT modes and emulated hardware
SMT modes.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
It is not currently possible to create the full number of possible
VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
threads per core than its core stride (or "VSMT mode"). This is
because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
even though the VCPU ID is less than KVM_MAX_VCPU_ID.
To address this, "pack" the VCORE ID and XIVE offsets by using
knowledge of the way the VCPU IDs will be used when there are fewer
guest threads per core than the core stride. The primary thread of
each core will always be used first. Then, if the guest uses more than
one thread per core, these secondary threads will sequentially follow
the primary in each core.
So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
VCPUs are being spaced apart, so at least half of each core is empty,
and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
into the second half of each core (4..7, in an 8-thread core).
Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
each core is being left empty, and we can map down into the second and
third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
threads are being used and 7/8 of the core is empty, allowing use of
the 1, 5, 3 and 7 thread slots.
(Strides less than 8 are handled similarly.)
This allows the VCORE ID or offset to be calculated quickly from the
VCPU ID or XIVE server numbers, without access to the VCPU structure.
[paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE
to pr_devel, wrapped line, fixed id check.]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Currently architectures can override __atomic_op_*() to define the barriers
used before/after a relaxed atomic when used to build acquire/release/fence
variants.
This has the unfortunate property of requiring the architecture to define the
full wrapper for the atomics, rather than just the barriers they care about,
and gets in the way of generating atomics which can be easily read.
Instead, this patch has architectures define an optional set of barriers:
* __atomic_acquire_fence()
* __atomic_release_fence()
* __atomic_pre_full_fence()
* __atomic_post_full_fence()
... which <linux/atomic.h> uses to build the wrappers.
It would be nice if we could undef these, along with the __atomic_op_*()
wrappers, but that would break the cmpxchg() wrappers, which are written
in preprocessor. Undefs would have been nice, but alas.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andy.shevchenko@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: catalin.marinas@arm.com
Cc: dvyukov@google.com
Cc: glider@google.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/20180716113017.3909-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
Pull networking fixes from David Miller:
1) Handle stations tied to AP_VLANs properly during mac80211 hw
reconfig. From Manikanta Pubbisetty.
2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
Ben Elisha.
4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
7) Fix memory leaks on chain rename, also from Florian Westphal.
8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
Cheng.
9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
10) Fix link local address handling with VRF, from David Ahern.
11) Don't clobber 'err' on a successful call to __skb_linearize() in
skb_segment(). From Eric Dumazet.
12) Fix vxlan fdb notification races, from Roopa Prabhu.
13) Hash UDP fragments consistently, from Paolo Abeni.
14) If TCP receives lots of out of order tiny packets, we do really
silly stuff. Make the out-of-order queue ending more robust to this
kind of behavior, from Eric Dumazet.
15) Don't leak netlink dump state in nf_tables, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: axienet: Fix double deregister of mdio
qmi_wwan: fix interface number for DW5821e production firmware
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
bnx2x: Fix invalid memory access in rss hash config path.
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
r8169: restore previous behavior to accept BIOS WoL settings
cfg80211: never ignore user regulatory hint
sock: fix sg page frag coalescing in sk_alloc_sg
netfilter: nf_tables: move dumper state allocation into ->start
tcp: add tcp_ooo_try_coalesce() helper
tcp: call tcp_drop() from tcp_data_queue_ofo()
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: free batches of packets in tcp_prune_ofo_queue()
ip: hash fragments consistently
ipv6: use fib6_info_hold_safe() when necessary
can: xilinx_can: fix power management handling
can: xilinx_can: fix incorrect clear of non-processed interrupts
can: xilinx_can: fix RX overflow interrupt not being enabled
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
...
|
|
The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
OPAL console writes do not have to synchronously flush firmware /
hardware buffers unless they are going through the udbg path.
Remove the unconditional flushing from opal_put_chars. Flush if
there was no space in the buffer as an optimisation (callers loop
waiting for success in that case). udbg flushing is moved to
udbg_opal_putc.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
opal_put_chars deals with partial writes because in OPALv1,
opal_console_write_buffer_space did not work correctly. That firmware
is not supported.
This reworks the opal_put_chars code to no longer deal with partial
writes by turning them into full writes. Partial write handling is still
supported in terms of what gets returned to the caller, but it may not
go to the console atomically. A warning message is printed in this
case.
This allows console flushing to be moved out of the opal_write_lock
spinlock. That could cause the lock to be held for long periods if the
console is busy (especially if it was being spammed by firmware),
which is dangerous because the lock is taken by xmon to debug the
system. Flushing outside the lock improves the situation a bit.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c55e
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.
The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.
So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Use the more refined and tested event polling loop from opal_put_chars
as the fallback console flush in the opal-kmsg path. This loop is used
by the console driver today, whereas the opal-kmsg fallback is not
likely to have been used for years.
Use WARN_ONCE rather than a printk when the fallback is invoked to
prepare for moving the console flush into a common function.
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
so implement the standard OPAL_BUSY handling for it.
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The OPAL console driver does not delay in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.
It can't yet be made to sleep because it is called under spinlock,
but it can be changed to the standard OPAL_BUSY loop form, and a
delay added to keep it from hitting the firmware too frequently.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.
This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
handler
Fixes: 8034f715f ("powernv/opal-dump: Convert to irq domain")
Converts all the return explicit number to a more proper IRQ_HANDLED,
which looks proper incase of interrupt handler returning case.
Here, It also removes error message like "nobody cared" which was
getting unveiled while returning -1 or 0 from handler.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Moves the return value check of 'opal_dump_info' to a proper place which
was previously unnecessarily filling all the dump info even on failure.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Since commit dc3106690b20 ("powerpc: tm: Always use fp_state and
vr_state to store live registers") tm_reclaim_thread() doesn't use the
parameter anymore, both callers have to bother getting it as they have
no need for a struct thread_info either.
Just remove it and adjust the callers.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs
during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
longer take the second parameter 'unsigned long orig_msr' as part of a
TM rewrite to simplify the reclaiming/recheckpointing process.
There is a comment in the asm file where the function is delcared which
has an incorrect prototype with the 'orig_msr' parameter.
This patch corrects the comment.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch is based on the previous VMX patch on memcmp().
To optimize ppc64 memcmp() with VMX instruction, we need to think about
the VMX penalty brought with: If kernel uses VMX instruction, it needs
to save/restore current thread's VMX registers. There are 32 x 128 bits
VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store.
The major concern regarding the memcmp() performance in kernel is KSM,
who will use memcmp() frequently to merge identical pages. So it will
make sense to take some measures/enhancement on KSM to see whether any
improvement can be done here. Cyril Bur indicates that the memcmp() for
KSM has a higher possibility to fail (unmatch) early in previous bytes
in following mail.
https://patchwork.ozlabs.org/patch/817322/#1773629
And I am taking a follow-up on this with this patch.
Per some testing, it shows KSM memcmp() will fail early at previous 32
bytes. More specifically:
- 76% cases will fail/unmatch before 16 bytes;
- 83% cases will fail/unmatch before 32 bytes;
- 84% cases will fail/unmatch before 64 bytes;
So 32 bytes looks a better choice than other bytes for pre-checking.
The early failure is also true for memcmp() for non-KSM case. With a
non-typical call load, it shows ~73% cases fail before first 32 bytes.
This patch adds a 32 bytes pre-checking firstly before jumping into VMX
operations, to avoid the unnecessary VMX penalty. It is not limited to
KSM case. And the testing shows ~20% improvement on memcmp() average
execution time with this patch.
And note the 32B pre-checking is only performed when the compare size
is long enough (>=4K currently) to allow VMX operation.
The detail data and analysis is at:
https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch add VMX primitives to do memcmp() in case the compare size
is equal or greater than 4K bytes. KSM feature can benefit from this.
Test result with following test program(replace the "^>" with ""):
------
># cat tools/testing/selftests/powerpc/stringloops/memcmp.c
>#include <malloc.h>
>#include <stdlib.h>
>#include <string.h>
>#include <time.h>
>#include "utils.h"
>#define SIZE (1024 * 1024 * 900)
>#define ITERATIONS 40
int test_memcmp(const void *s1, const void *s2, size_t n);
static int testcase(void)
{
char *s1;
char *s2;
unsigned long i;
s1 = memalign(128, SIZE);
if (!s1) {
perror("memalign");
exit(1);
}
s2 = memalign(128, SIZE);
if (!s2) {
perror("memalign");
exit(1);
}
for (i = 0; i < SIZE; i++) {
s1[i] = i & 0xff;
s2[i] = i & 0xff;
}
for (i = 0; i < ITERATIONS; i++) {
int ret = test_memcmp(s1, s2, SIZE);
if (ret) {
printf("return %d at[%ld]! should have returned zero\n", ret, i);
abort();
}
}
return 0;
}
int main(void)
{
return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch "powerpc/64: Align bytes
before fall back to .Lshort in powerpc64 memcmp()." in the series):
4.726728762 seconds time elapsed ( +- 3.54%)
With VMX patch:
4.234335473 seconds time elapsed ( +- 2.63%)
There is ~+10% improvement.
Testing with unaligned and different offset version (make s1 and s2 shift
random offset within 16 bytes) can archieve higher improvement than 10%..
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Some old tool chains don't know about instructions like vcmpequd.
This patch adds .long macro for vcmpequd and vcmpequb, which is
a preparation to optimize ppc64 memcmp with VMX instructions.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|