Age | Commit message (Collapse) | Author |
|
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Chris Leech <cleech@redhat.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The data in NVRAM is not guaranteed to be NUL terminated. Since
snprintf expects byte-stream to accommodate null byte, the CHAP secret
is truncated. Use sprintf instead of snprintf to fix the truncation of
CHAP name and secret.
Signed-off-by: Andrew Vasquez <andrew.vasquez@cavium.com>
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Chris Leech <cleech@redhat.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch fixes incorrect handle used for abort IOCB.
Fixes: b027a5ace443 ("scsi: qla2xxx: Fix queue ID for async abort with Multiqueue")
Signed-off-by: Darren Trapp <darren.trapp@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch is based on Max's original patch.
When the qla2xxx firmware is unavailable, eventually
qla2x00_sp_timeout() is reached, which calls the timeout function and
frees the srb_t instance.
The timeout function always resolves to qla2x00_async_iocb_timeout(),
which invokes another callback function called "done". All of these
qla2x00_*_sp_done() callbacks also free the srb_t instance; after
returning to qla2x00_sp_timeout(), it is freed again.
The fix is to remove the "sp->free(sp)" call from qla2x00_sp_timeout()
and add it to those code paths in qla2x00_async_iocb_timeout() which
do not already free the object.
This is how it looks like with KASAN:
BUG: KASAN: use-after-free in qla2x00_sp_timeout+0x228/0x250
Read of size 8 at addr ffff88278147a590 by task swapper/2/0
Allocated by task 1502:
save_stack+0x33/0xa0
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc+0xb8/0x1c0
mempool_alloc+0xd6/0x260
qla24xx_async_gnl+0x3c5/0x1100
Freed by task 0:
save_stack+0x33/0xa0
kasan_slab_free+0x72/0xc0
kmem_cache_free+0x75/0x200
qla24xx_async_gnl_sp_done+0x556/0x9e0
qla2x00_async_iocb_timeout+0x1c7/0x420
qla2x00_sp_timeout+0x16d/0x250
call_timer_fn+0x36/0x200
The buggy address belongs to the object at ffff88278147a440
which belongs to the cache qla2xxx_srbs of size 344
The buggy address is located 336 bytes inside of
344-byte region [ffff88278147a440, ffff88278147a598)
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: Max Kellermann <mk@cm4all.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Increase cmd_per_lun to allow more I/Os in progress per device,
particularly for NVMe's. The Hyper-V host side can handle the higher
count with no issues.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix one typo of render_mmio trace, exchange the mmio value of old and new.
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
GGTT is in BAR0 with 8 bytes aligned. With a qemu patch (commit:
38d49e8c1523d97d2191190d3f7b4ce7a0ab5aa3), VFIO can use 8-byte reads/
writes to access it.
This patch is to support the 8-byte GGTT reads/writes.
Ideally, we would like to support 8-byte reads/writes for the total BAR0.
But it needs more work for handling 8-byte MMIO reads/writes.
This patch can fix the issue caused by partial updating GGTT entry, during
guest booting up.
v3:
- Use intel_vgpu_get_bar_gpa() stead. (Zhenyu)
- Include all the GGTT checking logic in gtt_entry(). (Zhenyu)
v2:
- Limit to GGTT entry. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
Guest may set this register on KBL platform, it can impact hardware
behavior, so add it into the gen9 render list. Otherwise gpu hang issue may
happen during different vgpu switch.
v2: separate it from patch set.
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
As we peek inside struct device to query members guarded by CONFIG_PM,
so must be the code.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207160428.17015-1-chris@chris-wilson.co.uk
(cherry picked from commit 05273c950a3c93c5f96be8807eaf24f2cc9f1c1e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-4-tvrtko.ursulin@linux.intel.com
|
|
We are not allowed to call intel_runtime_pm_get from the PMU counter read
callback since the former can sleep, and the latter is running under IRQ
context.
To workaround this, we record the last known RC6 and while runtime
suspended estimate its increase by querying the runtime PM core
timestamps.
Downside of this approach is that we can temporarily lose a chunk of RC6
time, from the last PMU read-out to runtime suspend entry, but that will
eventually catch up, once device comes back online and in the presence of
PMU queries.
Also, we have to be careful not to overshoot the RC6 estimate, so once
resumed after a period of approximation, we only update the counter once
it catches up. With the observation that RC6 is increasing while the
device is suspended, this should not pose a problem and can only cause
slight inaccuracies due clock base differences.
v2: Simplify by estimating on top of PM core counters. (Imre)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104943
Fixes: 6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics")
Testcase: igt/perf_pmu/rc6-runtime-pm
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180206183311.17924-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 1fe699e30113ed6f6e853ff44710d256072ea627)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-3-tvrtko.ursulin@linux.intel.com
|
|
Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking
inside for busy-stats") added a tasklet_disable call in busy stats
enabling, but we failed to understand that the PMU enable callback runs
as an hard IRQ (IPI).
Consequence of this is that the PMU enable callback can interrupt the
execlists tasklet, and will then deadlock when it calls
intel_engine_stats_enable->tasklet_disable.
To fix this, I realized it is possible to move the engine stats enablement
and disablement to PMU event init and destroy hooks. This allows for much
simpler implementation since those hooks run in normal context (can
sleep).
v2: Extract engine_event_destroy. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats")
Testcase: igt/perf_pmu/enable-race-*
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit b2f78cda260bc6a1a2d382b1d85a29e69b5b3724)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-2-tvrtko.ursulin@linux.intel.com
|
|
In order to prevent a race condition where we may end up overaccounting
the active state and leaving the busy-stats believing the GPU is 100%
busy, lock out the tasklet while we reconstruct the busy state. There is
no direct spinlock guard for the execlists->port[], so we need to
utilise tasklet_disable() as a synchronous barrier to prevent it, the
only writer to execlists->port[], from running at the same time as the
enable.
Fixes: 4900727d35bb ("drm/i915/pmu: Reconstruct active state on starting busy-stats")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115092041.13509-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 99e48bf98dd036090b480a12c39e8b971731247e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-1-tvrtko.ursulin@linux.intel.com
|
|
When a request is preempted, it is unsubmitted from the HW queue and
removed from the active list of breadcrumbs. In the process, this
however triggers the signaler and it may see the clear rbtree with the
old, and still valid, seqno, or it may match the cleared seqno with the
now zero rq->global_seqno. This confuses the signaler into action and
signaling the fence.
Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180206094633.30181-1-chris@chris-wilson.co.uk
(cherry picked from commit fd10e2ce9905030d922e179a8047a4d50daffd8e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180213090154.17373-1-chris@chris-wilson.co.uk
|
|
We need to halt the controller immediately if we haven't completed
initialization as indicated by the new "connecting" state.
Fixes: ad70062cdb ("nvme-pci: introduce RECONNECTING state to mark initializing procedure")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black@intel.com>
Cc: Jon Derrick <jonathan.derrick@intel.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
nvme_update_formats will invoke nvme_ns_remove under namespaces_mutext.
The will cause deadlock because nvme_ns_remove will also require
the namespaces_mutext. Fix it by getting the ns entries which should
be removed under namespaces_mutext and invoke nvme_ns_remove out of
namespaces_mutext.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
Some other drivers may be waiting for our extcon to show-up, exiting their
probe methods with -EPROBE_DEFER until we show up.
These drivers will typically get the cable state directly after getting
the extcon, this commit changes the int3496 code to wait for the initial
processing of the id-pin to complete before exiting probe() with 0, which
will cause devices waiting on the defered probe to get reprobed.
This fixes a race where the initial work might still be running while other
drivers were already calling extcon_get_state().
Fixes: 2f556bdb9f2e ("extcon: int3496: Add Intel INT3496 ACPI ... driver")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
|
|
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:
In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.
In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.
Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
adds an rcu read critical section to __rd_conn_create. The
memory allocations in that critcal section need to use
GFP_ATOMIC to avoid sleeping.
This patch was verified with syzkaller reproducer.
Reported-by: syzbot+a0564419941aaae3fe3c@syzkaller.appspotmail.com
Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fix from James Hogan:
"A single change (and associated DT binding update) to allow the
address of the MIPS Cluster Power Controller (CPC) to be chosen by DT,
which allows SMP to work on generic MIPS kernels where the bootloader
hasn't configured the CPC address (i.e. the new Ranchu platform)"
* tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
dt-bindings: Document mti,mips-cpc binding
|
|
Jiri Pirko says:
====================
net: sched: couple of fixes
This patchset contains couple of fixes following-up the shared block
patchsets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The offending commit wrongly assumes 1:1 mapping between block and q.
However, there are multiple blocks for a single q for classful qdiscs.
Since the obscure tc_u_common sharing mechanism expects it to be shared
among a qdisc, fix it by storing q pointer in case the block is not
shared.
Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Fixes: 7fa9d974f3c2 ("net: sched: cls_u32: use block instead of q in tc_u_common")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is pointless to set block->q for block which are shared among
multiple qdiscs. So remove the assignment in that case. Do a bit of code
reshuffle to make block->index initialized at that point so we can use
tcf_block_shared() helper.
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:
[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]
Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.
Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change "minimun" to "minimum".
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have expected busses to set up a coherent mask to properly use the
common dma mapping code for a long time, and now that I've added a warning
macio turned out to not set one up yet. This sets it to the same value
as the dma_mask, which seems to be what the drivers expect.
Reported-by: Mathieu Malaterre <malat@debian.org>
Tested-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This fixes a compile problem of some user space applications by not
including linux/libc-compat.h in uapi/if_ether.h.
linux/libc-compat.h checks which "features" the header files, included
from the libc, provide to make the Linux kernel uapi header files only
provide no conflicting structures and enums. If a user application mixes
kernel headers and libc headers it could happen that linux/libc-compat.h
gets included too early where not all other libc headers are included
yet. Then the linux/libc-compat.h would not prevent all the
redefinitions and we run into compile problems.
This patch removes the include of linux/libc-compat.h from
uapi/if_ether.h to fix the recently introduced case, but not all as this
is more or less impossible.
It is no problem to do the check directly in the if_ether.h file and not
in libc-compat.h as this does not need any fancy glibc header detection
as glibc never provided struct ethhdr and should define
__UAPI_DEF_ETHHDR by them self when they will provide this.
The following test program did not compile correctly any more:
#include <linux/if_ether.h>
#include <netinet/in.h>
#include <linux/in.h>
int main(void)
{
return 0;
}
Fixes: 6926e041a892 ("uapi/if_ether.h: prevent redefinition of struct ethhdr")
Reported-by: Guillaume Nault <g.nault@alphalink.fr>
Cc: <stable@vger.kernel.org> # 4.15
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.
Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
Changes since v2->v3:
-using __set_current_state() instead of set_current_state()
Changes since v1->v2:
-setting task state once in blk_poll, instead of multiple
callers.
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In the following commit:
ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert (Persistent Memory) <elliott@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org #v4.14
Fixes: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
directory
EFI is complicated enough that being able to view its pagetables is
quite helpful. Rather than requiring users to fish it out of dmesg
on an appropriately configured kernel, let users view it in debugfs
as well.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ba158a93f3250e6fca752cff2cfb1fcdd9f2b50c.1517414050.git.luto@kernel.org
[ Fixed trivial whitespace damage and fixed missing export. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
sme_pgtable_calc() is unnecessary complex. It can be re-written in a
more stream-lined way.
As a side effect, we would get the code ready to boot-time switching
between paging modes.
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180131135404.40692-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
sme_populate_pgd() and sme_populate_pgd_large() operate on the identity
mapping, which means they want virtual addresses to be equal to physical
one, without PAGE_OFFSET shift.
We also need to avoid paravirtualization call there.
Getting this done is tricky. We cannot use usual page table helpers.
It forces us to open-code a lot of things. It makes code ugly and hard
to modify.
We can get it work with the page table helpers, but it requires few
preprocessor tricks.
- Define __pa() and __va() to be compatible with identity mapping.
- Undef CONFIG_PARAVIRT and CONFIG_PARAVIRT_SPINLOCKS before including
any file. This way we can avoid paravirtualization calls.
Now we can user normal page table helpers just fine.
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180131135404.40692-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There are bunch of functions in mem_encrypt.c that operate on the
identity mapping, which means they want virtual addresses to be equal to
physical one, without PAGE_OFFSET shift.
We also need to avoid paravirtualizaion call there.
Getting this done is tricky. We cannot use usual page table helpers.
It forces us to open-code a lot of things. It makes code ugly and hard
to modify.
We can get it work with the page table helpers, but it requires few
preprocessor tricks. These tricks may have side effects for the rest of
the file.
Let's isolate such functions into own translation unit.
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180131135404.40692-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The TLB invalidation info is allocated on the stack, which might cause
it to be unaligned. Since this information may be transferred to
different cores for TLB shootdown, this may cause an additional cache
line to become shared. While the overhead is likely to be small, the
fix is simple.
We do not use __cacheline_aligned() since it also defines the section,
which is inappropriate for stack variables.
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180131211912.52064-1-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While reading this header I noticed that the locking stuff has moved to
kernel/locking/*, so update the path in semaphore.h to point to that.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180201114119.1090-1-tycho@tycho.ws
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
test_and_{}_bit()
A test_and_{}_bit() operation fails if the value of the bit is such that
the modification does not take place. For example, if test_and_set_bit()
returns 1. In these cases, follow the behaviour of cmpxchg and allow the
operation to be unordered. This also applies to test_and_set_bit_lock()
if the lock is found to be be taken already.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528619-20049-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When queuing on the qspinlock, the count field for the current CPU's head
node is incremented. This needn't be atomic because locking in e.g. IRQ
context is balanced and so an IRQ will return with node->count as it
found it.
However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node->count, causing an
IRQ to overwrite the initialised node and potentially corrupt the lock
state.
Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node->count and the subsequent
node initialisation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that there
was a pre-existing locker in the queue, we subsequently update their
->next field to point at our node so that we are notified when it's our
turn to take the lock.
This can be roughly illustrated as follows:
/* Initialise the fields in node and encode a pointer to node in tail */
tail = initialise_node(node);
/*
* Exchange tail into the lockword using an atomic read-modify-write
* operation with release semantics
*/
old = xchg_tail(lock, tail);
/* If there was a pre-existing waiter ... */
if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
smp_read_barrier_depends();
/* ... then update their ->next field to point to node.
WRITE_ONCE(prev->next, node);
}
The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
then this ordering is not guaranteed and it is possible for the CPU
to return old before the writes to node have been published, consequently
allowing us to point prev->next to an uninitialised node.
This patch fixes the problem by making the update of prev->next a RELEASE
operation, which also removes the reliance on dependency ordering.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With link time optimizations enabled, I get a link failure:
./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
<artificial>:(.text+0x7f3): undefined reference to `just_return_func'
Marking the symbol .globl makes it work as expected.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 540adea3809f ("error-injection: Separate error-injection from kprobe")
Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The latest UV platforms include the new ApachePass NVDIMMs into the
UV address space. This has introduced address ranges in the Global
Address Map Table that are less than the previous lowest range, which
was 2GB. Fix the address calculation so it accommodates address ranges
from bytes to exabytes.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Andrew Banman <andrew.banman@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing") added a
fix to ensure that the memory range between PHYS_OFFSET and low memory
address specified by mem= cmdline argument is not later processed by
free_all_bootmem. This change was incorrect for systems where the
commandline specifies more than 1 mem argument, as it will cause all
memory between PHYS_OFFSET and each of the memory offsets to be marked
as reserved, which results in parts of the RAM marked as reserved
(Creator CI20's u-boot has a default commandline argument 'mem=256M@0x0
mem=768M@0x30000000').
Change the behaviour to ensure that only the range between PHYS_OFFSET
and the lowest start address of the memories is marked as protected.
This change also ensures that the range is marked protected even if it's
only defined through the devicetree and not only via commandline
arguments.
Reported-by: Mathieu Malaterre <mathieu.malaterre@gmail.com>
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@mips.com>
Fixes: 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing")
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # v4.11+
Tested-by: Mathieu Malaterre <malat@debian.org>
Patchwork: https://patchwork.linux-mips.org/patch/18562/
Signed-off-by: James Hogan <jhogan@kernel.org>
|
|
The file was generated by make command and should not be in the source tree.
Signed-off-by: Progyan Bhattacharya <progyanb@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since schedutil kernel thread directly set priority to 0, the macro
SUGOV_KTHREAD_PRIORITY is not used. So remove it.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/1518097702-9665-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Remove the __init annotation from bmips_cpu_setup() to avoid the
following warning.
WARNING: vmlinux.o(.text+0x35c950): Section mismatch in reference from the function brcmstb_pm_s3() to the function .init.text:bmips_cpu_setup()
The function brcmstb_pm_s3() references
the function __init bmips_cpu_setup().
This is often because brcmstb_pm_s3 lacks a __init
annotation or the annotation of bmips_cpu_setup is wrong.
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: linux-mips@linux-mips.org
Reviewed-by: James Hogan <jhogan@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/18589/
Signed-off-by: James Hogan <jhogan@kernel.org>
|
|
physical CPU
When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is removed in uncore_pci_remove():
WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
uncore_pci_remove+0xf1/0x110
...
CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
...
Call Trace:
pci_device_remove+0x36/0xb0
device_release_driver_internal+0x145/0x210
pci_stop_bus_device+0x76/0xa0
pci_stop_root_bus+0x44/0x60
acpi_pci_root_remove+0x1f/0x80
acpi_bus_trim+0x54/0x90
acpi_bus_trim+0x2e/0x90
acpi_device_hotplug+0x2bc/0x4b0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x141/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130
When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messesages are
shown because topology_phys_to_logical_pkg() returns -1.
arch/x86/events/intel/uncore.c:
static void uncore_pci_remove(struct pci_dev *pdev)
{
...
phys_id = uncore_pcibus_to_physid(pdev->bus);
...
pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
uncore_extra_pci_dev[pkg].dev[i] = NULL;
break;
}
}
WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
topology_phys_to_logical_pkg() tries to find
cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
arch/x86/kernel/smpboot.c:
int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
int cpu;
for_each_possible_cpu(cpu) {
struct cpuinfo_x86 *c = &cpu_data(cpu);
if (c->initialized && c->phys_proc_id == phys_pkg)
return c->logical_proc_id;
}
return -1;
}
However, the phys_proc_id was already set to 0 by remove_siblinginfo()
when the CPU was offlined.
So, topology_phys_to_logical_pkg() cannot find the correct
logical_proc_id and always returns -1.
As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
messages are shown.
What is worse is that the bogus 'pkg' index results in two bugs:
- We dereference uncore_extra_pci_dev[] with a negative index
- We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
This should not cause any problems, because ->phys_proc_id is not
used after it is hot-removed and it is re-set while hot-adding.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yasu.isimatu@gmail.com
Cc: <stable@vger.kernel.org>
Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array")
Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With glibc 2.26 'struct ucontext' is removed to improve POSIX
compliance, which breaks powerpc/alignment_handler selftest. Fix the
test by using ucontext_t. Tested on ppc, works with older glibc
versions as well.
Fixes the following:
alignment_handler.c: In function ‘sighandler’:
alignment_handler.c:68:5: error: dereferencing pointer to incomplete type ‘struct ucontext’
ucp->uc_mcontext.gp_regs[PT_NIP] += 4;
Signed-off-by: Harish <harish@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
If KEXEC_CORE is not enabled, powernv builds fail as follows.
arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
arch/powerpc/platforms/powernv/smp.c:236:4: error:
implicit declaration of function 'crash_ipi_callback'
Add dummy function calls, similar to kdump_in_progress(), to solve the
problem.
Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel can get HMI's")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with
memoryless nodes") adds an unconditional call to
find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR
is enabled. This results in the following build error if this is not
the case.
arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
undefined reference to `.find_and_online_cpu_nid'
Follow the guideline provided by similar functions and provide a dummy
function if CONFIG_PPC_SPLPAR is not enabled. This also moves the
external function declaration into an include file where it should be.
Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[mpe: Change subject to emphasise the build fix]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
On powerpc we allocate page table pages from slab caches of different
sizes. Currently we have a constructor that zeroes out the objects when
we allocate them for the first time.
We expect the objects to be zeroed out when we free the the object
back to slab cache. This happens in the unmap path. For hugetlb pages
we call huge_pte_get_and_clear() to do that.
With the current configuration of page table size, both PUD and PGD
level tables are allocated from the same slab cache. At the PUD level,
we use the second half of the table to store the slot information. But
we never clear that when unmapping.
When such a freed object is then allocated for a PGD page, the second
half of the page table page will not be zeroed as expected. This
results in a kernel crash.
Fix it by always clearing PGD pages when they're allocated.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Change log wording and formatting, add whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The hugetlb pte entries are at the PMD and PUD level, so we can't use
PTRS_PER_PTE to find the second half of the page table. Use the right
offset for PUD/PMD to get to the second half of the table.
Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|