Age | Commit message | Author
2018-02-13 | scsi: qedi: Cleanup local str variable | Nilesh Javali
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Chris Leech <cleech@redhat.com> Acked-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 | scsi: qedi: Fix truncation of CHAP name and secret | Andrew Vasquez
The data in NVRAM is not guaranteed to be NUL terminated. Since snprintf() reserves one byte of the given size for the terminating NUL byte, the CHAP secret is truncated. Use sprintf() instead of snprintf() to fix the truncation of the CHAP name and secret. Signed-off-by: Andrew Vasquez <andrew.vasquez@cavium.com> Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Chris Leech <cleech@redhat.com> Acked-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
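The truncation can be reproduced in user space. The following is a minimal sketch, assuming an 8-byte field that is completely filled and carries no NUL terminator; `FIELD_LEN` and `copy_field_snprintf()` are hypothetical names and not qedi code. The commit itself switches to sprintf; the sketch only shows why the size argument matters, and it uses a precision-limited `%.*s` so the unterminated source is never read past its end.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical field width; the real NVRAM layout is not shown in this log. */
#define FIELD_LEN 8

/*
 * Copy a fixed-width, possibly unterminated field into dst and return the
 * resulting string length. snprintf() writes at most dst_size - 1 characters
 * followed by a NUL, so dst_size == FIELD_LEN drops the last character of a
 * completely filled field.
 */
static size_t copy_field_snprintf(char *dst, size_t dst_size, const char *field)
{
    snprintf(dst, dst_size, "%.*s", FIELD_LEN, field);
    return strlen(dst);
}
```

Calling it with a destination of `FIELD_LEN` bytes loses the final character, while a destination of `FIELD_LEN + 1` bytes keeps the whole field.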
2018-02-13 | scsi: qla2xxx: Fix incorrect handle for abort IOCB | Himanshu Madhani
This patch fixes incorrect handle used for abort IOCB. Fixes: b027a5ace443 ("scsi: qla2xxx: Fix queue ID for async abort with Multiqueue") Signed-off-by: Darren Trapp <darren.trapp@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 | scsi: qla2xxx: Fix double free bug after firmware timeout | Quinn Tran
This patch is based on Max's original patch. When the qla2xxx firmware is unavailable, eventually qla2x00_sp_timeout() is reached, which calls the timeout function and frees the srb_t instance. The timeout function always resolves to qla2x00_async_iocb_timeout(), which invokes another callback function called "done". All of these qla2x00_*_sp_done() callbacks also free the srb_t instance; after returning to qla2x00_sp_timeout(), it is freed again. The fix is to remove the "sp->free(sp)" call from qla2x00_sp_timeout() and add it to those code paths in qla2x00_async_iocb_timeout() which do not already free the object. This is what it looks like with KASAN:
  BUG: KASAN: use-after-free in qla2x00_sp_timeout+0x228/0x250
  Read of size 8 at addr ffff88278147a590 by task swapper/2/0
  Allocated by task 1502:
   save_stack+0x33/0xa0
   kasan_kmalloc+0xa0/0xd0
   kmem_cache_alloc+0xb8/0x1c0
   mempool_alloc+0xd6/0x260
   qla24xx_async_gnl+0x3c5/0x1100
  Freed by task 0:
   save_stack+0x33/0xa0
   kasan_slab_free+0x72/0xc0
   kmem_cache_free+0x75/0x200
   qla24xx_async_gnl_sp_done+0x556/0x9e0
   qla2x00_async_iocb_timeout+0x1c7/0x420
   qla2x00_sp_timeout+0x16d/0x250
   call_timer_fn+0x36/0x200
  The buggy address belongs to the object at ffff88278147a440 which belongs to the cache qla2xxx_srbs of size 344
  The buggy address is located 336 bytes inside of 344-byte region [ffff88278147a440, ffff88278147a598)
Reported-by: Max Kellermann <mk@cm4all.com> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Cc: Max Kellermann <mk@cm4all.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
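The ownership rule behind this fix can be modelled outside the kernel. This is a rough sketch under stated assumptions: `struct request`, `iocb_timeout()` and `sp_timeout()` are hypothetical stand-ins for srb_t and the two qla2xxx timeout functions, and a counter replaces KASAN so that a second release would be visible.

```c
#include <stdlib.h>

/* Hypothetical stand-in for srb_t; only the ownership rule is modelled. */
struct request {
    void (*done)(struct request *req); /* completion callback, may be NULL */
    int *free_count;                   /* counts how often the object is released */
};

static void request_free(struct request *req)
{
    (*req->free_count)++;
    free(req);
}

/* Completion callback that releases the request itself. */
static void done_and_free(struct request *req)
{
    request_free(req);
}

/* Inner timeout handler: frees only on paths where no callback did. */
static void iocb_timeout(struct request *req)
{
    if (req->done)
        req->done(req);    /* the callback owns and frees req */
    else
        request_free(req); /* no callback, so free here */
}

/* Outer timer handler: must not touch req after the inner handler returns. */
static void sp_timeout(struct request *req)
{
    iocb_timeout(req);
    /* No request_free(req) here; that call was the double free. */
}

static struct request *request_alloc(void (*done)(struct request *), int *free_count)
{
    struct request *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    req->done = done;
    req->free_count = free_count;
    return req;
}
```

With this split, every path through `sp_timeout()` releases the object exactly once, whether or not a completion callback is installed.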
2018-02-13 | scsi: storvsc: Increase cmd_per_lun for higher speed devices | Michael Kelley (EOSG)
Increase cmd_per_lun to allow more I/Os in progress per device, particularly for NVMe devices. The Hyper-V host side can handle the higher count with no issues. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-14 | drm/i915/gvt: fix one typo of render_mmio trace | Weinan Li
Fix a typo in the render_mmio trace: swap the old and new MMIO values. Signed-off-by: Weinan Li <weinan.z.li@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-02-14 | drm/i915/gvt: Support BAR0 8-byte reads/writes | Tina Zhang
The GGTT is in BAR0 and is 8-byte aligned. With a qemu patch (commit: 38d49e8c1523d97d2191190d3f7b4ce7a0ab5aa3), VFIO can use 8-byte reads/writes to access it. This patch adds support for 8-byte GGTT reads/writes. Ideally, we would like to support 8-byte reads/writes for the whole of BAR0, but handling 8-byte MMIO reads/writes needs more work. This patch fixes the issue caused by partially updating a GGTT entry during guest boot. v3: - Use intel_vgpu_get_bar_gpa() instead. (Zhenyu) - Include all the GGTT checking logic in gtt_entry(). (Zhenyu) v2: - Limit to GGTT entry. (Zhenyu) Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-02-14 | drm/i915/gvt: add 0xe4f0 into gen9 render list | Weinan Li
The guest may set this register on the KBL platform, and it can impact hardware behavior, so add it to the gen9 render list. Otherwise a GPU hang may occur when switching between different vGPUs. v2: separate it from patch set. Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-02-13 | drm/i915/pmu: Fix building without CONFIG_PM | Chris Wilson
As we peek inside struct device to query members guarded by CONFIG_PM, so must be the code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180207160428.17015-1-chris@chris-wilson.co.uk (cherry picked from commit 05273c950a3c93c5f96be8807eaf24f2cc9f1c1e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-4-tvrtko.ursulin@linux.intel.com
2018-02-13 | drm/i915/pmu: Fix sleep under atomic in RC6 readout | Tvrtko Ursulin
We are not allowed to call intel_runtime_pm_get from the PMU counter read callback since the former can sleep, and the latter is running under IRQ context. To work around this, we record the last known RC6 and, while runtime suspended, estimate its increase by querying the runtime PM core timestamps. The downside of this approach is that we can temporarily lose a chunk of RC6 time, from the last PMU read-out to runtime suspend entry, but that will eventually catch up, once the device comes back online and in the presence of PMU queries. Also, we have to be careful not to overshoot the RC6 estimate, so once resumed after a period of approximation, we only update the counter once it catches up. With the observation that RC6 is increasing while the device is suspended, this should not pose a problem and can only cause slight inaccuracies due to clock base differences. v2: Simplify by estimating on top of PM core counters. (Imre) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104943 Fixes: 6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics") Testcase: igt/perf_pmu/rc6-runtime-pm Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180206183311.17924-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit 1fe699e30113ed6f6e853ff44710d256072ea627) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-3-tvrtko.ursulin@linux.intel.com
2018-02-13 | drm/i915/pmu: Fix PMU enable vs execlists tasklet race | Tvrtko Ursulin
Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") added a tasklet_disable call in busy stats enabling, but we failed to understand that the PMU enable callback runs as a hard IRQ (IPI). The consequence of this is that the PMU enable callback can interrupt the execlists tasklet, and will then deadlock when it calls intel_engine_stats_enable->tasklet_disable. To fix this, I realized it is possible to move the engine stats enablement and disablement to PMU event init and destroy hooks. This allows for a much simpler implementation since those hooks run in normal context (can sleep). v2: Extract engine_event_destroy. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") Testcase: igt/perf_pmu/enable-race-* Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit b2f78cda260bc6a1a2d382b1d85a29e69b5b3724) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-2-tvrtko.ursulin@linux.intel.com
2018-02-13 | drm/i915: Lock out execlist tasklet while peeking inside for busy-stats | Chris Wilson
In order to prevent a race condition where we may end up overaccounting the active state and leaving the busy-stats believing the GPU is 100% busy, lock out the tasklet while we reconstruct the busy state. There is no direct spinlock guard for the execlists->port[], so we need to utilise tasklet_disable() as a synchronous barrier to prevent it, the only writer to execlists->port[], from running at the same time as the enable. Fixes: 4900727d35bb ("drm/i915/pmu: Reconstruct active state on starting busy-stats") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180115092041.13509-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (cherry picked from commit 99e48bf98dd036090b480a12c39e8b971731247e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180213095747.2424-1-tvrtko.ursulin@linux.intel.com
2018-02-13 | drm/i915/breadcrumbs: Ignore unsubmitted signalers | Chris Wilson
When a request is preempted, it is unsubmitted from the HW queue and removed from the active list of breadcrumbs. In the process, this however triggers the signaler and it may see the clear rbtree with the old, and still valid, seqno, or it may match the cleared seqno with the now zero rq->global_seqno. This confuses the signaler into action and signaling the fence. Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.12+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180206094633.30181-1-chris@chris-wilson.co.uk (cherry picked from commit fd10e2ce9905030d922e179a8047a4d50daffd8e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180213090154.17373-1-chris@chris-wilson.co.uk
2018-02-13 | nvme-pci: Fix timeouts in connecting state | Keith Busch
We need to halt the controller immediately if we haven't completed initialization as indicated by the new "connecting" state. Fixes: ad70062cdb ("nvme-pci: introduce RECONNECTING state to mark initializing procedure") Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-02-13 | nvme-pci: Remap CMB SQ entries on every controller reset | Keith Busch
The controller memory buffer is remapped into a kernel address on each reset, but the driver was setting the submission queue base address only on the very first queue creation. The remapped address is likely to change after a reset, so accessing the old address will hit a kernel bug. This patch fixes that by setting the queue's CMB base address each time the queue is created. Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path") Reported-by: Christian Black <christian.d.black@intel.com> Cc: Jon Derrick <jonathan.derrick@intel.com> Cc: <stable@vger.kernel.org> # 4.9+ Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-02-13 | nvme: fix the deadlock in nvme_update_formats | Jianchao Wang
nvme_update_formats will invoke nvme_ns_remove under namespaces_mutex. This will cause a deadlock because nvme_ns_remove also takes the namespaces_mutex. Fix it by collecting the ns entries which should be removed while holding namespaces_mutex and invoking nvme_ns_remove after namespaces_mutex has been released. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
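The "collect under the lock, remove after dropping it" pattern from this fix can be sketched in a single-threaded model. Everything here is an assumption made for illustration: `struct ctrl`, `struct ns`, the `stale` flag and the boolean `lock_held` (which stands in for a non-recursive mutex and asserts on re-entry) are hypothetical and not the NVMe driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NS 8

/* Hypothetical namespace entry; "stale" marks entries that must be removed. */
struct ns {
    int id;
    bool stale;
    bool present;
};

/* Hypothetical controller; lock_held models a non-recursive mutex. */
struct ctrl {
    bool lock_held;
    struct ns ns[MAX_NS];
};

static void ctrl_lock(struct ctrl *c)
{
    assert(!c->lock_held); /* taking the lock twice would deadlock */
    c->lock_held = true;
}

static void ctrl_unlock(struct ctrl *c)
{
    assert(c->lock_held);
    c->lock_held = false;
}

/* Takes the lock itself, so it must never be called with the lock held. */
static void ns_remove(struct ctrl *c, struct ns *n)
{
    ctrl_lock(c);
    n->present = false;
    ctrl_unlock(c);
}

/* Gather the stale entries under the lock, then remove them outside it. */
static size_t update_formats(struct ctrl *c)
{
    struct ns *to_remove[MAX_NS];
    size_t count = 0;

    ctrl_lock(c);
    for (size_t i = 0; i < MAX_NS; i++)
        if (c->ns[i].present && c->ns[i].stale)
            to_remove[count++] = &c->ns[i];
    ctrl_unlock(c);

    for (size_t i = 0; i < count; i++)
        ns_remove(c, to_remove[i]);
    return count;
}
```

Calling `ns_remove()` inside the first loop would trip the assertion in `ctrl_lock()`, which is the model's equivalent of the reported deadlock.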
2018-02-14 | extcon: int3496: process id-pin first so that we start with the right status | Hans de Goede
Some other drivers may be waiting for our extcon to show up, exiting their probe methods with -EPROBE_DEFER until we show up. These drivers will typically get the cable state directly after getting the extcon. This commit changes the int3496 code to wait for the initial processing of the id-pin to complete before exiting probe() with 0, which will cause devices waiting on the deferred probe to get reprobed. This fixes a race where the initial work might still be running while other drivers were already calling extcon_get_state(). Fixes: 2f556bdb9f2e ("extcon: int3496: Add Intel INT3496 ACPI ... driver") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2018-02-13 | gfs2: Fixes to "Implement iomap for block_map" | Andreas Gruenbacher
It turns out that commit 3974320ca6 "Implement iomap for block_map" introduced a few bugs that trigger occasional failures with xfstest generic/476: In gfs2_iomap_begin, we jump to do_alloc when we determine that we are beyond the end of the allocated metadata (height > ip->i_height). There, we can end up calling hole_size with a metapath that doesn't match the current metadata tree, which doesn't make sense. After untangling the code at do_alloc, fix this by checking if the block we are looking for is within the range of allocated metadata. In addition, add a BUG() in case gfs2_iomap_begin is accidentally called for reading stuffed files: this is handled separately. Make sure we don't truncate iomap->length for reads beyond the end of the file; in that case, the entire range counts as a hole. Finally, revert to taking a bitmap write lock when doing allocations. It's unclear why that change didn't lead to any failures during testing. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-13 | rds: do not call ->conn_alloc with GFP_KERNEL | Sowmini Varadhan
Commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") adds an rcu read critical section to __rds_conn_create. The memory allocations in that critical section need to use GFP_ATOMIC to avoid sleeping. This patch was verified with the syzkaller reproducer. Reported-by: syzbot+a0564419941aaae3fe3c@syzkaller.appspotmail.com Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | Merge tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips | Linus Torvalds
Pull MIPS fix from James Hogan: "A single change (and associated DT binding update) to allow the address of the MIPS Cluster Power Controller (CPC) to be chosen by DT, which allows SMP to work on generic MIPS kernels where the bootloader hasn't configured the CPC address (i.e. the new Ranchu platform)"
* tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
  dt-bindings: Document mti,mips-cpc binding
2018-02-13 | Merge branch 'net-sched-couple-of-fixes' | David S. Miller
Jiri Pirko says: ==================== net: sched: couple of fixes This patchset contains couple of fixes following-up the shared block patchsets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | net: sched: fix tc_u_common lookup | Jiri Pirko
The offending commit wrongly assumes 1:1 mapping between block and q. However, there are multiple blocks for a single q for classful qdiscs. Since the obscure tc_u_common sharing mechanism expects it to be shared among a qdisc, fix it by storing q pointer in case the block is not shared. Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Fixes: 7fa9d974f3c2 ("net: sched: cls_u32: use block instead of q in tc_u_common") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | net: sched: don't set q pointer for shared blocks | Jiri Pirko
It is pointless to set block->q for blocks which are shared among multiple qdiscs, so remove the assignment in that case. Reshuffle the code a bit so that block->index is initialized at that point and we can use the tcf_block_shared() helper. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create | Jiri Pirko
Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create() use ERR_PTR macro to propagate int err through return of a pointer, the return value is not NULL in case of failure. So if one of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table is not NULL and mlxsw_sp_vr_is_used wrongly assumes that vr is in use which leads to crash like following one: [ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9 [ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum] Fix this by using local variables to hold the pointers and set vr->* only in case everything went fine. Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling") Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support") Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
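The pitfall described here, an error pointer that is not NULL being mistaken for a live object, can be shown with a user-space approximation of the kernel's ERR_PTR helpers. In this sketch `err_ptr()`, `ptr_err()` and `is_err()` are simplified re-implementations, and `struct table`, `struct vr`, `table_create()` and `vr_create()` are hypothetical names standing in for the mlxsw structures and functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space approximations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
#define MAX_ERRNO 4095

static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct table { int dummy; };

/* Hypothetical virtual router: "in use" means any table pointer is set. */
struct vr {
    struct table *fib4;
    struct table *fib6;
};

/* Hypothetical constructor that reports failure as an encoded pointer. */
static struct table *table_create(int fail)
{
    struct table *t;

    if (fail)
        return err_ptr(-ENOMEM);
    t = calloc(1, sizeof(*t));
    return t ? t : err_ptr(-ENOMEM);
}

static int vr_is_used(const struct vr *vr)
{
    return vr->fib4 || vr->fib6;
}

/* Hold results in locals and publish them to vr only when all calls succeeded. */
static int vr_create(struct vr *vr, int fail_fib6)
{
    struct table *fib4, *fib6;

    fib4 = table_create(0);
    if (is_err(fib4))
        return (int)ptr_err(fib4);

    fib6 = table_create(fail_fib6);
    if (is_err(fib6)) {
        free(fib4);
        return (int)ptr_err(fib6);
    }

    vr->fib4 = fib4;
    vr->fib6 = fib6;
    return 0;
}
```

Had the results been stored straight into `vr->fib4` and `vr->fib6`, a failed call would leave an encoded error value there and `vr_is_used()` would report the router as in use.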
2018-02-13 | net: af_unix: fix typo in UNIX_SKB_FRAGS_SZ comment | Tobias Klauser
Change "minimun" to "minimum". Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 | powerpc/macio: set a proper dma_coherent_mask | Christoph Hellwig
We have expected busses to set up a coherent mask to properly use the common dma mapping code for a long time, and now that I've added a warning macio turned out to not set one up yet. This sets it to the same value as the dma_mask, which seems to be what the drivers expect. Reported-by: Mathieu Malaterre <malat@debian.org> Tested-by: Mathieu Malaterre <malat@debian.org> Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-13 | uapi/if_ether.h: move __UAPI_DEF_ETHHDR libc define | Hauke Mehrtens
This fixes a compile problem of some user space applications by not including linux/libc-compat.h in uapi/if_ether.h. linux/libc-compat.h checks which "features" the header files included from the libc provide, so that the Linux kernel uapi header files provide only non-conflicting structures and enums. If a user application mixes kernel headers and libc headers, it could happen that linux/libc-compat.h gets included too early, when not all other libc headers are included yet. Then linux/libc-compat.h would not prevent all the redefinitions and we run into compile problems. This patch removes the include of linux/libc-compat.h from uapi/if_ether.h to fix the recently introduced case, but not all cases, as this is more or less impossible. It is no problem to do the check directly in the if_ether.h file and not in libc-compat.h, as this does not need any fancy glibc header detection: glibc never provided struct ethhdr and should define __UAPI_DEF_ETHHDR itself when it provides this. The following test program no longer compiled correctly:
  #include <linux/if_ether.h>
  #include <netinet/in.h>
  #include <linux/in.h>

  int main(void)
  {
  	return 0;
  }
Fixes: 6926e041a892 ("uapi/if_ether.h: prevent redefinition of struct ethhdr") Reported-by: Guillaume Nault <g.nault@alphalink.fr> Cc: <stable@vger.kernel.org> # 4.15 Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
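The "check directly in the header" approach can be sketched as a local default for a feature macro. This is an illustration only: `DEMO_DEF_ETHHDR` and `struct demo_ethhdr` are hypothetical names chosen so the sketch cannot clash with real headers (the macro named in the commit is __UAPI_DEF_ETHHDR), and the packed attribute assumes GCC or Clang.

```c
#include <stddef.h>

/*
 * A libc that ships its own definition would define DEMO_DEF_ETHHDR to 0
 * before this point. If nothing defined it, default to providing the struct,
 * without consulting any separate compatibility header.
 */
#ifndef DEMO_DEF_ETHHDR
#define DEMO_DEF_ETHHDR 1
#endif

#if DEMO_DEF_ETHHDR
/* Hypothetical Ethernet header: two 6-byte addresses and a 2-byte type. */
struct demo_ethhdr {
    unsigned char  h_dest[6];
    unsigned char  h_source[6];
    unsigned short h_proto;
} __attribute__((packed));
#endif
```

Because the default lives next to the struct, the result does not depend on the order in which other headers were included.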
2018-02-13 | blk: optimization for classic polling | Nitesh Shetty
This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion. Earlier, polling task used to sleep, relying on interrupt to wake it up. This made some IO take very long when interrupt-coalescing is enabled in NVMe. Reference: http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html Changes since v2->v3: -using __set_current_state() instead of set_current_state() Changes since v1->v2: -setting task state once in blk_poll, instead of multiple callers. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-13 | x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages | Tony Luck
In the following commit: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages") ... we added code to memory_failure() to unmap the page from the kernel 1:1 virtual address space to avoid speculative access to the page logging additional errors. But memory_failure() may not always succeed in taking the page offline, especially if the page belongs to the kernel. This can happen if there are too many corrected errors on a page and either mcelog(8) or drivers/ras/cec.c asks to take a page offline. Since we remove the 1:1 mapping early in memory_failure(), we can end up with the page unmapped, but still in use. On the next access the kernel crashes :-( There are also various debug paths that call memory_failure() to simulate occurrence of an error. Since there is no actual error in memory, we don't need to map out the page for those cases. Revert most of the previous attempt and keep the solution local to arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when: 1) there is a real error 2) memory_failure() succeeds. All of this only applies to 64-bit systems. 32-bit kernel doesn't map all of memory into kernel space. It isn't worth adding the code to unmap the piece that is mapped because nobody would run a 32-bit kernel on a machine that has recoverable machine checks. 
Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert (Persistent Memory) <elliott@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org #v4.14 Fixes: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | x86/mm/dump_pagetables: Add the EFI pagetable to the debugfs 'page_tables' directory | Andy Lutomirski
EFI is complicated enough that being able to view its pagetables is quite helpful. Rather than requiring users to fish it out of dmesg on an appropriately configured kernel, let users view it in debugfs as well. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ba158a93f3250e6fca752cff2cfb1fcdd9f2b50c.1517414050.git.luto@kernel.org [ Fixed trivial whitespace damage and fixed missing export. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | x86/mm/encrypt: Simplify sme_pgtable_calc() | Kirill A. Shutemov
sme_pgtable_calc() is unnecessarily complex. It can be rewritten in a more streamlined way. As a side effect, we get the code ready for boot-time switching between paging modes. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180131135404.40692-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large() | Kirill A. Shutemov
sme_populate_pgd() and sme_populate_pgd_large() operate on the identity mapping, which means they want virtual addresses to be equal to physical ones, without the PAGE_OFFSET shift. We also need to avoid paravirtualization calls there. Getting this done is tricky. We cannot use the usual page table helpers. It forces us to open-code a lot of things. It makes the code ugly and hard to modify. We can get it to work with the page table helpers, but it requires a few preprocessor tricks. - Define __pa() and __va() to be compatible with identity mapping. - Undef CONFIG_PARAVIRT and CONFIG_PARAVIRT_SPINLOCKS before including any file. This way we can avoid paravirtualization calls. Now we can use the normal page table helpers just fine. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180131135404.40692-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | x86/mm/encrypt: Move page table helpers into separate translation unit | Kirill A. Shutemov
There are a bunch of functions in mem_encrypt.c that operate on the identity mapping, which means they want virtual addresses to be equal to physical ones, without the PAGE_OFFSET shift. We also need to avoid paravirtualization calls there. Getting this done is tricky. We cannot use the usual page table helpers. It forces us to open-code a lot of things. It makes the code ugly and hard to modify. We can get it to work with the page table helpers, but it requires a few preprocessor tricks. These tricks may have side effects for the rest of the file. Let's isolate such functions into their own translation unit. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180131135404.40692-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | x86/mm: Align TLB invalidation info | Nadav Amit
The TLB invalidation info is allocated on the stack, which might cause it to be unaligned. Since this information may be transferred to different cores for TLB shootdown, this may cause an additional cache line to become shared. While the overhead is likely to be small, the fix is simple. We do not use __cacheline_aligned() since it also defines the section, which is inappropriate for stack variables. Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180131211912.52064-1-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
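Aligning a stack variable to a cache line without assigning it to a section can be sketched with C11 `_Alignas`. This is not the kernel change itself: the 64-byte `CACHE_LINE` value is an assumption about the target, and `struct flush_info_demo` and `stack_info_is_aligned()` are hypothetical names.

```c
#include <stdint.h>

/* Assumed cache line size; the real value depends on the CPU. */
#define CACHE_LINE 64

/* Hypothetical stand-in for the per-flush information passed to other cores. */
struct flush_info_demo {
    unsigned long start;
    unsigned long end;
    unsigned long new_tlb_gen;
};

/* Place the structure on the stack, aligned so it starts on a cache line. */
static int stack_info_is_aligned(void)
{
    _Alignas(CACHE_LINE) struct flush_info_demo info = { 0, 4096, 1 };

    return ((uintptr_t)&info % CACHE_LINE) == 0;
}
```

Since the structure is smaller than a cache line and starts on a line boundary, it cannot straddle two lines, which is the sharing the commit wants to avoid.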
2018-02-13 | locking/semaphore: Update the file path in documentation | Tycho Andersen
While reading this header I noticed that the locking stuff has moved to kernel/locking/*, so update the path in semaphore.h to point to that. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180201114119.1090-1-tycho@tycho.ws Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 | locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit() | Will Deacon
A test_and_{}_bit() operation fails if the value of the bit is such that the modification does not take place, for example if test_and_set_bit() returns 1. In these cases, follow the behaviour of cmpxchg and allow the operation to be unordered. This also applies to test_and_set_bit_lock() if the lock is found to be taken already. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1518528619-20049-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
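One way to picture the documented semantics is a C11 sketch in which the failing path uses only a relaxed load. This is an illustration of what the documentation permits, not the kernel's implementation; `test_and_set_bit_sketch()` is a hypothetical name, and `memory_order_seq_cst` is used as an approximation of the kernel's fully ordered read-modify-write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Returns the previous value of bit nr. If the bit is already set, nothing is
 * modified and no ordering is implied; otherwise the update is fully ordered.
 */
static bool test_and_set_bit_sketch(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;

    /* Failed case: the bit is already set, so return without any barrier. */
    if (atomic_load_explicit(word, memory_order_relaxed) & mask)
        return true;

    /* Successful case: an ordered read-modify-write sets the bit. */
    return (atomic_fetch_or_explicit(word, mask, memory_order_seq_cst) & mask) != 0;
}
```

A caller that relies on ordering therefore has to check the return value: only a result of 0 means the bit was set by this call with ordering.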
2018-02-13 | locking/qspinlock: Ensure node->count is updated before initialising node | Will Deacon
When queuing on the qspinlock, the count field for the current CPU's head node is incremented. This needn't be atomic because locking in e.g. IRQ context is balanced and so an IRQ will return with node->count as it found it. However, the compiler could in theory reorder the initialisation of node[idx] before the increment of the head node->count, causing an IRQ to overwrite the initialised node and potentially corrupt the lock state. Avoid the potential for this harmful compiler reordering by placing a barrier() between the increment of the head node->count and the subsequent node initialisation. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
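The ordering requirement can be sketched with a compiler-only barrier between the count increment and the node initialisation. The sketch makes several assumptions: `compiler_barrier()` is a GCC/Clang inline-asm approximation of the kernel's barrier(), a single global array replaces the per-CPU nodes, and `claim_node()` and `release_node()` are hypothetical names.

```c
#include <stddef.h>

/* GCC/Clang approximation of barrier(): stops compiler reordering only. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

#define MAX_NODES 4

struct mcs_node {
    struct mcs_node *next;
    int locked;
    int count; /* nesting level, meaningful only in nodes[0] */
};

/* Per-CPU in the kernel; a single global array in this sketch. */
static struct mcs_node nodes[MAX_NODES];

/* Claim the next nesting slot, then initialise it. */
static struct mcs_node *claim_node(void)
{
    int idx = nodes[0].count++;

    /*
     * An interrupt arriving after this point sees the incremented count and
     * picks a different slot, so it cannot overwrite the node set up below.
     */
    compiler_barrier();

    nodes[idx].locked = 0;
    nodes[idx].next = NULL;
    return &nodes[idx];
}

/* Give the slot back, restoring the count to what the caller found. */
static void release_node(void)
{
    nodes[0].count--;
}
```

The barrier costs no instructions at run time; it only constrains the order in which the compiler may emit the stores.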
2018-02-13locking/qspinlock: Ensure node is initialised before updating prev->nextWill Deacon
When a locker ends up queuing on the qspinlock locking slowpath, we initialise the relevant mcs node and publish it indirectly by updating the tail portion of the lock word using xchg_tail. If we find that there was a pre-existing locker in the queue, we subsequently update their ->next field to point at our node so that we are notified when it's our turn to take the lock. This can be roughly illustrated as follows:

	/* Initialise the fields in node and encode a pointer to node in tail */
	tail = initialise_node(node);

	/*
	 * Exchange tail into the lockword using an atomic read-modify-write
	 * operation with release semantics
	 */
	old = xchg_tail(lock, tail);

	/* If there was a pre-existing waiter ... */
	if (old & _Q_TAIL_MASK) {
		prev = decode_tail(old);
		smp_read_barrier_depends();

		/* ... then update their ->next field to point to node. */
		WRITE_ONCE(prev->next, node);
	}

The conditional update of prev->next therefore relies on the address dependency from the result of xchg_tail ensuring order against the prior initialisation of node. However, since the release semantics of the xchg_tail operation apply only to the write portion of the RmW, this ordering is not guaranteed and it is possible for the CPU to return old before the writes to node have been published, consequently allowing us to point prev->next to an uninitialised node. This patch fixes the problem by making the update of prev->next a RELEASE operation, which also removes the reliance on dependency ordering. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
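The fix described in this message can be sketched with C11 atomics. The names below (`sketch_mcs_node`, `sketch_enqueue()`) are hypothetical and the tail is a plain pointer rather than an encoded lock word; the point is only that the store to `prev->next` is itself a release store, so the node initialisation is ordered before the node becomes reachable through `prev`.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_mcs_node {
	_Atomic(struct sketch_mcs_node *) next;
	int locked;
};

/*
 * Append node to the queue whose last element is *tail.
 * Returns the previous tail, or NULL if the queue was empty.
 */
static struct sketch_mcs_node *
sketch_enqueue(_Atomic(struct sketch_mcs_node *) *tail,
	       struct sketch_mcs_node *node)
{
	/* Initialise the node before publishing it anywhere. */
	node->locked = 0;
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/* Publish via the tail (release applies to the write half only). */
	struct sketch_mcs_node *prev =
		atomic_exchange_explicit(tail, node, memory_order_release);

	/* Release store: the initialisation above is ordered before the
	 * predecessor can observe node through its ->next pointer. */
	if (prev)
		atomic_store_explicit(&prev->next, node, memory_order_release);

	return prev;
}
```

A waiter reading `prev->next` would pair this with an acquire (or dependency-ordered) load before touching the node's fields.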
2018-02-13x86/error_inject: Make just_return_func() globally visibleArnd Bergmann
With link time optimizations enabled, I get a link failure: ./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return': <artificial>:(.text+0x7f3): undefined reference to `just_return_func' Marking the symbol .globl makes it work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Josef Bacik <jbacik@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 540adea3809f ("error-injection: Separate error-injection from kprobe") Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/platform/UV: Fix GAM Range Table entries less than 1GBmike.travis@hpe.com
The latest UV platforms include the new ApachePass NVDIMMs into the UV address space. This has introduced address ranges in the Global Address Map Table that are less than the previous lowest range, which was 2GB. Fix the address calculation so it accommodates address ranges from bytes to exabytes. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13MIPS: Fix incorrect mem=X@Y handlingMarcin Nowakowski
Commit 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing") added a fix to ensure that the memory range between PHYS_OFFSET and low memory address specified by mem= cmdline argument is not later processed by free_all_bootmem. This change was incorrect for systems where the commandline specifies more than 1 mem argument, as it will cause all memory between PHYS_OFFSET and each of the memory offsets to be marked as reserved, which results in parts of the RAM marked as reserved (Creator CI20's u-boot has a default commandline argument 'mem=256M@0x0 mem=768M@0x30000000'). Change the behaviour to ensure that only the range between PHYS_OFFSET and the lowest start address of the memories is marked as protected. This change also ensures that the range is marked protected even if it's only defined through the devicetree and not only via commandline arguments. Reported-by: Mathieu Malaterre <mathieu.malaterre@gmail.com> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@mips.com> Fixes: 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing") Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # v4.11+ Tested-by: Mathieu Malaterre <malat@debian.org> Patchwork: https://patchwork.linux-mips.org/patch/18562/ Signed-off-by: James Hogan <jhogan@kernel.org>
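The intended behaviour, reserving only the gap below the lowest region start, can be sketched as follows. The struct and function names are hypothetical and the sketch ignores everything else the MIPS boot code does; it only shows the "lowest start address" computation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mem=SIZE@START region. */
struct sketch_mem_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Size of the single range [phys_offset, lowest region start) that should
 * be marked as protected, or 0 if there is no such gap.
 */
static uint64_t sketch_reserved_below(const struct sketch_mem_region *regions,
				      size_t n, uint64_t phys_offset)
{
	uint64_t lowest = UINT64_MAX;

	for (size_t i = 0; i < n; i++)
		if (regions[i].start < lowest)
			lowest = regions[i].start;

	if (n == 0 || lowest <= phys_offset)
		return 0;

	return lowest - phys_offset;
}
```

With the CI20 command line quoted above (`mem=256M@0x0 mem=768M@0x30000000`) and a physical offset of 0, the lowest start is 0, so nothing between the two regions is reserved.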
2018-02-13x86/build: Add arch/x86/tools/insn_decoder_test to .gitignoreProgyan Bhattacharya
The file was generated by make command and should not be in the source tree. Signed-off-by: Progyan Bhattacharya <progyanb@acm.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13sched/cpufreq: Remove unused SUGOV_KTHREAD_PRIORITY macroLeo Yan
Since the schedutil kernel thread directly sets its priority to 0, the SUGOV_KTHREAD_PRIORITY macro is unused, so remove it. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikram Mulukutla <markivx@codeaurora.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/1518097702-9665-1-git-send-email-leo.yan@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13MIPS: BMIPS: Fix section mismatch warningJaedon Shin
Remove the __init annotation from bmips_cpu_setup() to avoid the following warning. WARNING: vmlinux.o(.text+0x35c950): Section mismatch in reference from the function brcmstb_pm_s3() to the function .init.text:bmips_cpu_setup() The function brcmstb_pm_s3() references the function __init bmips_cpu_setup(). This is often because brcmstb_pm_s3 lacks a __init annotation or the annotation of bmips_cpu_setup is wrong. Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: linux-mips@linux-mips.org Reviewed-by: James Hogan <jhogan@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/18589/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-02-13x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a ↵Masayoshi Mizuma
physical CPU When a physical CPU is hot-removed, the following warning messages are shown while the uncore device is removed in uncore_pci_remove():

	WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0xf1/0x110
	...
	CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
	Workqueue: kacpi_hotplug acpi_hotplug_work_fn
	...
	Call Trace:
	pci_device_remove+0x36/0xb0
	device_release_driver_internal+0x145/0x210
	pci_stop_bus_device+0x76/0xa0
	pci_stop_root_bus+0x44/0x60
	acpi_pci_root_remove+0x1f/0x80
	acpi_bus_trim+0x54/0x90
	acpi_bus_trim+0x2e/0x90
	acpi_device_hotplug+0x2bc/0x4b0
	acpi_hotplug_work_fn+0x1a/0x30
	process_one_work+0x141/0x340
	worker_thread+0x47/0x3e0
	kthread+0xf5/0x130

When uncore_pci_remove() runs, it tries to get the package ID to clear the value of uncore_extra_pci_dev[].dev[] by using topology_phys_to_logical_pkg(). The warning messages are shown because topology_phys_to_logical_pkg() returns -1.

	arch/x86/events/intel/uncore.c:

	static void uncore_pci_remove(struct pci_dev *pdev)
	{
	...
		phys_id = uncore_pcibus_to_physid(pdev->bus);
	...
		pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
		for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
			if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
				uncore_extra_pci_dev[pkg].dev[i] = NULL;
				break;
			}
		}
		WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!

topology_phys_to_logical_pkg() tries to find cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.

	arch/x86/kernel/smpboot.c:

	int topology_phys_to_logical_pkg(unsigned int phys_pkg)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			struct cpuinfo_x86 *c = &cpu_data(cpu);

			if (c->initialized && c->phys_proc_id == phys_pkg)
				return c->logical_proc_id;
		}
		return -1;
	}

However, the phys_proc_id was already set to 0 by remove_siblinginfo() when the CPU was offlined. So, topology_phys_to_logical_pkg() cannot find the correct logical_proc_id and always returns -1.
As a result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning messages are shown. What is worse is that the bogus 'pkg' index results in two bugs:

 - We dereference uncore_extra_pci_dev[] with a negative index
 - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]

To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo(). This should not cause any problems, because ->phys_proc_id is not used after it is hot-removed and it is re-set while hot-adding. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yasu.isimatu@gmail.com Cc: <stable@vger.kernel.org> Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array") Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
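The failure mode can be reproduced in a small userspace model. The struct and function below are hypothetical stand-ins for cpuinfo_x86 and topology_phys_to_logical_pkg(), not kernel code; the sketch only shows that once the physical ID has been cleared to 0, a lookup for the original ID finds nothing and returns -1.

```c
/* Hypothetical, reduced stand-in for the per-CPU topology data. */
struct sketch_cpu {
	int initialized;
	unsigned int phys_proc_id;
	int logical_proc_id;
};

/* Return the logical package ID for phys_pkg, or -1 if no CPU matches. */
static int sketch_phys_to_logical_pkg(const struct sketch_cpu *cpus, int n,
				      unsigned int phys_pkg)
{
	for (int i = 0; i < n; i++) {
		const struct sketch_cpu *c = &cpus[i];

		if (c->initialized && c->phys_proc_id == phys_pkg)
			return c->logical_proc_id;
	}
	return -1;
}
```

Any caller that uses the result as an array index, as the quoted uncore code does, would need the lookup to keep succeeding until the device is removed, which is what leaving ->phys_proc_id intact achieves.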
2018-02-13selftests/powerpc: Fix to use ucontext_t instead of struct ucontextHarish
With glibc 2.26 'struct ucontext' is removed to improve POSIX compliance, which breaks powerpc/alignment_handler selftest. Fix the test by using ucontext_t. Tested on ppc, works with older glibc versions as well. Fixes the following: alignment_handler.c: In function ‘sighandler’: alignment_handler.c:68:5: error: dereferencing pointer to incomplete type ‘struct ucontext’ ucp->uc_mcontext.gp_regs[PT_NIP] += 4; Signed-off-by: Harish <harish@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/kdump: Fix powernv build break when KEXEC_CORE=nGuenter Roeck
If KEXEC_CORE is not enabled, powernv builds fail as follows. arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self': arch/powerpc/platforms/powernv/smp.c:236:4: error: implicit declaration of function 'crash_ipi_callback' Add dummy function calls, similar to kdump_in_progress(), to solve the problem. Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel can get HMI's") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
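The "dummy function" approach mentioned above usually takes the following shape. This is a sketch with hypothetical names (`SKETCH_CONFIG_KEXEC_CORE`, `sketch_crash_ipi_callback()`), not the actual powerpc header: when the feature is compiled out, an empty static inline stub keeps callers building without any #ifdef at the call site.

```c
#include <stddef.h>

/* SKETCH_CONFIG_KEXEC_CORE is a hypothetical stand-in for the Kconfig symbol. */
#ifdef SKETCH_CONFIG_KEXEC_CORE
void sketch_crash_ipi_callback(void *regs);	/* real implementation elsewhere */
#else
/* Feature disabled: provide a no-op so callers still compile and link. */
static inline void sketch_crash_ipi_callback(void *regs)
{
	(void)regs;
}
#endif

/* Caller is identical in both configurations. Returns 1 when done. */
static int sketch_cpu_kill_self(void *regs)
{
	sketch_crash_ipi_callback(regs);
	return 1;
}
```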
2018-02-13powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplugGuenter Roeck
Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in the following build error if this is not the case. arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu': arch/powerpc/platforms/pseries/hotplug-cpu.c:369: undefined reference to `.find_and_online_cpu_nid' Follow the guideline provided by similar functions and provide a dummy function if CONFIG_PPC_SPLPAR is not enabled. This also moves the external function declaration into an include file where it should be. Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") Signed-off-by: Guenter Roeck <linux@roeck-us.net> [mpe: Change subject to emphasise the build fix] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13powerpc/mm/hash64: Zero PGD pages on allocationAneesh Kumar K.V
On powerpc we allocate page table pages from slab caches of different sizes. Currently we have a constructor that zeroes out the objects when we allocate them for the first time. We expect the objects to be zeroed out when we free the object back to the slab cache. This happens in the unmap path. For hugetlb pages we call huge_pte_get_and_clear() to do that. With the current configuration of page table size, both PUD and PGD level tables are allocated from the same slab cache. At the PUD level, we use the second half of the table to store the slot information. But we never clear that when unmapping. When such a freed object is then allocated for a PGD page, the second half of the page table page will not be zeroed as expected. This results in a kernel crash. Fix it by always clearing PGD pages when they're allocated. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Change log wording and formatting, add whitespace] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
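The problem and the fix can be modelled in userspace with a one-entry object cache. Everything here is hypothetical (`sketch_cache_alloc()`, `sketch_pgd_alloc()`, the 64-byte table size); the sketch only shows why zeroing at first allocation is not enough when a dirty object is reused, and how clearing on every PGD allocation avoids that.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 64

/* One-entry stand-in for a slab cache shared by PUD and PGD tables.
 * Freeing a second object while the slot is occupied would leak; the
 * sketch does not handle that. */
static void *sketch_free_slot;

static void *sketch_cache_alloc(void)
{
	void *p = sketch_free_slot;

	if (p) {
		/* Reused object: returned as-is, possibly still dirty. */
		sketch_free_slot = NULL;
		return p;
	}
	/* First-time allocation: zeroed, like the slab constructor. */
	return calloc(1, SKETCH_TABLE_SIZE);
}

static void sketch_cache_free(void *p)
{
	sketch_free_slot = p;
}

/* The fix: always clear a PGD table when it is allocated. */
static void *sketch_pgd_alloc(void)
{
	void *pgd = sketch_cache_alloc();

	if (pgd)
		memset(pgd, 0, SKETCH_TABLE_SIZE);
	return pgd;
}
```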
2018-02-13powerpc/mm/hash64: Store the slot information at the right offset for hugetlbAneesh Kumar K.V
The hugetlb pte entries are at the PMD and PUD level, so we can't use PTRS_PER_PTE to find the second half of the page table. Use the right offset for PUD/PMD to get to the second half of the table. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
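A sketch of the offset computation, under clearly illustrative assumptions: the level names, the helper functions, and the entry counts (8, 16, 32) are hypothetical and are not the real powerpc values. The point is only that the start of the second half depends on the level the entry lives at, so a single PTE-level constant is wrong for PMD and PUD tables.

```c
#include <stddef.h>

enum sketch_level { SKETCH_PTE, SKETCH_PMD, SKETCH_PUD };

/* Illustrative entry counts per level; not the real PTRS_PER_* values. */
static size_t sketch_ptrs_per_level(enum sketch_level level)
{
	switch (level) {
	case SKETCH_PMD:
		return 16;
	case SKETCH_PUD:
		return 32;
	case SKETCH_PTE:
	default:
		return 8;
	}
}

/* The slot information lives in the second half of the table, which
 * starts after the entries of that particular level. */
static unsigned long *sketch_second_half(unsigned long *table,
					 enum sketch_level level)
{
	return table + sketch_ptrs_per_level(level);
}
```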