summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-16drm/i915/gvt: Add mmio iterator intel_gvt_for_each_tracked_mmio()Changbin Du
This patch add a function intel_gvt_for_each_tracked_mmio() to iterate each tracked mmio. The caller don't be aware of how the tracked mmios are presented internally. v2: remove snapshot_hw_mmio_registers(). Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: opregion virtualization for win guestXiaolin Zhang
this is an enhanced opregion emulation for win guest support by initializing more data members including opregion header size, version and child device propertity for display port. for simplicity, redefined child_device_config structure. Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: update CSB and CSB write pointer in virtual HWSPWeinan Li
The engine provides a mirror of the CSB and CSB write pointer in the HWSP. Read these status from virtual HWSP in VM can reduce CPU utilization while applications have much more short GPU workloads. Here we update the corresponding data in virtual HWSP as it in virtual MMIO. Before read these status from HWSP in GVT-g VM, please ensure the host support it by checking the BIT(3) of caps in PVINFO. Virtual HWSP only support GEN8+ platform, since the HWSP MMIO may change follow the platform update, please add the corresponding MMIO emulation when enable new platforms in GVT-g. v3 : Add address audit in HWSP address update. v4 : Separate this patch with enalbe virtual HWSP in VM. Use intel_gvt_render_mmio_to_ring_id() to determine ring_id by offset. v5 : Remove unnessary check about Gen8, GVT-g only support Gen8+. Signed-off-by: Weinan Li <weinan.z.li@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: Refine broken PPGTT scratchZhi Wang
Refine previously broken PPGTT scratch. Scratch PTE was no correctly handled and also the handling of scratch entries in page table walk was not well organized, which brings gaps of introducing lazy shadow. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Introduce ops->set_present()Zhi Wang
We need ops->set_present() during generating a new scratch page table entry. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Introduce page table type of current level in GTT type ↵Zhi Wang
enumerations Need to figure out page table type of current level by GTT entry type during getting a scratch page table entry. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Fix a bug of unexpectedly clear scratch page tableZhi Wang
During a vGPU reset, the scratch page table shouldn't be cleared, what needs to be cleared should be the scratch page. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Let the caller choose if a shadow page should be put into hash ↵Zhi Wang
table As we want to re-use intel_vgpu_shadow_page in buidling scrach page table and we don't want to put scrach page table page into hash table, a new param is introduced to give the caller a choice to decide if a shadow page should be put into hash table. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Use I915_GTT_PAGE_SIZEZhi Wang
As there is already an I915_GTT_PAGE_SIZE marco in i915, let GVT-g use it as well. Also this patch re-names some GTT marcos with additional prefix. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()Zhi Wang
Since many emulation logic needs to convert the offset of ring registers into ring id, we export it for other caller which might need it. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Factor intel_vgpu_page_trackZhi Wang
As the data structure of "intel_vgpu_guest_page" will become much heavier in future, it's better to factor out the guest memory page track mechnisim as early as possible. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Refine shadow batch bufferZhi Wang
1) Use standard i915 GEM object sequence to access the shadow batch buffer. 2) Manage i915 vma life cycle to solve one FIXME. v2: - Refine code structure. - Refine the usage of GEM APIs. - Add the missing lock/unlock in release_shadow_batch_buffer. Test on my SKL NuC. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Refine find_bb_size()Zhi Wang
Returns the error code if something is wrong and the size of batch buffer is passed through the pointer. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Use BIT() to make klockwork happyZhi Wang
Replace the plain bit usage with BIT() to make klockwork happy. Cc: Deng Hongyi <hongyi.deng@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Add basic debugfs infrastructureChangbin Du
We need debugfs entry to expose some debug information of gvt and vGPUs. The first tool will be added is mmio-diff, which help to find the difference values of host and vGPU mmio. It's useful for platform enabling. This patch just add a basic debugfs infrastructure, each vGPU has its own sub-folder. Two simple attributes are created as a template. . ├── num_tracked_mmio ├── vgpu1 | └── active └── vgpu2 └── active Signed-off-by: Changbin Du <changbin.du@intel.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Refactor vGPU type code in kvmgt partfred gao
all the vGPU type related code in kvmgt will be moved into gvt.c/gvt.h files while the common vGPU type related interfaces will be called. v2: - intel_gvt_{init,cleanup}_vgpu_type_groups are initialized in gvt part. (Wang, Zhi) Signed-off-by: fred gao <fred.gao@intel.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move vGPU type related code into gvt filefred gao
In this patch, all the vGPU type related code will be merged into same gvt file and the common interface will be exposed to both XenGT and KvmGT. v2: - remove the useless mdev_* gvt_ops. add get_gvt_attr ops for MPT module. intel_gvt_{init,cleanup}_vgpu_type_groups are initialized in gvt part. (Wang, Zhi) - set gvt_vgpu_type_groups[i] to NULL. (Zhang,Xiong) Signed-off-by: fred gao <fred.gao@intel.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move clean_workloads() into scheduler.cZhi Wang
Move clean_workloads() into scheduler.c since it's not specific to execlist. v2: - Remove clean_workloads in intel_vgpu_select_submission_ops. (Zhenyu) Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Introduce intel_vgpu_reset_submissionZhi Wang
Introduce an generic API to reset vGPU virtual submission interface. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Introduce vGPU submission opsZhi Wang
Introduce vGPU submission ops to support easy switching submission mode of one vGPU between different OSes. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Remove one extra declaration in scheduler.hZhi Wang
Now the function has been moved into scheduler.c. The extra declaration is not necessary. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move common vGPU workload creation into scheduler.cZhi Wang
Move common vGPU workload creation functions into scheduler.c since they are not specific to execlist emulation. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move common workload preparation into prepare_workload()Zhi Wang
Move common workload preparation into prepare_workload() in scheduler.c, as they are not specific to execlist emulation. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Factor out prepare_workload()Zhi Wang
Factor out prepare_workload() for the following re-factor. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Factor out vGPU workload creation/destroyZhi Wang
Factor out vGPU workload creation/destroy functions since they are not specific to execlist emulation. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Use dyndbg for gvt debug infoShuo Liu
It's better enable/disable and classify gvt debug info dynamically. This patch change it to dyndbg so can be dynamically enable/disable each item. All gvt log can be enabled by, $ echo 'file *gvt* +p' > /sys/kernel/debug/dynamic_debug/control Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: ensure -ve return value is handled correctlyColin Ian King
An earlier fix changed the return type from find_bb_size however the integer return is being assigned to a unsigned int so the -ve error check will never be detected. Make bb_size an int to fix this. Detected by CoverityScan CID#1456886 ("Unsigned compared against 0") Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: Add VM healthy check for submit_contextfred gao
When a scan error occurs in submit_context, this patch is to decrease the mm ref count and free the workload struct before the workload is abandoned. v2: - submit_context related code should be combined together. (Zhenyu) v3: - free all the unsubmitted workloads. (Zhenyu) v4: - refine the clean path. (Zhenyu) v5: - polish the title. (Zhenyu) Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: Add VM healthy check for workload_threadfred gao
When a scan error occurs in dispatch_workload, this patch is to check the healthy state and free all the queued workloads before the failsafe mode is entered. Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: Change the return type during command scanfred gao
Generally, there are 3 types of errors during command scan: a) some commands might be unknown with EBADRQC; b) some cmd access invalid address with EFAULT; c) some unexpected force nonpriv cmd with EPERM. later the healthy state can be judged through the return error. v2: - remove some internal i915 errors rating. (Zhenyu) v3: - the healthy state is judged through the internal defined return error. (Zhenyu) - force non priv cmd error can be ignored. (Kevin) v4: - reuse standard defined errno instead of recreate, e.g EBADRQC for unknown cmd, EFAULT for invalid address, EPERM for nonpriv. (Zhenyu) v5: - remove some irrelevant code for the patch. - fix typo of vgpu_is_vm_unhealthy. (Zhenyu) v6: - move the healthy check and failsafe code into another patch. (Zhenyu) v7: - polish title and commit message. (Zhenyu) Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-16drm/i915/gvt: Do not allocate initial ring scan bufferZhi Wang
Theoretically, the largest bulk of commands in the ring buffer of an engine might be the first submission, which usually contains a lot of commands to initialize the HW. After removing the initial allocation of the ring scan buffer and let krealloc() do everything we need, we still have a big chance to get the buffer of suitable size in the first submission. Tested on my SKL NUC. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move ring scan buffers into intel_vgpu_submissionZhi Wang
Move ring scan buffers into intel_vgpu_submission since they belongs to a part of vGPU submission stuffs. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Rename reserved ring bufferZhi Wang
"reserved" means reserve something from somewhere. Actually they are buffers used by command scanner. Rename it to ring_scan_buffer. v2: - Remove the usage of an extra variable. (Zhenyu) Fixes: 0a53bc07f044 ("drm/i915/gvt: Separate cmd scan from request allocation") Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Fix a memory leak in cmd_parser.cZhi Wang
The pointer points to the original memory can never take the return value of krealloc(). Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move tlb_handle_pending into intel_vgpu_submissionZhi Wang
Move tlb_handle_pending into intel_vgpu_submssion since it belongs to a part of vGPU submission stuffs Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Introduce intel_vgpu_submissionZhi Wang
Introduce intel_vgpu_submission to hold all members related to submission in struct intel_vgpu before. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Move workload cache init/clean into intel_vgpu_{setup, ↵Zhi Wang
clean}_submission() Move vGPU workload cache initialization/de-initialization into intel_vgpu_{setup, clean}_submission() since they are not specific to execlist stuffs. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Rename intel_vgpu_{init, clean}_gvt_context()Zhi Wang
To move workload related functions into scheduler.c, an expected way is to collect all the init/clean functions related to vGPU workload submission into fewer functions. Rename intel_vgpu_{init, clean}_gvt_context() for above usage in future. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Make elsp_dwords in the right orderZhi Wang
The context descriptors in elsp_dwords are stored in a reversed order and the definition of context descriptor is also reversed. The revesred stuff is hard to be used and might cause misunderstanding. Make them in the right oder for following code re-factoring. Tested on my SKL NUC. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2017-11-16drm/i915/gvt: Add support for opregion virtualizationXiaolin Zhang
opregion emulated with a copy from host which leads to some display bugs such as guest resolution adjustment failure due to host opregion fail to claim port D support. with a fake opregion table provided to fully emulate opregion to meet guest port requirement. v1 - initial patch v2 - reforamt opregion arrary with 0x02x output v3 - opregion array removed with opregion generation on host initizaiton v4 - rebased v3 patch from stable branch to staging branch which also has different struct child_device_config and addressed v3 review comments. Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-16Merge branch 'drm-next-4.15-dc' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-next Various fixes for DC for 4.15. * 'drm-next-4.15-dc' of git://people.freedesktop.org/~agd5f/linux: drm/amd/display: fix MST link training fail division by 0 drm/amd/display: Fix formatting for null pointer dereference fix drm/amd/display: Remove dangling planes on dc commit state drm/amd/display: add flip_immediate to commit update for stream drm/amd/display: Miss register MST encoder cbs drm/amd/display: Fix warnings on S3 resume drm/amd/display: use num_timing_generator instead of pipe_count drm/amd/display: use configurable FBC option in dm drm/amd/display: fix AZ clock not enabled before program AZ endpoint amdgpu/dm: Don't use DRM_ERROR in amdgpu_dm_atomic_check
2017-11-15memory hotplug: fix comments when adding sectionFan Du
Here, pfn_to_node should be page_to_nid. Link: http://lkml.kernel.org/r/1510735205-22540-1-git-send-email-fan.du@intel.com Signed-off-by: Fan Du <fan.du@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: make alloc_node_mem_map a void call if we don't have ↵Oscar Salvador
CONFIG_FLAT_NODE_MEM_MAP free_area_init_node() calls alloc_node_mem_map(), but this function does nothing unless we have CONFIG_FLAT_NODE_MEM_MAP. As a cleanup, we can move the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" within alloc_node_mem_map() out of the function, and define a alloc_node_mem_map() { } when CONFIG_FLAT_NODE_MEM_MAP is not present. This also moves the printk that lays within the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" block from free_area_init_node() to alloc_node_mem_map(), getting rid of the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" in free_area_init_node(). [akpm@linux-foundation.org: clean up the printk while we're there] Link: http://lkml.kernel.org/r/20171114111935.GA11758@techadventures.net Signed-off-by: Oscar Salvador <osalvador@techadventures.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: simplify nodemask printingMichal Hocko
alloc_warn() and dump_header() have to explicitly handle NULL nodemask which forces both paths to use pr_cont. We can do better. printk already handles NULL pointers properly so all we need is to teach nodemask_pr_args to handle NULL nodemask carefully. This allows simplification of both alloc_warn() and dump_header() and gets rid of pr_cont altogether. This patch has been motivated by patch from Joe Perches http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com [akpm@linux-foundation.org: fix tile warning, per Arnd] Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm,oom_reaper: remove pointless kthread_run() error checkTetsuo Handa
Since oom_init() is called before userspace processes start, memory allocation failure for creating the OOM reaper kernel thread will let the OOM killer call panic() rather than wake up the OOM reaper. Link: http://lkml.kernel.org/r/1510137800-4602-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm/page_ext.c: check if page_ext is not preparedJaewon Kim
online_page_ext() and page_ext_init() allocate page_ext for each section, but they do not allocate if the first PFN is !pfn_present(pfn) or !pfn_valid(pfn). Then section->page_ext remains as NULL. lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN, __set_page_owner will try to get page_ext through lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse NULL pointer as value 0. This incurrs invalid address access. This is the panic example when PFN 0x100000 is not valid but PFN 0x13FC00 is being used for page_ext. section->page_ext is NULL, get_entry returned invalid page_ext address as 0x1DFA000 for a PFN 0x13FC00. To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext will be checked at all times. Unable to handle kernel paging request at virtual address 01dfa014 ------------[ cut here ]------------ Kernel BUG at ffffff80082371e0 [verbose debug info unavailable] Internal error: Oops: 96000045 [#1] PREEMPT SMP Modules linked in: PC is at __set_page_owner+0x48/0x78 LR is at __set_page_owner+0x44/0x78 __set_page_owner+0x48/0x78 get_page_from_freelist+0x880/0x8e8 __alloc_pages_nodemask+0x14c/0xc48 __do_page_cache_readahead+0xdc/0x264 filemap_fault+0x2ac/0x550 ext4_filemap_fault+0x3c/0x58 __do_fault+0x80/0x120 handle_mm_fault+0x704/0xbb0 do_page_fault+0x2e8/0x394 do_mem_abort+0x88/0x124 Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites"). Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging") Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: <stable@vger.kernel.org> [depends on f86e427197, see above] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15writeback: remove unused function parameterWang Long
The parameter `struct bdi_writeback *wb` is not been used in the function body. Remove it. Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com Signed-off-by: Wang Long <wanglong19@meituan.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: do not rely on preempt_count in print_vma_addrMichal Hocko
The preempt count check on print_vma_addr has been added by commit e8bff74afbdb ("x86: fix "BUG: sleeping function called from invalid context" in print_vma_addr()") and it relied on the elevated preempt count from preempt_conditional_sti because preempt_count check doesn't work on non preemptive kernels by default. The code has evolved though and commit d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling") has replaced preempt_conditional_sti by an explicit preempt_disable which is noop on !PREEMPT so the check in print_vma_addr is broken. Fix the issue by using trylock on mmap_sem rather than chacking the preempt count. The allocation we are relying on has to be GFP_NOWAIT as well. There is a chance that we won't dump the vma state if the lock is contended or the memory short but this is acceptable outcome and much less fragile than the not working preemption check or tricks around it. Link: http://lkml.kernel.org/r/20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz Fixes: d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling") Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, sparse: do not swamp log with huge vmemmap allocation failuresMichal Hocko
While doing memory hotplug tests under heavy memory pressure we have noticed too many page allocation failures when allocating vmemmap memmap backed by huge page kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO) [...] Call Trace: dump_trace+0x59/0x310 show_stack_log_lvl+0xea/0x170 show_stack+0x21/0x40 dump_stack+0x5c/0x7c warn_alloc_failed+0xe2/0x150 __alloc_pages_nodemask+0x3ed/0xb20 alloc_pages_current+0x7f/0x100 vmemmap_alloc_block+0x79/0xb6 __vmemmap_alloc_block_buf+0x136/0x145 vmemmap_populate+0xd2/0x2b9 sparse_mem_map_populate+0x23/0x30 sparse_add_one_section+0x68/0x18e __add_pages+0x10a/0x1d0 arch_add_memory+0x4a/0xc0 add_memory_resource+0x89/0x160 add_memory+0x6d/0xd0 acpi_memory_device_add+0x181/0x251 acpi_bus_attach+0xfd/0x19b acpi_bus_scan+0x59/0x69 acpi_device_hotplug+0xd2/0x41f acpi_hotplug_work_fn+0x1a/0x23 process_one_work+0x14e/0x410 worker_thread+0x116/0x490 kthread+0xbd/0xe0 ret_from_fork+0x3f/0x70 and we do see many of those because essentially every allocation fails for each memory section. This is an excessive way to tell the user that there is nothing to really worry about because we do have a fallback mechanism to use base pages. The only downside might be a performance degradation due to TLB pressure. This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn explicitly once on the first allocation failure. This will reduce the noise in the kernel log considerably, while we still have an indication that a performance might be impacted. [mhocko@kernel.org: forgot to git add the follow up fix] Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>