Age | Commit message (Collapse) | Author |
|
There is a lot of duplication in the rubric around actually setting or
clearing a mem region flag. Create a new helper function to do this and
reduce each of memblock_mark_hotplug() and memblock_clear_hotplug() to a
single line.
This will be useful if someone were to add a new mem region flag - which
I hope to be doing some day soon. But it looks like a plausible cleanup
even without that - so I'd like to get it out of the way now.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Emil Medve <Emilian.Medve@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
task_struct->memcg_kmem_skip_account was initially introduced to avoid
recursion during kmem cache creation: memcg_kmem_get_cache, which is
called by kmem_cache_alloc to determine the per-memcg cache to account
allocation to, may issue lazy cache creation if the needed cache doesn't
exist, which means issuing yet another kmem_cache_alloc. We can't just
pass a flag to the nested kmem_cache_alloc disabling kmem accounting,
because there are hidden allocations, e.g. in INIT_WORK. So we
introduced a flag on the task_struct, memcg_kmem_skip_account, making
memcg_kmem_get_cache return immediately.
By its nature, the flag may also be used to disable accounting for
allocations shared among different cgroups, and currently it is used this
way in memcg_activate_kmem. Using it like this looks like abusing it to
me. If we want to disable accounting for some allocations (which we will
definitely want one day), we should either add GFP_NO_MEMCG or GFP_MEMCG
flag in order to blacklist/whitelist some allocations.
For now, let's simply remove memcg_stop/resume_kmem_account from
memcg_activate_kmem.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We already assured the current task has mm in memcg_kmem_should_charge,
no need to double check.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cpuset code stopped using cgroup_lock in favor of cpuset_mutex long ago.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The alignment in cma_alloc() was done w.r.t. the bitmap. This is a
problem when, for example:
- a device requires 16M (order 12) alignment
- the CMA region is not 16 M aligned
In such a case, can result with the CMA region starting at, say,
0x2f800000 but any allocation you make from there will be aligned from
there. Requesting an allocation of 32 M with 16 M alignment will result
in an allocation from 0x2f800000 to 0x31800000, which doesn't work very
well if your strange device requires 16M alignment.
Change to use bitmap_find_next_zero_area_off() to account for the
difference in alignment at reserve-time and alloc-time.
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a bitmap_find_next_zero_area_off() function which works like
bitmap_find_next_zero_area() function except it allows an offset to be
specified when alignment is checked. This lets caller request a bit such
that its number plus the offset is aligned according to the mask.
[gregory.0xf0@gmail.com: Retrieved from https://patchwork.linuxtv.org/patch/6254/ and updated documentation]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The unmap_mapping_range family of functions do the unmapping of user pages
(ultimately via zap_page_range_single) without touching the actual
interval tree, thus share the lock.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Shrinking/truncate logic can call nommu_shrink_inode_mappings() to verify
that any shared mappings of the inode in question aren't broken (dead
zone). afaict the only user being ramfs to handle the size change
attribute.
Pretty much a no-brainer to share the lock.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No brainer conversion: collect_procs_file() only schedules a process for
later kill, share the lock, similarly to the anon vma variant.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__xip_unmap() will remove the xip sparse page from the cache and take down
pte mapping, without altering the interval tree, thus share the
i_mmap_rwsem when searching for the ptes to unmap.
Additionally, tidy up the function a bit and make variables only local to
the interval tree walk loop.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Both register and unregister call build_map_info() in order to create the
list of mappings before installing or removing breakpoints for every mm
which maps file backed memory. As such, there is no reason to hold the
i_mmap_rwsem exclusively, so share it and allow concurrent readers to
build the mapping data.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Similarly to the anon memory counterpart, we can share the mapping's lock
ownership as the interval tree is not modified when doing doing the walk,
only the file page.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory. To
this end, this lock can also be a rwsem. In addition, there are some
important opportunities to share the lock when there are no tree
modifications.
This conversion is straightforward. For now, all users take the write
lock.
[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Convert all open coded mutex_lock/unlock calls to the
i_mmap_[lock/unlock]_write() helpers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This series is a continuation of the conversion of the i_mmap_mutex to
rwsem, following what we have for the anon memory counterpart. With
Hugh's feedback from the first iteration.
Ultimately, the most obvious paths that require exclusive ownership of the
lock is when we modify the VMA interval tree, via
vma_interval_tree_insert() and vma_interval_tree_remove() families. Cases
such as unmapping, where the ptes content is changed but the tree remains
untouched should make it safe to share the i_mmap_rwsem.
As such, the code of course is straightforward, however the devil is very
much in the details. While its been tested on a number of workloads
without anything exploding, I would not be surprised if there are some
less documented/known assumptions about the lock that could suffer from
these changes. Or maybe I'm just missing something, but either way I
believe its at the point where it could use more eyes and hopefully some
time in linux-next.
Because the lock type conversion is the heart of this patchset,
its worth noting a few comparisons between mutex vs rwsem (xadd):
(i) Same size, no extra footprint.
(ii) Both have CONFIG_XXX_SPIN_ON_OWNER capabilities for
exclusive lock ownership.
(iii) Both can be slightly unfair wrt exclusive ownership, with
writer lock stealing properties, not necessarily respecting
FIFO order for granting the lock when contended.
(iv) Mutexes can be slightly faster than rwsems when
the lock is non-contended.
(v) Both suck at performance for debug (slowpaths), which
shouldn't matter anyway.
Sharing the lock is obviously beneficial, and sem writer ownership is
close enough to mutexes. The biggest winner of these changes is
migration.
As for concrete numbers, the following performance results are for a
4-socket 60-core IvyBridge-EX with 130Gb of RAM.
Both alltests and disk (xfs+ramdisk) workloads of aim7 suite do quite well
with this set, with a steady ~60% throughput (jpm) increase for alltests
and up to ~30% for disk for high amounts of concurrency. Lower counts of
workload users (< 100) does not show much difference at all, so at least
no regressions.
3.18-rc1 3.18-rc1-i_mmap_rwsem
alltests-100 17918.72 ( 0.00%) 28417.97 ( 58.59%)
alltests-200 16529.39 ( 0.00%) 26807.92 ( 62.18%)
alltests-300 16591.17 ( 0.00%) 26878.08 ( 62.00%)
alltests-400 16490.37 ( 0.00%) 26664.63 ( 61.70%)
alltests-500 16593.17 ( 0.00%) 26433.72 ( 59.30%)
alltests-600 16508.56 ( 0.00%) 26409.20 ( 59.97%)
alltests-700 16508.19 ( 0.00%) 26298.58 ( 59.31%)
alltests-800 16437.58 ( 0.00%) 26433.02 ( 60.81%)
alltests-900 16418.35 ( 0.00%) 26241.61 ( 59.83%)
alltests-1000 16369.00 ( 0.00%) 26195.76 ( 60.03%)
alltests-1100 16330.11 ( 0.00%) 26133.46 ( 60.03%)
alltests-1200 16341.30 ( 0.00%) 26084.03 ( 59.62%)
alltests-1300 16304.75 ( 0.00%) 26024.74 ( 59.61%)
alltests-1400 16231.08 ( 0.00%) 25952.35 ( 59.89%)
alltests-1500 16168.06 ( 0.00%) 25850.58 ( 59.89%)
alltests-1600 16142.56 ( 0.00%) 25767.42 ( 59.62%)
alltests-1700 16118.91 ( 0.00%) 25689.58 ( 59.38%)
alltests-1800 16068.06 ( 0.00%) 25599.71 ( 59.32%)
alltests-1900 16046.94 ( 0.00%) 25525.92 ( 59.07%)
alltests-2000 16007.26 ( 0.00%) 25513.07 ( 59.38%)
disk-100 7582.14 ( 0.00%) 7257.48 ( -4.28%)
disk-200 6962.44 ( 0.00%) 7109.15 ( 2.11%)
disk-300 6435.93 ( 0.00%) 6904.75 ( 7.28%)
disk-400 6370.84 ( 0.00%) 6861.26 ( 7.70%)
disk-500 6353.42 ( 0.00%) 6846.71 ( 7.76%)
disk-600 6368.82 ( 0.00%) 6806.75 ( 6.88%)
disk-700 6331.37 ( 0.00%) 6796.01 ( 7.34%)
disk-800 6324.22 ( 0.00%) 6788.00 ( 7.33%)
disk-900 6253.52 ( 0.00%) 6750.43 ( 7.95%)
disk-1000 6242.53 ( 0.00%) 6855.11 ( 9.81%)
disk-1100 6234.75 ( 0.00%) 6858.47 ( 10.00%)
disk-1200 6312.76 ( 0.00%) 6845.13 ( 8.43%)
disk-1300 6309.95 ( 0.00%) 6834.51 ( 8.31%)
disk-1400 6171.76 ( 0.00%) 6787.09 ( 9.97%)
disk-1500 6139.81 ( 0.00%) 6761.09 ( 10.12%)
disk-1600 4807.12 ( 0.00%) 6725.33 ( 39.90%)
disk-1700 4669.50 ( 0.00%) 5985.38 ( 28.18%)
disk-1800 4663.51 ( 0.00%) 5972.99 ( 28.08%)
disk-1900 4674.31 ( 0.00%) 5949.94 ( 27.29%)
disk-2000 4668.36 ( 0.00%) 5834.93 ( 24.99%)
In addition, a 67.5% increase in successfully migrated NUMA pages, thus
improving node locality.
The patch layout is simple but designed for bisection (in case reversion
is needed if the changes break upstream) and easier review:
o Patches 1-4 convert the i_mmap lock from mutex to rwsem.
o Patches 5-10 share the lock in specific paths, each patch
details the rationale behind why it should be safe.
This patchset has been tested with: postgres 9.4 (with brand new hugetlb
support), hugetlbfs test suite (all tests pass, in fact more tests pass
with these changes than with an upstream kernel), ltp, aim7 benchmarks,
memcached and iozone with the -B option for mmap'ing. *Untested* paths
are nommu, memory-failure, uprobes and xip.
This patch (of 8):
Various parts of the kernel acquire and release this mutex, so add
i_mmap_lock_write() and immap_unlock_write() helper functions that will
encapsulate this logic. The next patch will make use of these.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
My current email address will be gone shortly, update my email to be a
gmail one.
Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Cc: Timur Tabi <timur@tabi.org>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 7654e9d4fd8f ("drivers/rtc/rtc-snvs: fix suspend/resume")
replaces SIMPLE_DEV_PM_OPS with direct declaration of snvs_rtc_pm_ops,
but does so outside #ifdef CONFIG_PM_SLEEP. This causes the driver
build to fail if CONFIG_PM_SLEEP is not configured.
Fixes: 7654e9d4fd8f ("drivers/rtc/rtc-snvs: fix suspend/resume")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Sanchayan Maity <maitysanchayan@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the vgic initializes its internal state it does so based on the
number of VCPUs available at the time. If we allow KVM to create more
VCPUs after the VGIC has been initialized, we are likely to error out in
unfortunate ways later, perform buffer overflows etc.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Some code paths will need to check to see if the internal state of the
vgic has been initialized (such as when creating new VCPUs), so
introduce such a macro that checks the nr_cpus field which is set when
the vgic has been initialized.
Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
vgic_init() will call this function, and code should never errornously
assume the vgic to be properly initialized after an error.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The vgic_initialized() macro currently returns the state of the
vgic->ready flag, which indicates if the vgic is ready to be used when
running a VM, not specifically if its internal state has been
initialized.
Rename the macro accordingly in preparation for a more nuanced
initialization flow.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
VGIC initialization currently happens in three phases:
(1) kvm_vgic_create() (triggered by userspace GIC creation)
(2) vgic_init_maps() (triggered by userspace GIC register read/write
requests, or from kvm_vgic_init() if not already run)
(3) kvm_vgic_init() (triggered by first VM run)
We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.
This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).
Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
kvm_vgic_map_resources.
[ This patch is originally written by Peter Maydell, but I have
modified it somewhat heavily, renaming various bits and moving code
around. If something is broken, I am to be blamed. - Christoffer ]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Introduce a new function to unmap user RAM regions in the stage2 page
tables. This is needed on reboot (or when the guest turns off the MMU)
to ensure we fault in pages again and make the dcache, RAM, and icache
coherent.
Using unmap_stage2_range for the whole guest physical range does not
work, because that unmaps IO regions (such as the GIC) which will not be
recreated or in the best case faulted in on a page-by-page basis.
Call this function on secondary and subsequent calls to the
KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest
Stage-1 MMU is off when faulting in pages and make the caches coherent.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.
Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
It is not clear that this ioctl can be called multiple times for a given
vcpu. Userspace already does this, so clarify the ABI.
Also specify that userspace is expected to always make secondary and
subsequent calls to the ioctl with the same parameters for the VCPU as
the initial call (which userspace also already does).
Add code to check that userspace doesn't violate that ABI in the future,
and move the kvm_vcpu_set_target() function which is currently
duplicated between the 32-bit and 64-bit versions in guest.c to a common
static function in arm.c, shared between both architectures.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When userspace resets the vcpu using KVM_ARM_VCPU_INIT, we should also
reset the HCR, because we now modify the HCR dynamically to
enable/disable trapping of guest accesses to the VM registers.
This is crucial for reboot of VMs working since otherwise we will not be
doing the necessary cache maintenance operations when faulting in pages
with the guest MMU off.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.
Implement the expected functionality and clarify the ABI.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
If a VCPU was originally started with power off (typically to be brought
up by PSCI in SMP configurations), there is no need to clear the
POWER_OFF flag in the kernel, as this flag is only tested during the
init ioctl itself.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When issuing a MAPD command, one of the parameters passed to the ITS
is the number of EventID bits used to index the per-device Interrupt
Translation Table (ITT). Crucially, this is the number of bits
*minus one*.
This has two consequences:
- The size of the ITT has to be a strict power of two, no matter
how many different events the device is actually going to generate.
- It is impossible to express an ITT with a single entry, as you
would have to tell the ITS to "use zero bit from the EventID",
and that clashes with "minus one" above.
Fix this by allocating the ITT with the number of vectors rounded up
to the next power of two, with a minimum of two entries.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Yun Wu (Abel) <wuyun.wu@huawei.com>
Cc: Robert Richter <robert.richter@caviumnetworks.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The ITS code could do a bit less in the alloc/free paths, and a bit
more in the activate/deactivate methods, giving a better separation
between software allocation and HW programing.
Suggested-by: Wuyun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Yun Wu (Abel) <wuyun.wu@huawei.com>
Cc: Robert Richter <robert.richter@caviumnetworks.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Fix stupid thinko on the path freeing the interrupts, where only
the first interrupt would get reset, and none of the others.
This should only affect multi-MSI allocations.
Reported-by: Wuyun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Robert Richter <robert.richter@caviumnetworks.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
When recording the state of a task for the sched_switch tracepoint a check of
task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
because, technically, a task being preempted is really in the TASK_RUNNING
state, and that is what should be recorded when tracing a sched_switch,
even if the task put itself into another state (it hasn't scheduled out
in that state yet).
But with the change to use per_cpu preempt counts, the
task_thread_info(p)->preempt_count is no longer used, and instead
task_preempt_count(p) is used.
The problem is that this does not use the current preempt count but a stale
one from a previous sched_switch. The task_preempt_count(p) uses
saved_preempt_count and not preempt_count(). But for tracing sched_switch,
if p is current, we really want preempt_count().
I hit this bug when I was tracing sleep and the call from do_nanosleep()
scheduled out in the "RUNNING" state.
sleep-4290 [000] 537272.259992: sched_switch: sleep:4290 [120] R ==> swapper/0:0 [120]
sleep-4290 [000] 537272.260015: kernel_stack: <stack trace>
=> __schedule (ffffffff8150864a)
=> schedule (ffffffff815089f8)
=> do_nanosleep (ffffffff8150b76c)
=> hrtimer_nanosleep (ffffffff8108d66b)
=> SyS_nanosleep (ffffffff8108d750)
=> return_to_handler (ffffffff8150e8e5)
=> tracesys_phase2 (ffffffff8150c844)
After a bit of hair pulling, I found that the state was really
TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
set and caused the sched_switch tracepoint to show it as RUNNING.
Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org # 3.13+
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 01028747559a "sched: Create more preempt_count accessors"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The torture test should quit once it actually induces an error in the
flash. This step was accidentally removed during refactoring.
Without this fix, the torturetest just continues infinitely, or until
the maximum cycle count is reached. e.g.:
...
[ 7619.218171] mtd_test: error -5 while erasing EB 100
[ 7619.297981] mtd_test: error -5 while erasing EB 100
[ 7619.377953] mtd_test: error -5 while erasing EB 100
[ 7619.457998] mtd_test: error -5 while erasing EB 100
[ 7619.537990] mtd_test: error -5 while erasing EB 100
...
Fixes: 6cf78358c94f ("mtd: mtd_torturetest: use mtd_test helpers")
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <stable@vger.kernel.org>
|
|
On device remove, when testing the cmtd field of an of_flash
struct to decide whether it is a concatenated device or not,
we get a false positive on cmtd == NULL, and dereference it
subsequently. This may occur if of_flash_remove() is called
from the cleanup path of of_flash_probe().
Instead, test for NULL first, and only then perform the test
for a concatenated device.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so files that are
build conditionally if CONFIG_PM_RUNTIME is set may now be build
if CONFIG_PM is set.
Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile
for this reason.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org.
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Pull another networking update from David Miller:
"Small follow-up to the main merge pull from the other day:
1) Alexander Duyck's DMA memory barrier patch set.
2) cxgb4 driver fixes from Karen Xie.
3) Add missing export of fixed_phy_register() to modules, from Mark
Salter.
4) DSA bug fixes from Florian Fainelli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
net/macb: add TX multiqueue support for gem
linux/interrupt.h: remove the definition of unused tasklet_hi_enable
jme: replace calls to redundant function
net: ethernet: davicom: Allow to select DM9000 for nios2
net: ethernet: smsc: Allow to select SMC91X for nios2
cxgb4: Add support for QSA modules
libcxgbi: fix freeing skb prematurely
cxgb4i: use set_wr_txq() to set tx queues
cxgb4i: handle non-pdu-aligned rx data
cxgb4i: additional types of negative advice
cxgb4/cxgb4i: set the max. pdu length in firmware
cxgb4i: fix credit check for tx_data_wr
cxgb4i: fix tx immediate data credit check
net: phy: export fixed_phy_register()
fib_trie: Fix trie balancing issue if new node pushes down existing node
vlan: Add ability to always enable TSO/UFO
r8169:update rtl8168g pcie ephy parameter
net: dsa: bcm_sf2: force link for all fixed PHY devices
fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
...
|
|
There're now no users left of the SET_PM_RUNTIME_PM_OPS() macro, since
all have converted to use the SET_RUNTIME_PM_OPS() macro instead, so
let's remove it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The currently used SET_PM_RUNTIME_PM_OPS() macro is defined to the
SET_RUNTIME_PM_OPS() macro. Convert to the later, since that's the
proper one to use.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so Kconfig options
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace PM_RUNTIME with PM in Kconfig dependencies throughout the
tree.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere in the code under
arch/arm/ (the defconfig files will be modified later).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under sound/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Acked-by: Brian Austin <brian.austin@cirrus.com>
Acked-by: Mark Brown <broonie@kernel.org>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under
drivers/phy/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
The alternative of CONFIG_PM_SLEEP and CONFIG_PM_RUNTIME may be
replaced with CONFIG_PM too.
Make these changes in 2 files under drivers/video/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under
drivers/tty/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under
drivers/spi/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
|
|
Pull IDE update from David Miller:
"Two small IDE layer adjustments"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
drivers: ide: Fix mostly harmless off-by-one hardcoded value
IDE: Deletion of an unnecessary check before the function call "module_put"
|
|
Pull sparc update from David Miller:
"Not a lot of stuff this time around, mostly bug fixing:
- Fix alignment of 32-bit crosscall datastructure on Leon, from
Andreas Larsson.
- Several fixes to the virtual disk driver on sparc64 by Dwight
Engen, including handling resets of the service domain properly"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sunvdc: reconnect ldc after vds service domain restarts
sparc/ldc: create separate ldc_unbind from ldc_free
vio: create routines for inc,dec vio dring indexes
sunvdc: fix module unload/reload
sparc32, leon: Align ccall_info to prevent unaligned traps on crosscall
|
|
Ralf Baechle says:
"This should have been part of the merge commit c0222ac08666 (Merge
branch 'upstream' of git://git.linux-mips.org/pub/scm/-
ralf/upstream-linus) but I forgot to mention the need for this in my
pull request"
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull ARM updates from Russell King:
"The major updates included in this update are:
- Clang compatible stack pointer accesses by Behan Webster.
- SA11x0 updates from Dmitry Eremin-Solenikov.
- kgdb handling of breakpoints with read-only text/modules
- Support for Privileged-no-execute feature on ARMv7 to prevent
userspace code execution by the kernel.
- AMBA primecell bus handling of irq-safe runtime PM
- Unwinding support for memset/memzero/memmove/memcpy functions
- VFP fixes for Krait CPUs and improvements in detecting the VFP
architecture
- A number of code cleanups (using pr_*, removing or reducing the
severity of a couple of kernel messages, splitting ftrace asm code
out to a separate file, etc.)
- Add machine name to stack dump output"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (62 commits)
ARM: 8247/2: pcmcia: sa1100: make use of device clock
ARM: 8246/2: pcmcia: sa1111: provide device clock
ARM: 8245/1: pcmcia: soc-common: enable/disable socket clocks
ARM: 8244/1: fbdev: sa1100fb: make use of device clock
ARM: 8243/1: sa1100: add a clock alias for sa1111 pcmcia device
ARM: 8242/1: sa1100: add cpu clock
ARM: 8221/1: PJ4: allow building in Thumb-2 mode
ARM: 8234/1: sa1100: reorder IRQ handling code
ARM: 8233/1: sa1100: switch to hwirq usage
ARM: 8232/1: sa1100: merge GPIO multiplexer IRQ to "normal" irq domain
ARM: 8231/1: sa1100: introduce irqdomains support
ARM: 8230/1: sa1100: shift IRQs by one
ARM: 8229/1: sa1100: replace irq numbers with names in irq driver
ARM: 8228/1: sa1100: drop entry-macro.S
ARM: 8227/1: sa1100: switch to MULTI_IRQ_HANDLER
ARM: 8241/1: Update processor_modes for hyp and monitor mode
ARM: 8240/1: MCPM: document mcpm_sync_init()
ARM: 8239/1: Introduce {set,clear}_pte_bit
ARM: 8238/1: mm: Refine set_memory_* functions
ARM: 8237/1: fix flush_pfn_alias
...
|