summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-03-25mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolateLaura Abbott
Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") changed the logic of unset_migratetype_isolate to check the buddy allocator and explicitly call __free_pages to merge. The page that is being freed in this path never had prep_new_page called so set_page_refcounted is called explicitly but there is no call to kernel_map_pages. With the default kernel_map_pages this is mostly harmless but if kernel_map_pages does any manipulation of the page tables (unmapping or setting pages to read only) this may trigger a fault: alloc_contig_range test_pages_isolated(ceb00, ced00) failed Unable to handle kernel paging request at virtual address ffffffc0cec00000 pgd = ffffffc045fc4000 [ffffffc0cec00000] *pgd=0000000000000000 Internal error: Oops: 9600004f [#1] PREEMPT SMP Modules linked in: exfatfs CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1 task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000 PC is at memset+0xc8/0x1c0 LR is at kernel_map_pages+0x1ec/0x244 Fix this by calling kernel_map_pages to ensure the page is set in the page table properly Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/slub: fix lockups on PREEMPT && !SMP kernelsMark Rutland
Commit 9aabf810a67c ("mm/slub: optimize alloc/free fastpath by removing preemption on/off") introduced an occasional hang for kernels built with CONFIG_PREEMPT && !CONFIG_SMP. The problem is the following loop the patch introduced to slab_alloc_node and slab_free: do { tid = this_cpu_read(s->cpu_slab->tid); c = raw_cpu_ptr(s->cpu_slab); } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid)); GCC 4.9 has been observed to hoist the load of c and c->tid above the loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time constant and does not force a reload). On arm64 the generated assembly looks like: ldr x4, [x0,#8] loop: ldr x1, [x0,#8] cmp x1, x4 b.ne loop If the thread is preempted between the load of c->tid (into x1) and tid (into x4), and an allocation or free occurs in another thread (bumping the cpu_slab's tid), the thread will be stuck in the loop until s->cpu_slab->tid wraps, which may be forever in the absence of allocations/frees on the same CPU. This patch changes the loop condition to access c->tid with READ_ONCE. This ensures that the value is reloaded even when the compiler would otherwise assume it could cache the value, and also ensures that the load will not be torn. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/memory hotplug: postpone the reset of obsolete pgdatGu Zheng
Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under stress condition: BUG: unable to handle kernel paging request at 0000000000025f60 IP: next_online_pgdat+0x1/0x50 PGD 0 Oops: 0000 [#1] SMP ACPI: Device does not support D3cold Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 Workqueue: events vmstat_update task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 RIP: 0010: next_online_pgdat+0x1/0x50 RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: refresh_cpu_vm_stats+0xd0/0x140 vmstat_update+0x11/0x50 process_one_work+0x194/0x3d0 worker_thread+0x12b/0x410 kthread+0xc6/0xd0 ret_from_fork+0x7c/0xb0 The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of try_offline_node, which will reset all the content of pgdat to 0, as the pgdat is accessed lock-free, so that the users still using the pgdat will panic, such as the vmstat_update routine. process A: offline node XX: vmstat_updat() refresh_cpu_vm_stats() for_each_populated_zone() find online node XX cond_resched() offline cpu and memory, then try_offline_node() node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) zone = next_zone(zone) pg_data_t *pgdat = zone->zone_pgdat; // here pgdat is NULL now next_online_pgdat(pgdat) next_online_node(pgdat->node_id); // NULL pointer access So the solution here is postponing the reset of obsolete pgdat from try_offline_node() to hotadd_new_pgdat(), and just resetting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset 0 to avoid breaking pointer information in pgdat. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Xishi Qiu <qiuxishi@huawei.com> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25MAINTAINERS: correct rtc armada38x pattern entryJoe Perches
Commit c6a95dbee793 ("MAINTAINERS: add the RTC driver for the Armada38x") typoed the pattern, fix it. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/pagewalk.c: prevent positive return value of walk_page_test() from being ↵Naoya Horiguchi
passed to callers walk_page_test() is purely pagewalk's internal stuff, and its positive return values are not intended to be passed to the callers of pagewalk. However, in the current code if the last vma in the do-while loop in walk_page_range() happens to return a positive value, it leaks outside walk_page_range(). So the user visible effect is invalid/unexpected return value (according to the reporter, mbind() causes it.) This patch fixes it simply by reinitializing the return value after checked. Another exposed interface, walk_page_vma(), already returns 0 for such cases so no problem. Fixes: fafaa4264eba ("pagewalk: improve vma handling") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com> Reported-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: fix anon_vma->degree underflow in anon_vma endless growing preventionLeon Yu
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after upgrading to 3.19 and had no luck with 4.0-rc1 neither. So, after looking into new logic introduced by commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy"), I found chances are that unlink_anon_vmas() is called without incrementing dst->anon_vma->degree in anon_vma_clone() due to allocation failure. If dst->anon_vma is not NULL in error path, its degree will be incorrectly decremented in unlink_anon_vmas() and eventually underflow when exiting as a result of another call to unlink_anon_vmas(). That's how "kernel BUG at mm/rmap.c:399!" is triggered for me. This patch fixes the underflow by dropping dst->anon_vma when allocation fails. It's safe to do so regardless of original value of dst->anon_vma because dst->anon_vma doesn't have valid meaning if anon_vma_clone() fails. Besides, callers don't care dst->anon_vma in such case neither. Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as anon_vma_clone() now does the work. [akpm@linux-foundation.org: tweak comment] Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25drivers/rtc/rtc-mrst: fix suspend/resumeLars-Peter Clausen
The Moorestown RTC driver implements suspend and resume callbacks and assigns them to the suspend and resume fields of the device_driver struct. These callbacks are never actually called by anything though. Modify the driver to properly use dev_pm_ops so that the suspend and resume functions are actually executed upon suspend/resume. [akpm@linux-foundation.org: device_driver.name is const char *] Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25aoe: update aoe maintainer informationEd Cashin
The coraid.com email address is defunct. The old aoe support area hosted at coraid.com is no longer up. These changes update the email and website to current ones. Signed-off-by: Ed Cashin <ed.cashin@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A small collection of fixes that has been gathered over the last few weeks. This contains: - A one-liner fix for NVMe, fixing a missing list_head init that could makes us oops on hitting recovery at load time. - Two small blk-mq fixes: - Fixup a bad goto jump on error handling. - Fix for oopsing if running out of reserved tags. - A memory leak fix for NBD. - Two small writeback fixes from Tejun, fixing a missing init to INITIAL_JIFFIES, and a possible underflow introduced recently. - A core merge fixup in sg gap detection, where rq->biotail was indexed with the count of rq->bio" * 'for-linus' of git://git.kernel.dk/linux-block: writeback: fix possible underflow in write bandwidth calculation NVMe: Initialize device list head before starting Fix bug in blk_rq_merge_ok blkmq: Fix NULL pointer deref when all reserved tags in blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path nbd: fix possible memory leak writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth()
2015-03-25selinux: fix sel_write_enforce broken return valueJoe Perches
Return a negative error value like the rest of the entries in this function. Cc: <stable@vger.kernel.org> Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-03-25s390/smp: reenable smt after resumeHeiko Carstens
After a suspend/resume cycle we missed to enable smt again, which leads to all sorts of bugs, since the kernel assumes smt is enabled, while the hardware thinks it is not. Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-24Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull two arm64 fixes from Catalin Marinas: - switch_mm() fix where init_mm.pgd ends up in the user TTBR0; swapper_pg_dir is not suitable for user mappings - this_cpu accessors fix for preemption safety * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: percpu: Make this_cpu accessors pre-empt safe arm64: Use the reserved TTBR0 if context switching to the init_mm
2015-03-24Merge tag 'powerpc-4.0-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: - Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER - Little endian fixes for post mobility device tree update - Add PVR for POWER8NVL processor - Fixes for hypervisor doorbell handling * tag 'powerpc-4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER powerpc/pseries: Little endian fixes for post mobility device tree update powerpc: Add PVR for POWER8NVL processor powerpc/powernv: Fixes for hypervisor doorbell handling
2015-03-24Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Marcelo Tosatti: "Fix for higher-order page allocation failures, fix Xen-on-KVM with x2apic, L1 crash with unrestricted guest mode (nested VMX)" * git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: avoid page allocation failure in kvm_set_memory_region() KVM: x86: call irq notifiers with directed EOI KVM: nVMX: mask unrestricted_guest if disabled on L0
2015-03-24Merge branch 'for-4.0-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fix from Tejun Heo: "One patch to fix a regression from the recent switch to blk-mq tag allocation which can cause oops on SAS-attached SATA drives" * 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: Add a new flag to destinguish sas controller
2015-03-24Merge tag 'mfd-fixes-4.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fixes from Lee Jones: - Use DMA'able addresses for DMA; rtsx_usb - Use return value in the correct way; kempld-core * tag 'mfd-fixes-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: kempld-core: Fix callback return value check mfd: rtsx_usb: Prevent DMA from stack
2015-03-25drm/i915: Don't try to reference the fb in get_initial_plane_config()Damien Lespiau
Tvrtko noticed a new warning on boot: WARNING: CPU: 1 PID: 353 at include/linux/kref.h:47 drm_framebuffer_reference+0x6c/0x80 [drm]() Call Trace: [<ffffffff8161f10c>] dump_stack+0x4f/0x7b [<ffffffff81052caa>] warn_slowpath_common+0xaa/0xd0 [<ffffffff81052d8a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa00d035c>] drm_framebuffer_reference+0x6c/0x80 [drm] [<ffffffffa01c0df7>] update_state_fb.isra.54+0x47/0x50 [i915] [<ffffffffa01ccd5c>] skylake_get_initial_plane_config+0x93c/0x950 [i915] [<ffffffffa01e8721>] intel_modeset_init+0x1551/0x17c0 [i915] [<ffffffffa02476e0>] i915_driver_load+0xed0/0x11e0 [i915] [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70 [<ffffffffa00ca8b7>] drm_dev_register+0x77/0x110 [drm] [<ffffffffa00cda3b>] drm_get_pci_dev+0x11b/0x1f0 [drm] [<ffffffff81098e3d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70 [<ffffffffa0145276>] i915_pci_probe+0x56/0x60 [i915] [<ffffffff813ad59c>] pci_device_probe+0x7c/0x100 [<ffffffff81466aad>] driver_probe_device+0x16d/0x380 We cannot take a reference at this point, not before intel_framebuffer_init() and the underlying drm_framebuffer_init(). Introduced in: commit 706dc7b549175e47f23e913b7f1e52874a7d0f56 Author: Matt Roper <matthew.d.roper@intel.com> Date: Tue Feb 3 13:10:04 2015 -0800 drm/i915: Ensure plane->state->fb stays in sync with plane->fb v2: Don't move update_state_fb(). It was moved around because I originally put update_state_fb() in intel_alloc_plane_obj() before finding a better place. (Matt) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From drm-next: (cherry picked from commit f55548b5af87ebfc586ca75748947f1c1b1a4a52) Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-24Merge tag 'spi-v4.0-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of driver specific fixes of the usual "important if you have that device" kind together with a fix for a use after free bug that was introduced into the trace code in some of the recent refactoring of the message queue handling" * tag 'spi-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: trigger trace event for message-done before mesg->complete spi: dw-mid: clear BUSY flag fist and test other one spi: qup: Fix cs-num DT property parsing
2015-03-24Merge tag 'regulator-fix-v4.0-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two fixes here, one typo fix in the documentation and one fix for a system hang with one of the Palmas chips caused by the use of an incorrect offset being provided for one of the registers" * tag 'regulator-fix-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Fix documentation for regmap in the config regulator: palmas: Correct TPS659038 register definition for REGEN2
2015-03-24Merge tag 'regmap-fix-v4.0-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "This patch fixes a bad interaction between the support that was added for having regmaps without devices for early system controller initialization and the trace support. There's a very good analysis of the actual issue in the commit message for the change" * tag 'regmap-fix-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: introduce regmap_name to fix syscon regmap trace events
2015-03-24ARM: dts: sunxi: Remove overclocked/overvoltaged OPPChen-Yu Tsai
Without proper regulator support for individual boards, it is dangerous to have overclocked/overvoltaged OPPs in the list. Cpufreq will increase the frequency without the accompanying voltage increase, resulting in an unstable system. Remove them for now. We can revisit them with the new version of OPP bindings, which support boost settings and frequency ranges, among other things. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2015-03-24ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP settingChen-Yu Tsai
The Olimex A10-Lime is known to be unstable when running at 1008MHz. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2015-03-24x86/asm/entry: Check for syscall exit work with IRQs disabledAndy Lutomirski
We currently have a race: if we're preempted during syscall exit, we can fail to process syscall return work that is queued up while we're preempted in ret_from_sys_call after checking ti.flags. Fix it by disabling interrupts before checking ti.flags. Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tejun Heo <tj@kernel.org> Fixes: 96b6352c1271 ("x86_64, entry: Remove the syscall exit audit") Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24arm64: percpu: Make this_cpu accessors pre-empt safeSteve Capper
this_cpu operations were implemented for arm64 in: 5284e1b arm64: xchg: Implement cmpxchg_double f97fc81 arm64: percpu: Implement this_cpu operations Unfortunately, it is possible for pre-emption to take place between address generation and data access. This can lead to cases where data is being manipulated by this_cpu for a different CPU than it was called on. Which effectively breaks the spec. This patch disables pre-emption for the this_cpu operations guaranteeing that address generation and data manipulation take place without a pre-emption in-between. Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double") Fixes: f97fc810798c ("arm64: percpu: Implement this_cpu operations") Reported-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Steve Capper <steve.capper@linaro.org> [catalin.marinas@arm.com: remove space after type cast] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-24Merge remote-tracking branches 'spi/fix/dw', 'spi/fix/queue' and ↵Mark Brown
'spi/fix/qup' into spi-linus
2015-03-24Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Improve support of compressed kernel modules (Jiri Olsa) - Add --kallsyms option to 'perf diff' (David Ahern) - Add pid/tid filtering to 'report' and 'script' commands (David Ahern) - Add support for __print_array() in libtraceevent (Javi Merino) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Fix 'probe' to get ummapped symbol address on kernel (Masami Hiramatsu) - Print big numbers using thousands' group in 'kmem' (Namhyung Kim) - Remove (null) value of "Sort order" for perf mem report (Yunlong Song) Infrastructure changes: - Handle NULL comm name in libtracevent (Josef Bacik) - Libtraceevent synchronization with trace-cmd repo (Steven Rostedt) - Work around lack of sched_getcpu() in glibc < 2.6. (Vinson Lee) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24perf tools: Add pid/tid filtering to report and script commandsDavid Ahern
The 'record' and 'top' tools already allow a user to specify a CSV of pids and/or tids of tasks to collect data. Add those options to the 'report' and 'script' analysis commands to only consider samples related to the given pids/tids. This is also inline with the existing comm option. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1427212361-7066-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf diff: Add kallsyms optionDavid Ahern
Required for off-box analysis to convert kernel addresses. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1427212317-7018-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Add support for __print_array()Javi Merino
Since 6ea22486ba46 ("tracing: Add array printing helper") trace can generate traces with variable element size arrays. Add support to parse them. Signed-off-by: Javi Merino <javi.merino@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1427195239-15730-1-git-send-email-javi.merino@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Free filter tokens in process_filter()Steven Rostedt (Red Hat)
valgrind showed that the filter token wasn't being freed properly in process_filter(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135923.817723903@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Add way to find sub buffer boundarySteven Rostedt (Red Hat)
For debugging purposes, it may be helpful for the kbuffer library to flag when crossing a sub buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135923.650983637@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent kbuffer: Remove extra update to data pointer in PADDINGSteven Rostedt (Red Hat)
When a event PADDING is hit (a deleted event that is still in the ring buffer), translate_data() sets the length of the padding and also updates the data pointer which is passed back to the caller. This is unneeded because the caller also updates the data pointer with the passed back length. translate_data() should not update the pointer, only set the length. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org # 3.12+ Link: http://lkml.kernel.org/r/20150324135923.461431960@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Make plugin options either string or booleanSteven Rostedt
When a plugin option is defined, by default it is a boolean (true or false). If the option is something else, then it needs to set its "value" field to a default string other than NULL (can be just ""). If the value is not set then the option is considered boolean, and the updating of the option value will be handled accordingly. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135923.308372986@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Add pevent_data_pid_from_comm()Steven Rostedt (Red Hat)
There is a pevent_data_comm_from_pid() that returns the cmdline stored for a given pid in order for users to map pids to comms, but there's no method to convert a comm back to a pid. This is useful for filters that specify a comm instead of a PID (it's faster than searching each individual event). Add a way to retrieve a comm from a pid. Since there can be more than one pid associated to a comm, it returns a data structure that lets the user iterate over all the saved comms for a given pid. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135923.001103479@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Handle %z in bprint formatSteven Rostedt (Red Hat)
The %z printf specifier was not handled making trace_printk()s in the kernel that used this break on output. Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20150324135922.844361717@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Copy trace_clock and free itSteven Rostedt (Red Hat)
The pevent->trace_clock should not be a direct pointer to what was given. It should be copied and freed. Note, valgrind pointed this out when a caller passed in a pointer that needed to be freed and it never was. Ideally, pevent should copy it (which this change does), and free the copy. It's up to the caller to free the clock string passed in. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135922.695906738@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Handle NULL comm nameJosef Bacik
It is possible that a pid has no associated comm attached to it, although it can still be passed to pevent_register_comm(). But if comm is NULL, it will cause strdup() to segfault. To prevent this from happening, if comm is NULL use the default "<...>" name for the pid. Signed-off-by: Josef Bacik <jbacik@fb.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20150324135922.549965495@goodmis.org Link: http://lkml.kernel.org/p/1403799732-30308-1-git-send-email-jbacik@fb.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf symbols: Save DSO loading errno to better report errorsArnaldo Carvalho de Melo
Before, when some problem happened while trying to load the kernel symtab, 'perf top' would show: ┌─Warning:───────────────────────────┐ │The vmlinux file can't be used. │ │Kernel samples will not be resolved.│ │ │ │ │ │Press any key... │ └────────────────────────────────────┘ Now, it reports: # perf top --vmlinux /dev/null ┌─Warning:───────────────────────────────────────────┐ │The /tmp/passwd file can't be used: Invalid ELF file│ │Kernel samples will not be resolved. │ │ │ │ │ │Press any key... │ └────────────────────────────────────────────────────┘ This is possible because we now register the reason for not being able to load the symtab in the dso->load_errno member, and provide a dso__strerror_load() routine to format this error into a strerror like string with a short reason for the error while loading. That can be just forwarding the dso__strerror_load() call to strerror_r(), or, for a separate errno range providing a custom message. Reported-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-u5rb5uq63xqhkfb8uv2lxd5u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf target: Simplify handling of strerror_r returnArnaldo Carvalho de Melo
To deal with forwarding the strerror_r (GNU) return we need to check if the returned value is the buffer we passed or maybe some constant (unknown error), simplify that action by using scnprintf, that will do all the buflen size checks, trimming if needed. Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-d0ik6i5gjew56j0qphql28ou@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf tools: Work around lack of sched_getcpu in glibc < 2.6.Vinson Lee
This patch fixes this build error with glibc < 2.6. CC util/cloexec.o cc1: warnings being treated as errors util/cloexec.c: In function ‘perf_flag_probe’: util/cloexec.c:24: error: implicit declaration of function ‘sched_getcpu’ util/cloexec.c:24: error: nested extern declaration of ‘sched_getcpu’ make: *** [util/cloexec.o] Error 1 Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: stable@vger.kernel.org # 3.18+ Link: http://lkml.kernel.org/r/1427137761-16119-1-git-send-email-vlee@twopensource.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf kmem: Print big numbers using thousands' groupNamhyung Kim
Like perf stat, this makes easy to read the numbers on stat like below: # perf kmem stat SUMMARY ======= Total bytes requested: 9,770,900 Total bytes allocated: 9,782,712 Total bytes wasted on internal fragmentation: 11,812 Internal fragmentation: 0.120744% Cross CPU allocations: 74/152,819 Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1427092244-22764-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24tools lib traceevent: Factor out allocating and processing argsJavi Merino
The sequence of allocating the print_arg field, calling process_arg() and verifying that the next event delimiter is repeated twice in process_hex() and will also be used for process_int_array(). Factor it out to a function to avoid writing the same code again and again. Signed-off-by: Javi Merino <javi.merino@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1426875176-30244-2-git-send-email-javi.merino@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf probe: Fix to get ummapped symbol address on kernelMasami Hiramatsu
Fix to get correctly unmapped symbol address on kernel. This allows us to probe on syscall symbols which are aliases of SyS_ functions with using debuginfo. Without this fix: ---- # ./perf probe -a sys_write Failed to find debug information for address 3b0100 Probe point 'sys_write' not found. Error: Failed to add events. ---- The address 0x3b0100 is a mapped address, and not usable in debuginfo. With this fix: ---- # ./perf probe -a sys_write Added new event: probe:sys_write (on sys_write) You can now use it in all perf tools, such as: perf record -e probe:sys_write -aR sleep 1 ---- Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: He Kuang <hekuang@huawei.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150322114022.32639.19096.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24perf tools: Remove (null) value of "Sort order" for perf mem reportYunlong Song
When '--sort' is not set, 'perf mem report" will print a null pointer as the output value of sort order, so fix it. Example: Before this patch: $ perf mem report # To display the perf.data header info, please use --header/--header-only options. # # Samples: 18 of event 'cpu/mem-loads/pp' # Total weight : 188 # Sort order : (null) # ... After this patch: $ perf mem report # To display the perf.data header info, please use --header/--header-only options. # # Samples: 18 of event 'cpu/mem-loads/pp' # Total weight : 188 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427082605-12881-1-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-24drm: Fixup racy refcounting in plane_force_disableDaniel Vetter
Originally it was impossible to be dropping the last refcount in this function since there was always one around still from the idr. But in commit 83f45fc360c8e16a330474860ebda872d1384c8c Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Aug 6 09:10:18 2014 +0200 drm: Don't grab an fb reference for the idr we've switched to weak references, broke that assumption but forgot to fix it up. Since we still force-disable planes it's only possible to hit this when racing multiple rmfb with fbdev restoring or similar evil things. As long as userspace is nice it's impossible to hit the BUG_ON. But the BUG_ON would most likely be hit from fbdev code, which usually invovles the console_lock besides all modeset locks. So very likely we'd never get the bug reports if this was hit in the wild, hence better be safe than sorry and backport. Spotted by Matt Roper while reviewing other patches. [airlied: pull this back into 4.0 - the oops happens there] Cc: stable@vger.kernel.org Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-23kvm: avoid page allocation failure in kvm_set_memory_region()Igor Mammedov
KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23KVM: x86: call irq notifiers with directed EOIRadim Krčmář
kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211 Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23dm: fix add_disk() NULL pointer due to race with free_dev()Mike Snitzer
Commit c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") exposed DM to a latent race in free_dev() vs add_disk() in relation to management of the device's minor number. Fix this by refactoring free_dev() to match cleanup order of the alloc_dev() error path. Move cleanup of the gendisk, queue, and bdev to _before_ the cleanup of the idr managed minor number. Also, purely due to cleanup that fell out during the free_dev() audit: - adjust dm_blk_close() to access the gendisk's private_data under the _minor_lock spinlock. - move __dm_destroy()'s dm_get_live_table() call out from under the _minor_lock spinlock. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449 Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-03-23Merge remote-tracking branches 'regulator/fix/doc' and ↵Mark Brown
'regulator/fix/palmas' into regulator-linus
2015-03-23arm64: Use the reserved TTBR0 if context switching to the init_mmCatalin Marinas
The idle_task_exit() function may call switch_mm() with next == &init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so this patch simply sets the reserved TTBR0. Cc: <stable@vger.kernel.org> Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org> Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>