Age  Commit message  Author
2013-09-02  RDMA/ocrdma: Fix to work with even a single MSI-X vector  (Naresh Gottumukkala)
There are cases, like SR-IOV, where we can get only one MSI-X vector allocated for RoCE. In that case we need to use that vector for both the data plane and the control plane, and we need to use EQ create version V2. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02  RDMA/ocrdma: Remove the MTU check based on Ethernet MTU  (Naresh Gottumukkala)
Also increase MAX AH to 512. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02  RDMA/ocrdma: Add support for fast register work requests (FRWR)  (Naresh Gottumukkala)
Also get the max_srq value from query_config mailbox response. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02  RDMA/ocrdma: Create IRD queue fix  (Naresh Gottumukkala)
1) Fix ocrdma_get_num_posted_shift for up to 128 QPs. 2) In create_qp, create for the minimum of dev->max_wqe and the requested number of WQEs. 3) As part of creating the IRD queue, populate it with basic header templates. 4) Make sure all the DB memory allocated to userspace is page aligned. 5) Fix an issue in checking the mmap local cache. 6) Some code cleanup. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02  net: make snmp_mib_free static inline  (Cong Wang)
Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' This is because snmp_mib_free() is only defined when CONFIG_INET is enabled, while in6_dev_finish_destroy() has now moved to the core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
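A minimal sketch of the shape of such a fix, with invented names rather than the actual net/snmp code: a helper declared in a shared header but defined only in a config-gated .c file breaks the link for callers built outside that config; if the helper is tiny, defining it static inline in the header removes the dependency.

/* Before: only a prototype here; the body lived in a file compiled
 * only under CONFIG_FOO, so other callers failed to link:
 *
 *     void foo_stats_free(void *ptr);
 *
 * After: small enough to live in the header itself. */
#include <stdlib.h>

static inline void foo_stats_free(void *ptr)
{
        free(ptr);      /* placeholder body; the real helper frees per-cpu counters */
}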
2013-09-02  vxlan: include net/ip6_checksum.h for csum_ipv6_magic()  (Cong Wang)
Fengguang reported a compile warning: drivers/net/vxlan.c: In function 'vxlan6_xmit_skb': drivers/net/vxlan.c:1352:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors this patch fixes it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02  vxlan: fix flowi6_proto value  (Cong Wang)
It should be IPPROTO_UDP. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02  Linux 3.11 (tag: v3.11)  (Linus Torvalds)
2013-09-02  perf trace: Tell arg formatters the arg index  (Arnaldo Carvalho de Melo)
... so that it can mask args relative to its position, like the 'mode' arg that may or not be printed according to the 'flags' (O_CREAT) value. [root@zoo ~]# perf trace -a -e openat,open_by_handle_at | head -1 469.754 ( 0.034 ms): 1183 openat(dfd: -100, filename: 0x7fbde40014b0, flags: CLOEXEC|DIRECTORY|NONBLOCK) = 23 [root@zoo ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-bgokqpkufd4sio7ixxknf1ux@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  perf trace: Add beautifier for open's flags arg  (Arnaldo Carvalho de Melo)
Suppressing the mode when O_CREAT not present, needs improvements on the arg masking mechanism to be reused in openat, open_by_handle_at, mq_open: [root@zoo ~]# perf trace -a -e open | grep -v 'flags: RDONLY' | head -5 147.541 ( 0.028 ms): 1188 open(filename: 0x33c17782fb, flags: CLOEXEC ) = 23 229.898 ( 0.020 ms): 2071 open(filename: 0x3d93c80, flags: NOATIME ) = -1 EPERM Operation not permitted [root@zoo ~]# perf trace -a -e open | grep CREAT 1406.697 ( 0.024 ms): 616 open(filename: 0x7fffc3a0f910, flags: CREAT|TRUNC|WRONLY, mode: 438 ) = -1 ENOENT No such file or directory 2032.770 ( 0.804 ms): 4354 open(filename: 0x7f33ac814368, flags: CREAT|EXCL|RDWR, mode: 384 ) = 115 ^C[root@zoo ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-c7vm6klaf995qw1vqdih5t7q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  lockref: implement lockless reference count updates using cmpxchg()  (Linus Torvalds)
Instead of taking the spinlock, the lockless versions atomically check that the lock is not taken, and do the reference count update using a cmpxchg() loop. This is semantically identical to doing the reference count update protected by the lock, but avoids the "wait for lock" contention that you get when accesses to the reference count are contended. Note that a "lockref" is absolutely _not_ equivalent to an atomic_t. Even when the lockref reference counts are updated atomically with cmpxchg, the fact that they also verify the state of the spinlock means that the lockless updates can never happen while somebody else holds the spinlock. So while "lockref_put_or_lock()" looks a lot like just another name for "atomic_dec_and_lock()", and both optimize to lockless updates, they are fundamentally different: the decrement done by atomic_dec_and_lock() is truly independent of any lock (as long as it doesn't decrement to zero), so a locked region can still see the count change. The lockref structure, in contrast, really is a *locked* reference count. If you hold the spinlock, the reference count will be stable and you can modify the reference count without using atomics, because even the lockless updates will see and respect the state of the lock. In order to enable the cmpxchg lockless code, the architecture needs to do three things: (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit in an aligned u64, and have a "cmpxchg()" implementation that works on such a u64 data type. (2) define a helper function to test for a spinlock being unlocked ("arch_spin_value_unlocked()") (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its Kconfig file. This enables it for x86-64 (but not 32-bit, we'd need to make sure cmpxchg() turns into the proper cmpxchg8b in order to enable it for 32-bit mode). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
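As a rough user-space illustration of the mechanism (a sketch of the idea, not the kernel's lib/lockref.c): the spinlock word and the reference count share one aligned 64-bit value, and the count is bumped by a compare-and-swap only while the lock word reads as unlocked, so a successful lockless update can never overlap with a lock holder.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct lockref_sketch {
        _Atomic uint64_t lock_count;    /* low 32 bits: lock word, high 32 bits: count */
};

#define SKETCH_LOCKED   1u

/* Mirrors the lockref_get_not_zero() idea: bump a non-zero count, but only
 * while the lock word is observed unlocked; otherwise report failure so the
 * caller can fall back to taking the lock. */
static bool lockref_get_not_zero_sketch(struct lockref_sketch *ref)
{
        uint64_t old = atomic_load(&ref->lock_count);

        while ((uint32_t)old != SKETCH_LOCKED && (old >> 32) != 0) {
                uint64_t new_val = old + (1ull << 32);  /* count + 1, lock word unchanged */

                /* Succeeds only if nobody locked it or changed the count meanwhile;
                 * on failure 'old' is reloaded and the conditions are re-checked. */
                if (atomic_compare_exchange_weak(&ref->lock_count, &old, new_val))
                        return true;
        }
        return false;
}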
2013-09-02  lockref: uninline lockref helper functions  (Linus Torvalds)
They aren't very good to inline, since they already call external functions (the spinlock code), and we're going to create rather more complicated versions of them that can do the reference count updates locklessly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02  vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()  (Linus Torvalds)
This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c and re-implements it using the lockref infrastructure instead. It also adds a lot of comments about what is actually going on, because turning a dentry that was looked up using RCU into a long-lived reference counted entry is one of the more subtle parts of the rcu walk. We also used to be _particularly_ subtle in unlazy_walk() where we re-validate both the dentry and its parent using the same sequence count. We used to do it by nesting the locks and then verifying the sequence count just once. That was silly, because nested locking is expensive, but the sequence count check is not. So this just re-validates the dentry and the parent separately, avoiding the nested locking, and making the lockref lookup possible. Acked-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02  perf trace: Add beautifier for lseek's whence arg  (Arnaldo Carvalho de Melo)
[root@zoo ~]# perf trace -a -e lseek | head -1 546.922 ( 0.004 ms): 1184 lseek(fd: 26, offset: 0, whence: CUR) = 2 [root@zoo ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2eiuhwz9jbnhj80q6jaqeji4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()  (Waiman Long)
A valid parent pointer is always going to have a non-zero reference count, but if we look up the parent optimistically without locking, we have to protect against the (very unlikely) race against renaming changing the parent from under us. We do that by using lockref_get_not_zero(), and then re-checking the parent pointer after getting a valid reference. [ This is a re-implementation of a chunk from the original patch by Waiman Long: "dcache: Enable lockless update of dentry's refcount". I've completely rewritten the patch-series and split it up, but I'm attributing this part to Waiman as it's close enough to his earlier patch - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
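The "take a reference, then re-check" pattern described here reads roughly as follows in a self-contained user-space sketch (invented types; the real code additionally relies on RCU to keep the parent's memory valid while the attempt is made):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
        _Atomic long refcount;
        struct node *_Atomic parent;
};

/* Try to raise a non-zero refcount; fails if it has already dropped to zero. */
static bool get_not_zero(struct node *n)
{
        long c = atomic_load(&n->refcount);

        while (c > 0)
                if (atomic_compare_exchange_weak(&n->refcount, &c, c + 1))
                        return true;
        return false;
}

static void put(struct node *n)
{
        atomic_fetch_sub(&n->refcount, 1);
}

/* Optimistic, lock-free parent lookup with a re-check against renames. */
static struct node *get_parent(struct node *child)
{
        for (;;) {
                struct node *parent = atomic_load(&child->parent);

                if (!parent || !get_not_zero(parent))
                        return NULL;    /* real code would fall back to locking */
                if (atomic_load(&child->parent) == parent)
                        return parent;  /* still the parent: reference is valid */
                put(parent);            /* it changed under us: drop and retry */
        }
}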
2013-09-02  lockref: add 'lockref_get_or_lock()' helper  (Linus Torvalds)
This behaves like "lockref_get_not_zero()", but instead of doing nothing if the count was zero, it returns with the lock held. This allows callers to revalidate the lockref-protected data structure if required even if the count was zero to begin with, and possibly increment the count if it passes muster. In particular, the dentry code wants this when it wants to turn an RCU-protected dentry into a stable refcounted one: if the dentry count is zero, but the sequence number still validates the dentry, we can take a reference to it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02  IB/core: Better checking of userspace values for receive flow steering  (Matan Barak)
- Don't allow unsupported comp_mask values; users should check ibv_query_device to know which features are supported. - Add a check in ib_uverbs_create_flow() to verify the size passed from user space. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02  perf tools: Fix symbol offset computation for some dsos  (David Ahern)
For some dsos (e.g., libc, libpthread, kernel modules) the symbol offset is huge. e.g., qemu-kvm 17238/17242 [007] 762235.640311: ffffffff816288a1 __schedule+0x451 ([kernel.kallsyms]) ffffffff81629609 schedule+0x29 ([kernel.kallsyms]) ffffffffa00a6ded kvm_vcpu_block+0xffffffffa00a106d (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00bae6b kvm_arch_vcpu_ioctl_run+0xffffffffa00a118b (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00a4d7a kvm_vcpu_ioctl+0xffffffffa00a141a (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffff811a7bdb do_vfs_ioctl+0x8b ([kernel.kallsyms]) ffffffff811a80c1 sys_ioctl+0x91 ([kernel.kallsyms]) ffffffff81633182 system_call+0x72 ([kernel.kallsyms]) 7f882a97af27 __GI___ioctl+0x7f882a891007 (/lib64/libc-2.14.90.so) 100000002 [unknown] ([unknown]) It seems to be maps with a non-0 start. Taking that into account the offsets are correct: qemu-kvm 17238/17242 [007] 762235.640311: ffffffff816288a1 __schedule+0x451 ([kernel.kallsyms]) ffffffff81629609 schedule+0x29 ([kernel.kallsyms]) ffffffffa00a6ded kvm_vcpu_block+0x6d (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00bae6b kvm_arch_vcpu_ioctl_run+0x18b (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00a4d7a kvm_vcpu_ioctl+0x41a (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffff811a7bdb do_vfs_ioctl+0x8b ([kernel.kallsyms]) ffffffff811a80c1 sys_ioctl+0x91 ([kernel.kallsyms]) ffffffff81633182 system_call+0x72 ([kernel.kallsyms]) 7f882a97af27 __GI___ioctl+0x7 (/lib64/libc-2.14.90.so) 100000002 [unknown] ([unknown]) Signed-off-by: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1375026512-45826-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
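The arithmetic in question, reduced to a standalone helper (my own illustration, not the perf source): for a map loaded at a non-zero start address, the printed offset must come from the map-relative address, which is what yields kvm_vcpu_block+0x6d instead of the huge bogus value in the first trace.

#include <stdint.h>

/*
 * ip        - sample address as seen in the event (absolute)
 * map_start - where the dso/module is mapped (non-zero for modules, shared libs)
 * sym_start - symbol start address, relative to the start of the map
 */
static uint64_t symbol_offset(uint64_t ip, uint64_t map_start, uint64_t sym_start)
{
        uint64_t map_relative = ip - map_start; /* roughly what perf's map_ip() conversion does */

        /* The broken output above corresponds to computing ip - sym_start
         * directly, which only works when map_start happens to be 0. */
        return map_relative - sym_start;
}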
2013-09-02  perf list: Skip unsupported events  (Namhyung Kim)
Some hardware events might not be supported on a system. Listing those events seems meaningless and confusing to users. Let's skip them. Before: $ perf list cache | wc -l 33 After: $ perf list cache | wc -l 27 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377571313-14722-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
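One straightforward way to implement such a check, sketched as a user-space probe rather than the actual perf-list code: an event counts as supported if the kernel accepts a perf_event_open() for it.

#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static bool event_supported(uint32_t type, uint64_t config)
{
        struct perf_event_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;               /* e.g. PERF_TYPE_HW_CACHE */
        attr.config = config;
        attr.exclude_kernel = 1;        /* so unprivileged users can probe too */

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0)
                return false;           /* kernel/PMU rejected it: treat as unsupported */
        close(fd);
        return true;
}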
2013-09-02  perf tests: Add 'keep tracking' test  (Adrian Hunter)
Add a test for the newly added PERF_COUNT_SW_DUMMY event. The test checks that tracking events continue when an event is disabled but a dummy software event is not disabled. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  perf tools: Add support for PERF_COUNT_SW_DUMMY  (Adrian Hunter)
Add support for the new dummy software event PERF_COUNT_SW_DUMMY. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  perf: Add a dummy software event to keep tracking  (Adrian Hunter)
When an event is disabled the "tracking" events selected by the 'mmap', 'comm' and 'task' bits of struct perf_event_attr, are also disabled. However, the information those events provide is necessary to resolve symbols for when the main event is re-enabled. The "tracking" events can be kept enabled by putting them on another event, but that requires an event that otherwise does nothing. A new software event PERF_COUNT_SW_DUMMY is added for that purpose. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
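As a usage sketch (assuming headers and a kernel that already carry this patch), a tool would open something like the following next to its real counter, so that mmap/comm/task records keep flowing even while the real counter is disabled:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a dummy event that exists only to carry the side-band ("tracking")
 * records: mmap, comm and task. */
static int open_tracking_event(pid_t pid, int cpu)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;
        attr.config = PERF_COUNT_SW_DUMMY;      /* counts nothing */
        attr.mmap = 1;                          /* but still emits PERF_RECORD_MMAP */
        attr.comm = 1;                          /* ... PERF_RECORD_COMM */
        attr.task = 1;                          /* ... PERF_RECORD_FORK/EXIT */

        return (int)syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0);
}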
2013-09-02  perf trace: Add beautifier for futex 'operation' parm  (Arnaldo Carvalho de Melo)
That uses the arg mask mechanism just introduced to suppress ignored arguments according to the futex operation. Based on an initial patch from David Ahern that showed the need for some way to allow args to tell how many further args should be shown. Initial-patch-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0k30it46r4hv5eanefbdmj5t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02  perf trace: Allow syscall arg formatters to mask args  (Arnaldo Carvalho de Melo)
The futex syscall ignores some arguments according to the 'operation' arg, so allow arg formatters to mask those. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-abqrg3oldgfsdnltfrvso9f7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
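A generic illustration of the masking idea with invented names (the perf-trace internals differ in detail): the formatter for the decisive argument returns a bitmask of later argument slots to suppress, and the printer skips those slots.

#include <stdint.h>
#include <stdio.h>

#define SUPPRESS(i)     (1u << (i))

/* Hypothetical futex-style formatter: depending on 'op', some of the six
 * syscall args are meaningless and should not be printed. */
static unsigned int futex_args_mask(long op)
{
        switch (op & 0x7f) {            /* ignore the private/clock flag bits */
        case 0:  /* FUTEX_WAIT */ return SUPPRESS(4) | SUPPRESS(5);
        case 1:  /* FUTEX_WAKE */ return SUPPRESS(3) | SUPPRESS(4) | SUPPRESS(5);
        default: return 0;
        }
}

static void print_args(const char *names[6], long args[6], unsigned int mask)
{
        for (int i = 0; i < 6; i++)
                if (!(mask & SUPPRESS(i)))
                        printf("%s: %ld ", names[i], args[i]);
        putchar('\n');
}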
2013-09-02  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi  (Linus Torvalds)
Pull SCSI fix from James Bottomley: "This is a bug fix for the pm80xx driver. It turns out that when the new hardware support was added in 3.10 the IO command size was kept at the old hard coded value. This means that the driver attaches to some new cards and then simply hangs the system" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] pm80xx: fix Adaptec 71605H hang
2013-09-02  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 boot fix from Peter Anvin: "A single very small boot fix for very large memory systems (> 0.5T)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
2013-09-02  Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma  (Linus Torvalds)
Pull slave-dma fix from Vinod Koul: "A fix for resolving TI_EDMA driver's build error in allmodconfig to have filter function built in" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dma/Kconfig: TI_EDMA needs to be boolean
2013-09-02  Merge tag 'asoc-v3.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next  (Takashi Iwai)
ASoC: Final updates for v3.12 A few final updates for v3.12 - some cleanups, a bug fix for ssm2602, pop removal for rt5640 and fixes for the reporting of unidirectional links in the MXS SGTL5000 driver.
2013-09-02  ALSA: hda - hdmi: Fallback to ALSA allocation when selecting CA  (Anssi Hannula)
hdmi_channel_allocation() tries to find a HDMI channel allocation that matches the number of channels in the playback stream and contains only speakers that the HDMI sink has reported as available via EDID. If no such allocation is found, 0 (stereo audio) is used. Using CA 0 causes the sink to discard everything except the first two channels (front left and front right). However, the sink may be capable of receiving more channels than it has speakers (and then perform downmix or discard the extra channels), in which case it is preferable to use a CA that contains extra channels rather than CA 0, which discards all the non-stereo channels. Additionally, it seems that HBR (HD) passthrough output does not work on Intel HDMI codecs when CA is set to 0 (possibly the codec zeroes channels not present in CA). This happens with all receivers that report a 5.1 speaker mask since a HBR stream is carried on 8 channels to the codec. Add a fallback in the CA selection so that the CA channel count at least matches the stream channel count, even if the stream contains channels not present in the sink speaker descriptor. Thanks to GrimGriefer at OpenELEC forums for discovering that changing the sink speaker mask allowed HBR output. Reported-by: GrimGriefer Reported-by: Ashecrow Reported-by: Frank Zafka <kafkaesque1978@gmail.com> Reported-by: Peter Frühberger <fritsch@xbmc.org> Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
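The selection policy then looks roughly like the sketch below, shown over a small made-up allocation table (the real driver walks its CEA-861 channel allocation table): prefer an allocation whose speakers are all present on the sink, fall back to one that merely matches the channel count, and only then settle for CA 0.

#include <stdint.h>

struct cea_channel_alloc {
        uint8_t ca;             /* CA index sent in the audio infoframe */
        uint8_t channels;       /* number of channels it carries */
        uint16_t speaker_mask;  /* speakers it assumes on the sink */
};

/* Tiny made-up table, just to show the shape of the search. */
static const struct cea_channel_alloc allocs[] = {
        { 0x00, 2, 0x0003 },
        { 0x0b, 6, 0x003f },
        { 0x13, 8, 0x00ff },
};

static int pick_ca(int stream_channels, uint16_t sink_speaker_mask)
{
        /* First choice: right channel count and every speaker present. */
        for (unsigned i = 0; i < sizeof(allocs) / sizeof(allocs[0]); i++)
                if (allocs[i].channels == stream_channels &&
                    (allocs[i].speaker_mask & ~sink_speaker_mask) == 0)
                        return allocs[i].ca;

        /* Fallback added by this change: at least match the channel count,
         * even if the sink did not report all of those speakers. */
        for (unsigned i = 0; i < sizeof(allocs) / sizeof(allocs[0]); i++)
                if (allocs[i].channels == stream_channels)
                        return allocs[i].ca;

        return 0;       /* last resort: CA 0 (stereo) */
}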
2013-09-02  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()  (Joe Perches)
Use the convenience function instead of __GFP_ZERO. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f58599ae1a8d7b32d37e9cf283e95fba6452f7f6.1377809875.git.joe@perches.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  perf: Export struct perf_branch_entry to userspace  (Vince Weaver)
If PERF_SAMPLE_BRANCH_STACK is enabled then samples are returned with the format { u64 from, to, flags } but the flags layout is not specified. This field has the type struct perf_branch_entry; move this definition into include/uapi/linux/perf_event.h so users can access these fields. This is similar to the existing inclusion of perf_mem_data_src in the include/uapi/linux/perf_event.h file. Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308231544420.1889@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
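For reference, the exported structure looks roughly like this in kernels of this vintage; the layout below is reproduced from memory, so treat the exact bit-field names and widths as an approximation and consult include/uapi/linux/perf_event.h for the authoritative definition.

/* Approximate layout; the kernel header uses __u64 bit-fields. */
struct perf_branch_entry {
        unsigned long long from;        /* branch source address */
        unsigned long long to;          /* branch target address */
        unsigned long long mispred:1,   /* branch was mispredicted */
                           predicted:1, /* branch was predicted correctly */
                           in_tx:1,     /* inside a HW transaction (TSX) */
                           abort:1,     /* transaction abort */
                           reserved:60;
};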
2013-09-02  perf: Add attr->mmap2 attribute to an event  (Stephane Eranian)
Adds a new PERF_RECORD_MMAP2 record type which is in essence an expanded version of PERF_RECORD_MMAP. It is used to request mmap records with more information about the mapping, including device major, minor and the inode number and generation for mappings associated with files or shared memory segments. Works for code and data (with attr->mmap_data set). The existing PERF_RECORD_MMAP record is unmodified by this patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/1377079825-19057-2-git-send-email-eranian@google.com [ Added Al to the Cc:. Are the ino, maj/min exports of vma->vm_file OK? ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
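Pieced together from the description above (and from memory for the fields shared with PERF_RECORD_MMAP), the new record body is roughly the following; the exact field order is an assumption, and later kernels extended the record further.

/* Approximate PERF_RECORD_MMAP2 payload as introduced here; check the
 * documentation comment in include/uapi/linux/perf_event.h. */
struct mmap2_record_body {
        unsigned int pid, tid;
        unsigned long long addr;                /* start of the mapping */
        unsigned long long len;
        unsigned long long pgoff;               /* file offset */
        unsigned int maj, min;                  /* backing device */
        unsigned long long ino;                 /* inode number */
        unsigned long long ino_generation;      /* inode generation */
        char filename[];                        /* NUL-terminated, padded */
};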
2013-09-02  perf/x86: Add Silvermont (22nm Atom) support  (Yan, Zheng)
Compared to the old Atom, Silvermont has offcore response support and more events that support PEBS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X  (Yan, Zheng)
Silvermont (22nm Atom) has two offcore response configuration MSRs; unlike other Intel CPUs, its event code for MSR_OFFCORE_RSP_1 is 0x02b7. To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X, so intel_fixup_er() can find the event code for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Fix the sd_parent_degenerate() code  (Peter Zijlstra)
I found that on my WSM box I had a redundant domain: [ 0.949769] CPU0 attaching sched-domain: [ 0.953765] domain 0: span 0,12 level SIBLING [ 0.958335] groups: 0 (cpu_power = 587) 12 (cpu_power = 588) [ 0.964548] domain 1: span 0-5,12-17 level MC [ 0.969206] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176) [ 0.984993] domain 2: span 0-5,12-17 level CPU [ 0.989822] groups: 0-5,12-17 (cpu_power = 7055) [ 0.995049] domain 3: span 0-23 level NUMA [ 0.999620] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056) Note how domain 2 has only a single group and spans the same CPUs as domain 1. We should not keep such domains and do in fact have code to prune these. It turns out that the 'new' SD_PREFER_SIBLING flag causes this: it makes sd_parent_degenerate() fail on the CPU domain. We can easily fix this by 'ignoring' the SD_PREFER_SIBLING bit and transferring it to whatever domain ends up covering the span. With this patch the domains now look like this: [ 0.950419] CPU0 attaching sched-domain: [ 0.954454] domain 0: span 0,12 level SIBLING [ 0.959039] groups: 0 (cpu_power = 587) 12 (cpu_power = 588) [ 0.965271] domain 1: span 0-5,12-17 level MC [ 0.969936] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176) [ 0.985737] domain 2: span 0-23 level NUMA [ 0.990231] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056) Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-ys201g4jwukj0h8xcamakxq1@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Rework and comment the group_imb code  (Peter Zijlstra)
Rik reported some weirdness due to the group_imb code. As a start to looking at it, clean it up a little and add a few explanatory comments. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-caeeqttnla4wrrmhp5uf89gp@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Optimize find_busiest_queue()  (Peter Zijlstra)
Use for_each_cpu_and() and thereby avoid computing the capacity for CPUs we know we're not interested in. Reviewed-by: Paul Turner <pjt@google.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-lppceyv6kb3a19g8spmrn20b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Make group power more consistent  (Peter Zijlstra)
For easier access, less dereferences and more consistent value, store the group power in update_sg_lb_stats() and use it thereafter. The actual value in sched_group::sched_group_power::power can change throughout the load-balance pass if we're unlucky. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-739xxqkyvftrhnh9ncudutc7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Remove duplicate load_per_task computations  (Peter Zijlstra)
Since we already compute (but don't store) the sgs load_per_task value in update_sg_lb_stats() we might as well store it and not re-compute it later on. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-ym1vmljiwbzgdnnrwp9azftq@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched/fair: Shrink sg_lb_stats and play memset games  (Peter Zijlstra)
We can shrink sg_lb_stats because rq::nr_running is an unsigned int and cpu numbers are 'int'. Before: sgs: /* size: 72, cachelines: 2, members: 10 */ sds: /* size: 184, cachelines: 3, members: 7 */ After: sgs: /* size: 56, cachelines: 1, members: 10 */ sds: /* size: 152, cachelines: 3, members: 7 */ Further, we can avoid clearing all of sds since we do a total clear/assignment of sg_stats in update_sg_lb_stats() with the exception of busiest_stat.avg_load, which is referenced in update_sd_pick_busiest(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-0klzmz9okll8wc0nsudguc9p@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched: Clean-up struct sd_lb_stat  (Joonsoo Kim)
There is no reason to maintain separate variables for this_group and busiest_group in sd_lb_stat, except saving some space. But this structure is always allocated on the stack, so this saving isn't really beneficial [peterz: reducing stack space is good; in this case readability increases enough that I think its still beneficial] This patch unifies these variables, so, IMO, readability may be improved. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ Rename this to local -- avoids confusion between this_cpu and the C++ this pointer. ] Reviewed-by: Paul Turner <pjt@google.com> [ Lots of style edits, a few fixes and a rename. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-4-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched: Factor out code to should_we_balance()  (Joonsoo Kim)
Now checking whether this cpu is appropriate to balance or not is embedded into update_sg_lb_stats() and this checking has no direct relationship to that function. There is not enough reason to place this checking in update_sg_lb_stats(), except saving one iteration over sched_group_cpus. In this patch, I factor out this checking into a should_we_balance() function. And before doing the actual work for load balancing, check whether this cpu is appropriate to balance via should_we_balance(). If this cpu is not a candidate for balancing, it quits the work immediately. With this change, we can save two memset costs and can expect better compiler optimization. Below is the result of this patch. * Vanilla * text data bss dec hex filename 34499 1136 116 35751 8ba7 kernel/sched/fair.o * Patched * text data bss dec hex filename 34243 1136 116 35495 8aa7 kernel/sched/fair.o In addition, rename @balance to @continue_balancing in order to represent its purpose more clearly. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ s/should_balance/continue_balancing/g ] Reviewed-by: Paul Turner <pjt@google.com> [ Made style changes and a fix in should_we_balance(). ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-3-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  sched: Remove one division operation in find_busiest_queue()  (Joonsoo Kim)
Remove one division operation in find_busiest_queue() by using crosswise multiplication: wl_i / power_i > wl_j / power_j := wl_i * power_j > wl_j * power_i Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ Expanded the changelog. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-2-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
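Spelled out as a standalone predicate: multiplying both sides by power_i * power_j removes the divisions and, as a side effect, compares the exact ratios rather than their truncated integer quotients.

#include <stdbool.h>

/* wl_i / power_i > wl_j / power_j, rewritten without division. */
static bool queue_i_is_busier(unsigned long long wl_i, unsigned long long power_i,
                              unsigned long long wl_j, unsigned long long power_j)
{
        return wl_i * power_j > wl_j * power_i;
}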
2013-09-02  perf: Prevent race in unthrottling code  (Jiri Olsa)
The current throttling code triggers the WARN below via the following workload (only hit on an AMD machine with 48 CPUs): # while [ 1 ]; do perf record perf bench sched messaging; done WARNING: at arch/x86/kernel/cpu/perf_event.c:1054 x86_pmu_start+0xc6/0x100() SNIP Call Trace: <IRQ> [<ffffffff815f62d6>] dump_stack+0x19/0x1b [<ffffffff8105f531>] warn_slowpath_common+0x61/0x80 [<ffffffff8105f60a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810213a6>] x86_pmu_start+0xc6/0x100 [<ffffffff81129dd2>] perf_adjust_freq_unthr_context.part.75+0x182/0x1a0 [<ffffffff8112a058>] perf_event_task_tick+0xc8/0xf0 [<ffffffff81093221>] scheduler_tick+0xd1/0x140 [<ffffffff81070176>] update_process_times+0x66/0x80 [<ffffffff810b9565>] tick_sched_handle.isra.15+0x25/0x60 [<ffffffff810b95e1>] tick_sched_timer+0x41/0x60 [<ffffffff81087c24>] __run_hrtimer+0x74/0x1d0 [<ffffffff810b95a0>] ? tick_sched_handle.isra.15+0x60/0x60 [<ffffffff81088407>] hrtimer_interrupt+0xf7/0x240 [<ffffffff81606829>] smp_apic_timer_interrupt+0x69/0x9c [<ffffffff8160569d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff81129f74>] ? __perf_event_task_sched_in+0x184/0x1a0 [<ffffffff814dd937>] ? kfree_skbmem+0x37/0x90 [<ffffffff815f2c47>] ? __slab_free+0x1ac/0x30f [<ffffffff8118143d>] ? kfree+0xfd/0x130 [<ffffffff81181622>] kmem_cache_free+0x1b2/0x1d0 [<ffffffff814dd937>] kfree_skbmem+0x37/0x90 [<ffffffff814e03c4>] consume_skb+0x34/0x80 [<ffffffff8158b057>] unix_stream_recvmsg+0x4e7/0x820 [<ffffffff814d5546>] sock_aio_read.part.7+0x116/0x130 [<ffffffff8112c10c>] ? __perf_sw_event+0x19c/0x1e0 [<ffffffff814d5581>] sock_aio_read+0x21/0x30 [<ffffffff8119a5d0>] do_sync_read+0x80/0xb0 [<ffffffff8119ac85>] vfs_read+0x145/0x170 [<ffffffff8119b699>] SyS_read+0x49/0xa0 [<ffffffff810df516>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff81604a19>] system_call_fastpath+0x16/0x1b ---[ end trace 622b7e226c4a766a ]--- The reason is a race in the perf_event_task_tick() throttling code. The race flow (simplified code): - perf_throttled_count is a per-cpu variable and is the CPU throttling flag, here starting with 0 - perf_throttled_seq is the sequence/domain for the allowed count of interrupts within the tick, gets increased each tick on a single CPU (CPU-bound event): ... workload perf_event_task_tick: | | T0 inc(perf_throttled_seq) | T1 needs_unthr = xchg(perf_throttled_count, 0) == 0 tick gets interrupted: ... event gets throttled under new seq ... T2 last NMI comes, event is throttled - inc(perf_throttled_count) back to tick: | perf_adjust_freq_unthr_context: | | T3 unthrottling is skipped for event (needs_unthr == 0) | T4 event is stopped and started via freq adjustment | tick ends ... workload ... no sample is hit for event ... perf_event_task_tick: | | T5 needs_unthr = xchg(perf_throttled_count, 0) != 0 (from T2) | T6 unthrottling is done on event (interrupts == MAX_INTERRUPTS) | event is already started (from T4) -> WARN Fix this by not checking needs_unthr again and thus checking all events for unthrottling.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1377355554-8934-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02  drm/radeon: support render nodes  (Christian König)
Enable support for drm render nodes for radeon by flagging the ioctls that are safe and just needed for rendering. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-02  drm/nouveau: Support render nodes  (Martin Peres)
Enable support for drm render nodes for nouveau by flagging the ioctls that are safe and just needed for rendering. Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Martin Peres <martin.peres@labri.fr> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-02  drm/i915: Support render nodes  (Kristian Høgsberg)
Enable support for drm render nodes for i915 by flagging the ioctls that are safe and just needed for rendering. v2: mark reg_read, set_caching and get_caching (ickle, danvet) Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-02  drm: fix DRM_IOCTL_MODE_GETFB handle-leak  (David Herrmann)
DRM_IOCTL_MODE_GETFB is used to retrieve information about a given framebuffer ID. It is a read-only helper and was thus declassified for unprivileged access in: commit a14b1b42477c5ef089fcda88cbaae50d979eb8f9 Author: Mandeep Singh Baines <mandeep.baines@gmail.com> Date: Fri Jan 20 12:11:16 2012 -0800 drm: remove master fd restriction on mode setting getters However, alongside width, height and stride information, DRM_IOCTL_MODE_GETFB also passes back a handle to the underlying buffer of the framebuffer. This handle allows users to mmap() it and read or write into it. Obviously, this should be restricted to DRM-Master. With the current setup, *any* process with access to /dev/dri/card0 (which means any process with access to hardware-accelerated rendering) can access the current screen framebuffer and modify it ad libitum. For backwards-compatibility reasons we want to keep the DRM_IOCTL_MODE_GETFB call unprivileged. Besides, it provides quite useful information regarding screen setup. So we simply test whether the caller is the current DRM-Master and if not, we return 0 as handle, which is always invalid. A following DRM_IOCTL_GEM_CLOSE on this handle will fail with EINVAL, but we accept this. Users shouldn't test for errors during GEM_CLOSE, anyway. And it is still better than a failing MODE_GETFB call. v2: add capable(CAP_SYS_ADMIN) check for compatibility with i-g-t Cc: <stable@vger.kernel.org> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dave Airlie <airlied@redhat.com>
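Boiled down to a freestanding model with invented names (the real check lives in the drm_mode_getfb ioctl handler): geometry is reported to everyone, but only the current DRM master, or a CAP_SYS_ADMIN caller, gets a usable buffer handle back.

#include <stdbool.h>
#include <stdint.h>

struct getfb_reply {
        uint32_t width, height, pitch;
        uint32_t handle;        /* 0 == invalid for non-privileged callers */
};

static void fill_getfb_reply(struct getfb_reply *r, bool is_master, bool is_admin,
                             uint32_t width, uint32_t height, uint32_t pitch,
                             uint32_t real_handle)
{
        r->width = width;
        r->height = height;
        r->pitch = pitch;
        /* Only the master (or an admin, for test-suite compatibility) may
         * receive a handle it could later mmap(). */
        r->handle = (is_master || is_admin) ? real_handle : 0;
}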
2013-09-02  drm/msm: convert to drm_bridge  (Rob Clark)
Drop the msm_connector base class, and special calls to base class methods from the encoder, and use instead drm_bridge. This allows for a cleaner division between the hdmi (and in future dsi) blocks, from the mdp block. Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-02  drm: Add drm_bridge  (Sean Paul)
This patch adds the notion of a drm_bridge. A bridge is a chained device which hangs off an encoder. The drm driver using the bridge should provide the association between encoder and bridge. Once a bridge is associated with an encoder, it will participate in mode set, and dpms (via the enable/disable hooks). Signed-off-by: Sean Paul <seanpaul@chromium.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>