git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2013-04-26	ARM: s3c64xx: cpuidle: use init/exit common routine	Daniel Lezcano
	Remove the duplicated code and use the cpuidle common code for initialization. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-26	NFC: Move LLCP code to the NFC top level diirectory	Samuel Ortiz
	And stop making it optional. LLCP is a fundamental part of the NFC specifications and making it optional does not make much sense. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-26	sched: Fix init NOHZ_IDLE flag	Vincent Guittot
	On my SMP platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle - which makes the scheduler unhappy. The root cause is: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. This condition can be ensured by adding a synchronize_rcu() between the destruction of old sched_domains and the creation of new ones so the NOHZ_IDLE flag will not be updated with old sched_domain once it has been initialized. But this solution introduces a additionnal latency in the rebuild sequence that is called during cpu hotplug. As suggested by Frederic Weisbecker, another solution is to have the same rcu lifecycle for both NOHZ_IDLE and sched_domain struct. A new nohz_idle field is added to sched_domain so both status and sched_domain will share the same RCU lifecycle and will be always synchronized. In addition, there is no more need to protect nohz_idle against concurrent access as it is only modified by 2 exclusive functions called by local cpu. This solution has been prefered to the creation of a new struct with an extra pointer indirection for sched_domain. The synchronization is done at the cost of : - An additional indirection and a rcu_dereference for accessing nohz_idle. - We use only the nohz_idle field of the top sched_domain. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linaro-kernel@lists.linaro.org Cc: peterz@infradead.org Cc: fweisbec@gmail.com Cc: pjt@google.com Cc: rostedt@goodmis.org Cc: efault@gmx.de Link: http://lkml.kernel.org/r/1366729142-14662-1-git-send-email-vincent.guittot@linaro.org [ Fixed !NO_HZ build bug. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26	ARM: 7700/2: Make cpu_init() notrace	Jon Medhurst
	On resume from CPU power down any trace hooks enabled in cpu_init() will get called before that function has done set_my_cpu_offset(), so any use of per-cpu variables by trace hook code will cause bad things to happen. Prevent this by marking the function notrace. This fixes lockups/crashes seen when enabling function tracer on TC2 with the not yet mainlined cpuidle driver. Signed-off-by: Jon Medhurst <tixy@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-04-26	mfd: ab8500: Export ab8500_gpadc_sw_hw_convert properly	Arnd Bergmann
	Apparently the ab8500_gpadc_sw_hw_convert function got renamed from ab8500_gpadc_convert to ab8500_gpadc_sw_hw_convert in commit 734823462 "mfd: ab8500-gpadc: Add gpadc hw conversion", but the export for this function did not get changed at the same time, causing this allyesconfig error: ERROR: "ab8500_gpadc_sw_hw_convert" [drivers/hwmon/ab8500.ko] undefined! This patch fixes the export. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: M'boumba Cedric Madianga <cedric.madianga@stericsson.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Mattias WALLIN <mattias.wallin@stericsson.com> Cc: Michel JAOUEN <michel.jaouen@stericsson.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-26	GFS2: Flush work queue before clearing glock hash tables	Bob Peterson
	There was a timing window when a GFS2 file system was unmounted that caused GFS2 to call BUG() and panic the kernel. The call to BUG() is meant to ensure that the glock reference count, gl_ref, never gets down to zero and bounce back up again. What was happening during umount is that function gfs2_put_super was dequeing its glocks for well-known files. In particular, we saw it on the journal glock, sd_jinode_gh. The dequeue caused delayed work to be queued for the glock state machine, to transition the lock to an "unlocked" state. While the work was still queued, gfs2_put_super called gfs2_gl_hash_clear to clear out the glock hash tables. If the timing was just so, the glock work function would drop the reference count at the time when it was being checked for zero, and that caused BUG() to be called. This patch calls flush_workqueue before clearing the glock hash tables, thereby ensuring that the delayed work is executed before the hash tables are cleared, and therefore the reference count never goes to zero until the glock is cleared. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-26	nohz: Reduce overhead under high-freq idling patterns	Ingo Molnar
	One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle routine, which apparently can break out of its idle loop rather frequently, with high frequency. In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant context switching, because tick_nohz_stop_sched_tick() will, if delta_jiffies == 0, mis-identify this as a timer event - activating the TIMER_SOFTIRQ, which wakes up ksoftirqd. Fix this by treating delta_jiffies == 0 the same way we treat other short wakeups, delta_jiffies == 1. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26	perf/x86/intel/P4: Robistify P4 PMU types	Ingo Molnar
	Linus found, while extending integer type extension checks in the sparse static code checker, various fragile patterns of mixed signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c. The relevant hardware register ABI is 64 bit wide on 32-bit kernels as well, so clean it all up a bit, remove unnecessary casts, and make sure we use 64-bit unsigned integers in these places. [ Unfortunately this patch was not tested on real P4 hardware, those are pretty rare already. If this patch causes any problems on P4 hardware then please holler ... ] Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26	s390/pci: use pci_scan_root_bus	Sebastian Ott
	The pci config space accessors on s390 are (now) smart enough to figure out if a pci function is available. So instead of calling pci_create_root_bus and then pci_scan_single_device for each available function just call pci_scan_root_bus and let the pci core do the scanning (via config reads on all possible functions) and device creation. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390/scm_blk: fix memleak in init function	Sebastian Ott
	If the allocation of a single request fails the already allocated requests will not be freed. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390/scm_blk: allow more cluster size values	Sebastian Ott
	Allow 0 and powers of 2 between 2 and 128 for write_cluster_size. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390/cio: fix irq statistics	Sebastian Ott
	When we fetch an interrupt on the CCW console using tsch (via ccw_device_wait_idle formerly known as wait_cons_dev) we increment the irq count for the affected interruption class but it is not accounted as an IO interrupt. This is broken since commit b603d258a43b4e7339660bdd3b5c25eacd603e54 "s390: remove superfluous tpi from wait_cons_dev" Fix it so that the sum of the individual interrupts per class matches the number of IO interrupts again. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390/memory hotplug: prevent offline of active memory increments	Heiko Carstens
	In case a machine supports memory hotplug all active memory increments present at IPL time have been initialized with a "usecount" of 1. This is wrong if the memory increment size is larger than the memory section size of the memory hotplug code. If that is the case the usecount must be initialized with the number of memory sections that fit into one memory increment. Otherwise it is possible to put a memory increment into standby state even if there are still active sections. Afterwards addressing exceptions might happen which cause the kernel to panic. However even worse, if a memory increment was put into standby state and afterwards into active state again, it's contents would have been zeroed, leading to memory corruption. This was only an issue for machines that support standby memory and have at least 256GB memory. This is broken since commit fdb1bb15 "[S390] sclp/memory hotplug: fix initial usecount of increments". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390: remove small stack config option	Heiko Carstens
	We've seen repeatedly that 8KB stack size on 64 bit kernels is not sufficient. So simply remove the config option. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390: system call path micro optimization	Martin Schwidefsky
	Add a pointer to the system call table to the thread_info structure. The TIF_31BIT bit is set or cleared by SET_PERSONALITY exactly once for the lifetime of a process. With the pointer to the correct system call table in thread_info the system call code in entry64.S path can drop the check for TIF_31BIT which saves a couple of instructions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	s390: lowcore stack pointer offsets	Martin Schwidefsky
	Store the stack pointers in the lowcore for the kernel stack, the async stack and the panic stack with the offset required for the first user. This avoids an unnecessary add instruction on the system call path. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26	gpio: grgpio: Add irq support	Andreas Larsson
	The drivers sets up an irq domain and hands out unique irqs to irq capable gpio lines regardless of how underlying irq maps to gpio lines. Any gpio line can map to any one or none of the irqs of the core, independently of each other. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-04-26	gpio: grgpio: Add device driver for GRGPIO cores	Andreas Larsson
	This driver supports GRGPIO gpio cores available in the GRLIB VHDL IP core library from Aeroflex Gaisler. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-04-26	lockdep: Consolidate bug messages into a single print_lockdep_off() function	Dave Jones
	Also add some missing printk levels. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130425174002.GA26769@redhat.com [ Tweaked the messages a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26	lockdep: Print out additional debugging advice when we hit lockdep BUGs	Dave Jones
	We occasionally get reports of these BUGs being hit, and the stack trace doesn't necessarily always tell us what we need to know about why we are hitting those limits. If users start attaching /proc/lock_stats to reports we may have more of a clue what's going on. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130423163403.GA12839@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26	Merge branch '3.10/fb-mmap' into for-next	Tomi Valkeinen
	Merge topic branch to get vm_iomap_memory into use. Conflicts: drivers/video/fbmon.c
2013-04-26	powerpc/perf: Enable branch stack sampling framework	Anshuman Khandual
	Provides basic enablement for perf branch stack sampling framework on POWER8 processor based platforms. Adds new BHRB related elements into cpu_hw_event structure to represent current BHRB config, BHRB filter configuration, manage context and to hold output BHRB buffer during PMU interrupt before passing to the user space. This also enables processing of BHRB data and converts them into generic perf branch stack data format. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Define BHRB generic functions, data and flags for POWER8	Anshuman Khandual
	This patch populates BHRB specific data for power_pmu structure. It also implements POWER8 specific BHRB filter and configuration functions. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add new BHRB related generic functions, data and flags	Anshuman Khandual
	This patch adds couple of generic functions to power_pmu structure which would configure the BHRB and it's filters. It also adds representation of the number of BHRB entries present on the PMU. A new PMU flag PPMU_BHRB would indicate presence of BHRB feature. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add basic assembly code to read BHRB entries on POWER8	Anshuman Khandual
	This patch adds the basic assembly code to read BHRB buffer. BHRB entries are valid only after a PMU interrupt has happened (when MMCR0[PMAO]=1) and BHRB has been freezed. BHRB read should not be attempted when it is still enabled (MMCR0[PMAE]=1) and getting updated, as this can produce non-deterministic results. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add new BHRB related instructions for POWER8	Anshuman Khandual
	This patch adds new POWER8 instruction encoding for reading and clearing Branch History Rolling Buffer entries. The new instruction 'mfbhrbe' (move from branch history rolling buffer entry) is used to read BHRB buffer entries and instruction 'clrbhrb' (clear branch history rolling buffer) is used to clear the entire buffer. The instruction 'clrbhrb' has straight forward encoding. But the instruction encoding format for reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it takes two arguments, i.e the index for the BHRB buffer entry to read and a general purpose register to put the value which was read from the buffer entry. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Power8 PMU support	Michael Ellerman
	This patch adds support for the power8 PMU to perf. Work is ongoing to add generic cache events. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add support for SIER	Michael Ellerman
	On power8 we have a new SIER (Sampled Instruction Event Register), which captures information about instructions when we have random sampling enabled. Add support for loading the SIER into pt_regs, overloading regs->dar. Also set the new NO_SIPR flag in regs->result if we don't have SIPR. Update regs_sihv/sipr() to look for SIPR/SIHV in SIER. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add regs_no_sipr()	Michael Ellerman
	On power8 the presence or absence of SIPR depends on settings at runtime, so convert to using a dynamic flag for NO_SIPR. Existing backends that set NO_SIPR unconditionally set the dynamic flag obviously. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add an accessor for regs->result	Michael Ellerman
	Add an accessor for regs->result so we can use it to store more flags in future. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv()	Michael Ellerman
	On power8 the SIPR and SIHV are not in MMCRA, so convert the routines to take regs and change the names accordingly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/perf: Add an explict flag indicating presence of SLOT field	Michael Ellerman
	In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust the reported IP of a sampled instruction. Currently the logic is written so that if the backend does NOT have the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists. However on power8 we do not want to set ALT_SIPR (it's in a third location), and we also do not have MMCRA[SLOT]. So add a new flag which only indicates whether MMCRA[SLOT] exists. Naively we'd set it on everything except power6/7, because they set ALT_SIPR, and we've reversed the polarity of the flag. But it's more complicated than that. mpc7450 is 32-bit, and uses its own version of perf_ip_adjust() which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and the behaviour is unchanged. PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have the new flag set. This is a behaviour change on those cpus, though we were probably getting lucky and the bits in question were 0. power5 and power5+ set the new flag, behaviour unchanged. power6 & power7 do not set the new flag, behaviour unchanged. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc: Initialise PMU related regs on Power8	Michael Ellerman
	For both HV and guest kernels, intialise PMU regs to something sane. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: Fix invalid IOMMU table	Gavin Shan
	Ben found the root cause. Commit 37f02195bee9c25ce44e25204f40b7961a6d7c9d ("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform") overwrites the IOMMU table of PCI device while enabling PCI device. The patch intends to fix the IOMMU table after that point. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: Build DMA space for PE on PHB3	Gavin Shan
	The patch intends to build 32-bits DMA space for individual PEs on PHB3. The TVE# is recognized by the combo of PE# and fixed bits from DMA address, which is zero for 32-bits DMA space. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: TCE invalidation for PHB3	Gavin Shan
	The TCE should be invalidated while it's created or free'd. The approach to do that for IODA1 and IODA2 compliant PHBs are different. So the patch differentiate them with different functions called to do that for IODA1 and IODA2 compliant PHBs. It's notable that the PCI address is used to invalidate the corresponding TCE on IODA2 compliant PHB3. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: Patch MSI EOI handler on P8	Gavin Shan
	The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional steps to handle the P/Q bits in IVE before EOIing the corresponding interrupt. The patch changes the EOI handler to cover that. we have individual IRQ chip in each PHB instance. During the MSI IRQ setup time, the IRQ chip is copied over from the original one for that IRQ, and the EOI handler is patched with the one that will handle the P/Q bits (As Ben suggested). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: Add option CONFIG_POWERNV_MSI	Gavin Shan
	As Michael Ellerman suggested, to add CONFIG_POWERNV_MSI for PowerNV platform. That's similar to CONFIG_PSERIES_MSI for pSeries platform. For now, we don't make it dependent on CONFIG_EEH since it's not ready to enable that yet. Apart from that, we also enable CONFIG_PPC_MSI_BITMAP on selecting CONFIG_POWERNV_MSI. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/powernv: Supports PHB3	Gavin Shan
	The patch intends to initialize PHB3 during system boot stage. The flag "PNV_PHB_MODEL_PHB3" is introduced to differentiate IODA2 compatible PHB3 from other types of PHBs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc: Fix "attempt to move .org backwards" error	Paul Mackerras
	Building a 64-bit powerpc kernel with PR KVM enabled currently gives this error: AS arch/powerpc/kernel/head_64.o arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1 This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into 33 instructions, but we only have space for 32 at the decrementer interrupt vector (from 0x900 to 0x980). In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we currently have two instances of the HMT_MEDIUM macro, which has the effect of setting the SMT thread priority to medium. One is the first instruction, and is overwritten by a no-op on processors where we save the PPR (processor priority register), that is, POWER7 or later. The other is after we have saved the PPR. In order to reduce the code at 0x900 by one instruction, we omit the first HMT_MEDIUM. On processors without SMT this will have no effect since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this will mean that the first few instructions take a little longer in the case where a decrementer interrupt occurs when the hardware thread is running at low SMT priority. On POWER6 and later machines, the hardware automatically boosts the thread priority when a decrementer interrupt is taken if the thread priority was below medium, so this change won't make any difference. The alternative would be to branch out of line after saving the CFAR. However, that would incur an extra overhead on all processors, whereas the approach adopted here only adds overhead on older threaded processors. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Add /proc interface to control topology updates	Nathan Fontenot
	There are instances in which we do not want topology updates to occur. In order to allow this a /proc interface (/proc/powerpc/topology_updates) is introduced so that topology updates can be enabled and disabled. This patch also adds a prrn_is_enabled() call so that PRRN events are handled in the kernel only if topology updating is enabled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Enable PRRN handling	Nathan Fontenot
	The Linux kernel and platform firmware negotiate their mutual support of the PRRN option via the ibm,client-architecture-support interface. This patch simply sets the appropriate fields in the client architecture vector to indicate Linux support for PRRN and will allow the firmware to report PRRN events via the RTAS event-scan mechanism. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: RE-enable Virtual Processor Home Node updating	Jesse Larrew
	The new PRRN firmware feature provides a more convenient and event-driven interface than VPHN for notifying Linux of changes to the NUMA affinity of platform resources. However, for practical reasons, it may not be feasible for some customers to update to the latest firmware. For these customers, the VPHN feature supported on previous firmware versions may still be the best option. The VPHN feature was previously disabled due to races with the load balancing code when accessing the NUMA cpu maps, but the new stop_machine() approach protects the NUMA cpu maps from these concurrent accesses. It should be safe to re-enable this feature now. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Update NUMA VDSO information when updating CPU maps	Jesse Larrew
	The following patch adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3: Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3. This patch ensures that this information is also updated when the NUMA affinity of a cpu changes. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Use stop machine to update cpu maps	Nathan Fontenot
	The new PRRN firmware feature allows CPU and memory resources to be transparently reassigned across NUMA boundaries. When this happens, the kernel must update the node maps to reflect the new affinity information. Although the NUMA maps can be protected by locking primitives during the update itself, this is insufficient to prevent concurrent accesses to these structures. Since cpumask_of_node() hands out a pointer to these structures, they can still be modified outside of the lock. Furthermore, tracking down each usage of these pointers and adding locks would be quite invasive and difficult to maintain. The approach used is to make a list of affected cpus and call stop_machine to have the update routine run on each of the affected cpus allowing them to update themselves. Each cpu finds itself in the list of cpus and makes the appropriate updates. We need to have each cpu do this for themselves to handle calls to vdso_getcpu_init() added in a subsequent patch. Situations like these are best handled using stop_machine(). Since the NUMA affinity updates are exceptionally rare events, this approach has the benefit of not adding any overhead while accessing the NUMA maps during normal operation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Update CPU maps when device tree is updated	Jesse Larrew
	Platform events such as partition migration or the new PRRN firmware feature can cause the NUMA characteristics of a CPU to change, and these changes will be reflected in the device tree nodes for the affected CPUs. This patch registers a handler for Open Firmware device tree updates and reconfigures the CPU and node maps whenever the associativity changes. Currently, this is accomplished by marking the affected CPUs in the cpu_associativity_changes_mask and allowing arch_update_cpu_topology() to retrieve the new associativity information using hcall_vphn(). Protecting the NUMA cpu maps from concurrent access during an update operation will be addressed in a subsequent patch in this series. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Update numa.c to use updated firmware_has_feature()	Nathan Fontenot
	Update the numa code to use the updated firmware_has_feature() when checking for type 1 affinity. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Update firmware_has_feature() to check architecture vector ↵	Nathan Fontenot
	5 bits The firmware_has_feature() function makes it easy to check for supported features of the hypervisor. This patch extends the capability of firmware_has_feature() to include checking for specified bits in vector 5 of the architecture vector as reported in the device tree. As part of this the #defines used for the architecture vector are re-defined such that each option has the index into vector 5 and the feature bit encoded into it. This makes checking for architecture bits when initiating data for firmware_has_feature much easier. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Use ARRAY_SIZE to iterate over firmware_features_table array	Nathan Fontenot
	When iterating over the entries in firmware_features_table we only need to go over the actual number of entries in the array instead of declaring it to be bigger and checking to make sure there is a valid entry in every slot. This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the array looping with the use of ARRAY_SIZE(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26	powerpc/pseries: Move architecture vector definitions to prom.h	Nathan Fontenot
	As part of handling of PRRN events we need to check vector 5 of the architecture vector bits reported in the device tree to ensure PRRN event handling is enabled. To do this firmware_has_feature() is updated (in a subsequent patch) to make this check vector 5 bits. To avoid having to re-define bits in the architecture vector the bit definitions are moved to prom.h. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>