git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2013-06-21	Merge tag 'renesas-cleanup-boot-for-v3.11' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/soc From Simon Horman: Renesas ARM based SoC boot cleanup for v3.11 Work by Magnus Damm and others to clean up the boot of and move things closer to supporting multi-arch. As a side effect of this work it was decided to remove support for two boards, Bonito and AP4EVB. Those patches are included in this series as they depend on earlier patches in the series. * tag 'renesas-cleanup-boot-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: Remove Bonito board support ARM: shmobile: Remove AP4EVB board support ARM: shmobile: Remove mach/memory.h ARM: shmobile: Remove MEMORY_START/SIZE ARM: shmobile: Enable ARM_PATCH_PHYS_VIRT ARM: shmobile: Remove old SCU boot code ARM: shmobile: EMEV2 SMP with SCU boot fn and args ARM: shmobile: sh73a0 SMP with SCU boot fn and args ARM: shmobile: r8a7779 SMP with SCU boot fn and args ARM: shmobile: Add SCU boot function using argument ARM: shmobile: Add SMP boot function and argument ARM: shmobile: Rework sh7372 sleep code to use virt_to_phys() ARM: shmobile: Remove romImage CONFIG_MEMORY_START ARM: shmobile: Let romImage rely on default ATAGS ARM: shmobile: uImage load address rework Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-21	Merge tag 'renesas-cleanup-for-v3.11' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/late From Simon Horman: Renesas ARM based SoC cleanups for v3.11 __initdata annotations for the r8a7790 SoC by Morimoto-san. * tag 'renesas-cleanup-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (158 commits) ARM: shmobile: r8a7790: add __initdata on resource and device data Based on 'renesas-pinmux-for-v3.11' and 'renesas-soc-for-v3.11 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-21	Merge branch 'ux500/cleanup' into next/drivers	Arnd Bergmann
	Patches from Lee Jones: This gets rid of mop500_snowball_ethernet_clock_enable() which is no longer in use. It also straightens out a bug which ensures the SMSC911x's regulator is turned on at start-up when using Device Tree. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-21	ARM: ux500: Remove mop500_snowball_ethernet_clock_enable()	Lee Jones
	mop500_snowball_ethernet_clock_enable() provided a means to enable a clock which was used for the SMSC911x Ethernet device on Snowball. It was merely a stand-in until the driver was common clk compliant. Now that it is, this can be removed for both DT and ATAGs booting. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-21	ARM: ux500: Correct the EN_3v3 regulator's on/off GPIO	Lee Jones
	When this node was added, the AB8500 GPIO driver was pretty broken. As a hack, we pretended that NOMADIK GPIO 26 was the correct on/off pin, as it was unused. It worked because AB8500 GPIO 26 was in an 'always on from boot' state. Now the AB8500 GPIO driver is working, the default state for all the pins is 'off'. Let's flip back over to use the correct GPIO which is _actually_ attached to the regulator. We're also taking the opportunity to straighten out some formatting misdemeanours, swapping spaces for tabs. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-21	ARM: ux500: Provide a AB8500 GPIO Device Tree node	Lee Jones
	Here we're adding a node for the AB8500 GPIO device. This will allow other DT:ed components to obtain GPIOs for use within their drivers. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-21	ALSA: usb: uniform style used in MODULE_SUPPORTED_DEVICE()	Antonio Ospite
	In sound/usb/card.c and sound/usb/misc/ua101.c there are no spaces between the vendor and the device names, use this style in the other drivers too. This also helps keeping consistency when new drivers copies from the ones already in the mainline tree. Signed-off-by: Antonio Ospite <ao2@amarulasolutions.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-06-21	ALSA: snd-usb-6fire: use vmalloc buffers	Antonio Ospite
	For USB devices it's not necessary to allocate physically contiguous buffers. Signed-off-by: Antonio Ospite <ao2@amarulasolutions.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-06-21	ALSA: snd-usb-caiaq: use vmalloc buffers	Antonio Ospite
	For USB devices it's not necessary to allocate physically contiguous buffers. Signed-off-by: Antonio Ospite <ao2@amarulasolutions.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-06-21	ALSA: snd-usb-caiaq: remove the unused snd_card_used variable	Antonio Ospite
	The snd_card_used variable is only read but never written, remove it. Signed-off-by: Antonio Ospite <ao2@amarulasolutions.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-06-21	ALSA: vx_core: off by one in vx_read_status()	Dan Carpenter
	This code is older than git, and I haven't tested it, but if size == SIZE_MAX_STATUS then we would write one space past the end of the rmh->Stat[] array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-06-21	Documentation / CPU hotplug: Rephrase the outdated description for MADT entries	Hanjun Guo
	More than 256 entries in ACPI MADT is supported from ACPI 3.0, so the information in should be Documentation/cpu-hotplug.txt updated. [rjw: Changelog] Suggested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21	tick: Fix tick_broadcast_pending_mask not cleared	Daniel Lezcano
	The recent modification in the cpuidle framework consolidated the timer broadcast code across the different drivers by setting a new flag in the idle state. It tells the cpuidle core code to enter/exit the broadcast mode for the cpu when entering a deep idle state. The broadcast timer enter/exit is no longer handled by the back-end driver. This change made the local interrupt to be enabled before calling CLOCK_EVENT_NOTIFY_EXIT. On a tegra114, a four cores system, when the flag has been introduced in the driver, the following warning appeared: WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15 [<c00667f8>] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [<c0065cd0>] (tick_notify+0x240/0x40c) [<c0065cd0>] (tick_notify+0x240/0x40c) from [<c0044724>] (notifier_call_chain+0x44/0x84) [<c0044724>] (notifier_call_chain+0x44/0x84) from [<c0044828>] (raw_notifier_call_chain+0x18/0x20) [<c0044828>] (raw_notifier_call_chain+0x18/0x20) from [<c00650cc>] (clockevents_notify+0x28/0x170) [<c00650cc>] (clockevents_notify+0x28/0x170) from [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) from [<c000ea94>] (arch_cpu_idle+0x8/0x38) [<c000ea94>] (arch_cpu_idle+0x8/0x38) from [<c005ea80>] (cpu_startup_entry+0x60/0x134) [<c005ea80>] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4) I don't have the hardware, so I wasn't able to reproduce the warning but after looking a while at the code, I deduced the following: 1. the CPU2 enters a deep idle state and sets the broadcast timer 2. the timer expires, the tick_handle_oneshot_broadcast function is called, setting the tick_broadcast_pending_mask and waking up the idle cpu CPU2 3. the CPU2 exits idle handles the interrupt and then invokes tick_broadcast_oneshot_control with CLOCK_EVENT_NOTIFY_EXIT which runs the following code: [...] if (dev->next_event.tv64 == KTIME_MAX) goto out; if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_pending_mask)) goto out; [...] So if there is no next event scheduled for CPU2, we fulfil the first condition and jump out without clearing the tick_broadcast_pending_mask. 4. CPU2 goes to deep idle again and calls tick_broadcast_oneshot_control with CLOCK_NOTIFY_EVENT_ENTER but with the tick_broadcast_pending_mask set for CPU2, triggering the warning. The issue only surfaced due to the modifications of the cpuidle framework, which resulted in interrupts being enabled before the call to the clockevents code. If the call happens before interrupts have been enabled, the warning cannot trigger, because there is still the event pending which caused the broadcast timer expiry. Move the check for the next event below the check for the pending bit, so the pending bit gets cleared whether an event is scheduled on the cpu or not. [ tglx: Massaged changelog ] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: Joseph Lo <josephl@nvidia.com> Cc: Stephen Warren <swarren@nvidia.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-21	Merge tag 'efi-urgent' into x86/urgent	H. Peter Anvin
	* Don't leak random kernel memory to EFI variable NVRAM when attempting to initiate garbage collection. Also, free the kernel memory when we're done with it instead of leaking - Ben Hutchings Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-21	regulators: max8973: fix multiple instance support	Guennadi Liakhovetski
	Currently the max8973 regulator driver uses a single static struct of regulator operations for all chip instances, but can overwrite some of its members depending on configuration. This will affect all other MAX8973 instances on the system. This patch fixes this bug by allocating a separate copy of the struct for each chip instance. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski+renesas@gmail.com> Signed-off-by: Mark Brown <broonie@linaro.org>
2013-06-21	spi/pxa2xx: fix memory corruption due to wrong size used in devm_kzalloc()	Mika Westerberg
	ACPI part of the driver accidentally used sizeof(ssp) instead of the correct sizeof(pdata). This leads to nasty memory corruptions like the one below: BUG: unable to handle kernel paging request at 0000000749fd30b8 IP: [<ffffffff813fe8a1>] __list_del_entry+0x31/0xd0 PGD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 3.10.0-rc6v3.10-rc6_sdhci_modprobe+ #443 task: ffff8801483a0940 ti: ffff88014839e000 task.ti: ffff88014839e000 RIP: 0010:[<ffffffff813fe8a1>] [<ffffffff813fe8a1>] __list_del_entry+0x31/0xd0 RSP: 0000:ffff88014839fde8 EFLAGS: 00010046 RAX: ffff880149fd30b0 RBX: ffff880149fd3040 RCX: dead000000200200 RDX: 0000000749fd30b0 RSI: ffff880149fd3058 RDI: ffff88014834d640 RBP: ffff88014839fde8 R08: ffff88014834d640 R09: 0000000000000001 R10: ffff8801483a0940 R11: 0000000000000001 R12: ffff880149fd3040 R13: ffffffff810e0b30 R14: ffff8801483a0940 R15: ffff88014834d640 FS: 0000000000000000(0000) GS:ffff880149e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000168 CR3: 0000000001e0b000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88014839fe48 ffffffff810e0baf ffffffff81120abd ffff88014839fe20 ffff8801483a0940 ffff8801483a0940 ffff8801483a0940 ffff8801486b1c90 ffff88014834d640 ffffffff810e0b30 0000000000000000 0000000000000000 Call Trace: [<ffffffff810e0baf>] worker_thread+0x7f/0x390 [<ffffffff81120abd>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810e0b30>] ? manage_workers.isra.22+0x2b0/0x2b0 [<ffffffff810e6c09>] kthread+0xd9/0xe0 [<ffffffff810f93df>] ? local_clock+0x3f/0x50 [<ffffffff810e6b30>] ? kthread_create_on_node+0x110/0x110 [<ffffffff818c5dec>] ret_from_fork+0x7c/0xb0 [<ffffffff810e6b30>] ? kthread_create_on_node+0x110/0x110 Fix this by using the right structure size in devm_kzalloc(). Reported-by: Jerome Blin <jerome.blin@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Mark Brown <broonie@linaro.org> Cc: stable@vger.kernel.org # 3.9+
2013-06-21	x86/efi: Fix dummy variable buffer allocation	Ben Hutchings
	1. Check for allocation failure 2. Clear the buffer contents, as they may actually be written to flash 3. Don't leak the buffer Compile-tested only. [ Tested successfully on my buggy ASUS machine - Matt ] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-21	Merge tag 'v3.11-rockchip-basics' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/soc From Heiko Stuebner: Adds basic support for Rockchip Cortex-A9 SoCs. * tag 'v3.11-rockchip-basics' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm: add basic support for Rockchip RK3066a boards arm: add debug uarts for rockchip rk29xx and rk3xxx series arm: Add basic clocks for Rockchip rk3066a SoCs clocksource: dw_apb_timer_of: use clocksource_of_init clocksource: dw_apb_timer_of: select DW_APB_TIMER clocksource: dw_apb_timer_of: add clock-handling clocksource: dw_apb_timer_of: enable the use the clocksource as sched clock Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-21	arm: add basic support for Rockchip RK3066a boards	Heiko Stuebner
	This adds a generic devicetree board file and a dtsi for boards based on the RK3066a SoCs from Rockchip. Apart from the generic parts (gic, clocks, pinctrl) the only components currently supported are the timers, uarts and mmc ports (all DesignWare- based). Signed-off-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Olof Johansson <olof@lixom.net>
2013-06-21	arm: add debug uarts for rockchip rk29xx and rk3xxx series	Heiko Stuebner
	Uarts on all recent Rockchip SoCs are Synopsis DesignWare 8250 types. Only their addresses vary very much. This patch adds the necessary definitions to use any of the uart ports for early debug purposes. Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2013-06-21	arm: Add basic clocks for Rockchip rk3066a SoCs	Heiko Stuebner
	This adds a basic clock setup for rk3066a SoCs. Only the gates are set up currently, as the mux and dividers should use the upcoming generic devicetree bindings. Clocks whose rates need to be known are supplied by fixed-rate "dummy"-clocks that provide the correct rate. This is uncritical insofar that the only bootloader currently in existence for Rockchip devices is the propietary Rockchip one that always setups the clocks in the necessary way. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Mike Turquette <mturquette@linaro.org>
2013-06-21	arm: multi_v7_defconfig: Enable initrd/initramfs support	Soren Brinkmann
	Add CONFIG_BLK_DEV_INITRD to the defconfig to support initramfs and initrd. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2013-06-21	arm: multi_v7_defconfig: Enable Zynq UART driver	Soren Brinkmann
	Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2013-06-21	powerpc: Optimize hugepage invalidate	Aneesh Kumar K.V
	Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/THP: Enable THP on PPC64	Aneesh Kumar K.V
	We enable only if the we support 16MB page size. Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: split hugepage when using subpage protection	Aneesh Kumar K.V
	We find all the overlapping vma and mark them such that we don't allocate hugepage in that range. Also we split existing huge page so that the normal page hash can be invalidated and new page faulted in with new protection bits. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: disable assert_pte_locked for collapse_huge_page	Aneesh Kumar K.V
	With THP we set pmd to none, before we do pte_clear. Hence we can't walk page table to get the pte lock ptr and verify whether it is locked. THP do take pte lock before calling pte_clear. So we don't change the locking rules here. It is that we can't use page table walking to check whether pte locks are held with THP. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: Prevent gcc to re-read the pagetables	Aneesh Kumar K.V
	GCC is very likely to read the pagetables just once and cache them in the local stack or in a register, but it is can also decide to re-read the pagetables. The problem is that the pagetable in those places can change from under gcc. With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can change under gup_fast. The pages won't be freed untill we finish gup fast because we have irq disabled and we free these pages via rcu callback. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: Make linux pagetable walk safe with THP enabled	Aneesh Kumar K.V
	We need to have irqs disabled to handle all the possible parallel update for linux page table without holding locks. Events that we are intersted in while walking page tables are 1) Page fault 2) umap 3) THP split 4) THP collapse A) local_irq_disabled: ------------------------ 1) page fault: A none to valid transition via page fault is not an issue because we would either see a none or valid. If it is none, we would error out the page table walk. We may need to use on stack values when checking for type of page table elements, because if we do if (!is_hugepd()) { if (!pmd_none() { if (pmd_bad() { We could take that bad condition because the pmd got converted to a hugepd after the !is_hugepd check via a hugetlb fault. The right way would be to check for pmd_none higher up or use on stack value. 2) A valid to none conversion via unmap: We can safely walk the upper level table, because we don't remove the the page table entries until rcu grace period. So even if we followed a wrong pointer we still have the pointer valid till the grace period. A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent pte_clear we take _PAGE_BUSY. 3) THP split: A valid transparent hugepage is converted to nomal page. Before we split we do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING So when walking page table we need to check for pmd_trans_splitting and handle that. The pte returned should also need to be checked for _PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save the value of PTE on stack and check for the flag in the local pte value. If we don't have the value set we can safely operate on the local pte value and we atomicaly set _PAGE_BUSY. 4) THP collapse: A normal page gets converted to hugepage. In the collapse path, we mark the pmd none early (pmdp_clear_flush). With irq disabled, if we are aleady walking page table we would see the pmd_none and won't continue. If we see a valid PMD, we should still check for _PAGE_PRESENT before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/THP: Add code to handle HPTE faults for hugepages	Aneesh Kumar K.V
	The deposted PTE page in the second half of the PMD table is used to track the state on hash PTEs. After updating the HPTE, we mark the coresponding slot in the deposted PTE page valid. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: Update gup_pmd_range to handle transparent hugepages	Aneesh Kumar K.V
	Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/kvm: Handle transparent hugepage in KVM	Aneesh Kumar K.V
	We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte	Aneesh Kumar K.V
	Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages	Aneesh Kumar K.V
	Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code	Aneesh Kumar K.V
	We will use this in the later patch for handling THP pages Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/THP: Implement transparent hugepages for ppc64	Aneesh Kumar K.V
	We now have pmd entries covering 16MB range and the PMD table double its original size. We use the second half of the PMD table to deposit the pgtable (PTE page). The depoisted PTE page is further used to track the HPTE information. The information include [ secondary group \| 3 bit hidx \| valid ]. We use one byte per each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need 4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk the PTE page and invalidate all valid HPTEs. This patch implements necessary arch specific functions for THP support and also hugepage invalidate logic. These PMD related functions are intentionally kept similar to their PTE counter-part. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/THP: Double the PMD table size for THP	Aneesh Kumar K.V
	THP code does PTE page allocation along with large page request and deposit them for later use. This is to ensure that we won't have any failures when we split hugepages to regular pages. On powerpc we want to use the deposited PTE page for storing hash pte slot and secondary bit information for the HPTEs. We use the second half of the pmd table to save the deposted PTE page. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/mm: handle hugepage size correctly when invalidating hpte entries	Aneesh Kumar K.V
	If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/eeh: Debugfs for error injection	Gavin Shan
	The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for injecting EEH errors for testing purpose. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/powernv: Debugfs directory for PHB	Gavin Shan
	The patch creates one debugfs directory ("powerpc/PCIxxxx") for each PHB so that we can hook EEH error injection debugfs entry there in proceeding patch. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powerpc/eeh: Register OPAL notifier for PCI error	Gavin Shan
	The patch registers OPAL event notifier and process the PCI errors from firmware. If we have pending PCI errors, special EEH event (without binding PE) will be sent to EEH core for processing. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powernv/opal: Disable OPAL notifier upon poweroff	Gavin Shan
	While we're restarting or powering off the system, we needn't the OPAL notifier any more. So just to disable that. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21	powernv/opal: Notifier for OPAL events	Gavin Shan
	This patch implements a notifier to receive a notification on OPAL event mask changes. The notifier is only called as a result of an OPAL interrupt, which will happen upon reception of FSP messages or PCI errors. Any event mask change detected as a result of opal_poll_events() will not result in a notifier call. [benh: changelog] Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20	x86, trace: Add irq vector tracepoints	Seiji Aguchi
	[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20	x86: Rename variables for debugging	Seiji Aguchi
	Rename variables for debugging to describe meaning of them precisely. Also, introduce a generic way to switch IDT by checking a current state, debug on/off. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20	x86, trace: Introduce entering/exiting_irq()	Seiji Aguchi
	When implementing tracepoints in interrupt handers, if the tracepoints are simply added in the performance sensitive path of interrupt handers, it may cause potential performance problem due to the time penalty. To solve the problem, an idea is to prepare non-trace/trace irq handers and switch their IDTs at the enabling/disabling time. So, let's introduce entering_irq()/exiting_irq() for pre/post- processing of each irq handler. A way to use them is as follows. Non-trace irq handler: smp_irq_handler() { entering_irq(); /* pre-processing of this handler / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / exiting_irq(); / post-processing of this handler / } Trace irq_handler: smp_trace_irq_handler() { entering_irq(); / pre-processing of this handler / trace_irq_entry(); / tracepoint for irq entry / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / trace_irq_exit(); / tracepoint for irq exit / exiting_irq(); / post-processing of this handler */ } If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks cleaner. smp_trace_irq_handler() { trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); } But it doesn't work. The problem is with irq_enter/exit() being called. They must be called before trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any tracepoints are used, as tracepoints use rcu to synchronize. As a possible alternative, we may be able to call irq_enter() first as follows if irq_enter() can nest. smp_trace_irq_hander() { irq_entry(); trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); irq_exit(); } But it doesn't work, either. If irq_enter() is nested, it may have a time penalty because it has to check if it was already called or not. The time penalty is not desired in performance sensitive paths even if it is tiny. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20	tracing: Add DEFINE_EVENT_FN() macro	Steven Rostedt
	Each TRACE_EVENT() adds several helper functions. If two or more trace events share the same structure and print format, they can also share most of these helper functions and save a lot of space from duplicate code. This is why the DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created. Some events require a trigger to be called at registering and unregistering of the event and to do so they use TRACE_EVENT_FN(). If multiple events require a trigger, they currently have no choice but to use TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately causes a lot of wasted duplicate code created. By adding a DEFINE_EVENT_FN(), these events can still use a DECLARE_EVENT_CLASS() and then define their own triggers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/51C3236C.8030508@hds.com Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20	x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S	H. Peter Anvin
	There is no point in using "xorq" to clear a register... use "xorl" to clear the bottom 32 bits, and the upper 32 bits get cleared by virtue of zero extension. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/n/tip-b76zi1gep39c0zs8fbvkhie9@git.kernel.org
2013-06-20	Merge tag 'v3.10-rc6' into x86/cleanups	H. Peter Anvin
	Linux 3.10-rc6 We need a change that is the mainline tree for further work. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20	x86, fpu: Use static_cpu_has_safe before alternatives	Borislav Petkov
	The call stack below shows how this happens: basically eager_fpu_init() calls __thread_fpu_begin(current) which then does if (!use_eager_fpu()), which, in turn, uses static_cpu_has. And we're executing before alternatives so static_cpu_has doesn't work there yet. Use the safe variant in this path which becomes optimal after alternatives have run. WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20() You're using static_cpu_has before alternatives have run! Modules linked in: Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1 Call Trace: warn_slowpath_common warn_slowpath_fmt ? fpu_finit warn_pre_alternatives eager_fpu_init fpu_init cpu_init trap_init start_kernel ? repair_env_string x86_64_start_reservations x86_64_start_kernel Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-6-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>