linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2015-03-23	mmc: dw_mmc: exynos: dw_mci_exynos_prepare_hs400_tuning() can be static	Wu Fengguang
	Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: omap_hsmmc: add hibernation support	Russ Dill
	Setting a dev_pm_ops suspend/resume pair but not a set of hibernation functions means those pm functions will not be called upon hibernation. Fix this by using SET_SYSTEM_SLEEP_PM_OPS, which appropriately assigns the suspend and hibernation handlers and move omap_hsmmc_x callbacks under CONFIG_PM_SLEEP to avoid build warnings. Signed-off-by: Russ Dill <Russ.Dill@ti.com> [Grygorii.Strashko@linaro.org: rebased on top of K4.0] Signed-off-by: Grygorii Strashko <grygorii.strashko@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: omap_hsmmc: use distinctive code paths for cover / card detect logic	Andreas Fenkart
	Mobile phones (some) have no card detect pin, but can detect if the cover is removed. The purpose is the same; detect if card is being added/removed, but the details differ. When the cover is removed, it does not mean the card is gone. But it might, since it is accessible now. It's like a warning. All the driver does is to limit write access to the card, see protect_card flag. In contrast, card detect notifies us after the fact, e.g. card is gone, card is inserted. We can't take precautions, but we can rely on those events, -- the card is really gone, or do scan the card. To summarize there is not much code sharing between cover and card detect, it only increases confusion. By splitting, both will be simplified in a followup patch. Signed-off-by: Andreas Fenkart <afenkart@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: omap_hsmmc: use slot-gpio functions to manage read-only pin directly	Andreas Fenkart
	The indirection via omap_hsmmc_get_ro and omap_hsmmc_get_wp is redundant. Also dropped setting gpio_wp to EINVAL since platform date is read-only Untested: no device with ro pin was available, but change is fairly simple Signed-off-by: Andreas Fenkart <afenkart@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: omap_hsmmc: remove unused fields from struct omap_hsmmc_host	Andreas Fenkart
	addon to: 09108968b7b72b6083a3bfc8f8259a74ed57255e mmc: omap_hsmmc: remove prepare/complete system suspend support Signed-off-by: Andreas Fenkart <afenkart@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sunxi: Use devm_reset_control_get_optional() for reset control	Chen-Yu Tsai
	The reset control for the sunxi mmc controller is optional. Some newer platforms (sun6i, sun8i, sun9i) have it, while older ones (sun4i, sun5i, sun7i) don't. Use the properly stubbed _optional version so the driver does not fail to compile when RESET_CONTROLLER=n. This patch also adds a check for deferred probing on the reset control. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Cc: <stable@vger.kernel.org> # 3.16+ Acked-by: David Lanzendörfer <david.lanzendoerfer@o2s.ch> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci: set the .remove to sdhci_pltfm_unregister()	Kevin Hao
	In these drivers, the driver specific .remove function just a simple wrapper of function sdhci_pltfm_unregister(). So remove these wrappers and just set .remove to sdhci_pltfm_unregister(). Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci: disable the clock in sdhci_pltfm_unregister()	Kevin Hao
	So we can avoid to sprinkle the clk_disable_unprepare() in many drivers. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci-bcm-kona: kill the "external_clk" member in driver private struct	Kevin Hao
	Actually we can use the "clk" in the struct sdhci_pltfm_host. Also change the "external clock" to "core clock" and kill two redundant private functions in this driver as suggested by Ray Jui. Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Ray Jui <rjui@broadcom.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci-sirf: kill the "clk" member in driver private struct	Kevin Hao
	Actually we can use the "clk" in the struct sdhci_pltfm_host. With this change we can also kill the private function for get max clock in this driver. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: tegra: use devm help functions to get the clk and gpio	Kevin Hao
	Simplify the error and remove path. Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci-dove: kill the driver specific private struct	Kevin Hao
	There is only one "clk" member in this driver specific private struct. Actually we can use the "clk" member in the struct sdhci_pltfm_host, and then kill this struct completely. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sdhci-dove: remove the unneeded error check	Kevin Hao
	The function clk_disable_unprepare() already take care of either error or null cases. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: sirf: update sdhci_sirf_execute_tuning procedure	weijun yang
	For the original tuning code, delay value is set to SD Bus Clock Delay Register (SD_CLK_DELAY_SETTING) as (val \| (Val << 7) \| (val << 16)), which means CLK_DELAY_IN1, CLK_DELAY_IN2 and CLK_DELAY_OUT are the same and with 128 steps. This is doubtful. In CSR design specification documents CS-304575-DR-3H, this issue is clarified, the delay[13:0] in SD_CLK_DELAY_SETTING is simplied to the concatenation of {CLK_DELAY_IN2, CLK_DELAY_IN1}. Besides, for CMD19 tuning, no need to set CLK_DELAY_OUT([22,16] of SD_CLK_DELAY_SETTING). Signed-off-by: weijun yang <york.yang@csr.com> Signed-off-by: Barry Song <baohua.song@csr.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: Don't crash if we get an interrupt before slot has initted	Doug Anderson
	It's unlikely that this is really needed on any single-slot systems where we disable card detects until the end of probe, but it still seems safer to check to make sure that a slot has been initted before we try to dereference it to find the SDIO interrupt mask. Signed-off-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: Only enable CD after setup and only if needed	Doug Anderson
	We really don't want to get a card detect interrupt during probe time since it can confuse things. Let's disable the card detect interrupt until we're in a really good place: the end of probe. Let's also simply avoid enabling the card detect interrupt if it's not used. It appears that (at least on rk3288) when vqmmc is turned on it can cause a bogus "card detect" interrupt. That meant that we were getting a predictable card detect interrupt while we were in mmc_add_host(). On the version of the kernel I'm working with at least (3.14), this is not a great time to get a card detect interrupt since I think that we don't grab all the needed locks in mmc_add_host() and children. I put stack dumps in dw_mci_setup_bus() and found that I could see two distinct stack crawls that looked like: Caller one: * dw_mci_setup_bus * dw_mci_set_ios * mmc_power_up * mmc_start_host * mmc_add_host Caller two: * dw_mci_setup_bus * dw_mci_set_ios * mmc_set_chip_select * mmc_go_idle * mmc_rescan * process_one_work * worker_thread * kthread Signed-off-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: Don't start commands while busy	Doug Anderson
	We've seen problems on some WiFi modules where we seem to send a CMD53 (which requires the data lines) while the module is asserting busy. We shouldn't do that. The Designware Databook says that before issuing a new data transfer command we should check for busy, so that's what we'll do. We'll leverage the existing dw_mmc knowledge about whether it should wait for the previous command to finish to know whether we should check for busy before sending the command. This means we won't end up incorrectly waiting for things like CMD52 (SDIO) or CMD13 (SD) which don't use the data line. Note that this also has the advantage of making sure that we don't change the clock while the card is busy, too. Signed-off-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: Give a good reset after we give power	Doug Anderson
	We should give dw_mmc a good reset after we apply power. On some boards vqmmc may actually be connected to the IP block in the SoC so it's good to reset after power comes in. Without this we sometimes see failures enumerating cards on rk3288. Signed-off-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: Make sure we only adjust the clock when power is on	Doug Anderson
	It appears that we can confuse things if we try to turn on the MMC clock when the power is off. Adjust is so that we turn the clock on (using dw_mci_setup_bus) after power is all the way on and we turn the clock off before the power goes off. Signed-off-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: fix mmc_test by not sending abort for DRTO/EBE errors	addy ke
	The STOP command can terminate a data transfer between a memory card and mmc controller. As show in Synopsys DesignWare Cores Mobile Storage Host Databook: Data timeout and Data end-bit error will terminate further data transfer by mmc controller. So we should not send abort command to terminate a data transfer again if we got DRTO and EBE interrupt. After this patch, all mmc_test cases can pass on RK3288-Pink2 board. Signed-off-by: Addy Ke <addy.ke@rock-chips.com> Reviewed-by: Doug Anderson <dianders@chromium.org> Tested-by: Doug Anderson <dianders@chromium.org> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: rockchip: add support MMC_CAP_RUNTIME_RESUME capability	addy ke
	To support HS200 and UHS mode, mmc core will call init_card() to execute tuning: - sdio: init_card can be executed at runtime resume. - sd and mmc: init_card can be executed at resume or runtime resume, which depends on MMC_CAP_RUNTIME_RESUME capability. On rk3288 SoC, host will get DRTO interrupt when host send command to read tuning data. This will spend more than 111ms: drto_ms = drto_clks * 1000 / bus_hz = 111ms. And the total tuning time will be more than 400ms. So we should add MMC_CAP_RUNTIME_RESUME capability to execute tuning at runtime resume. Only if we do so, can we pass resume test. Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Addy Ke <addy.ke@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	mmc: dw_mmc: exynos: Support eMMC's HS400 mode	Seungwon Jeon
	Implements HS400 mode support for exynos host driver. This also include some updates as new mode is added. Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com> [Alim: addressed review comments] Tested-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-03-23	metag: Fix ioremap_wc/ioremap_cached build errors	James Hogan
	When ioremap_wc() or ioremap_cached() are used without first including asm/pgtable.h, the _PAGE_CACHEABLE or _PAGE_WR_COMBINE definitions aren't found, resulting in build errors like the following (in next-20150323 due to "lib: devres: add a helper function for ioremap_wc"): lib/devres.c: In function ‘devm_ioremap_wc’: lib/devres.c:91: error: ‘_PAGE_WR_COMBINE’ undeclared We can't easily include asm/pgtable.h in asm/io.h due to dependency problems, so split out the _PAGE_* definitions from asm/pgtable.h into a separate asm/pgtable-bits.h header (as a couple of other architectures already do), and include that in io.h instead. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23	parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BIT	Helge Deller
	Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on CONFIG_64BIT. The reason is, that a 64bit kernel with a page size greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2. Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	parisc: mm: don't count preallocated pmds	Mikulas Patocka
	The patch dc6c9a35b66b520cf67e05d8ca60ebecad3b0479 that counts pmds allocated for a process introduced a bug on 64-bit PA-RISC kernels. The PA-RISC architecture preallocates one pmd with each pgd. This preallocated pmd can never be freed - pmd_free does nothing when it is called with this pmd. When the kernel attempts to free this preallocated pmd, it decreases the count of allocated pmds. The result is that the counter underflows and this error is reported. This patch fixes the bug by artifically incrementing the counter in pmd_free when the kernel tries to free the preallocated pmd. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	x86/asm/entry: Replace some open-coded VM86 checks with v8086_mode() checks	Andy Lutomirski
	This allows us to remove some unnecessary ifdefs. There should be no change to the generated code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f7e00f0d668e253abf0bd8bf36491ac47bd761ff.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Remove user_mode_vm()	Andy Lutomirski
	It has no callers anymore. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a594afd6a0bddb1311bd7c92a15201c87fbb8681.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'	Andy Lutomirski
	user_mode_vm() and user_mode() are now the same. Change all callers of user_mode_vm() to user_mode(). The next patch will remove the definition of user_mode_vm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org [ Merged to a more recent kernel. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode	Andy Lutomirski
	user_mode() is now identical to user_mode_vm(). Subsequent patches will change all callers of user_mode_vm() to user_mode() and then delete user_mode_vm(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0dd03eacb5f0a2b5ba0240de25347a31b493c289.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Use user_mode_ignore_vm86() where appropriate	Andy Lutomirski
	A few of the user_mode() checks in traps.c are immediately after explicit checks for vm86 mode. Change them to user_mode_ignore_vm86(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0b324d5b75c3402be07f8d3c6245ed7f4995029e.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()	Andy Lutomirski
	There's no point in checking the VM bit on 64-bit, and, since we're explicitly checking it, we can use user_mode_ignore_vm86() after the check. While we're at it, rearrange the #ifdef slightly to make the code flow a bit clearer. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Add user_mode_ignore_vm86()	Andy Lutomirski
	user_mode() is dangerous and user_mode_vm() has a confusing name. Add user_mode_ignore_vm86() (equivalent to current user_mode()). We'll change the small number of legitimate users of user_mode() to user_mode_ignore_vm86(). Inspired by grsec, although this works rather differently. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/202c56ca63823c338af8e2e54948dbe222da6343.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts	Ingo Molnar
	Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	irqchip: digicolor: Move digicolor_set_gc to init section	Baruch Siach
	The digicolor_set_gc() routine is only called from __init annotated digicolor_of_init(). Annotate digicolor_set_gc() with __init as well to save a few bytes at run time. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Link: https://lkml.kernel.org/r/a3b57ecdbe0b07f55c20c07ff98f1f694275722d.1427009985.git.baruch@tkos.co.il Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-03-23	parisc: Add compile-time check when adding new syscalls	Helge Deller
	Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	sched/rt: Use IPI to trigger RT task push migration instead of pulling	Steven Rostedt
	When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joern Engel <joern@purestorage.com> Cc: Clark Williams <williams@redhat.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	irq_work: Fix build failure when CONFIG_IRQ_WORK is not defined	Steven Rostedt
	When CONFIG_IRQ_WORK is not defined (difficult to do, as it also requires CONFIG_PRINTK not to be defined), we get a build failure: kernel/built-in.o: In function `flush_smp_call_function_queue': kernel/smp.c:263: undefined reference to `irq_work_run' kernel/smp.c:263: undefined reference to `irq_work_run' Makefile:933: recipe for target 'vmlinux' failed Simplest thing to do is to make irq_work_run() a nop when not set. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150319101851.4d224d9b@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵	Ingo Molnar
	applying new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop	Preeti U Murthy
	The hrtimer mode of broadcast queues hrtimers in the idle entry path so as to wakeup cpus in deep idle states. The associated call graph is : cpuidle_idle_call() \|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....)) \|_____tick_broadcast_set_event() \|____clockevents_program_event() \|____bc_set_next() The hrtimer_{start/cancel} functions call into tracing which uses RCU. But it is not legal to call into RCU in cpuidle because it is one of the quiescent states. Hence protect this region with RCU_NONIDLE which informs RCU that the cpu is momentarily non-idle. As an aside it is helpful to point out that the clock event device that is programmed here is not a per-cpu clock device; it is a pseudo clock device, used by the broadcast framework alone. The per-cpu clock device programming never goes through bc_set_next(). Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: linuxppc-dev@ozlabs.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	lockdep: Fix the module unload key range freeing logic	Peter Zijlstra
	Module unload calls lockdep_free_key_range(), which removes entries from the data structures. Most of the lockdep code OTOH assumes the data structures are append only; in specific see the comments in add_lock_to_list() and look_up_lock_class(). Clearly this has only worked by accident; make it work proper. The actual scenario to make it go boom would involve the memory freed by the module unlock being re-allocated and re-used for a lock inside of a rcu-sched grace period. This is a very unlikely scenario, still better plug the hole. Use RCU list iteration in all places and ammend the comments. Change lockdep_free_key_range() to issue a sync_sched() between removal from the lists and returning -- which results in the memory being freed. Further ensure the callers are placed correctly and comment the requirements. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Tsyvarev <tsyvarev@ispras.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	sched: Fix RLIMIT_RTTIME when PI-boosting to RT	Brian Silverman
	When non-realtime tasks get priority-inheritance boosted to a realtime scheduling class, RLIMIT_RTTIME starts to apply to them. However, the counter used for checking this (the same one used for SCHED_RR timeslices) was not getting reset. This meant that tasks running with a non-realtime scheduling class which are repeatedly boosted to a realtime one, but never block while they are running realtime, eventually hit the timeout without ever running for a time over the limit. This patch resets the realtime timeslice counter when un-PI-boosting from an RT to a non-RT scheduling class. I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT mutex which induces priority boosting and spins while boosted that gets killed by a SIGXCPU on non-fixed kernels but doesn't with this patch applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and does happen eventually with PREEMPT_VOLUNTARY kernels. Signed-off-by: Brian Silverman <brian@peloton-tech.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: austin@peloton-tech.com Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	perf: Fix irq_work 'tail' recursion	Peter Zijlstra
	Vince reported a watchdog lockup like: [<ffffffff8115e114>] perf_tp_event+0xc4/0x210 [<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160 [<ffffffff810b7f10>] lock_release+0x130/0x260 [<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40 [<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80 [<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0 [<ffffffff811f71ce>] send_sigio+0xae/0x100 [<ffffffff811f72b7>] kill_fasync+0x97/0xf0 [<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0 [<ffffffff8115d103>] perf_pending_event+0x33/0x60 [<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80 [<ffffffff8114e448>] irq_work_run+0x18/0x40 [<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0 [<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80 Which is caused by an irq_work generating new irq_work and therefore not allowing forward progress. This happens because processing the perf irq_work triggers another perf event (tracepoint stuff) which in turn generates an irq_work ad infinitum. Avoid this by raising the recursion counter in the irq_work -- which effectively disables all software events (including tracepoints) from actually triggering again. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	irqchip: renesas-irqc: Add functional clock to bindings	Geert Uytterhoeven
	The external IRQ controller has a functional clock, which is used for power management. Document it. Fix a typo in the r8a73a4 SoC name while we're at it. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Link: https://lkml.kernel.org/r/1426704961-27322-4-git-send-email-geert+renesas@glider.be Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-03-23	irqchip: renesas-irqc: Add minimal runtime PM support	Geert Uytterhoeven
	This is just enough to let pm_clk_*() enable the functional clock, and manage it for suspend/resume, if present. Before, it was assumed enabled by the bootloader or reset state. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Link: https://lkml.kernel.org/r/1426704961-27322-3-git-send-email-geert+renesas@glider.be Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-03-23	irqchip: renesas-irqc: Add more register documentation	Geert Uytterhoeven
	Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Link: https://lkml.kernel.org/r/1426704961-27322-2-git-send-email-geert+renesas@glider.be Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-03-23	x86/boot: Standardize strcmp()	Arjun Sreedharan
	strcmp() is always expected to return 0 when arguments are equal, negative when its first argument @str1 is less than its second argument @str2 and a positive value otherwise. Previously strcmp("a", "b") returned 1. Now it gives -1, as it is supposed to. Until now this bug never triggered, because all uses for strcmp() in the boot code tested for nonzero: triton:~/tip> git grep strcmp arch/x86/boot/ arch/x86/boot/boot.h:int strcmp(const char str1, const char str2); arch/x86/boot/edd.c: if (!strcmp(eddarg, "skipmbr") \|\| !strcmp(eddarg, "skip")) { arch/x86/boot/edd.c: else if (!strcmp(eddarg, "off")) arch/x86/boot/edd.c: else if (!strcmp(eddarg, "on")) should in the future strcmp() be used in a comparative way in the boot code, it might have led to (not so subtle) bugs. Signed-off-by: Arjun Sreedharan <arjun024@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426520267-1803-1-git-send-email-arjun024@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors	Sudeep Holla
	The private pointer provided by the cacheinfo code is used to implement the AMD L3 cache-specific attributes using a pointer to the northbridge descriptor. It is needed for performing L3-specific operations and for that we need a couple of PCI devices and other service information, all contained in the northbridge descriptor. This results in failure of cacheinfo setup as shown below as cache_get_priv_group() returns the uninitialised private attributes which are not valid for Intel processors. ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1 at fs/sysfs/group.c:102 internal_create_group+0x151/0x280() sysfs: (bin_)attrs not set by subsystem for group: index3/ Modules linked in: CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc3+ #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014 ... Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt internal_create_group sysfs_create_groups device_add cpu_device_create ? __kmalloc cache_add_dev cacheinfo_sysfs_init ? container_dev_init do_one_initcall kernel_init_freeable ? rest_init kernel_init ret_from_fork ? rest_init This patch fixes the issue by checking if the L3 cache indices are populated correctly (AMD-specific) before initializing the private attributes. Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/mce: Reindent __mcheck_cpu_apply_quirks() properly	Borislav Petkov
	Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it. No code changed: # arch/x86/kernel/cpu/mcheck/mce.o: text data bss dec hex filename 21371 5923 264 27558 6ba6 mce.o.before 21371 5923 264 27558 6ba6 mce.o.after md5: eb3996c84d15e08ed836f043df2cbb01 mce.o.before.asm eb3996c84d15e08ed836f043df2cbb01 mce.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/mce: Use safe MSR accesses for AMD quirk	Jesse Larrew
	Certain MSRs are only relevant to a kernel in host mode, and kvm had chosen not to implement these MSRs at all for guests. If a guest kernel ever tried to access these MSRs, the result was a general protection fault. KVM will be separately patched to return 0 when these MSRs are read, and this patch ensures that MSR accesses are tolerant of exceptions. Signed-off-by: Jesse Larrew <jesse.larrew@amd.com> [ Drop {} braces around loop ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joel Schopp <joel.schopp@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/fpu: Kill eager_fpu_init_bp()	Oleg Nesterov
	Now that eager_fpu_init_bp() does setup_init_fpu_buf() only and nothing else, we can remove it and move this code into its "caller", eager_fpu_init(). This avoids the confusing games with "static __refdata void (*boot_func)": init_xstate_buf can be NULL only during boot, so it is safe to call the __init-annotated setup_init_fpu_buf() function in eager_fpu_init(), we just need to mark it as __init_refok. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150314151334.GC13029@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>