linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2011-07-20	mmc: dw_mmc: brackets in register access macros	James Hogan
	Add brackets around use of the dev argument to the mci_{read,write}{w,l,q}() macros, for extra safety. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: dw_mmc: convert card tasklet to workqueue	James Hogan
	Convert the card insert/remove tasklet to a workqueue, and call the setpower platform specific callback without the spinlock held. This means neither of the setpower or get_cd callbacks are called from atomic context which allows them to sleep. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: dw_mmc: fix race with request and removal	James Hogan
	When a request is made, the card presence is checked and the request is queued. These two parts must be atomic with respect to card removal, or a card removal could be handled in between, and the new request wouldn't get cancelled until another card was inserted. Therefore move the spinlock protection from dw_mci_queue_request() up into dw_mci_request() to cover the presence check. Note that the test_bit() used for the presence check isn't atomic itself, so should have been protected by a spinlock anyway. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: dw_mmc: clear TXDR/RXDR ints before enabling	James Hogan
	DMA is only used for transactions exceeding a certain length, otherwise PIO is used. The TXDR and RXDR interrupts are masked when in DMA mode but still fire. When switching to PIO mode (e.g. to get SCR field when an SD card is inserted) these interrupts are not cleared and so they trigger the ISR as soon as they are unmasked. If the previous DMA did a write, then the ISR will handle the TXDR interrupt even if the transaction is a read, completing the transaction without modifying the read buffer. This is fixed primarily by clearing these two interrupts before unmasking them when setting up PIO mode, and also by making the ISR more robust by only handling TXDR/RXDR in the correct read/write direction. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhi: Add write16_hook	Simon Horman
	Some controllers require waiting for the bus to become idle before writing to some registers. I have implemented this by adding a hook to sd_ctrl_write16() and implementing a hook for SDHI which waits for the bus to become idle. Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Magnus Damm <magnus.damm@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: tmio: Share register access functions	Simon Horman
	Move register access functions into a shared header. Use sd_ctrl_write16 in tmio_mmc_dma.c:tmio_mmc_enable_dma(). Other than avoiding (trivial) open-coding, the motivation for this is to allow platform-hooks in access functions to be applied across all applicable accesses. Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Magnus Damm <magnus.damm@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: tmio: name 0xd8 as CTL_DMA_ENABLE	Simon Horman
	This reflects at least the current usage of this register and I think it improves the readability of the code ever so slightly. Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Magnus Damm <magnus.damm@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: block: add checking of r/w command response	Russell King - ARM Linux
	Check the status bits in the r/w command response for any errors. If error bits are set, then we won't have seen any data transferred, so it's pointless doing any further checking. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: block: improve error recovery from command channel errors	Russell King - ARM Linux
	Command channel errors fall into four classes: 1. The command was issued with the card in the wrong state 2. The command failed to be received by the card correctly 3. The cards response failed to be received by the host (CRC error) 4. The card failed to respond to the card For (1), in theory we should know that the card is in the correct state. However, a failed stop command (or other failure) may result in the card remaining in a data transfer state from the previous command. If we detect this condition, we try to recover by sending a stop command. For the initial commands (set block count and the read/write command) no data will have been transferred. All that we need deal with is retrying at this point. A failed stop command can be remedied as above. If we are unable to recover the card (eg, the card ignores our requests for status, or we don't recognise the error code) then we immediately fail the request. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: block: allow get_card_status() to return error status	Russell King - ARM Linux
	If the MMC_SEND_STATUS command is not successful, we should not return a zero status word, but instead allow the caller to know positively that an error occurred. Convert the open-coded get_card_status() to use the helper function, and provide definitions for the card state field. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: dw_mmc: protect a sequence of request and request-done.	Seungwon Jeon
	Response timeout (RTO), Response crc error (RCRC) and Response error (RE) signals come with command done (CD) and can be raised preceding command done (CD). That is these error interrupts and CD can be handled in separate dw_mci_interrupt(). If mmc_request_done() is called because of a response timeout before command done has occured, we might send the next request before the CD of current request is finished. This can bring about a broken sequence of request and request-done. And Data error interrupt (DRTO, DCRC, SBE, EBE) and data transfer over (DTO) have the same problem. Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: dw_mmc: set the card_width bit per card.	Seungwon Jeon
	This patch sets the card_width bit of CTYPE for the corresponding card. CTYPE[31] and CTYPE[16] correspond respectively to card[15] and card[0] for 8-bit mode. And CTYPE[15] and CTYPE[0] correspond respectively to card[15] and CTYPE[0] for 1-bit or 4-bit mode. Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	ARM: mmp2: update mmp2_defconfig to support mmc	Zhangfei Gao
	1. support brownstone 2. support mmc 3. support basic filesystem and language 4. remove dynamic_debug, since too many log during access sd Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci-pxa: move platform data to include/linux/platform_data	Zhangfei Gao
	As suggested by Arnd, move platform data to include/linux/platform_data in order to improve build coverage for the driver. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: host: delete obsolete sdhci-pxa.c	Zhangfei Gao
	Delete obsolete sdhci-pxa.c, which was previously shared amongst the entire PXA series. Instead we now use sdhci-pxav3.c for mmp2 and sdhci-pxav2.c for pxa9xx. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: update mmp2 mmc resources in arch/arm	Zhangfei Gao
	Update MMP2 platform code to "sdhci-pxav3", following the driver rename. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: host: split up sdhci-pxa, create sdhci-pxav2.c	Zhangfei Gao
	sdhci-pltfm driver for PXAV2 SoCs, such as pxa910. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Jun Nie <njun@marvell.com> Signed-off-by: Qiming Wu <wuqm@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: host: split up sdhci-pxa, create sdhci-pxav3.c	Zhangfei Gao
	sdhci-pltfm driver for PXAV3 SoCs, such as MMP2. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Philip Rakity <prakity@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci: change sdhci-pltfm into a module	Shawn Guo
	There are a couple of problems left from the sdhci pltfm and OF consolidation changes. * When building more than one sdhci-pltfm based drivers in the same image, linker will give multiple definition error on the sdhci-pltfm helper functions. For example right now, building sdhci-of-esdhc and sdhci-of-hlwd together is a valid combination from Kconfig view. * With the current build method, there is error with building the drivers as module, but module installation fails with modprobe. The patch fixes above problems by changing sdhci-pltfm into a module. To avoid EXPORT_SYMBOL on so many big endian IO accessors, it moves these accessors into sdhci-pltfm.h as the 'static inline' functions. As a result, sdhci.h needs to be included in sdhci-pltfm.h, and in turn can be removed from individual drivers which already include sdhci-pltfm.h. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: Standardize header file inclusion checks.	Robert P. J. Day
	Standardize the checks for multiple MMC header file inclusion, including adding comments to terminating #endif's, and fixing one incorrect comment. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci: merge two sdhci-pltfm.h into one	Shawn Guo
	The structure sdhci_pltfm_data is not necessarily to be in a public header like include/linux/mmc/sdhci-pltfm.h, so the patch moves it into drivers/mmc/host/sdhci-pltfm.h and eliminates the former one. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci: make sdhci-of device drivers self registered	Shawn Guo
	The patch turns the sdhci-of-core common stuff into helper functions added into sdhci-pltfm.c, and makes sdhci-of device drviers self registered using the same pair of .probe and .remove used by sdhci-pltfm device drivers. As a result, sdhci-of-core.c and sdhci-of.h can be eliminated with those common things merged into sdhci-pltfm.c and sdhci-pltfm.h respectively. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci: eliminate sdhci_of_host and sdhci_of_data	Shawn Guo
	The patch migrates the use of sdhci_of_host and sdhci_of_data to sdhci_pltfm_host and sdhci_pltfm_data, so that the former pair can be eliminated. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	mmc: sdhci: make sdhci-pltfm device drivers self registered	Shawn Guo
	The patch turns the common stuff in sdhci-pltfm.c into functions, and add device drivers their own .probe and .remove which in turn call into the common functions, so that those sdhci-pltfm device drivers register itself and keep all device specific things away from common sdhci-pltfm file. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-20	rcu: Fix wrong check in list_splice_init_rcu()	Jan H. Schönherr
	If the list to be spliced is empty, then list_splice_init_rcu() has nothing to do. Unfortunately, list_splice_init_rcu() does not check the list to be spliced; it instead checks the list to be spliced into. This results in memory leaks given current usage. This commit therefore fixes the empty-list check. Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-20	net,rcu: Convert call_rcu(xt_rateest_free_rcu) to kfree_rcu()	Paul E. McKenney
	The RCU callback xt_rateest_free_rcu() just calls kfree(), so we can use kfree_rcu() instead of call_rcu(). This also allows us to dispense with an rcu_barrier() call, speeding up unloading of this module. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	sysctl,rcu: Convert call_rcu(free_head) to kfree	Paul E. McKenney
	The RCU callback free_head just calls kfree(), so we can use kfree_rcu() instead of call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	vmalloc,rcu: Convert call_rcu(rcu_free_vb) to kfree_rcu()	Lai Jiangshan
	The rcu callback rcu_free_vb() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(rcu_free_vb). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@gmail.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	vmalloc,rcu: Convert call_rcu(rcu_free_va) to kfree_rcu()	Lai Jiangshan
	The rcu callback rcu_free_va() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(rcu_free_va). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@gmail.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	ipc,rcu: Convert call_rcu(ipc_immediate_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback ipc_immediate_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(ipc_immediate_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	ipc,rcu: Convert call_rcu(free_un) to kfree_rcu()	Lai Jiangshan
	The rcu callback free_un() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_un). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	security,rcu: Convert call_rcu(sel_netport_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback sel_netport_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sel_netport_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	security,rcu: Convert call_rcu(sel_netnode_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback sel_netnode_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sel_netnode_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	ia64,rcu: Convert call_rcu(sn_irq_info_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback sn_irq_info_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sn_irq_info_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	block,rcu: Convert call_rcu(disk_free_ptbl_rcu_cb) to kfree_rcu()	Lai Jiangshan
	The rcu callback disk_free_ptbl_rcu_cb() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(disk_free_ptbl_rcu_cb). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	scsi,rcu: Convert call_rcu(fc_rport_free_rcu) to kfree_rcu()	Lai Jiangshan
	The rcu callback fc_rport_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(fc_rport_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Robert Love <robert.w.love@intel.com> Cc: "James E.J. Bottomley" <James.Bottomley@suse.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu()	Lai Jiangshan
	The rcu callback __put_tree() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(__put_tree). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	Merge branch 'stable/xen-pciback-0.6.3' into stable/drivers	Konrad Rzeszutek Wilk
	* stable/xen-pciback-0.6.3: xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI xen/pciback: Remove the DEBUG option. xen/pciback: Drop two backends, squash and cleanup some code. xen/pciback: Print out the MSI/MSI-X (PIRQ) values xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices. xen: rename pciback module to xen-pciback. xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases. xen/pciback: Allocate IRQ handler for device that is shared with guest. xen/pciback: Disable MSI/MSI-X when reseting a device xen/pciback: guest SR-IOV support for PV guest xen/pciback: Register the owner (domain) of the PCI device. xen/pciback: Cleanup the driver based on checkpatch warnings and errors. xen/pciback: xen pci backend driver. Conflicts: drivers/xen/Kconfig
2011-07-20	Merge branch 'rcu/urgent' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent
2011-07-20	security,rcu: Convert call_rcu(whitelist_item_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback whitelist_item_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(whitelist_item_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: James Morris <jmorris@namei.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	md,rcu: Convert call_rcu(free_conf) to kfree_rcu()	Lai Jiangshan
	The rcu callback free_conf() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_conf). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NeilBrown <neilb@suse.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20	signal: align __lock_task_sighand() irq disabling and RCU	Paul E. McKenney
	The __lock_task_sighand() function calls rcu_read_lock() with interrupts and preemption enabled, but later calls rcu_read_unlock() with interrupts disabled. It is therefore possible that this RCU read-side critical section will be preempted and later RCU priority boosted, which means that rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but with interrupts disabled. This results in lockdep splats, so this commit nests the RCU read-side critical section within the interrupt-disabled region of code. This prevents the RCU read-side critical section from being preempted, and thus prevents the attempt to deboost with interrupts disabled. It is quite possible that a better long-term fix is to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-20	softirq,rcu: Inform RCU of irq_exit() activity	Peter Zijlstra
	The rcu_read_unlock_special() function relies on in_irq() to exclude scheduler activity from interrupt level. This fails because exit_irq() can invoke the scheduler after clearing the preempt_count() bits that in_irq() uses to determine that it is at interrupt level. This situation can result in failures as follows: $task IRQ SoftIRQ rcu_read_lock() /* do stuff / <preempt> \|= UNLOCK_BLOCKED rcu_read_unlock() --t->rcu_read_lock_nesting irq_enter(); / do stuff, don't use RCU / irq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); invoke_softirq() ttwu(); spin_lock_irq(&pi->lock) rcu_read_lock(); / do stuff / rcu_read_unlock(); rcu_read_unlock_special() rcu_report_exp_rnp() ttwu() spin_lock_irq(&pi->lock) / deadlock */ rcu_read_unlock_special(t); Ed can simply trigger this 'easy' because invoke_softirq() immediately does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff first, but even without that the above happens. Cure this by also excluding softirqs from the rcu_read_unlock_special() handler and ensuring the force_irqthreads ksoftirqd/# wakeup is done from full softirq context. [ Alternatively, delaying the ->rcu_read_lock_nesting decrement until after the special handling would make the thing more robust in the face of interrupts as well. And there is a separate patch for that. ] Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-20	sched: Add irq_{enter,exit}() to scheduler_ipi()	Peter Zijlstra
	Ensure scheduler_ipi() calls irq_{enter,exit} when it does some actual work. Traditionally we never did any actual work from the resched IPI and all magic happened in the return from interrupt path. Now that we do do some work, we need to ensure irq_{enter,exit} are called so that we don't confuse things. This affects things like timekeeping, NO_HZ and RCU, basically everything with a hook in irq_enter/exit. Explicit examples of things going wrong are: sched_clock_cpu() -- has a callback when leaving NO_HZ state to take a new reading from GTOD and TSC. Without this callback, time is stuck in the past. RCU -- needs in_irq() to work in order to avoid some nasty deadlocks Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-20	rcu: protect __rcu_read_unlock() against scheduler-using irq handlers	Paul E. McKenney
	The addition of RCU read-side critical sections within runqueue and priority-inheritance lock critical sections introduced some deadlock cycles, for example, involving interrupts from __rcu_read_unlock() where the interrupt handlers call wake_up(). This situation can cause the instance of __rcu_read_unlock() invoked from interrupt to do some of the processing that would otherwise have been carried out by the task-level instance of __rcu_read_unlock(). When the interrupt-level instance of __rcu_read_unlock() is called with a scheduler lock held from interrupt-entry/exit situations where in_irq() returns false, deadlock can result. This commit resolves these deadlocks by using negative values of the per-task ->rcu_read_lock_nesting counter to indicate that an instance of __rcu_read_unlock() is in flight, which in turn prevents instances from interrupt handlers from doing any special processing. This patch is inspired by Steven Rostedt's earlier patch that similarly made __rcu_read_unlock() guard against interrupt-mediated recursion (see https://lkml.org/lkml/2011/7/15/326), but this commit refines Steven's approach to avoid the need for preemption disabling on the __rcu_read_unlock() fastpath and to also avoid the need for manipulating a separate per-CPU variable. This patch avoids need for preempt_disable() by instead using negative values of the per-task ->rcu_read_lock_nesting counter. Note that nested rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will never see ->rcu_read_lock_nesting go to zero, and will therefore never invoke rcu_read_unlock_special(), thus preventing them from seeing the RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special. This patch also adds a check for ->rcu_read_unlock_special being negative in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS bit from being set should a scheduling-clock interrupt occur while __rcu_read_unlock() is exiting from an outermost RCU read-side critical section. Of course, __rcu_read_unlock() can be preempted during the time that ->rcu_read_lock_nesting is negative. This could result in the setting of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it, and would also result it this task being queued on the corresponding rcu_node structure's blkd_tasks list. Therefore, some later RCU read-side critical section would enter rcu_read_unlock_special() to clean up -- which could result in deadlock if that critical section happened to be in the scheduler where the runqueue or priority-inheritance locks were held. This situation is dealt with by making rcu_preempt_note_context_switch() check for negative ->rcu_read_lock_nesting, thus refraining from queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are already exiting from the outermost RCU read-side critical section (in other words, we really are no longer actually in that RCU read-side critical section). In addition, rcu_preempt_note_context_switch() invokes rcu_read_unlock_special() to carry out the cleanup in this case, which clears out the ->rcu_read_unlock_special bits and dequeues the task (if necessary), in turn avoiding needless delay of the current RCU grace period and needless RCU priority boosting. It is still illegal to call rcu_read_unlock() while holding a scheduler lock if the prior RCU read-side critical section has ever had either preemption or irqs enabled. However, the common use case is legal, namely where then entire RCU read-side critical section executes with irqs disabled, for example, when the scheduler lock is held across the entire lifetime of the RCU read-side critical section. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-07-20	slab: shrink sizeof(struct kmem_cache)	Eric Dumazet
	Reduce high order allocations for some setups. (NR_CPUS=4096 -> we need 64KB per kmem_cache struct) We now allocate exact needed size (using nr_cpu_ids and nr_node_ids) This also makes code a bit smaller on x86_64, since some field offsets are less than the 127 limit : Before patch : # size mm/slab.o text data bss dec hex filename 22605 361665 32 384302 5dd2e mm/slab.o After patch : # size mm/slab.o text data bss dec hex filename 22349 353473 8224 384046 5dc2e mm/slab.o CC: Andrew Morton <akpm@linux-foundation.org> Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-07-20	sched: Avoid creating superfluous NUMA domains on non-NUMA systems	Peter Zijlstra
	When creating sched_domains, stop when we've covered the entire target span instead of continuing to create domains, only to later find they're redundant and throw them away again. This avoids single node systems from touching funny NUMA sched_domain creation code and reduces the risks of the new SD_OVERLAP code. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Anton Blanchard <anton@samba.org> Cc: mahesh@linux.vnet.ibm.com Cc: benh@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1311180177.29152.57.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-20	sched: Allow for overlapping sched_domain spans	Peter Zijlstra
	Allow for sched_domain spans that overlap by giving such domains their own sched_group list instead of sharing the sched_groups amongst each-other. This is needed for machines with more than 16 nodes, because sched_domain_node_span() will generate a node mask from the 16 nearest nodes without regard if these masks have any overlap. Currently sched_domains have a sched_group that maps to their child sched_domain span, and since there is no overlap we share the sched_group between the sched_domains of the various CPUs. If however there is overlap, we would need to link the sched_group list in different ways for each cpu, and hence sharing isn't possible. In order to solve this, allocate private sched_groups for each CPU's sched_domain but have the sched_groups share a sched_group_power structure such that we can uniquely track the power. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-08bxqw9wis3qti9u5inifh3y@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-20	sched: Break out cpu_power from the sched_group structure	Peter Zijlstra
	In order to prepare for non-unique sched_groups per domain, we need to carry the cpu_power elsewhere, so put a level of indirection in. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-qkho2byuhe4482fuknss40ad@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-20	HID: emsff: properly handle emsff_init failure	Axel Lin
	emsff_init() may fail, let's properly handle the failure. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>