summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2010-07-05rbtree: Undo augmented trees performance damage and regressionPeter Zijlstra
Reimplement augmented RB-trees without sprinkling extra branches all over the RB-tree code (which lives in the scheduler hot path). This approach is 'borrowed' from Fabio's BFQ implementation and relies on traversing the rebalance path after the RB-tree-op to correct the heap property for insertion/removal and make up for the damage done by the tree rotations. For insertion the rebalance path is trivially that from the new node upwards to the root, for removal it is that from the deepest node in the path from the to be removed node that will still be around after the removal. [ This patch also fixes a video driver regression reported by Ali Gholami Rudi - the memtype->subtree_max_end was updated incorrectly. ] Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Ali Gholami Rudi <ali@rudi.ir> Cc: Fabio Checconi <fabio@gandalf.sssup.it> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1275414172.27810.27961.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05Merge commit 'v2.6.35-rc4' into perf/coreIngo Molnar
Merge reason: Pick up the latest perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-04module: initialize module dynamic debug laterYehuda Sadeh
We should initialize the module dynamic debug datastructures only after determining that the module is not loaded yet. This fixes a bug that introduced in 2.6.35-rc2, where when a trying to load a module twice, we also load it's dynamic printing data twice which causes all sorts of nasty issues. Also handle the dynamic debug cleanup later on failure. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed a #ifdef) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-04netdevice.h: Change netif_<level> macros to call netdev_<level> functionsJoe Perches
Reduces text ~300 bytes of text (woohoo!) in an x86 defconfig $ size vmlinux* text data bss dec hex filename 7198526 720112 1366288 9284926 8dad3e vmlinux 7198862 720112 1366288 9285262 8dae8e vmlinux.netdev Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-04netdevice.h net/core/dev.c: Convert netdev_<level> logging macros to functionsJoe Perches
Reduces an x86 defconfig text and data ~2k. text is smaller, data is larger. $ size vmlinux* text data bss dec hex filename 7198862 720112 1366288 9285262 8dae8e vmlinux 7205273 716016 1366288 9287577 8db799 vmlinux.device_h Uses %pV and struct va_format Format arguments are verified before printk Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-04device.h drivers/base/core.c Convert dev_<level> logging macros to functionsJoe Perches
Reduces an x86 defconfig text and data ~55k, .6% smaller. $ size vmlinux* text data bss dec hex filename 7205273 716016 1366288 9287577 8db799 vmlinux 7258890 719768 1366288 9344946 8e97b2 vmlinux.master Uses %pV and struct va_format Format arguments are verified before printk The dev_info macro is converted to _dev_info because there are existing uses of variables named dev_info in the kernel tree like drivers/net/pcmcia/pcnet_cs.c A dev_info macro is created to call _dev_info Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-04vsprintf: Recursive vsnprintf: Add "%pV", struct va_formatJoe Perches
Add the ability to print a format and va_list from a structure pointer Allows __dev_printk to be implemented as a single printk while minimizing string space duplication. %pV should not be used without some mechanism to verify the format and argument use ala __attribute__(format (printf(...))). Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-04slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACINGXiaotian Feng
In slab, all __xxx_track_caller is defined on CONFIG_DEBUG_SLAB || CONFIG_TRACING, thus caller tracking function should be worked for CONFIG_TRACING. But if CONFIG_DEBUG_SLAB is not set, include/linux/slab.h will define xxx_track_caller to __xxx() without consideration of CONFIG_TRACING. This will break the caller tracking behaviour then. Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-04Input: mcs - Add MCS touchkey driverJoonyoung Shim
This adds support for MELPAS MCS5000/MSC5080 touch key controllers. Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-03Input: ads7846 - do not allow altering platform dataDmitry Torokhov
Tested-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-07-02linux/net.h: fix kernel-doc warningsRandy Dunlap
Fix kernel-doc warnings in linux/net.h: Warning(include/linux/net.h:151): No description found for parameter 'wq' Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'fasync_list' description in 'socket' Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'wait' description in 'socket' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02net: decreasing real_num_tx_queues needs to flush qdiscJohn Fastabend
Reducing real_num_queues needs to flush the qdisc otherwise skbs with queue_mappings greater then real_num_tx_queues can be sent to the underlying driver. The flow for this is, dev_queue_xmit() dev_pick_tx() skb_tx_hash() => hash using real_num_tx_queues skb_set_queue_mapping() ... qdisc_enqueue_root() => enqueue skb on txq from hash ... dev->real_num_tx_queues -= n ... sch_direct_xmit() dev_hard_start_xmit() ndo_start_xmit(skb,dev) => skb queue set with old hash skbs are enqueued on the qdisc with skb->queue_mapping set 0 < queue_mappings < real_num_tx_queues. When the driver decreases real_num_tx_queues skb's may be dequeued from the qdisc with a queue_mapping greater then real_num_tx_queues. This fixes a case in ixgbe where this was occurring with DCB and FCoE. Because the driver is using queue_mapping to map skbs to tx descriptor rings we can potentially map skbs to rings that no longer exist. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lockJohn Fastabend
When calling qdisc_reset() the qdisc lock needs to be held. In this case there is at least one driver i4l which is using this without holding the lock. Add the locking here. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Cure nr_iowait_cpu() users init: Fix comment init, sched: Fix race between init and kthreadd
2010-07-02workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND insteadTejun Heo
WQ_SINGLE_CPU combined with @max_active of 1 is used to achieve full ordering among works queued to a workqueue. The same can be achieved using WQ_UNBOUND as unbound workqueues always use the gcwq for WORK_CPU_UNBOUND. As @max_active is always one and benefits from cpu locality isn't accessible anyway, serving them with unbound workqueues should be fine. Drop WQ_SINGLE_CPU support and use WQ_UNBOUND instead. Note that most single thread workqueue users will be converted to use multithread or non-reentrant instead and only the ones which require strict ordering will keep using WQ_UNBOUND + @max_active of 1. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-07-02workqueue: implement unbound workqueueTejun Heo
This patch implements unbound workqueue which can be specified with WQ_UNBOUND flag on creation. An unbound workqueue has the following properties. * It uses a dedicated gcwq with a pseudo CPU number WORK_CPU_UNBOUND. This gcwq is always online and disassociated. * Workers are not bound to any CPU and not concurrency managed. Works are dispatched to workers as soon as possible and the only applied limitation is @max_active. IOW, all unbound workqeueues are implicitly high priority. Unbound workqueues can be used as simple execution context provider. Contexts unbound to any cpu are served as soon as possible. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Howells <dhowells@redhat.com>
2010-07-02workqueue: prepare for WQ_UNBOUND implementationTejun Heo
In preparation of WQ_UNBOUND addition, make the following changes. * Add WORK_CPU_* constants for pseudo cpu id numbers used (currently only WORK_CPU_NONE) and use them instead of NR_CPUS. This is to allow another pseudo cpu id for unbound cpu. * Reorder WQ_* flags. * Make workqueue_struct->cpu_wq a union which contains a percpu pointer, regular pointer and an unsigned long value and use kzalloc/kfree() in UP allocation path. This will be used to implement unbound workqueues which will use only one cwq on SMPs. * Move alloc_cwqs() allocation after initialization of wq fields, so that alloc_cwqs() has access to wq->flags. * Trivial relocation of wq local variables in freeze functions. These changes don't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-07-02libata: take advantage of cmwq and remove concurrency limitationsTejun Heo
libata has two concurrency related limitations. a. ata_wq which is used for polling PIO has single thread per CPU. If there are multiple devices doing polling PIO on the same CPU, they can't be executed simultaneously. b. ata_aux_wq which is used for SCSI probing has single thread. In cases where SCSI probing is stalled for extended period of time which is possible for ATAPI devices, this will stall all probing. #a is solved by increasing maximum concurrency of ata_wq. Please note that polling PIO might be used under allocation path and thus needs to be served by a separate wq with a rescuer. #b is solved by using the default wq instead and achieving exclusion via per-port mutex. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>
2010-07-02drm: add per-event vblank event trace pointsJesse Barnes
Allows us to track each process that requests and completes events. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-07-01Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ata_generic: implement ATA_GEN_* flags and force enable DMA on MBP 7,1 ahci,ata_generic: let ata_generic handle new MBP w/ MCP89 libahci: Fix bug in storing EM messages
2010-07-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 Conflicts: drivers/net/wireless/libertas/host.h
2010-07-01Revert "net: Make accesses to ->br_port safe for sparse RCU"Paul E. McKenney
This reverts commit 81bdf5bd7349bd4523538cbd7878f334bc2bfe14, which is obsoleted by commit f350a0a87374 from the net tree.
2010-07-01ahci,ata_generic: let ata_generic handle new MBP w/ MCP89Tejun Heo
For yet unknown reason, MCP89 on MBP 7,1 doesn't work w/ ahci under linux but the controller doesn't require explicit mode setting and works fine with ata_generic. Make ahci ignore the controller on MBP 7,1 and let ata_generic take it for now. Reported in bko#15923. https://bugzilla.kernel.org/show_bug.cgi?id=15923 NVIDIA is investigating why ahci mode doesn't work. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peer Chen <pchen@nvidia.com> Cc: stable@kernel.org Reported-by: Anders Østhus <grapz666@gmail.com> Reported-by: Andreas Graf <andreas_graf@csgraf.de> Reported-by: Benoit Gschwind <gschwind@gnu-log.net> Reported-by: Damien Cassou <damien.cassou@gmail.com> Reported-by: tixetsal@juno.com Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-07-01Merge branch 'drm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (27 commits) drm/radeon/kms: remove rv100 bios connector quirk drm/radeon/kms/pm: fix power state indexing on igp chips in dynpm mode DRM / radeon / KMS: Fix hibernation regression related to radeon PM (was: Re: [Regression, post-2.6.34] Hibernation broken on machines with radeon/KMS and r300) drm/radeon/kms/igp: fix possible divide by 0 in bandwidth code (v2) drm/radeon: add quirk to make HP nx6125 laptop resume. drm/radeon/kms: add some missing regs to evergreen gpu init drm/radeon/kms: fix typos in evergreen command checker drm/radeon/kms: avoid oops on mac r4xx cards fb: fix colliding defines for fb flags. drm/radeon/kms: Force HDP_NONSURF to maximum size drm/radeon/kms: disable frac fb dividers for rs6xx drm/radeon/kms: don't read attempt to read bios from VRAM on unposted GPU. drm/radeon/kms: fix typo in evergreen_gpu_init drm/radeon/kms: return ret in cursor_set failure path drm/ttm: non pooled page allocation should have GFP_USER set drm/radeon/r100/r200: fix calculation of compressed cube maps drm/radeon/r200: handle more hw tex coord types drm/radeon/kms: CS checker texture fixes for r1xx/r2xx/r3xx drm/radeon: add fake RN50 table for powerpc drm/fb: Fix video= mode computation ...
2010-07-01sched: Cure nr_iowait_cpu() usersPeter Zijlstra
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) broke things by not making sure preemption was indeed disabled by the callers of nr_iowait_cpu() which took the iowait value of the current cpu. This resulted in a heap of preempt warnings. Cure this by making nr_iowait_cpu() take a cpu number and fix up the callers to pass in the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1277968037.1868.120.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-01Merge branch 'linus' into core/rcuIngo Molnar
Conflicts: fs/fs-writeback.c Merge reason: Resolve the conflict Note, i picked the version from Linus's tree, which effectively reverts the fs-writeback.c bits of: b97181f: fs: remove all rcu head initializations, except on_stack initializations As the upstream changes to this file changed this code heavily and the first attempt to resolve the conflict resulted in a non-booting kernel. It's safer to re-try this portion of the commit cleanly. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-01fb: fix colliding defines for fb flags.Dave Airlie
When I added the flags I must have been using a 25 line terminal and missed the following flags. The collided with flag has one user in staging despite being in-tree for 5 years. I'm happy to push this via my drm tree unless someone really wants to do it. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@kernel.org
2010-06-30Merge commit 'v2.6.35-rc3' into nextDmitry Torokhov
2010-06-30ethtool: Add support for control of RX flow hash indirectionBen Hutchings
Many NICs use an indirection table to map an RX flow hash value to one of an arbitrary number of queues (not necessarily a power of 2). It can be useful to remove some queues from this indirection table so that they are only used for flows that are specifically filtered there. It may also be useful to weight the mapping to account for user processes with the same CPU-affinity as the RX interrupts. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30ethtool: Change ethtool_op_set_flags to validate flagsBen Hutchings
ethtool_op_set_flags() does not check for unsupported flags, and has no way of doing so. This means it is not suitable for use as a default implementation of ethtool_ops::set_flags. Add a 'supported' parameter specifying the flags that the driver and hardware support, validate the requested flags against this, and change all current callers to pass this parameter. Change some other trivial implementations of ethtool_ops::set_flags to call ethtool_op_set_flags(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30fragment: add fast path for in-order fragmentsChangli Gao
add fast path for in-order fragments As the fragments are sent in order in most of OSes, such as Windows, Darwin and FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue. In the fast path, we check if the skb at the end of the inet_frag_queue is the prev we expect. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_frag.h | 1 + net/ipv4/ip_fragment.c | 12 ++++++++++++ net/ipv6/reassembly.c | 11 +++++++++++ 3 files changed, 24 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30snmp: 64bit ipstats_mib for all archesEric Dumazet
/proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s|egrep "InOctets|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30mv643xx_eth: use sw csum for big packetsSaeed Bishara
Some controllers (KW, Dove) limits the TX IP/layer4 checksum offloading to a max size. Signed-off-by: Saeed Bishara <saeed@marvell.com> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30net/neighbour.h: fix typoKulikov Vasiliy
'Shoul' must be 'should'. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30eeprom_93cx6: Add support for 93c86 EEPROMs.Gertjan van Wingerde
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-30xfrm: fix XFRMA_MARK extraction in xfrm_mark_getAndreas Steffen
Determine the size of the xfrm_mark struct, not of its pointer. Signed-off-by: Andreas Steffen <andreas.steffen@strongswan.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30Input: i8042 - mark stubs in i8042.h "static inline"Feng Tang
Otherwise we may run into following: drivers/platform/built-in.o: In function `i8042_lock_chip': /home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: multiple definition of `i8042_lock_chip' drivers/input/serio/built-in.o:/home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: first defined here ... make[1]: *** [drivers/built-in.o] Error 1 make: *** [drivers] Error 2 Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-06-29compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functionsMikael Pettersson
A __naked function is defined in C but with a body completely implemented by asm(), including any prologue and epilogue. These asm() bodies expect standard calling conventions for parameter passing. Older GCCs implement that correctly, but 4.[56] currently do not, see GCC PR44290. In the Linux kernel this breaks ARM, causing most arch/arm/mm/copypage-*.c modules to get miscompiled, resulting in kernel crashes during bootup. Part of the kernel fix is to augment the __naked function attribute to also imply noinline and noclone. This patch implements that, and has been verified to fix boot failures with gcc-4.5 compiled 2.6.34 and 2.6.35-rc1 kernels. The patch is a no-op with older GCCs. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Khem Raj <raj.khem@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-29Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Don't count_vm_events for discard bio in submit_bio. cfq: fix recursive call in cfq_blkiocg_update_completion_stats() cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n cfq: Don't allow queue merges for queues that have no process references block: fix DISCARD_BARRIER requests cciss: set SCSI max cmd len to 16, as default is wrong cpqarray: fix two more wrong section type cpqarray: fix wrong __init type on pci probe function drbd: Fixed a race between disk-attach and unexpected state changes writeback: fix pin_sb_for_writeback writeback: add missing requeue_io in writeback_inodes_wb writeback: simplify and split bdi_start_writeback writeback: simplify wakeup_flusher_threads writeback: fix writeback_inodes_wb from writeback_inodes_sb writeback: enforce s_umount locking in writeback_inodes_sb writeback: queue work on stack in writeback_inodes_sb writeback: fix writeback completion notifications
2010-06-29fs: fix superblock iteration racenpiggin@suse.de
list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af6 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-29sched: Fix spelling of siblingMichael Neuling
No logic changes, only spelling. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: linuxppc-dev@ozlabs.org Cc: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <15249.1277776921@neuling.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-29workqueue: implement cpu intensive workqueueTejun Heo
This patch implements cpu intensive workqueue which can be specified with WQ_CPU_INTENSIVE flag on creation. Works queued to a cpu intensive workqueue don't participate in concurrency management. IOW, it doesn't contribute to gcwq->nr_running and thus doesn't delay excution of other works. Note that although cpu intensive works won't delay other works, they can be delayed by other works. Combine with WQ_HIGHPRI to avoid being delayed by other works too. As the name suggests this is useful when using workqueue for cpu intensive works. Workers executing cpu intensive works are not considered for workqueue concurrency management and left for the scheduler to manage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
2010-06-29workqueue: implement high priority workqueueTejun Heo
This patch implements high priority workqueue which can be specified with WQ_HIGHPRI flag on creation. A high priority workqueue has the following properties. * A work queued to it is queued at the head of the worklist of the respective gcwq after other highpri works, while normal works are always appended at the end. * As long as there are highpri works on gcwq->worklist, [__]need_more_worker() remains %true and process_one_work() wakes up another worker before it start executing a work. The above two properties guarantee that works queued to high priority workqueues are dispatched to workers and start execution as soon as possible regardless of the state of other works. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org>
2010-06-29workqueue: implement several utility APIsTejun Heo
Implement the following utility APIs. workqueue_set_max_active() : adjust max_active of a wq workqueue_congested() : test whether a wq is contested work_cpu() : determine the last / current cpu of a work work_busy() : query whether a work is busy * Anton Blanchard fixed missing ret initialization in work_busy(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Anton Blanchard <anton@samba.org>
2010-06-29workqueue: s/__create_workqueue()/alloc_workqueue()/, and add system workqueuesTejun Heo
This patch makes changes to make new workqueue features available to its users. * Now that workqueue is more featureful, there should be a public workqueue creation function which takes paramters to control them. Rename __create_workqueue() to alloc_workqueue() and make 0 max_active mean WQ_DFL_ACTIVE. In the long run, all create_workqueue_*() will be converted over to alloc_workqueue(). * To further unify access interface, rename keventd_wq to system_wq and export it. * Add system_long_wq and system_nrt_wq. The former is to host long running works separately (so that flush_scheduled_work() dosen't take so long) and the latter guarantees any queued work item is never executed in parallel by multiple CPUs. These will be used by future patches to update workqueue users. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-06-29workqueue: increase max_active of keventd and kill current_is_keventd()Tejun Heo
Define WQ_MAX_ACTIVE and create keventd with max_active set to half of it which means that keventd now can process upto WQ_MAX_ACTIVE / 2 - 1 works concurrently. Unless some combination can result in dependency loop longer than max_active, deadlock won't happen and thus it's unnecessary to check whether current_is_keventd() before trying to schedule a work. Kill current_is_keventd(). (Lockdep annotations are broken. We need lock_map_acquire_read_norecurse()) Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com>
2010-06-29workqueue: implement concurrency managed dynamic worker poolTejun Heo
Instead of creating a worker for each cwq and putting it into the shared pool, manage per-cpu workers dynamically. Works aren't supposed to be cpu cycle hogs and maintaining just enough concurrency to prevent work processing from stalling due to lack of processing context is optimal. gcwq keeps the number of concurrent active workers to minimum but no less. As long as there's one or more running workers on the cpu, no new worker is scheduled so that works can be processed in batch as much as possible but when the last running worker blocks, gcwq immediately schedules new worker so that the cpu doesn't sit idle while there are works to be processed. gcwq always keeps at least single idle worker around. When a new worker is necessary and the worker is the last idle one, the worker assumes the role of "manager" and manages the worker pool - ie. creates another worker. Forward-progress is guaranteed by having dedicated rescue workers for workqueues which may be necessary while creating a new worker. When the manager is having problem creating a new worker, mayday timer activates and rescue workers are summoned to the cpu and execute works which might be necessary to create new workers. Trustee is expanded to serve the role of manager while a CPU is being taken down and stays down. As no new works are supposed to be queued on a dead cpu, it just needs to drain all the existing ones. Trustee continues to try to create new workers and summon rescuers as long as there are pending works. If the CPU is brought back up while the trustee is still trying to drain the gcwq from the previous offlining, the trustee will kill all idles ones and tell workers which are still busy to rebind to the cpu, and pass control over to gcwq which assumes the manager role as necessary. Concurrency managed worker pool reduces the number of workers drastically. Only workers which are necessary to keep the processing going are created and kept. Also, it reduces cache footprint by avoiding unnecessarily switching contexts between different workers. Please note that this patch does not increase max_active of any workqueue. All workqueues can still only process one work per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-06-29workqueue: implement WQ_NON_REENTRANTTejun Heo
With gcwq managing all the workers and work->data pointing to the last gcwq it was on, non-reentrance can be easily implemented by checking whether the work is still running on the previous gcwq on queueing. Implement it. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-06-29workqueue: carry cpu number in work data once execution startsTejun Heo
To implement non-reentrant workqueue, the last gcwq a work was executed on must be reliably obtainable as long as the work structure is valid even if the previous workqueue has been destroyed. To achieve this, work->data will be overloaded to carry the last cpu number once execution starts so that the previous gcwq can be located reliably. This means that cwq can't be obtained from work after execution starts but only gcwq. Implement set_work_{cwq|cpu}(), get_work_[g]cwq() and clear_work_data() to set work data to the cpu number when starting execution, access the overloaded work data and clear it after cancellation. queue_delayed_work_on() is updated to preserve the last cpu while in-flight in timer and other callers which depended on getting cwq from work after execution starts are converted to depend on gcwq instead. * Anton Blanchard fixed compile error on powerpc due to missing linux/threads.h include. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Anton Blanchard <anton@samba.org>