summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-10-19rtmutex-tester: make it build without BKLArnd Bergmann
The big kernel lock is going away, so make sure that if it is disabled by Kconfig, we do not try to validate it, which would result in compile errors. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org>
2010-10-19dvb-core: kill the big kernel lockArnd Bergmann
The dvb core only uses the big kernel lock in the open and ioctl functions, which means it can be replaced with a dvb specific mutex. Fortunately, all the ioctl functions go through dvb_usercopy, so we can move the serialization in there. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: linux-media@vger.kernel.org
2010-10-19dvb/bt8xx: kill the big kernel lockArnd Bergmann
The bt8xx driver only uses the big kernel lock in its dst_ca_ioctl function and never to serialize against other code, so we can trivially replace it with a private mutex. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-media@vger.kernel.org Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
2010-10-19tlclk: remove big kernel lockArnd Bergmann
This driver already has a global mutex, so let's just use that in the open function instead of the BKL. It may not even be needed there, but this patch should have the smallest impact. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Gross <mark.gross@intel.com>
2010-10-19fix rawctl compat ioctls breakage on amd64 and itanicAl Viro
RAW_SETBIND and RAW_GETBIND 32bit versions are fscked in interesting ways. 1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by HANDLE_IOCTL(RAW_SETBIND, raw_ioctl). The latter is ignored. 2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64 and layouts on i386 and amd64 are _not_ the same. raw_ioctl() would work there, but it's never called due to (1). As it is, i386 /sbin/raw definitely doesn't work on amd64 boxen. 3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64, which would be rather sad, seeing that normal userland there is 32bit. The thing is, slapping __packed on the struct in question does not DTRT - it eliminates *all* padding. The real solution is to use compat_u64. 4) of course, all that stuff has no business being outside of raw.c in the first place - there should be ->compat_ioctl() for /dev/rawctl instead of messing with compat_ioctl.c. [akpm@linux-foundation.org: coding-style fixes] [arnd@arndb.de: port to 2.6.36] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-19uml: kill big kernel lockArnd Bergmann
Three uml device drivers still use the big kernel lock, but all of them can be safely converted to using a per-driver mutex instead. Most likely this is not even necessary, so after further review these can and should be removed as well. The exec system call no longer requires the BKL either, so remove it from there, too. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Dike <jdike@addtoit.com> Cc: user-mode-linux-devel@lists.sourceforge.net
2010-10-19workqueue: remove in_workqueue_context()Tejun Heo
Commit a25909a4 (lockdep: Add an in_workqueue_context() lockdep-based test function) added in_workqueue_context() but there hasn't been any in-kernel user and the lockdep annotation in workqueue is scheduled to change. Remove the unused function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-10-19workqueue: Clarify that schedule_on_each_cpu is synchronousTejun Heo
The documentation for schedule_on_each_cpu() states that it calls a function on each online CPU from keventd. This can easily be interpreted as an asyncronous call because the description does not mention that flush_work is called. Clarify that it is synchronous. tj: rephrased a bit Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-10-19memory_hotplug: drop spurious calls to flush_scheduled_work()Tejun Heo
lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous. There is no reason to call flush_scheduled_work() after lru_add_drain_all(). Drop the spurious calls. This is to prepare for the deprecation and removal of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie>
2010-10-19ipvs: IPv6 tunnel modeHans Schillstrom
IPv6 encapsulation uses a bad source address for the tunnel. i.e. VIP will be used as local-addr and encap. dst addr. Decapsulation will not accept this. Example LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100) (eth0 2003::1:0:1/96) RS (ethX 2003::1:0:5/96) tcpdump 2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0 In Linux IPv6 impl. you can't have a tunnel with an any cast address receiving packets (I have not tried to interpret RFC 2473) To have receive capabilities the tunnel must have: - Local address set as multicast addr or an unicast addr - Remote address set as an unicast addr. - Loop back addres or Link local address are not allowed. This causes us to setup a tunnel in the Real Server with the LVS as the remote address, here you can't use the VIP address since it's used inside the tunnel. Solution Use outgoing interface IPv6 address (match against the destination). i.e. use ip6_route_output() to look up the route cache and then use ipv6_dev_get_saddr(...) to set the source address of the encapsulated packet. Additionally, cache the results in new destination fields: dst_cookie and dst_saddr and properly check the returned dst from ip6_route_output. We now add xfrm_lookup call only for the tunneling method where the source address is a local one. Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19netfilter: ctnetlink: add expectation deletion eventsPablo Neira Ayuso
This patch allows to listen to events that inform about expectations destroyed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19cciss: fix PCI IDs for new Smart Array controllersMike Miller
cciss: fix PCI IDs for new controllers This patch fixes the botched up PCI IDs of new controllers. Please consider this patch for inclusion. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19x86: ioapic: Call free_irte only if interrupt remapping enabledYinghai Lu
On a system that support intr-rempping when booting with "intremap=off" [ 177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 [ 177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0 ... [ 178.173326] Call Trace: [ 178.173574] [<ffffffff810515b4>] destroy_irq+0x3a/0x75 [ 178.192934] [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10 [ 178.193418] [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f [ 178.213021] [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb Call free_irte only when interrupt remapping is enabled. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CBCB274.7010108@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-19perf, powerpc: Fix power_pmu_event_init to not use event->ctxPaul Mackerras
Commit c3f00c70 ("perf: Separate find_get_context() from event initialization") changed the generic perf_event code to call perf_event_alloc, which calls the arch-specific event_init code, before looking up the context for the new event. Unfortunately, power_pmu_event_init uses event->ctx->task to see whether the new event is a per-task event or a system-wide event, and thus crashes since event->ctx is NULL at the point where power_pmu_event_init gets called. (The reason it needs to know whether it is a per-task event is because there are some hardware events on Power systems which only count when the processor is not idle, and there are some fixed-function counters which count such events. For example, the "run cycles" event counts cycles when the processor is not idle. If the user asks to count cycles, we can use "run cycles" if this is a per-task event, since the processor is running when the task is running, by definition. We can't use "run cycles" if the user asks for "cycles" on a system-wide counter.) Fortunately the information we need is in the event->attach_state field, so we just use that instead. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101019055535.GA10398@drongo> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Alexey Kardashevskiy <aik@au1.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-19Merge branch 'v2.6.36-rc8' into for-2.6.37/barrierJens Axboe
Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19block: fix accounting bug on cross partition mergesYasuaki Ishimatsu
/proc/diskstats would display a strange output as follows. $ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137 Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE. The detailed root cause is as follows. Assuming that there are two partition, sda1 and sda2. 1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1. | hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 --------------------------- 2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed. | hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 --------------------------- 3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented. | hd_struct->in_flight --------------------------- sda1 | -1 sda2 | 1 --------------------------- The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do. When reloading partition tables, quiesce IO to ensure that no request references to the partition struct exists. When it is safe to free the partition table, the IO for that device is restarted again. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19Merge branch 'tip/perf/recordmcount-2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-10-19ARM: S5PV310: Fix build error on GPIO mapKukjin Kim
This patch fixes build error about GPIO address due to conflict of commit 4d914705 and 19a2c065. - commit 4d914705: Fix on GPIO base addresses - commit 19a2c065: Moves initial map for merging S5P64X0 Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2010-10-18Merge branch 'hotplug' into develRussell King
Conflicts: arch/arm/kernel/head-common.S
2010-10-18Merge branches 'at91', 'dcache', 'ftrace', 'hwbpt', 'misc', 'mmci', 's3c', ↵Russell King
'st-ux' and 'unwind' into devel
2010-10-18ftrace: Remove recursion between recordmcount and scripts/mod/emptySteven Rostedt
When DYNAMIC_FTRACE is enabled and we use the C version of recordmcount, all objects are run through the recordmcount program to create a separate section that stores all the callers of mcount. The build process has a special file: scripts/mod/empty.o. This is built from empty.c which is literally an empty file (except for a single comment). This file is used to find information about the target elf format, like endianness and word size. The problem comes up when we need to build recordmcount. The build process requires that empty.o is built first. The build rules for empty.o will try to execute recordmcount on the empty.o file. We get an error that recordmcount does not exist. To avoid this recursion, the build file will skip running recordmcount if the file that it is building is script/mod/empty.o. [ extra comment Suggested-by: Sam Ravnborg <sam@ravnborg.org> ] Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-18ARM: 6441/1: ux500: The platform is not just based on early drop silicon ↵Srinidhi Kasagar
version. Update Kconfig text accordingly. Signed-off-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-10-18Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: MIPS: Enable ISA_DMA_API config to fix build failure MIPS: 32-bit: Fix build failure in asm/fcntl.h MIPS: Remove all generated vmlinuz* files on "make clean" MIPS: do_sigaltstack() expects userland pointers MIPS: Fix error values in case of bad_stack MIPS: Sanitize restart logics MIPS: secure_computing, syscall audit: syscall number should in r2, not r0. MIPS: Don't block signals if we'd failed to setup a sigframe
2010-10-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: evdev - fix EVIOCSABS regression Input: evdev - fix Ooops in EVIOCGABS/EVIOCSABS
2010-10-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: ohci: fix TI TSB82AA2 regression since 2.6.35
2010-10-18xfs: semaphore cleanupThomas Gleixner
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead. (Ported to current XFS code by <aelder@sgi.com>.) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18mxc_nand: do not depend on disabling the irq in the interrupt handlerSascha Hauer
This patch reverts the driver to enabling/disabling the NFC interrupt mask rather than enabling/disabling the system interrupt. This cleans up the driver so that it doesn't rely on interrupts being disabled within the interrupt handler. For i.MX21 we keep the current behaviour, that is calling enable_irq/disable_irq_nosync to enable/disable interrupts. This patch is based on earlier work by John Ogness. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: John Ogness <john.ogness@linutronix.de> Tested-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-18xfs: Extend project quotas to support 32bit project idsArkadiusz Mi?kiewicz
This patch adds support for 32bit project quota identifiers. On disk format is backward compatible with 16bit projid numbers. projid on disk is now kept in two 16bit values - di_projid_lo (which holds the same position as old 16bit projid value) and new di_projid_hi (takes existing padding) and converts from/to 32bit value on the fly. xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used to enable PROJID32BIT support. Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_buf wrappersChristoph Hellwig
Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_cred.hChristoph Hellwig
We're not actually passing around credentials inside XFS for a while now, so remove all xfs_cred.h with it's cred_t typedef and all instances of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_globals.hChristoph Hellwig
This header only provides one extern that isn't actually declared anywhere, and shadowed by a macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_version.hChristoph Hellwig
It used to have a place when it contained an automatically generated CVS version, but these days it's entirely superflous. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_refcache.hChristoph Hellwig
This header has been completely unused for a couple of years. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: fix the xfs_trans_committedChristoph Hellwig
Use the correct prototype for xfs_trans_committed instead of casting it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove unused t_callback field in struct xfs_transChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: fix bogus m_maxagi check in xfs_igetChristoph Hellwig
These days inode64 should only control which AGs we allocate new inodes from, while we still try to support reading all existing inodes. To make this actually work the check ontop of xfs_iget needs to be relaxed to allow inodes in all allocation groups instead of just those that we allow allocating inodes from. Note that we can't simply remove the check - it prevents us from accessing invalid data when fed invalid inode numbers from NFS or bulkstat. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: do not use xfs_mod_incore_sb_batch for per-cpu countersChristoph Hellwig
Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb and remove support for per-cpu counters from xfs_mod_incore_sb_batch to simplify it. And added benefit is that we don't have to take m_sb_lock for transactions that only modify per-cpu counters. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: do not use xfs_mod_incore_sb for per-cpu countersChristoph Hellwig
Export xfs_icsb_modify_counters and always use it for modifying the per-cpu counters. Remove support for per-cpu counters from xfs_mod_incore_sb to simplify it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove XFS_MOUNT_NO_PERCPU_SBChristoph Hellwig
Fail the mount if we can't allocate memory for the per-CPU counters. This is consistent with how we handle everything else in the mount path and makes the superblock counter modification a lot simpler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: pack xfs_buf structure more tightlyDave Chinner
pahole reports the struct xfs_buf has quite a few holes in it, so packing the structure better will reduce the size of it by 16 bytes. Also, move all the fields used in cache lookups into the first cacheline. Before on x86_64: /* size: 320, cachelines: 5 */ /* sum members: 298, holes: 6, sum holes: 22 */ After on x86_64: /* size: 304, cachelines: 5 */ /* padding: 6 */ /* last cacheline: 48 bytes */ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: convert buffer cache hash to rbtreeDave Chinner
The buffer cache hash is showing typical hash scalability problems. In large scale testing the number of cached items growing far larger than the hash can efficiently handle. Hence we need to move to a self-scaling cache indexing mechanism. I have selected rbtrees for indexing becuse they can have O(log n) search scalability, and insert and remove cost is not excessive, even on large trees. Hence we should be able to cache large numbers of buffers without incurring the excessive cache miss search penalties that the hash is imposing on us. To ensure we still have parallel access to the cache, we need multiple trees. Rather than hashing the buffers by disk address to select a tree, it seems more sensible to separate trees by typical access patterns. Most operations use buffers from within a single AG at a time, so rather than searching lots of different lists, separate the buffer indexes out into per-AG rbtrees. This means that searches during metadata operation have a much higher chance of hitting cache resident nodes, and that updates of the tree are less likely to disturb trees being accessed on other CPUs doing independent operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: serialise inode reclaim within an AGDave Chinner
Memory reclaim via shrinkers has a terrible habit of having N+M concurrent shrinker executions (N = num CPUs, M = num kswapds) all trying to shrink the same cache. When the cache they are all working on is protected by a single spinlock, massive contention an slowdowns occur. Wrap the per-ag inode caches with a reclaim mutex to serialise reclaim access to the AG. This will block concurrent reclaim in each AG but still allow reclaim to scan multiple AGs concurrently. Allow shrinkers to move on to the next AG if it can't get the lock, and if we can't get any AG, then start blocking on locks. To prevent reclaimers from continually scanning the same inodes in each AG, add a cursor that tracks where the last reclaim got up to and start from that point on the next reclaim. This should avoid only ever scanning a small number of inodes at the satart of each AG and not making progress. If we have a non-shrinker based reclaim pass, ignore the cursor and reset it to zero once we are done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: batch inode reclaim lookupDave Chinner
Batch and optimise the per-ag inode lookup for reclaim to minimise scanning overhead. This involves gang lookups on the radix trees to get multiple inodes during each tree walk, and tighter validation of what inodes can be reclaimed without blocking befor we take any locks. This is based on ideas suggested in a proof-of-concept patch posted by Nick Piggin. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: implement batched inode lookups for AG walkingDave Chinner
With the reclaim code separated from the generic walking code, it is simple to implement batched lookups for the generic walk code. Separate out the inode validation from the execute operations and modify the tree lookups to get a batch of inodes at a time. Reclaim operations will be optimised separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: split out inode walk inode grabbingDave Chinner
When doing read side inode cache walks, the code to validate and grab an inode is common to all callers. Split it out of the execute callbacks in preparation for batching lookups. Similarly, split out the inode reference dropping from the execute callbacks into the main lookup look to be symmetric with the grab. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: split inode AG walking into separate code for reclaimDave Chinner
The reclaim walk requires different locking and has a slightly different walk algorithm, so separate it out so that it can be optimised separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove buftarg hash for external devicesDave Chinner
For RT and external log devices, we never use hashed buffers on them now. Remove the buftarg hash tables that are set up for them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: use unhashed buffers for size checksDave Chinner
When we are checking we can access the last block of each device, we do not need to use cached buffers as they will be tossed away immediately. Use uncached buffers for size checks so that all IO prior to full in-memory structure initialisation does not use the buffer cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: kill XBF_FS_MANAGED buffersDave Chinner
Filesystem level managed buffers are buffers that have their lifecycle controlled by the filesystem layer, not the buffer cache. We currently cache these buffers, which makes cleanup and cache walking somewhat troublesome. Convert the fs managed buffers to uncached buffers obtained by via xfs_buf_get_uncached(), and remove the XBF_FS_MANAGED special cases from the buffer cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: store xfs_mount in the buftarg instead of in the xfs_bufDave Chinner
Each buffer contains both a buftarg pointer and a mount pointer. If we add a mount pointer into the buftarg, we can avoid needing the b_mount field in every buffer and grab it from the buftarg when needed instead. This shrinks the xfs_buf by 8 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>