linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2013-07-16	Merge branch 'bcache-for-3.11' of git://evilpiepirate.org/~kent/linux-bcache ↵	Jens Axboe
	into for-3.11/drivers Kent writes: Hey Jens - I've been busy torture testing and chasing bugs, here's the fruits of my labors. These are all fairly small fixes, some of them quite important.
2013-07-12	bcache: Allocation kthread fixes	Kent Overstreet
	The alloc kthread should've been using try_to_freeze() - and also there was the potential for the alloc kthread to get woken up after it had shut down, which would have been bad. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-07-12	bcache: Fix GC_SECTORS_USED() calculation	Kent Overstreet
	Part of the job of garbage collection is to add up however many sectors of live data it finds in each bucket, but that doesn't work very well if it doesn't reset GC_SECTORS_USED() when it starts. Whoops. This wouldn't have broken anything horribly, but allocation tries to preferentially reclaim buckets that are mostly empty and that's not gonna work with an incorrect GC_SECTORS_USED() value. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12	bcache: Journal replay fix	Kent Overstreet
	The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12	bcache: Shutdown fix	Kent Overstreet
	Stopping a cache set is supposed to make it stop attached backing devices, but somewhere along the way that code got lost. Fixing this mainly has the effect of fixing our reboot notifier. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12	bcache: Fix a sysfs splat on shutdown	Kent Overstreet
	If we stopped a bcache device when we were already detaching (or something like that), bcache_device_unlink() would try to remove a symlink from sysfs that was already gone because the bcache dev kobject had already been removed from sysfs. So keep track of whether we've removed stuff from sysfs. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12	bcache: Advertise that flushes are supported	Kent Overstreet
	Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered out unless we say we support them... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12	bcache: check for allocation failures	Dan Carpenter
	There is a missing NULL check after the kzalloc(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2013-07-12	bcache: Fix a dumb race	Kent Overstreet
	In the far-too-complicated closure code - closures can have destructors, for probably dubious reasons; they get run after the closure is no longer waiting on anything but before dropping the parent ref, intended just for freeing whatever memory the closure is embedded in. Trouble is, when remaining goes to 0 and we've got nothing more to run - we also have to unlock the closure, setting remaining to -1. If there's a destructor, that unlock isn't doing anything - nobody could be trying to lock it if we're about to free it - but if the unlock _is needed... that check for a destructor was racy. Argh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-02	Merge branch 'bcache-for-3.11' of git://evilpiepirate.org/~kent/linux-bcache ↵	Jens Axboe
	into for-3.11/drivers
2013-07-02	Merge tag 'v3.10-rc7' into for-3.11/drivers	Jens Axboe
	Linux 3.10-rc7 Pull this in early to avoid doing it with the bcache merge, since there are a number of changes to bcache between my old base (3.10-rc1) and the new pull request.
2013-07-01	bcache: Use standard utility code	Kent Overstreet
	Some of bcache's utility code has made it into the rest of the kernel, so drop the bcache versions. Bcache used to have a workaround for allocating from a bio set under generic_make_request() (if you allocated more than once, the bios you already allocated would get stuck on current->bio_list when you submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT when allocating bios under generic_make_request() so that allocation could fail and it could retry from workqueue. But bio_alloc_bioset() has a workaround now, so we can drop this hack and the associated error handling. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-07-01	bcache: Update email address	Kent Overstreet
	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-07-01	bcache: Delete fuzz tester	Kent Overstreet
	This code has rotted and it hasn't been used in ages anyways. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-07-01	bcache: Document shrinker reserve better	Kent Overstreet
	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-07-01	bcache: FUA fixes	Kent Overstreet
	Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-28	drbd: Allow online change of al-stripes and al-stripe-size	Philipp Reisner
	Allow to change the AL layout with an resize operation. For that the reisze command gets two new fields: al_stripes and al_stripe_size. In order to make the operation crash save: 1) Lock out all IO and MD-IO 2) Write the super block with MDF_PRIMARY_IND clear 3) write the bitmap to the new location (all zeros, since we allow only while connected) 4) Initialize the new AL-area 5) Write the super block with the restored MDF_PRIMARY_IND. 6) Unfreeze all IO Since the AL-layout has no influence on the protocol, this operation needs to be beforemed on both sides of a resource (if intended). Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	drbd: Constants should be UPPERCASE	Philipp Reisner
	Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	drbd: Ignore the exit code of a fence-peer handler if it returns too late	Philipp Reisner
	In case the connection was established and lost again before the a fence-peer handler returns, ignore the exit code of this instance. (And use the exit code of the later started instance) Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	drbd: Fix rcu_read_lock balance on error path	Andreas Gruenbacher
	Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	drbd: fix error return code in drbd_init()	Wei Yongjun
	Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	drbd: Do not sleep inside rcu	Andreas Gruenbacher
	Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28	Merge branch 'stable/for-jens-3.10' of ↵	Jens Axboe
	git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers Konrad writes: It has the 'feature-max-indirect-segments' implemented in both backend and frontend. The current problem with the backend and frontend is that the segment size is limited to 11 pages. It means we can at most squeeze in 44kB per request. The ring can hold 32 (next power of two below 36) requests, meaning we can do 1.4M of outstanding requests. Nowadays that is not enough. The problem in the past was addressed in two ways - but neither one went upstream. The first solution to this proposed by Justin from Spectralogic was to negotiate the segment size. This means that the ‘struct blkif_sring_entry’ is now a variable size. It can expand from 112 bytes (cover 11 pages of data - 44kB) to 1580 bytes (256 pages of data - so 1MB). It is a simple extension by just making the array in the request expand from 11 to a variable size negotiated. But it had limits: this extension still limits the number of segments per request to 255 (as the total number must be specified in the request, which only has an 8-bit field for that purpose). The other solution (from Intel - Ronghui) was to create one extra ring that only has the ‘struct blkif_request_segment’ in them. The ‘struct blkif_request’ would be changed to have an index in said ‘segment ring’. There is only one segment ring. This means that the size of the initial ring is still the same. The requests would point to the segment and enumerate out how many of the indexes it wants to use. The limit is of course the size of the segment. If one assumes a one-page segment this means we can in one request cover ~4MB. Those patches were posted as RFC and the author never followed up on the ideas on changing it to be a bit more flexible. There is yet another mechanism that could be employed (which these patches implement) - and it borrows from VirtIO protocol. And that is the ‘indirect descriptors’. This very similar to what Intel suggests, but with a twist. The twist is to negotiate how many of these 'segment' pages (aka indirect descriptor pages) we want to support (in reality we negotiate how many entries in the segment we want to cover, and we module the number if it is bigger than the segment size). This means that with the existing 36 slots in the ring (single page) we can cover: 32 slots * each blkif_request_indirect covers: 512 * 4096 ~= 64M. Since we ample space in the blkif_request_indirect to span more than one indirect page, that number (64M) can be also multiplied by eight = 512MB. Roger Pau Monne took the idea and implemented them in these patches. They work great and the corner cases (migration between backends with and without this extension) work nicely. The backend has a limit right now off how many indirect entries it can handle: one indirect page, and at maximum 256 entries (out of 512 - so 50% of the page is used). That comes out to 32 slots * 256 entries in a indirect page * 1 indirect page per request * 4096 = 32MB. This is a conservative number that can change in the future. Right now it strikes a good balance between giving excellent performance, memory usage in the backend, and balancing the needs of many guests. In the patchset there is also the split of the blkback structure to be per-VBD. This means that the spinlock contention we had with many guests trying to do I/O and all the blkback threads hitting the same lock has been eliminated. Also there are bug-fixes to deal with oddly sized sectors, insane amounts on th ring, and also a security fix (posted earlier).
2013-06-26	bcache: Refresh usage docs	Gabriel de Perthuis
	Mention udev autoregistration, symlinks. Write down some sysfs paths. Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Send label uevents	Gabriel de Perthuis
	Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Send a uevent with a cached device's UUID	Gabriel de Perthuis
	Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com>
2013-06-26	doc: Fix typo in documentation/bcache.txt	Masanari Iida
	Correct spelling typo in documentation/bcache.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Write out full stripes	Kent Overstreet
	Now that we're tracking dirty data per stripe, we can add two optimizations for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data * When flushing dirty data, preferentially write out full stripes first if there are any. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Track dirty data by stripe	Kent Overstreet
	To make background writeback aware of raid5/6 stripes, we first need to track the amount of dirty data within each stripe - we do this by breaking up the existing sectors_dirty into per stripe atomic_ts Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Initialize sectors_dirty when attaching	Kent Overstreet
	Previously, dirty_data wouldn't get initialized until the first garbage collection... which was a bit of a problem for background writeback (as the PD controller keys off of it) and also confusing for users. This is also prep work for making background writeback aware of raid5/6 stripes. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Improve lazy sorting	Kent Overstreet
	The old lazy sorting code was kind of hacky - rewrite in a way that mathematically makes more sense; the idea is that the size of the sets of keys in a btree node should increase by a more or less fixed ratio from smallest to biggest. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Rip out pkey()/pbtree()	Kent Overstreet
	Old gcc doesnt like the struct hack, and it is kind of ugly. So finish off the work to convert pr_debug() statements to tracepoints, and delete pkey()/pbtree(). Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Fix/revamp tracepoints	Kent Overstreet
	The tracepoints were reworked to be more sensible, and fixed a null pointer deref in one of the tracepoints. Converted some of the pr_debug()s to tracepoints - this is partly a performance optimization; it used to be that with DEBUG or CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it was changed to an empty inline function. Some of the pr_debug() statements had rather expensive function calls as part of the arguments, so this code was getting run unnecessarily even on non debug kernels - in some fast paths, too. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Refactor btree io	Kent Overstreet
	The most significant change is that btree reads are now done synchronously, instead of asynchronously and doing the post read stuff from a workqueue. This was originally done because we can't block on IO under generic_make_request(). But - we already have a mechanism to punt cache lookups to workqueue if needed, so if we just use that we don't have to deal with the complexity of doing things asynchronously. The main benefit is this makes the locking situation saner; we can hold our write lock on the btree node until we're finished reading it, and we don't need that btree_node_read_done() flag anymore. Also, for writes, btree_write() was broken out into btree_node_write() and btree_leaf_dirty() - the old code with the boolean argument was dumb and confusing. The prio_blocked mechanism was improved a bit too, now the only counter is in struct btree_write, we don't mess with transfering a count from struct btree anymore. This required changing garbage collection to block prios at the start and unblock when it finishes, which is cleaner than what it was doing anyways (the old code had mostly the same effect, but was doing it in a convoluted way) And the btree iter btree_node_read_done() uses was converted to a real mempool. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Convert allocator thread to kthread	Kent Overstreet
	Using a workqueue when we just want a single thread is a bit silly. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: Warn when a device is already registered.	Gabriel de Perthuis
	Signed-off-by: Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	bcache: fix a spurious gcc complaint, use scnprintf	Kent Overstreet
	An old version of gcc was complaining about using a const int as the size of a stack allocated array. Which should be fine - but using ARRAY_SIZE() is better, anyways. Also, refactor the code to use scnprintf(). Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-26	md: bcache: io.c: fix a potential NULL pointer dereference	Kumar Amit Mehta
	bio_alloc_bioset returns NULL on failure. This fix adds a missing check for potential NULL pointer dereferencing. Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-06-25	xen-blkback: check the number of iovecs before allocating a bios	Roger Pau Monne
	With the introduction of indirect segments we can receive requests with a number of segments bigger than the maximum number of allowed iovecs in a bios, so make sure that blkback doesn't try to allocate a bios with more iovecs than BIO_MAX_PAGES Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-22	Linux 3.10-rc7v3.10-rc7	Linus Torvalds

2013-06-22	Merge tag 'fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "These are two fixes that came in this week, one for a regression we introduced in 3.10 in the GIC interrupt code, and the other one fixes a typo in newly introduced code" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: irqchip: gic: call gic_cpu_init() as well in CPU_STARTING_FROZEN case ARM: dts: Correct the base address of pinctrl_3 on Exynos5250
2013-06-22	Merge tag 'driver-core-3.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg Kroah-Hartman: "Here's a single patch for the firmware core that resolves a reported oops in the firmware core that people have been hitting." * tag 'driver-core-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: firmware loader: fix use-after-free by double abort
2013-06-22	Merge tag 'usb-3.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg Kroah-Hartman: "Here are two USB patches for 3.10. One updates the Kconfig wording for CONFIG_USB_PHY to make it, hopefully, more obvious what this option is (I know you complained about this when it hit the tree.) The other is a new device id for a driver" * tag 'usb-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable usb: phy: Improve Kconfig help for CONFIG_USB_PHY
2013-06-22	Merge tag 'tty-3.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pul tty fixes from Greg Kroah-Hartman: "Here are two tty core fixes that resolve some regressions that have been reported recently. Both tiny fixes, but needed" * tag 'tty-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: Fix transient pty write() EIO tty/vt: Return EBUSY if deallocating VT1 and it is busy
2013-06-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending	Linus Torvalds
	Pull SCSI target fixes from Nicholas Bellinger: "Included is the recent tcm_qla2xxx residual underrun length fix from Roland, along with Joern's iscsi-target patch for session_lock breakage within iscsit_stop_time2retain_timer() code. Both are CC'ed to stable. The remaining two are specific to recent iscsi-target + iser conversion changes. One drops some left-over debug noise, and Andy's patch fixes configfs attribute handling during an explicit network portal feature bit disable when iser-target is unsupported." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iscsi-target: Remove left over v3.10-rc debug printks target/iscsi: Fix op=disable + error handling cases in np_store_iser tcm_qla2xxx: Fix residual for underrun commands that fail target/iscsi: don't corrupt bh_count in iscsit_stop_time2retain_timer()
2013-06-22	Merge branch 'v4l_for_linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Another set of fixes for Kernel 3.10. This series contain: - two Kbuild fixes for randconfig - a buffer overflow when using rtl28xuu with r820t tuner - one clk fixup on exynos4-is driver" * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] Fix build when drivers are builtin and frontend modules [media] s5p makefiles: don't override other selections on obj-[ym] [media] exynos4-is: Fix FIMC-IS clocks initialization [media] rtl28xxu: fix buffer overflow when probing Rafael Micro r820t tuner
2013-06-22	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Several fixes for bugs caught while looking through f_pos (ab)users" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aout32 coredump compat fix splice: don't pass the address of ->f_pos to methods mconsole: we'd better initialize pos before passing it to vfs_read()...
2013-06-22	aout32 coredump compat fix	Al Viro
	dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout handles it correctly (seeks by PAGE_SIZE - sizeof(struct user), getting the current position to PAGE_SIZE), compat one seeks by PAGE_SIZE and ends up at PAGE_SIZE + already written... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-21	xen-blkfront: set blk_queue_max_hw_sectors correctly	Roger Pau Monne
	Now that indirect segments are enabled blk_queue_max_hw_sectors must be set to match the maximum number of sectors we can handle in a request. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-21	xen-blkback: workaround compiler bug in gcc 4.1	Roger Pau Monne
	The code generat with gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54) creates an unbound loop for the second foreach_grant_safe loop in purge_persistent_gnt. The workaround is to avoid having this second loop and instead perform all the work inside the first loop by adding a new variable, clean_used, that will be set when all the desired persistent grants have been removed and we need to iterate over the remaining ones to remove the WAS_ACTIVE flag. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Tom O'Neill <toneill@vmem.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>