summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-08-02dm raid: support metadata devicesJonathan Brassow
Add the ability to parse and use metadata devices to dm-raid. Although not strictly required, without the metadata devices, many features of RAID are unavailable. They are used to store a superblock and bitmap. The role, or position in the array, of each device must be recorded in its superblock. This is to help with fault handling, array reshaping, and sanity checks. RAID 4/5/6 devices must be loaded in a specific order: in this way, the 'array_position' field helps validate the correctness of the mapping when it is loaded. It can be used during reshaping to identify which devices are added/removed. Fault handling is impossible without this field. For example, when a device fails it is recorded in the superblock. If this is a RAID1 device and the offending device is removed from the array, there must be a way during subsequent array assembly to determine that the failed device was the one removed. This is done by correlating the 'array_position' field and the bit-field variable 'failed_devices'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm raid: add write_mostly parameterJonathan Brassow
Add the write_mostly parameter to RAID1 dm-raid tables. This allows the user to set the WriteMostly flag on a RAID1 device that should normally be avoided for read I/O. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm raid: add region_size parameterJonathan Brassow
Allow the user to specify the region_size. Ensures that the supplied value meets md's constraints, viz. the number of regions does not exceed 2^21. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm raid: improve table parameters documentationJonathan Brassow
Add more information about some dm-raid table parameters and clarify how parameters are printed when 'dmsetup table' is issued. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm ioctl: forbid multiple device specifiersMikulas Patocka
Exactly one of name, uuid or device must be specified when referencing an existing device. This removes the ambiguity (risking the wrong device being updated) if two conflicting parameters were specified. Previously one parameter got used and any others were ignored silently. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm ioctl: introduce __get_dev_cellMikulas Patocka
Move logic to find device based on major/minor number to a separate function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell). This makes the function __find_device_hash_cell more straightforward. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm ioctl: fill in device parameters in more ioctlsMikulas Patocka
Move parameter filling from find_device to __find_device_hash_cell. This patch causes ioctls using __find_device_hash_cell (DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD) to return device parameters, bringing them into line with the other ioctls. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm flakey: add corrupt_bio_byte featureMike Snitzer
Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a specified position with a specified value during intervals when the device is "down". Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm flakey: add drop_writesMike Snitzer
Add 'drop_writes' option to drop writes silently while the device is 'down'. Reads are not touched. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm flakey: support feature argsMike Snitzer
Add the ability to specify arbitrary feature flags when creating a flakey target. This code uses the same target argument helpers that the multipath target does. Also remove the superfluous 'dm-flakey' prefixes from the error messages, as they already contain the prefix 'flakey'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm flakey: use dm_target_offset and support discardsMike Snitzer
Use dm_target_offset() and support discards. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm table: share target argument parsing functionsMike Snitzer
Move multipath target argument parsing code into dm-table so other targets can share it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm snapshot: skip reading origin when overwriting complete chunkMikulas Patocka
If we write a full chunk in the snapshot, skip reading the origin device because the whole chunk will be overwritten anyway. This patch changes the snapshot write logic when a full chunk is written. In this case: 1. allocate the exception 2. dispatch the bio (but don't report the bio completion to device mapper) 3. write the exception record 4. report bio completed Callbacks must be done through the kcopyd thread, because callbacks must not race with each other. So we create two new functions: dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback. (This function must not be called from interrupt context.) dm_kcopyd_do_callback: submit callback. (This function may be called from interrupt context.) Performance test (on snapshots with 4k chunk size): without the patch: non-direct-io sequential write (dd): 17.7MB/s direct-io sequential write (dd): 20.9MB/s non-direct-io random write (mkfs.ext2): 0.44s with the patch: non-direct-io sequential write (dd): 26.5MB/s direct-io sequential write (dd): 33.2MB/s non-direct-io random write (mkfs.ext2): 0.27s Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm: ignore merge_bvec for snapshots when safeMikulas Patocka
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate whether the device can accept bios larger than the size its merge function returns. When set, use this to send large bios to snapshots which can split them if necessary. Snapshot I/O may be significantly fragmented and this approach seems to improve peformance. Before the patch, dm_set_device_limits restricted bio size to page size if the underlying device had a merge function and the target didn't provide a merge function. After the patch, dm_set_device_limits restricts bio size to page size if the underlying device has a merge function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't provide a merge function. The snapshot target can't provide a merge function because when the merge function is called, it is impossible to determine where the bio will be remapped. Previously this led us to impose a 4k limit, which we can now remove if the snapshot store is located on a device without a merge function. Together with another patch for optimizing full chunk writes, it improves performance from 29MB/s to 40MB/s when writing to the filesystem on snapshot store. If the snapshot store is placed on a non-dm device with a merge function (such as md-raid), device mapper still limits all bios to page size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm table: clean dm_get_device and move exportsMike Snitzer
There is no need for __table_get_device to be factored out. Also move the exports to the end of their respective functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm raid: tidy includesAlasdair G Kergon
A dm target only needs to use include/linux dm headers. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm ioctl: prevent empty messageAlasdair G Kergon
Detect invalid empty messages in core dm instead of requiring every target to check this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm raid: cleanup parameter handlingJonathan Brassow
Re-order the parameters so they are handled consistently in the same order where defined, parsed and output. Only include rebuild parameters in the STATUSTYPE_TABLE output if they were supplied in the original table line. Correct the parameter count when outputting rebuild: there are two words, not one. Use case-independent checks for keywords (as in other device-mapper targets). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm snapshot: style cleanupsJonathan Brassow
Coding style cleanups. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
2011-08-02dm snapshot: remove unused definitionsMikulas Patocka
Remove a couple of unused #defines. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm kcopyd: remove nr_pages field from job structureMikulas Patocka
The nr_pages field in struct kcopyd_job is only used temporarily in run_pages_job() to count the number of required pages. We can use a local variable instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm kcopyd: remove offset field from job structureMikulas Patocka
The offset field in struct kcopyd_job is always zero so remove it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm: use vzallocJoe Perches
Use vzalloc() instead of vmalloc()+memset(). Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm log: userspace use list_moveKirill A. Shutemov
Replace list_del() followed by list_add() with list_move(). Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm log: clean up bit little endian bitopsAkinobu Mita
Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). This also removes unnecessary casts. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm table: fix discard supportMike Snitzer
Remove 'discards_supported' from the dm_table structure. The same information can be easily discovered from the table's target(s) in dm_table_supports_discards(). Before this fix dm_table_supports_discards() would skip checking the individual targets' 'discards_supported' flag if any one target in the table didn't set num_discard_requests > 0. Now the per-target 'discards_supported' flag is effective at insuring the final DM device advertises discard support. But, to be clear, targets that don't support discards (!num_discard_requests) will not receive discard requests. Also DMWARN if a target sets 'discards_supported' override but forgets to set 'num_discard_requests'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm: suppress endian warningsAlasdair G Kergon
Suppress sparse warnings about cpu_to_le32() by using __le32 types for on-disk data etc. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm: fix idr leak on module removalAlasdair G Kergon
Destroy _minor_idr when unloading the core dm module. (Found by kmemleak.) Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm io: flush cpu cache with vmapped ioMikulas Patocka
For normal kernel pages, CPU cache is synchronized by the dma layer. However, this is not done for pages allocated with vmalloc. If we do I/O to/from vmallocated pages, we must synchronize CPU cache explicitly. Prior to doing I/O on vmallocated page we must call flush_kernel_vmap_range to flush dirty cache on the virtual address. After finished read we must call invalidate_kernel_vmap_range to invalidate cache on the virtual address, so that accesses to the virtual address return newly read data and not stale data from CPU cache. This patch fixes metadata corruption on dm-snapshots on PA-RISC and possibly other architectures with caches indexed by virtual address. Cc: stable <stable@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02dm mpath: fix potential NULL pointer in feature arg processingMike Snitzer
Avoid dereferencing a NULL pointer if the number of feature arguments supplied is fewer than indicated. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
2011-08-02dm snapshot: flush disk cache when mergingMikulas Patocka
This patch makes dm-snapshot flush disk cache when writing metadata for merging snapshot. Without cache flushing the disk may reorder metadata write and other data writes and there is a possibility of data corruption in case of power fault. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02block/genhd.c: remove useless cast in diskstats_show()Herbert Poetzl
Remove the (unsigned long long) cast in diskstats_show() and adjusts the seq_printf() format string to 'unsigned long' diskstats_show() uses part_stat_read() to get the stats, which either accesses the specified field in the struct disk_stats directly (non SMP) or sums up the per CPU values in a variable of the same type as the field, so in any case the result will have the same type and range as the specified field which for all disk_stats entries is unsigned long Also, for unsigned long ranges the output of %lu should be identical to the one of %llu, so no change in the actual proc entry contents. Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02drivers/cdrom/cdrom.c: relax check on dvd manufacturer valueAndrew Morton
The report has an ISO which has a very long manufacturer ID. It seems that Linux is wrong, not the ISO maker. Relax the check for the length of this field: emit a warning and truncate the incoming data to 2048 bytes rather than rejecting the entire thing. dvd_manufact.value isn't null-terminated. I'm not even sure if it's a string. The kernel doesn't apepar to use it anyway. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=39062 Reported-by: <ale.goujon@gmail.com> Tested-by: <ale.goujon@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parseH Hartley Sweeten
The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02bsg-lib: add module.h includeJens Axboe
Due to conflicts with the moduleh tree in linux-next, we run into an include file mess. We really need export.h in that tree, but if we add module.h locally then the issue is easier to resolve. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02bnx2x: Prevent restarting Tx during bnx2x_nic_unloadVladislav Zolotarov
Tx queues were stopped before bp->state was changed to a value different from BNX2X_STATE_OPEN, which allowed the bnx2x_tx_int() called from the NAPI context to re-enable it. This then allowed the netdev->ndo_start_xmit() to be called in the middle of the function reset and rings freeing. This patch changes bp->state to a value different from BNX2X_STATE_OPEN BEFORE disabling the Tx queues in order to restore the broken protection against the above race in the bnx2x_tx_int(). Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02watchdog: Cleanup WATCHDOG_CORE help textJosh Boyer
The newly added WATCHDOG_CORE option is a bool, but the help text suggests it can be built as a module. Fix it up. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2011-08-02watchdog: Fix POST failure on ASUS P5N32-E SLI and similar boardsMart Gerrits
At present the module does not unset the NO_REBOOT bit upon shutdown, this causes the BIOS to fail the POST once and reset. During the next boot it displays the following error message: ***** Warning: System BOOT Fail ***** Your system last boot fail or POST interrupted. Please enter setup to load default and reboot again. Press F1 to continue, DEL to enter SETUP With this patch the NO_REBOOT flag will be unset on shutdown and thus stop this failure from occurring. Tested on 'ASUS P5N32-E SLI with BIOS revision 1801' and 'ASUS P5N32-E SLI PLUS with BIOS revision 1502'. Signed-off-by: Mart Gerrits <mart1987@gmail.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2011-08-02watchdog: shwdt: fix usage of mod_timerDavid Engraf
Fix the usage of mod_timer() and make the driver usable. mod_timer() must be called with an absolute timeout in jiffies. The old implementation used a relative timeout thus the hardware watchdog was never triggered. Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Wim Van sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org>
2011-08-02Merge branch 'fix/asoc' into for-linusTakashi Iwai
2011-08-02ALSA: asihpi - Clarify adapter index validity checkEliot Blennerhassett
Avoids assigning possibly invalid address to pa, even if it is never dereferenced. Correct error response to reflect request object/function ids. Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-08-02cfq-iosched: Reduce linked group count upon group destructionVivek Goyal
FQ keeps track of number of groups which are linked on blkcg->blkg_list. This is useful to avoid races between queue exit and cgroup exit code paths. So if at the request queue exit time linked group count is not zero, that means there are some group out there which is yet to be deleted under rcu read period and queue exit code should wait for on rcu period. In my previous patch I forgot to decrease the number of group count. So in current form, we nr_blkcg_linked_grps is always non-zero and we will always wait one rcu period (if BLK_CGROUP=y). The side effect of this is that it can increase boot time. I am surprised, nobody complained so far. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02ACPI print OSI(Linux) warning only onceLen Brown
This message gets repeated on some machines: https://bugzilla.kernel.org/show_bug.cgi?id=29292 Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-01fix block device fallout from ->fsync() changesRafael J. Wysocki
blkdev_fsync() needs to write pages in pagecache... Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01oom: task->mm == NULL doesn't mean the memory was freedOleg Nesterov
exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which frees the memory. However select_bad_process() checks ->mm != NULL before TIF_MEMDIE, so it continues to kill other tasks even if we have the oom-killed task freeing its memory. Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip the tasks which have already passed exit_notify() to ensure a zombie with TIF_MEMDIE set can't block oom-killer. Alternatively we could probably clear TIF_MEMDIE after exit_mmap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-01net: add kerneldoc to skb_copy_bits()Eric Dumazet
Since skb_copy_bits() is called from assembly, add a fat comment to make clear we should think twice before changing its prototype. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01Documentation/bonding.txt: Update to 3.x version numbersJesper Juhl
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01doc: Update the email address for Paul Moore in various source filesPaul Moore
My @hp.com will no longer be valid starting August 5, 2011 so an update is necessary. My new email address is employer independent so we don't have to worry about doing this again any time soon. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01doc: Update the MAINTAINERS info for Paul MoorePaul Moore
My @hp.com will no longer be valid starting August 5, 2011 so an update is necessary. My new email address is employer independent so we don't have to worry about doing this again any time soon. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01atm: br2864: sent packets truncated in VC routed modeChas Williams
Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>