summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-06-21btrfs: Check name_len on add_inode_ref call pathSu Yue
replay_one_buffer first reads buffers and dispatches items accroding to the item type. In this patch, add_inode_ref handles inode_ref and inode_extref. Then add_inode_ref calls ref_get_fields and extref_get_fields to read ref/extref name for the first time. So checking name_len before reading those two is fine. add_inode_ref also calls inode_in_dir to match ref/extref in parent_dir. The call graph includes btrfs_match_dir_item_name to read dir_item name in the parent dir. Checking first dir_item is not enough. Change it to verify every dir_item while doing matches. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: Check name_len with boundary in verify dir_itemSu Yue
Originally, verify_dir_item verifies name_len of dir_item with fixed values but not item boundary. If corrupted name_len was not bigger than the fixed value, for example 255, the function will think the dir_item is fine. And then reading beyond boundary will cause crash. Example: 1. Corrupt one dir_item name_len to be 255. 2. Run 'ls -lar /mnt/test/ > /dev/null' dmesg: [ 48.451449] BTRFS info (device vdb1): disk space caching is enabled [ 48.451453] BTRFS info (device vdb1): has skinny extents [ 48.489420] general protection fault: 0000 [#1] SMP [ 48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq [ 48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5 [ 48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 [ 48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000 [ 48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs] [ 48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202 [ 48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000 [ 48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368 [ 48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000 [ 48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288 [ 48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20 [ 48.490008] FS: 00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 [ 48.490008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0 [ 48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 48.490008] Call Trace: [ 48.490008] btrfs_real_readdir+0x3b7/0x4a0 [btrfs] [ 48.490008] iterate_dir+0x181/0x1b0 [ 48.490008] SyS_getdents+0xa7/0x150 [ 48.490008] ? fillonedir+0x150/0x150 [ 48.490008] entry_SYSCALL_64_fastpath+0x18/0xad [ 48.490008] RIP: 0033:0x7fef5032546b [ 48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e [ 48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b [ 48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003 [ 48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000 [ 48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38 [ 48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e [ 48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98 [ 48.499455] ---[ end trace 321920d8e8339505 ]--- Fix it by adding a parameter @slot and check name_len with item boundary by calling btrfs_is_name_len_valid. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> rev Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: Introduce btrfs_is_name_len_valid to avoid reading beyond boundarySu Yue
Introduce function btrfs_is_name_len_valid. The function compares parameter @name_len with item boundary then returns true if name_len is valid. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ s/btrfs_leaf_data/BTRFS_LEAF_DATA_OFFSET/ ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: move dev stats accounting out of wait_dev_flushDavid Sterba
We should really just wait in wait_dev_flush and let the caller decide what to do with the error value. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: account as waiting for IO, while waiting fot the flush bio completionDavid Sterba
Similar to what submit_bio_wait does, we should account for IO while waiting for a bio completion. This has marginal visible effects, flush bio is short-lived. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: preallocate device flush bioDavid Sterba
For devices that support flushing, we allocate a bio, submit, wait for it and then free it. The bio allocation does not fail so ENOMEM is not a problem but we still may unnecessarily stress the allocation subsystem. Instead, we can allocate the bio at the same time we allocate the device and reuse it each time we need to flush the barriers. The bio is reset before each use. Reference counting is simplified to just device allocation (get) and freeing (put). The bio used to be submitted through the integrity checker which will find out that bio has no data attached and call submit_bio. Status of the bio in flight needs to be tracked separately in case the device caches get switched off between write and wait. Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21Btrfs: incremental send, fix invalid path for unlink commandsFilipe Manana
An incremental send can contain unlink operations with an invalid target path when we rename some directory inode A, then rename some file inode B to the old name of inode A and directory inode A is an ancestor of inode B in the parent snapshot (but not anymore in the send snapshot). Consider the following example scenario where this issue happens. Parent snapshot: . (ino 256) | |--- dir1/ (ino 257) |--- dir2/ (ino 258) | |--- file1 (ino 259) | |--- file3 (ino 261) | |--- dir3/ (ino 262) |--- file22 (ino 260) |--- dir4/ (ino 263) Send snapshot: . (ino 256) | |--- dir1/ (ino 257) |--- dir2/ (ino 258) |--- dir3 (ino 260) |--- file3/ (ino 262) |--- dir4/ (ino 263) |--- file11 (ino 269) |--- file33 (ino 261) When attempting to apply the corresponding incremental send stream, an unlink operation contains an invalid path which makes the receiver fail. The following is verbose output of the btrfs receive command: receiving snapshot snap2 uuid=7d5450da-a573-e043-a451-ec85f4879f0f (...) utimes utimes dir1 utimes dir1/dir2 link dir1/dir3/dir4/file11 -> dir1/dir2/file1 unlink dir1/dir2/file1 utimes dir1/dir2 truncate dir1/dir3/dir4/file11 size=0 utimes dir1/dir3/dir4/file11 rename dir1/dir3 -> o262-7-0 link dir1/dir3 -> o262-7-0/file22 unlink dir1/dir3/file22 ERROR: unlink dir1/dir3/file22 failed. Not a directory The following steps happen during the computation of the incremental send stream the lead to this issue: 1) Before we start processing the new and deleted references for inode 260, we compute the full path of the deleted reference ("dir1/dir3/file22") and cache it in the list of deleted references for our inode. 2) We then start processing the new references for inode 260, for which there is only one new, located at "dir1/dir3". When processing this new reference, we check that inode 262, which was not yet processed, collides with the new reference and because of that we orphanize inode 262 so its new full path becomes "o262-7-0". 3) After the orphanization of inode 262, we create the new reference for inode 260 by issuing a link command with a target path of "dir1/dir3" and a source path of "o262-7-0/file22". 4) We then start processing the deleted references for inode 260, for which there is only one with the base name of "file22", and issue an unlink operation containing the target path computed at step 1, which is wrong because that path no longer exists and should be replaced with "o262-7-0/file22". So fix this issue by recomputing the full path of deleted references if when we processed the new references for an inode we ended up orphanizing any other inode that is an ancestor of our inode in the parent snapshot. A test case for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> [ adjusted after prev patch removed fs_path::dir_path and dir_path_len ] Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21Btrfs: send, fix invalid path after renaming and linking fileFilipe Manana
Currently an incremental snapshot can generate link operations which contain an invalid target path. Such case happens when in the send snapshot a file was renamed, a new hard link added for it and some other inode (with a lower number) got renamed to the former name of that file. Example: Parent snapshot . (ino 256) | |--- f1 (ino 257) |--- f2 (ino 258) |--- f3 (ino 259) Send snapshot . (ino 256) | |--- f2 (ino 257) |--- f3 (ino 258) |--- f4 (ino 259) |--- f5 (ino 258) The following steps happen when computing the incremental send stream: 1) When processing inode 257, inode 258 is orphanized (renamed to "o258-7-0"), because its current reference has the same name as the new reference for inode 257; 2) When processing inode 258, we iterate over all its new references, which have the names "f3" and "f5". The first iteration sees name "f5" and renames the inode from its orphan name ("o258-7-0") to "f5", while the second iteration sees the name "f3" and, incorrectly, issues a link operation with a target name matching the orphan name, which no longer exists. The first iteration had reset the current valid path of the inode to "f5", but in the second iteration we lost it because we found another inode, with a higher number of 259, which has a reference named "f3" as well, so we orphanized inode 259 and recomputed the current valid path of inode 258 to its old orphan name because inode 259 could be an ancestor of inode 258 and therefore the current valid path could contain the pre-orphanization name of inode 259. However in this case inode 259 is not an ancestor of inode 258 so the current valid path should not be recomputed. This makes the receiver fail with the following error: ERROR: link f3 -> o258-7-0 failed: No such file or directory So fix this by not recomputing the current valid path for an inode whenever we find a colliding reference from some not yet processed inode (inode number higher then the one currently being processed), unless that other inode is an ancestor of the one we are currently processing. A test case for fstests will follow soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21Btrfs: fix invalid extent maps due to hole punchingFilipe Manana
While punching a hole in a range that is not aligned with the sector size (currently the same as the page size) we can end up leaving an extent map in memory with a length that is smaller then the sector size or with a start offset that is not aligned to the sector size. Both cases are not expected and can lead to problems. This issue is easily detected after the patch from commit a7e3b975a0f9 ("Btrfs: fix reported number of inode blocks"), introduced in kernel 4.12-rc1, in a scenario like the following for example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -c "pwrite -S 0xaa -b 100K 0 100K" /mnt/foo $ xfs_io -c "fpunch 60K 90K" /mnt/foo $ xfs_io -c "pwrite -S 0xbb -b 100K 50K 100K" /mnt/foo $ xfs_io -c "pwrite -S 0xcc -b 50K 100K 50K" /mnt/foo $ umount /mnt After the unmount operation we can see several warnings emmitted due to underflows related to space reservation counters: [ 2837.443299] ------------[ cut here ]------------ [ 2837.447395] WARNING: CPU: 8 PID: 2474 at fs/btrfs/inode.c:9444 btrfs_destroy_inode+0xe8/0x27e [btrfs] [ 2837.452108] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button se rio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_gene ric raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.458389] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.459754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.462379] Call Trace: [ 2837.462379] dump_stack+0x68/0x92 [ 2837.462379] __warn+0xc2/0xdd [ 2837.462379] warn_slowpath_null+0x1d/0x1f [ 2837.462379] btrfs_destroy_inode+0xe8/0x27e [btrfs] [ 2837.462379] destroy_inode+0x3d/0x55 [ 2837.462379] evict+0x177/0x17e [ 2837.462379] dispose_list+0x50/0x71 [ 2837.462379] evict_inodes+0x132/0x141 [ 2837.462379] generic_shutdown_super+0x3f/0xeb [ 2837.462379] kill_anon_super+0x12/0x1c [ 2837.462379] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.462379] deactivate_locked_super+0x30/0x68 [ 2837.462379] deactivate_super+0x36/0x39 [ 2837.462379] cleanup_mnt+0x58/0x76 [ 2837.462379] __cleanup_mnt+0x12/0x14 [ 2837.462379] task_work_run+0x77/0x9b [ 2837.462379] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.462379] syscall_return_slowpath+0x196/0x1b9 [ 2837.462379] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.462379] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.462379] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.462379] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.462379] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.462379] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.462379] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.462379] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.519355] ---[ end trace e79345fe24b30b8d ]--- [ 2837.596256] ------------[ cut here ]------------ [ 2837.597625] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5699 btrfs_free_block_groups+0x246/0x3eb [btrfs] [ 2837.603547] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.659372] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.663359] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.663359] Call Trace: [ 2837.663359] dump_stack+0x68/0x92 [ 2837.663359] __warn+0xc2/0xdd [ 2837.663359] warn_slowpath_null+0x1d/0x1f [ 2837.663359] btrfs_free_block_groups+0x246/0x3eb [btrfs] [ 2837.663359] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.663359] ? evict_inodes+0x132/0x141 [ 2837.663359] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.663359] generic_shutdown_super+0x6a/0xeb [ 2837.663359] kill_anon_super+0x12/0x1c [ 2837.663359] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.663359] deactivate_locked_super+0x30/0x68 [ 2837.663359] deactivate_super+0x36/0x39 [ 2837.663359] cleanup_mnt+0x58/0x76 [ 2837.663359] __cleanup_mnt+0x12/0x14 [ 2837.663359] task_work_run+0x77/0x9b [ 2837.663359] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.663359] syscall_return_slowpath+0x196/0x1b9 [ 2837.663359] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.663359] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.663359] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.663359] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.663359] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.663359] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.663359] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.663359] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.739445] ---[ end trace e79345fe24b30b8e ]--- [ 2837.745595] ------------[ cut here ]------------ [ 2837.746412] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5700 btrfs_free_block_groups+0x261/0x3eb [btrfs] [ 2837.747955] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.755395] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.756769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.758526] Call Trace: [ 2837.758925] dump_stack+0x68/0x92 [ 2837.759383] __warn+0xc2/0xdd [ 2837.759383] warn_slowpath_null+0x1d/0x1f [ 2837.759383] btrfs_free_block_groups+0x261/0x3eb [btrfs] [ 2837.759383] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.759383] ? evict_inodes+0x132/0x141 [ 2837.759383] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.759383] generic_shutdown_super+0x6a/0xeb [ 2837.759383] kill_anon_super+0x12/0x1c [ 2837.759383] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.759383] deactivate_locked_super+0x30/0x68 [ 2837.759383] deactivate_super+0x36/0x39 [ 2837.759383] cleanup_mnt+0x58/0x76 [ 2837.759383] __cleanup_mnt+0x12/0x14 [ 2837.759383] task_work_run+0x77/0x9b [ 2837.759383] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.759383] syscall_return_slowpath+0x196/0x1b9 [ 2837.759383] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.759383] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.759383] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.759383] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.759383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.759383] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.759383] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.759383] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.777063] ---[ end trace e79345fe24b30b8f ]--- [ 2837.778235] ------------[ cut here ]------------ [ 2837.778856] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:9825 btrfs_free_block_groups+0x348/0x3eb [btrfs] [ 2837.791385] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.797711] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.798594] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.800118] Call Trace: [ 2837.800515] dump_stack+0x68/0x92 [ 2837.801015] __warn+0xc2/0xdd [ 2837.801471] warn_slowpath_null+0x1d/0x1f [ 2837.801698] btrfs_free_block_groups+0x348/0x3eb [btrfs] [ 2837.801698] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.801698] ? evict_inodes+0x132/0x141 [ 2837.801698] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.801698] generic_shutdown_super+0x6a/0xeb [ 2837.801698] kill_anon_super+0x12/0x1c [ 2837.801698] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.801698] deactivate_locked_super+0x30/0x68 [ 2837.801698] deactivate_super+0x36/0x39 [ 2837.801698] cleanup_mnt+0x58/0x76 [ 2837.801698] __cleanup_mnt+0x12/0x14 [ 2837.801698] task_work_run+0x77/0x9b [ 2837.801698] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.801698] syscall_return_slowpath+0x196/0x1b9 [ 2837.801698] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.801698] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.801698] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.801698] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.801698] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.801698] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.801698] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.801698] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.818441] ---[ end trace e79345fe24b30b90 ]--- [ 2837.818991] BTRFS info (device sdc): space_info 1 has 7974912 free, is not full [ 2837.819830] BTRFS info (device sdc): space_info total=8388608, used=417792, pinned=0, reserved=0, may_use=18446744073709547520, readonly=0 What happens in the above example is the following: 1) When punching the hole, at btrfs_punch_hole(), the variable tail_len is set to 2048 (as tail_start is 148Kb + 1 and offset + len is 150Kb). This results in the creation of an extent map with a length of 2Kb starting at file offset 148Kb, through find_first_non_hole() -> btrfs_get_extent(). 2) The second write (first write after the hole punch operation), sets the range [50Kb, 152Kb[ to delalloc. 3) The third write, at btrfs_find_new_delalloc_bytes(), sees the extent map covering the range [148Kb, 150Kb[ and ends up calling set_extent_bit() for the same range, which results in splitting an existing extent state record, covering the range [148Kb, 152Kb[ into two 2Kb extent state records, covering the ranges [148Kb, 150Kb[ and [150Kb, 152Kb[. 4) Finally at lock_and_cleanup_extent_if_need(), immediately after calling btrfs_find_new_delalloc_bytes() we clear the delalloc bit from the range [100Kb, 152Kb[ which results in the btrfs_clear_bit_hook() callback being invoked against the two 2Kb extent state records that cover the ranges [148Kb, 150Kb[ and [150Kb, 152Kb[. When called against the first 2Kb extent state, it calls btrfs_delalloc_release_metadata() with a length argument of 2048 bytes. That function rounds up the length to a sector size aligned length, so it ends up considering a length of 4096 bytes, and then calls calc_csum_metadata_size() which results in decrementing the inode's csum_bytes counter by 4096 bytes, so after it stays a value of 0 bytes. Then the same happens when btrfs_clear_bit_hook() is called against the second extent state that has a length of 2Kb, covering the range [150Kb, 152Kb[, the length is rounded up to 4096 and calc_csum_metadata_size() ends up being called to decrement 4096 bytes from the inode's csum_bytes counter, which at that time has a value of 0, leading to an underflow, which is exactly what triggers the first warning, at btrfs_destroy_inode(). All the other warnings relate to several space accounting counters that underflow as well due to similar reasons. A similar case but where the hole punching operation creates an extent map with a start offset not aligned to the sector size is the following: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "fpunch 695K 820K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xaa 1008K 307K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xbb -b 630K 1073K 630K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xcc -b 459K 1068K 459K" $SCRATCH_MNT/bar $ umount /mnt During the unmount operation we get similar traces for the same reasons as in the first example. So fix the hole punching operation to make sure it never creates extent maps with a length that is not aligned to the sector size nor with a start offset that is not aligned to the sector size, as this breaks all assumptions and it's a land mine. Fixes: d77815461f04 ("btrfs: Avoid trucating page or punching hole in a already existed hole.") Cc: <stable@vger.kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: add cond_resched to btrfs_qgroup_trace_leaf_itemsJeff Mahoney
On an uncontended system, we can end up hitting soft lockups while doing replace_path. At the core, and frequently called is btrfs_qgroup_trace_leaf_items, so it makes sense to add a cond_resched there. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-21btrfs: use new block error codeDan Carpenter
This function is supposed to return blk_status_t error codes now but there was a stray -ENOMEM left behind. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-21CIFS: Fix some return values in case of error in 'crypt_message'Christophe Jaillet
'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we should return -ENOMEM instead. Also remove a useless 'rc' in a debug message as it is meaningless here. Fixes: 026e93dc0a3ee ("CIFS: Encrypt SMB3 requests before sending") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-06-20block: Make most scsi_req_init() calls implicitBart Van Assche
Instead of explicitly calling scsi_req_init() after blk_get_request(), call that function from inside blk_get_request(). Add an .initialize_rq_fn() callback function to the block drivers that need it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn() because it is too small to keep it as a separate function. Keep the scsi_req_init() call in ide_prep_sense() because it follows a blk_rq_init() call. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20cifs: remove redundant return in cifs_creation_time_getColin Ian King
There is a redundant return in function cifs_creation_time_get that appears to be old vestigial code than can be removed. So remove it. Detected by CoverityScan, CID#1361924 ("Structurally dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-20CIFS: Improve readdir verbosityPavel Shilovsky
Downgrade the loglevel for SMB2 to prevent filling the log with messages if e.g. readdir was interrupted. Also make SMB2 and SMB1 codepaths do the same logging during readdir. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2017-06-20CIFS: check if pages is null rather than bv for a failed allocationColin Ian King
pages is being allocated however a null check on bv is being used to see if the allocation failed. Fix this by checking if pages is null. Detected by CoverityScan, CID#1432974 ("Logically dead code") Fixes: ccf7f4088af2dd ("CIFS: Add asynchronous context to support kernel AIO") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-20CIFS: Set ->should_dirty in cifs_user_readv()Dan Carpenter
The current code causes a static checker warning because ITER_IOVEC is zero so the condition is never true. Fixes: 6685c5e2d1ac ("CIFS: Add asynchronous read support through kernel AIO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-06-20percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batchNikolay Borisov
Currently, percpu_counter_add is a wrapper around __percpu_counter_add which is preempt safe due to explicit calls to preempt_disable. Given how __ prefix is used in percpu related interfaces, the naming unfortunately creates the false sense that __percpu_counter_add is less safe than percpu_counter_add. In terms of context-safety, they're equivalent. The only difference is that the __ version takes a batch parameter. Make this a bit more explicit by just renaming __percpu_counter_add to percpu_counter_add_batch. This patch doesn't cause any functional changes. tj: Minor updates to patch description for clarity. Cosmetic indentation updates. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: linux-mm@kvack.org Cc: "David S. Miller" <davem@davemloft.net>
2017-06-20xfs: don't allow bmap on rt filesDarrick J. Wong
bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed in mapping is a realtime file, so just turn off bmap for rt files. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-20xfs: allow reading of already-locked remote symbolic linkDarrick J. Wong
Expose the readlink variant that doesn't take the inode lock so that the scrubber can inspect symlink contents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: pass along transaction context when reading xattr block buffersDarrick J. Wong
Teach the extended attribute reading functions to pass along a transaction context if one was supplied. The extended attribute scrub code will use transactions to lock buffers and avoid deadlocking with itself in the case of loops; since it will already have the inode locked, also create xattr get/list helpers that don't take locks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: pass along transaction context when reading directory block buffersDarrick J. Wong
Teach the directory reading functions to pass along a transaction context if one was supplied. The directory scrub code will use transactions to lock buffers and avoid deadlocking with itself in the case of loops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: return the hash value of a leaf1 directory blockDarrick J. Wong
Modify the existing dir leafn lasthash function to enable us to calculate the highest hash value of a leaf1 block. This will be used by the directory scrubbing code to check the sanity of hashes in leaf1 directory blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: refactor the ifork block counting functionDarrick J. Wong
Refactor the inode fork block counting function to count extents for us at the same time. This will be used by the bmbt scrubber function. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: make _bmap_count_blocks consistent wrt delalloc extent behaviorDarrick J. Wong
There is an inconsistency in the way that _bmap_count_blocks deals with delalloc reservations -- if the specified fork is in extents format, *count is set to the total number of blocks referenced by the in-core fork, including delalloc extents. However, if the fork is in btree format, *count is set to the number of blocks referenced by the on-disk fork, which does /not/ include delalloc extents. For the lone existing caller of _bmap_count_blocks this hasn't been an issue because the function is only used to count xattr fork blocks (where there aren't any delalloc reservations). However, when scrub comes along it will use this same function to check di_nblocks against both on-disk extent maps, so we need this behavior to be consistent. Therefore, fix _bmap_count_leaves not to include delalloc extents and remove unnecessary parameters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20GFS2: Eliminate vestigial sd_log_flush_wrappedBob Peterson
Superblock variable sd_log_flush_wrapped is set, but never referenced, so this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-06-20btrfs: nowait aio supportGoldwyn Rodrigues
Return EAGAIN if any of the following checks fail + i_rwsem is not lockable + NODATACOW or PREALLOC is not set + Cannot nocow at the desired location + Writing beyond end of file which is not allocated Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20xfs: nowait aio supportGoldwyn Rodrigues
If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation either due to file extension, writing to a hole, or COW or waiting for other DIOs to finish. Return -EAGAIN if we don't have extent list in memory. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20ext4: nowait aio supportGoldwyn Rodrigues
Return EAGAIN if any of the following checks fail for direct I/O: + i_rwsem is lockable + Writing beyond end of file (will trigger allocation) + Blocks are not allocated at the write location Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20block: return on congested block deviceGoldwyn Rodrigues
A new bio operation flag REQ_NOWAIT is introduced to identify bio's orignating from iocb with IOCB_NOWAIT. This flag indicates to return immediately if a request cannot be made instead of retrying. Stacked devices such as md (the ones with make_request_fn hooks) currently are not supported because it may block for housekeeping. For example, an md can have a part of the device suspended. For this reason, only request based devices are supported. In the future, this feature will be expanded to stacked devices by teaching them how to handle the REQ_NOWAIT flags. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20fs: Introduce IOMAP_NOWAITGoldwyn Rodrigues
IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps. This is used by XFS in the XFS patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAITGoldwyn Rodrigues
RWF_NOWAIT informs kernel to bail out if an AIO request will block for reasons such as file allocations, or a writeback triggered, or would block while allocating requests while performing direct I/O. RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags. FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable of returning -EAGAIN if the AIO call will block. This must be set by supporting filesystems in the ->open() call. Filesystems xfs, btrfs and ext4 would be supported in the following patches. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20fs: Use RWF_* flags for AIO operationsGoldwyn Rodrigues
aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will carry the RWF_* flags. We cannot use aio_flags because they are not checked for validity which may break existing applications. Note, the only place RWF_HIPRI comes in effect is dio_await_one(). All the rest of the locations, aio code return -EIOCBQUEUED before the checks for RWF_HIPRI. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20fs: Separate out kiocb flags setup based on RWF_* flagsGoldwyn Rodrigues
Also added RWF_SUPPORTED to encompass all flags. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20btrfs: Round down values which are written for total_bytes_sizeNikolay Borisov
We got an internal report about a file system not wanting to mount following 99e3ecfcb9f4 ("Btrfs: add more validation checks for superblock"). BTRFS error (device sdb1): super_total_bytes 1000203816960 mismatch with fs_devices total_rw_bytes 1000203820544 Subtracting the numbers we get a difference of less than a 4kb. Upon closer inspection it became apparent that mkfs actually rounds down the size of the device to a multiple of sector size. However, the same cannot be said for various functions which modify the total size and are called from btrfs_balance as well as when adding a new device. So this patch ensures that values being saved into on-disk data structures are always rounded down to a multiple of sectorsize. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20btrfs: Manually implement device_total_bytes getter/setterNikolay Borisov
The device->total_bytes member needs to always be rounded down to sectorsize so that it corresponds to the value of super->total_bytes. However, there are multiple places where the setter is fed a value which is not rounded which can cause a fs to be unmountable due to the check introduced in 99e3ecfcb9f4 ("Btrfs: add more validation checks for superblock"). This patch implements the getter/setter manually so that in a later patch I can add necessary code to catch offenders. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20btrfs: obsolete and remove mount option alloc_startDavid Sterba
The mount option alloc_start was used in the past for debugging and stressing the chunk allocator. Not meant to be used by users, so we're not breaking anybody's setup. There was some added complexity handling changes of the value and when it was not same as default. Such code has likely been untested and I think it's better to remove it. This patch kills all use of alloc_start, and by doing that also fixes a bug when alloc_size is set, potentially called from statfs: in btrfs_calc_avail_data_space, traversing the list in RCU, the RCU protection is temporarily dropped so btrfs_account_dev_extents_size can be called and then RCU is locked again! Doing that inside list_for_each_entry_rcu is just asking for trouble, but unlikely to be observed in practice. Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20btrfs: move fs_info::fs_frozen to the flagsDavid Sterba
We can keep the state among the other fs_info flags, there's no reason why fs_frozen would need to be separate. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20btrfs: cleanup duplicate return value in insert_inline_extentDavid Sterba
The pattern when err is used for function exit and ret is used for return values of callees is not used here. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-20fs/proc: kcore: use kcore_list type to check for vmalloc/module addressArd Biesheuvel
Instead of passing each start address into is_vmalloc_or_module_addr() to decide whether it falls into either the VMALLOC or the MODULES region, we can simply check the type field of the current kcore_list entry, since it will be set to KCORE_VMALLOC based on exactly the same conditions. As a bonus, when reading the KCORE_TEXT region on architectures that have one, this will avoid using vread() on the region if it happens to intersect with a KCORE_VMALLOC region. This is due the fact that the KCORE_TEXT region is the first one to be added to the kcore region list. Reported-by: Tan Xiaojun <tanxiaojun@huawei.com> Tested-by: Tan Xiaojun <tanxiaojun@huawei.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list namingIngo Molnar
So I've noticed a number of instances where it was not obvious from the code whether ->task_list was for a wait-queue head or a wait-queue entry. Furthermore, there's a number of wait-queue users where the lists are not for 'tasks' but other entities (poll tables, etc.), in which case the 'task_list' name is actively confusing. To clear this all up, name the wait-queue head and entry list structure fields unambiguously: struct wait_queue_head::task_list => ::head struct wait_queue_entry::task_list => ::entry For example, this code: rqw->wait.task_list.next != &wait->task_list ... is was pretty unclear (to me) what it's doing, while now it's written this way: rqw->wait.head.next != &wait->entry ... which makes it pretty clear that we are iterating a list until we see the head. Other examples are: list_for_each_entry_safe(pos, next, &x->task_list, task_list) { list_for_each_entry(wq, &fence->wait.task_list, task_list) { ... where it's unclear (to me) what we are iterating, and during review it's hard to tell whether it's trying to walk a wait-queue entry (which would be a bug), while now it's written as: list_for_each_entry_safe(pos, next, &x->head, entry) { list_for_each_entry(wq, &fence->wait.head, entry) { Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into ↵Ingo Molnar
<linux/wait_bit.h> The wait_bit*() types and APIs are mixed into wait.h, but they are a pretty orthogonal extension of wait-queues. Furthermore, only about 50 kernel files use these APIs, while over 1000 use the regular wait-queue functionality. So clean up the main wait.h by moving the wait-bit functionality out of it, into a separate .h and .c file: include/linux/wait_bit.h for types and APIs kernel/sched/wait_bit.c for the implementation Update all header dependencies. This reduces the size of wait.h rather significantly, by about 30%. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field nameIngo Molnar
Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly name it as a wait-queue entry. Propagate it to a couple of usage sites where the wait-bit-queue internals are exposed. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20sched/wait: Rename wait_queue_t => wait_queue_entry_tIngo Molnar
Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-19cifs: use get_random_u32 for 32-bit lock randomJason A. Donenfeld
Using get_random_u32 here is faster, more fitting of the use case, and just as cryptographically secure. It also has the benefit of providing better randomness at early boot, which is sometimes when this is used. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-19xfs: separate function to check if inode shares extentsDarrick J. Wong
Separate the "clear reflink flag" function into one function that checks if the flag is needed, and a second function that checks and clears the flag. The inode scrub code will want to check the necessity of the flag without clearing it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-19xfs: reflink find shared should take a transactionDarrick J. Wong
Adapt _reflink_find_shared to take an optional transaction pointer. The inode scrubber code will need to decide (within transaction context) if a file has shared blocks. To avoid buffer deadlocks, we must pass the tp through to this function's utility calls. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-19xfs: check if an inode is cached and allocatedDarrick J. Wong
Check the inode cache for a particular inode number. If it's in the cache, check that it's not currently being reclaimed. If it's not being reclaimed, return zero if the inode is allocated. This function will be used by various scrubbers to decide if the cache is more up to date than the disk in terms of checking if an inode is allocated. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-19xfs: export _inobt_btrec_to_irec and _ialloc_cluster_alignment for scrubDarrick J. Wong
Create a function to extract an in-core inobt record from a generic btree_rec union so that scrub will be able to check inobt records and check inode block alignment. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-19xfs: plumb in needed functions for range querying of various btreesDarrick J. Wong
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call query_range on the inode space and block mapping btrees and to extract raw btree records. This will eventually be used by the inobt and bmbt scrubbers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>