summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2019-09-16rbd: pull rbd_img_request_create() dout out into the callersIlya Dryomov
Make it more informative: log op_type, offset and length for block layer requests and initiating obj_req for child requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-16rbd: fix response length parameter for encoded stringsDongsheng Yang
rbd_dev_image_id() allocates space for length but passes a smaller value to rbd_obj_method_sync(). rbd_dev_v2_object_prefix() doesn't allocate space for length. Fix both to be consistent. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-11null_blk: validate the number of devicesAndré Almeida
A negative number of devices is nonsensical, so change the type to unsigned. If the number of devices is 0, it is impossible for userspace to interact with the module, so refuse loading the driver for that case. Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-11null_blk: fix module name at log messageAndré Almeida
The name of the module is "null_blk", not "null". Make `pr_info()` follow the pattern of `pr_err()` log messages. Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05block: Set ELEVATOR_F_ZBD_SEQ_WRITE for nullblk zoned disksDamien Le Moal
Using the helper blk_queue_required_elevator_features(), set the elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request queue of null_blk devices created with zoned mode enabled. This feature requirement can always be satisfied as the mq-deadline elevator is always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pcd: need to check if cd->disk is null in pcd_detectzhengbin
If alloc_disk fails in pcd_init_units, cd->disk & pi are empty, we need to check if cd->disk is null in pcd_detect. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pcd: need to set queue to NULL before put_diskzhengbin
In pcd_init_units, if blk_mq_init_sq_queue fails, need to set queue to NULL before put_disk, otherwise null-ptr-deref Read will occur. put_disk kobject_put disk_release blk_put_queue(disk->queue) Fixes: f0d176255401 ("paride/pcd: Fix potential NULL pointer dereference and mem leak") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pf: need to set queue to NULL before put_diskzhengbin
In pf_init_units, if blk_mq_init_sq_queue fails, need to set queue to NULL before put_disk, otherwise null-ptr-deref Read will occur. put_disk kobject_put disk_release blk_put_queue(disk->queue) Fixes: 77218ddf46d8 ("paride: convert pf to blk-mq") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28rbd: restore zeroing past the overlap when reading from parentIlya Dryomov
The parent image is read only up to the overlap point, the rest of the buffer should be zeroed. This snuck in because as it turns out the overlap test case has not been triggering this code path for a while now. Fixes: a9b67e69949d ("rbd: replace obj_req->tried_parent with obj_req->read_state") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-23null_blk: fix inline misuseJens Axboe
You can't magically mark a function inline and expect that to work. Fixes: fceb5d1b19cb ("null_blk: create a helper for zoned devices") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for req completionChaitanya Kulkarni
This patch creates a helper function for handling the request completion in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for zoned devicesChaitanya Kulkarni
This patch creates a helper function for handling zoned block device operations. This patch also restructured the code for null_blk_zoned.c and uses the pattern to return blk_status_t and catch the error in the function null_handle_cmd() into cmd->error variable instead of setting it up in the deeper layer just like the way it is done for flush, badblocks and memory backed case in the null_handle_cmd(). We also move null_handle_zoned() to the null_blk_zoned.c to keep the zoned code separate. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for mem-backed opsChaitanya Kulkarni
This patch creates a helper for handling requests when null_blk is memory backed in the null_handle_cmd(). Although the helper is very simple right now, it makes the code flow consistent with the rest of code in the null_handle_cmd() and provides a uniform code structure for future code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for badblocksChaitanya Kulkarni
This patch creates a helper for handling badblocks code in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for throttlingChaitanya Kulkarni
This patch creates a helper for handling throttling code in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: move duplicate code to callersChaitanya Kulkarni
This is a preparation patch which moves the duplicate code for sectors and nr_sectors calculations for bio vs request mode into their respective callers (null_queue_bio(), null_qeueue_req()). Now the core function only deals with the respective actions and commands instead of having to calculte the bio vs req operations and different sector related variables. We also move the flush command handling at the top which significantly simplifies the rest of the code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20nbd: fix max number of supported devsMike Christie
This fixes a bug added in 4.10 with commit: commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024 Author: Josef Bacik <jbacik@fb.com> Date: Tue Nov 22 14:04:40 2016 -0500 nbd: add multi-connection support that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue which has a default limit of 256 active works. The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues connection257+'s recv_work and that waits for connection 1 - 256's recv_work to be disconnected and that work instance completing. Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work. Cc: stable@vger.kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20nbd: fix zero cmd timeout handling v2Mike Christie
This fixes a regression added in 4.9 with commit: commit 0eadf37afc2500e1162c9040ec26a705b9af8d47 Author: Josef Bacik <jbacik@fb.com> Date: Thu Sep 8 12:33:40 2016 -0700 nbd: allow block mq to deal with timeouts where before the patch userspace would set the timeout to 0 to disable it. With the above patch, a zero timeout tells the block layer to use the default value of 30 seconds. For setups where commands can take a long time or experience transient issues like network disruptions this then results in IO errors being sent to the application. To fix this, the patch still uses the common block layer timeout framework, but if zero is set, nbd just logs a message and then resets the timer when it expires. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20nbd: add missing config putMike Christie
Fix bug added with the patch: commit 8f3ea35929a0806ad1397db99a89ffee0140822a Author: Josef Bacik <josef@toxicpanda.com> Date: Mon Jul 16 12:11:35 2018 -0400 nbd: handle unexpected replies better where if the timeout handler runs when the completion path is and we fail to grab the mutex in the timeout handler we will leave a config reference and cannot free the config later. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20nbd: add function to convert blk req op to nbd cmdMike Christie
This adds a helper function to convert a block req op to a nbd cmd type. It will be used in the last patch to log the type in the timeout handler. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-20nbd: add set cmd timeout helperMike Christie
Add a helper to set the cmd timeout. It does not really do a lot now, but will be more useful in the next patches. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-19Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kernel thread signal handling fix from Eric Biederman: "I overlooked the fact that kernel threads are created with all signals set to SIG_IGN, and accidentally caused a regression in cifs and drbd when replacing force_sig with send_sig. This is my fix for that regression. I add a new function allow_kernel_signal which allows kernel threads to receive signals sent from the kernel, but continues to ignore all signals sent from userspace. This ensures the user space interface for cifs and drbd remain the same. These kernel threads depend on blocking networking calls which block until something is received or a signal is pending. Making receiving of signals somewhat necessary for these kernel threads. Perhaps someday we can cleanup those interfaces and remove allow_kernel_signal. If not allow_kernel_signal is pretty trivial and clearly documents what is going on so I don't think we will mind carrying it" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Allow cifs and drbd to receive their terminating signals
2019-08-19signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman
My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-08-12xen/blkback: fix memory leaksWenwen Wang
In read_per_ring_refs(), after 'req' and related memory regions are allocated, xen_blkif_map() is invoked to map the shared frame, irq, and etc. However, if this mapping process fails, no cleanup is performed, leading to memory leaks. To fix this issue, invoke the cleanup before returning the error. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-09floppy: fix usercopy directionJann Horn
As sparse points out, these two copy_from_user() should actually be copy_to_user(). Fixes: 229b53c9bf4e ("take floppy compat ioctls to sodding floppy.c") Cc: stable@vger.kernel.org Acked-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08loop: Add LOOP_SET_DIRECT_IO to compat ioctlAlessio Balsini
Enabling Direct I/O with loop devices helps reducing memory usage by avoiding double caching. 32 bit applications running on 64 bits systems are currently not able to request direct I/O because is missing from the lo_compat_ioctl. This patch fixes the compatibility issue mentioned above by exporting LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry. The input argument for this ioctl is a single long converted to a 1-bit boolean, so compatibility is preserved. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08loop: set PF_MEMALLOC_NOIO for the worker threadMikulas Patocka
A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08block: aoe: Fix kernel crash due to atomic sleep when exitingHe Zhe
Since commit 3582dd291788 ("aoe: convert aoeblk to blk-mq"), aoedev_downdev has had the possibility of sleeping and causing the following crash. BUG: scheduling while atomic: rmmod/2242/0x00000003 Modules linked in: aoe Preemption disabled at: [<ffffffffc01d95e5>] flush+0x95/0x4a0 [aoe] CPU: 7 PID: 2242 Comm: rmmod Tainted: G I 5.2.3 #1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009 Call Trace: dump_stack+0x4f/0x6a ? flush+0x95/0x4a0 [aoe] __schedule_bug.cold+0x44/0x54 __schedule+0x44f/0x680 schedule+0x44/0xd0 blk_mq_freeze_queue_wait+0x46/0xb0 ? wait_woken+0x80/0x80 blk_mq_freeze_queue+0x1b/0x20 aoedev_downdev+0x111/0x160 [aoe] flush+0xff/0x4a0 [aoe] aoedev_exit+0x23/0x30 [aoe] aoe_exit+0x35/0x948 [aoe] __se_sys_delete_module+0x183/0x210 __x64_sys_delete_module+0x16/0x20 do_syscall_64+0x4d/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f24e0043b07 Code: 73 01 c3 48 8b 0d 89 73 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 73 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe18f7f1e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f24e0043b07 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555c3ecf87c8 RBP: 00007ffe18f7f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f24e00b4ac0 R11: 0000000000000206 R12: 00007ffe18f7f238 R13: 00007ffe18f7f410 R14: 00007ffe18f80e73 R15: 0000555c3ecf8760 This patch, handling in the same way of pass two, unlocks the locks and restart pass one after aoedev_downdev is done. Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq") Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04null_blk: implement REQ_OP_ZONE_RESET_ALLChaitanya Kulkarni
This patch implements newly introduced zone reset all operation for null_blk driver. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-31nbd: replace kill_bdev() with __invalidate_device() againMunehisa Kamata
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()") once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere") resurrected kill_bdev() and it has been there since then. So buffer_head mappings still get killed on a server disconnection, and we can still hit the BUG_ON on a filesystem on the top of the nbd device. EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null) block nbd0: Receive control failed (result -32) block nbd0: shutting down sockets print_req_error: I/O error, dev nbd0, sector 66264 flags 3000 EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block print_req_error: I/O error, dev nbd0, sector 2264 flags 3000 EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure ------------[ cut here ]------------ kernel BUG at fs/buffer.c:3057! invalid opcode: 0000 [#1] SMP PTI CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4 Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:submit_bh_wbc+0x18b/0x190 ... Call Trace: jbd2_write_superblock+0xf1/0x230 [jbd2] ? account_entity_enqueue+0xc5/0xf0 jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2] jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2] ? __switch_to_asm+0x40/0x70 ... ? lock_timer_base+0x67/0x80 kjournald2+0x121/0x360 [jbd2] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? commit_timeout+0x10/0x10 [jbd2] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 With __invalidate_device(), I no longer hit the BUG_ON with sync or unmount on the disconnected device. Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere") Cc: linux-block@vger.kernel.org Cc: Ratna Manoj Bolla <manoj.br@gmail.com> Cc: nbd@other.debian.org Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30loop: Fix mount(2) failure due to race with LOOP_SET_FDJan Kara
Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") made LOOP_SET_FD ioctl acquire exclusive block device reference while it updates loop device binding. However this can make perfectly valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding temporarily the exclusive bdev reference in cases like this: for i in {a..z}{a..z}; do dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024 mkfs.ext2 $i.image mkdir mnt$i done echo "Run" for i in {a..z}{a..z}; do mount -o loop -t ext2 $i.image mnt$i & done Fix the problem by not getting full exclusive bdev reference in LOOP_SET_FD but instead just mark the bdev as being claimed while we update the binding information. This just blocks new exclusive openers instead of failing them with EBUSY thus fixing the problem. Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") Cc: stable@vger.kernel.org Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-29ataflop: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning (Building: m68k): drivers/block/ataflop.c: In function ‘fd_locked_ioctl’: drivers/block/ataflop.c:1728:3: warning: this statement may fall through [-Wimplicit-fallthrough=] set_capacity(floppy->disk, MAX_DISK_SIZE * 2); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/block/ataflop.c:1729:2: note: here case FDFMTEND: ^~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-26Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Several io_uring fixes/improvements: - Blocking fix for O_DIRECT (me) - Latter page slowness for registered buffers (me) - Fix poll hang under certain conditions (me) - Defer sequence check fix for wrapped rings (Zhengyuan) - Mismatch in async inc/dec accounting (Zhengyuan) - Memory ordering issue that could cause stall (Zhengyuan) - Track sequential defer in bytes, not pages (Zhengyuan) - NVMe pull request from Christoph - Set of hang fixes for wbt (Josef) - Redundant error message kill for libahci (Ding) - Remove unused blk_mq_sched_started_request() and related ops (Marcos) - drbd dynamic alloc shash descriptor to reduce stack use (Arnd) - blkcg ->pd_stat() non-debug print (Tejun) - bcache memory leak fix (Wei) - Comment fix (Akinobu) - BFQ perf regression fix (Paolo) * tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits) io_uring: ensure ->list is initialized for poll commands Revert "nvme-pci: don't create a read hctx mapping without read queues" nvme: fix multipath crash when ANA is deactivated nvme: fix memory leak caused by incorrect subsystem free nvme: ignore subnqn for ADATA SX6000LNP drbd: dynamically allocate shash descriptor block: blk-mq: Remove blk_mq_sched_started_request and started_request bcache: fix possible memory leak in bch_cached_dev_run() io_uring: track io length in async_list based on bytes io_uring: don't use iov_iter_advance() for fixed buffers block: properly handle IOCB_NOWAIT for async O_DIRECT IO blk-mq: allow REQ_NOWAIT to return an error inline io_uring: add a memory barrier before atomic_read rq-qos: use a mb for got_token rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule rq-qos: don't reset has_sleepers on spurious wakeups rq-qos: fix missed wake-ups in rq_qos_throttle wait: add wq_has_single_sleeper helper block, bfq: check also in-flight I/O in dispatch plugging block: fix sysfs module parameters directory path in comment ...
2019-07-23drbd: dynamically allocate shash descriptorArnd Bergmann
Building with clang and KASAN, we get a warning about an overly large stack frame on 32-bit architectures: drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect' [-Werror,-Wframe-larger-than=] We already allocate other data dynamically in this function, so just do the same for the shash descriptor, which makes up most of this memory. Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/ Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Lots of exciting things this time! - support for rbd object-map and fast-diff features (myself). This will speed up reads, discards and things like snap diffs on sparse images. - ceph.snap.btime vxattr to expose snapshot creation time (David Disseldorp). This will be used to integrate with "Restore Previous Versions" feature added in Windows 7 for folks who reexport ceph through SMB. - security xattrs for ceph (Zheng Yan). Only selinux is supported for now due to the limitations of ->dentry_init_security(). - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff Layton). This is actually a single feature bit which was missing because of the filesystem pieces. With this in, the kernel client will finally be reported as "luminous" by "ceph features" -- it is still being reported as "jewel" even though all required Luminous features were implemented in 4.13. - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention with xattrs is to not terminate and this was causing inconsistencies with ceph-fuse. - change filesystem time granularity from 1 us to 1 ns, again fixing an inconsistency with ceph-fuse (Luis Henriques). On top of this there are some additional dentry name handling and cap flushing fixes from Zheng. Finally, Jeff is formally taking over for Zheng as the filesystem maintainer" * tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits) ceph: fix end offset in truncate_inode_pages_range call ceph: use generic_delete_inode() for ->drop_inode ceph: use ceph_evict_inode to cleanup inode's resource ceph: initialize superblock s_time_gran to 1 MAINTAINERS: take over for Zheng as CephFS kernel client maintainer rbd: setallochint only if object doesn't exist rbd: support for object-map and fast-diff rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe() libceph: export osd_req_op_data() macro libceph: change ceph_osdc_call() to take page vector for response libceph: bump CEPH_MSG_MAX_DATA_LEN (again) rbd: new exclusive lock wait/wake code rbd: quiescing lock should wait for image requests rbd: lock should be quiesced on reacquire rbd: introduce copyup state machine rbd: rename rbd_obj_setup_*() to rbd_obj_init_*() rbd: move OSD request allocation into object request state machines rbd: factor out __rbd_osd_setup_discard_ops() rbd: factor out rbd_osd_setup_copyup() rbd: introduce obj_req->osd_reqs list ...
2019-07-18Merge branch 'floppy'Linus Torvalds
Merge floppy ioctl verification fixes from Denis Efremov. This also marks the floppy driver as orphaned - it turns out that Jiri no longer has working hardware. Actual working physical floppy hardware is getting hard to find, and while Willy was able to test this, I think the driver can be considered pretty much dead from an actual hardware standpoint. The hardware that is still sold seems to be mainly USB-based, which doesn't use this legacy driver at all. The old floppy disk controller is still emulated in various VM environments, so the driver isn't going away, but let's see if anybody is interested to step up to maintain it. The lack of hardware also likely means that the ioctl range verification fixes are probably mostly relevant to anybody using floppies in a virtual environment. Which is probably also going away in favor of USB storage emulation, but who knows. Will Decon reviewed the patches but I'm not rebasing them just for that, so I'll add a Reviewed-by: Will Deacon <will@kernel.org> here instead. * floppy: MAINTAINERS: mark floppy.c orphaned floppy: fix out-of-bounds read in copy_buffer floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in next_valid_format floppy: fix div-by-zero in setup_format_params
2019-07-17floppy: fix out-of-bounds read in copy_bufferDenis Efremov
This fixes a global out-of-bounds read access in the copy_buffer function of the floppy driver. The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect and head fields (unsigned int) of the floppy_drive structure are used to compute the max_sector (int) in the make_raw_rw_request function. It is possible to overflow the max_sector. Next, max_sector is passed to the copy_buffer function and used in one of the memcpy calls. An unprivileged user could trigger the bug if the device is accessible, but requires a floppy disk to be inserted. The patch adds the check for the .sect * .head multiplication for not overflowing in the set_geometry function. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix invalid pointer dereference in drive_nameDenis Efremov
This fixes the invalid pointer dereference in the drive_name function of the floppy driver. The native_format field of the struct floppy_drive_params is used as floppy_type array index in the drive_name function. Thus, the field should be checked the same way as the autodetect field. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should be used to call the drive_name. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for a value of the native_format field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix out-of-bounds read in next_valid_formatDenis Efremov
This fixes a global out-of-bounds read access in the next_valid_format function of the floppy driver. The values from autodetect field of the struct floppy_drive_params are used as indices for the floppy_type array in the next_valid_format function 'floppy_type[DP->autodetect[probed_format]].sect'. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for values of the autodetect field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17floppy: fix div-by-zero in setup_format_paramsDenis Efremov
This fixes a divide by zero error in the setup_format_params function of the floppy driver. Two consecutive ioctls can trigger the bug: The first one should set the drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK to become zero. Next, the floppy format operation should be called. A floppy disk is not required to be inserted. An unprivileged user could trigger the bug if the device is accessible. The patch checks F_SECT_PER_TRACK for a non-zero value in the set_geometry function. The proper check should involve a reasonable upper limit for the .sect and .rate fields, but it could change the UAPI. The patch also checks F_SECT_PER_TRACK in the setup_format_params, and cancels the formatting operation in case of zero. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16Merge tag 'docs/v5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull rst conversion of docs from Mauro Carvalho Chehab: "As agreed with Jon, I'm sending this big series directly to you, c/c him, as this series required a special care, in order to avoid conflicts with other trees" * tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits) docs: kbuild: fix build with pdf and fix some minor issues docs: block: fix pdf output docs: arm: fix a breakage with pdf output docs: don't use nested tables docs: gpio: add sysfs interface to the admin-guide docs: locking: add it to the main index docs: add some directories to the main documentation index docs: add SPDX tags to new index files docs: add a memory-devices subdir to driver-api docs: phy: place documentation under driver-api docs: serial: move it to the driver-api docs: driver-api: add remaining converted dirs to it docs: driver-api: add xilinx driver API documentation docs: driver-api: add a series of orphaned documents docs: admin-guide: add a series of orphaned documents docs: cgroup-v1: add it to the admin-guide book docs: aoe: add it to the driver-api book docs: add some documentation dirs to the driver-api book docs: driver-model: move it to the driver-api book docs: lp855x-driver.rst: add it to the driver-api book ...
2019-07-15Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
2019-07-15docs: blockdev: add it to the admin-guideMauro Carvalho Chehab
The blockdev book basically contains user-faced documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: blockdev: convert to ReSTMauro Carvalho Chehab
Rename the blockdev documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. The drbd sub-directory contains some graphs and data flows. Add those too to the documentation. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-12null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONEDJens Axboe
A previous commit changed the prototype, but didn't adjust the function for when zoned device support is disabled. Fix it up. Fixes: bd976e527259 ("block: Kill gfp_t argument of blkdev_report_zones()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11block: Kill gfp_t argument of blkdev_report_zones()Damien Le Moal
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10nbd: add netlink reconfigure resize supportMike Christie
If the device is setup with ioctl we can resize the device after the initial setup, but if the device is setup with netlink we cannot use the resize related ioctls and there is no netlink reconfigure size ATTR handling code. This patch adds netlink reconfigure resize support to match the ioctl interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10nbd: fix crash when the blksize is zeroXiubo Li
This will allow the blksize to be set zero and then use 1024 as default. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> [fix to use goto out instead of return in genl_connect] Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-09Merge tag 'docs-5.3' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-07-09Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main block updates for 5.3. Nothing earth shattering or major in here, just fixes, additions, and improvements all over the map. This contains: - Series of documentation fixes (Bart) - Optimization of the blk-mq ctx get/put (Bart) - null_blk removal race condition fix (Bob) - req/bio_op() cleanups (Chaitanya) - Series cleaning up the segment accounting, and request/bio mapping (Christoph) - Series cleaning up the page getting/putting for bios (Christoph) - block cgroup cleanups and moving it to where it is used (Christoph) - block cgroup fixes (Tejun) - Series of fixes and improvements to bcache, most notably a write deadlock fix (Coly) - blk-iolatency STS_AGAIN and accounting fixes (Dennis) - Series of improvements and fixes to BFQ (Douglas, Paolo) - debugfs_create() return value check removal for drbd (Greg) - Use struct_size(), where appropriate (Gustavo) - Two lighnvm fixes (Heiner, Geert) - MD fixes, including a read balance and corruption fix (Guoqing, Marcos, Xiao, Yufen) - block opal shadow mbr additions (Jonas, Revanth) - sbitmap compare-and-exhange improvemnts (Pavel) - Fix for potential bio->bi_size overflow (Ming) - NVMe pull requests: - improved PCIe suspent support (Keith Busch) - error injection support for the admin queue (Akinobu Mita) - Fibre Channel discovery improvements (James Smart) - tracing improvements including nvmetc tracing support (Minwoo Im) - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)" - Various little fixes and improvements to drivers and core" * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits) blk-iolatency: fix STS_AGAIN handling block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES blk-mq: simplify blk_mq_make_request() blk-mq: remove blk_mq_put_ctx() sbitmap: Replace cmpxchg with xchg block: fix .bi_size overflow block: sed-opal: check size of shadow mbr block: sed-opal: ioctl for writing to shadow mbr block: sed-opal: add ioctl for done-mark of shadow mbr block: never take page references for ITER_BVEC direct-io: use bio_release_pages in dio_bio_complete block_dev: use bio_release_pages in bio_unmap_user block_dev: use bio_release_pages in blkdev_bio_end_io iomap: use bio_release_pages in iomap_dio_bio_end_io block: use bio_release_pages in bio_map_user_iov block: use bio_release_pages in bio_unmap_user block: optionally mark pages dirty in bio_release_pages block: move the BIO_NO_PAGE_REF check into bio_release_pages block: skd_main.c: Remove call to memset after dma_alloc_coherent block: mtip32xx: Remove call to memset after dma_alloc_coherent ...