linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2019-10-25	nbd: protect cmd->status with cmd->lock	Josef Bacik
	We already do this for the most part, except in timeout and clear_req. For the timeout case we take the lock after we grab a ref on the config, but that isn't really necessary because we're safe to touch the cmd at this point, so just move the order around. For the clear_req cause this is initiated by the user, so again is safe. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	block: reorder bio::__bi_remaining for better packing	David Sterba
	Simple reordering of __bi_remaining can reduce bio size by 8 bytes that are now wasted on padding (measured on x86_64): struct bio { struct bio * bi_next; /* 0 8 / struct gendisk bi_disk; /* 8 8 / unsigned int bi_opf; / 16 4 / short unsigned int bi_flags; / 20 2 / short unsigned int bi_ioprio; / 22 2 / short unsigned int bi_write_hint; / 24 2 / blk_status_t bi_status; / 26 1 / u8 bi_partno; / 27 1 / / XXX 4 bytes hole, try to pack / struct bvec_iter bi_iter; / 32 24 / / XXX last struct has 4 bytes of padding / atomic_t __bi_remaining; / 56 4 / / XXX 4 bytes hole, try to pack / [...] / size: 104, cachelines: 2, members: 19 / / sum members: 96, holes: 2, sum holes: 8 / / paddings: 1, sum paddings: 4 / / last cacheline: 40 bytes / }; Now becomes: struct bio { struct bio bi_next; /* 0 8 / struct gendisk bi_disk; /* 8 8 / unsigned int bi_opf; / 16 4 / short unsigned int bi_flags; / 20 2 / short unsigned int bi_ioprio; / 22 2 / short unsigned int bi_write_hint; / 24 2 / blk_status_t bi_status; / 26 1 / u8 bi_partno; / 27 1 / atomic_t __bi_remaining; / 28 4 / struct bvec_iter bi_iter; / 32 24 / / XXX last struct has 4 bytes of padding / [...] / size: 96, cachelines: 2, members: 19 / / paddings: 1, sum paddings: 4 / / last cacheline: 32 bytes */ }; Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	block: Reduce the amount of memory used for tag sets	Bart Van Assche
	Instead of allocating an array of size nr_cpu_ids for set->tags, allocate an array of size set->nr_hw_queues. This patch improves behavior that was introduced by commit 868f2f0b7206 ("blk-mq: dynamic h/w context count"). Reallocating tag sets from inside __blk_mq_update_nr_hw_queues() is safe because: - All request queues that share the tag sets are frozen before the tag sets are reallocated. - blk_mq_queue_tag_busy_iter() holds q->q_usage_counter while active and hence is serialized against __blk_mq_update_nr_hw_queues(). Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	block: Reduce the amount of memory required per request queue	Bart Van Assche
	Instead of always allocating at least nr_cpu_ids hardware queues per request queue, reallocate q->queue_hw_ctx if it has to grow. This patch improves behavior that was introduced by commit 868f2f0b7206 ("blk-mq: dynamic h/w context count"). Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()	Bart Van Assche
	Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues() already serialize __blk_mq_update_nr_hw_queues() against blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() is not necessary. Hence remove it. Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was introduced by commit f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter"). Commit 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues(). Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	Merge tag 'modules-for-v5.4-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules fixes from Jessica Yu: - Revert __ksymtab_$namespace.$symbol naming scheme back to __ksymtab_$symbol, as it was causing issues with depmod. Instead, have modpost extract a symbol's namespace from __kstrtabns and __ksymtab_strings. - Fix `make nsdeps` for out of tree kernel builds (make O=...) caused by unescaped '/'. Use a different sed delimiter to avoid this problem. * tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: scripts/nsdeps: use alternative sed delimiter symbol namespaces: revert to previous __ksymtab name scheme modpost: make updating the symbol namespace explicit modpost: delegate updating namespaces to separate function
2019-10-25	Merge tag 'armsoc-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "A slightly larger set of fixes have accrued in the last two weeks. Mostly a collection of the usual smaller fixes: - Marvell Armada: USB phy setup issues on Turris Mox - Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some maintainer updates. - OMAP: Fixlets for display config, interrupt settings for wifi, some clock/PM pieces. Also IOMMU regression fix and a ti-sysc no-watchdog regression fix. - i.MX: A few fixes around PM/settings, some devicetree fixlets and catching up with config option changes in DRM - Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display panel settings ... and some smaller fixes for Davinci (backlight, McBSP DMA), Allwinner (phy regulators, PMU removal on A64, etc)" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits) ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 MAINTAINERS: Update the Spreadtrum SoC maintainer MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue bus: ti-sysc: Fix watchdog quirk handling ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs ARM: davinci_all_defconfig: enable GPIO backlight ARM: davinci: dm365: Fix McBSP dma_slave_map entry ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk ARM: dts: imx7s: Correct GPT's ipg clock source ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect' ARM: dts: imx6q-logicpd: Re-Enable SNVS power key arm64: dts: lx2160a: Correct CPU core idle state name mailmap: Add Simon Arlott (replacement for expired email address) arm64: dts: rockchip: Fix override mode for rk3399-kevin panel ...
2019-10-25	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fixes from Paolo Bonzini: "Bugfixes for ARM, PPC and x86, plus selftest improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Don't leak L1 MMIO regions to L2 KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update kvm: clear kvmclock MSR on reset KVM: x86: fix bugon.cocci warnings KVM: VMX: Remove specialized handling of unexpected exit-reasons selftests: kvm: fix sync_regs_test with newer gccs selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported selftests: kvm: consolidate VMX support checks selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled selftests: kvm: synchronize .gitignore to Makefile kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID KVM: arm64: pmu: Reset sample period on overflow handling KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems KVM: arm64: pmu: Fix cycle counter truncation KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
2019-10-25	Merge tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Quiet week this week, which I suspect means some people just didn't get around to sending me fixes pulls in time. This has 2 komeda and a bunch of amdgpu fixes in it: komeda: - typo fixes - flushing pipes fix amdgpu: - Fix suspend/resume issue related to multi-media engines - Fix memory leak in user ptr code related to hmm conversion - Fix possible VM faults when allocating page table memory - Fix error handling in bo list ioctl" * tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm: drm/komeda: Fix typos in komeda_splitter_validate drm/komeda: Don't flush inactive pipes drm/amdgpu/vce: fix allocation size in enc ring test drm/amdgpu: fix error handling in amdgpu_bo_list_create drm/amdgpu: fix potential VM faults drm/amdgpu: user pages array memory leak fix drm/amdgpu/vcn: fix allocation size in enc ring test drm/amdgpu/uvd7: fix allocation size in enc ring test (v2) drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
2019-10-25	Merge tag 'mmc-v5.4-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host fixes: - mxs: Fix flags passed to dmaengine_prep_slave_sg - cqhci: Add a missing memory barrier - sdhci-omap: Fix tuning procedure for temperatures < -20C" * tag 'mmc-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mxs: fix flags passed to dmaengine_prep_slave_sg mmc: cqhci: Commit descriptors before setting the doorbell mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C
2019-10-25	Btrfs: fix race leading to metadata space leak after task received signal	Filipe Manana
	When a task that is allocating metadata needs to wait for the async reclaim job to process its ticket and gets a signal (because it was killed for example) before doing the wait, the task ends up erroring out but with space reserved for its ticket, which never gets released, resulting in a metadata space leak (more specifically a leak in the bytes_may_use counter of the metadata space_info object). Here's the sequence of steps leading to the space leak: 1) A task tries to create a file for example, so it ends up trying to start a transaction at btrfs_create(); 2) The filesystem is currently in a state where there is not enough metadata free space to satisfy the transaction's needs. So at space-info.c:__reserve_metadata_bytes() we create a ticket and add it to the list of tickets of the space info object. Also, because the metadata async reclaim job is not running, we queue a job ro run metadata reclaim; 3) In the meanwhile the task receives a signal (like SIGTERM from a kill command for example); 4) After queing the async reclaim job, at __reserve_metadata_bytes(), we unlock the metadata space info and call handle_reserve_ticket(); 5) That last function calls wait_reserve_ticket(), which acquires the lock from the metadata space info. Then in the first iteration of its while loop, it calls prepare_to_wait_event(), which returns -ERESTARTSYS because the task has a pending signal. As a result, we set the error field of the ticket to -EINTR and exit the while loop without deleting the ticket from the list of tickets (in the space info object). After exiting the loop we unlock the space info; 6) The async reclaim job is able to release enough metadata, acquires the metadata space info's lock and then reserves space for the ticket, since the ticket is still in the list of (non-priority) tickets. The space reservation happens at btrfs_try_granting_tickets(), called from maybe_fail_all_tickets(). This increments the bytes_may_use counter from the metadata space info object, sets the ticket's bytes field to zero (meaning success, that space was reserved) and removes it from the list of tickets; 7) wait_reserve_ticket() returns, with the error field of the ticket set to -EINTR. Then handle_reserve_ticket() just propagates that error to the caller. Because an error was returned, the caller does not release the reserved space, since the expectation is that any error means no space was reserved. Fix this by removing the ticket from the list, while holding the space info lock, at wait_reserve_ticket() when prepare_to_wait_event() returns an error. Also add some comments and an assertion to guarantee we never end up with a ticket that has an error set and a bytes counter field set to zero, to more easily detect regressions in the future. This issue could be triggered sporadically by some test cases from fstests such as generic/269 for example, which tries to fill a filesystem and then kills fsstress processes running in the background. When this issue happens, we get a warning in syslog/dmesg when unmounting the filesystem, like the following: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 13240 at fs/btrfs/block-group.c:3186 btrfs_free_block_groups+0x314/0x470 [btrfs] (...) CPU: 0 PID: 13240 Comm: umount Tainted: G W L 5.3.0-rc8-btrfs-next-48+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_free_block_groups+0x314/0x470 [btrfs] (...) RSP: 0018:ffff9910c14cfdb8 EFLAGS: 00010286 RAX: 0000000000000024 RBX: ffff89cd8a4d55f0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff89cdf6a178a8 RDI: ffff89cdf6a178a8 RBP: ffff9910c14cfde8 R08: 0000000000000000 R09: 0000000000000001 R10: ffff89cd4d618040 R11: 0000000000000000 R12: ffff89cd8a4d5508 R13: ffff89cde7c4a600 R14: dead000000000122 R15: dead000000000100 FS: 00007f42754432c0(0000) GS:ffff89cdf6a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd25a47f730 CR3: 000000021f8d6006 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: close_ctree+0x1ad/0x390 [btrfs] generic_shutdown_super+0x6c/0x110 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0xa0 [btrfs] deactivate_locked_super+0x3a/0x70 cleanup_mnt+0xb4/0x160 task_work_run+0x7e/0xc0 exit_to_usermode_loop+0xfa/0x100 do_syscall_64+0x1cb/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f4274d2cb37 (...) RSP: 002b:00007ffcff701d38 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000557ebde2f060 RCX: 00007f4274d2cb37 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000557ebde2f240 RBP: 0000557ebde2f240 R08: 0000557ebde2f270 R09: 0000000000000015 R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f427522ee64 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcff701fc0 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb12b561e>] copy_process+0x75e/0x1fd0 softirqs last enabled at (0): [<ffffffffb12b561e>] copy_process+0x75e/0x1fd0 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace bcf4b235461b26f6 ]--- BTRFS info (device sdb): space_info 4 has 19116032 free, is full BTRFS info (device sdb): space_info total=33554432, used=14176256, pinned=0, reserved=0, may_use=196608, readonly=65536 BTRFS info (device sdb): global_block_rsv: size 0 reserved 0 BTRFS info (device sdb): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdb): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdb): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdb): delayed_refs_rsv: size 0 reserved 0 Fixes: 374bf9c5cd7d0b ("btrfs: unify error handling for ticket flushing") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-25	btrfs: tree-checker: Fix wrong check on max devid	Qu Wenruo
	[BUG] The following script will cause false alert on devid check. #!/bin/bash dev1=/dev/test/test dev2=/dev/test/scratch1 mnt=/mnt/btrfs umount $dev1 &> /dev/null umount $dev2 &> /dev/null umount $mnt &> /dev/null mkfs.btrfs -f $dev1 mount $dev1 $mnt _fail() { echo "!!! FAILED !!!" exit 1 } for ((i = 0; i < 4096; i++)); do btrfs dev add -f $dev2 $mnt \|\| _fail btrfs dev del $dev1 $mnt \|\| _fail dev_tmp=$dev1 dev1=$dev2 dev2=$dev_tmp done [CAUSE] Tree-checker uses BTRFS_MAX_DEVS() and BTRFS_MAX_DEVS_SYS_CHUNK() as upper limit for devid. But we can have devid holes just like above script. So the check for devid is incorrect and could cause false alert. [FIX] Just remove the whole devid check. We don't have any hard requirement for devid assignment. Furthermore, even devid could get corrupted by a bitflip, we still have dev extents verification at mount time, so corrupted data won't sneak in. This fixes fstests btrfs/194. Reported-by: Anand Jain <anand.jain@oracle.com> Fixes: ab4ba2e13346 ("btrfs: tree-checker: Verify dev item") CC: stable@vger.kernel.org # 5.2+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-25	btrfs: Consider system chunk array size for new SYSTEM chunks	Qu Wenruo
	For SYSTEM chunks, despite the regular chunk item size limit, there is another limit due to system chunk array size. The extra limit was removed in a refactoring, so add it back. Fixes: e3ecdb3fdecf ("btrfs: factor out devs_max setting in __btrfs_alloc_chunk") CC: stable@vger.kernel.org # 5.3+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-25	io_uring: fix bad inflight accounting for SETUP_IOPOLL\|SETUP_SQTHREAD	Jens Axboe
	We currently assume that submissions from the sqthread are successful, and if IO polling is enabled, we use that value for knowing how many completions to look for. But if we overflowed the CQ ring or some requests simply got errored and already completed, they won't be available for polling. For the case of IO polling and SQTHREAD usage, look at the pending poll list. If it ever hits empty then we know that we don't have anymore pollable requests inflight. For that case, simply reset the inflight count to zero. Reported-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	io_uring: used cached copies of sq->dropped and cq->overflow	Jens Axboe
	We currently use the ring values directly, but that can lead to issues if the application is malicious and changes these values on our behalf. Created in-kernel cached versions of them, and just overwrite the user side when we update them. This is similar to how we treat the sq/cq ring tail/head updates. Reported-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space	James Morse
	Compat user-space is unable to perform ICIMVAU instructions from user-space. Instead it uses a compat-syscall. Add the workaround for Neoverse-N1 #1542419 to this code path. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419	James Morse
	Systems affected by Neoverse-N1 #1542419 support DIC so do not need to perform icache maintenance once new instructions are cleaned to the PoU. For the errata workaround, the kernel hides DIC from user-space, so that the unnecessary cache maintenance can be trapped by firmware. To reduce the number of traps, produce a fake IminLine value based on PAGE_SIZE. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419	James Morse
	Cores affected by Neoverse-N1 #1542419 could execute a stale instruction when a branch is updated to point to freshly generated instructions. To workaround this issue we need user-space to issue unnecessary icache maintenance that we can trap. Start by hiding CTR_EL0.DIC. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()	Yunfeng Ye
	In cases like suspend-to-disk and suspend-to-ram, a large number of CPU cores need to be shut down. At present, the CPU hotplug operation is serialised, and the CPU cores can only be shut down one by one. In this process, if PSCI affinity_info() does not return LEVEL_OFF quickly, cpu_psci_cpu_kill() needs to wait for 10ms. If hundreds of CPU cores need to be shut down, it will take a long time. Normally, there is no need to wait 10ms in cpu_psci_cpu_kill(). So change the wait interval from 10 ms to max 1 ms and use usleep_range() instead of msleep() for more accurate timer. In addition, reducing the time interval will increase the messages output, so remove the "Retry ..." message, instead, track time and output to the the sucessful message. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: pgtable: Correct typo in comment	Mark Brown
	vmmemmap -> vmemmap Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: docs: cpu-feature-registers: Document ID_AA64PFR1_EL1	Dave Martin
	Commit d71be2b6c0e1 ("arm64: cpufeature: Detect SSBS and advertise to userspace") exposes ID_AA64PFR1_EL1 to userspace, but didn't update the documentation to match. Add it. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	arm64: cpufeature: Fix typos in comment	Shaokun Zhang
	Fix up one typos: CTR_E0 -> CTR_EL0 Cc: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-25	ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157	Patrice Chotard
	Relax qspi pins slew-rate to minimize peak currents. Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board") Link: https://lore.kernel.org/r/20191025130122.11407-1-alexandre.torgue@st.com Signed-off-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-25	io_uring: Fix race for sqes with userspace	Pavel Begunkov
	io_ring_submit() finalises with 1. io_commit_sqring(), which releases sqes to the userspace 2. Then calls to io_queue_link_head(), accessing released head's sqe Reorder them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	io_uring: Fix broken links with offloading	Pavel Begunkov
	io_sq_thread() processes sqes by 8 without considering links. As a result, links will be randomely subdivided. The easiest way to fix it is to call io_get_sqring() inside io_submit_sqes() as do io_ring_submit(). Downsides: 1. This removes optimisation of not grabbing mm_struct for fixed files 2. It submitting all sqes in one go, without finer-grained sheduling with cq processing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	io_uring: Fix corrupted user_data	Pavel Begunkov
	There is a bug, where failed linked requests are returned not with specified @user_data, but with garbage from a kernel stack. The reason is that io_fail_links() uses req->user_data, which is uninitialised when called from io_queue_sqe() on fail path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25	xen: issue deprecation warning for 32-bit pv guest	Juergen Gross
	Support for the kernel as Xen 32-bit PV guest will soon be removed. Issue a warning when booted as such. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-25	EDAC/amd64: Set grain per DIMM	Yazen Ghannam
	The following commit introduced a warning on error reports without a non-zero grain value. 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
2019-10-25	Merge tag 'irqchip-fixes-5.4-2' of ↵	Thomas Gleixner
	git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull the second lot of irqchip updates for 5.4 from Marc Zyngier: - Sifive PLIC: force driver to skip non-relevant contexts - GICv4: Don't send VMOVP commands to ITSs that don't have this vPE mapped
2019-10-25	x86/kvm: Fix -Wmissing-prototypes warnings	Yi Wang
	We get two warning when build kernel with W=1: arch/x86/kernel/kvm.c:872:6: warning: no previous prototype for ‘arch_haltpoll_enable’ [-Wmissing-prototypes] arch/x86/kernel/kvm.c:885:6: warning: no previous prototype for ‘arch_haltpoll_disable’ [-Wmissing-prototypes] Including the missing head file can fix this. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-25	kvm: Allocate memslots and buses before calling kvm_arch_init_vm	Jim Mattson
	This reorganization will allow us to call kvm_arch_destroy_vm in the event that kvm_create_vm fails after calling kvm_arch_init_vm. Suggested-by: Junaid Shahid <junaids@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-25	powerpc/powernv/eeh: Fix oops when probing cxl devices	Frederic Barrat
	Recent cleanup in the way EEH support is added to a device causes a kernel oops when the cxl driver probes a device and creates virtual devices discovered on the FPGA: BUG: Kernel NULL pointer dereference at 0x000000a0 Faulting instruction address: 0xc000000000048070 Oops: Kernel access of bad area, sig: 7 [#1] ... NIP eeh_add_device_late.part.9+0x50/0x1e0 LR eeh_add_device_late.part.9+0x3c/0x1e0 Call Trace: _dev_info+0x5c/0x6c (unreliable) pnv_pcibios_bus_add_device+0x60/0xb0 pcibios_bus_add_device+0x40/0x60 pci_bus_add_device+0x30/0x100 pci_bus_add_devices+0x64/0xd0 cxl_pci_vphb_add+0xe0/0x130 [cxl] cxl_probe+0x504/0x5b0 [cxl] local_pci_probe+0x6c/0x110 work_for_cpu_fn+0x38/0x60 The root cause is that those cxl virtual devices don't have a representation in the device tree and therefore no associated pci_dn structure. In eeh_add_device_late(), pdn is NULL, so edev is NULL and we oops. We never had explicit support for EEH for those virtual devices. Instead, EEH events are reported to the (real) pci device and handled by the cxl driver. Which can then forward to the virtual devices and handle dependencies. The fact that we try adding EEH support for the virtual devices is new and a side-effect of the recent cleanup. This patch fixes it by skipping adding EEH support on powernv for devices which don't have a pci_dn structure. The cxl driver doesn't create virtual devices on pseries so this patch doesn't fix it there intentionally. Fixes: b905f8cdca77 ("powerpc/eeh: EEH for pSeries hot plug") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016162833.22509-1-fbarrat@linux.ibm.com
2019-10-25	irqchip/sifive-plic: Skip contexts except supervisor in plic_init()	Alan Mikhak
	Modify plic_init() to skip .dts interrupt contexts other than supervisor external interrupt. The .dts entry for plic may specify multiple interrupt contexts. For example, it may assign two entries IRQ_M_EXT and IRQ_S_EXT, in that order, to the same interrupt controller. This patch modifies plic_init() to skip the IRQ_M_EXT context since IRQ_S_EXT is currently the only supported context. If IRQ_M_EXT is not skipped, plic_init() will report "handler already present for context" when it comes across the IRQ_S_EXT context in the next iteration of its loop. Without this patch, .dts would have to be edited to replace the value of IRQ_M_EXT with -1 for it to be skipped. Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv Link: https://lkml.kernel.org/r/1571933503-21504-1-git-send-email-alan.mikhak@sifive.com
2019-10-25	pinctrl: cherryview: Allocate IRQ chip dynamic	Andy Shevchenko
	Keeping the IRQ chip definition static shares it with multiple instances of the GPIO chip in the system. This is bad and now we get this warning from GPIO library: "detected irqchip that is shared with multiple gpiochips: please fix the driver." Hence, move the IRQ chip definition from being driver static into the struct intel_pinctrl. So a unique IRQ chip is used for each GPIO chip instance. This patch is heavily based on the attachment to the bug by Christoph Marz. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=202543 Fixes: 6e08d6bbebeb ("pinctrl: Add Intel Cherryview/Braswell pin controller support") Depends-on: 83b9dc11312f ("pinctrl: cherryview: Associate IRQ descriptors to irqdomain") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-10-25	ACPI: processor: Add QoS requests for all CPUs	Rafael J. Wysocki
	The _PPC change notifications from the platform firmware are per-CPU, so acpi_processor_ppc_init() needs to add a frequency QoS request for each CPU covered by a cpufreq policy to take all of them into account. Even though ACPI thermal control of CPUs sets frequency limits per processor package, it also needs a frequency QoS request for each CPU in a cpufreq policy in case some of them are taken offline and the frequency limit needs to be set through the remaining online ones (this is slightly excessive, because all CPUs covered by one cpufreq policy will set the same frequency limit through their QoS requests, but it is not incorrect). Modify the code in accordance with the above observations. Fixes: d15ce412737a ("ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-25	clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume	Marek Szyprowski
	Properly save and restore all top PLL related configuration registers during suspend/resume cycle. So far driver only handled EPLL and RPLL clocks, all other were reset to default values after suspend/resume cycle. This caused for example lower G3D (MALI Panfrost) performance after system resume, even if performance governor has been selected. Reported-by: Reported-by: Marian Mihailescu <mihailescu2m@gmail.com> Fixes: 773424326b51 ("clk: samsung: exynos5420: add more registers to restore list") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
2019-10-25	arm64: dts: ls1028a: fix a compatible issue	Yuantian Tang
	The I2C multiplexer used on ls1028aqds is PCA9547, not PCA9847. If the wrong compatible was used, this chip will not be able to be probed correctly and hence fail to work. Signed-off-by: Yuantian Tang <andy.tang@nxp.com> Acked-by: Li Yang <leoyang.li@nxp.com> Fixes: 8897f3255c9c ("arm64: dts: Add support for NXP LS1028A SoC") Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-10-25	Merge tag 'drm-fixes-5.4-2019-10-23' of ↵	Dave Airlie
	git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.4-2019-10-23: amdgpu: - Fix suspend/resume issue related to multi-media engines - Fix memory leak in user ptr code related to hmm conversion - Fix possible VM faults when allocating page table memory - Fix error handling in bo list ioctl Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024031809.3155-1-alexander.deucher@amd.com
2019-10-25	Merge tag 'drm-misc-fixes-2019-10-23' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Two fixes for komeda, one for typos and one to prevent an hardware issue when flushing inactive pipes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191023112643.evpp6f23mpjwdsn4@gilmour
2019-10-25	autofs: fix a leak in autofs_expire_indirect()	Al Viro
	if the second call of should_expire() in there ends up grabbing and returning a new reference to dentry, we need to drop it before continuing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-10-24	cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs	Dave Wysochanski
	There's a deadlock that is possible and can easily be seen with a test where multiple readers open/read/close of the same file and a disruption occurs causing reconnect. The deadlock is due a reader thread inside cifs_strict_readv calling down_read and obtaining lock_sem, and then after reconnect inside cifs_reopen_file calling down_read a second time. If in between the two down_read calls, a down_write comes from another process, deadlock occurs. CPU0 CPU1 ---- ---- cifs_strict_readv() down_read(&cifsi->lock_sem); _cifsFileInfo_put OR cifs_new_fileinfo down_write(&cifsi->lock_sem); cifs_reopen_file() down_read(&cifsi->lock_sem); Fix the above by changing all down_write(lock_sem) calls to down_write_trylock(lock_sem)/msleep() loop, which in turn makes the second down_read call benign since it will never block behind the writer while holding lock_sem. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-10-24	CIFS: Fix use after free of file info structures	Pavel Shilovsky
	Currently the code assumes that if a file info entry belongs to lists of open file handles of an inode and a tcon then it has non-zero reference. The recent changes broke that assumption when putting the last reference of the file info. There may be a situation when a file is being deleted but nothing prevents another thread to reference it again and start using it. This happens because we do not hold the inode list lock while checking the number of references of the file info structure. Fix this by doing the proper locking when doing the check. Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo") Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic") Cc: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-24	CIFS: Fix retry mid list corruption on reconnects	Pavel Shilovsky
	When the client hits reconnect it iterates over the mid pending queue marking entries for retry and moving them to a temporary list to issue callbacks later without holding GlobalMid_Lock. In the same time there is no guarantee that mids can't be removed from the temporary list or even freed completely by another thread. It may cause a temporary list corruption: [ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469 [ 430.464668] ------------[ cut here ]------------ [ 430.466569] kernel BUG at lib/list_debug.c:51! [ 430.468476] invalid opcode: 0000 [#1] SMP PTI [ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19 [ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55 ... [ 430.510426] Call Trace: [ 430.511500] cifs_reconnect+0x25e/0x610 [cifs] [ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs] [ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs] [ 430.517452] ? try_to_wake_up+0x212/0x650 [ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs] [ 430.521086] ? allocate_buffers+0x66/0x120 [cifs] [ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs] [ 430.525116] kthread+0xfb/0x130 [ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs] [ 430.528514] ? kthread_park+0x90/0x90 [ 430.530019] ret_from_fork+0x35/0x40 Fix this by obtaining extra references for mids being retried and marking them as MID_DELETED which indicates that such a mid has been dequeued from the pending list. Also move mid cleanup logic from DeleteMidQEntry to _cifs_mid_q_entry_release which is called when the last reference to a particular mid is put. This allows to avoid any use-after-free of response buffers. The patch needs to be backported to stable kernels. A stable tag is not mentioned below because the patch doesn't apply cleanly to any actively maintained stable kernel. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-24	scsi: sd: define variable dif as unsigned int instead of bool	Xiang Chen
	Variable dif in function sd_setup_read_write_cmnd() is the return value of function scsi_host_dif_capable() which returns dif capability of disks. If define it as bool, even for the disks which support DIF3, the function still return dif=1, which causes IO error. So define variable dif as unsigned int instead of bool. Fixes: e249e42d277e ("scsi: sd: Clean up sd_setup_read_write_cmnd()") Link: https://lore.kernel.org/r/1571725628-132736-1-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24	scsi: target: cxgbit: Fix cxgbit_fw4_ack()	Bart Van Assche
	Use the pointer 'p' after having tested that pointer instead of before. Fixes: 5cadafb236df ("target/cxgbit: Fix endianness annotations") Cc: Varun Prakash <varun@chelsio.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191023202150.22173-1-bvanassche@acm.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24	IB/core: Avoid deadlock during netlink message handling	Parav Pandit
	When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24	Merge branch 'md-next' of ↵	Jens Axboe
	https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.5/drivers Pull MD changes from Song. * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: no longer compare spare disk superblock events in super_load md: improve handling of bio with REQ_PREFLUSH in md_flush_request() md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit md/raid0: Fix an error message in raid0_make_request()
2019-10-24	Merge tag 'devicetree-fixes-for-5.4-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree fixes from Rob Herring: "A couple more DT fixes for 5.4: fix a ref count, memory leak, and Risc-V cpu schema warnings" * tag 'devicetree-fixes-for-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: reserved_mem: add missing of_node_put() for proper ref-counting of: unittest: fix memory leak in unittest_data_add dt-bindings: riscv: Fix CPU schema errors
2019-10-24	md: no longer compare spare disk superblock events in super_load	Yufen Yu
	We have a test case as follow: mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] \ --assume-clean --bitmap=internal mdadm -S /dev/md1 mdadm -A /dev/md1 /dev/sd[b-c] --run --force mdadm --zero /dev/sda mdadm /dev/md1 -a /dev/sda echo offline > /sys/block/sdc/device/state echo offline > /sys/block/sdb/device/state sleep 5 mdadm -S /dev/md1 echo running > /sys/block/sdb/device/state echo running > /sys/block/sdc/device/state mdadm -A /dev/md1 /dev/sd[a-c] --run --force When we readd /dev/sda to the array, it started to do recovery. After offline the other two disks in md1, the recovery have been interrupted and superblock update info cannot be written to the offline disks. While the spare disk (/dev/sda) can continue to update superblock info. After stopping the array and assemble it, we found the array run fail, with the follow kernel message: [ 172.986064] md: kicking non-fresh sdb from array! [ 173.004210] md: kicking non-fresh sdc from array! [ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors [ 173.022406] md1: failed to create bitmap (-5) [ 173.023466] md: md1 stopped. Since both sdb and sdc have the value of 'sb->events' smaller than that in sda, they have been kicked from the array. However, the only remained disk sda is in 'spare' state before stop and it cannot be added to conf->mirrors[] array. In the end, raid array assemble and run fail. In fact, we can use the older disk sdb or sdc to assemble the array. That means we should not choose the 'spare' disk as the fresh disk in analyze_sbs(). To fix the problem, we do not compare superblock events when it is a spare disk, as same as validate_super. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-10-24	md: improve handling of bio with REQ_PREFLUSH in md_flush_request()	David Jeffery
	If pers->make_request fails in md_flush_request(), the bio is lost. To fix this, pass back a bool to indicate if the original make_request call should continue to handle the I/O and instead of assuming the flush logic will push it to completion. Convert md_flush_request to return a bool and no longer calls the raid driver's make_request function. If the return is true, then the md flush logic has or will complete the bio and the md make_request call is done. If false, then the md make_request function needs to keep processing like it is a normal bio. Let the original call to md_handle_request handle any need to retry sending the bio to the raid driver's make_request function should it be needed. Also mark md_flush_request and the make_request function pointer as __must_check to issue warnings should these critical return values be ignored. Fixes: 2bc13b83e629 ("md: batch flush requests.") Cc: stable@vger.kernel.org # # v4.19+ Cc: NeilBrown <neilb@suse.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>