linux.git - Linus' kernel tree

Age	Commit message	Author
2018-02-08	Merge tag 'drm-for-v4.16-part2-fixes' of ↵	Linus Torvalds
	git://people.freedesktop.org/~airlied/linux Pull more drm updates from Dave Airlie: "Ben missed sending his nouveau tree, but he really didn't have much stuff in it: - GP108 acceleration support is enabled by "secure boot" support - some clockgating work on Kepler, and bunch of fixes - the bulk of the diff is regenerated firmware files, the change to them really isn't that large. Otherwise this contains regular Intel and AMDGPU fixes" * tag 'drm-for-v4.16-part2-fixes' of git://people.freedesktop.org/~airlied/linux: (59 commits) drm/i915/bios: add DP max link rate to VBT child device struct drm/i915/cnp: Properly handle VBT ddc pin out of bounds. drm/i915/cnp: Ignore VBT request for know invalid DDC pin. drm/i915/cmdparser: Do not check past the cmd length. drm/i915/cmdparser: Check reg_table_count before derefencing. drm/i915/bxt, glk: Increase PCODE timeouts during CDCLK freq changing drm/i915/gvt: Use KVM r/w to access guest opregion drm/i915/gvt: Fix aperture read/write emulation when enable x-no-mmap=on drm/i915/gvt: only reset execlist state of one engine during VM engine reset drm/i915/gvt: refine intel_vgpu_submission_ops as per engine ops drm/amdgpu: re-enable CGCG on CZ and disable on ST drm/nouveau/clk: fix gcc-7 -Wint-in-bool-context warning drm/nouveau/mmu: Fix trailing semicolon drm/nouveau: Introduce NvPmEnableGating option drm/nouveau: Add support for SLCG for Kepler2 drm/nouveau: Add support for BLCG on Kepler2 drm/nouveau: Add support for BLCG on Kepler1 drm/nouveau: Add support for basic clockgating on Kepler1 drm/nouveau/kms/nv50: fix handling of gamma since atomic conversion drm/nouveau/kms/nv50: use INTERPOLATE_257_UNITY_RANGE LUT on newer chipsets ...
2018-02-08	Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph updates from Ilya Dryomov: "Things have been very quiet on the rbd side, as work continues on the big ticket items slated for the next merge window. On the CephFS side we have a large number of cap handling improvements, a fix for our long-standing abuse of ->journal_info in ceph_readpages() and yet another dentry pointer management patch" * tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client: ceph: improving efficiency of syncfs libceph: check kstrndup() return value ceph: try to allocate enough memory for reserved caps ceph: fix race of queuing delayed caps ceph: delete unreachable code in ceph_check_caps() ceph: limit rate of cap import/export error messages ceph: fix incorrect snaprealm when adding caps ceph: fix un-balanced fsc->writeback_count update ceph: track read contexts in ceph_file_info ceph: avoid dereferencing invalid pointer during cached readdir ceph: use atomic_t for ceph_inode_info::i_shared_gen ceph: cleanup traceless reply handling for rename ceph: voluntarily drop Fx cap for readdir request ceph: properly drop caps for setattr request ceph: voluntarily drop Lx cap for link/rename requests ceph: voluntarily drop Ax cap for requests that create new inode rbd: whitelist RBD_FEATURE_OPERATIONS feature bit rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full() rbd: use kmem_cache_zalloc() in rbd_img_request_create() rbd: obj_request->completion is unused
2018-02-08	tuntap: add missing xdp flush	Jason Wang
	When using devmap to redirect packets between interfaces, xdp_do_flush() is usually a must to flush any batched packets. Unfortunately this is missed in the current tuntap implementation. Unlike most hardware drivers, which do XDP inside the NAPI loop and call xdp_do_flush() at the end of each round of poll, TAP does it in process context, e.g. tun_get_user(). So fix this by counting the pending redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT or MSG_MORE was cleared by the sendmsg() caller. With this fix, xdp_redirect_map works again between two TAPs. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
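The batching rule described in this commit message can be sketched in user space as follows. This is only an illustration of the counting logic, not the tun driver code: `struct redirect_batch`, `batch_add()` and `batch_flush()` are hypothetical names, and the flush hook merely marks where the driver would call xdp_do_flush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* NAPI_POLL_WEIGHT is 64 in the kernel; redefined so the sketch builds in user space. */
#define NAPI_POLL_WEIGHT 64

/* Hypothetical per-queue bookkeeping, not the real tun structures. */
struct redirect_batch {
	int pending;	/* redirected packets not yet flushed */
	int flushes;	/* flush count, kept only for illustration */
};

/* Hypothetical flush hook; the driver would call xdp_do_flush() here. */
static void batch_flush(struct redirect_batch *b)
{
	b->pending = 0;
	b->flushes++;
}

/* Record one redirected packet; 'more' mirrors MSG_MORE from the sendmsg() caller. */
static void batch_add(struct redirect_batch *b, bool more)
{
	assert(b != NULL);
	b->pending++;
	/* Flush when the batch exceeds the poll weight or the caller has no more data. */
	if (b->pending > NAPI_POLL_WEIGHT || !more)
		batch_flush(b);
}
```

Calling `batch_add(&b, true)` repeatedly accumulates packets, while a single call with `more == false` flushes whatever is pending.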
2018-02-08	Merge tag 'arm64-upstream' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "As I mentioned in the last pull request, there's a second batch of security updates for arm64 with mitigations for Spectre/v1 and an improved one for Spectre/v2 (via a newly defined firmware interface API). Spectre v1 mitigation: - back-end version of array_index_mask_nospec() - masking of the syscall number to restrict speculation through the syscall table - masking of __user pointers prior to dereference in uaccess routines Spectre v2 mitigation update: - using the new firmware SMC calling convention specification update - removing the current PSCI GET_VERSION firmware call mitigation as vendors are deploying new SMCCC-capable firmware - additional branch predictor hardening for synchronous exceptions and interrupts while in user mode Meltdown v3 mitigation update: - Cavium Thunder X is unaffected but a hardware erratum gets in the way. The kernel now starts with the page tables mapped as global and switches to non-global if kpti needs to be enabled. Other: - Theoretical trylock bug fixed" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits) arm64: Kill PSCI_GET_VERSION as a variant-2 workaround arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: smccc: Implement SMCCC v1.1 inline primitive arm/arm64: smccc: Make function identifiers an unsigned quantity firmware/psci: Expose SMCCC version through psci_ops firmware/psci: Expose PSCI conduit arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: KVM: Turn kvm_psci_version into a static inline arm/arm64: KVM: Advertise SMCCC v1.1 arm/arm64: KVM: Implement PSCI 1.0 support arm/arm64: KVM: Add smccc accessors to PSCI code arm/arm64: KVM: Add PSCI_VERSION helper arm/arm64: KVM: Consolidate the PSCI include files arm64: KVM: Increment PC after handling an SMC trap arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: entry: Apply BP hardening for suspicious interrupts from EL0 arm64: entry: Apply BP hardening for high-priority synchronous exceptions arm64: futex: Mask __user pointers prior to dereference ...
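The Spectre v1 items above rely on masking an index so that an out-of-bounds value collapses to zero even if a bounds check is mispredicted. The arithmetic behind such a mask can be sketched in portable C as below. The arm64 back-end mentioned in the pull request is architecture-specific and is not reproduced here; `index_mask()` and `load_clamped()` are hypothetical names, and a user-space build like this is an illustration of the idea, not a validated mitigation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return all ones when index < size and zero otherwise, without branching.
 * Assumes size <= LONG_MAX, so the top bit of (index | (size - 1 - index))
 * is set exactly when index is out of bounds.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	unsigned long v = index | (size - 1UL - index);
	unsigned long in_bounds = ~v >> (sizeof(unsigned long) * CHAR_BIT - 1);

	return 0UL - in_bounds;	/* 1 -> ~0UL, 0 -> 0 */
}

/* Bounds-checked load that also clamps the index with the mask. */
static int load_clamped(const int *table, unsigned long size, unsigned long index)
{
	assert(table != NULL);
	if (index >= size)
		return -1;
	index &= index_mask(index, size);	/* no-op on the in-bounds path */
	return table[index];
}
```

On the architectural path the mask changes nothing; its purpose is that a speculated out-of-bounds index is forced to 0 before it is used as an address.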
2018-02-08	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio/vhost updates from Michael Tsirkin: "virtio, vhost: fixes, cleanups, features This includes the disk/cache memory stats for the virtio balloon, as well as multiple fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: don't hold onto file pointer for VHOST_SET_LOG_FD vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL ringtest: ring.c malloc & memset to calloc virtio_vop: don't kfree device on register failure virtio_pci: don't kfree device on register failure virtio: split device_register into device_initialize and device_add vhost: remove unused lock check flag in vhost_dev_cleanup() vhost: Remove the unused variable. virtio_blk: print capacity at probe time virtio: make VIRTIO a menuconfig to ease disabling it all virtio/ringtest: virtio_ring: fix up need_event math virtio/ringtest: fix up need_event math virtio: virtio_mmio: make of_device_ids const. firmware: Use PTR_ERR_OR_ZERO() virtio-mmio: Use PTR_ERR_OR_ZERO() vhost/scsi: Improve a size determination in four functions virtio_balloon: include disk/file caches memory statistics
2018-02-08	Merge ath-current from ↵	Kalle Valo
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git ath.git fixes for 4.16. Major changes: ath10k * correct firmware RAM dump length for QCA6174/QCA9377 * add new QCA988X device id * fix a kernel panic during pci probe * revert a recent commit which broke ath10k firmware metadata parsing ath9k * fix a noise floor regression introduced during the merge window * add new device id
2018-02-08	nvme: Fix discard buffer overrun	Keith Busch
	This patch checks the discard range array bounds before setting it in case the driver gets a badly formed request. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2018-02-08	nvme: delete NVME_CTRL_LIVE --> NVME_CTRL_CONNECTING transition	Max Gurtovoy
	There is no logical reason to move from live state to connecting state. In case of initial connection establishment, the transition should be NVME_CTRL_NEW --> NVME_CTRL_CONNECTING --> NVME_CTRL_LIVE. In case of error recovery or reset, the transition should be NVME_CTRL_LIVE --> NVME_CTRL_RESETTING --> NVME_CTRL_CONNECTING --> NVME_CTRL_LIVE. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
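The two paths named in this message (initial connect, and error recovery or reset) can be written as a small transition check. The sketch below covers only those transitions and uses hypothetical names (`struct ctrl`, `change_state()`, `CTRL_*`); the real controller state machine has further states and transitions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of controller states, named after the message above. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING };

struct ctrl {
	enum ctrl_state state;
};

/* Apply a transition if it is one of those listed in the message; report success. */
static bool change_state(struct ctrl *c, enum ctrl_state to)
{
	bool ok = false;

	assert(c != NULL);
	switch (to) {
	case CTRL_CONNECTING:
		/* Reached from NEW (initial connect) or RESETTING (recovery), never from LIVE. */
		ok = c->state == CTRL_NEW || c->state == CTRL_RESETTING;
		break;
	case CTRL_LIVE:
		ok = c->state == CTRL_CONNECTING;
		break;
	case CTRL_RESETTING:
		ok = c->state == CTRL_LIVE;
		break;
	default:
		break;
	}
	if (ok)
		c->state = to;
	return ok;
}
```

With this table, a request to go from LIVE straight to CONNECTING is rejected and the state is left unchanged.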
2018-02-08	nvme-rdma: use NVME_CTRL_CONNECTING state to mark init process	Max Gurtovoy
	In order to avoid concurrent error recovery during the initialization process (allowed by the NVME_CTRL_NEW --> NVME_CTRL_RESETTING transition) we must mark the ctrl as CONNECTING before initial connection establishment. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2018-02-08	nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING	Max Gurtovoy
	In the pci transport, this state is used to mark the initialization process. It should also be used in other transports. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2018-02-08	nfp: populate MODULE_VERSION	Jakub Kicinski
	DKMS and similar out-of-tree module replacement services use module version to make sure the out-of-tree software is not older than the module shipped with the kernel. We use the kernel version in ethtool -i output, put it into MODULE_VERSION as well. Reported-by: Jan Gutter <jan.gutter@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	nfp: limit the number of TSO segments	Jakub Kicinski
	Most FWs limit the number of TSO segments a frame can produce to 64. This is for fairness and efficiency (of FW datapath) reasons. If a frame with larger number of segments is submitted the FW will drop it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	nfp: forbid disabling hw-tc-offload on representors while offload active	Jakub Kicinski
	All netdevs which can accept TC offloads must implement .ndo_set_features(). nfp_reprs currently do not do that, which means hw-tc-offload can be turned on and off even when offloads are active. Whether the offloads are active is really a question to nfp_ports, so remove the per-app tc_busy callback indirection thing, and simply count the number of offloaded items in nfp_port structure. Fixes: 8a2768732a4d ("nfp: provide infrastructure for offloading flower based TC filters") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	nfp: don't advertise hw-tc-offload on non-port netdevs	Jakub Kicinski
	nfp_port is a structure which represents an ASIC port, both PCIe vNIC (on a PF or a VF) or the external MAC port. vNIC netdev (struct nfp_net) and pure representor netdev (struct nfp_repr) both have a pointer to this structure. nfp_reprs always have a port associated. nfp_nets, however, only represent a device port in legacy mode, where they are considered the MAC port. In switchdev mode they are just the CPU's side of the PCIe link. By definition TC offloads only apply to device ports. Don't set the flag on vNICs without a port (i.e. in switchdev mode). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	nfp: bpf: require ETH table	Jakub Kicinski
	Upcoming changes will require all netdevs supporting TC offloads to have a full struct nfp_port. Require those for BPF offload. The operation without management FW reporting information about Ethernet ports is something we only support for very old and very basic NIC firmwares anyway. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08	ocxl: fix signed comparison with less than zero	Colin Ian King
	Currently the comparison of used < 0 is always false because used is a size_t. Fix this by making used a ssize_t type. Detected by Coccinelle: drivers/misc/ocxl/file.c:320:6-10: WARNING: Unsigned expression compared with zero: used < 0 Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
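The class of bug fixed here is easy to reproduce outside the driver. In the sketch below, the helper names are hypothetical and the code is unrelated to drivers/misc/ocxl/file.c; it only shows why a negative return value stored in a size_t can never satisfy a `< 0` test, and why ssize_t can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Buggy pattern: the result is stored in an unsigned type before the check. */
static bool error_seen_unsigned(ssize_t ret)
{
	size_t used = (size_t)ret;	/* -1 wraps to SIZE_MAX */

	return used < 0;		/* always false; compilers may warn here */
}

/* Fixed pattern: a signed type keeps the negative value visible. */
static bool error_seen_signed(ssize_t ret)
{
	ssize_t used = ret;

	return used < 0;
}

/* Small self-check showing the difference. */
static void check_examples(void)
{
	assert(!error_seen_unsigned(-1));	/* the error is silently missed */
	assert(error_seen_signed(-1));		/* the error is detected */
	assert(!error_seen_signed(16));		/* success is not misreported */
}
```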
2018-02-08	Revert "ath10k: add sanity check to ie_len before parsing fw/board ie"	Ryan Hsu
	This reverts commit 9ed4f91628737c820af6a1815b65bc06bd31518f. The commit introduced a regression that over read the ie with the padding. - the expected IE information ath10k_pci 0000:03:00.0: found firmware features ie (1 B) ath10k_pci 0000:03:00.0: Enabling feature bit: 6 ath10k_pci 0000:03:00.0: Enabling feature bit: 7 ath10k_pci 0000:03:00.0: features ath10k_pci 0000:03:00.0: 00000000: c0 00 00 00 00 00 00 00 - the wrong IE with padding is read (0x77) ath10k_pci 0000:03:00.0: found firmware features ie (4 B) ath10k_pci 0000:03:00.0: Enabling feature bit: 6 ath10k_pci 0000:03:00.0: Enabling feature bit: 7 ath10k_pci 0000:03:00.0: Enabling feature bit: 8 ath10k_pci 0000:03:00.0: Enabling feature bit: 9 ath10k_pci 0000:03:00.0: Enabling feature bit: 10 ath10k_pci 0000:03:00.0: Enabling feature bit: 12 ath10k_pci 0000:03:00.0: Enabling feature bit: 13 ath10k_pci 0000:03:00.0: Enabling feature bit: 14 ath10k_pci 0000:03:00.0: Enabling feature bit: 16 ath10k_pci 0000:03:00.0: Enabling feature bit: 17 ath10k_pci 0000:03:00.0: Enabling feature bit: 18 ath10k_pci 0000:03:00.0: features ath10k_pci 0000:03:00.0: 00000000: c0 77 07 00 00 00 00 00 Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Ryan Hsu <ryanhsu@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-08	crypto: sun4i_ss_prng - convert lock to _bh in sun4i_ss_prng_generate	Artem Savkov
	Lockdep detects a possible deadlock in sun4i_ss_prng_generate() and throws an "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage" warning. Disabling softirqs to fix this. Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG") Signed-off-by: Artem Savkov <artem.savkov@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-08	crypto: sun4i_ss_prng - fix return value of sun4i_ss_prng_generate	Artem Savkov
	According to crypto/rng.h generate function should return 0 on success and < 0 on error. Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG") Signed-off-by: Artem Savkov <artem.savkov@gmail.com> Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-08	crypto: caam - fix endless loop when DECO acquire fails	Horia Geantă
	In case DECO0 cannot be acquired - i.e. run_descriptor_deco0() fails with -ENODEV, caam_probe() enters an endless loop: run_descriptor_deco0 ret -ENODEV -> instantiate_rng -ENODEV, overwritten by -EAGAIN ret -EAGAIN -> caam_probe -EAGAIN results in endless loop It turns out the error path in instantiate_rng() is incorrect, the checks are done in the wrong order. Cc: <stable@vger.kernel.org> # 3.13+ Fixes: 1005bccd7a4a6 ("crypto: caam - enable instantiation of all RNG4 state handles") Reported-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Suggested-by: Auer Lukas <lukas.auer@aisec.fraunhofer.de> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
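The ordering problem described above can be sketched as follows. The functions are hypothetical stand-ins, not the CAAM driver's code: `hw_ret` plays the role of the value returned by run_descriptor_deco0(), `status` is a made-up descriptor status word, and the retry loop is capped so the sketch terminates where the real probe would spin.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*instantiate_fn)(int hw_ret, unsigned int status);

/* Wrong order: the status check runs first and overwrites a fatal error. */
static int instantiate_wrong_order(int hw_ret, unsigned int status)
{
	int ret = hw_ret;

	if (status != 0)
		ret = -EAGAIN;	/* clobbers -ENODEV */
	return ret;
}

/* Fixed order: a hard failure is returned before the status is interpreted. */
static int instantiate_fixed(int hw_ret, unsigned int status)
{
	if (hw_ret)
		return hw_ret;
	return status != 0 ? -EAGAIN : 0;
}

/* Caller that retries on -EAGAIN; 'cap' bounds the loop for the sketch only. */
static int probe_attempts(instantiate_fn fn, int hw_ret, unsigned int status, int cap)
{
	int attempts = 0;
	int ret;

	assert(fn != NULL && cap > 0);
	do {
		ret = fn(hw_ret, status);
		attempts++;
	} while (ret == -EAGAIN && attempts < cap);
	return attempts;
}
```

With the wrong order a persistent -ENODEV keeps the caller retrying until the cap is hit; with the fixed order the caller gives up after one attempt.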
2018-02-08	crypto: talitos - fix Kernel Oops on hashing an empty file	LEROY Christophe
	Performing the hash of an empty file leads to a kernel Oops [ 44.504600] Unable to handle kernel paging request for data at address 0x0000000c [ 44.512819] Faulting instruction address: 0xc02d2be8 [ 44.524088] Oops: Kernel access of bad area, sig: 11 [#1] [ 44.529171] BE PREEMPT CMPC885 [ 44.532232] CPU: 0 PID: 491 Comm: md5sum Not tainted 4.15.0-rc8-00211-g3a968610b6ea #81 [ 44.540814] NIP: c02d2be8 LR: c02d2984 CTR: 00000000 [ 44.545812] REGS: c6813c90 TRAP: 0300 Not tainted (4.15.0-rc8-00211-g3a968610b6ea) [ 44.554223] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48222822 XER: 20000000 [ 44.560855] DAR: 0000000c DSISR: c0000000 [ 44.560855] GPR00: c02d28fc c6813d40 c6828000 c646fa40 00000001 00000001 00000001 00000000 [ 44.560855] GPR08: 0000004c 00000000 c000bfcc 00000000 28222822 100280d4 00000000 10020008 [ 44.560855] GPR16: 00000000 00000020 00000000 00000000 10024008 00000000 c646f9f0 c6179a10 [ 44.560855] GPR24: 00000000 00000001 c62f0018 c6179a10 00000000 c6367a30 c62f0000 c646f9c0 [ 44.598542] NIP [c02d2be8] ahash_process_req+0x448/0x700 [ 44.603751] LR [c02d2984] ahash_process_req+0x1e4/0x700 [ 44.608868] Call Trace: [ 44.611329] [c6813d40] [c02d28fc] ahash_process_req+0x15c/0x700 (unreliable) [ 44.618302] [c6813d90] [c02060c4] hash_recvmsg+0x11c/0x210 [ 44.623716] [c6813db0] [c0331354] ___sys_recvmsg+0x98/0x138 [ 44.629226] [c6813eb0] [c03332c0] __sys_recvmsg+0x40/0x84 [ 44.634562] [c6813f10] [c03336c0] SyS_socketcall+0xb8/0x1d4 [ 44.640073] [c6813f40] [c000d1ac] ret_from_syscall+0x0/0x38 [ 44.645530] Instruction dump: [ 44.648465] 38c00001 7f63db78 4e800421 7c791b78 54690ffe 0f090000 80ff0190 2f870000 [ 44.656122] 40befe50 2f990001 409e0210 813f01bc <8129000c> b39e003a 7d29c214 913e003c This patch fixes that Oops by checking if src is NULL. Fixes: 6a1e8d14156d4 ("crypto: talitos - making mapping helpers more generic") Cc: <stable@vger.kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-08	nfp: bpf: fix immed relocation for larger offsets	Jakub Kicinski
	Immed relocation is missing a shift which means for larger offsets the lower and higher part of the address would be ORed together. Fixes: ce4ebfd859c3 ("nfp: bpf: add helpers for updating immediate instructions") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-08	Merge branches 'acpi-video', 'acpi-battery' and 'acpi-cppc'	Rafael J. Wysocki
	* acpi-video: ACPI / video: Use true for boolean value * acpi-battery: ACPI / battery: Add quirk for Asus UX360UA and UX410UAK * acpi-cppc: ACPI / CPPC: Use 64-bit arithmetic instead of 32-bit
2018-02-08	Merge branches 'acpi-tables', 'acpi-bus' and 'acpi-processor'	Rafael J. Wysocki
	* acpi-tables: ACPI: SPCR: Make SPCR available to x86 ACPI / tables: Add IORT to injectable table list * acpi-bus: ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530 ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs ACPI / bus: Do not call _STA on battery devices with unmet dependencies PCI: acpiphp_ibm: prepare for acpi_get_object_info() no longer returning status ACPI: export acpi_bus_get_status_handle() * acpi-processor: ACPI / processor: Set default C1 idle state description ACPI: processor_perflib: Do not send _PPC change notification if not ready
2018-02-08	Merge branch 'acpica'	Rafael J. Wysocki
	* acpica: ACPICA: Update version to 20180105 ACPICA: All acpica: Update copyrights to 2018 ACPICA: Add a missing pair of parentheses ACPICA: Prefer ACPI_TO_POINTER() over ACPI_ADD_PTR() ACPICA: Avoid NULL pointer arithmetic ACPICA: Linux: add support for X32 ABI compilation
2018-02-08	Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-domains'	Rafael J. Wysocki
	* pm-cpufreq: arm: imx: Add MODULE_ALIAS for cpufreq cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx() cpufreq: intel_pstate: Enable HWP during system resume on CPU0 cpufreq: scpi: fix error return code in scpi_cpufreq_init() cpufreq: scpi: fix static checker warning cdev isn't an ERR_PTR cpufreq: remove at32ap-cpufreq cpufreq: AMD: Ignore the check for ProcFeedback in ST/CZ cpufreq: Skip cpufreq resume if it's not suspended * pm-cpuidle: x86: PM: Make APM idle driver initialize polling state * pm-domains: PM / domains: Fix up domain-idle-states OF parsing
2018-02-08	arm: imx: Add MODULE_ALIAS for cpufreq	Nicolas Chauvet
	Without this, the imx6q-cpufreq driver isn't loaded automatically when built as a module Tested on wandboard quad with a fedora 27 kernel rpm Signed-off-by: Nicolas Chauvet <kwizart@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-08	cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx()	Dominik Brodowski
	Pointer subtraction is slow and tedious. Therefore, replace all instances where cpufreq_for_each_{valid_,}entry loops contained such subtractions with an iteration macro providing an index to the frequency_table entry. Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20180120020237.GM13338@ZenIV.linux.org.uk Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
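The idea of an iteration macro that carries an index alongside the cursor can be sketched as below. The names `for_each_entry_idx`, `struct freq_entry`, `TABLE_END` and `find_index()` are hypothetical and only approximate the shape of the cpufreq helpers; the sentinel stands in for the table terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel marking the end of a frequency table. */
#define TABLE_END (~1u)

struct freq_entry {
	unsigned int frequency;
};

/* Walk the table while keeping 'idx' in step with 'pos'. */
#define for_each_entry_idx(pos, table, idx) \
	for ((pos) = (table), (idx) = 0; \
	     (pos)->frequency != TABLE_END; \
	     (pos)++, (idx)++)

/* Return the index of 'freq' in the table, or -1 when it is absent. */
static int find_index(const struct freq_entry *table, unsigned int freq)
{
	const struct freq_entry *pos;
	int idx;

	assert(table != NULL);
	for_each_entry_idx(pos, table, idx)
		if (pos->frequency == freq)
			return idx;	/* previously computed as pos - table */
	return -1;
}
```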
2018-02-08	cpufreq: intel_pstate: Enable HWP during system resume on CPU0	Chen Yu
	When maxcpus=1 is in the kernel command line, the BP is responsible for re-enabling the HWP - because currently only the APs invoke intel_pstate_hwp_enable() during their online process - which might put the system into unstable state after resume. Fix this by enabling the HWP explicitly on BP during resume. Reported-by: Doug Smythies <dsmythies@telus.net> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Yu Chen <yu.c.chen@intel.com> [ rjw: Subject/changelog, minor modifications ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-08	cpufreq: scpi: fix error return code in scpi_cpufreq_init()	Wei Yongjun
	Fix to return a negative error code from the clk_get() error handling case instead of 0, as done elsewhere in this function. Fixes: 343a8d17fa8d (cpufreq: scpi: remove arm_big_little dependency) Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-08	ACPI: sbshc: remove raw pointer from printk() message	Greg Kroah-Hartman
	There's no need to be printing a raw kernel pointer to the kernel log at every boot. So just remove it, and change the whole message to use the correct dev_info() call at the same time. Reported-by: Wang Qize <wang_qize@venustech.com.cn> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07	net: ethernet: ti: cpsw: fix net watchdog timeout	Grygorii Strashko
	It was discovered that a simple program which indefinitely sends 200b UDP packets and runs on a TI AM574x SoC (SMP) under an RT kernel triggers a network watchdog timeout in the TI CPSW driver (<6 hours run). The network watchdog timeout is triggered by a race between cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI] cpsw_ndo_start_xmit() if (unlikely(!cpdma_check_free_tx_desc(txch))) { txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq); ^^ as per [1] a barrier has to be used after set_bit(), otherwise the new value might not be visible to other CPUs } cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq); and when it happens the ndev TX queue becomes disabled forever while the driver's HW TX queue is empty. Fix this by adding smp_mb__after_atomic() after netif_tx_stop_queue() calls and double-checking for free TX descriptors after stopping the ndev TX queue: if there are free TX descriptors, wake up the ndev TX queue. [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
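The stop, barrier, re-check pattern from this fix can be approximated in user space with C11 atomics. This is a sketch under assumptions, not the CPSW driver: `struct txq`, `xmit_check()` and `tx_complete()` are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` is used as a stand-in for smp_mb__after_atomic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue state shared by the transmit and completion paths. */
struct txq {
	atomic_bool stopped;	/* software queue stopped flag */
	atomic_int free_desc;	/* free hardware descriptors */
};

/* Transmit path: stop the queue when no descriptors are left, then re-check. */
static void xmit_check(struct txq *q)
{
	assert(q != NULL);
	if (atomic_load(&q->free_desc) == 0) {
		atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
		/* Full fence so the stop is visible before descriptors are re-read. */
		atomic_thread_fence(memory_order_seq_cst);
		/* The completion path may have freed descriptors in the meantime. */
		if (atomic_load(&q->free_desc) > 0)
			atomic_store(&q->stopped, false);
	}
}

/* Completion path: return descriptors and wake the queue if it was stopped. */
static void tx_complete(struct txq *q, int freed)
{
	assert(q != NULL);
	atomic_fetch_add(&q->free_desc, freed);
	if (atomic_load(&q->stopped))
		atomic_store(&q->stopped, false);
}
```

Without the re-check, a completion that runs between the descriptor test and the stop would see the queue as running, skip the wake-up, and leave the queue stopped with nothing left to wake it.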
2018-02-07	ibmvnic: Ensure that buffers are NULL after free	Thomas Falcon
	This change will guard against a double free in the case that the buffers were previously freed at some other time, such as during a device reset. It resolves a kernel oops that occurred when changing the VNIC device's MTU. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
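The guard described here is the common "clear the pointer after freeing it" idiom. A minimal user-space sketch follows, with a hypothetical `struct buffers` and `release_rx()`; it uses free(), which, like kfree(), does nothing when passed NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container for a buffer that several paths may try to release. */
struct buffers {
	void *rx;
};

/* Release the buffer and clear the pointer so a repeated call is harmless. */
static void release_rx(struct buffers *b)
{
	assert(b != NULL);
	free(b->rx);	/* free(NULL) is a no-op */
	b->rx = NULL;	/* a later release sees NULL instead of a stale pointer */
}
```

A reset path and a teardown path can then both call `release_rx()` in either order without freeing the same memory twice.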
2018-02-07	ibmvnic: Fix rx queue cleanup for non-fatal resets	John Allen
	At some point, a check was added to exit the polling routine during resets. This makes sense for most reset conditions, but for a non-fatal error, we expect the polling routine to continue running to properly clean up the rx queues. This patch checks if we are performing a non-fatal reset and if we are, continues normal polling operation. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07	i40e: Fix the number of queues available to be mapped for use	Amritha Nambiar
	Fix the number of queues per enabled TC and report available queues to the kernel without having to limit them to the max RSS limit so they are available to be mapped for XPS. This allows a queue per processing thread available for handling traffic for the given traffic class. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07	sun: Add SPDX license tags to Sun network drivers	Shannon Nelson
	Add the appropriate SPDX license tags to the Sun network drivers as outlined in Documentation/process/license-rules.rst. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07	cxgb4: Fix error handling path in 'init_one()'	Christophe JAILLET
	Commit baf5086840ab1 ("cxgb4: restructure VF mgmt code") has reordered some code but an error handling label has not been updated accordingly. So fix it and free 'adapter' if 't4_wait_dev_ready()' fails. Fixes: baf5086840ab1 ("cxgb4: restructure VF mgmt code") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07	Merge branch 'for-linus' into test	Jens Axboe
	* for-linus: block, bfq: add requeue-request hook bcache: fix for data collapse after re-attaching an attached device bcache: return attach error when no cache set exist bcache: set writeback_rate_update_seconds in range [1, 60] seconds bcache: fix for allocator and register thread race bcache: set error_limit correctly bcache: properly set task state in bch_writeback_thread() bcache: fix high CPU occupancy during journal bcache: add journal statistic block: Add should_fail_bio() for bpf error injection blk-wbt: account flush requests correctly
2018-02-08	Merge tag 'drm-intel-next-fixes-2018-02-07' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-next Fix for pcode timeouts on BXT and GLK, cmdparser fixes and fixes for new vbt version on CFL and CNL. GVT contains vGPU reset enhancement, which refines vGPU reset flow and the support of virtual aperture read/write when x-no-mmap=on is set in KVM, which is required by a test case from Redhat and also another fix for virtual OpRegion. * tag 'drm-intel-next-fixes-2018-02-07' of git://anongit.freedesktop.org/drm/drm-intel: drm/i915/bios: add DP max link rate to VBT child device struct drm/i915/cnp: Properly handle VBT ddc pin out of bounds. drm/i915/cnp: Ignore VBT request for know invalid DDC pin. drm/i915/cmdparser: Do not check past the cmd length. drm/i915/cmdparser: Check reg_table_count before derefencing. drm/i915/bxt, glk: Increase PCODE timeouts during CDCLK freq changing drm/i915/gvt: Use KVM r/w to access guest opregion drm/i915/gvt: Fix aperture read/write emulation when enable x-no-mmap=on drm/i915/gvt: only reset execlist state of one engine during VM engine reset drm/i915/gvt: refine intel_vgpu_submission_ops as per engine ops
2018-02-07	Merge tag 'regulator-fix-v4.16-suspend' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Fix suspend to idle. Testing on mainline after the initial regulator pull request went in identified a regression for suspend to idle due to it calling the suspend operations with states that it wasn't realized could happen, this patch fixes the problem" * tag 'regulator-fix-v4.16-suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Fix suspend to idle
2018-02-07	Merge tag 'fbdev-v4.16' of git://github.com/bzolnier/linux	Linus Torvalds
	Pull fbdev updates from Bartlomiej Zolnierkiewicz: "There is nothing really major here: - fix display-timings lookup in the Device Tree in atmel_lcdfb driver (Johan Hovold) - fix video mode and line_length to be set correctly in vfb driver (Pieter "PoroCYon" Sluys) - fix returning nonsensical values to the user-space on GIO_FONTX ioctl when using dummy console (Nicolas Pitre) - add missing license tag to mmpfb driver (Arnd Bergmann) - convert radeonfb and pxa3xx_gcu drivers to use ktime_get[_ts64]() instead of the deprecated do_gettimeofday() (Arnd Bergmann) - switch udlfb driver from using the pr_() logging functions to the dev_() ones + related cleanups (Ladislav Michl) - use __raw I/O accessors also on arm64 (Ji Zhang) - fix Kconfig help text for intelfb driver (Randy Dunlap) - do not duplicate features data in omapfb driver (Ladislav Michl) - misc cleanups (Colin Ian King, Markus Elfring, Rasmus Villemoes, Vasyl Gomonovych, Himanshu Jha, Michael Trimarchi)" * tag 'fbdev-v4.16' of git://github.com/bzolnier/linux: (25 commits) video: udlfb: Switch from the pr_() to the dev_() logging functions video: udlfb: Constify read only data video: fbdev/mmp: add MODULE_LICENSE console/dummy: leave .con_font_get set to NULL fbdev: mxsfb: use framebuffer_alloc in the correct way video: udlfb: Do not name private data 'dev' video: udlfb: Remove noisy warnings video: udlfb: Remove redundant gdev variable video: udlfb: Remove unnecessary local variable fbdev: auo_k190x: Use zeroing memory allocator instead of allocator/memset vfb: fix video mode and line_length being set when loaded fbdev: arm64 use __raw I/O memory api omapfb: dss: Do not duplicate features data video: fbdev: omap2: Use PTR_ERR_OR_ZERO() fbdev: au1200fb: delete duplicate header contents fbdev: pxa3xx: use ktime_get_ts64 for time stamps fbdev: radeon: use ktime_get() for HZ calibration video: smscufx: Improve a size determination in two functions video: udlfb: Delete an unnecessary return statement in two functions video: udlfb: Improve a size determination in dlfb_alloc_urb_list() ...
2018-02-07	Merge tag 'platform-drivers-x86-v4.16-2' of ↵	Linus Torvalds
	git://git.infradead.org/linux-platform-drivers-x86 Pull more x86 platform-drivers updates from Andy Shevchenko: "The DEFINE_SHOW_ATTRIBUTE() macro was defined privately in three locations and is useful for new and old users to avoid a lot of code duplication. Move the macro to seq_file.h. Along with above, clean up three drivers to use that macro. This, due to dependencies, was sent separately since affected changes weren't upstream originally yet. The rationale of doing this now is to allow use of new macro in v4.17 cycle in a conflictless manner" * tag 'platform-drivers-x86-v4.16-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro seq_file: Introduce DEFINE_SHOW_ATTRIBUTE() helper macro
2018-02-07	drm/i915/bios: add DP max link rate to VBT child device struct	Jani Nikula
	Update VBT defs to reflect revision 216. While at it, default the expected child device struct size to sizeof the size rather than a hardcoded value. v2: Fix bit order (David) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180118153310.32437-1-jani.nikula@intel.com (cherry picked from commit c4fb60b9aba9f939d3f8575df23fd8d5958ec6ed) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-08	Merge branch 'drm-next-4.16' of git://people.freedesktop.org/~agd5f/linux ↵	Dave Airlie
	into drm-next A few more misc fixes for 4.16. * 'drm-next-4.16' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: re-enable CGCG on CZ and disable on ST drm/amdgpu: disable coarse grain clockgating for ST drm/radeon: adjust tested variable drm/amdgpu: remove WARN_ON when VM isn't found v2 drm/amdgpu: fix locking in vega10_ih_prescreen_iv drm/amdgpu: fix another potential cause of VM faults drm/amdgpu: use queue 0 for kiq ring drm/ttm: Fix 'buf' pointer update in ttm_bo_vm_access_kmap() (v2) drm/ttm: fix missing parameter change for ttm_bo_cleanup_refs
2018-02-07	Merge tag 'linux-watchdog-4.16-rc1' of ↵	Linus Torvalds
	git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - new watchdog device drivers for Realtek RTD1295 and Spreadtrum SC9860 platform - add support for the following devices: jz4780 SoC, AST25xx series SoC and r8a77970 SoC - convert to watchdog framework: i6300esb_wdt, xen_wdt and sp5100_tco - several fixes for watchdog core - remove at32ap700x and obsolete documentation - gpio: Convert to use GPIO descriptors - rename gemini into FTWDT010 as this IP block is generic from Faraday Technology - various clean-ups and small bugfixes - add Guenter Roeck as co-maintainer - change maintainers e-mail address * tag 'linux-watchdog-4.16-rc1' of git://www.linux-watchdog.org/linux-watchdog: (74 commits) documentation: watchdog: remove documentation of w83697hf_wdt/w83697ug_wdt documentation: watchdog: remove documentation for ixp2000 documentation: watchdog: remove documentation of at32ap700x_wdt watchdog: remove at32ap700x_wdt watchdog: sp5100_tco: Add support for recent FCH versions watchdog: sp5100-tco: Abort if watchdog is disabled by hardware watchdog: sp5100_tco: Use bit operations watchdog: sp5100_tco: Convert to use watchdog subsystem watchdog: sp5100_tco: Clean up function and variable names watchdog: sp5100_tco: Use dev_ print functions where possible watchdog: sp5100_tco: Match PCI device early watchdog: sp5100_tco: Clean up sp5100_tco_setupdevice watchdog: sp5100_tco: Use standard error codes watchdog: sp5100_tco: Use request_muxed_region where possible watchdog: sp5100_tco: Fix watchdog disable bit watchdog: sp5100_tco: Always use SP5100_IO_PM_{INDEX_REG,DATA_REG} watchdog: core: make sure the watchdog_worker is not deferred watchdog: mt7621: switch to using managed devm_watchdog_register_device() watchdog: mt7621: set WDOG_HW_RUNNING bit when appropriate watchdog: imx2_wdt: restore previous timeout after suspend+resume ...
2018-02-07	bcache: fix for data collapse after re-attaching an attached device	Tang Junhui
	A back-end device sdm is already attached to a cache set with ID f67ebe1f-f8bc-4d73-bfe5-9dc88607f119. Trying to attach it to another cache set returns an error: [root]# cd /sys/block/sdm/bcache [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach -bash: echo: write error: Invalid argument After that, run a command to modify the label of the bcache device: [root]# echo data_disk1 > label Then reboot the system. When the system powers on, the back-end device cannot attach to the cache set, and a message shows in the log: Feb 5 12:05:52 ceph152 kernel: [922385.508498] bcache: bch_cached_dev_attach() couldn't find uuid for sdm in set In sysfs_attach(), dc->sb.set_uuid was assigned the value written through sysfs regardless of whether bch_cached_dev_attach() succeeded. For example, if the back-end device is already attached to a cache set, bch_cached_dev_attach() fails, but dc->sb.set_uuid has already been changed. Modifying the label of the bcache device then calls bch_write_bdev_super(), which writes dc->sb.set_uuid to the super block, so a wrong cache set ID is recorded in the super block. After the system reboots, the cache set cannot find the uuid of the back-end device, so the bcache device does not exist and cannot be used any more. In this patch, we no longer assign the cache set ID to dc->sb.set_uuid in sysfs_attach() directly; instead we pass it into bch_cached_dev_attach() and assign it to dc->sb.set_uuid only after the back-end device has attached to the cache set successfully. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
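The ordering this commit message describes (validate first, record the cache set ID only on success) can be sketched in plain userspace C. This is a minimal illustration, not the kernel patch: `struct backing_dev`, `attach_sketch()` and `UUID_LEN` are hypothetical stand-ins for the real bcache structures and for bch_cached_dev_attach().

```c
#include <errno.h>
#include <string.h>

#define UUID_LEN 16 /* illustrative size, not taken from the patch */

/* Hypothetical, simplified stand-in for the bcache backing device. */
struct backing_dev {
	unsigned char set_uuid[UUID_LEN]; /* cache set ID kept in the super block */
	int attached;                     /* non-zero once attached to a cache set */
};

/*
 * Sketch of the fixed flow: the requested UUID arrives as an argument
 * and is copied into the device only after every check has passed, so
 * a failed attach cannot leave a wrong ID behind.
 */
int attach_sketch(struct backing_dev *dc, const unsigned char *set_uuid)
{
	if (dc->attached)
		return -EINVAL; /* already attached: dc->set_uuid stays untouched */

	memcpy(dc->set_uuid, set_uuid, UUID_LEN);
	dc->attached = 1;
	return 0;
}
```

With this ordering, a later super block write after a failed attach still records the ID of the cache set the device really belongs to.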
2018-02-07	bcache: return attach error when no cache set exist	Tang Junhui
	When attaching a back-end device to a cache set that is not registered yet, the back-end device is not attached, yet no error is returned: [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach [root]# In sysfs_attach(), the return value "v" is initialized to "size" at the beginning. If no cache set exists in bch_cache_sets, "v" is never changed and is returned to sysfs, which regards it as success because "size" is a positive number. This patch fixes the issue by initializing "v" to "-ENOENT". Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
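The return-value logic can be sketched as follows. This is an assumption-laden userspace model, not the kernel function: `attach_store_sketch()` is a hypothetical name, and the `results` array stands in for the outcome of trying each registered cache set in turn.

```c
#include <errno.h>
#include <stddef.h>

/*
 * results[i] is the hypothetical outcome (0 or a negative errno) of
 * attaching to the i-th registered cache set; nsets may be 0.
 * size is the number of bytes written to the sysfs file.
 */
long attach_store_sketch(const int *results, size_t nsets, size_t size)
{
	long v = -ENOENT; /* the fix: start from an error instead of "size" */
	size_t i;

	for (i = 0; i < nsets; i++) {
		v = results[i];
		if (!v)
			return (long)size; /* success: all bytes consumed */
	}

	/* No cache set registered, or every attempt failed. */
	return v;
}
```

With zero registered cache sets the loop body never runs, so the initial value is what the caller sees. Initializing to "size" made that case look like a successful write.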
2018-02-07	bcache: set writeback_rate_update_seconds in range [1, 60] seconds	Coly Li
	dc->writeback_rate_update_seconds can be set via sysfs, and its value can be anything in [1, ULONG_MAX]. It does not make sense to set such a large value; 60 seconds is long enough, considering that the default of 5 seconds has worked well for a long time. Because dc->writeback_rate_update is a special delayed work, it re-arms itself inside the delayed work routine update_writeback_rate(). When stopping it with cancel_delayed_work_sync(), there should be a timeout to wait and make sure the re-armed delayed work is stopped too. A small maximum value of dc->writeback_rate_update_seconds also helps in choosing a reasonably small timeout. This patch limits the sysfs interface to setting dc->writeback_rate_update_seconds in the range [1, 60] seconds, and replaces the hand-coded numbers with macros. Changelog: v2: fix a rebase typo in v4, which is pointed out by Michael Lyle. v1: initial version. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
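A range limit of this kind might look like the sketch below. The macro and function names are illustrative, not the ones from the patch, and clamping out-of-range input (rather than rejecting it) is an assumption; the commit message only states the resulting range and the 5-second default.

```c
/* Illustrative names; the patch's actual macros are not quoted in the log. */
#define RATE_UPDATE_SECS_MIN     1UL
#define RATE_UPDATE_SECS_MAX     60UL
#define RATE_UPDATE_SECS_DEFAULT 5UL

/* Clamp a value written by the user into [1, 60] seconds. */
unsigned long clamp_rate_update_secs(unsigned long requested)
{
	if (requested < RATE_UPDATE_SECS_MIN)
		return RATE_UPDATE_SECS_MIN;
	if (requested > RATE_UPDATE_SECS_MAX)
		return RATE_UPDATE_SECS_MAX;
	return requested;
}
```

A bounded interval also bounds how long a caller has to wait for the self re-arming work item to fire one last time before it can be considered stopped.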
2018-02-07	bcache: fix for allocator and register thread race	Tang Junhui
	After a long run of random small I/O writes, I rebooted the machine, and after it powered on I found that bcache was stuck. The stacks are: [root@ceph153 ~]# cat /proc/2510/task//stack [<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache] [<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache] [<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache] [<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache] [<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache] [<ffffffff810a631f>] kthread+0xcf/0xe0 [<ffffffff8164c318>] ret_from_fork+0x58/0x90 [<ffffffffffffffff>] 0xffffffffffffffff [root@ceph153 ~]# cat /proc/2038/task//stack [<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache] [<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache] [<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache] [<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache] [<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache] [<ffffffff812f702f>] kobj_attr_store+0xf/0x20 [<ffffffff8125b216>] sysfs_write_file+0xc6/0x140 [<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0 [<ffffffff811e069f>] SyS_write+0x7f/0xe0 [<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1 The stacks show that the register thread and the allocator thread were both stuck while registering the cache device. I rebooted the machine several times, and the issue always exists on this machine. I debugged the code and found the call trace below: register_bcache() ==>run_cache_set() ==>bch_journal_replay() ==>bch_btree_insert() ==>__bch_btree_map_nodes() ==>btree_insert_fn() ==>btree_split() //node need split ==>btree_check_reserve() In btree_check_reserve(), it checks whether there are enough buckets of RESERVE_BTREE type. Since the allocator thread has not run yet, no buckets of RESERVE_BTREE type have been allocated, so the register thread waits on c->btree_cache_wait and goes to sleep. 
Then the allocator thread initializes; its call trace is below: bch_allocator_thread() ==>bch_prio_write() ==>bch_journal_meta() ==>bch_journal() ==>journal_wait_for_write() In journal_wait_for_write(), it checks whether the journal is full via journal_full(), but the long run of random small I/O writes has exhausted the journal buckets (journal.blocks_free=0). In order to release journal buckets, the allocator calls btree_flush_write() to flush keys to btree nodes, and waits on c->journal.wait until the btree node writes are over or some journal bucket space is available; then the allocator thread goes to sleep. But in btree_flush_write(), since bch_journal_replay() is not finished, no btree nodes have a journal (the condition "if (btree_current_write(b)->journal)" is never satisfied), so there is no btree node to flush, no journal bucket is released, and the allocator sleeps forever. From the above analysis, we can see that: 1) The register thread waits for the allocator thread to allocate buckets of RESERVE_BTREE type; 2) The allocator thread waits for the register thread to replay the journal, so that it can flush btree nodes and get a journal bucket. So they are both stuck waiting for each other. Hua Rui provided a patch for me that allocates some buckets of RESERVE_BTREE type in advance, so the register thread can get a bucket when a btree node splits and does not need to wait for the allocator thread. I tested it and it helps: the register thread runs a step further, but it still gets stuck in the end. The reason is that only 8 buckets of RESERVE_BTREE type were allocated, and in bch_journal_replay(), after 2 btree node splits, only 4 buckets of RESERVE_BTREE type were left. btree_check_reserve() is then no longer satisfied, so the register thread goes to sleep again; at the same time, the allocator thread has not flushed enough btree nodes to release a journal bucket, so both are stuck again. So we need to allocate more buckets of RESERVE_BTREE type in advance, but how many are enough? 
From experience and testing, I think it should be as many as the journal buckets. I modified the code as in this patch and tested it on the machine, and it works. This patch is based on Hua Rui’s patch, and allocates more buckets of RESERVE_BTREE type in advance to avoid the register thread and the allocator thread waiting for each other. [patch v2] ca->sb.njournal_buckets would be 0 the first time after cache creation, when no journal exists, so just 8 btree buckets is OK. Signed-off-by: Hua Rui <huarui.dev@gmail.com> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
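The sizing rule stated at the end of this message (as many RESERVE_BTREE buckets as journal buckets, falling back to 8 when no journal exists yet) can be written as a one-line helper. The function name is hypothetical and the code is a sketch of the rule as described, not a copy of the patch.

```c
/*
 * Number of btree-reserve buckets to set aside before journal replay.
 * njournal_buckets is 0 right after cache creation, when there is no
 * journal to replay, so the small fixed reserve of 8 is enough then.
 */
unsigned int btree_reserve_sketch(unsigned int njournal_buckets)
{
	return njournal_buckets ? njournal_buckets : 8;
}
```

Sizing the reserve by the journal is meant to let replay keep splitting btree nodes without sleeping on the allocator thread, which cannot make progress until replay finishes.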
2018-02-07	bcache: set error_limit correctly	Coly Li
	Struct cache uses io_errors for two purposes: - Error decay: when cache set error_decay is set, io_errors is used to generate a small delay when an I/O error happens. - I/O errors counter: in order to generate a big enough value for error decay, the I/O errors counter value is stored left-shifted by 20 bits (a.k.a. IO_ERROR_SHIFT). In function bch_count_io_errors(), if the I/O errors counter reaches the cache set error limit, bch_cache_set_error() will be called to retire the whole cache set. But the current code is problematic when checking the error limit; see the following code piece from bch_count_io_errors(): 90 if (error) { 91 char buf[BDEVNAME_SIZE]; 92 unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT, 93 &ca->io_errors); 94 errors >>= IO_ERROR_SHIFT; 95 96 if (errors < ca->set->error_limit) 97 pr_err("%s: IO error on %s, recovering", 98 bdevname(ca->bdev, buf), m); 99 else 100 bch_cache_set_error(ca->set, 101 "%s: too many IO errors %s", 102 bdevname(ca->bdev, buf), m); 103 } At line 94, errors is right-shifted by IO_ERROR_SHIFT bits, so it is now the real errors counter to compare at line 96. But ca->set->error_limit is initialized with an amplified value in bch_cache_set_alloc(), 1545 c->error_limit = 8 << IO_ERROR_SHIFT; It means that by default, in bch_count_io_errors(), bch_cache_set_error() won't be called to retire the problematic cache device before 8<<20 errors have happened. If the average request size is 64KB, bcache won't handle the failed device until 512GB of data has been requested. This is too large to be an I/O threshold, so I believe the correct error limit should be much smaller. This patch sets the default cache set error limit to 8, so that in bch_count_io_errors(), when the errors counter reaches 8 (if it is the default value), bch_cache_set_error() will be called to retire the whole cache set. This patch also removes bit shifting when storing or showing the io_error_limit value via the sysfs interface. 
Nowadays most SSDs handle internal flash failures automatically by LBA address re-indirect mapping. If an I/O error can be observed by upper-layer code, it is a notable error because the SSD cannot re-indirect map the problematic LBA address to an available flash block. This situation indicates the whole SSD will fail very soon. Therefore setting 8 as the default I/O error limit makes sense; it is enough for most cache devices. Changelog: v2: add reviewed-by from Hannes. v1: initial version for review. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Cc: Junhui Tang <tang.junhui@zte.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
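The unit mismatch and the 512GB figure can be checked with a small userspace sketch. `IO_ERROR_SHIFT` and the two limit values come from the commit message; the function names are hypothetical, and the volume estimate assumes every failed request is the same size.

```c
#include <stdint.h>

#define IO_ERROR_SHIFT 20

/*
 * Mirrors the comparison at lines 94-96 of the quoted code: io_errors is
 * stored shifted left, and is shifted back down before the comparison.
 * Returns non-zero when the cache set would be retired.
 */
int limit_reached(uint32_t io_errors, uint32_t error_limit)
{
	uint32_t errors = io_errors >> IO_ERROR_SHIFT;

	return errors >= error_limit;
}

/*
 * Data volume requested before the limit trips, assuming each counted
 * error is one failed request of request_bytes bytes.
 */
uint64_t bytes_before_limit(uint64_t error_limit, uint64_t request_bytes)
{
	return error_limit * request_bytes;
}
```

After 8 real errors the stored counter is 8 << 20. Against a limit of 8 the check trips, but against the old default of 8 << 20 it does not. The old limit equals 2^23 errors, and 2^23 requests of 2^16 bytes (64KB) come to 2^39 bytes, which is the 512GB mentioned in the message.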