summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-17tls: fix sw_ctx leakSabrina Dubroca
During setsockopt(SOL_TCP, TLS_TX), if initialization of the software context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't reassign ctx->priv_ctx to NULL, so we can't even do another attempt to set it up on the same socket, as it will fail with -EEXIST. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge tag 'linux-can-fixes-for-4.15-20180116' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2018-01-16 this is a pull reqeust of a single patch for net/master: This patch by Stephane Grosjean fixes a potential bug in the packet fragmentation in the peak USB driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17nvme-pci: take sglist coalescing in dma_map_sg into accountChristoph Hellwig
Some iommu implementations can merge physically and/or virtually contiguous segments inside sg_map_dma. The NVMe SGL support does not take this into account and will warn because of falling off a loop. Pass the number of mapped segments to nvme_pci_setup_sgls so that the SGL setup can take the number of mapped segments into account. Reported-by: Fangjian (Turing) <f.fangjian@huawei.com> Fixes: a7a7cbe3 ("nvme-pci: add SGL support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17nvme-pci: check segement valid for SGL useKeith Busch
The driver needs to verify there is a payload with a command before seeing if it should use SGLs to map it. Fixes: 955b1b5a00ba ("nvme-pci: move use_sgl initialization to nvme_init_iod()") Reported-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de> Reviewed-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17net/tls: Only attach to sockets in ESTABLISHED stateIlya Lesokhin
Calling accept on a TCP socket with a TLS ulp attached results in two sockets that share the same ulp context. The ulp context is freed while a socket is destroyed, so after one of the sockets is released, the second second will trigger a use after free when it tries to access the ulp context attached to it. We restrict the TLS ulp to sockets in ESTABLISHED state to prevent the scenario above. Fixes: 3c4d7559159b ("tls: kernel TLS support") Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17ubi: fastmap: Erase outdated anchor PEBs during attachSascha Hauer
The fastmap update code might erase the current fastmap anchor PEB in case it doesn't find any new free PEB. When a power cut happens in this situation we must not have any outdated fastmap anchor PEB on the device, because that would be used to attach during next boot. The easiest way to make that sure is to erase all outdated fastmap anchor PEBs synchronously during attach. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Richard Weinberger <richard@nod.at> Fixes: dbb7d2a88d2a ("UBI: Add fastmap core") Cc: <stable@vger.kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: switch to fscrypt_prepare_setattr()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: switch to fscrypt_prepare_lookup()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: switch to fscrypt_prepare_rename()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: switch to fscrypt_prepare_link()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: switch to fscrypt_file_open()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubi: fastmap: Clean up the initialization of pointer pColin Ian King
The pointer p is being initialized with one value and a few lines later being set to a newer replacement value. Clean up the code by using the latter assignment to p as the initial value. Cleans up clang warning: drivers/mtd/ubi/fastmap.c:217:19: warning: Value stored to 'p' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubi: fastmap: Use kmem_cache_free to deallocate memoryPan Bian
Memory allocated by kmem_cache_alloc() should not be deallocated with kfree(). Use kmem_cache_free() instead. Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubi: Fix race condition between ubi volume creation and udevClay McClure
Similar to commit 714fb87e8bc0 ("ubi: Fix race condition between ubi device creation and udev"), we should make the volume active before registering it. Signed-off-by: Clay McClure <clay@daemons.net> Cc: <stable@vger.kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17net: fs_enet: do not call phy_stop() in interruptsChristophe Leroy
In case of TX timeout, fs_timeout() calls phy_stop(), which triggers the following BUG_ON() as we are in interrupt. [92708.199889] kernel BUG at drivers/net/phy/mdio_bus.c:482! [92708.204985] Oops: Exception in kernel mode, sig: 5 [#1] [92708.210119] PREEMPT [92708.212107] CMPC885 [92708.214216] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W 4.9.61 #39 [92708.223227] task: c60f0a40 task.stack: c6104000 [92708.227697] NIP: c02a84bc LR: c02a947c CTR: c02a93d8 [92708.232614] REGS: c6105c70 TRAP: 0700 Tainted: G W (4.9.61) [92708.241193] MSR: 00021032 <ME,IR,DR,RI>[92708.244818] CR: 24000822 XER: 20000000 [92708.248767] GPR00: c02a947c c6105d20 c60f0a40 c62b4c00 00000005 0000001f c069aad8 0001a688 GPR08: 00000007 00000100 c02a93d8 00000000 000005fc 00000000 c6213240 c06338e4 GPR16: 00000001 c06330d4 c0633094 00000000 c0680000 c6104000 c6104000 00000000 GPR24: 00000200 00000000 ffffffff 00000004 00000078 00009032 00000000 c62b4c00 NIP [c02a84bc] mdiobus_read+0x20/0x74 [92708.281517] LR [c02a947c] kszphy_config_intr+0xa4/0xc4 [92708.286547] Call Trace: [92708.288980] [c6105d20] [c6104000] 0xc6104000 (unreliable) [92708.294339] [c6105d40] [c02a947c] kszphy_config_intr+0xa4/0xc4 [92708.300098] [c6105d50] [c02a5330] phy_stop+0x60/0x9c [92708.305007] [c6105d60] [c02c84d0] fs_timeout+0xdc/0x110 [92708.310197] [c6105d80] [c035cd48] dev_watchdog+0x268/0x2a0 [92708.315593] [c6105db0] [c0060288] call_timer_fn+0x34/0x17c [92708.321014] [c6105dd0] [c00605f0] run_timer_softirq+0x21c/0x2e4 [92708.326887] [c6105e50] [c001e19c] __do_softirq+0xf4/0x2f4 [92708.332207] [c6105eb0] [c001e3c8] run_ksoftirqd+0x2c/0x40 [92708.337560] [c6105ec0] [c003b420] smpboot_thread_fn+0x1f0/0x258 [92708.343405] [c6105ef0] [c003745c] kthread+0xbc/0xd0 [92708.348217] [c6105f40] [c000c400] ret_from_kernel_thread+0x5c/0x64 [92708.354275] Instruction dump: [92708.357207] 7c0803a6 bbc10018 38210020 4e800020 7c0802a6 9421ffe0 54290024 bfc10018 [92708.364865] 90010024 7c7f1b78 81290008 552902ee <0f090000> 3bc3002c 7fc3f378 90810008 [92708.372711] ---[ end trace 42b05441616fafd7 ]--- This patch moves fs_timeout() actions into an async worker. Fixes: commit 48257c4f168e5 ("Add fs_enet ethernet network driver, for several embedded platforms") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17r8152: disable RX aggregation on Dell TB16 dockKai-Heng Feng
r8153 on Dell TB15/16 dock corrupts rx packets. This change is suggested by Realtek. They guess that the XHCI controller doesn't have enough buffer, and their guesswork is correct, once the RX aggregation gets disabled, the issue is gone. ASMedia is currently working on a real sulotion for this issue. Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16. Note that TB15 has different bcdDevice and iSerialNumber, which are not unique values. If you still have TB15, please contact Dell to replace it with TB16. BugLink: https://bugs.launchpad.net/bugs/1729674 Cc: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - A rather involved set of memory hardware encryption fixes to support the early loading of microcode files via the initrd. These are larger than what we normally take at such a late -rc stage, but there are two mitigating factors: 1) much of the changes are limited to the SME code itself 2) being able to early load microcode has increased importance in the post-Meltdown/Spectre era. - An IRQ vector allocator fix - An Intel RDT driver use-after-free fix - An APIC driver bug fix/revert to make certain older systems boot again - A pkeys ABI fix - TSC calibration fixes - A kdump fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/vector: Fix off by one in error path x86/intel_rdt/cqm: Prevent use after free x86/mm: Encrypt the initrd earlier for BSP microcode update x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption x86/mm: Centralize PMD flags in sme_encrypt_kernel() x86/mm: Use a struct to reduce parameters for SME PGD mapping x86/mm: Clean up register saving in the __enc_copy() assembly code x86/idt: Mark IDT tables __initconst Revert "x86/apic: Remove init_bsp_APIC()" x86/mm/pkeys: Fix fill_sig_info_pkey x86/tsc: Print tsc_khz, when it differs from cpu_khz x86/tsc: Fix erroneous TSC rate on Skylake Xeon x86/tsc: Future-proof native_calibrate_tsc() kdump: Write the correct address of mem_section into vmcoreinfo
2018-01-17Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "A delayacct statistics correctness fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: delayacct: Account blkio completion on the correct task
2018-01-17Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Ingo Molnar: "An Intel RAPL events fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
2018-01-17Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two futex fixes: a input parameters robustness fix, and futex race fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Prevent overflow by strengthen input validation futex: Avoid violating the 10th rule of futex
2018-01-17phy: brcm-sata: fix semicolon.cocci warningsFengguang Wu
drivers/phy/broadcom/phy-brcm-sata.c:534:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: 3e507769d15e ("phy: brcm-sata: Implement calibrate callback") CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-17tun: fix a memory leak for tfile->tx_arrayCong Wang
tfile->tun could be detached before we close the tun fd, via tun_detach_all(), so it should not be used to check for tfile->tx_array. As Jason suggested, we probably have to clean it up unconditionally both in __tun_deatch() and tun_detach_all(), but this requires to check if it is initialized or not. Currently skb_array_cleanup() doesn't have such a check, so I check it in the caller and introduce a helper function, it is a bit ugly but we can always improve it in net-next. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti bits and fixes from Thomas Gleixner: "This last update contains: - An objtool fix to prevent a segfault with the gold linker by changing the invocation order. That's not just for gold, it's a general robustness improvement. - An improved error message for objtool which spares tearing hairs. - Make KASAN fail loudly if there is not enough memory instead of oopsing at some random place later - RSB fill on context switch to prevent RSB underflow and speculation through other units. - Make the retpoline/RSB functionality work reliably for both Intel and AMD - Add retpoline to the module version magic so mismatch can be detected - A small (non-fix) update for cpufeatures which prevents cpu feature clashing for the upcoming extra mitigation bits to ease backporting" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: module: Add retpoline tag to VERMAGIC x86/cpufeature: Move processor tracing out of scattered features objtool: Improve error message for bad file argument objtool: Fix seg fault with gold linker x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros x86/retpoline: Fill RSB on context switch for affected CPUs x86/kasan: Panic if there is not enough memory to boot
2018-01-17Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A one-liner fix which prevents deferrable timers becoming stale when the system does not switch into NOHZ mode" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Unconditionally check deferrable base
2018-01-17ARM: net: bpf: clarify tail_call indexRussell King
As per 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT"), the index used for array lookup is defined to be 32-bit wide. Update a misleading comment that suggests it is 64-bit wide. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: fix LDX instructionsRussell King
When the source and destination register are identical, our JIT does not generate correct code, which leads to kernel oopses. Fix this by (a) generating more efficient code, and (b) making use of the temporary earlier if we will overwrite the address register. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: fix register savingRussell King
When an eBPF program tail-calls another eBPF program, it enters it after the prologue to avoid having complex stack manipulations. This can lead to kernel oopses, and similar. Resolve this by always using a fixed stack layout, a CPU register frame pointer, and using this when reloading registers before returning. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: correct stack layout documentationRussell King
The stack layout documentation incorrectly suggests that the BPF JIT scratch space starts immediately below BPF_FP. This is not correct, so let's fix the documentation to reflect reality. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: move stack documentationRussell King
Move the stack documentation towards the top of the file, where it's relevant for things like the register layout. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: fix stack alignmentRussell King
As per 2dede2d8e925 ("ARM EABI: stack pointer must be 64-bit aligned after a CPU exception") the stack should be aligned to a 64-bit boundary on EABI systems. Ensure that the eBPF JIT appropraitely aligns the stack. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: fix tail call jumpsRussell King
When a tail call fails, it is documented that the tail call should continue execution at the following instruction. An example tail call sequence is: 12: (85) call bpf_tail_call#12 13: (b7) r0 = 0 14: (95) exit The ARM assembler for the tail call in this case ends up branching to instruction 14 instead of instruction 13, resulting in the BPF filter returning a non-zero value: 178: ldr r8, [sp, #588] ; insn 12 17c: ldr r6, [r8, r6] 180: ldr r8, [sp, #580] 184: cmp r8, r6 188: bcs 0x1e8 18c: ldr r6, [sp, #524] 190: ldr r7, [sp, #528] 194: cmp r7, #0 198: cmpeq r6, #32 19c: bhi 0x1e8 1a0: adds r6, r6, #1 1a4: adc r7, r7, #0 1a8: str r6, [sp, #524] 1ac: str r7, [sp, #528] 1b0: mov r6, #104 1b4: ldr r8, [sp, #588] 1b8: add r6, r8, r6 1bc: ldr r8, [sp, #580] 1c0: lsl r7, r8, #2 1c4: ldr r6, [r6, r7] 1c8: cmp r6, #0 1cc: beq 0x1e8 1d0: mov r8, #32 1d4: ldr r6, [r6, r8] 1d8: add r6, r6, #44 1dc: bx r6 1e0: mov r0, #0 ; insn 13 1e4: mov r1, #0 1e8: add sp, sp, #596 ; insn 14 1ec: pop {r4, r5, r6, r7, r8, sl, pc} For other sequences, the tail call could end up branching midway through the following BPF instructions, or maybe off the end of the function, leading to unknown behaviours. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUsRussell King
Avoid the 'bx' instruction on CPUs that have no support for Thumb and thus do not implement this instruction by moving the generation of this opcode to a separate function that selects between: bx reg and mov pc, reg according to the capabilities of the CPU. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-01-17mtd: ubi: Use 'max_bad_blocks' to compute bad_peb_limit if availableJeff Westfahl
If the user has not set max_beb_per1024 using either the cmdline or Kconfig options for doing so, use the MTD function 'max_bad_blocks' to compute the UBI bad_peb_limit. Signed-off-by: Jeff Westfahl <jeff.westfahl@ni.com> Signed-off-by: Zach Brown <zach.brown@ni.com> Acked-by: Boris Brezillon <boris.brezillon@free-electron.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17ubifs: Fix uninitialized variable in search_dh_cookie()Geert Uytterhoeven
fs/ubifs/tnc.c: In function ‘search_dh_cookie’: fs/ubifs/tnc.c:1893: warning: ‘err’ is used uninitialized in this function Indeed, err is always used uninitialized. According to an original review comment from Hyunchul, acknowledged by Richard, err should be initialized to -ENOENT to avoid the first call to tnc_next(). But we can achieve the same by reordering the code. Fixes: 781f675e2d7e ("ubifs: Fix unlink code wrt. double hash lookups") Reported-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-17block: Fix __bio_integrity_endio() documentationBart Van Assche
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio") Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17nvme-pci: clean up SMBSZ bit definitionsChristoph Hellwig
Define the bit positions instead of macros using the magic values, and move the expanded helpers to calculate the size and size unit into the implementation C file. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2018-01-17nvme-pci: clean up CMB initializationChristoph Hellwig
Refactor the call to nvme_map_cmb, and change the conditions for probing for the CMB. First remove the version check as NVMe TPs always apply to earlier versions of the spec as well. Second check for the whole CMBSZ register for support of the CMB feature instead of just the size field inside of it to simplify the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2018-01-17nvme-fc: correct hang in nvme_ns_remove()James Smart
When connectivity is lost to a device, the association is terminated and the blk-mq queues are quiesced/stopped. When connectivity is re-established, they are resumed. If connectivity is lost for a sufficient amount of time that the controller is then deleted, the delete path starts tearing down queues, and eventually calling nvme_ns_remove(). It appears that pending commands may cause blk_cleanup_queue() to never complete and the teardown stalls. Correct by starting the ns queues after transitioning to a DELETING state, allowing pending commands to be flushed with io failures. Thus the delete path is clear when reached. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-17nvme-fc: fix rogue admin cmds stalling teardownJames Smart
When connectivity is lost to a device, the association is terminated and the blk-mq queues are quiesced/stopped. When connectivity is re-established, they are resumed. If an admin command is received while connectivity is list, the ioctl queues the command on the admin_q and the command stalls (the thread issuing the ioctl hangs/waits). if the connectivity is lost long enough such that the controller is then deleted, the delete code makes its calls to initiate the delete, which then expects the core layer to call the transport when all references are removed and the controller can be freed. Unfortunately, nothing in this path dequeued the admin command, so a reference sits outstanding and things stop, hanging the delete indefinitely. Correct by unquiescing the admin queue in the delete association. This means any admin command (which should only be from an ioctl) issued after connectivity is lost will detect the controller is in a reconnecting state and will (fast) fail the command. Thus, a pending reference can no longer be created. Once connectivity is re-established, a new ioctl/admin command would see proper device state and function again. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-17blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_requestMike Snitzer
After commit: 923218f6166a ("blk-mq: don't allocate driver tag upfront for flush rq") we no longer use the 'can_block' argument in blk_mq_sched_insert_request(). Kill it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Added actual commit message as to why it's being removed. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedbackMing Lei
blk_insert_cloned_request() is called in the fast path of a dm-rq driver (e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses blk_mq_request_bypass_insert() to directly append the request to the blk-mq hctx->dispatch_list of the underlying queue. 1) This way isn't efficient enough because the hctx spinlock is always used. 2) With blk_insert_cloned_request(), we completely bypass underlying queue's elevator and depend on the upper-level dm-rq driver's elevator to schedule IO. But dm-rq currently can't get the underlying queue's dispatch feedback at all. Without knowing whether a request was issued or not (e.g. due to underlying queue being busy) the dm-rq elevator will not be able to provide effective IO merging (as a side-effect of dm-rq currently blindly destaging a request from its elevator only to requeue it after a delay, which kills any opportunity for merging). This obviously causes very bad sequential IO performance. Fix this by updating blk_insert_cloned_request() to use blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a request to be issued directly to the underlying queue and returns the dispatch feedback (blk_status_t). If blk_mq_request_direct_issue() returns BLK_SYS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE to _not_ destage the request. Whereby preserving the opportunity to merge IO. With this, request-based DM's blk-mq sequential IO performance is vastly improved (as much as 3X in mpath/virtio-scsi testing). Signed-off-by: Ming Lei <ming.lei@redhat.com> [blk-mq.c changes heavily influenced by Ming Lei's initial solution, but they were refactored to make them less fragile and easier to read/review] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17blk-mq: factor out a few helpers from __blk_mq_try_issue_directlyMike Snitzer
No functional change. Just makes code flow more logically. In following commit, __blk_mq_try_issue_directly() will be used to return the dispatch result (blk_status_t) to DM. DM needs this information to improve IO merging. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17blk-mq: turn WARN_ON in __blk_mq_run_hw_queue into printkMing Lei
We know this WARN_ON is harmless and in reality it may be trigged, so convert it to printk() and dump_stack() to avoid to confusing people. Also add comment about two releated races here. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17blk-mq: make sure hctx->next_cpu is set correctlyMing Lei
When hctx->next_cpu is set from possible online CPUs, there is one race in which hctx->next_cpu may be set as >= nr_cpu_ids, and finally break workqueue. The race can be triggered in the following two sitations: 1) when one CPU is becoming DEAD, blk_mq_hctx_notify_dead() is called to dispatch requests from the DEAD cpu context, but at that time, this DEAD CPU has been cleared from 'cpu_online_mask', so all CPUs in hctx->cpumask may become offline, and cause hctx->next_cpu set a bad value. 2) blk_mq_delay_run_hw_queue() is called from CPU B, and found the queue should be run on the other CPU A, then CPU A may become offline at the same time and all CPUs in hctx->cpumask become offline. This patch deals with this issue by re-selecting next CPU, and making sure it is set correctly. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: "jianchao.wang" <jianchao.w.wang@oracle.com> Tested-by: "jianchao.wang" <jianchao.w.wang@oracle.com> Fixes: 20e4d81393 ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17Merge tag 'perf-core-for-mingo-4.16-20180117' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Fix various per event 'max-stack' and 'call-graph=dwarf' issues, mostly in 'perf trace', allowing to use 'perf trace --call-graph' with 'dwarf' and 'fp' to setup the callgraph details for the syscall events and make that apply to other events, whilhe allowing to override that on a per-event basis, using '-e sched:*switch/call-graph=dwarf/' for instance (Arnaldo Carvalho de Melo) - Improve the --time percent support in record/report/script (Jin Yao) - Fix copyfile_offset update of output offset (Jiri Olsa) - Add python script to profile and resolve physical mem type (Kan Liang) - Add ARM Statistical Profiling Extensions (SPE) support (Kim Phillips) - Remove trailing semicolon in the evlist code (Luis de Bethencourt) - Fix incorrect handling of type _TERM_DRV_CFG (Mathieu Poirier) - Use asprintf when possible in libtraceevent (Federico Vaga) - Fix bad force_token escape sequence in libtraceevent (Michael Sartain) - Add UL suffix to MISSING_EVENTS in libtraceevent (Michael Sartain) - value of unknown symbolic fields in libtraceevent (Jan Kiszka) - libtraceevent updates: (Steven Rostedt) o Show value of flags that have not been parsed o Simplify pointer print logic and fix %pF o Handle new pointer processing of bprint strings o Show contents (in hex) of data of unrecognized type records o Fix get_field_str() for dynamic strings - Add missing break in FALSE case of pevent_filter_clear_trivial() (Taeung Song) - Fix failed memory allocation for get_cpuid_str (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-17Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-17ata: ahci_brcm: Recover from failures to identify devicesFlorian Fainelli
When powering up, the SATA controller may fail to mount the HDD. The SATA controller will lock up, preventing it from negotiating to a lower speed or transmitting data. Root cause is power supply noise creating resonance at 6 Ghz and 3 GHz frequencies, which causes instability in the Clock-Data Recovery (CDR) frontend module, resulting in false acquisition of the clock at SATA 6G/3G speeds. The SATA controller may fail to mount the HDD and lock up, requiring a power cycle. Broadcom chips suspected of being susceptible to this issue include BCM7445, BCM7439, and BCM7366. The Kernel implements an error recovery mechanism that resets the SATA PHY and digital controller when the controller locks up. During this error recovery process, typically there is less activity on the board and Broadcom STB chip, so that the power supply is less noisy, thus allowing the SATA controller to lock correctly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-17phy: brcm-sata: Implement calibrate callbackFlorian Fainelli
Implement the calibration callback to allow turning on the Clock-Data Recovery module clamping when necessary, e.g: during failure to identify a SATA hard drive from ahci_brcm.c::brcm_ahci_read_id. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-17aoe: use ktime_t instead of timevalTina Ruchandani
'struct frame' uses two variables to store the sent timestamp - 'struct timeval' and jiffies. jiffies is used to avoid discrepancies caused by updates to system time. 'struct timeval' is deprecated because it uses 32-bit representation for seconds which will overflow in year 2038. This patch does the following: - Replace the use of 'struct timeval' and jiffies with ktime_t, which is the recommended type for timestamping - ktime_t provides both long range (like jiffies) and high resolution (like timeval). Using ktime_get (monotonic time) instead of wall-clock time prevents any discprepancies caused by updates to system time. [updates by Arnd below] The original patch from Tina never went anywhere as we discussed how to keep the impact on performance minimal. I've started over now but arrived at basically the same patch that she had originally, except for an slightly improved tsince_hr() function. I'm making it more robust against overflows, and also optimize explicitly for the common case in which a frame is less than 4.2 seconds old, using only a 32-bit division in that case. This should make the new version more efficient than the old code, since we replace the existing two 32-bit division in do_gettimeofday() plus one multiplication with a single single 32-bit division in tsince_hr() and drop the double bookkeeping. It's also more efficient than the ktime_get_us() API we discussed before, since that would also rely on multiple divisions. Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Cc: Ed Cashin <ed.cashin@acm.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17drm/vmwgfx: fix memory corruption with legacy/sou connectorsRob Clark
It looks like in all cases 'struct vmw_connector_state' is used. But only in stdu connectors, was atomic_{duplicate,destroy}_state() properly subclassed. Leading to writes beyond the end of the allocated connector state block and all sorts of fun memory corruption related crashes. Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state" Cc: <stable@vger.kernel.org> Signed-off-by: Rob Clark <rclark@redhat.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>