Age | Commit message | Author
2017-03-30 | ubifs: Fix debug messages for an invalid filename in ubifs_dump_node | Hyunchul Lee
If a character is not printable, print '?' instead. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
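The substitution described above can be sketched in user-space C. This is a minimal illustration only: `sanitize_name()` and its parameters are hypothetical names, not the ubifs code, and it assumes the default "C" locale for `isprint()`.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy at most dst_size - 1 bytes of name into dst, replacing every
 * non-printable byte with '?'. Always NUL-terminates when dst_size > 0.
 * Hypothetical helper for illustration; not the ubifs implementation.
 */
void sanitize_name(char *dst, size_t dst_size,
                   const unsigned char *name, size_t name_len)
{
    size_t i, n;

    if (dst_size == 0)
        return;
    n = name_len < dst_size - 1 ? name_len : dst_size - 1;
    for (i = 0; i < n; i++)
        dst[i] = isprint(name[i]) ? (char)name[i] : '?';
    dst[n] = '\0';
}
```

Taking the name as `unsigned char` avoids passing negative values to `isprint()`, which matters for encrypted names that may contain any byte.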
2017-03-30 | ubifs: Remove filename from debug messages in ubifs_readdir | Hyunchul Lee
If the filename is encrypted, it may contain no printable characters, so remove it. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 | ubifs: Fix memory leak in error path in ubifs_mknod | Richard Weinberger
When fscrypt_setup_filename() fails we have to free dev. Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 | ubi/upd: Always flush after prepared for an update | Sebastian Siewior
In commit 6afaf8a484cb ("UBI: flush wl before clearing update marker") I managed to trigger and fix a similar bug. Here is another variant which I assumed back then would not matter, but it turns out UBI has a check for it and will error out like this: |ubi0 warning: validate_vid_hdr: inconsistent used_ebs |ubi0 error: validate_vid_hdr: inconsistent VID header at PEB 592 All you need to trigger this is "ubiupdatevol /dev/ubi0_0 file" plus a powercut in the middle of the operation. ubi_start_update() sets the update marker and puts all EBs on the erase list. After that userland can proceed to write new data while the old EBs aren't erased completely. A powercut at this point is usually not that much of a tragedy. UBI won't give read access to the static volume because it has the update marker. It will most likely set the corrupted flag because it misses some EBs. So we are all good. Unless the size of the image that has been written differs from the old image by at least one EB. In that case UBI will find two different values for `used_ebs' and refuse to attach the image with the error message mentioned above. So in order not to get into this situation, the patch ensures that we wait until everything is removed before it tries to write any data. The alternative would be to detect such a case and remove all EBs at attach time, after we have processed the volume table and seen the update marker set. That patch looks bigger and I doubt it is worth it, since usually the write() will wait from time to time for a new EB because there are usually not that many spare EBs that can be used. Cc: stable@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 | s390/uaccess: get_user() should zero on failure (again) | Heiko Carstens
Commit fd2d2b191fe7 ("s390: get_user() should zero on failure") intended to fix s390's get_user() implementation, which did not zero the target operand if the read from user space faulted. Unfortunately the patch has no effect: the corresponding inline assembly specifies that the operand is only written to ("=") and the previous value is discarded. Therefore the compiler is free to omit the zero initialization, and actually does so. To fix this, simply change the constraint modifier to "+", so the compiler cannot omit the initialization anymore. Fixes: c9ca78415ac1 ("s390/uaccess: provide inline variants of get_user/put_user") Fixes: fd2d2b191fe7 ("s390: get_user() should zero on failure") Cc: stable@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
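The difference between the "=" and "+" modifiers can be shown with a small sketch. It assumes GCC or Clang extended asm and uses an empty asm template as a stand-in for a user-space load that faulted and wrote nothing; `load_or_zero()` is a hypothetical name, not the s390 code.

```c
/*
 * Sketch of the "=" vs "+" difference using an empty asm body, so it
 * builds with GCC or Clang on any architecture. The empty template
 * stands in for a user-space load that faulted and wrote nothing.
 */
unsigned long load_or_zero(void)
{
    unsigned long val = 0;  /* value the caller should see on a fault */

    /*
     * "+r": val is both read and written, so the compiler must
     * materialise the 0 before the asm statement. With "=r" the
     * operand would be write-only and the initialisation could
     * legally be dropped, leaving val undefined afterwards.
     */
    __asm__ volatile("" : "+r"(val));
    return val;
}
```

With the "+" modifier the function is guaranteed to return 0; with "=" the result would depend on whatever the chosen register happened to contain.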
2017-03-30 | drm/i915/gvt: Activate/de-activate vGPU in mdev ops. | Zhi Wang
This patch introduces two functions for activating/de-activating vGPU in mdev ops. A race condition was found between virtual vblank emulation and the KVMGT mdev release path. V-blank emulation will emulate and inject a V-blank interrupt for every active vGPU while holding gvt->lock, whereas the mdev release path will directly release the hypervisor handle without changing vGPU status or taking gvt->lock, so a kernel oops is encountered when vblank emulation is injecting an interrupt with an invalid hypervisor handle. (Reported by Terrence) To solve this problem, we factor out vGPU activation/de-activation from the vGPU creation/destruction path and let the KVMGT mdev release ops de-activate the vGPU before releasing the hypervisor handle. Once a vGPU is de-activated, GVT-g will not emulate v-blank for it or touch the hypervisor handle. Fixes: 659643f ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT") Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-03-29 | Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux | Linus Torvalds
Pull thermal management fixes from Zhang Rui: - Fix a potential deadlock in cpu_cooling driver, which was introduced in 4.11-rc1. (Matthew Wilcox) - Fix the cpu_cooling and devfreq_cooling code to handle possible error return value from OPP calls, together with three minor fixes in the same patch series. (Viresh Kumar) * 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal: cpu_cooling: Check OPP for errors thermal: cpu_cooling: Replace dev_warn with dev_err thermal: devfreq: Check OPP for errors thermal: devfreq_cooling: Replace dev_warn with dev_err thermal: devfreq: Simplify expression thermal: Fix potential deadlock in cpu_cooling
2017-03-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf | David S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a rather large update with Netfilter fixes, specifically targeted to incorrect RCU usage in several spots and the userspace conntrack helper infrastructure (nfnetlink_cthelper), more specifically they are: 1) expect_class_max is incorrectly set via cthelper, as in-kernel semantics mandate that this represents the array of expectation classes minus 1. Patch from Liping Zhang. 2) Expectation policy updates via cthelper are currently broken for several reasons: this code allows illegal changes in the policy such as changing the number of expectation classes, it is leaking the updated policy, and such an update occurs with no RCU protection at all. Fix this by adding a new nfnl_cthelper_update_policy() that describes what is really legal on the update path. 3) Fix several memory leaks in cthelper, from Jeffy Chen. 4) synchronize_rcu() is missing in the removal path of several modules, this may lead to races since a CPU may still be running on code that has just gone. Also from Liping Zhang. 5) Don't use the helper hashtable from cthelper, it is not safe to walk over those bits without the helper mutex. Fix this by introducing a new independent list for userspace helpers. From Liping Zhang. 6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no packets are walking on any conntrack extension that is gone after module removal, again from Liping. 7) nf_nat_snmp may crash if we fail to unregister the helper due to accidental leftover code, from Gao Feng. 8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block fixes from Jens Axboe: "Five fixes for this series: - a fix from me to ensure that blk-mq drivers that terminate IO in their ->queue_rq() handler by returning QUEUE_ERROR don't stall with a scheduler enabled. - four nbd fixes from Josef and Ratna, fixing various problems that are critical enough to go in for this cycle. They have been well tested" * 'for-linus' of git://git.kernel.dk/linux-block: nbd: replace kill_bdev() with __invalidate_device() nbd: set queue timeout properly nbd: set rq->errors to actual error code nbd: handle ERESTARTSYS properly blk-mq: include errors in did_work calculation
2017-03-29 | ezchip: nps_enet: check if napi has been completed | Zakharov Vlad
After the new NAPI_STATE_MISSED state was added to NAPI, we can get into this state, and in that case we have to reschedule NAPI because some work is still pending and has to be processed. napi_complete_done() returns false if we have to reschedule something (e.g. in case we were in the MISSED state), as the current polling has not been completed yet. The nps_enet driver hasn't been checking the return value of napi_complete_done() and has been forcibly enabling interrupts. That is not correct, as we should not enable interrupts before we have processed all scheduled work. As a result we were getting trapped in an interrupt handler chain, as we were never able to disable ethernet interrupts again. So this patch makes nps_enet_poll() check the return value of napi_complete_done() and enable interrupts only if all scheduled work has been completed. Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
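The control flow described above can be modelled in plain user-space C. Everything here is a hypothetical stand-in (`fake_napi`, `fake_dev`, `fake_napi_complete_done()`, `fake_poll()`); it only mirrors the contract stated in the message, namely that a false return means polling is not finished, and is not the nps_enet driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPI context and the device. */
struct fake_napi { bool missed; };
struct fake_dev { struct fake_napi napi; bool irq_enabled; };

/* Mimics the contract described above: false means "poll again". */
bool fake_napi_complete_done(struct fake_napi *n, int work_done)
{
    (void)work_done;
    if (n->missed) {
        n->missed = false;  /* rescheduled, more work is pending */
        return false;
    }
    return true;
}

int fake_poll(struct fake_dev *dev, int work_done, int budget)
{
    if (work_done < budget) {
        /* Re-enable interrupts only if polling really completed. */
        if (fake_napi_complete_done(&dev->napi, work_done))
            dev->irq_enabled = true;
    }
    return work_done;
}
```

In this model the first poll after a missed event leaves interrupts disabled, and only the following poll turns them back on.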
2017-03-29 | Merge branch 'bnxt_en-fixes' | David S. Miller
Michael Chan says: ==================== bnxt_en: Small misc. fixes. Fix a NULL pointer crash in open failure path, wrong arguments when printing error messages, and a DMA unmap bug in XDP shutdown path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | bnxt_en: Fix DMA unmapping of the RX buffers in XDP mode during shutdown. | Michael Chan
In bnxt_free_rx_skbs(), which is called to free up all RX buffers during shutdown, we need to unmap the page if we are running in XDP mode. Fixes: c61fb99cae51 ("bnxt_en: Add RX page mode support.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | bnxt_en: Correct the order of arguments to netdev_err() in bnxt_set_tpa() | Sankar Patchineelam
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | bnxt_en: Fix NULL pointer dereference in reopen failure path | Sankar Patchineelam
Net device reset can fail when the h/w or f/w is in a bad state. A subsequent netdevice open then fails in bnxt_hwrm_stat_ctx_alloc(). The cleanup invokes bnxt_hwrm_resource_free(), which in turn calls bnxt_disable_int(). In this routine, the code segment if (ring->fw_ring_id != INVALID_HW_RING_ID) BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons); results in a NULL pointer dereference, as cpr->cp_doorbell is not yet initialized and fw_ring_id is zero. The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before bnxt_init_chip() is invoked. Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
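The sentinel pattern behind this fix can be sketched as follows. The struct layout, `INVALID_RING_ID` value, and function names are assumptions made for illustration; they are not the bnxt_en definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_RING_ID 0xffffu  /* hypothetical sentinel value */

struct ring {
    uint16_t fw_ring_id;
    volatile uint32_t *doorbell;
};

/* Mark the ring as "not allocated by firmware" before any setup runs. */
void ring_reset(struct ring *r)
{
    r->fw_ring_id = INVALID_RING_ID;
    r->doorbell = NULL;
}

/* Returns 1 if the doorbell was written, 0 if the ring was skipped. */
int ring_disable_int(struct ring *r, uint32_t cons)
{
    if (r->fw_ring_id == INVALID_RING_ID)
        return 0;           /* never touches the NULL doorbell */
    *r->doorbell = cons;
    return 1;
}
```

Because a zero-filled structure has `fw_ring_id == 0`, which looks like a valid ID, the reset has to happen before any path that can fail and trigger cleanup.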
2017-03-29 | cpuidle: powernv: Pass correct drv->cpumask for registration | Vaidyanathan Srinivasan
drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init(). On PowerNV platform cpu_present could be less than cpu_possible in cases where firmware detects the cpu, but it is not available to the OS. When CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence we skip creating cpu_device. This breaks cpuidle on powernv where register_cpu() is not called for cpus in cpu_possible_mask that cannot be hot-added at runtime. Trying cpuidle_register_device() on cpu without cpu_device will cause crash like this: cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490] pc: c00000000022c8bc: string+0x34/0x60 lr: c00000000022ed78: vsnprintf+0x284/0x42c sp: c000000ff1503710 msr: 9000000000009033 dar: 6000000060000000 current = 0xc000000ff1480000 paca = 0xc00000000fe82d00 softe: 0 irq_happened: 0x01 pid = 1, comm = swapper/8 Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017 enter ? for help [link register ] c00000000022ed78 vsnprintf+0x284/0x42c [c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable) [c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44 [c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc [c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74 [c000000ff15038c0] c000000000619694 printk+0x38/0x4c [c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60 [c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4 [c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78 [c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0 [c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c [c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc [c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0 [c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c [c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c [c000000ff1503dc0] c00000000000d478 
kernel_init+0x24/0x12c [c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78 This patch fixes the bug by passing correct cpumask from powernv-cpuidle driver. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [ rjw: Comment massage ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-29 | Merge branch 'apw' (xfrm_user fixes) | Linus Torvalds
Merge xfrm_user validation fixes from Andy Whitcroft: "Two patches we are applying to Ubuntu for XFRM_MSG_NEWAE validation issue reported by ZDI. The first of these is the primary fix, and the second is for a more theoretical issue that Kees pointed out when reviewing the first" * emailed patches from Andy Whitcroft <apw@canonical.com>: xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
2017-03-29 | Merge branch 'for-chris-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11 | Chris Mason
2017-03-29 | parisc: Avoid stalled CPU warnings after system shutdown | Helge Deller
Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless loop for systems which don't provide a software power off function. But the soft lockup detector will detect this and report stalled CPUs after some time. Avoid those unwanted warnings by disabling the soft lockup detector. Fixes: 73580dac7618 ("parisc: Fix system shutdown halt") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 4.9+
2017-03-29 | parisc: Clean up fixup routines for get_user()/put_user() | Helge Deller
Al Viro noticed that userspace accesses via get_user()/put_user() can be simplified a lot with regard to usage of the exception handling. This patch implements a fixup routine for get_user() and put_user() such that the exception handler will automatically load -EFAULT into the register %r8 (the error value) in case of a fault on a userspace access. Additionally the fixup routine will zero the target register on fault in case of a get_user() call. The target register is extracted out of the faulting assembly instruction. This patch brings a few benefits over the old implementation: 1. Exception handling gets much cleaner, easier and smaller in size. 2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped. 3. No need to hardcode %r9 as target register for get_user() any longer. This helps the compiler register allocator and thus creates fewer assembler statements. 4. No dependency on the exception_data contents any longer. 5. Nested faults will be handled cleanly. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Helge Deller <deller@gmx.de>
2017-03-29 | parisc: Fix access fault handling in pa_memcpy() | Helge Deller
pa_memcpy() is the major memcpy implementation in the parisc kernel which is used to do any kind of userspace/kernel memory copies. Al Viro noticed various bugs in the implementation of pa_memcpy(), most notably that in case of faults it may report back to have copied more bytes than it actually did. Fixing those bugs is quite hard in the C implementation, because the compiler is messing around with the registers and we are not guaranteed that specific variables are always in the same processor registers. This makes proper fault handling complicated. This patch implements pa_memcpy() in assembler. That way we have correct fault handling and adding a 64-bit copy routine was quite easy. Runtime tested with 32- and 64-bit kernels. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2017-03-29 | blk-mq: include errors in did_work calculation | Jens Axboe
Currently we return true in blk_mq_dispatch_rq_list() if we queued IO successfully, but we really want to return whether or not we made progress. Progress includes the case where we got an error return. If we don't count that, it can lead to a hang in blk_mq_sched_dispatch_requests() when a driver is draining IO by returning BLK_MQ_QUEUE_ERROR instead of manually ending the IO in error and returning BLK_MQ_QUEUE_OK. Tested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
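The idea that an error return also counts as progress can be sketched like this. The enum and `dispatch_made_progress()` are hypothetical simplifications for illustration and do not reproduce blk_mq_dispatch_rq_list().

```c
#include <stdbool.h>

/* Hypothetical per-request results, loosely modelled on the text above. */
enum queue_rc { QUEUE_OK, QUEUE_BUSY, QUEUE_ERROR };

/* Progress means a request left the list, whether queued or failed. */
bool dispatch_made_progress(const enum queue_rc *results, int n)
{
    int queued = 0, errors = 0, i;

    for (i = 0; i < n; i++) {
        if (results[i] == QUEUE_OK)
            queued++;
        else if (results[i] == QUEUE_ERROR)
            errors++;
        else
            break;  /* busy: stop dispatching for now */
    }
    /* Counting only 'queued' would report no progress while draining. */
    return (queued + errors) != 0;
}
```

A caller that loops while progress is reported keeps going when every request fails, instead of stalling with requests still on the list.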
2017-03-29 | block-mq: don't re-queue if we get a queue error | Josef Bacik
When we try to issue a request directly and fail, we requeue the request but call blk_mq_end_request() as well. This leads to the completed request being on a queuelist and getting ended twice, which causes list corruption in schedulers and other shenanigans. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | blkcg: allocate struct blkcg_gq outside request queue spinlock | Tahsin Erdogan
blkg_conf_prep() currently calls blkg_lookup_create() while holding request queue spinlock. This means allocating memory for struct blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM failures in call paths like below: pcpu_alloc+0x68f/0x710 __alloc_percpu_gfp+0xd/0x10 __percpu_counter_init+0x55/0xc0 cfq_pd_alloc+0x3b2/0x4e0 blkg_alloc+0x187/0x230 blkg_create+0x489/0x670 blkg_lookup_create+0x9a/0x230 blkg_conf_prep+0x1fb/0x240 __cfqg_set_weight_device.isra.105+0x5c/0x180 cfq_set_weight_on_dfl+0x69/0xc0 cgroup_file_write+0x39/0x1c0 kernfs_fop_write+0x13f/0x1d0 __vfs_write+0x23/0x120 vfs_write+0xc2/0x1f0 SyS_write+0x44/0xb0 entry_SYSCALL_64_fastpath+0x18/0xad In the code path above, percpu allocator cannot call vmalloc() due to queue spinlock. A failure in this call path gives grief to tools which are trying to configure io weights. We see occasional failures happen shortly after reboots even when system is not under any memory pressure. Machines with a lot of cpus are more vulnerable to this condition. Do struct blkcg_gq allocations outside the queue spinlock to allow blocking during memory allocations. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | Revert "blkcg: allocate struct blkcg_gq outside request queue spinlock" | Jens Axboe
I inadvertently applied the v5 version of this patch, whereas the agreed upon version was v5. Revert this one so we can apply the right one. This reverts commit 7fc6b87a9ff537e7df32b1278118ce9c5bcd6788.
2017-03-29 | blk-mq: fix a typo and a spelling mistake | Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | blk-mq-pci: Fix two spelling mistakes | Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme | afzal mohammed
Greg, upon trying to boot a no-MMU kernel on ARM926EJ, reported a boot failure. He root-caused it to the ID_PFR1 access introduced by the commit mentioned in the Fixes tag below. Not all CP15 processors have processor feature registers; only architectures defined by the CPUID scheme have them. Hence check for the CPUID scheme before accessing the processor feature register, ID_PFR1. Fixes: f8300a0b5de0 ("ARM: 8647/2: nommu: dynamic exception base address setting") Reported-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Tested-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-03-29 | ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory | Russell King
dma_get_sgtable() tries to create a scatterlist table containing valid struct page pointers for the coherent memory allocation passed in to it. However, memory can be declared via dma_declare_coherent_memory(), or via other reservation schemes which means that coherent memory is not guaranteed to be backed by struct pages. In such cases, the resulting scatterlist table contains pointers to invalid pages, which causes kernel oops later. This patch adds detection of such memory, and refuses to create a scatterlist table for such memory. Reported-by: Shuah Khan <shuahkhan@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-03-29 | l2tp: purge socket queues in the .destruct() callback | Guillaume Nault
The Rx path may grab the socket right before pppol2tp_release(), but nothing guarantees that it will enqueue packets before skb_queue_purge(). Therefore, the socket can be destroyed without its queues fully purged. Fix this by purging queues in pppol2tp_session_destruct() where we're guaranteed nothing is still referencing the socket. Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6 | Guillaume Nault
The code following l2tp_tunnel_find() expects that a new reference is held on sk. Either sk_receive_skb() or the discard_put error path will drop a reference from the tunnel's socket. This issue exists in both l2tp_ip and l2tp_ip6. Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 | Merge branch 'regset' (PTRACE_SETREGSET data leakage) | Linus Torvalds
Merge PTRACE_SETREGSET leakage fixes from Dave Martin: "This series is the collection of fixes I proposed on this topic that have not yet appeared upstream or in the stable branches. The issue can leak kernel stack, but doesn't appear to allow userspace to attack the kernel directly. The affected architectures are c6x, h8300, metag, mips and sparc. [ Mark Salter points out that c6x has no MMU or other mechanism to prevent userspace access to kernel code or data on c6x, but it doesn't hurt to clean that case up too. ] The bugs arise from use of user_regset_copyin(). Users of user_regset_copyin() can work in one of two ways: 1) Copy directly to thread_struct or equivalent. (This seems to be the design assumption of the regset API, and is the most common approach.) 2) Copy to a local variable and then transfer to thread_struct. (A significant minority of cases.) Buggy code typically involves approach 2" * emailed patches from Dave Martin <Dave.Martin@arm.com>: sparc/ptrace: Preserve previous registers for short regset write mips/ptrace: Preserve previous registers for short regset write metag/ptrace: Reject partial NT_METAG_RPIPE writes metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS metag/ptrace: Preserve previous registers for short regset write h8300/ptrace: Fix incorrect register transfer count c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
2017-03-29 | sparc/ptrace: Preserve previous registers for short regset write | Dave Martin
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
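The "copy to a local variable, then transfer" approach becomes safe once the local copy is seeded with the thread's current registers. The sketch below illustrates that idea in user-space C; `struct regs`, `NREGS`, and `regs_set_short()` are hypothetical names, and `memcpy()` stands in for the kernel's copy-in helper.

```c
#include <stddef.h>
#include <string.h>

#define NREGS 4  /* hypothetical register count */

struct regs { unsigned long r[NREGS]; };

/*
 * Apply a possibly short write of len bytes to *task. The local copy
 * starts from the current registers, so bytes the caller did not
 * supply keep their old values instead of uninitialised stack data.
 */
int regs_set_short(struct regs *task, const void *ubuf, size_t len)
{
    struct regs tmp = *task;    /* seed with the existing state */

    if (len > sizeof(tmp))
        return -1;
    memcpy(&tmp, ubuf, len);
    *task = tmp;
    return 0;
}
```

Without the seeding step, the trailing part of `tmp` would carry whatever was on the stack, which is the leak the series describes.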
2017-03-29 | mips/ptrace: Preserve previous registers for short regset write | Dave Martin
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | metag/ptrace: Reject partial NT_METAG_RPIPE writes | Dave Martin
It's not clear what behaviour is sensible when doing partial write of NT_METAG_RPIPE, so just don't bother. This patch assumes that userspace will never rely on a partial SETREGSET in this case, since it's not clear what should happen anyway. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS | Dave Martin
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill TXSTATUS, a well-defined default value is used, based on the task's current value. Suggested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | metag/ptrace: Preserve previous registers for short regset write | Dave Martin
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | h8300/ptrace: Fix incorrect register transfer count | Dave Martin
regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun if CONFIG_CPU_H8S is set, since this adds an extra entry to register_offset[] but not to user_regs_struct. So, iterate over user_regs_struct based on its actual size, not based on the length of register_offset[]. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
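The off-by-one arises when a loop bound is taken from a lookup table that has one more entry than the structure being filled. A minimal sketch, with a hypothetical three-register `struct user_regs` and a four-entry `reg_offset[]` table (neither is the h8300 definition), bounds the count by the structure instead:

```c
#include <stddef.h>

struct user_regs { long r0, r1, r2; };  /* hypothetical: 3 slots */

/* One extra entry, as when an optional CPU feature adds a register. */
static const size_t reg_offset[] = { 0, 8, 16, 24 };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Number of registers to transfer: bounded by the destination struct. */
size_t regs_to_copy(void)
{
    size_t by_table = ARRAY_SIZE(reg_offset);
    size_t by_struct = sizeof(struct user_regs) / sizeof(long);

    return by_struct < by_table ? by_struct : by_table;
}
```

The sketch takes the smaller of the two counts as a defensive choice; the message above only says the loop should follow the size of user_regs_struct.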
2017-03-29 | c6x/ptrace: Remove useless PTRACE_SETREGSET implementation | Dave Martin
gpr_set won't work correctly and can never have been tested, and the correct behaviour is not clear due to the endianness-dependent task layout. So, just remove it. The core code will now return -EOPNOTSUPP when trying to set NT_PRSTATUS on this architecture until/unless a correct implementation is supplied. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder | Andy Whitcroft
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to wrapping issues. To make sure that the two ESN structures really are the same size, check that both the overall size as reported by xfrm_replay_state_esn_len() and the internal length are the same. CVE-2017-7184 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 | xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window | Andy Whitcroft
When a new xfrm state is created during an XFRM_MSG_NEWSA call we validate the user supplied replay_esn to ensure that the size is valid and to ensure that the replay_window size is within the allocated buffer. However later it is possible to update this replay_esn via an XFRM_MSG_NEWAE call. There we again validate that the size of the supplied buffer matches the existing state and if so inject the contents. We do not at this point check that the replay_window is within the allocated memory. This leads to out-of-bounds reads and writes triggered by netlink packets, and thus to memory corruption and the potential for privilege escalation. We already attempt to validate the incoming replay information in xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user is not trying to change the size of the replay state buffer which includes the replay_esn. It however does not check that the replay_window remains within that buffer. Add validation of the contained replay_window. CVE-2017-7184 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
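The two checks described above (same buffer size, window inside the buffer) can be sketched with a trimmed-down structure. `struct replay_esn`, its two fields, and `replay_update_ok()` are hypothetical; the sketch assumes the bitmap length is counted in 32-bit words and the window in bits, and it is not the xfrm_user code.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down shape of an ESN replay state. */
struct replay_esn {
    uint32_t bmp_len;        /* bitmap length in 32-bit words */
    uint32_t replay_window;  /* window size in bits */
};

/* Returns 0 if the update is acceptable, -1 otherwise. */
int replay_update_ok(const struct replay_esn *cur,
                     const struct replay_esn *upd)
{
    /* The update must not change the size of the bitmap buffer... */
    if (upd->bmp_len != cur->bmp_len)
        return -1;
    /*
     * ...and the window must fit inside that bitmap. 64-bit
     * arithmetic avoids wrap-around when bmp_len is large.
     */
    if ((uint64_t)upd->replay_window > (uint64_t)upd->bmp_len * 32u)
        return -1;
    return 0;
}
```

Comparing `bmp_len` directly, rather than only a derived byte length, is one way to sidestep the wrapping concern raised for the size computation.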
2017-03-29 | Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes | James Bottomley
2017-03-29 | block: fix leak of q->rq_wb | Omar Sandoval
CONFIG_DEBUG_TEST_DRIVER_REMOVE found a possible leak of q->rq_wb when a request queue is reregistered. This has been a problem since wbt was introduced, but the WARN_ON(!list_empty(&stats->callbacks)) in the blk-stat rework exposed it. Fix it by cleaning up wbt when we unregister the queue. Fixes: 87760e5eef35 ("block: hook up writeback throttling") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | blk-mq: fix leak of q->stats | Omar Sandoval
blk_alloc_queue_node() already allocates q->stats, so blk_mq_init_allocated_queue() is overwriting it with a new allocation. Fixes: a83b576c9c25 ("block: fix stacked driver stats init and free") Reviewed-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | block: warn if sharing request queue across gendisks | Omar Sandoval
Now that the remaining drivers have been converted to one request queue per gendisk, let's warn if a request queue gets registered more than once. This will catch future drivers which might do it inadvertently or any old drivers that I may have missed. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | block: block new I/O just after queue is set as dying | Ming Lei
Before commit 780db2071a(blk-mq: decouble blk-mq freezing from generic bypassing), the dying flag was checked before entering the queue. Tejun converted that check into .mq_freeze_depth, assuming the counter is increased just after the dying flag is set. Unfortunately we don't do that in blk_set_queue_dying(). This patch calls blk_freeze_queue_start() in blk_set_queue_dying(), so that we can block new I/O from coming in once the queue is set as dying. Given that blk_set_queue_dying() is always called in the remove path of a block device, and the queue will be cleaned up later, we don't need to worry about undoing the counter. Cc: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | block: rename blk_mq_freeze_queue_start() | Ming Lei
As the .q_usage_counter is used by both legacy and mq path, we need to block new I/O if queue becomes dead in blk_queue_enter(). So rename it and we can use this function in both paths. Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | block: add a read barrier in blk_queue_enter() | Ming Lei
Without the barrier, reading DEAD flag of .q_usage_counter and reading .mq_freeze_depth may be reordered, then the following wait_event_interruptible() may never return. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 | blk-mq: comment on races related with timeout handler | Ming Lei
This patch adds comments on two races related to the timeout handler: - requeue from queue busy vs. timeout - rq free & reallocation vs. timeout Neither the races themselves nor the current solution is explicit enough, so add comments on them. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29blk-mq: don't complete un-started request in timeout handlerMing Lei
When iterating busy requests in the timeout handler, if the STARTED flag of a request isn't set, the request is still being processed in the block layer or driver and hasn't been submitted to hardware yet. In the current implementation of blk_mq_check_expired(), if the request queue becomes dying, un-started requests are handled as being completed/freed immediately. This is wrong, and can cause rq corruption or double allocation[1][2] when doing I/O while removing and resetting an NVMe device at the same time. This patch fixes several issues reported by Yi Zhang. [1]. oops log 1 [ 581.789754] ------------[ cut here ]------------ [ 581.789758] kernel BUG at block/blk-mq.c:374! [ 581.789760] invalid opcode: 0000 [#1] SMP [ 581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror dm_region_hash dm_log dm_mod [ 581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4 [ 581.789797] Hardware name: Dell Inc. 
PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016 [ 581.789804] Workqueue: kblockd blk_mq_timeout_work [ 581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000 [ 581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70 [ 581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202 [ 581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88 [ 581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000 [ 581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00 [ 581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb [ 581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80 [ 581.789817] FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000 [ 581.789818] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0 [ 581.789820] Call Trace: [ 581.789825] blk_mq_check_expired+0x76/0x80 [ 581.789828] bt_iter+0x45/0x50 [ 581.789830] blk_mq_queue_tag_busy_iter+0xdd/0x1f0 [ 581.789832] ? blk_mq_rq_timed_out+0x70/0x70 [ 581.789833] ? blk_mq_rq_timed_out+0x70/0x70 [ 581.789840] ? __switch_to+0x140/0x450 [ 581.789841] blk_mq_timeout_work+0x88/0x170 [ 581.789845] process_one_work+0x165/0x410 [ 581.789847] worker_thread+0x137/0x4c0 [ 581.789851] kthread+0x101/0x140 [ 581.789853] ? rescuer_thread+0x3b0/0x3b0 [ 581.789855] ? kthread_park+0x90/0x90 [ 581.789860] ret_from_fork+0x2c/0x40 [ 581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48 8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f> 0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00 [ 581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50 [ 581.789889] ---[ end trace bcaf03d9a14a0a70 ]--- [2]. 
oops log2 [ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme] [ 6984.857373] PGD 0 [ 6984.857374] [ 6984.857376] Oops: 0000 [#1] SMP [ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror dm_region_hash dm_log dm_mod [ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted 4.10.0-2.el7.bz1420297.x86_64 #1 [ 6984.857417] Hardware name: Dell Inc. 
PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016 [ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn [ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000 [ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme] [ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246 [ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000 [ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950 [ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000 [ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000 [ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780 [ 6984.857439] FS: 0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000 [ 6984.857440] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0 [ 6984.857443] Call Trace: [ 6984.857451] ? mempool_free+0x2b/0x80 [ 6984.857455] ? bio_free+0x4e/0x60 [ 6984.857459] blk_mq_dispatch_rq_list+0xf5/0x230 [ 6984.857462] blk_mq_process_rq_list+0x133/0x170 [ 6984.857465] __blk_mq_run_hw_queue+0x8c/0xa0 [ 6984.857467] blk_mq_run_work_fn+0x12/0x20 [ 6984.857473] process_one_work+0x165/0x410 [ 6984.857475] worker_thread+0x137/0x4c0 [ 6984.857478] kthread+0x101/0x140 [ 6984.857480] ? rescuer_thread+0x3b0/0x3b0 [ 6984.857481] ? 
kthread_park+0x90/0x90 [ 6984.857489] ret_from_fork+0x2c/0x40 [ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44 89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c> 8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff [ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50 [ 6984.857512] CR2: 0000000000000010 [ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]--- Cc: stable@vger.kernel.org Reported-by: Yi Zhang <yizhan@redhat.com> Tested-by: Yi Zhang <yizhan@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
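The rule this fix enforces (the timeout handler must leave un-started requests alone) can be sketched as a user-space model. The names `struct model_rq` and `model_check_expired()` are hypothetical and only loosely mirror blk_mq_check_expired(); the deadline comparison ignores jiffies wraparound for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request as seen by a timeout scan. */
struct model_rq {
    bool started;            /* models the STARTED flag */
    unsigned long deadline;  /* time after which the request has expired */
    bool timed_out;          /* set by the scan when the request expired */
};

/*
 * Scan the busy requests and mark the expired ones. Requests that were
 * never started are skipped unconditionally: they are still owned by the
 * submission path, so completing or freeing them here would race with it.
 */
static size_t model_check_expired(struct model_rq *rqs, size_t n,
                                  unsigned long now)
{
    size_t expired = 0;

    for (size_t i = 0; i < n; i++) {
        if (!rqs[i].started)
            continue; /* not yet issued to hardware: leave it alone */
        if (now >= rqs[i].deadline) {
            rqs[i].timed_out = true;
            expired++;
        }
    }
    return expired;
}
```

In this model the skip does not depend on any queue state, which reflects the point of the fix: a dying queue is no reason to complete a request the timeout handler does not own.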
2017-03-29drm/etnaviv: (re-)protect fence allocation with GPU mutexLucas Stach
The fence allocation needs to be protected by the GPU mutex, otherwise the fence seqnos of concurrent submits might not match the insertion order of the jobs in the kernel ring. This breaks the assumption that jobs complete with monotonically increasing fence seqnos. Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process) CC: stable@vger.kernel.org #4.9+ Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
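Why the seqno has to be allocated under the same lock as the ring insertion can be shown with a small user-space model. Everything here is a hypothetical illustration using a pthread mutex; `struct model_gpu`, `model_submit()` and the fixed-size ring are assumptions and do not reflect the etnaviv driver's actual data structures.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 64

/* Hypothetical model of a GPU with a job ring and a fence counter. */
struct model_gpu {
    pthread_mutex_t lock;               /* models the GPU mutex */
    unsigned int next_seqno;            /* last fence seqno handed out */
    unsigned int ring[MODEL_RING_SIZE]; /* seqnos in insertion order */
    size_t ring_len;
};

static void model_gpu_init(struct model_gpu *gpu)
{
    pthread_mutex_init(&gpu->lock, NULL);
    gpu->next_seqno = 0;
    gpu->ring_len = 0;
}

/*
 * Allocate the fence seqno and insert the job in one critical section,
 * so no other submit can take a later seqno and still be queued earlier.
 */
static bool model_submit(struct model_gpu *gpu, unsigned int *seqno_out)
{
    bool ok = false;

    pthread_mutex_lock(&gpu->lock);
    if (gpu->ring_len < MODEL_RING_SIZE) {
        unsigned int seqno = ++gpu->next_seqno;

        gpu->ring[gpu->ring_len++] = seqno;
        *seqno_out = seqno;
        ok = true;
    }
    pthread_mutex_unlock(&gpu->lock);
    return ok;
}

/* True if the ring holds strictly increasing seqnos. */
static bool model_ring_is_monotonic(const struct model_gpu *gpu)
{
    for (size_t i = 1; i < gpu->ring_len; i++)
        if (gpu->ring[i] <= gpu->ring[i - 1])
            return false;
    return true;
}
```

If the seqno were taken before the lock, two concurrent callers could obtain seqnos 1 and 2 and then insert in the order 2, 1, which is the mismatch between seqno order and ring order that the commit message describes.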