summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-05-24Merge tag 'for-linus-20180524' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Two fixes that should go into this release: - a loop writeback error clearing fix from Jeff - the sr sense fix from myself" * tag 'for-linus-20180524' of git://git.kernel.dk/linux-block: loop: clear wb_err in bd_inode when detaching backing file sr: pass down correctly sized SCSI sense buffer
2018-05-24Merge tag 'pm-4.17-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a regression from the 4.15 cycle that caused the system suspend and resume overhead to increase on many systems and triggered more serious problems on some of them (Rafael Wysocki)" * tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / core: Fix direct_complete handling for devices with no callbacks
2018-05-24bpf: properly enforce index mask to prevent out-of-bounds speculationDaniel Borkmann
While reviewing the verifier code, I recently noticed that the following two program variants in relation to tail calls can be loaded. Variant 1: # bpftool p d x i 15 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:5] 3: (05) goto pc+2 4: (18) r2 = map[id:6] 6: (b7) r3 = 7 7: (35) if r3 >= 0xa0 goto pc+2 8: (54) (u32) r3 &= (u32) 255 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 5 5: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B # bpftool m s i 6 6: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B Variant 2: # bpftool p d x i 20 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:8] 3: (05) goto pc+2 4: (18) r2 = map[id:7] 6: (b7) r3 = 7 7: (35) if r3 >= 0x4 goto pc+2 8: (54) (u32) r3 &= (u32) 3 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 8 8: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B # bpftool m s i 7 7: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B In both cases the index masking inserted by the verifier in order to control out of bounds speculation from a CPU via b2157399cc98 ("bpf: prevent out-of-bounds speculation") seems to be incorrect in what it is enforcing. In the 1st variant, the mask is applied from the map with the significantly larger number of entries where we would allow to a certain degree out of bounds speculation for the smaller map, and in the 2nd variant where the mask is applied from the map with the smaller number of entries, we get buggy behavior since we truncate the index of the larger map. The original intent from commit b2157399cc98 is to reject such occasions where two or more different tail call maps are used in the same tail call helper invocation. However, the check on the BPF_MAP_PTR_POISON is never hit since we never poisoned the saved pointer in the first place! We do this explicitly for map lookups but in case of tail calls we basically used the tail call map in insn_aux_data that was processed in the most recent path which the verifier walked. Thus any prior path that stored a pointer in insn_aux_data at the helper location was always overridden. Fix it by moving the map pointer poison logic into a small helper that covers both BPF helpers with the same logic. After that in fixup_bpf_calls() the poison check is then hit for tail calls and the program rejected. Latter only happens in unprivileged case since this is the *only* occasion where a rewrite needs to happen, and where such rewrite is specific to the map (max_entries, index_mask). In the privileged case the rewrite is generic for the insn->imm / insn->code update so multiple maps from different paths can be handled just fine since all the remaining logic happens in the instruction processing itself. This is similar to the case of map lookups: in case there is a collision of maps in fixup_bpf_calls() we must skip the inlined rewrite since this will turn the generic instruction sequence into a non- generic one. Thus the patch_call_imm will simply update the insn->imm location where the bpf_map_lookup_elem() will later take care of the dispatch. Given we need this 'poison' state as a check, the information of whether a map is an unpriv_array gets lost, so enforcing it prior to that needs an additional state. In general this check is needed since there are some complex and tail call intensive BPF programs out there where LLVM tends to generate such code occasionally. We therefore convert the map_ptr rather into map_state to store all this w/o extra memory overhead, and the bit whether one of the maps involved in the collision was from an unpriv_array thus needs to be retained as well there. Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24x86/kvm: fix LAPIC timer drift when guest uses periodic modeDavid Vrabel
Since 4.10, commit 8003c9ae204e (KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support), guests using periodic LAPIC timers (such as FreeBSD 8.4) would see their timers drift significantly over time. Differences in the underlying clocks and numerical errors means the periods of the two timers (hv and sw) are not the same. This difference will accumulate with every expiry resulting in a large error between the hv and sw timer. This means the sw timer may be running slow when compared to the hv timer. When the timer is switched from hv to sw, the now active sw timer will expire late. The guest VCPU is reentered and it switches to using the hv timer. This timer catches up, injecting multiple IRQs into the guest (of which the guest only sees one as it does not get to run until the hv timer has caught up) and thus the guest's timer rate is low (and becomes increasing slower over time as the sw timer lags further and further behind). I believe a similar problem would occur if the hv timer is the slower one, but I have not observed this. Fix this by synchronizing the deadlines for both timers to the same time source on every tick. This prevents the errors from accumulating. Fixes: 8003c9ae204e21204e49816c5ea629357e283b06 Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: David Vrabel <david.vrabel@nutanix.com> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24Merge tag 'kvm-ppc-fixes-4.17-1' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Fixes for PPC KVM: - Close a hole which could possibly lead to the host timebase getting out of sync. - Three fixes relating to PTEs and TLB entries for radix guests. - Fix a bug which could lead to an interrupt never getting delivered to the guest, if it is pending for a guest vCPU when the vCPU gets offlined.
2018-05-24spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCACGeert Uytterhoeven
According to section 59.2.4 MSIOF Receive Mode Register 1 (SIRMDR1) in the R-Car Gen3 datasheet Rev.1.00, the value of the SIRMDR1.SYNCAC bit must match the value of the SITMDR1.SYNCAC bit. However, sh_msiof_spi_setup() changes only the latter. Fix this by updating the SIRMDR1 register like the SITMDR1 register, taking into account register bits that exist in SITMDR1 only. Reported-by: Renesas BSP team via Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Fixes: 7ff0b53c4051145d ("spi: sh-msiof: Avoid writing to registers from spi_master.setup()") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24ahci: Add PCI ID for Cannon Lake PCH-LP AHCIMika Westerberg
This one should be using the default LPM policy for mobile chipsets so add the PCI ID to the driver list of supported revices. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2018-05-24drm/psr: Fix missed entry in PSR setup time table.Dhinakaran Pandiyan
Entry corresponding to 220 us setup time was missing. I am not aware of any specific bug this fixes, but this could potentially result in enabling PSR on a panel with a higher setup time requirement than supported by the hardware. I verified the value is present in eDP spec versions 1.3, 1.4 and 1.4a. Fixes: 6608804b3d7f ("drm/dp: Add drm_dp_psr_setup_time()") Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jose Roberto de Souza <jose.souza@intel.com> Cc: dri-devel@lists.freedesktop.org Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Tarun Vyas <tarun.vyas@intel.com> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180511195145.3829-3-dhinakaran.pandiyan@intel.com
2018-05-24MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRsMaciej W. Rozycki
Use 64-bit accesses for 64-bit floating-point general registers with PTRACE_PEEKUSR, removing the truncation of their upper halves in the FR=1 mode, caused by commit bbd426f542cb ("MIPS: Simplify FP context access"), which inadvertently switched them to using 32-bit accesses. The PTRACE_POKEUSR side is fine as it's never been broken and continues using 64-bit accesses. Fixes: bbd426f542cb ("MIPS: Simplify FP context access") Signed-off-by: Maciej W. Rozycki <macro@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.15+ Patchwork: https://patchwork.linux-mips.org/patch/19334/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-24MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requestsMaciej W. Rozycki
Having PR_FP_MODE_FRE (i.e. Config5.FRE) set without PR_FP_MODE_FR (i.e. Status.FR) is not supported as the lone purpose of Config5.FRE is to emulate Status.FR=0 handling on FPU hardware that has Status.FR=1 hardwired[1][2]. Also we do not handle this case elsewhere, and assume throughout our code that TIF_HYBRID_FPREGS and TIF_32BIT_FPREGS cannot be set both at once for a task, leading to inconsistent behaviour if this does happen. Return unsuccessfully then from prctl(2) PR_SET_FP_MODE calls requesting PR_FP_MODE_FRE to be set with PR_FP_MODE_FR clear. This corresponds to modes allowed by `mips_set_personality_fp'. References: [1] "MIPS Architecture For Programmers, Vol. III: MIPS32 / microMIPS32 Privileged Resource Architecture", Imagination Technologies, Document Number: MD00090, Revision 6.02, July 10, 2015, Table 9.69 "Config5 Register Field Descriptions", p. 262 [2] "MIPS Architecture For Programmers, Volume III: MIPS64 / microMIPS64 Privileged Resource Architecture", Imagination Technologies, Document Number: MD00091, Revision 6.03, December 22, 2015, Table 9.72 "Config5 Register Field Descriptions", p. 288 Fixes: 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS") Signed-off-by: Maciej W. Rozycki <macro@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.0+ Patchwork: https://patchwork.linux-mips.org/patch/19327/ Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-24MIPS: lantiq: gphy: Drop reboot/remove reset assertsMathias Kresin
While doing a global software reset, these bits are not cleared and let some bootloader fail to initialise the GPHYs. The bootloader don't expect the GPHYs in reset, as they aren't during power on. The asserts were a workaround for a wrong syscon-reboot mask. With a mask set which includes the GPHY resets, these resets aren't required any more. Fixes: 126534141b45 ("MIPS: lantiq: Add a GPHY driver which uses the RCU syscon-mfd") Signed-off-by: Mathias Kresin <dev@kresin.me> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.14+ Patchwork: https://patchwork.linux-mips.org/patch/19003/ [jhogan@kernel.org: Fix build warnings] Signed-off-by: James Hogan <jhogan@kernel.org>
2018-05-24arm64: Make sure permission updates happen for pmd/pudLaura Abbott
Commit 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") disallowed block mappings for ioremap since that code does not honor break-before-make. The same APIs are also used for permission updating though and the extra checks prevent the permission updates from happening, even though this should be permitted. This results in read-only permissions not being fully applied. Visibly, this can occasionaly be seen as a failure on the built in rodata test when the test data ends up in a section or as an odd RW gap on the page table dump. Fix this by using pgattr_change_is_safe instead of p*d_present for determining if the change is permitted. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Peter Robinson <pbrobinson@gmail.com> Reported-by: Peter Robinson <pbrobinson@gmail.com> Fixes: 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-24m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()Michael Schmitz
If 020/030 support is enabled, get_io_area() leaves an IO_SIZE gap between mappings which is added to the vm_struct representing the mapping. __ioremap() uses the actual requested size (after alignment), while __iounmap() is passed the size from the vm_struct. On 020/030, early termination descriptors are used to set up mappings of extent 'size', which are validated on unmapping. The unmapped gap of size IO_SIZE defeats the sanity check of the pmd tables, causing __iounmap() to loop forever on 030. On 040/060, unmapping of page table entries does not check for a valid mapping, so the umapping loop always completes there. Adjust size to be unmapped by the gap that had been added in the vm_struct prior. This fixes the hang in atari_platform_init() reported a long time ago, and a similar one reported by Finn recently (addressed by removing ioremap() use from the SWIM driver. Tested on my Falcon in 030 mode - untested but should work the same on 040/060 (the extra page tables cleared there would never have been set up anyway). Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> [geert: Minor commit description improvements] [geert: This was fixed in 2.4.23, but not in 2.5.x] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org
2018-05-24Btrfs: fix error handling in btrfs_truncate()Omar Sandoval
Jun Wu at Facebook reported that an internal service was seeing a return value of 1 from ftruncate() on Btrfs in some cases. This is coming from the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items(). btrfs_truncate() uses two variables for error handling, ret and err. When btrfs_truncate_inode_items() returns non-zero, we set err to the return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we only set err if ret is an error (i.e., negative). To reproduce the issue: mount a filesystem with -o compress-force=zstd and the following program will encounter return value of 1 from ftruncate: int main(void) { char buf[256] = { 0 }; int ret; int fd; fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666); if (fd == -1) { perror("open"); return EXIT_FAILURE; } if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { perror("write"); close(fd); return EXIT_FAILURE; } if (fsync(fd) == -1) { perror("fsync"); close(fd); return EXIT_FAILURE; } ret = ftruncate(fd, 128); if (ret) { printf("ftruncate() returned %d\n", ret); close(fd); return EXIT_FAILURE; } close(fd); return EXIT_SUCCESS; } Fixes: ddfae63cc8e0 ("btrfs: move btrfs_truncate_block out of trans handle") CC: stable@vger.kernel.org # 4.15+ Reported-by: Jun Wu <quark@fb.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-24dma-debug: check scatterlist segmentsRobin Murphy
Drivers/subsystems creating scatterlists for DMA should be taking care to respect the scatter-gather limitations of the appropriate device, as described by dma_parms. A DMA API implementation cannot feasibly split a scatterlist into *more* entries than originally passed, so it is not well defined what they should do when given a segment larger than the limit they are also required to respect. Conversely, devices which are less limited than the rather conservative defaults, or indeed have no limitations at all (e.g. GPUs with their own internal MMU), should be encouraged to set appropriate dma_parms, as they may get more efficient DMA mapping performance out of it. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-24Merge tag 'perf-core-for-mingo-4.18-20180523' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements from Arnaldo Carvalho de Melo: - Create extra kernel maps to help in decoding samples in x86 PTI entry trampolines (Adrian Hunter) - Copy x86 PTI entry trampoline sections in the kcore copy used for annotation and intel_pt CPU traces decoding (Adrian Hunter) - Support 'perf annotate --group' for non-explicit recorded event "groups", showing multiple columns, one for each event, just like when dealing with explicit event groups (those enclosed with {}) (Jin Yao) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-23fix io_destroy()/aio_complete() raceAl Viro
If io_destroy() gets to cancelling everything that can be cancelled and gets to kiocb_cancel() calling the function driver has left in ->ki_cancel, it becomes vulnerable to a race with IO completion. At that point req is already taken off the list and aio_complete() does *NOT* spin until we (in free_ioctx_users()) releases ->ctx_lock. As the result, it proceeds to kiocb_free(), freing req just it gets passed to ->ki_cancel(). Fix is simple - remove from the list after the call of kiocb_cancel(). All instances of ->ki_cancel() already have to cope with the being called with iocb still on list - that's what happens in io_cancel(2). Cc: stable@kernel.org Fixes: 0460fef2a921 "aio: use cancellation list lazily" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-24netfilter: provide correct argument to nla_strlcpy()Eric Dumazet
Recent patch forgot to remove nla_data(), upsetting syzkaller a bit. BUG: KASAN: slab-out-of-bounds in nla_strlcpy+0x13d/0x150 lib/nlattr.c:314 Read of size 1 at addr ffff8801ad1f4fdd by task syz-executor189/4509 CPU: 1 PID: 4509 Comm: syz-executor189 Not tainted 4.17.0-rc6+ #62 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 nla_strlcpy+0x13d/0x150 lib/nlattr.c:314 nfnl_acct_new+0x574/0xc50 net/netfilter/nfnetlink_acct.c:118 nfnetlink_rcv_msg+0xdb5/0xff0 net/netfilter/nfnetlink.c:212 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 nfnetlink_rcv+0x1fe/0x1ba0 net/netfilter/nfnetlink.c:513 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 sock_write_iter+0x35a/0x5a0 net/socket.c:908 call_write_iter include/linux/fs.h:1784 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x64d/0x960 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0xf9/0x250 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 Fixes: 4e09fc873d92 ("netfilter: prefer nla_strlcpy for dealing with NLA_STRING attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23docs: update kernel versions and dates in tablesTim Bird
Every once in a while, we should update the examples to reflect more recent kernel versions. Update the tables describing kernel releases, the merge window, and current longterm maintained kernel, from 2.6-era kernels to 4.x. Signed-off-by: Tim Bird <tim.bird@sony.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-05-23RDMA/hns: Move the location for initializing tmp_lenoulijun
When posted work request, it need to compute the length of all sges of every wr and fill it into the msg_len field of send wqe. Thus, While posting multiple wr, tmp_len should be reinitialized to zero. Fixes: 8b9b8d143b46 ("RDMA/hns: Fix the endian problem for hns") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23RDMA/hns: Bugfix for cq record db for kerneloulijun
When use cq record db for kernel, it needs to set the hr_cq->db_en to 1 and configure the dma address of record cq db of qp context. Fixes: 86188a8810ed ("RDMA/hns: Support cq record doorbell for kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueueTejun Heo
From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Wed, 23 May 2018 10:29:00 -0700 cgwb_release() punts the actual release to cgwb_release_workfn() on system_wq. Depending on the number of cgroups or block devices, there can be a lot of cgwb_release_workfn() in flight at the same time. We're periodically seeing close to 256 kworkers getting stuck with the following stack trace and overtime the entire system gets stuck. [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330 [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30 [<ffffffff811ccf23>] bdi_unregister+0x53/0x290 [<ffffffff811cd1e9>] release_bdi+0x89/0xc0 [<ffffffff811cd645>] wb_exit+0x85/0xa0 [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0 [<ffffffff810a68d0>] process_one_work+0x150/0x410 [<ffffffff810a71fd>] worker_thread+0x6d/0x520 [<ffffffff810ad3dc>] kthread+0x12c/0x160 [<ffffffff81969019>] ret_from_fork+0x29/0x40 [<ffffffffffffffff>] 0xffffffffffffffff The events leading to the lockup are... 1. A lot of cgwb_release_workfn() is queued at the same time and all system_wq kworkers are assigned to execute them. 2. They all end up calling synchronize_rcu_expedited(). One of them wins and tries to perform the expedited synchronization. 3. However, that invovles queueing rcu_exp_work to system_wq and waiting for it. Because #1 is holding all available kworkers on system_wq, rcu_exp_work can't be executed. cgwb_release_workfn() is waiting for synchronize_rcu_expedited() which in turn is waiting for cgwb_release_workfn() to free up some of the kworkers. We shouldn't be scheduling hundreds of cgwb_release_workfn() at the same time. There's nothing to be gained from that. This patch updates cgwb release path to use a dedicated percpu workqueue with @max_active of 1. While this resolves the problem at hand, it might be a good idea to isolate rcu_exp_work to its own workqueue too as it can be used from various paths and is prone to this sort of indirect A-A deadlocks. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23nbd: set discard granularity properlyJosef Bacik
For some reason we had discard granularity set to 512 always even when discards were disabled. Fix this by having the default be 0, and then if we turn it on set the discard granularity to the blocksize. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23IB/uverbs: Fix uverbs_attr_get_objJason Gunthorpe
The err pointer comes from uverbs_attr_get, not from the uobject member, which does not store an ERR_PTR. Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-05-23RDMA/qedr: Fix doorbell bar mapping for dpi > 1Kalderon, Michal
Each user_context receives a separate dpi value and thus a different address on the doorbell bar. The qedr_mmap function needs to validate the address and map the doorbell bar accordingly. The current implementation always checked against dpi=0 doorbell range leading to a wrong mapping for doorbell bar. (It entered an else case that mapped the address differently). qedr_mmap should only be used for doorbells, so the else was actually wrong in the first place. This only has an affect on arm architecture and not an issue on a x86 based architecture. This lead to doorbells not occurring on arm based systems and left applications that use more than one dpi (or several applications run simultaneously ) to hang. Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23net/mlx4: Fix irq-unsafe spinlock usageJack Morgenstein
spin_lock/unlock was used instead of spin_un/lock_irq in a procedure used in process space, on a spinlock which can be grabbed in an interrupt. This caused the stack trace below to be displayed (on kernel 4.17.0-rc1 compiled with Lock Debugging enabled): [ 154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G I [ 154.675856] ----------------------------------------------------- [ 154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core] [ 154.700927] and this task is already holding: [ 154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib] [ 154.718028] which would create a new lock dependency: [ 154.723705] (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.} [ 154.731922] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 154.740798] (&(&cq->lock)->rlock){..-.} [ 154.740800] ... which became SOFTIRQ-irq-safe at: [ 154.752163] _raw_spin_lock_irqsave+0x3e/0x50 [ 154.757163] mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib] [ 154.762554] ipoib_tx_poll+0x4a/0xf0 [ib_ipoib] ... to a SOFTIRQ-irq-unsafe lock: [ 154.815603] (&(&qp_table->lock)->rlock){+.+.} [ 154.815604] ... which became SOFTIRQ-irq-unsafe at: [ 154.827718] ... [ 154.827720] _raw_spin_lock+0x35/0x50 [ 154.833912] mlx4_qp_lookup+0x1e/0x50 [mlx4_core] [ 154.839302] mlx4_flow_attach+0x3f/0x3d0 [mlx4_core] Since mlx4_qp_lookup() is called only in process space, we can simply replace the spin_un/lock calls with spin_un/lock_irq calls. Fixes: 6dc06c08bef1 ("net/mlx4: Fix the check in attaching steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23platform/chrome: chromeos_laptop - supply properties for ACPI devicesDmitry Torokhov
BayTrail-based and newer Chromebooks describe their peripherals in ACPI; unfortunately their description is not complete, and peripherals drivers, such as driver for Atmel Touch controllers, has to resort to DMI-matching to configure the peripherals properly. To avoid polluting peripheral driver code, let's teach chromeos_laptop driver to supply missing data via generic device properties. Note we supply "compatible" string for Atmel peripherals not because it is needed for matching devices and driver (matching is still done on ACPI HID entries), but because peripherals driver will be using presence of "compatible" property to determine if device properties have been attached to the device, and fail to bind if they are absent. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-05-23net: phy: broadcom: Fix bcm_write_exp()Florian Fainelli
On newer PHYs, we need to select the expansion register to write with setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior to being migrated to generic code under bcm-phy-lib.c which unfortunately used the older implementation from the BCM54xx days. Fix this by creating an inline stub: bcm_write_exp_sel() which adds the correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY and BCM7xxx PHY drivers which require setting these bits. broadcom.c is unchanged because some PHYs even use a different selector method, so let them specify it directly (e.g: SerDes secondary selector). Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23net: phy: broadcom: Fix auxiliary control register readsFlorian Fainelli
We are currently doing auxiliary control register reads with the shadow register value 0b111 (0x7) which incidentally is also the selector value that should be present in bits [2:0]. Fix this by using the appropriate selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK). This does not have a functional impact yet because we always access the MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code. This might change at some point though. Fixes: 5b4e29005123 ("net: phy: broadcom: add bcm54xx_auxctl_read") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23net: ipv4: add missing RTA_TABLE to rtm_ipv4_policyRoopa Prabhu
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23platform/chrome: chromeos_tbmc - add SPDX identifierBenson Leung
Replace the original license statement with the SPDX identifier. Add also one line of description as recommended by the COPYING file. Signed-off-by: Benson Leung <bleung@chromium.org>
2018-05-23net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase messageColin Ian King
Trivial fix to spelling mistake in mlx4_dbg debug message and also change the phrasing of the message so that is is more readable Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23platform: chrome: Add Tablet Switch ACPI driverGwendal Grignou
Add a kernel driver for GOOG0006, an ACPI driver reporting an event when the tablet switch status changes. On an ACPI based convertible chromebook check evtest display tablet mode switch changes: Available devices: .. /dev/input/event3: Tablet Mode Switch .. Testing ... (interrupt to exit) Event: time 1484879712.604360, type 5 (EV_SW), code 1 (SW_TABLET_MODE), value 1 Event: time 1484879712.604360, -------------- SYN_REPORT ------------ Event: time 1484879715.132228, type 5 (EV_SW), code 1 (SW_TABLET_MODE), value 0 Event: time 1484879715.132228, -------------- SYN_REPORT ------------ ... Check state is updated at resume time when different from suspend time. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-05-23ibmvnic: Only do H_EOI for mobility eventsNathan Fontenot
When enabling the sub-CRQ IRQ a previous update sent a H_EOI prior to the enablement to clear any pending interrupts that may be present across a partition migration. This fixed a firmware bug where a migration could erroneously indicate that a H_EOI was pending. The H_EOI should only be sent when enabling during a mobility event though. Doing so at other time could wrong and can produce extra driver output when IRQs are enabled when doing TX completion. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Input: elan_i2c_smbus - fix corrupted stackBenjamin Tissoires
New ICs (like the one on the Lenovo T480s) answer to ETP_SMBUS_IAP_VERSION_CMD 4 bytes instead of 3. This corrupts the stack as i2c_smbus_read_block_data() uses the values returned by the i2c device to know how many data it need to return. i2c_smbus_read_block_data() can read up to 32 bytes (I2C_SMBUS_BLOCK_MAX) and there is no safeguard on how many bytes are provided in the return value. Ensure we always have enough space for any future firmware. Also 0-initialize the values to prevent any access to uninitialized memory. Cc: <stable@vger.kernel.org> # v4.4.x, v4.9.x, v4.14.x, v4.15.x, v4.16.x Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: KT Liao <kt.liao@emc.com.tw> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-23Input: synaptics - add Lenovo 80 series ids to SMBusBenjamin Tissoires
This time, Lenovo decided to go with different pieces in its latest series of Thinkpads. For those we have been able to test: - the T480 is using Synaptics with an IBM trackpoint -> it behaves properly with or without intertouch, there is no point not using RMI4 - the X1 Carbon 6th gen is using Synaptics with an IBM trackpoint -> the touchpad doesn't behave properly under PS/2 so we have to switch it to RMI4 if we do not want to have disappointed users - the X280 is using Synaptics with an ALPS trackpoint -> the recent fixes in the trackpoint handling fixed it so upstream now works fine with or without RMI4, and there is no point not using RMI4 - the T480s is using an Elan touchpad, so that's a different story Cc: <stable@vger.kernel.org> # v4.14.x, v4.15.x, v4.16.x Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: KT Liao <kt.liao@emc.com.tw> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-23Input: synaptics - add Intertouch support on X1 Carbon 6th and X280Aaron Ma
Synaptics devices reported it has Intertouch support, and it fails via PS/2 as following logs: psmouse serio2: Failed to reset mouse on synaptics-pt/serio0 psmouse serio2: Failed to enable mouse on synaptics-pt/serio0 Set these new devices to use SMBus to fix this issue, then they report SMBus version 3 is using, patch: https://patchwork.kernel.org/patch/9989547/ enabled SMBus ver 3 and makes synaptics devices work fine on SMBus mode. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-23Input: synaptics - Lenovo Thinkpad X1 Carbon G5 (2017) with Elantech ↵Edvard Holst
trackpoints should use RMI Lenovo use two different trackpoints in the fifth generation Thinkpad X1 Carbon. Both are accessible over SMBUS/RMI but the pnpIDs are missing. This patch is for the Elantech trackpoint specifically which also reports SMB version 3 so rmi_smbus needs to be updated in order to handle it. For the record, I was not the first one to come up with this patch as it has been floating around the internet for a while now. However, I have spent significant time with testing and my efforts to find the original author of the patch have been unsuccessful. Signed-off-by: Edvard Holst <edvard.holst@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-23Merge tag 'wireless-drivers-for-davem-2018-05-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.17 Hopefully the last fixes for 4.17. ssb is again causing problems so we had to revert a commit and fix it better. Also a small fix to bcma and some MAINTAINERS file updates. ssb * fix regression with all module PCI cards, for example using b43 and b44 drivers * try again fixing a MIPS linker error bcma * fix truncated info log messages ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Input: synaptics - Lenovo Carbon X1 Gen5 (2017) devices should use RMIDmitry Torokhov
The touchpad on Lenovo Carbon X1 Gen 5 (2017 - Kabylake) is accessible over SMBUS/RMI, so let's activate it by default. Cc: stable@vger.kernel.org Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-23tuntap: correctly set SOCKWQ_ASYNC_NOSPACEJason Wang
When link is down, writes to the device might fail with -EIO. Userspace needs an indication when the status is resolved. As a fix, tun_net_open() attempts to wake up writers - but that is only effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is not the case of vhost_net which only poll for EPOLLOUT after it meets errors during sendmsg(). This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when socket is not writable or device is down to guarantee EPOLLOUT will be raised in either tun_chr_poll() or tun_sock_write_space() after device is up. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Merge branch 'virtio_net-mergeable-XDP'David S. Miller
Jason Wang says: ==================== Fix several issues of virtio-net mergeable XDP Please review the patches that tries to fix several issues of virtio-net mergeable XDP. Changes from V1: - check against 1 before decreasing instead of resetting to 1 - typoe fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23virtio-net: fix leaking page for gso packet during mergeable XDPJason Wang
We need to drop refcnt to xdp_page if we see a gso packet. Otherwise it will be leaked. Fixing this by moving the check of gso packet above the linearizing logic. While at it, remove useless comment as well. Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23virtio-net: correctly check num_buf during err pathJason Wang
If we successfully linearize the packet, num_buf will be set to zero which may confuse error handling path which assumes num_buf is at least 1 and this can lead the code tries to pop the descriptor of next buffer. Fixing this by checking num_buf against 1 before decreasing. Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23virtio-net: correctly transmit XDP buff after linearizingJason Wang
We should not go for the error path after successfully transmitting a XDP buffer after linearizing. Since the error path may try to pop and drop next packet and increase the drop counters. Fixing this by simply drop the refcnt of original page and go for xmit path. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23virtio-net: correctly redirect linearized packetJason Wang
After a linearized packet was redirected by XDP, we should not go for the err path which will try to pop buffers for the next packet and increase the drop counter. Fixing this by just drop the page refcnt for the original page. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Merge tag 'mac80211-for-davem-2018-05-23' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A handful of fixes: * hwsim radio dump wasn't working for the first radio * mesh was updating statistics incorrectly * a netlink message allocation was possibly too short * wiphy name limit was still too long * in certain cases regdb query could find a NULL pointer ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Merge tag 'mfd-fixes-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fix from Lee Jones: "A single cros_ec_spi fix correcting the handling for long-running commands" * tag 'mfd-fixes-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: cros_ec: Retry commands when EC is known to be busy
2018-05-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha fixes from Matt Turner: "A few small changes for alpha" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2 alpha: simplify get_arch_dma_ops alpha: use dma_direct_ops for jensen
2018-05-23nvme: host: core: fix precedence of ternary operatorIvan Bornyakov
Ternary operator have lower precedence then bitwise or, so 'cdw10' was calculated wrong. Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com>