summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-14x86, memremap: fix altmap accounting at freeDan Williams
Commit 24b6d4164348 "mm: pass the vmem_altmap to vmemmap_free" converted the vmemmap_free() path to pass the altmap argument all the way through the call chain rather than looking it up based on the page. Unfortunately that ends up over freeing altmap allocated pages in some cases since free_pagetable() is used to free both memmap space and pte space, where only the memmap stored in huge pages uses altmap allocations. Given that altmap allocations for memmap space are special cased in vmemmap_populate_hugepages() add a symmetric / special case free_hugepage_table() to handle altmap freeing, and cleanup the unneeded passing of altmap to leaf functions that do not require it. Without this change the sanity check accounting in devm_memremap_pages_release() will throw a warning with the following signature. nd_pmem pfn10.1: devm_memremap_pages_release: failed to free all reserved pages WARNING: CPU: 44 PID: 3539 at kernel/memremap.c:310 devm_memremap_pages_release+0x1c7/0x220 CPU: 44 PID: 3539 Comm: ndctl Tainted: G L 4.16.0-rc1-linux-stable #7 RIP: 0010:devm_memremap_pages_release+0x1c7/0x220 [..] Call Trace: release_nodes+0x225/0x270 device_release_driver_internal+0x15d/0x210 bus_remove_device+0xe2/0x160 device_del+0x130/0x310 ? klist_release+0x56/0x100 ? nd_region_notify+0xc0/0xc0 [libnvdimm] device_unregister+0x16/0x60 This was missed in testing since not all configurations will trigger this warning. Fixes: 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free") Reported-by: Jane Chu <jane.chu@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-14btrfs: add missing initialization in btrfs_check_sharedEdmund Nadolski
This patch addresses an issue that causes fiemap to falsely report a shared extent. The test case is as follows: xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5 sync xfs_io -c "fiemap -v" /media/scratch/file5 which gives the resulting output: wrote 65536/65536 bytes at offset 0 64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec) /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x2001 /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 This is because btrfs_check_shared calls find_parent_nodes repeatedly in a loop, passing a share_check struct to report the count of shared extent. But btrfs_check_shared does not re-initialize the count value to zero for subsequent calls from the loop, resulting in a false share count value. This is a regressive behavior from 4.13. With proper re-initialization the test result is as follows: wrote 65536/65536 bytes at offset 0 64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec) /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 which corrects the regression. Fixes: 3ec4d3238ab ("btrfs: allow backref search checks for shared extents") Signed-off-by: Edmund Nadolski <enadolski@suse.com> [ add text from cover letter to changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-14btrfs: Fix NULL pointer exception in find_bio_stripeDmitriy Gorokh
On detaching of a disk which is a part of a RAID6 filesystem, the following kernel OOPS may happen: [63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 1, corrupt 0, gen 0 [63122.719584] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo [63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 1, corrupt 0, gen 0 [63122.803516] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo [63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 1, corrupt 0, gen 0 [63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo [63122.935338] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs] [63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0 [63122.971202] Oops: 0000 [#1] SMP [63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 4.14.2-16-scst34x+ #8 [63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs] [63123.007595] task: ffff880036ea4040 task.stack: ffffc90006384000 [63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs] [63123.007968] RSP: 0018:ffffc90006387ad8 EFLAGS: 00010287 [63123.008140] RAX: 0000000000000002 RBX: ffff88004beaa0b8 RCX: ffff8800b2bd5690 [63123.008359] RDX: 0000000000000000 RSI: ffff88007bb43500 RDI: ffff88004beaa000 [63123.008621] RBP: ffffc90006387ae8 R08: 0000000099100000 R09: ffff8800b2bd5600 [63123.008840] R10: 0000000000000004 R11: 0000000000010000 R12: ffff88007bb43500 [63123.009059] R13: 00000000fffffffb R14: ffff880036fc5180 R15: 0000000000000004 [63123.009278] FS: 0000000000000000(0000) GS:ffff8800b7000000(0000) knlGS:0000000000000000 [63123.009564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [63123.009748] CR2: 0000000000000080 CR3: 00000000b0866000 CR4: 00000000000406f0 [63123.009969] Call Trace: [63123.010085] raid_write_end_io+0x7e/0x80 [btrfs] [63123.010251] bio_endio+0xa1/0x120 [63123.010378] generic_make_request+0x218/0x270 [63123.010921] submit_bio+0x66/0x130 [63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs] [63123.011245] full_stripe_write+0x96/0xc0 [btrfs] [63123.011428] raid56_parity_write+0x117/0x170 [btrfs] [63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs] [63123.011759] ? ___cache_free+0x1c5/0x300 [63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs] [63123.012087] run_one_async_done+0x9c/0xc0 [btrfs] [63123.012257] normal_work_helper+0x19e/0x300 [btrfs] [63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs] [63123.012656] process_one_work+0x14d/0x350 [63123.012888] worker_thread+0x4d/0x3a0 [63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20 [63123.013192] kthread+0x109/0x140 [63123.013315] ? process_scheduled_works+0x40/0x40 [63123.013472] ? kthread_stop+0x110/0x110 [63123.013610] ret_from_fork+0x25/0x30 [63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: ffffc90006387ad8 [63123.014678] CR2: 0000000000000080 [63123.016590] ---[ end trace a295ea7259c17880 ]— This is reproducible in a cycle, where a series of writes is followed by SCSI device delete command. The test may take up to few minutes. Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") [ no signed-off-by provided ] Author: Dmitriy Gorokh <Dmitriy.Gorokh@wdc.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-14IB/mlx5: Fix cleanup order on unloadMark Bloch
On load we create private CQ/QP/PD in order to be used by UMR, we create those resources after we register ourself as an IB device, and we destroy them after we unregister as an IB device. This was changed by commit 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") which moved the destruction before we unregistration. This allowed to trigger an invalid memory access when unloading mlx5_ib while there are open resources: BUG: unable to handle kernel paging request at 00000001002c012c ... Call Trace: mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib] __slab_free+0x9a/0x2d0 delay_time_func+0x10/0x10 [mlx5_ib] unreg_umr.isra.15+0x4b/0x50 [mlx5_ib] mlx5_mr_cache_free+0x46/0x150 [mlx5_ib] clean_mr+0xc9/0x190 [mlx5_ib] dereg_mr+0xba/0xf0 [mlx5_ib] ib_dereg_mr+0x13/0x20 [ib_core] remove_commit_idr_uobject+0x16/0x70 [ib_uverbs] uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs] ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs] ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs] ib_unregister_device+0xd4/0x190 [ib_core] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] mlx5_remove_device+0xf5/0x120 [mlx5_core] mlx5_unregister_interface+0x37/0x90 [mlx5_core] mlx5_ib_cleanup+0xc/0x225 [mlx5_ib] SyS_delete_module+0x153/0x230 do_syscall_64+0x62/0x110 entry_SYSCALL_64_after_hwframe+0x21/0x86 ... We restore the original behavior by breaking the UMR stage into two parts, pre and post IB registration stages, this way we can restore the original functionality and maintain clean separation of logic between stages. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14drm/amdgpu/dce: Don't turn off DP sink when disconnectedMichel Dänzer
Turning off the sink in this case causes various issues, because userspace expects it to stay on until it turns it off explicitly. Instead, turn the sink off and back on when a display is connected again. This dance seems necessary for link training to work correctly. Bugzilla: https://bugs.freedesktop.org/105308 Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-14drm/amdgpu: save/restore backlight level in legacy dce codeAlex Deucher
Save/restore the backlight level scratch register in S3/S4 so the backlight level comes back at the previously requested level. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=199047 Fixes: 4ec6ecf48c64d (drm/amdgpu: drop scratch regs save and restore from S3/S4 handling) Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-14drm/radeon: fix prime teardown orderChristian König
We unmapped imported DMA-bufs when the GEM handle was dropped, not when the hardware was done with the buffere. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-14drm/amdgpu: fix prime teardown orderChristian König
We unmapped imported DMA-bufs when the GEM handle was dropped, not when the hardware was done with the buffere. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-14RDMA/ucma: Don't allow join attempts for unsupported AF familyLeon Romanovsky
Users can provide garbage while calling to ucma_join_ip_multicast(), it will indirectly cause to rdma_addr_size() return 0, making the call to ucma_process_join(), which had the right checks, but it is better to check the input as early as possible. The following crash from syzkaller revealed it. kernel BUG at lib/string.c:1052! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051 RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286 RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000 RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12 RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998 R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00 FS: 0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: memcpy include/linux/string.h:344 [inline] ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633 __vfs_write+0xef/0x970 fs/read_write.c:480 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline] do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f9ec99 RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100 RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de 55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0 Fixes: 5bc2b7b397b0 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast") Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14RDMA/ucma: Fix access to non-initialized CM_ID objectLeon Romanovsky
The attempt to join multicast group without ensuring that CMA device exists will lead to the following crash reported by syzkaller. [ 64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0 [ 64.076797] Read of size 8 at addr 00000000000000b0 by task join/691 [ 64.076797] [ 64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.076803] Call Trace: [ 64.076809] dump_stack+0x5c/0x77 [ 64.076817] kasan_report+0x163/0x380 [ 64.085859] ? rdma_join_multicast+0x26e/0x12c0 [ 64.086634] rdma_join_multicast+0x26e/0x12c0 [ 64.087370] ? rdma_disconnect+0xf0/0xf0 [ 64.088579] ? __radix_tree_replace+0xc3/0x110 [ 64.089132] ? node_tag_clear+0x81/0xb0 [ 64.089606] ? idr_alloc_u32+0x12e/0x1a0 [ 64.090517] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.091768] ? tracing_record_taskinfo+0x10/0xc0 [ 64.092340] ? idr_alloc+0x76/0xc0 [ 64.092951] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.093632] ? ucma_process_join+0x23d/0x460 [ 64.094510] ucma_process_join+0x23d/0x460 [ 64.095199] ? ucma_migrate_id+0x440/0x440 [ 64.095696] ? futex_wake+0x10b/0x2a0 [ 64.096159] ucma_join_multicast+0x88/0xe0 [ 64.096660] ? ucma_process_join+0x460/0x460 [ 64.097540] ? _copy_from_user+0x5e/0x90 [ 64.098017] ucma_write+0x174/0x1f0 [ 64.098640] ? ucma_resolve_route+0xf0/0xf0 [ 64.099343] ? rb_erase_cached+0x6c7/0x7f0 [ 64.099839] __vfs_write+0xc4/0x350 [ 64.100622] ? perf_syscall_enter+0xe4/0x5f0 [ 64.101335] ? kernel_read+0xa0/0xa0 [ 64.103525] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.105510] ? syscall_exit_register+0x2a0/0x2a0 [ 64.107359] ? __switch_to+0x351/0x640 [ 64.109285] ? fsnotify+0x899/0x8f0 [ 64.111610] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.113876] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.115813] ? ring_buffer_record_is_on+0xd/0x20 [ 64.117824] ? __fget+0xa8/0xf0 [ 64.119869] vfs_write+0xf7/0x280 [ 64.122001] SyS_write+0xa1/0x120 [ 64.124213] ? SyS_read+0x120/0x120 [ 64.126644] ? SyS_read+0x120/0x120 [ 64.128563] do_syscall_64+0xeb/0x250 [ 64.130732] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.132984] RIP: 0033:0x7f5c994ade99 [ 64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.151060] [ 64.153703] Disabling lock debugging due to kernel taint [ 64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [ 64.159066] IP: rdma_join_multicast+0x26e/0x12c0 [ 64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0 [ 64.164442] Oops: 0000 [#1] SMP KASAN PTI [ 64.166817] CPU: 1 PID: 691 Comm: join Tainted: G B 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0 [ 64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282 [ 64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522 [ 64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297 [ 64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7 [ 64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000 [ 64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400 [ 64.196105] FS: 00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000 [ 64.199211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0 [ 64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 64.211554] Call Trace: [ 64.213464] ? rdma_disconnect+0xf0/0xf0 [ 64.216124] ? __radix_tree_replace+0xc3/0x110 [ 64.219337] ? node_tag_clear+0x81/0xb0 [ 64.222140] ? idr_alloc_u32+0x12e/0x1a0 [ 64.224422] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.226588] ? tracing_record_taskinfo+0x10/0xc0 [ 64.229763] ? idr_alloc+0x76/0xc0 [ 64.232186] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.234505] ? ucma_process_join+0x23d/0x460 [ 64.237024] ucma_process_join+0x23d/0x460 [ 64.240076] ? ucma_migrate_id+0x440/0x440 [ 64.243284] ? futex_wake+0x10b/0x2a0 [ 64.245302] ucma_join_multicast+0x88/0xe0 [ 64.247783] ? ucma_process_join+0x460/0x460 [ 64.250841] ? _copy_from_user+0x5e/0x90 [ 64.253878] ucma_write+0x174/0x1f0 [ 64.257008] ? ucma_resolve_route+0xf0/0xf0 [ 64.259877] ? rb_erase_cached+0x6c7/0x7f0 [ 64.262746] __vfs_write+0xc4/0x350 [ 64.265537] ? perf_syscall_enter+0xe4/0x5f0 [ 64.267792] ? kernel_read+0xa0/0xa0 [ 64.270358] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.272575] ? syscall_exit_register+0x2a0/0x2a0 [ 64.275367] ? __switch_to+0x351/0x640 [ 64.277700] ? fsnotify+0x899/0x8f0 [ 64.280530] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.283156] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.286182] ? ring_buffer_record_is_on+0xd/0x20 [ 64.288749] ? __fget+0xa8/0xf0 [ 64.291136] vfs_write+0xf7/0x280 [ 64.292972] SyS_write+0xa1/0x120 [ 64.294965] ? SyS_read+0x120/0x120 [ 64.297474] ? SyS_read+0x120/0x120 [ 64.299751] do_syscall_64+0xeb/0x250 [ 64.301826] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.304352] RIP: 0033:0x7f5c994ade99 [ 64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8 [ 64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860 [ 64.332979] CR2: 00000000000000b0 [ 64.335550] ---[ end trace 0c00c17a408849c1 ]--- Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com> Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14Merge tag 'platform-drivers-x86-v4.16-7' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform drives fixes from Darren Hart: - DELL_SMBIOS conditionally depends on ACPI_WMI in the same way it depends on DCDBAS, update the Kconfig accordingly. - fix the dell driver init order to ensure that the driver dependencies are met, avoiding race conditions resulting in boot failure on certain systems when the drivers are built-in. * tag 'platform-drivers-x86-v4.16-7' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: Fix dell driver init order platform/x86: dell-smbios: Resolve dependency error on ACPI_WMI
2018-03-14RDMA/core: Do not use invalid destination in determining port reuseTatyana Nikolova
cma_port_is_unique() allows local port reuse if the quad (source address and port, destination address and port) for this connection is unique. However, if the destination info is zero or unspecified, it can't make a correct decision but still allows port reuse. For example, sometimes rdma_bind_addr() is called with unspecified destination and reusing the port can lead to creating a connection with a duplicate quad, after the destination is resolved. The issue manifests when MPI scale-up tests hang after the duplicate quad is used. Set the destination address family and add checks for zero destination address and port to prevent source port reuse based on invalid destination. Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id") Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14dm mpath: fix passing integrity dataSteffen Maier
After v4.12 commit e2460f2a4bc7 ("dm: mark targets that pass integrity data"), dm-multipath, e.g. on DIF+DIX SCSI disk paths, does not support block integrity any more. So add it to the whitelist. This is also a pre-requisite to use block integrity with other dm layer(s) on top of multipath, such as kpartx partitions (dm-linear) or LVM. Also, bump target version to reflect this fix. Fixes: e2460f2a4bc7 ("dm: mark targets that pass integrity data") Cc: <stable@vger.kernel.org> #4.12+ Bisected-by: Fedor Loshakov <loshakov@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-14RDMA/mlx5: Fix crash while accessing garbage pointer and freed memoryLeon Romanovsky
The failure in rereg_mr flow caused to set garbage value (error value) into mr->umem pointer. This pointer is accessed at the release stage and it causes to the following crash. There is not enough to simply change umem to point to NULL, because the MR struct is needed to be accessed during MR deregistration phase, so delay kfree too. [ 6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228 [ 6.238756] IP: ib_dereg_mr+0xd/0x30 [ 6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0 [ 6.240320] Oops: 0000 [#1] SMP PTI [ 6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183 [ 6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30 [ 6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246 [ 6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000 [ 6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600 [ 6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [ 6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80 [ 6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff [ 6.252437] FS: 0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000 [ 6.253887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0 [ 6.255943] Call Trace: [ 6.256368] remove_commit_idr_uobject+0x1b/0x80 [ 6.257118] uverbs_cleanup_ucontext+0xe4/0x190 [ 6.257855] ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40 [ 6.258857] ib_uverbs_close+0x2a/0x100 [ 6.259494] __fput+0xca/0x1c0 [ 6.259938] task_work_run+0x84/0xa0 [ 6.260519] do_exit+0x312/0xb40 [ 6.261023] ? __do_page_fault+0x24d/0x490 [ 6.261707] do_group_exit+0x3a/0xa0 [ 6.262267] SyS_exit_group+0x10/0x10 [ 6.262802] do_syscall_64+0x75/0x180 [ 6.263391] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 6.264253] RIP: 0033:0x7f1b39c49488 [ 6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488 [ 6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 [ 6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98 [ 6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0 [ 6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000 [ 6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b 5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b [ 6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68 [ 6.275760] CR2: 0000000000000228 [ 6.276200] ---[ end trace a35641f1c474bd20 ]--- Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14i40e: restore TCPv4 input set when re-enabling ATRJacob Keller
When we re-enable ATR we need to restore the input set for TCPv4 filters, in order for ATR to function correctly. We already do this for the normal case of re-enabling ATR when disabling ntuple support. However, when re-enabling ATR after the last TCPv4 filter is removed (but when ntuple support is still active), we did not restore the TCPv4 filter input set. This can cause problems if the TCPv4 filters from FDir had changed the input set, as ATR will no longer behave as expected. When clearing the ATR auto-disable flag, make sure we restore the TCPv4 input set to avoid this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: fix for wrong partition id calculation on OCP mezz cardsMariusz Stachura
This patch overwrites number of ports for X722 devices with support for OCP PHY mezzanine. The old method with checking if port is disabled in the PRTGEN_CNF register cannot be used in this case. When the OCP is removed, ports were seen as disabled, which resulted in wrong calculation of partition id, that caused WoL to be disabled on certain ports. Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: factor out re-enable functions for ATR and SBJacob Keller
A future patch needs to expand on the logic for re-enabling ATR. Doing so would cause some code to break the 80-character line limit. To reduce the level of indentation, factor out helper functions for re-enabling ATR and SB rules. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: track filter type statistics when deleting invalid filtersJacob Keller
When hardware has trouble with a particular filter, we delete it from the list. Unfortunately, we did not properly update the per-filter statistic when doing so. Create a helper function to handle this, and properly reduce the necessary counter so that it tracks the number of active filters properly. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: Fix permission check for VF MAC filtersFilip Sadowski
When VF requests adding of MAC filters the checking is done against number of already present MAC filters not adding them at the same time. It makes it possible to add a bunch of filters at once possibly exceeding acceptable limit of I40E_VC_MAX_MAC_ADDR_PER_VF filters. This happens because when checking vf->num_mac, we do not check how many filters are being requested at once. Modify the check function to ensure that it knows how many filters are being requested. This allows the check to ensure that the total number of filters in a single request does not cause us to go over the limit. Additionally, move the check to within the lock to ensure that the vf->num_mac is checked while holding the lock to maintain consistency. We could have simply moved the call to i40e_vf_check_permission to within the loop, but this could cause a request to be non-atomic, and add some but not all the addresses, while reporting an error code. We want to avoid this behavior so that users are not confused about which filters have or have not been added. Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14x86/mm: Fix vmalloc_fault to use pXd_largeToshi Kani
Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS is not set since the function inadvertently uses pXn_huge(), which always return 0 in this case. ioremap() does not depend on CONFIG_HUGETLBFS. Fix vmalloc_fault() to call pXd_large() instead. Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.kani@hpe.com
2018-03-14i40e: Cleanup i40e_vlan_rx_registerJacob Keller
We used to use the function i40e_vlan_rx_register as a way to hook into the now defunct .ndo_vlan_rx_register netdev hook. This was removed but we kept the function around because we still used it internally to control enabling or disabling of VLAN stripping. As pointed out in upstream review, VLAN stripping is only used in a single location and the previous function is quite small, just inline it into i40e_restore_vlan() rather than carrying the function separately. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: Fix attach VF to VM issuePaweł Jabłoński
Fix for "Resource temporarily unavailable" problem when virsh is trying to attach a device to VM. When the VF driver is loaded on host and virsh is trying to attach it to the VM and set a MAC address, it ends with a race condition between i40e_reset_vf and i40e_ndo_set_vf_mac functions. The bug is fixed by adding polling in i40e_ndo_set_vf_mac function For when the VF is in Reset mode. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40evf/i40evf_main: Fix variable assignment in i40evf_parse_cls_flowerGustavo A R Silva
It seems this is a copy-paste error and that the proper variable to use in this particular case is _src_ instead of _dst_. Addresses-Coverity-ID: 1465282 ("Copy-paste error") Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters") Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14i40e: remove i40e_fcoe filesCorentin Labbe
i40e_fcoe support was removed via commit 9eed69a9147c ("i40e: Drop FCoE code from core driver files") But this left files in place but un-compilable. Let's finish the cleaning. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14RDMAVT: Fix synchronization around percpu_refTejun Heo
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: linux-rdma@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-14fs/aio: Use RCU accessors for kioctx_table->table[]Tejun Heo
While converting ioctx index from a list to a table, db446a08c23d ("aio: convert the ioctx list to table lookup v3") missed tagging kioctx_table->table[] as an array of RCU pointers and using the appropriate RCU accessors. This introduces a small window in the lookup path where init and access may race. Mark kioctx_table->table[] with __rcu and use the approriate RCU accessors when using the field. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3") Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.12+
2018-03-14fs/aio: Add explicit RCU grace period when freeing kioctxTejun Heo
While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat") incorrectly removed explicit RCU grace period before freeing kioctx. The intention seems to be depending on the internal RCU grace periods of percpu_ref; however, percpu_ref uses a different flavor of RCU, sched-RCU. This can lead to kioctx being freed while RCU read protected dereferences are still in progress. Fix it by updating free_ioctx() to go through call_rcu() explicitly. v2: Comment added to explain double bouncing. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: e34ecee2ae79 ("aio: Fix a trinity splat") Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.13+
2018-03-14kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3Marc Zyngier
On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to force synchronization between the memory-mapped guest view and the system-register view that the hypervisor uses. This is incorrect, as the spec calls out the need for "a DSB whose required access type is both loads and stores with any Shareability attribute", while we're only synchronizing stores. We also lack an isb after the dsb to ensure that the latter has actually been executed before we start reading stuff from the sysregs. The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb() just after. Cc: stable@vger.kernel.org Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore") Acked-by: Christoffer Dall <cdall@kernel.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintidMarc Zyngier
The vgic code is trying to be clever when injecting GICv2 SGIs, and will happily populate LRs with the same interrupt number if they come from multiple vcpus (after all, they are distinct interrupt sources). Unfortunately, this is against the letter of the architecture, and the GICv2 architecture spec says "Each valid interrupt stored in the List registers must have a unique VirtualID for that virtual CPU interface.". GICv3 has similar (although slightly ambiguous) restrictions. This results in guests locking up when using GICv2-on-GICv3, for example. The obvious fix is to stop trying so hard, and inject a single vcpu per SGI per guest entry. After all, pending SGIs with multiple source vcpus are pretty rare, and are mostly seen in scenario where the physical CPUs are severely overcomitted. But as we now only inject a single instance of a multi-source SGI per vcpu entry, we may delay those interrupts for longer than strictly necessary, and run the risk of injecting lower priority interrupts in the meantime. In order to address this, we adopt a three stage strategy: - If we encounter a multi-source SGI in the AP list while computing its depth, we force the list to be sorted - When populating the LRs, we prevent the injection of any interrupt of lower priority than that of the first multi-source SGI we've injected. - Finally, the injection of a multi-source SGI triggers the request of a maintenance interrupt when there will be no pending interrupt in the LRs (HCR_NPIE). At the point where the last pending interrupt in the LRs switches from Pending to Active, the maintenance interrupt will be delivered, allowing us to add the remaining SGIs using the same process. Cc: stable@vger.kernel.org Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Acked-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: Reduce verbosity of KVM init logArd Biesheuvel
On my GICv3 system, the following is printed to the kernel log at boot: kvm [1]: 8-bit VMID kvm [1]: IDMAP page: d20e35000 kvm [1]: HYP VA range: 800000000000:ffffffffffff kvm [1]: vgic-v2@2c020000 kvm [1]: GIC system register CPU interface enabled kvm [1]: vgic interrupt IRQ1 kvm [1]: virtual timer IRQ4 kvm [1]: Hyp mode initialized successfully The KVM IDMAP is a mapping of a statically allocated kernel structure, and so printing its physical address leaks the physical placement of the kernel when physical KASLR in effect. So change the kvm_info() to kvm_debug() to remove it from the log output. While at it, trim the output a bit more: IRQ numbers can be found in /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly informational either. Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: Reset mapped IRQs on VM resetChristoffer Dall
We currently don't allow resetting mapped IRQs from userspace, because their state is controlled by the hardware. But we do need to reset the state when the VM is reset, so we provide a function for the 'owner' of the mapped interrupt to reset the interrupt state. Currently only the timer uses mapped interrupts, so we call this function from the timer reset logic. Cc: stable@vger.kernel.org Fixes: 4c60e360d6df ("KVM: arm/arm64: Provide a get_input_level for the arch timer") Signed-off-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUNChristoffer Dall
Calling vcpu_load() registers preempt notifiers for this vcpu and calls kvm_arch_vcpu_load(). The latter will soon be doing a lot of heavy lifting on arm/arm64 and will try to do things such as enabling the virtual timer and setting us up to handle interrupts from the timer hardware. Loading state onto hardware registers and enabling hardware to signal interrupts can be problematic when we're not actually about to run the VCPU, because it makes it difficult to establish the right context when handling interrupts from the timer, and it makes the register access code difficult to reason about. Luckily, now when we call vcpu_load in each ioctl implementation, we can simply remove the call from the non-KVM_RUN vcpu ioctls, and our kvm_arch_vcpu_load() is only used for loading vcpu content to the physical CPU when we're actually going to run the vcpu. Cc: stable@vger.kernel.org Fixes: 9b062471e52a ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl") Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pendingAndre Przywara
Our irq_is_pending() helper function accesses multiple members of the vgic_irq struct, so we need to hold the lock when calling it. Add that requirement as a comment to the definition and take the lock around the call in vgic_mmio_read_pending(), where we were missing it before. Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14platform/x86: Fix dell driver init orderDarren Hart (VMware)
Update the initcall ordering to satisfy the following dependency ordering: 1. DCDBAS, ACPI_WMI 2. DELL_SMBIOS, DELL_RBTN 3. DELL_LAPTOP, DELL_WMI By assigning them to the following initcall levels: subsys_initcall: DCDBAS, ACPI_WMI module_init: DELL_SMBIOS, DELL_RBTN late_initcall: DELL_LAPTOP, DELL_WMI Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Mario.Limonciello@dell.com Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-03-14platform/x86: dell-smbios: Resolve dependency error on ACPI_WMIDarren Hart
Similarly to DCDBAS for DELL_SMBIOS_SMM, if DELL_SMBIOS_WMI is enabled, DELL_SMBIOS becomes dependent on ACPI_WMI. Update the depends lines to prevent a configuration where DELL_SMBIOS=y and either backend dependency =m. Update the comment accordingly. Cc: Mario Limonciello <mario.limonciello@dell.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-03-14sunvnet: does not support GSO for sctpCathy Zhou
The NETIF_F_GSO_SOFTWARE implies support for GSO on SCTP, but the sunvnet driver does not support GSO for sctp. Here we remove the NETIF_F_GSO_SOFTWARE feature flag and only report NETIF_F_ALL_TSO instead. Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14Merge tag 'linux-can-fixes-for-4.16-20180314' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2018-03-14 this is a pull request of two patches for net/master. Both patches are by Andri Yngvason and fix problems in the cc770 driver, that show up quite fast on RT systems, but also on non RT setups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14Merge branch 'sctp-add-support-for-some-sctp-auth-APIs-from-RFC6458'David S. Miller
Xin Long says: ==================== sctp: add support for some sctp auth APIs from RFC6458 This patchset mainly adds support for SCTP AUTH Information for sendmsg, described in RFC6458: 5.3.8. SCTP AUTH Information Structure (SCTP_AUTHINFO) and also adds a sockopt described in RFC6458: 8.3.4. Deactivate a Shared Key (SCTP_AUTH_DEACTIVATE_KEY) and two types of events for AUTHENTICATION_EVENT described in RFC6458: 6.1.8. SCTP_AUTHENTICATION_EVENT: - SCTP_AUTH_NO_AUTH - SCTP_AUTH_FREE_KEY After this patchset, we have fully support for sctp_sendv in kernel. Note that this patchset won't touch that sctp options merge conflict. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENTXin Long
This patch is to add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT, as described in section 6.1.8 of RFC6458. SCTP_AUTH_NO_AUTH: This report indicates that the peer does not support SCTP authentication as defined in [RFC4895]. Note that the implementation is quite similar as that of SCTP_ADAPTATION_INDICATION. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14sctp: add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENTXin Long
This patch is to add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT, as described in section 6.1.8 of RFC6458. SCTP_AUTH_FREE_KEY: This report indicates that the SCTP implementation will no longer use the key identifier specified in auth_keynumber. After deactivating a key, it would never be used again, which means it's refcnt can't be held/increased by new chunks. But there may be some chunks in out queue still using it. So only when refcnt is 1, which means no chunk in outqueue is using/holding this key either, this EVENT would be sent. When users receive this notification, they could do DEL_KEY sockopt to remove this shkey, and also tell the peer that this key won't be used in any chunk thoroughly from now on, then the peer can remove it as well safely. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14sctp: add sockopt SCTP_AUTH_DEACTIVATE_KEYXin Long
This patch is to add sockopt SCTP_AUTH_DEACTIVATE_KEY, as described in section 8.3.4 of RFC6458. This set option indicates that the application will no longer send user messages using the indicated key identifier. Note that RFC requires that only deactivated keys that are no longer used by an association can be deleted, but for the backward compatibility, it is not to check deactivated when deleting or replacing one sh_key. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14sctp: add support for SCTP AUTH Information for sendmsgXin Long
This patch is to add support for SCTP AUTH Information for sendmsg, as described in section 5.3.8 of RFC6458. With this option, you can provide shared key identifier used for sending the user message. It's also a necessary send info for sctp_sendv. Note that it reuses sinfo->sinfo_tsn to indicate if this option is set and sinfo->sinfo_ssn to save the shkey ID which can be 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14sctp: add refcnt support for sh_keyXin Long
With refcnt support for sh_key, chunks auth sh_keys can be decided before enqueuing it. Changing the active key later will not affect the chunks already enqueued. Furthermore, this is necessary when adding the support for authinfo for sendmsg in next patch. Note that struct sctp_chunk can't be grown due to that performance drop issue on slow cpu, so it just reuses head_skb memory for shkey in sctp_chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14tg3: prevent scheduling while atomic splatJonathan Toppins
The problem was introduced in commit 506b0a395f26 ("[netdrv] tg3: APE heartbeat changes"). The bug occurs because tp->lock spinlock is held which is obtained in tg3_start by way of tg3_full_lock(), line 11571. The documentation for usleep_range() specifically states it cannot be used inside a spinlock. Fixes: 506b0a395f26 ("[netdrv] tg3: APE heartbeat changes") Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14Merge branch 'smc-fixes'David S. Miller
Ursula Braun says: ==================== net/smc: fixes 2018-03-14 here are smc changes for the net-next tree. The first patch enables SMC to work with mlx5-RoCE-devices. Patches 2 and 3 deal with link group freeing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14net/smc: schedule free_work when link group is terminatedKarsten Graul
The free_work worker must be scheduled when the link group is abnormally terminated. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14net/smc: free link group without pending free_work onlyUrsula Braun
Make sure there is no pending or running free_work worker for the link group when freeing the link group. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14net/smc: pay attention to MAX_ORDER for CQ entriesUrsula Braun
smc allocates a certain number of CQ entries for used RoCE devices. For mlx5 devices the chosen constant number results in a large allocation causing this warning: [13355.124656] WARNING: CPU: 3 PID: 16535 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x2be/0x10c0 [13355.124657] Modules linked in: smc_diag(O) smc(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter mlx5_ib ib_core sunrpc mlx5_core s390_trng rng_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common ptp pps_core eadm_sch dm_multipath dm_mod vhost_net tun vhost tap sch_fq_codel kvm ip_tables x_tables autofs4 [last unloaded: smc] [13355.124672] CPU: 3 PID: 16535 Comm: kworker/3:0 Tainted: G O 4.14.0uschi #1 [13355.124673] Hardware name: IBM 3906 M04 704 (LPAR) [13355.124675] Workqueue: events smc_listen_work [smc] [13355.124677] task: 00000000e2f22100 task.stack: 0000000084720000 [13355.124678] Krnl PSW : 0704c00180000000 000000000029da76 (__alloc_pages_nodemask+0x2be/0x10c0) [13355.124681] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [13355.124682] Krnl GPRS: 0000000000000000 00550e00014080c0 0000000000000000 0000000000000001 [13355.124684] 000000000029d8b6 00000000f3bfd710 0000000000000000 00000000014080c0 [13355.124685] 0000000000000009 00000000ec277a00 0000000000200000 0000000000000000 [13355.124686] 0000000000000000 00000000000001ff 000000000029d8b6 0000000084723720 [13355.124708] Krnl Code: 000000000029da6a: a7110200 tmll %r1,512 000000000029da6e: a774ff29 brc 7,29d8c0 #000000000029da72: a7f40001 brc 15,29da74 >000000000029da76: a7f4ff25 brc 15,29d8c0 000000000029da7a: a7380000 lhi %r3,0 000000000029da7e: a7f4fef1 brc 15,29d860 000000000029da82: 5820f0c4 l %r2,196(%r15) 000000000029da86: a53e0048 llilh %r3,72 [13355.124720] Call Trace: [13355.124722] ([<000000000029d8b6>] __alloc_pages_nodemask+0xfe/0x10c0) [13355.124724] [<000000000013bd1e>] s390_dma_alloc+0x6e/0x148 [13355.124733] [<000003ff802eeba6>] mlx5_dma_zalloc_coherent_node+0x8e/0xe0 [mlx5_core] [13355.124740] [<000003ff802eee18>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core] [13355.124744] [<000003ff804eb410>] mlx5_ib_create_cq+0x558/0x898 [mlx5_ib] [13355.124749] [<000003ff80407d40>] ib_create_cq+0x48/0x88 [ib_core] [13355.124751] [<000003ff80109fba>] smc_ib_setup_per_ibdev+0x52/0x118 [smc] [13355.124753] [<000003ff8010bcb6>] smc_conn_create+0x65e/0x728 [smc] [13355.124755] [<000003ff801081a2>] smc_listen_work+0x2d2/0x540 [smc] [13355.124756] [<0000000000162c66>] process_one_work+0x1be/0x440 [13355.124758] [<0000000000162f40>] worker_thread+0x58/0x458 [13355.124759] [<0000000000169e7e>] kthread+0x14e/0x168 [13355.124760] [<00000000009ce8be>] kernel_thread_starter+0x6/0xc [13355.124762] [<00000000009ce8b8>] kernel_thread_starter+0x0/0xc [13355.124762] Last Breaking-Event-Address: [13355.124764] [<000000000029da72>] __alloc_pages_nodemask+0x2ba/0x10c0 [13355.124764] ---[ end trace 34be38b581c0b585 ]--- This patch reduces the smc constant for the maximum number of allocated completion queue entries SMC_MAX_CQE by 2 to avoid high round up values in the mlx5 code, and reduces the number of allocated completion queue entries even more, if the final allocation for an mlx5 device hits the MAX_ORDER limit. Reported-by: Ihnken Menssen <menssen@de.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtuSabrina Dubroca
Prior to the rework of PMTU information storage in commit 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."), when a PMTU event advertising a PMTU smaller than net.ipv4.route.min_pmtu was received, we would disable setting the DF flag on packets by locking the MTU metric, and set the PMTU to net.ipv4.route.min_pmtu. Since then, we don't disable DF, and set PMTU to net.ipv4.route.min_pmtu, so the intermediate router that has this link with a small MTU will have to drop the packets. This patch reestablishes pre-2.6.39 behavior by splitting rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu. rt_mtu_locked indicates that we shouldn't set the DF bit on that path, and is checked in ip_dont_fragment(). One possible workaround is to set net.ipv4.route.min_pmtu to a value low enough to accommodate the lowest MTU encountered. Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14liquidio: Add support for liquidio 10GBase-T NICVeerasenareddy Burru
Added ethtool changes to show port type as TP (Twisted Pair) for 10GBASE-T ports. Same driver and firmware works for liquidio NIC with SFP+ ports or TP ports. Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>