linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-08-25	Merge branch 'perf/urgent' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25	perf/core: Fix group {cpu,task} validation	Mark Rutland
	Regardless of which events form a group, it does not make sense for the events to target different tasks and/or CPUs, as this leaves the group inconsistent and impossible to schedule. The core perf code assumes that these are consistent across (successfully intialised) groups. Core perf code only verifies this when moving SW events into a HW context. Thus, we can violate this requirement for pure SW groups and pure HW groups, unless the relevant PMU driver happens to perform this verification itself. These mismatched groups subsequently wreak havoc elsewhere. For example, we handle watchpoints as SW events, and reserve watchpoint HW on a per-CPU basis at pmu::event_init() time to ensure that any event that is initialised is guaranteed to have a slot at pmu::add() time. However, the core code only checks the group leader's cpu filter (via event_filter_match()), and can thus install follower events onto CPUs violating thier (mismatched) CPU filters, potentially installing them into a CPU without sufficient reserved slots. This can be triggered with the below test case, resulting in warnings from arch backends. #define _GNU_SOURCE #include <linux/hw_breakpoint.h> #include <linux/perf_event.h> #include <sched.h> #include <stdio.h> #include <sys/prctl.h> #include <sys/syscall.h> #include <unistd.h> static int perf_event_open(struct perf_event_attr attr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); } char watched_char; struct perf_event_attr wp_attr = { .type = PERF_TYPE_BREAKPOINT, .bp_type = HW_BREAKPOINT_RW, .bp_addr = (unsigned long)&watched_char, .bp_len = 1, .size = sizeof(wp_attr), }; int main(int argc, char argv[]) { int leader, ret; cpu_set_t cpus; /* * Force use of CPU0 to ensure our CPU0-bound events get scheduled. / CPU_ZERO(&cpus); CPU_SET(0, &cpus); ret = sched_setaffinity(0, sizeof(cpus), &cpus); if (ret) { printf("Unable to set cpu affinity\n"); return 1; } / open leader event, bound to this task, CPU0 only / leader = perf_event_open(&wp_attr, 0, 0, -1, 0); if (leader < 0) { printf("Couldn't open leader: %d\n", leader); return 1; } / * Open a follower event that is bound to the same task, but a * different CPU. This means that the group should never be possible to * schedule. / ret = perf_event_open(&wp_attr, 0, 1, leader, 0); if (ret < 0) { printf("Couldn't open mismatched follower: %d\n", ret); return 1; } else { printf("Opened leader/follower with mismastched CPUs\n"); } / * Open as many independent events as we can, all bound to the same * task, CPU0 only. / do { ret = perf_event_open(&wp_attr, 0, 0, -1, 0); } while (ret >= 0); / * Force enable/disble all events to trigger the erronoeous * installation of the follower event. */ printf("Opened all events. Toggling..\n"); for (;;) { prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0); prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0); } return 0; } Fix this by validating this requirement regardless of whether we're moving events. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhou Chengming <zhouchengming1@huawei.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25	x86/mm: Fix use-after-free of ldt_struct	Eric Biggers
	The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void fork_thread(void _arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25	KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state	Paolo Bonzini
	The host pkru is restored right after vcpu exit (commit 1be0e61), so KVM_GET_XSAVE will return the host PKRU value instead. Fix this by using the guest PKRU explicitly in fill_xsave and load_xsave. This part is based on a patch by Junkang Fu. The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state, because it could have been changed by userspace since the last time it was saved, so skip loading it in kvm_load_guest_fpu. Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com> Cc: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25	KVM: x86: simplify handling of PKRU	Paolo Bonzini
	Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a simple comparison against the host value of the register. The write of PKRU in addition can be skipped if the guest has not enabled the feature. Once we do this, we need not test OSPKE in the host anymore, because guest_CR4.PKE=1 implies host_CR4.PKE=1. The static PKU test is kept to elide the code on older CPUs. Suggested-by: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25	KVM: x86: block guest protection keys unless the host has them enabled	Paolo Bonzini
	If the host has protection keys disabled, we cannot read and write the guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block the PKU cpuid bit in that case. This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1. Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25	esp: Fix skb tailroom calculation	Steffen Klassert
	We use skb_availroom to calculate the skb tailroom for the ESP trailer. skb_availroom calculates the tailroom and subtracts this value by reserved_tailroom. However reserved_tailroom is a union with the skb mark. This means that we subtract the tailroom by the skb mark if set. Fix this by using skb_tailroom instead. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25	esp: Fix locking on page fragment allocation	Steffen Klassert
	We allocate the page fragment for the ESP trailer inside a spinlock, but consume it outside of the lock. This is racy as some other cou could get the same page fragment then. Fix this by consuming the page fragment inside the lock too. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-25	dmaengine: rcar-dmac: initialize all data before registering IRQ handler	Kuninori Morimoto
	Anton Volkov noticed that engine->dev is NULL before of_dma_controller_register() in probe. Thus there might be a NULL pointer dereference in rcar_dmac_chan_start_xfer while accessing chan->chan.device->dev which is equal to (&dmac->engine)->dev. On same reason, same and similar things will happen if we didn't initialize all necessary data before calling register irq function. To be more safety code, this patch initialize all necessary data before calling register irq function. Reported-by: Anton Volkov <avolkov@ispras.ru> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-08-25	Merge branch 'x86/asm' into x86/apic	Thomas Gleixner
	Pick up dependent changes to avoid merge conflicts
2017-08-25	dmaengine: k3dma: remove useless ON_WARN_ONCE()	Antonio Borneo
	Commit 36387a2b1f62b5c087c5fe6f0f7b23b94f722ad7 ("k3dma: Fix memory handling in preparation for cyclic mode") adds few calls to ON_WARN_ONCE() to track the use of ds_run/ds_done. After the two fixes: - dmaengine: k3dma: fix non-cyclic mode - dmaengine: k3dma: fix re-free of the same descriptor the behaviour of ds_run/ds_done is properly fixed. The remaining ON_WARN_ONCE() are never triggered and can be removed. Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-08-25	dmaengine: k3dma: fix double free of descriptor	Antonio Borneo
	Commit 36387a2b1f62b5c087c5fe6f0f7b23b94f722ad7 ("k3dma: Fix memory handling in preparation for cyclic mode") adds code to free the descriptor in ds_done. In cyclic mode, ds_done is never used and it's always NULL, so the added code is not executed. In non-cyclic mode, ds_done is used as a flag: when not NULL it signals that the descriptor has been consumed. No need to free it because it would be free by vchan_complete(). The fix takes back the code changed by the commit above: - remove the free on the descriptor; - initialize ds_done to NULL for the next run. Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-08-25	dmaengine: k3dma: fix non-cyclic mode	Antonio Borneo
	Commit 36387a2b1f62b5c087c5fe6f0f7b23b94f722ad7 ("k3dma: Fix memory handling in preparation for cyclic mode") broke the logic around ds_run/ds_done in case of non-cyclic DMA. This went unnoticed as the only user of k3dma was the i2s audio driver but, with a patch set to enable dma on SPI, the issue popped out. The fix re-applies the initialization to ds_run/ds_done in k3_dma_start_txd() that were removed by the commit above. Also, one of the calls to k3_dma_start_txd() is triggered by (ds_done != NULL), so remove the noisy and useless call to WARN_ON_ONCE(). Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-08-25	drm/exynos: simplify set_pixfmt() in DECON and FIMD drivers	Tobias Jakobi
	DRM core already checks the validity of the pixelformat. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: consistent use of cpp	Tobias Jakobi
	A recent commit (272725c7db4da1fd3229d944fc76d2e98e3a144e) has removed the use of 'bits_per_pixel' in DRM. However the corresponding Exynos driver code still uses the ambiguous 'bpp', even though it is now initialized from fb->cpp[0]. Consistenly use 'cpp' in FIMD, DECON7 and DECON5433 drivers. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
2017-08-24	netvsc: fix deadlock betwen link status and removal	stephen hemminger
	There is a deadlock possible when canceling the link status delayed work queue. The removal process is run with RTNL held, and the link status callback is acquring RTNL. Resolve the issue by using trylock and rescheduling. If cancel is in process, that block it from happening. Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	strparser: initialize all callbacks	Eric Biggers
	commit bbb03029a899 ("strparser: Generalize strparser") added more function pointers to 'struct strp_callbacks'; however, kcm_attach() was not updated to initialize them. This could cause the ->lock() and/or ->unlock() function pointers to be set to garbage values, causing a crash in strp_work(). Fix the bug by moving the callback structs into static memory, so unspecified members are zeroed. Also constify them while we're at it. This bug was found by syzkaller, which encountered the following splat: IP: 0x55 PGD 3b1ca067 P4D 3b1ca067 PUD 3b12f067 PMD 0 Oops: 0010 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: kstrp strp_work task: ffff88006bb0e480 task.stack: ffff88006bb10000 RIP: 0010:0x55 RSP: 0018:ffff88006bb17540 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000 RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48 RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000 R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48 R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000 FS: 0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098 worker_thread+0x223/0x1860 kernel/workqueue.c:2233 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Code: Bad RIP value. RIP: 0x55 RSP: ffff88006bb17540 CR2: 0000000000000055 ---[ end trace f0e4920047069cee ]--- Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and CONFIG_AF_KCM=y): #include <linux/bpf.h> #include <linux/kcm.h> #include <linux/types.h> #include <stdint.h> #include <sys/ioctl.h> #include <sys/socket.h> #include <sys/syscall.h> #include <unistd.h> static const struct bpf_insn bpf_insns[3] = { { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) / { .code = 0x95 }, / BPF_EXIT_INSN() */ }; static const union bpf_attr bpf_attr = { .prog_type = 1, .insn_cnt = 2, .insns = (uintptr_t)&bpf_insns, .license = (uintptr_t)"", }; int main(void) { int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &bpf_attr, sizeof(bpf_attr)); int inet_fd = socket(AF_INET, SOCK_STREAM, 0); int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0); ioctl(kcm_fd, SIOCKCMATTACH, &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd }); } Fixes: bbb03029a899 ("strparser: Generalize strparser") Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	hv_netvsc: Fix rndis_filter_close error during netvsc_remove	Haiyang Zhang
	We now remove rndis filter before unregister_netdev(), which calls device close. It involves closing rndis filter already removed. This patch fixes this error. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	Merge branch 'tipc-buffer-reassignment-fixes'	David S. Miller
	Parthasarathy Bhuvaragan says: ==================== tipc: buffer reassignment fixes This series contains fixes for buffer reassignments and a context imbalance. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	tipc: context imbalance at node read unlock	Parthasarathy Bhuvaragan
	If we fail to find a valid bearer in tipc_node_get_linkname(), node_read_unlock() is called without holding the node read lock. This commit fixes this error. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	tipc: reassign pointers after skb reallocation / linearization	Parthasarathy Bhuvaragan
	In tipc_msg_reverse(), we assign skb attributes to local pointers in stack at startup. This is followed by skb_linearize() and for cloned buffers we perform skb relocation using pskb_expand_head(). Both these methods may update the skb attributes and thus making the pointers incorrect. In this commit, we fix this error by ensuring that the pointers are re-assigned after any of these skb operations. Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header when needed") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	tipc: perform skb_linearize() before parsing the inner header	Parthasarathy Bhuvaragan
	In tipc_rcv(), we linearize only the header and usually the packets are consumed as the nodes permit direct reception. However, if the skb contains tunnelled message due to fail over or synchronization we parse it in tipc_node_check_state() without performing linearization. This will cause link disturbances if the skb was non linear. In this commit, we perform linearization for the above messages. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	Merge tag 'mlx5-updates-2017-08-24' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-08-24 This series includes updates to mlx5 core driver. From Gal and Saeed, three cleanup patches. From Matan, Low level flow steering improvements and optimizations, - Use more efficient data structures for flow steering objects handling. - Add tracepoints to flow steering operations. - Overall these patches improve flow steering rule insertion rate by a factor of seven in large scales (~50K rules or more). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	hinic: uninitialized variable in hinic_api_cmd_init()	Dan Carpenter
	We never set the error code in this function. Fixes: eabf0fad81d5 ("net-next/hinic: Initialize api cmd resources") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	net_sched: fix a refcount_t issue with noop_qdisc	Eric Dumazet
	syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN \|\| !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	net: mv643xx_eth: Be drop monitor friendly	Florian Fainelli
	txq_reclaim() does the normal transmit queue reclamation and rxq_deinit() does the RX ring cleanup, none of these are packet drops, so use dev_consume_skb() for both locations. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	mtd: nand: atmel: Relax tADL_min constraint	Boris Brezillon
	Version 4 of the ONFI spec mandates that tADL be at least 400 nanoseconds, but, depending on the master clock rate, 400 ns may not fit in the tADL field of the SMC reg. We need to relax the check and accept the -ERANGE return code. Note that previous versions of the ONFI spec had a lower tADL_min (100 or 200 ns). It's not clear why this timing constraint got increased but it seems most NANDs are fine with values lower than 400ns, so we should be safe. Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-08-24	mtd: nandsim: remove debugfs entries in error path	Uwe Kleine-König
	The debugfs entries must be removed before an error is returned in the probe function. Otherwise another try to load the module fails and when the debugfs files are accessed without the module loaded, the kernel still tries to call a function in that module. Fixes: 5346c27c5fed ("mtd: nandsim: Introduce debugfs infrastructure") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-08-25	drm/exynos: mixer: remove src offset from mixer_graph_buffer()	Tobias Jakobi
	We always translate the dma address such that the offsets of the source image are zero. Hence we can remove manipulation of the MXR_GRAPHIC_SXY(win) register and just zero them once in mixer_win_reset(). Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: mixer: simplify mixer_graph_buffer()	Tobias Jakobi
	DRM core already checks in drm_atomic_plane_check() if the pixelformat is valid. Hence we can collapse the default case of the switch statement with the XRGB8888 case. No functional change. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: mixer: simplify vp_video_buffer()	Tobias Jakobi
	DRM core already checks in drm_atomic_plane_check() if the pixelformat is valid. Hence we can drop the default case of the switch statement and collapse most of the code. Also rename the two booleans to reflect what true/false actually means, and to avoid mixing CrCb/NV21 descriptions. No functional change. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: mixer: enable NV12MT support for the video plane	Tobias Jakobi
	The video processor supports a tiled version of the NV12 format, known as NV12MT in V4L2 terms. The support was removed in commit 083500baefd5f4c215a5a93aef2492c1aa775828 due to not being a real pixel format, but rather NV12 with a special memory layout. With the introduction of FB modifiers, we can now properly support this format again. Tested with a hacked up modetest from libdrm's test suite on an ODROID-X2 (Exynos4412). Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: mixer: fix chroma comment in vp_video_buffer()	Tobias Jakobi
	The current comment sounds like the division op is done to compensate for some hardware erratum. But the chroma plane having half the height of the luma plane is just the way NV12/NV21 is defined, so clarify this behaviour. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	arm64: dts: exynos: remove i80-if-timings nodes	Andrzej Hajda
	Since i80/command mode is determined in runtime by propagating info from panel this property can be removed. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	dt-bindings: exynos5433-decon: remove i80-if-timings property	Andrzej Hajda
	Since i80/command mode is determined in runtime by propagating info from panel this property can be removed. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/decon5433: use mode info stored in CRTC to detect i80 mode	Andrzej Hajda
	Since panel's mode of work is propagated properly from panel to DECON, there is no need to use redundant private device tree property. The only issue with such approach is that check for required interrupts should be postponed until panel communicate its requirements, ie to mode validation phase - mode_valid callback. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
2017-08-25	drm/exynos: add mode_valid callback to exynos_drm	Andrzej Hajda
	crtc::mode_valid callback is required to implement proper pipeline validation for command/video modes. Since Exynos uses private framework such callback should be added to it. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/decon5433: refactor irq requesting code	Andrzej Hajda
	To allow runtime validation of mode of work irq request code should be split into two separate phases: - irq reqesting, - irq checking. Following patches will move 2nd phase to mode validation phase. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/mic: use mode info stored in CRTC to detect i80 mode	Andrzej Hajda
	MIC driver should use info from CRTC to check mode of work instead of illegally peeking into nodes of other devices. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/dsi: propagate info about command mode from panel	Andrzej Hajda
	mipi_dsi framework provides information about panel's mode of work. This info should be propagated upstream to configure all elements of the pipeline. As CRTC is the common denominator of the pipeline we can put such info into its structures. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/dsi: refactor panel detection logic	Andrzej Hajda
	Description of drm_helper_hpd_irq_event clearly states that drivers supporting hotplug events per connector should use different helper - drm_kms_helper_hotplug_event. To achieve it following changes have been performed: - moved down all DSI ops - they require exynos_dsi_disable function to be defined earlier, - simplified exynos_dsi_detect - there is no real detection, it just returns if panel is attached, - DSI attach/detach callbacks attaches/detaches DRM panel and sets connector status and other context fields accordingly, all this is performed under mutex, as these callbacks are asynchronous. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos: use helper to set possible crtcs	Andrzej Hajda
	All encoders share the same code to set encoders possible_crtcs field. The patch creates helper to abstract out this code. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-25	drm/exynos/decon5433: use readl_poll_timeout helpers	Andrzej Hajda
	Linux core provide helpers for polling with timeout, lets use them. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-08-24	tg3: Be drop monitor friendly	Florian Fainelli
	tg3_tx() does the normal packet TX completion, tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a new SKB that is suitable to workaround HW bugs, and finally tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for these 3 locations to be SKB drop monitor friendly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	net: systemport: Free DMA coherent descriptors on errors	Florian Fainelli
	In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we would return with an error, and call bcm_sysport_fini_tx_ring() and it would see that ring->cbs is NULL and do nothing. This would leak the coherent DMA descriptor area, so we need to free it on error before returning. Reported-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	net: bcmgenet: Be drop monitor friendly	Florian Fainelli
	There are 3 spots where we call dev_kfree_skb() but we are actually just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring setup and bcmgenet_free_rx_buffers() during RX ring cleanup. Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()") Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	bpf: fix bpf_setsockopts return value	Yuchung Cheng
	This patch fixes a bug causing any sock operations to always return EINVAL. Fixes: a5192c52377e ("bpf: fix to bpf_setsockops"). Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	net: systemport: Be drop monitor friendly	Florian Fainelli
	Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is used when a TX packet is completed, as well as when the RX ring is cleaned on shutdown. None of these two cases are packet drops, so be drop monitor friendly. Suggested-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	Merge branch 'ipv6-Route-ICMPv6-errors-with-the-flow-when-ECMP-in-use'	David S. Miller
	Jakub Sitnicki says: ==================== ipv6: Route ICMPv6 errors with the flow when ECMP in use This patch set is another take at making Path MTU Discovery work when server nodes are behind a router employing multipath routing in a load-balance or anycast setup (that is, when not every end-node can be reached by every path). The problem has been well described in RFC 7690 [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed to be routed back to the server node that sent a reply that exceeds path MTU. The proposed solution is two-fold: (1) on the server side - reflect the Flow Label [2]. This can be done without modifying the application using a new per-netns sysctl knob that has been proposed independently of this patchset in the patch entitled "ipv6: Add sysctl for per namespace flow label reflection" [3]. (2) on the ECMP router - make the ipv6 routing subsystem look into the ICMPv6 error packets and compute the flow-hash from its payload, i.e. the offending packet that triggered the error. This is the same behavior as ipv4 stack has already. With both parts in place Path MTU Discovery can work past the ECMP router when using IPv6. [1] https://tools.ietf.org/html/rfc7690 [2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01 [3] http://patchwork.ozlabs.org/patch/804870/ v1 -> v2: - don't use "extern" in external function declaration in header file - style change, put as many arguments as possible on the first line of a function call, and align consecutive lines to the first argument - expand the cover letter based on the feedback v2 -> v3: - switch to computing flow-hash using flow dissector to align with recent changes to multipath routing in ipv4 stack - add a sysctl knob for enabling flow label reflection per netns --- Testing has covered multipath routing of ICMPv6 PTB errors in forward and local output path in a simple use-case of an HTTP server sending a reply which is over the path MTU size [3]. I have also checked if the flows get evenly spread over multiple paths (i.e. if there are no regressions) [4]. [3] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud [4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24	ipv6: Use multipath hash from flow info if available	Jakub Sitnicki
	Allow our callers to influence the choice of ECMP link by honoring the hash passed together with the flow info. This allows for special treatment of ICMP errors which we would like to route over the same path as the IPv6 datagram that triggered the error. Also go through rt6_multipath_hash(), in the usual case when we aren't dealing with an ICMP error, so that there is one central place where multipath hash is computed. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>