summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-15nospec: Move array_index_nospec() parameter checking into separate macroWill Deacon
For architectures providing their own implementation of array_index_mask_nospec() in asm/barrier.h, attempting to use WARN_ONCE() to complain about out-of-range parameters using WARN_ON() results in a mess of mutually-dependent include files. Rather than unpick the dependencies, simply have the core code in nospec.h perform the checking for us. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1517840166-15399-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15x86/speculation: Fix up array_index_nospec_mask() asm constraintDan Williams
Allow the compiler to handle @size as an immediate value or memory directly rather than allocating a register. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/151797010204.1289.1510000292250184993.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15x86/debug: Use UD2 for WARN()Peter Zijlstra
Since the Intel SDM added an ModR/M byte to UD0 and binutils followed that specification, we now cannot disassemble our kernel anymore. This now means Intel and AMD disagree on the encoding of UD0. And instead of playing games with additional bytes that are valid ModR/M and single byte instructions (0xd6 for instance), simply use UD2 for both WARN() and BUG(). Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180208194406.GD25181@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15x86/debug, objtool: Annotate WARN()-related UD2 as reachableJosh Poimboeuf
By default, objtool assumes that a UD2 is a dead end. This is mainly because GCC 7+ sometimes inserts a UD2 when it detects a divide-by-zero condition. Now that WARN() is moving back to UD2, annotate the code after it as reachable so objtool can follow the code flow. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/0e483379275a42626ba8898117f918e1bf661e40.1518130694.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15objtool: Fix segfault in ignore_unreachable_insn()Josh Poimboeuf
Peter Zijlstra's patch for converting WARN() to use UD2 triggered a bunch of false "unreachable instruction" warnings, which then triggered a seg fault in ignore_unreachable_insn(). The seg fault happened when it tried to dereference a NULL 'insn->func' pointer. Thanks to static_cpu_has(), some functions can jump to a non-function area in the .altinstr_aux section. That breaks ignore_unreachable_insn()'s assumption that it's always inside the original function. Make sure ignore_unreachable_insn() only follows jumps within the current function. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/bace77a60d5af9b45eddb8f8fb9c776c8de657ef.1518130694.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systemsDominik Brodowski
The ldt_gdt and ptrace_syscall selftests, even in their 64-bit variant, use hard-coded 32-bit syscall numbers and call "int $0x80". This will fail on 64-bit systems with CONFIG_IA32_EMULATION=y disabled. Therefore, do not build these tests if we cannot build 32-bit binaries (which should be a good approximation for CONFIG_IA32_EMULATION=y being enabled). Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Link: http://lkml.kernel.org/r/20180211111013.16888-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15selftests/x86: Do not rely on "int $0x80" in single_step_syscall.cDominik Brodowski
On 64-bit builds, we should not rely on "int $0x80" working (it only does if CONFIG_IA32_EMULATION=y is enabled). To keep the "Set TF and check int80" test running on 64-bit installs with CONFIG_IA32_EMULATION=y enabled, build this test only if we can also build 32-bit binaries (which should be a good approximation for that). Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Link: http://lkml.kernel.org/r/20180211111013.16888-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=nCorentin Labbe
When CONFIG_NUMA is not set, the build fails with: arch/powerpc/platforms/pseries/hotplug-cpu.c:335:4: error: déclaration implicite de la fonction « update_numa_cpu_lookup_table » So we have to add update_numa_cpu_lookup_table() as an empty function when CONFIG_NUMA is not set. Fixes: 1d9a090783be ("powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc/powernv: IMC fix out of bounds memory access at shutdownNicholas Piggin
The OPAL IMC driver's shutdown handler disables nest PMU counters by walking nodes and taking the first CPU out of their cpumask, which is used to index into the paca (get_hard_smp_processor_id()). This does not always do the right thing, and in particular for CPU-less nodes it returns NR_CPUS and that overruns the paca and dereferences random memory. Fix it by being more careful about checking returned CPU, and only using online CPUs. It's not clear this shutdown code makes sense after commit 885dcd709b ("powerpc/perf: Add nest IMC PMU support"), but this should not make things worse Currently the bug causes us to call OPAL with a junk CPU number. A separate patch in development to change the way pacas are allocated escalates this bug into a crash: Unable to handle kernel paging request for data at address 0x2a21af1eeb000076 Faulting instruction address: 0xc0000000000a5468 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP opal_imc_counters_shutdown+0x148/0x1d0 LR opal_imc_counters_shutdown+0x134/0x1d0 Call Trace: opal_imc_counters_shutdown+0x134/0x1d0 (unreliable) platform_drv_shutdown+0x44/0x60 device_shutdown+0x1f8/0x350 kernel_restart_prepare+0x54/0x70 kernel_restart+0x28/0xc0 SyS_reboot+0x1d0/0x2c0 system_call+0x58/0x6c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc/xive: Use hw CPU ids when configuring the CPU queuesCédric Le Goater
The CPU event notification queues on sPAPR should be configured using a hardware CPU identifier. The problem did not show up on the Power Hypervisor because pHyp supports 8 threads per core which keeps CPU number contiguous. This is not the case on all sPAPR virtual machines, some use SMT=1. Also improve error logging by adding the CPU number. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15powerpc: Expose TSCR via sysfs only on powernvCyril Bur
The TSCR can only be accessed in hypervisor mode. Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-14netfilter: nat: cope with negative port rangePaolo Abeni
syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: fix missing timer initialization in xt_LEDPaolo Abeni
syzbot reported that xt_LED may try to use the ledinternal->timer without previously initializing it: ------------[ cut here ]------------ kernel BUG at kernel/time/timer.c:958! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 1826 Comm: kworker/1:2 Not tainted 4.15.0+ #306 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:__mod_timer kernel/time/timer.c:958 [inline] RIP: 0010:mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: 0018:ffff8801d24fe9f8 EFLAGS: 00010293 RAX: ffff8801d25246c0 RBX: ffff8801aec6cb50 RCX: ffffffff816052c6 RDX: 0000000000000000 RSI: 00000000fffbd14b RDI: ffff8801aec6cb68 RBP: ffff8801d24fec98 R08: 0000000000000000 R09: 1ffff1003a49fd6c R10: ffff8801d24feb28 R11: 0000000000000005 R12: dffffc0000000000 R13: ffff8801d24fec70 R14: 00000000fffbd14b R15: ffff8801af608f90 FS: 0000000000000000(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000206d6fd0 CR3: 0000000006a22001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: led_tg+0x1db/0x2e0 net/netfilter/xt_LED.c:75 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_raw_hook+0x65/0x80 net/ipv6/netfilter/ip6table_raw.c:42 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook.constprop.27+0x3f6/0x830 include/linux/netfilter.h:243 NF_HOOK include/linux/netfilter.h:286 [inline] ndisc_send_skb+0xa51/0x1370 net/ipv6/ndisc.c:491 ndisc_send_ns+0x38a/0x870 net/ipv6/ndisc.c:633 addrconf_dad_work+0xb9e/0x1320 net/ipv6/addrconf.c:4008 process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429 Code: 85 2a 0b 00 00 4d 8b 3c 24 4d 85 ff 75 9f 4c 8b bd 60 fd ff ff e8 bb 57 10 00 65 ff 0d 94 9a a1 7e e9 d9 fc ff ff e8 aa 57 10 00 <0f> 0b e8 a3 57 10 00 e9 14 fb ff ff e8 99 57 10 00 4c 89 bd 70 RIP: __mod_timer kernel/time/timer.c:958 [inline] RSP: ffff8801d24fe9f8 RIP: mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: ffff8801d24fe9f8 ---[ end trace f661ab06f5dd8b3d ]--- The ledinternal struct can be shared between several different xt_LED targets, but the related timer is currently initialized only if the first target requires it. Fix it by unconditionally initializing the timer struct. v1 -> v2: call del_timer_sync() unconditionally, too. Fixes: 268cb38e1802 ("netfilter: x_tables: add LED trigger target") Reported-by: syzbot+10c98dc5725c6c8fc7fb@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14.gitignore: ignore ASN.1 auto generated filesZhu Lingshan
when build kernel with default configure, files: generatenet/ipv4/netfilter/nf_nat_snmp_basic-asn1.c net/ipv4/netfilter/nf_nat_snmp_basic-asn1.h will be automatically generated by ASN.1 compiler, so No need to track them in git, it's better to ignore them. Signed-off-by: Zhu Lingshan <lszhu@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: use pr ratelimiting in all remaining spotsFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: use pr ratelimiting in matches/targetsFlorian Westphal
all of these print simple error message - use single pr_ratelimit call. checkpatch complains about lines > 80 but this would require splitting several "literals" over multiple lines which is worse. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: rate-limit table mismatch warningsFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: bridge: use pr ratelimitingFlorian Westphal
ebt_among still uses pr_err -- these errors indicate ebtables tool bug, not a usage error. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: xt_set: use pr ratelimitingFlorian Westphal
also convert this to info for consistency. These errors are informational message to user, given iptables doesn't have netlink extack equivalent. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: xt_NFQUEUE: use pr ratelimitingFlorian Westphal
switch this to info, since these aren't really errors. We only use printk because we cannot report meaningful errors in the xtables framework. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: xt_CT: use pr ratelimitingFlorian Westphal
checkpatch complains about line > 80 but this would require splitting "literal" over two lines which is worse. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: use pr ratelimiting in xt coreFlorian Westphal
most messages are converted to info, since they occur in response to wrong usage. Size mismatch however is a real error (xtables ABI bug) that should not occur. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: x_tables: remove pr_info where possibleFlorian Westphal
remove several pr_info messages that cannot be triggered with iptables, the check is only to ensure input is sane. iptables(8) already prints error messages in these cases. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14netfilter: ipt_CLUSTERIP: fix a refcount bug in clusterip_config_find_get()Cong Wang
In clusterip_config_find_get() we hold RCU read lock so it could run concurrently with clusterip_config_entry_put(), as a result, the refcnt could go back to 1 from 0, which leads to a double list_del()... Just replace refcount_inc() with refcount_inc_not_zero(), as for c->refcount. Fixes: d73f33b16883 ("netfilter: CLUSTERIP: RCU conversion") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14tls: getsockopt return record sequence numberBoris Pismenny
Return the TLS record sequence number in getsockopt. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tls: reset the crypto info if copy_from_user failsBoris Pismenny
copy_from_user could copy some partial information, as a result TLS_CRYPTO_INFO_READY(crypto_info) could be true while crypto_info is using uninitialzed data. This patch resets crypto_info when copy_from_user fails. fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tls: retrun the correct IV in getsockoptBoris Pismenny
Current code returns four bytes of salt followed by four bytes of IV. This patch returns all eight bytes of IV. fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14Merge branch 'net-segmentation-offload-doc-fixes'David S. Miller
Daniel Axtens says: ==================== Updates to segmentation-offloads.txt I've been trying to wrap my head around GSO for a while now. This is a set of small changes to the docs that would probably have been helpful when I was starting out. I realise that GSO_DODGY is still a notable omission - I'm hesitant to write too much on it just yet as I don't understand it well and I think it's in the process of changing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14docs: segmentation-offloads.txt: add SCTP infoDaniel Axtens
Most of this is extracted from 90017accff61 ("sctp: Add GSO support"), with some extra text about GSO_BY_FRAGS and the need to check for it. Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14docs: segmentation-offloads.txt: Fix ref to SKB_GSO_TUNNEL_REMCSUMDaniel Axtens
The doc originally called it SKB_GSO_REMCSUM. Fix it. Fixes: f7a6272bf3cb ("Documentation: Add documentation for TSO and GSO features") Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14docs: segmentation-offloads.txt: update for UFO depreciationDaniel Axtens
UFO is deprecated except for tuntap and packet per 0c19f846d582, ("net: accept UFO datagrams from tuntap and packet"). Update UFO docs to reflect this. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14netfilter: add back stackpointer size checksFlorian Westphal
The rationale for removing the check is only correct for rulesets generated by ip(6)tables. In iptables, a jump can only occur to a user-defined chain, i.e. because we size the stack based on number of user-defined chains we cannot exceed stack size. However, the underlying binary format has no such restriction, and the validation step only ensures that the jump target is a valid rule start point. IOW, its possible to build a rule blob that has no user-defined chains but does contain a jump. If this happens, no jump stack gets allocated and crash occurs because no jumpstack was allocated. Fixes: 7814b6ec6d0d6 ("netfilter: xtables: don't save/restore jumpstack offset") Reported-by: syzbot+e783f671527912cd9403@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14Merge branch 'tipc-locking-fixes'David S. Miller
Ying Xue says: ==================== tipc: Fix missing RTNL lock protection during setting link properties At present it's unsafe to configure link properties through netlink as the entire setting process is not under RTNL lock protection. Now TIPC supports two different sets of netlink APIs at the same time, and they share the same set of backend functions to configure bearer, media and net properties. In order to solve the missing RTNL issue, we have to make the whole __tipc_nl_compat_doit() protected by RTNL, which means any function called within it cannot take RTNL any more. So in the series we first introduce the following new functions which doesn't hold RTNl lock: - __tipc_nl_bearer_disable() - __tipc_nl_bearer_enable() - __tipc_nl_bearer_set() - __tipc_nl_media_set() - __tipc_nl_net_set() Meanwhile, __tipc_nl_compat_doit() has been reconstructed to minimize the time of holding RTNL lock. Changes in v4: - Per suggestion of Kirill Tkhai, divided original big one patch into seven small ones so that they can be easily reviewed. Changes in v3: - Optimized return method of __tipc_nl_bearer_enable() regarding the comments from David M and Kirill Tkhai - Moved the allocations of memory in __tipc_nl_compat_doit() out of RTNL lock to minimize the time of holding RTNL lock according to the suggestion of Kirill Tkhai. Changes in v2: - The whole operation of setting bearer/media properties has been protected under RTNL, as per feedback from David M. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Fix missing RTNL lock protection during setting link propertiesYing Xue
Currently when user changes link properties, TIPC first checks if user's command message contains media name or bearer name through tipc_media_find() or tipc_bearer_find() which is protected by RTNL lock. But when tipc_nl_compat_link_set() conducts the checking with the two functions, it doesn't hold RTNL lock at all, as a result, the following complaints were reported: audit: type=1400 audit(1514679888.244:9): avc: denied { write } for pid=3194 comm="syzkaller021477" path="socket:[11143]" dev="sockfs" ino=11143 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=netlink_generic_socket permissive=1 Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> ============================= WARNING: suspicious RCU usage 4.15.0-rc5+ #152 Not tainted ----------------------------- net/tipc/bearer.c:177 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syzkaller021477/3194: #0: (cb_lock){++++}, at: [<00000000d20133ea>] genl_rcv+0x19/0x40 net/netlink/genetlink.c:634 #1: (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_lock net/netlink/genetlink.c:33 [inline] #1: (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_rcv_msg+0x115/0x140 net/netlink/genetlink.c:622 stack backtrace: CPU: 1 PID: 3194 Comm: syzkaller021477 Not tainted 4.15.0-rc5+ #152 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 tipc_bearer_find+0x2b4/0x3b0 net/tipc/bearer.c:177 tipc_nl_compat_link_set+0x329/0x9f0 net/tipc/netlink_compat.c:729 __tipc_nl_compat_doit net/tipc/netlink_compat.c:288 [inline] tipc_nl_compat_doit+0x15b/0x660 net/tipc/netlink_compat.c:335 tipc_nl_compat_handle net/tipc/netlink_compat.c:1119 [inline] tipc_nl_compat_recv+0x112f/0x18f0 net/tipc/netlink_compat.c:1201 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624 netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408 genl_rcv+0x28/0x40 net/netlink/genetlink.c:635 netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline] netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864 sock_sendmsg_nosec net/socket.c:636 [inline] sock_sendmsg+0xca/0x110 net/socket.c:646 sock_write_iter+0x31a/0x5d0 net/socket.c:915 call_write_iter include/linux/fs.h:1772 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x684/0x970 fs/read_write.c:482 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129 In order to correct the mistake, __tipc_nl_compat_doit() has been protected by RTNL lock, which means the whole operation of setting bearer/media properties is under RTNL protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reported-by: syzbot <syzbot+6345fd433db009b29413@syzkaller.appspotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_net_setYing Xue
Introduce __tipc_nl_net_set() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_media_setYing Xue
Introduce __tipc_nl_media_set() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_setYing Xue
Introduce __tipc_nl_bearer_set() which doesn't holding RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_enableYing Xue
Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_disableYing Xue
Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Refactor __tipc_nl_compat_doitYing Xue
As preparation for adding RTNL to make (*cmd->transcode)() and (*cmd->transcode)() constantly protected by RTNL lock, we move out of memory allocations existing between them as many as possible so that the time of holding RTNL can be minimized in __tipc_nl_compat_doit(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14netfilter: drop outermost socket lock in getsockopt()Paolo Abeni
The Syzbot reported a possible deadlock in the netfilter area caused by rtnl lock, xt lock and socket lock being acquired with a different order on different code paths, leading to the following backtrace: Reviewed-by: Xin Long <lucien.xin@gmail.com> ====================================================== WARNING: possible circular locking dependency detected 4.15.0+ #301 Not tainted ------------------------------------------------------ syzkaller233489/4179 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [<0000000048e996fd>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 but task is already holding lock: (&xt[i].mutex){+.+.}, at: [<00000000328553a2>] xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041 which lock already depends on the new lock. === Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope"), we already acquire the socket lock in the innermost scope, where needed. In such commit I forgot to remove the outer-most socket lock from the getsockopt() path, this commit addresses the issues dropping it now. v1 -> v2: fix bad subj, added relavant 'fixes' tag Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev") Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope") Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-14drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3Hans de Goede
So far models of the Dell Venue 8 Pro, with a panel with MIPI panel index = 3, one of which has been kindly provided to me by Jan Brummer, where not working with the i915 driver, giving a black screen on the first modeset. The problem with at least these Dells is that their VBT defines a MIPI ASSERT sequence, but not a DEASSERT sequence. Instead they DEASSERT the reset in their INIT_OTP sequence, but the deassert must be done before calling intel_dsi_device_ready(), so that is too late. Simply doing the INIT_OTP sequence earlier is not enough to fix this, because the INIT_OTP sequence also sends various MIPI packets to the panel, which can only happen after calling intel_dsi_device_ready(). This commit fixes this by splitting the INIT_OTP sequence into everything before the first DSI packet and everything else, including the first DSI packet. The first part (everything before the first DSI packet) is then used as deassert sequence. Changed in v2: -Split the init OTP sequence into a deassert reset and the actual init OTP sequence, instead of calling it earlier and then having the first mipi_exec_send_packet() call call intel_dsi_device_ready(). Changes in v3: -Move the whole shebang to intel_bios.c Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82880 References: https://bugs.freedesktop.org/show_bug.cgi?id=101205 Cc: Jan-Michael Brummer <jan.brummer@tabos.org> Reported-by: Jan-Michael Brummer <jan.brummer@tabos.org> Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180214082151.25015-3-hdegoede@redhat.com (cherry picked from commit fb38e7ade9af4f3e96f5916c3f6f19bfc7d5f961) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-14drm/i915: Free memdup-ed DSI VBT data structures on driver_unloadHans de Goede
Make intel_bios_cleanup function free the DSI VBT data structures which are memdup-ed by parse_mipi_config() and parse_mipi_sequence(). Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180214082151.25015-2-hdegoede@redhat.com (cherry picked from commit e1b86c85f6c2029c31dba99823b6f3d9e15eaacd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-14drm/i915: Add intel_bios_cleanup() functionHans de Goede
Add an intel_bios_cleanup() function to act as counterpart of intel_bios_init() and move the cleanup of vbt related resources there, putting it in the same file as the allocation. Changed in v2: -While touching the code anyways, remove the unnecessary: if (dev_priv->vbt.child_dev) done before kfree(dev_priv->vbt.child_dev) Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180214082151.25015-1-hdegoede@redhat.com (cherry picked from commit 785f076b3ba781804f2b22b347b4431e3efb0ab3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-14drm/i915/vlv: Add cdclk workaround for DSIHans de Goede
At least on the Chuwi Vi8 (non pro/plus) the LCD panel will show an image shifted aprox. 20% to the left (with wraparound) and sometimes also wrong colors, showing that the panel controller is starting with sampling the datastream somewhere mid-line. This happens after the first blanking and re-init of the panel. After looking at drm.debug output I noticed that initially we inherit the cdclk of 333333 KHz set by the GOP, but after the re-init we picked 266667 KHz, which turns out to be the cause of this problem, a quick hack to hard code the cdclk to 333333 KHz makes the problem go away. I've tested this on various Bay Trail devices, to make sure this not does cause regressions on other devices and the higher cdclk does not cause any problems on the following devices: -GP-electronic T701 1024x600 333333 KHz cdclk after this patch -PEAQ C1010 1920x1200 333333 KHz cdclk after this patch -PoV mobii-wintab-800w 800x1280 333333 KHz cdclk after this patch -Asus Transformer-T100TA 1368x768 320000 KHz cdclk after this patch Also interesting wrt this is the comment in vlv_calc_cdclk about the existing workaround to avoid 200 Mhz as clock because that causes issues in some cases. This commit extends the "do not use 200 Mhz" workaround with an extra check to require atleast 320000 KHz (avoiding 266667 KHz) when a DSI panel is active. Changes in v2: -Change the commit message and the code comment to not treat the GOP as a reference, the GOP should not be treated as a reference Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171220105017.11259-1-hdegoede@redhat.com (cherry picked from commit c8dae55a8ced625038d52d26e48273707fab2688) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-14Merge branch 'ibmvnic-leaks'David S. Miller
Thomas Falcon says: ==================== ibmvnic: Fix memory leaks in the driver This patch set is pretty self-explanatory. It includes a number of patches that fix memory leaks found with kmemleak in the ibmvnic driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14ibmvnic: Clean RX pool buffers during device closeThomas Falcon
During device close or reset, there were some cases of outstanding RX socket buffers not being freed. Include a function similar to the one that already exists to clean TX socket buffers in this case. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14ibmvnic: Free RX socket buffer in case of adapter errorThomas Falcon
If a RX buffer is returned to the client driver with an error, free the corresponding socket buffer before continuing. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14ibmvnic: Fix NAPI structures memory leakThomas Falcon
This memory is allocated during initialization but never freed, so do that now. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14ibmvnic: Fix login buffer memory leaksThomas Falcon
During device bringup, the driver exchanges login buffers with firmware. These buffers contain information such number of TX and RX queues alloted to the device, RX buffer size, etc. These buffers weren't being properly freed on device reset or close. We can free the buffer we send to firmware as soon as we get a response. There is information in the response buffer that the driver needs for normal operation so retain it until the next reset or removal. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>