summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-11-24netfilter: nat: switch to new rhlist interfaceFlorian Westphal
I got offlist bug report about failing connections and high cpu usage. This happens because we hit 'elasticity' checks in rhashtable that refuses bucket list exceeding 16 entries. The nat bysrc hash unfortunately needs to insert distinct objects that share same key and are identical (have same source tuple), this cannot be avoided. Switch to the rhlist interface which is designed for this. The nulls_base is removed here, I don't think its needed: A (unlikely) false positive results in unneeded port clash resolution, a false negative results in packet drop during conntrack confirmation, when we try to insert the duplicate into main conntrack hash table. Tested by adding multiple ip addresses to host, then adding iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE ... and then creating multiple connections, from same source port but different addresses: for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done (all of these then get hashed to same bysource slot) Then, to test that nat conflict resultion is working: nc -s 10.0.0.1 -p 1234 192.168.7.1 2000 nc -s 10.0.0.2 -p 1234 192.168.7.1 2000 tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED] tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED] tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED] tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED] [..] -> nat altered source ports to 1024 and 1025, respectively. This can also be confirmed on destination host which shows ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024 ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025 ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234 Cc: Herbert Xu <herbert@gondor.apana.org.au> Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nat: fix cmp return valueFlorian Westphal
The comparator works like memcmp, i.e. 0 means objects are equal. In other words, when objects are distinct they are treated as identical, when they are distinct they are allegedly the same. The first case is rare (distinct objects are unlikely to get hashed to same bucket). The second case results in unneeded port conflict resolutions attempts. Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nft_hash: validate maximum value of u32 netlink hash attributeLaura Garcia Liebana
Use the function nft_parse_u32_check() to fetch the value and validate the u32 attribute into the hash len u8 field. This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes"). Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression") Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24KVM: arm/arm64: vgic: Don't notify EOI for non-SPIsMarc Zyngier
When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-24sched: Extend scheduler's asym packingTim Chen
We generalize the scheduler's asym packing to provide an ordering of the cpu beyond just the cpu number. This allows the use of the ASYM_PACKING scheduler machinery to move loads to preferred CPU in a sched domain. The preference is defined with the cpu priority given by arch_asym_cpu_priority(cpu). We also record the most preferred cpu in a sched group when we build the cpu's capacity for fast lookup of preferred cpu during load balancing. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-pm@vger.kernel.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24netfilter: fix nf_conntrack_helper documentationFlorian Westphal
Since kernel 4.7 this defaults to off. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: Update nf_send_reset6 to consider L3 domainDavid Ahern
nf_send_reset6 is not considering the L3 domain and lookups are sent to the wrong table. For example consider the following output rule: ip6tables -A OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset using perf to analyze lookups via the fib6_table_lookup tracepoint shows: swapper 0 [001] 248.787816: fib6:fib6_table_lookup: table 255 oif 0 iif 1 src 2100:1::3 dst 2100:1: ffffffff81439cdc perf_trace_fib6_table_lookup ([kernel.kallsyms]) ffffffff814c1ce3 trace_fib6_table_lookup ([kernel.kallsyms]) ffffffff814c3e89 ip6_pol_route ([kernel.kallsyms]) ffffffff814c40d5 ip6_pol_route_output ([kernel.kallsyms]) ffffffff814e7b6f fib6_rule_action ([kernel.kallsyms]) ffffffff81437f60 fib_rules_lookup ([kernel.kallsyms]) ffffffff814e7c79 fib6_rule_lookup ([kernel.kallsyms]) ffffffff814c2541 ip6_route_output_flags ([kernel.kallsyms]) 528 nf_send_reset6 ([nf_reject_ipv6]) The lookup is directed to table 255 rather than the table associated with the device via the L3 domain. Update nf_send_reset6 to pull the L3 domain from the dst currently attached to the skb. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: Update ip_route_me_harder to consider L3 domainDavid Ahern
ip_route_me_harder is not considering the L3 domain and sending lookups to the wrong table. For example consider the following output rule: iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset using perf to analyze lookups via the fib_table_lookup tracepoint shows: vrf-test 1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0 ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms]) ffffffff81493aac fib_table_lookup ([kernel.kallsyms]) ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms]) ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms]) ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms]) and vrf-test 1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms]) ffffffff81493aac fib_table_lookup ([kernel.kallsyms]) ffffffff814998ff fib4_rule_action ([kernel.kallsyms]) ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms]) ffffffff81499758 __fib_lookup ([kernel.kallsyms]) ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms]) ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms]) ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms]) ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms]) In both cases the lookups are directed to table 255 rather than the table associated with the device via the L3 domain. Update both lookups to pull the L3 domain from the dst currently attached to the skb. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24drm: bridge: add DesignWare HDMI I2S audio supportKuninori Morimoto
Current dw-hdmi is supporting sound via AHB bus, but it has I2S audio feature too. This patch adds I2S audio support to dw-hdmi. This HDMI I2S is supported by using ALSA SoC common HDMI encoder driver. Tested-by: Jose Abreu <joabreu@synopsys.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Archit Taneja <architt@codeaurora.org> Link: http://patchwork.freedesktop.org/patch/msgid/8737j2bxba.wl%kuninori.morimoto.gx@renesas.com
2016-11-24xen-scsifront: Add a missing call to kfreeQuentin Lambert
Most error branches following the call to kmalloc contain a call to kfree. This patch add these calls where they are missing. This issue was found with Hector. Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-24drm: Check against color expansion in drm_mm_reserve_node()Chris Wilson
Use the color_adjust callback when reserving a node to check if inserting a node into this hole requires any additional space, and so if that space then conflicts with an existing allocation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161123141118.23876-2-chris@chris-wilson.co.uk
2016-11-24drm: Define drm_mm_for_each_node_in_range()Chris Wilson
Some clients would like to iterate over every node within a certain range. Make a nice little macro for them to hide the mixing of the rbtree search and linear walk. v2: Blurb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161123141118.23876-1-chris@chris-wilson.co.uk
2016-11-24drm/doc: Fix links in drm_property.cDaniel Vetter
One of the functions was missing () to make the autolinks work, unfortunately copy-pasted a few times all over. Cc: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161123192327.28819-1-daniel.vetter@ffwll.ch
2016-11-24drm/mediatek: fix null pointer dereferenceMatthias Brugger
The probe function requests the interrupt before initializing the ddp component. Which leads to a null pointer dereference at boot. Fix this by requesting the interrput after all components got initialized properly. Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Change-Id: I57193a7ab554dfb37c35a455900689333adf511c
2016-11-24drm/mediatek: fixed the calc method of data rate per laneJitao Shi
Tune dsi frame rate by pixel clock, dsi add some extra signal (i.e. Tlpx, Ths-prepare, Ths-zero, Ths-trail,Ths-exit) when enter and exit LP mode, those signals will cause h-time larger than normal and reduce FPS. So need to multiply a coefficient to offset the extra signal's effect. coefficient = ((htotal*bpp/lane_number)+Tlpx+Ths_prep+Ths_zero+ Ths_trail+Ths_exit)/(htotal*bpp/lane_number) Signed-off-by: Jitao Shi <jitao.shi@mediatek.com> Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
2016-11-24drm/mediatek: fix a typo of DISP_OD_CFG to OD_RELAYMODEBibby Hsieh
If we want to set the hardware OD to relay mode, we have to set DISP_OD_CFG register rather than OD_RELAYMODE; otherwise, the system will access the wrong address. Change-Id: Ifb9bb4caa63df906437d48b5d5326b6d04ea332a Fixes: 7216436420414144646f5d8343d061355fd23483 ("drm/mediatek: set mt8173 dithering function") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Bibby Hsieh <bibby.hsieh@mediatek.com> Acked-by: CK Hu <ck.hu@mediatek.com>
2016-11-24powerpc/boot: Fix the early OPAL console wrappersOliver O'Halloran
When configured with CONFIG_PPC_EARLY_DEBUG_OPAL=y the kernel expects the OPAL entry and base addresses to be passed in r8 and r9 respectively. Currently the wrapper does not attempt to restore these values before entering the decompressed kernel which causes the kernel to branch into whatever happens to be in r9 when doing a write to the OPAL console in early boot. This patch adds a platform_ops hook that can be used to branch into the new kernel. The OPAL console driver patches this at runtime so that if the console is used it will be restored just prior to entering the kernel. Fixes: 656ad58ef19e ("powerpc/boot: Add OPAL console to epapr wrappers") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-24drm/i915/gvt: fix getting 64bit bar size errorXiaoguang Chen
For 64bit bar while reading the higher 32bit the value should be returned directly. In the current implementation the higher 32bit value was discarded and not written to the cfg space of vgpu which lead to an incorrect bar size. Signed-off-by: Xiaoguang Chen <xiaoguang.chen@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2016-11-24x86/apic/uv: Silence a shift wrapping warningDan Carpenter
'm_io' is stored in 6 bits so it's a number in the 0-63 range. Static analysis tools complain that 1 << 63 will wrap so I have changed it to 1ULL << m_io. This code is over three years old so presumably the bug doesn't happen very frequently in real life or someone would have complained by now. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: b15cc4a12bed ("x86, uv, uv3: Update x2apic Support for SGI UV3") Link: http://lkml.kernel.org/r/20161123221908.GA23997@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24x86/coredump: Always use user_regs_struct for compat_elf_gregset_tDmitry Safonov
Commit: 90954e7b9407 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") changed the coredumping code to construct the elf coredump file according to register set size - and that's good: if binary crashes with 32-bit code selector, generate 32-bit ELF core, otherwise - 64-bit core. That was made for restoring 32-bit applications on x86_64: we want 32-bit application after restore to generate 32-bit ELF dump on crash. All was quite good and recently I started reworking 32-bit applications dumping part of CRIU: now it has two parasites (32 and 64) for seizing compat/native tasks, after rework it'll have one parasite, working in 64-bit mode, to which 32-bit prologue long-jumps during infection. And while it has worked for my work machine, in VM with !CONFIG_X86_X32_ABI during reworking I faced that segfault in 32-bit binary, that has long-jumped to 64-bit mode results in dereference of garbage: 32-victim[19266]: segfault at f775ef65 ip 00000000f775ef65 sp 00000000f776aa50 error 14 BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff81332ce0>] strlen+0x0/0x20 [...] Call Trace: [] elf_core_dump+0x11a9/0x1480 [] do_coredump+0xa6b/0xe60 [] get_signal+0x1a8/0x5c0 [] do_signal+0x23/0x660 [] exit_to_usermode_loop+0x34/0x65 [] prepare_exit_to_usermode+0x2f/0x40 [] retint_user+0x8/0x10 That's because we have 64-bit registers set (with according total size) and we're writing it to elf_thread_core_info which has smaller size on !CONFIG_X86_X32_ABI. That lead to overwriting ELF notes part. Tested on 32-, 64-bit ELF crashes and on 32-bit binaries that have jumped with 64-bit code selector - all is readable with gdb. Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 90954e7b9407 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24sched/autogroup: Fix 64-bit kernel nice level adjustmentMike Galbraith
Michael Kerrisk reported: > Regarding the previous paragraph... My tests indicate > that writing *any* value to the autogroup [nice priority level] > file causes the task group to get a lower priority. Because autogroup didn't call the then meaningless scale_load()... Autogroup nice level adjustment has been broken ever since load resolution was increased for 64-bit kernels. Use scale_load() to scale group weight. Michael Kerrisk tested this patch to fix the problem: > Applied and tested against 4.9-rc6 on an Intel u7 (4 cores). > Test setup: > > Terminal window 1: running 40 CPU burner jobs > Terminal window 2: running 40 CPU burner jobs > Terminal window 1: running 1 CPU burner job > > Demonstrated that: > * Writing "0" to the autogroup file for TW1 now causes no change > to the rate at which the process on the terminal consume CPU. > * Writing -20 to the autogroup file for TW1 caused those processes > to get the lion's share of CPU while TW2 TW3 get a tiny amount. > * Writing -20 to the autogroup files for TW1 and TW3 allowed the > process on TW3 to get as much CPU as it was getting as when > the autogroup nice values for both terminals were 0. Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-man <linux-man@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24Merge tag 'perf-core-for-mingo-20161123' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New tool: - 'perf sched timehist' provides an analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------- ------ ---------------- --------- --------- -------- 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 1.874604 [0011] <idle> 1.148 0.000 0.035 1.874723 [0005] <idle> 0.016 0.000 1.383 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. (David Ahern, Namhyung Kim) Improvements: - Make 'perf c2c report' support -f/--force, to allow skipping the ownership check for root users, for instance, just like the other tools (Jiri Olsa) - Allow sorting cachelines by total number of HITMs, in addition to local and remote numbers (Jiri Olsa) Fixes: - Make sure errors aren't suppressed by the TUI reset at the end of a 'perf c2c report' session (Jiri Olsa) Infrastructure changes: - Initial work on having the annotate code better support multiple architectures, including the ability to cross-annotate, i.e. to annotate perf.data files collected on an ARM system on a x86_64 workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips) - Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt) - Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24KVM: PPC: Correctly report KVM_CAP_PPC_ALLOC_HTABDavid Gibson
At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled. However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually works on KVM HV. On KVM PR it will fail with ENOTTY. QEMU already has a workaround for this, so it's not breaking things in practice, but it would be better to advertise this correctly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Move KVM_PPC_PVINFO_FLAGS_EV_IDLE definition next to its structureDavid Gibson
The KVM_PPC_PVINFO_FLAGS_EV_IDLE macro defines a bit for use in the flags field of struct kvm_ppc_pvinfo. However, changes since that was introduced have moved it away from that structure definition, which is confusing. Move it back next to the structure it belongs with. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Fix compilation with unusual configurationsPaul Mackerras
This adds the "again" parameter to the dummy version of kvmppc_check_passthru(), so that it matches the real version. This fixes compilation with CONFIG_BOOK3S_64_HV set but CONFIG_KVM_XICS=n. This includes asm/smp.h in book3s_hv_builtin.c to fix compilation with CONFIG_SMP=n. The explicit inclusion is necessary to provide definitions of hard_smp_processor_id() and get_hard_smp_processor_id() in UP configs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24ACPI / property: Hierarchical properties support updateRafael J. Wysocki
The definition document of the Hierarchical Properties Extension UUID for _DSD has been changed recently to allow local references to be used as sub-node link targets (previously, it only allowed strings to be used for that purpose). Update the code in drivers/acpi/property.c to reflect that change. Link: http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-23net/mlx4_en: Free netdev resources under state lockTariq Toukan
Make sure mlx4_en_free_resources is called under the netdev state lock. This is needed since RCU dereference of XDP prog should be protected. Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sagi Grimberg <sagi@grimberg.me> CC: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"WANG Cong
This reverts commit 7c6ae610a1f0, because l2tp_xmit_skb() never returns NET_XMIT_CN, it ignores the return value of l2tp_xmit_core(). Cc: Gao Feng <gfree.wind@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()Zhang Shengju
For RT netlink, calcit() function should return the minimal size for netlink dump message. This will make sure that dump message for every network device can be stored. Currently, rtnl_calcit() function doesn't account the size of header of netlink message, this patch will fix it. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23bnxt_en: Fix a VXLAN vs GENEVE issueChristophe Jaillet
Knowing that: #define TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN (0x1UL << 0) #define TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE (0x5UL << 0) and that 'bnxt_hwrm_tunnel_dst_port_alloc()' is only called with one of these 2 constants, the TUNNEL_DST_PORT_ALLOC_REQ_TUNNEL_TYPE_GENEVE can not trigger. Replace the bit test that overlap by an equality test, just as in 'bnxt_hwrm_tunnel_dst_port_free()' above. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23netdevice.h: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in <linux/netdevice.h> (missing ':'): ..//include/linux/netdevice.h:1904: warning: No description found for parameter 'prio_tc_map[TC_BITMASK + 1]' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23driver: macvlan: Check if need rollback multicast setting in macvlan_openGao Feng
When dev_set_promiscuity failed in macvlan_open, it always invokes dev_set_allmulti without checking if necessary. Now check the IFF_ALLMULTI flag firstly before rollback the multicast setting in the error handler. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23net: phy: micrel: fix KSZ8041FTL supported valueKirill Esipov
Fix setting of SUPPORTED_FIBRE bit as it was not present in features of KSZ8041. Signed-off-by: Kirill Esipov <yesipov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24Merge branch 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld into drm-fixesDave Airlie
A late issue discovered by Russell King while testing his setup on Juno. * 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld: drm/arm: hdlcd: fix plane base address update
2016-11-24Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes one small powerplay fix and one regression fix for older PX systems and d3cold * 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux: drm/radeon: fix power state when port pm is unavailable (v2) drm/amdgpu: fix power state when port pm is unavailable drm/amd/powerplay: avoid out of bounds access on array ps.
2016-11-23clk: qcom: Put venus core0/1 gdscs to hw control modeSricharan R
The venus video ip's internal core blocks are under the control of the firmware and their powerdomains needs to be 'ON' only when used by the firmware. So putting it into hw controlled mode lets this to happen, otherwise the firmware hangs checking for this. Signed-off-by: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-11-23clk: qcom: gdsc: Add support for gdscs with HW controlRajendra Nayak
Some GDSCs might support a HW control mode, where in the power domain (gdsc) is brought in and out of low power state (while unsued) without any SW assistance, saving power. Such GDSCs can be configured in a HW control mode when powered on until they are explicitly requested to be powered off by software. Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-11-23xc2028: Fix use-after-free bug properlyTakashi Iwai
The commit 8dfbcc4351a0 ("[media] xc2028: avoid use after free") tried to address the reported use-after-free by clearing the reference. However, it's clearing the wrong pointer; it sets NULL to priv->ctrl.fname, but it's anyway overwritten by the next line memcpy(&priv->ctrl, p, sizeof(priv->ctrl)). OTOH, the actual code accessing the freed string is the strcmp() call with priv->fname: if (!firmware_name[0] && p->fname && priv->fname && strcmp(p->fname, priv->fname)) free_firmware(priv); where priv->fname points to the previous file name, and this was already freed by kfree(). For fixing the bug properly, this patch does the following: - Keep the copy of firmware file name in only priv->fname, priv->ctrl.fname isn't changed; - The allocation is done only when the firmware gets loaded; - The kfree() is called in free_firmware() commonly Fixes: commit 8dfbcc4351a0 ('[media] xc2028: avoid use after free') Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-11-23Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "Most of these fix regressions or races, but there is one patch for stable that Arnd sent me Stable bugfix: - Hide array-bounds warning Bugfixes: - Keep a reference on lock states while checking - Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state - Don't call close if the open stateid has already been cleared - Fix CLOSE rases with OPEN - Fix a regression in DELEGRETURN" * tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.x: hide array-bounds warning NFSv4.1: Keep a reference on lock states while checking NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state NFSv4: Don't call close if the open stateid has already been cleared NFSv4: Fix CLOSE races with OPEN NFSv4.1: Fix a regression in DELEGRETURN
2016-11-23Merge branch 'clk-fixes' into clk-nextStephen Boyd
* clk-fixes: clk: bcm: Fix unmet Kconfig dependencies for CLK_BCM_63XX clk: sunxi-ng: enable so-said LDOs for A33 SoC's pll-mipi clock clk: sunxi-ng: sun6i-a31: Enable PLL-MIPI LDOs when ungating it
2016-11-23clk: bcm: Fix unmet Kconfig dependencies for CLK_BCM_63XXFlorian Fainelli
With commit f4e871509959 ("clk: iproc: Make clocks visible options"), COMMON_CLK_IPROC gained a dependency on ARCH_BCM_IPROC, yet CLK_BCM_63XX also selects that option, this causes the following Kconfig warning: warning: (CLK_BCM_63XX) selects COMMON_CLK_IPROC which has unmet direct dependencies ((ARCH_BCM_IPROC || COMPILE_TEST) && COMMON_CLK) Fix this by adding proper depends for COMMON_CLK_IPROC Fixes: f4e871509959 ("clk: iproc: Make clocks visible options") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> [sboyd@codeaurora.org: Drop default part as it's redundant] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-11-24KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00Suraj Jitindar Singh
The function kvmppc_set_arch_compat() is used to determine the value of the processor compatibility register (PCR) for a guest running in a given compatibility mode. There is currently no support for v3.00 of the ISA. Add support for v3.00 of the ISA which adds an ISA v2.07 compatilibity mode to the PCR. We also add a check to ensure the processor we are running on is capable of emulating the chosen processor (for example a POWER7 cannot emulate a POWER8, similarly with a POWER8 and a POWER9). Based on work by: Paul Mackerras <paulus@ozlabs.org> [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set guest_pcr_bit when arch_compat == 0, added comment.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcoresPaul Mackerras
With POWER9, each CPU thread has its own MMU context and can be in the host or a guest independently of the other threads; there is still however a restriction that all threads must use the same type of address translation, either radix tree or hashed page table (HPT). Since we only support HPT guests on a HPT host at this point, we can treat the threads as being independent, and avoid all of the work of coordinating the CPU threads. To make this simpler, we introduce a new threads_per_vcore() function that returns 1 on POWER9 and threads_per_subcore on POWER7/8, and use that instead of threads_per_subcore or threads_per_core in various places. This also changes the value of the KVM_CAP_PPC_SMT capability on POWER9 systems from 4 to 1, so that userspace will not try to create VMs with multiple vcpus per vcore. (If userspace did create a VM that thought it was in an SMT mode, the VM might try to use the msgsndp instruction, which will not work as expected. In future it may be possible to trap and emulate msgsndp in order to allow VMs to think they are in an SMT mode, if only for the purpose of allowing migration from POWER8 systems.) With all this, we can now run guests on POWER9 as long as the host is running with HPT translation. Since userspace currently has no way to request radix tree translation for the guest, the guest has no choice but to use HPT translation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guestPaul Mackerras
The new XIVE interrupt controller on POWER9 can direct external interrupts to the hypervisor or the guest. The interrupts directed to the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and come in as a "hypervisor virtualization interrupt". This sets the LPCR bit so that hypervisor virtualization interrupts can occur while we are in the guest. We then also need to cope with exiting the guest because of a hypervisor virtualization interrupt. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9Paul Mackerras
POWER9 replaces the various power-saving mode instructions on POWER8 (doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus a register, PSSCR, which controls the depth of the power-saving mode. This replaces the use of the nap instruction when threads are idle during guest execution with the stop instruction, and adds code to set PSSCR to a value which will allow an SMT mode switch while the thread is idle (given that the core as a whole won't be idle in these cases). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9Paul Mackerras
POWER9 includes a new interrupt controller, called XIVE, which is quite different from the XICS interrupt controller on POWER7 and POWER8 machines. KVM-HV accesses the XICS directly in several places in order to send and clear IPIs and handle interrupts from PCI devices being passed through to the guest. In order to make the transition to XIVE easier, OPAL firmware will include an emulation of XICS on top of XIVE. Access to the emulated XICS is via OPAL calls. The one complication is that the EOI (end-of-interrupt) function can now return a value indicating that another interrupt is pending; in this case, the XIVE will not signal an interrupt in hardware to the CPU, and software is supposed to acknowledge the new interrupt without waiting for another interrupt to be delivered in hardware. This adapts KVM-HV to use the OPAL calls on machines where there is no XICS hardware. When there is no XICS, we look for a device-tree node with "ibm,opal-intc" in its compatible property, which is how OPAL indicates that it provides XICS emulation. In order to handle the EOI return value, kvmppc_read_intr() has become kvmppc_read_one_intr(), with a boolean variable passed by reference which can be set by the EOI functions to indicate that another interrupt is pending. The new kvmppc_read_intr() keeps calling kvmppc_read_one_intr() until there are no more interrupts to process. The return value from kvmppc_read_intr() is the largest non-zero value of the returns from kvmppc_read_one_intr(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9Paul Mackerras
On POWER9, the msgsnd instruction is able to send interrupts to other cores, as well as other threads on the local core. Since msgsnd is generally simpler and faster than sending an IPI via the XICS, we use msgsnd for all IPIs sent by KVM on POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9Paul Mackerras
POWER9 adds new capabilities to the tlbie (TLB invalidate entry) and tlbiel (local tlbie) instructions. Both instructions get a set of new parameters (RIC, PRS and R) which appear as bits in the instruction word. The tlbiel instruction now has a second register operand, which contains a PID and/or LPID value if needed, and should otherwise contain 0. This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9 as well as older processors. Since we only handle HPT guests so far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction word as on previous processors, so we don't need to conditionally execute different instructions depending on the processor. The local flush on first entry to a guest in book3s_hv_rmhandlers.S is a loop which depends on the number of TLB sets. Rather than using feature sections to set the number of iterations based on which CPU we're on, we now work out this number at VM creation time and store it in the kvm_arch struct. That will make it possible to get the number from the device tree in future, which will help with compatibility with future processors. Since mmu_partition_table_set_entry() does a global flush of the whole LPID, we don't need to do the TLB flush on first entry to the guest on each processor. Therefore we don't set all bits in the tlb_need_flush bitmap on VM startup on POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRsPaul Mackerras
This adds code to handle two new guest-accessible special-purpose registers on POWER9: TIDR (thread ID register) and PSSCR (processor stop status and control register). They are context-switched between host and guest, and the guest values can be read and set via the one_reg interface. The PSSCR contains some fields which are guest-accessible and some which are only accessible in hypervisor mode. We only allow the guest-accessible fields to be read or set by userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>