Age | Commit message | Author
2018-09-20KVM: VMX: use preemption timer to force immediate VMExitSean Christopherson
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. 
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: modify preemption timer bit only when arming timerSean Christopherson
Provide a singular location where the VMX preemption timer bit is set/cleared so that future usages of the preemption timer can ensure the VMCS bit is up-to-date without having to modify unrelated code paths. For example, the preemption timer can be used to force an immediate VMExit. Cache the status of the timer to avoid redundant VMREAD and VMWRITE, e.g. if the timer stays armed across multiple VMEnters/VMExits. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
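The caching idea described above can be sketched in plain C. This is a minimal userspace model and not the KVM code: `struct vcpu_sketch`, `set_timer_armed()` and the `vmwrites` counter are hypothetical names, and the simulated field only stands in for the VMCS pin-based controls (the preemption timer enable is bit 6 there).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 of the pin-based controls activates the VMX preemption timer. */
#define PIN_PREEMPTION_TIMER (1u << 6)

/* Hypothetical stand-in for per-vCPU state; not a KVM structure. */
struct vcpu_sketch {
    uint32_t pin_controls;  /* simulated VMCS field */
    bool timer_armed;       /* cached software copy of the timer bit */
    unsigned int vmwrites;  /* counts simulated VMWRITEs */
};

/* The single place that sets or clears the timer bit. */
void set_timer_armed(struct vcpu_sketch *v, bool arm)
{
    if (v->timer_armed == arm)
        return;             /* cache matches: skip the redundant write */

    if (arm)
        v->pin_controls |= PIN_PREEMPTION_TIMER;
    else
        v->pin_controls &= ~PIN_PREEMPTION_TIMER;
    v->timer_armed = arm;
    v->vmwrites++;
}

/* Arms three times, disarms once, and reports how many writes happened. */
unsigned int demo_vmwrite_count(void)
{
    struct vcpu_sketch v = {0};

    set_timer_armed(&v, true);
    set_timer_armed(&v, true);
    set_timer_armed(&v, true);
    set_timer_armed(&v, false);
    assert(v.pin_controls == 0);
    return v.vmwrites;
}
```

In this model, a timer that stays armed across several entries costs one write rather than one per entry.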
2018-09-20KVM: VMX: immediately mark preemption timer expired only for zero valueSean Christopherson
A VMX preemption timer value of '0' at the time of VMEnter is architecturally guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. This architectural definition is in place to ensure that a previously expired timer is correctly recognized by the CPU as it is possible for the timer to reach zero and not trigger a VMExit due to a higher priority VMExit being signalled instead, e.g. a pending #DB that morphs into a VMExit. Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX: Fully emulate preemption timer") special cased timer values of '0' and '1' to ensure prompt delivery of the VMExit. Unlike '0', a timer value of '1' has no architectural guarantees regarding when it is delivered. Modify the timer emulation to trigger immediate VMExit if and only if the timer value is '0', and document precisely why '0' is special. Do this even if calibration of the virtual TSC failed, i.e. VMExit will occur immediately regardless of the frequency of the timer. Making only '0' a special case gives KVM leeway to be more aggressive in ensuring the VMExit is injected prior to executing instructions in the nested guest, and also eliminates any ambiguity as to why '1' is a special case, e.g. why wasn't the threshold for a "short timeout" set to 10, 100, 1000, etc... Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
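The decision order described in this message can be illustrated with a small sketch. It is an assumption-laden model, not the KVM implementation: the enum, the function name and the use of `tsc_khz == 0` to represent a failed virtual TSC calibration are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum timer_action {
    EXIT_IMMEDIATELY,   /* treat the timer as already expired */
    START_TIMER,        /* program a host timer for the remaining ticks */
    NO_TIMER            /* cannot convert ticks to time, start nothing */
};

/*
 * Hypothetical helper: what to do with the emulated preemption timer at
 * nested VMEnter. tsc_khz == 0 models a failed virtual TSC calibration.
 */
enum timer_action preemption_timer_action(uint32_t value, uint32_t tsc_khz)
{
    if (value == 0)
        return EXIT_IMMEDIATELY;    /* checked first, frequency is irrelevant */
    if (tsc_khz == 0)
        return NO_TIMER;
    return START_TIMER;             /* '1' gets no special treatment */
}
```

Checking for zero before looking at the frequency is what makes the immediate exit independent of calibration.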
2018-09-20KVM: SVM: Switch to bitmap_zalloc()Andy Shevchenko
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
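For readers outside the kernel, a rough userspace analogue of a zeroed bitmap allocation looks like the following. All names ending in `_sketch` are hypothetical; the real bitmap_zalloc() also takes GFP flags, which have no userspace equivalent here.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SKETCH(n) \
    (((n) + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

/* Allocate a zeroed bitmap large enough for nbits bits (caller frees). */
unsigned long *bitmap_zalloc_sketch(size_t nbits)
{
    return calloc(BITS_TO_LONGS_SKETCH(nbits), sizeof(unsigned long));
}

/* Set one bit in the bitmap. */
void bitmap_set_bit_sketch(unsigned long *map, size_t bit)
{
    map[bit / BITS_PER_LONG_SKETCH] |= 1ul << (bit % BITS_PER_LONG_SKETCH);
}

/* Return 1 if the bit is set, 0 otherwise. */
int bitmap_test_bit_sketch(const unsigned long *map, size_t bit)
{
    return (map[bit / BITS_PER_LONG_SKETCH] >> (bit % BITS_PER_LONG_SKETCH)) & 1ul;
}
```

The typed `unsigned long *` return value is the point the commit makes: the caller sees a bitmap, not an opaque buffer sized by hand.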
2018-09-20KVM/MMU: Fix comment in walk_shadow_page_lockless_end()Tianyu Lan
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page(). This patch fixes the comment accordingly. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: selftests: use -pthread instead of -lpthreadLei Yang
I ran into the following error: testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create' testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join' collect2: error: ld returned 1 exit status. My gcc version is 4.8.4. "-pthread" would work everywhere. Signed-off-by: Lei Yang <Lei.Yang@windriver.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: x86: don't reset root in kvm_mmu_setup()Wei Yang
Here is the code path which shows kvm_mmu_setup() is invoked after kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path, this means the root_hpa and prev_roots are guaranteed to be invalid, and it is not necessary to reset them again. kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_create() vmx_create_vcpu() kvm_vcpu_init() kvm_arch_vcpu_init() kvm_mmu_create() kvm_arch_vcpu_setup() kvm_mmu_setup() kvm_init_mmu() This patch sets reset_roots to false in kvm_mmu_setup(). Fixes: 50c28f21d045dde8c52548f8482d456b3f0956f5 Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: mmu: Don't read PDPTEs when paging is not enabledJunaid Shahid
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and CR4.PAE = 1. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
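The condition can be written as a small predicate. This is a sketch, not the KVM helper: the function name is hypothetical, and the long-mode check via EFER.LMA is an addition beyond what the commit message states (PDPTE registers are only used for 32-bit PAE paging).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG  (1ull << 31)  /* CR0.PG: paging enabled */
#define X86_CR4_PAE (1ull << 5)   /* CR4.PAE: physical address extension */
#define EFER_LMA    (1ull << 10)  /* EFER.LMA: long mode active */

/*
 * Hypothetical predicate: guest PDPTEs are only meaningful when paging
 * is on, PAE is on, and the CPU is not in long mode.
 */
bool uses_pae_pdptes(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
    return (cr0 & X86_CR0_PG) && (cr4 & X86_CR4_PAE) && !(efer & EFER_LMA);
}
```

With CR0.PG = 0 and CR4.PAE = 1, the case named in the commit, the predicate is false and no PDPTE read would be attempted.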
2018-09-20x86/kvm/lapic: always disable MMIO interface in x2APIC modeVitaly Kuznetsov
When VMX is used with flexpriority disabled (because of no support or if disabled with module parameter) MMIO interface to lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests): PASS: apic_disable: Local apic enabled in x2APIC mode PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set FAIL: apic_disable: *0xfee00030: 50014 The issue appears because we basically do nothing while switching to x2APIC mode when APIC access page is not used. apic_mmio_{read,write} only check if lAPIC is disabled before proceeding to actual write. When APIC access is virtualized we correctly manipulate VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode so there's no issue. Disabling MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will go to userspace. When lAPIC is in kernel, QEMU uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with disabled lAPIC but when we're in xAPIC mode we will get a real injected MSI from every write to lAPIC. Not good. The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate APIC access page and create KVM memory region so in x2APIC mode all reads and writes go to this pre-allocated page which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
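The chosen behaviour (ignore writes, return ~0 on reads while in x2APIC mode) can be modelled in a few lines. This is a toy model under stated assumptions: `struct lapic_sketch` and both functions are hypothetical, and the register file is simplified to one 32-bit value per 16-byte slot of a 4 KiB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local APIC model: 256 registers, one per 16-byte slot. */
struct lapic_sketch {
    bool x2apic_mode;
    uint32_t regs[256];
};

/* MMIO read: all ones in x2APIC mode, register contents otherwise. */
uint32_t lapic_mmio_read(const struct lapic_sketch *apic, uint32_t offset)
{
    if (apic->x2apic_mode)
        return ~0u;
    return apic->regs[(offset >> 4) & 0xff];
}

/* MMIO write: silently dropped in x2APIC mode. */
void lapic_mmio_write(struct lapic_sketch *apic, uint32_t offset, uint32_t val)
{
    if (apic->x2apic_mode)
        return;
    apic->regs[(offset >> 4) & 0xff] = val;
}
```

In this model a read of offset 0x30 in x2APIC mode yields 0xffffffff instead of the 50014 seen in the failing test output, and the stored register value is untouched by writes made while in that mode.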
2018-09-19tools: bpf: fix license for a compat header fileJakub Kicinski
libc_compat.h is used by libbpf so make sure it's licensed under LGPL or BSD license. The license change should be OK, I'm the only author of the file. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-19IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loopBart Van Assche
Use different loop variables for the inner and outer loop. This avoids that an infinite loop occurs if there are more RDMA channels than target->req_ring_size. Fixes: d92c0da71a35 ("IB/srp: Add multichannel support") Cc: <stable@vger.kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
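The pattern behind the fix is shown below in isolation. It is only an illustration with hypothetical names; the real driver walks RDMA channels and their request rings. When one index is reused for both loops, the inner loop overwrites the outer counter, so the outer condition may never become false.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Visit every request slot of every channel and return the number of
 * slots visited. The outer index i and the inner index j are distinct,
 * so finishing the inner loop never disturbs the outer loop's progress.
 */
size_t visit_all_requests(size_t ch_count, size_t req_ring_size)
{
    size_t visited = 0;

    for (size_t i = 0; i < ch_count; i++) {         /* outer: channels */
        for (size_t j = 0; j < req_ring_size; j++)  /* inner: requests */
            visited++;
    }
    return visited;
}
```

With separate indices the function terminates for any combination of channel count and ring size, including more channels than ring entries.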
2018-09-19Merge tag 'hwmon-for-linus-v4.19-rc5' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Guenter writes: "Various bug fixes for nct6775 driver"
2018-09-19Merge tag 'scsi-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi James writes: "SCSI fixes on 20180919 A couple of small but important fixes, one affecting big endian and the other fixing a BUG_ON in scatterlist processing. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>"
2018-09-19xen: issue warning message when out of grant maptrack entriesJuergen Gross
When a driver domain (e.g. dom0) is running out of maptrack entries it can't map any more foreign domain pages. Instead of silently stalling the affected domUs issue a rate limited warning in this case in order to make it easier to detect that situation. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-09-19xen/x86/vpmu: Zero struct pt_regs before calling into sample handling codeBoris Ostrovsky
Otherwise we may leak kernel stack for events that sample user registers. Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
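The class of bug is easy to reproduce outside the kernel: a structure that lives on the stack and is only partly filled in still carries whatever was there before. A minimal sketch, with a hypothetical `struct regs_sketch` standing in for struct pt_regs:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; field names are illustrative. */
struct regs_sketch {
    unsigned long ip;
    unsigned long sp;
    unsigned long flags;
    unsigned long other[5];
};

/*
 * Fill in the fields that are known. Zeroing first guarantees that every
 * field not explicitly set reads as 0 rather than as stale stack contents.
 */
void fill_sample_regs(struct regs_sketch *regs, unsigned long ip,
                      unsigned long sp)
{
    memset(regs, 0, sizeof(*regs));
    regs->ip = ip;
    regs->sp = sp;
}
```

Any consumer that later copies the whole structure out then sees zeros in the fields that were never set.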
2018-09-19MAINTAINERS: Add Borislav to the x86 maintainersThomas Gleixner
Borislav is effectively maintaining parts of X86 already, make it official. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de>
2018-09-19ext2, dax: set ext2_dax_aops for dax filesToshi Kani
Sync syscall to DAX file needs to flush processor cache, but it currently does not flush to existing DAX files. This is because 'ext2_da_aops' is set to address_space_operations of existing DAX files, instead of 'ext2_dax_aops', since S_DAX flag is set after ext2_set_aops() in the open path. Similar to ext4, change ext2_iget() to initialize i_flags before ext2_set_aops(). Fixes: fb094c90748f ("ext2, dax: introduce ext2_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Jan Kara <jack@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-19Merge tag 'perf-urgent-for-mingo-4.19-20180918' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix the build on !_GNU_SOURCE libc systems such as Alpine Linux/musl libc due to usage of strerror_r glibc variant on libbpf (Arnaldo Carvalho de Melo) - Fix out-of-tree asciidoctor man page generation (Ben Hutchings) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-19x86/paravirt: Fix some warning messagesDan Carpenter
The first argument to WARN_ONCE() is a condition. Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: virtualization@lists.linux-foundation.org Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
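A userspace imitation makes the argument order concrete. `WARN_ONCE_SKETCH`, `warn_count` and `check_len()` are hypothetical and only mimic the shape of the kernel macro: condition first, then a printf-style format. If the format string is passed first, it is taken as the condition, and a string literal is always true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually emitted (for demonstration only). */
int warn_count;

/* Userspace stand-in: warn at most once per call site when cond is true. */
#define WARN_ONCE_SKETCH(cond, ...)             \
    do {                                        \
        static bool warned_;                    \
        if ((cond) && !warned_) {               \
            warned_ = true;                     \
            fprintf(stderr, __VA_ARGS__);       \
            warn_count++;                       \
        }                                       \
    } while (0)

/* Hypothetical caller: warns once if len exceeds max. */
int check_len(int len, int max)
{
    WARN_ONCE_SKETCH(len > max, "length %d exceeds %d\n", len, max);
    return warn_count;
}
```

An unconditional warning is written with a literal 1 as the first argument, followed by the format string.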
2018-09-19drm: sun4i: drop second PLL from A64 HDMI PHYIcenowy Zheng
The A64 HDMI PHY seems to be not able to use the second video PLL as clock parent in experiments. Drop the support for the second PLL from A64 HDMI PHY driver. Fixes: b46e2c9f5f64 ("drm/sun4i: Add support for A64 HDMI PHY") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180916043409.62374-2-icenowy@aosc.io
2018-09-19Merge branch 'linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Crypto stuff from Herbert: "This push fixes a potential boot hang in ccp and an incorrect CPU capability check in aegis/morus on x86." * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2 crypto: ccp - add timeout support in the SEV command
2018-09-19Merge tag 'trace-v4.19-rc4' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "Vaibhav Nagarnaik found that modifying the ring buffer size could cause a huge latency in the system because it does a while loop to free pages without releasing the CPU (on non preempt kernels). In a case where there are hundreds of thousands of pages to free it could actually cause a system stall. A properly placed cond_resched() solves this issue."
2018-09-19Merge tag 'platform-drivers-x86-v4.19-2' of ↵Greg Kroah-Hartman
git://git.infradead.org/linux-platform-drivers-x86 Darren writes: "platform-drivers-x86 for v4.19-2 Free allocated ACPI buffers in two drivers. The following is an automated git shortlog grouped by driver: alienware-wmi: - Correct a memory leak dell-smbios-wmi: - Correct a memory leak" * tag 'platform-drivers-x86-v4.19-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: alienware-wmi: Correct a memory leak platform/x86: dell-smbios-wmi: Correct a memory leak
2018-09-18Merge branch 'ipv6-fix-issues-on-accessing-fib6_metrics'David S. Miller
Wei Wang says: ==================== ipv6: fix issues on accessing fib6_metrics The latest fix on the memory leak of fib6_metrics still causes use-after-free. This patch series first reverts the previous fix and proposes a new fix that is more in line with ipv4 logic and is tested to fix the use-after-free issue reported. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18ipv6: fix memory leak on dst->_metricsWei Wang
When dst->_metrics and f6i->fib6_metrics share the same memory, both take reference count on the dst_metrics structure. However, when dst is destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic() which does not take care of READONLY metrics and does not release refcnt. This causes memory leak. Similar to ipv4 logic, the fix is to properly release refcnt and free the memory space pointed by dst->_metrics if refcnt becomes 0. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Revert "ipv6: fix double refcount of fib6_metrics"Wei Wang
This reverts commit e70a3aad44cc8b24986687ffc98c4a4f6ecf25ea. This change causes use-after-free on dst->_metrics. The crash trace looks like this: [ 97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140 [ 97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801 [ 97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11 [ 97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018 [ 97.777957] Call Trace: [ 97.777971] [<ffffffff895709db>] dump_stack+0x4d/0x72 [ 97.777985] [<ffffffff881651df>] print_address_description+0x6f/0x260 [ 97.777997] [<ffffffff88165747>] kasan_report+0x257/0x370 [ 97.778001] [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140 [ 97.778004] [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20 [ 97.778008] [<ffffffff894488e6>] ip6_mtu+0x116/0x140 [ 97.778013] [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280 [ 97.778016] [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0 [ 97.778022] [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0 [ 97.778037] [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0 [ 97.778040] [<ffffffff881643b1>] ? save_stack+0xb1/0xd0 [ 97.778046] [<ffffffff89264c82>] tcp_send_mss+0x22/0x220 [ 97.778059] [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0 [ 97.778062] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778066] [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60 [ 97.778070] [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280 [ 97.778075] [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430 [ 97.778078] [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0 [ 97.778082] [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140 [ 97.778085] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778088] [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50 [ 97.778092] [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50 [ 97.778098] [<ffffffff89352d43>] inet_sendmsg+0x103/0x480 [ 97.778102] [<ffffffff89352c40>] ? 
inet_gso_segment+0x15b0/0x15b0 [ 97.778105] [<ffffffff890294da>] sock_sendmsg+0xba/0xf0 [ 97.778108] [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0 [ 97.778113] [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0 [ 97.778116] [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0 [ 97.778119] [<ffffffff881646d1>] ? memset+0x31/0x40 [ 97.778123] [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380 [ 97.778127] [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250 [ 97.778130] [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180 [ 97.778133] [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200 [ 97.778137] [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0 [ 97.778141] [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190 [ 97.778144] [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190 [ 97.778147] [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20 [ 97.778152] [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0 [ 97.778155] [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190 [ 97.778158] [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20 [ 97.778162] [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430 [ 97.778166] [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0 [ 97.778171] [<ffffffff8960131f>] ? 
page_fault+0x2f/0x50 [ 97.778174] [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.778177] RIP: 0033:0x7f83fa36000d [ 97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d [ 97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036 [ 97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168 [ 97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40 [ 97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000 [ 97.779684] Allocated by task 5919: [ 97.783185] save_stack+0x46/0xd0 [ 97.783187] kasan_kmalloc+0xad/0xe0 [ 97.783189] kmem_cache_alloc_trace+0xdf/0x580 [ 97.783190] ip6_convert_metrics.isra.79+0x7e/0x190 [ 97.783192] ip6_route_info_create+0x60a/0x2480 [ 97.783193] ip6_route_add+0x1d/0x80 [ 97.783195] inet6_rtm_newroute+0xdd/0xf0 [ 97.783198] rtnetlink_rcv_msg+0x641/0xb10 [ 97.783200] netlink_rcv_skb+0x27b/0x3e0 [ 97.783202] rtnetlink_rcv+0x15/0x20 [ 97.783203] netlink_unicast+0x4be/0x720 [ 97.783204] netlink_sendmsg+0x7bc/0xbf0 [ 97.783205] sock_sendmsg+0xba/0xf0 [ 97.783207] ___sys_sendmsg+0x6ca/0x8e0 [ 97.783208] __sys_sendmsg+0xe6/0x190 [ 97.783209] SyS_sendmsg+0x13/0x20 [ 97.783211] do_syscall_64+0x2ac/0x430 [ 97.783213] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.784709] Freed by task 0: [ 97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist [ 97.794497] save_stack+0x46/0xd0 [ 97.794499] kasan_slab_free+0x71/0xc0 [ 97.794500] kfree+0x7c/0xf0 [ 97.794501] fib6_info_destroy_rcu+0x24f/0x310 [ 97.794504] rcu_process_callbacks+0x38b/0x1730 [ 97.794506] __do_softirq+0x1c8/0x5d0 Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18sfp: fix oops with ethtool -mRussell King
If a network interface is created prior to the SFP socket being available, ethtool can request module information. This unfortunately leads to an oops: Unable to handle kernel NULL pointer dereference at virtual address 00000008 pgd = (ptrval) [00000008] *pgd=7c400831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1480 Comm: ethtool Not tainted 4.19.0-rc3 #138 Hardware name: Broadcom Northstar Plus SoC PC is at sfp_get_module_info+0x8/0x10 LR is at dev_ethtool+0x218c/0x2afc Fix this by not filling in the network device's SFP bus pointer until SFP is fully bound, thereby avoiding the core calling into the SFP bus code. Fixes: ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages") Reported-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net: mvpp2: fix a txq_done race conditionAntoine Tenart
When no Tx IRQ is available, the txq_done() routine (called from tx_done()) shouldn't be called from the polling function, as in such case it is already called in the Tx path thanks to an hrtimer. This mostly occurred when using PPv2.1, as the engine then does not have Tx IRQs. Fixes: edc660fa09e2 ("net: mvpp2: replace TX coalescing interrupts with hrtimer") Reported-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge branch 'net-smc-fixes'David S. Miller
Ursula Braun says: ==================== net/smc: fixes 2018-09-18 here are some fixes in different areas of the smc code for the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: fix sizeof to int comparisonYueHaibing
Comparing an int to a size, which is unsigned, causes the int to become unsigned, giving the wrong result. kernel_sendmsg can return a negative error code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
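The conversion can be demonstrated directly. The function names are hypothetical and this is not the smc code; the first function spells out with an explicit cast what the compiler does implicitly when an int is compared against a sizeof result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * What `rc < sizeof(x)` effectively evaluates: rc is converted to
 * size_t, so a negative error code becomes a very large value.
 */
bool short_send_as_unsigned(int rc, size_t expected)
{
    return (size_t)rc < expected;
}

/* Check the sign first, then compare lengths as unsigned values. */
bool short_send_fixed(int rc, size_t expected)
{
    return rc < 0 || (size_t)rc < expected;
}
```

For a return value of -5 and an expected length of 16, the unsigned comparison reports "not short" and the error goes unnoticed, while the fixed form catches it.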
2018-09-18net/smc: no urgent data check for listen socketsKarsten Graul
Don't check a listen socket for pending urgent data in smc_poll(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: enable fallback for connection abort in state INITUrsula Braun
If a linkgroup is terminated abnormally already due to failing LLC CONFIRM LINK or LLC ADD LINK, fallback to TCP is still possible. In this case do not switch to state SMC_PEERABORTWAIT and do not set sk_err. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: remove duplicate mutex_unlockUrsula Braun
For a failing smc_listen_rdma_finish() smc_listen_decline() is called. If fallback is possible, the new socket is already enqueued to be accepted in smc_listen_decline(). Avoid enqueuing a second time afterwards in this case, otherwise the smc_create_lgr_pending lock is released twice: [ 373.463976] WARNING: bad unlock balance detected! [ 373.463978] 4.18.0-rc7+ #123 Tainted: G O [ 373.463979] ------------------------------------- [ 373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at: [ 373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.463991] but there are no more locks to release! [ 373.463991] other info that might help us debug this: [ 373.463993] 2 locks held by kworker/1:1/30: [ 373.463994] #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464000] #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464003] stack backtrace: [ 373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G O 4.18.0-rc7uschi+ #123 [ 373.464007] Hardware name: IBM 2827 H43 738 (LPAR) [ 373.464010] Workqueue: events smc_listen_work [smc] [ 373.464011] Call Trace: [ 373.464015] ([<0000000000114100>] show_stack+0x60/0xd8) [ 373.464019] [<0000000000a8c9bc>] dump_stack+0x9c/0xd8 [ 373.464021] [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108 [ 373.464022] [<00000000001e045c>] lock_release+0x114/0x4f8 [ 373.464025] [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300 [ 373.464027] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.464029] [<0000000000197a68>] process_one_work+0x2a8/0x6b0 [ 373.464030] [<0000000000197ec2>] worker_thread+0x52/0x410 [ 373.464033] [<000000000019fd0e>] kthread+0x15e/0x178 [ 373.464035] [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc [ 373.464052] [<0000000000aaf584>] kernel_thread_starter+0x0/0xc [ 373.464054] INFO: lockdep is turned off. 
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net/smc: fix non-blocking connect problemUrsula Braun
In state SMC_INIT smc_poll() delegates polling to the internal CLC socket. This means, once the connect worker has finished its kernel_connect() step, the poll wake-up may occur. This is not intended. The wake-up should occur from the wake up call in smc_connect_work() after __smc_connect() has finished. Thus in state SMC_INIT this patch now calls sock_poll_wait() on the main SMC socket. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18ravb: do not write 1 to reserved bitsKazuya Mizuguchi
EtherAVB hardware requires 0 to be written to status register bits in order to clear them. However, care must be taken not to: 1. clear other bits by writing zero to them; 2. write one to reserved bits. This patch corrects the ravb driver with respect to the second point above. This is done by defining reserved bit masks for the affected registers and, after auditing the code, ensuring that all sites that may write a one to a reserved bit are suitably masked. Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
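The two constraints combine into a single expression for the value to write. The helper name and the example masks are hypothetical; the real driver defines per-register reserved masks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value to write to a write-0-to-clear status register:
 *   0 in the bits that should be cleared,
 *   0 in reserved bits (never write 1 there),
 *   1 in every other bit, so it is left untouched.
 */
uint32_t w0c_clear_value(uint32_t bits_to_clear, uint32_t reserved_mask)
{
    return ~(bits_to_clear | reserved_mask);
}
```

For example, with an assumed reserved mask of 0xfffffff0 and bit 1 to be cleared, the value written is 0x0000000d: bit 1 and all reserved bits are 0, and the other defined bits are 1.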
2018-09-18net: bnxt: Fix a uninitialized variable warning.zhong jiang
Fix the following compile warning: drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:49:5: warning: ‘nvm_param.dir_type’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (nvm_param.dir_type == BNXT_NVM_PORT_CFG) Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge tag 'mlx5-fixes-2018-09-17' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-09-17 Sorry about the previous submission of this series which was mistakenly marked for net-next, here I am resending with 'net' mark. This series provides three fixes to mlx5 core and mlx5e netdevice driver. Please pull and let me know if there's any problem. For -stable v4.16: ('net/mlx5: Check for SQ and not RQ state when modifying hairpin SQ') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18net: emac: fix fixed-link setup for the RTL8363SB switchChristian Lamparter
On the Netgear WNDAP620, the emac ethernet is neither receiving nor transmitting any frames from/to the RTL8363SB (identifies itself as a RTL8367RB). This is caused by the emac hardware not knowing the forced link parameters for speed, duplex, pause, etc. This begs the question of how this was working in the original driver code, when it was necessary to set the phy_address and phy_map to 0xffffffff. But I guess without access to the old PPC405/440/460 hardware, it's not possible to know. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18NFC: Fix the number of pipesSuren Baghdasaryan
According to ETSI TS 102 622 specification chapter 4.4 pipe identifier is 7 bits long which allows for 128 unique pipe IDs. Because NFC_HCI_MAX_PIPES is used as the number of pipes supported and not as the max pipe ID, its value should be 128 instead of 127. nfc_hci_recv_from_llc extracts pipe ID from packet header using NFC_HCI_FRAGMENT(0x7F) mask which allows for pipe ID value of 127. Same happens when NCI_HCP_MSG_GET_PIPE() is being used. With pipes array having only 127 elements and pipe ID of 127 the OOB memory access will result. Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Allen Pais <allen.pais@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
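The off-by-one is between "largest pipe ID" and "number of pipes". A short sketch with hypothetical names (the `_SKETCH` macros and `struct pipe_table_sketch` are not the NFC core definitions):

```c
#include <assert.h>
#include <stdint.h>

#define PIPE_ID_MASK_SKETCH 0x7Fu                        /* 7-bit pipe ID */
#define MAX_PIPES_SKETCH    (PIPE_ID_MASK_SKETCH + 1u)   /* IDs 0..127 */

/* One entry per possible pipe ID. */
struct pipe_table_sketch {
    uint8_t gate[MAX_PIPES_SKETCH];
};

/* Extract the pipe ID from a packet header byte. */
uint8_t pipe_id_from_header(uint8_t header)
{
    return header & PIPE_ID_MASK_SKETCH;
}
```

A header byte of 0xFF yields pipe ID 127, which is a valid index only if the table has 128 entries.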
2018-09-18NFC: Fix possible memory corruption when handling SHDLC I-Frame commandsSuren Baghdasaryan
When handling SHDLC I-Frame commands "pipe" field used for indexing into an array should be checked before usage. If left unchecked it might access memory outside of the array of size NFC_HCI_MAX_PIPES(127). Malformed NFC HCI frames could be injected by a malicious NFC device communicating with the device being attacked (remote attack vector), or even by an attacker with physical access to the I2C bus such that they could influence the data transfers on that bus (local attack vector). skb->data is controlled by the attacker and has only been sanitized in the most trivial ways (CRC check), therefore we can consider the create_info struct and all of its members to be tainted. 'create_info->pipe' with max value of 255 (uint8) is used to take an offset of the hdev->pipes array of 127 elements which can lead to OOB write. Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Allen Pais <allen.pais@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Suggested-by: Kevin Deus <kdeus@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
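The check itself is small. The sketch below uses hypothetical names and a table of 127 entries to match the size quoted in this message; it is not the NFC HCI code.

```c
#include <assert.h>
#include <stdint.h>

#define PIPE_TABLE_SIZE 127   /* array size as quoted in the commit message */

/* Hypothetical device state with one gate entry per pipe. */
struct hdev_sketch {
    uint8_t gate[PIPE_TABLE_SIZE];
};

/*
 * Record the gate for a pipe taken from an untrusted frame.
 * Returns 0 on success, -1 if the pipe index is out of range.
 */
int record_pipe(struct hdev_sketch *hdev, uint8_t pipe, uint8_t gate)
{
    if (pipe >= PIPE_TABLE_SIZE)
        return -1;            /* reject before indexing the array */
    hdev->gate[pipe] = gate;
    return 0;
}
```

Because the index arrives as an 8-bit value, anything from 127 to 255 would otherwise write past the end of the table.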
2018-09-18selftests: pmtu: properly redirect stderr to /dev/nullSabrina Dubroca
The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually send stderr to /dev/null, so when the netns doesn't exist, the error message is shown. Use "2> /dev/null" instead, so that those messages disappear, as was intended. Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18  Merge branch 'stmmac-Coalesce-and-tail-addr-fixes'  David S. Miller
Jose Abreu says:

====================
net: stmmac: Coalesce and tail addr fixes

A fix for the coalesce timer and a fix for the tail address setting, which impacts XGMAC2 operation.

The series is:
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
on a113 s400 board (single queue)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18  net: stmmac: Fixup the tail addr setting in xmit path  Jose Abreu
Currently we always set the tail address of the descriptor list to the end of the pre-allocated list. According to the databook this is not correct: the tail address should point to the last available descriptor + 1, which means we have to update the tail address every time we call the xmit function.

This should have no impact on older versions of the MAC, but newer versions have DMA features which allow the IP to fetch descriptors in advance and in a non-sequential order, so it's critical that we set the tail address correctly.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: f748be531d70 ("stmmac: support new GMAC4")
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
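The difference between the two tail address computations can be sketched as below. This is only an illustration under assumptions: the tx_ring_sketch struct and both helpers are hypothetical, and "last available descriptor + 1" is read here as the index of the next free descriptor (cur_tx), i.e. one past the last descriptor handed to the DMA engine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a TX descriptor ring. */
struct tx_ring_sketch {
    uint64_t base_phys; /* DMA address of descriptor 0 */
    uint32_t desc_size; /* size of one descriptor in bytes */
    uint32_t ring_size; /* number of pre-allocated descriptors */
    uint32_t cur_tx;    /* next free descriptor (last prepared + 1) */
};

/* Old behaviour: tail always points to the end of the whole list. */
static uint64_t tail_addr_old(const struct tx_ring_sketch *r)
{
    return r->base_phys + (uint64_t)r->ring_size * r->desc_size;
}

/* New behaviour: tail follows the descriptors actually prepared,
 * so it must be recomputed on every xmit call. */
static uint64_t tail_addr_new(const struct tx_ring_sketch *r)
{
    assert(r->cur_tx < r->ring_size);
    return r->base_phys + (uint64_t)r->cur_tx * r->desc_size;
}
```

For a ring at 0x1000 with 16-byte descriptors and three descriptors prepared, the recomputed tail is 0x1030, whereas the fixed end-of-list value would tell a prefetching DMA engine that every descriptor up to the end of the ring is ready.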
2018-09-18  net: stmmac: Rework coalesce timer and fix multi-queue races  Jose Abreu
This follows David Miller's advice and tries to fix the coalesce timer in multi-queue scenarios. We are now using per-queue coalesce values and a per-queue TX timer. The default coalesce timer value was changed to 1ms and the default coalesce frames to 25.

Tested in a B2B setup between XGMAC2 and GMAC5.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: ce736788e8a ("net: stmmac: adding multiple buffers for TX")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18  pinctrl: cannonlake: Fix gpio base for GPP-E  Simon Detheridge
The gpio base for GPP-E was set incorrectly to 258 instead of 256, preventing the touchpad from working on my Tong Fang GK5CN5Z laptop.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=200787
Signed-off-by: Simon Detheridge <s@sd.ai>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-09-18  Input: uinput - allow for max == min during input_absinfo validation  Peter Hutterer
These values are inclusive, so a range of 1 requires min == max.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Reviewed-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
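The inclusive-range reasoning above can be sketched as follows. The absinfo_sketch struct and the three helpers are hypothetical names for illustration; this is not the uinput validation code itself, only the comparison it hinges on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of an absolute-axis description. */
struct absinfo_sketch {
    int minimum; /* inclusive lower bound */
    int maximum; /* inclusive upper bound */
};

/* Number of distinct values in the inclusive range [minimum, maximum]. */
static long long value_count(const struct absinfo_sketch *abs)
{
    assert(abs != NULL);
    return (long long)abs->maximum - abs->minimum + 1;
}

/* Stricter check: rejects min == max, i.e. a one-value axis. */
static int range_valid_strict(const struct absinfo_sketch *abs)
{
    return abs->maximum > abs->minimum;
}

/* Relaxed check: accepts min == max, only max < min is invalid. */
static int range_valid_inclusive(const struct absinfo_sketch *abs)
{
    return abs->maximum >= abs->minimum;
}
```

Because both bounds are inclusive, an axis with exactly one possible value has minimum equal to maximum, so only the relaxed comparison accepts it.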
2018-09-18  Input: elantech - enable middle button of touchpad on ThinkPad P72  Aaron Ma
Add 2 new touchpad IDs to enable middle button support.

Cc: stable@vger.kernel.org
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-09-18  Input: atakbd - fix Atari CapsLock behaviour  Michael Schmitz
The CapsLock key on Atari keyboards is not a toggle; it sends the normal make and break scancodes.

Drop the CapsLock toggle handling code, which caused the CapsLock key to merely act as a Shift key.

Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-09-18  Input: atakbd - fix Atari keymap  Andreas Schwab
Fix errors in the Atari keymap (mostly in the keypad, help and undo keys).

Patch provided on the debian-68k mailing list by Andreas Schwab <schwab@linux-m68k.org>; keymap array size and unhandled scancode limit adjusted to 0x73 by me.

Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-09-18  Input: egalax_ts - add system wakeup support  Anson Huang
This patch adds wakeup support for the egalax touch screen. If "wakeup-source" is added to the egalax touch screen node in the device tree, the wakeup function is enabled and the touch screen can wake the system up from suspend.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>