summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-17KVM/VMX: Remve unused function is_external_interrupt().Tianyu Lan
is_external_interrupt() is not used now and so remove it. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: x86: return 0 in case kvm_mmu_memory_cache has min number of objectsWei Yang
The code tries to pre-allocate *min* number of objects, so it is ok to return 0 when the kvm_mmu_memory_cache meets the requirement. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17nVMX x86: Make nested_vmx_check_pml_controls() conciseKrish Sadhukhan
Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: x86: adjust kvm_mmu_page member to save 8 bytesWei Yang
On a 64bits machine, struct is naturally aligned with 8 bytes. Since kvm_mmu_page member *unsync* and *role* are less then 4 bytes, we can rearrange the sequence to compace the struct. As the comment shows, *role* and *gfn* are used to key the shadow page. In order to keep the comment valid, this patch moves the *unsync* up and exchange the position of *role* and *gfn*. From /proc/slabinfo, it shows the size of kvm_mmu_page is 8 bytes less and with one more object per slap after applying this patch. # name <active_objs> <num_objs> <objsize> <objperslab> kvm_mmu_page_header 0 0 168 24 kvm_mmu_page_header 0 0 160 25 Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: nVMX: restore host state in nested_vmx_vmexit for VMFailSean Christopherson
A VMEnter that VMFails (as opposed to VMExits) does not touch host state beyond registers that are explicitly noted in the VMFail path, e.g. EFLAGS. Host state does not need to be loaded because VMFail is only signaled for consistency checks that occur before the CPU starts to load guest state, i.e. there is no need to restore any state as nothing has been modified. But in the case where a VMFail is detected by hardware and not by KVM (due to deferring consistency checks to hardware), KVM has already loaded some amount of guest state. Luckily, "loaded" only means loaded to KVM's software model, i.e. vmcs01 has not been modified. So, unwind our software model to the pre-VMEntry host state. Not restoring host state in this VMFail path leads to a variety of failures because we end up with stale data in vcpu->arch, e.g. CR0, CR4, EFER, etc... will all be out of sync relative to vmcs01. Any significant delta in the stale data is all but guaranteed to crash L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong. An alternative to this "soft" reload would be to load host state from vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is wildly inconsistent with respect to the VMX architecture, e.g. an L1 VMM with separate VMExit and VMFail paths would explode. Note that this approach does not mean KVM is 100% accurate with respect to VMX hardware behavior, even at an architectural level (the exact order of consistency checks is microarchitecture specific). But 100% emulation accuracy isn't the goal (with this patch), rather the goal is to be consistent in the information delivered to L1, e.g. a VMExit should not fall-through VMENTER, and a VMFail should not jump to HOST_RIP. This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)", but retains the core aspects of that patch, just in an open coded form due to the need to pull state from vmcs01 instead of vmcs12. Restoring host state resolves a variety of issues introduced by commit "4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)", which remedied the incorrect behavior of treating VMFail like VMExit but in doing so neglected to restore arch state that had been modified prior to attempting nested VMEnter. A sample failure that occurs due to stale vcpu.arch state is a fault of some form while emulating an LGDT (due to emulated UMIP) from L1 after a failed VMEntry to L3, in this case when running the KVM unit test test_tpr_threshold_values in L1. L0 also hits a WARN in this case due to a stale arch.cr4.UMIP. L1: BUG: unable to handle kernel paging request at ffffc90000663b9e PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163 Oops: 0009 [#1] SMP CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G W 4.18.0-rc2+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:native_load_gdt+0x0/0x10 ... Call Trace: load_fixmap_gdt+0x22/0x30 __vmx_load_host_state+0x10e/0x1c0 [kvm_intel] vmx_switch_vmcs+0x2d/0x50 [kvm_intel] nested_vmx_vmexit+0x222/0x9c0 [kvm_intel] vmx_handle_exit+0x246/0x15a0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x600 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4f/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 L0: WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel] ... CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76 Hardware name: Intel Corporation Kabylake Client platform/KBL S RIP: 0010:handle_desc+0x28/0x30 [kvm_intel] ... Call Trace: kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x5e0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x49/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure) Fixes: 4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly) Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: nVMX: Clear reserved bits of #DB exit qualificationJim Mattson
According to volume 3 of the SDM, bits 63:15 and 12:4 of the exit qualification field for debug exceptions are reserved (cleared to 0). However, the SDM is incorrect about bit 16 (corresponding to DR6.RTM). This bit should be set if a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. Note that this is the opposite of DR6.RTM, which "indicates (when clear) that a debug exception (#DB) or breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled." There is still an issue with stale DR6 bits potentially being misreported for the current debug exception. DR6 should not have been modified before vectoring the #DB exception, and the "new DR6 bits" should be available somewhere, but it was and they aren't. Fixes: b96fb439774e1 ("KVM: nVMX: fixes to nested virt interrupt injection") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: support high GPAs in dirty_log_testAndrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: stop lying to aarch64 tests about PA-bitsAndrew Jones
Let's add the 40 PA-bit versions of the VM modes, that AArch64 should have been using, so we can extend the dirty log test without breaking things. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: dirty_log_test: also test 64K pages on aarch64Andrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: port dirty_log_test to aarch64Andrew Jones
While we're messing with the code for the port and to support guest page sizes that are less than the host page size, we also make some code formatting cleanups and apply sync_global_to_guest(). Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: introduce new VM mode for 64K pagesAndrew Jones
Rename VM_MODE_FLAT48PG to be more descriptive of its config and add a new config that has the same parameters, except with 64K pages. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: add vcpu support for aarch64Andrew Jones
This code adds VM and VCPU setup code for the VM_MODE_FLAT48PG mode. The VM_MODE_FLAT48PG isn't yet fully supportable, as it defines the guest physical address limit as 52-bits, and KVM currently only supports guests with up to 40-bit physical addresses (see KVM_PHYS_SHIFT). VM_MODE_FLAT48PG will work fine, though, as long as no >= 40-bit physical addresses are used. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: add virt mem support for aarch64Andrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: add vm_phy_pages_allocAndrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: tidy up kvm_utilAndrew Jones
Tidy up kvm-util code: code/comment formatting, remove unused code, and move x86 specific code out. We also move vcpu_dump() out of common code, because not all arches (AArch64) have KVM_GET_REGS. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: add cscope make targetAndrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: move arch-specific files to arch-specific locationsAndrew Jones
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: introduce ucallAndrew Jones
Rework the guest exit to userspace code to generalize the concept into what it is, a "hypercall to userspace", and provide two implementations of it: the PortIO version currently used, but only useable by x86, and an MMIO version that other architectures (except s390) can use. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17kvm: selftests: vcpu_setup: set cr4.osfxsrAndrew Jones
Guest code may want to call functions that have variable arguments. To do so, we either need to compile with -mno-sse or enable SSE in the VCPUs. As it should be pretty safe to turn on the feature, and -mno-sse would make linking test code with standard libraries difficult, we choose the feature enabling. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17KVM: LAPIC: Tune lapic_timer_advance_ns automaticallyWanpeng Li
In cloud environment, lapic_timer_advance_ns is needed to be tuned for every CPU generations, and every host kernel versions(the kvm-unit-tests/tscdeadline_latency.flat is 5700 cycles for upstream kernel and 9600 cycles for our 3.10 product kernel, both preemption_timer=N, Skylake server). This patch adds the capability to automatically tune lapic_timer_advance_ns step by step, the initial value is 1000ns as 'commit d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")' recommended, it will be reduced when it is too early, and increased when it is too late. The guest_tsc and tsc_deadline are hard to equal, so we assume we are done when the delta is within a small scope e.g. 100 cycles. This patch reduces latency (kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer enabled) from ~2600 cyles to ~1200 cyles on our Skylake server. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-16scsi: qla2xxx: Simplify conditional checkNathan Chancellor
Clang generates a warning when it sees a logical not followed by a conditional operator like ==, >, or < because it thinks that the logical not should be applied to the whole statement: drivers/scsi/qla2xxx/qla_nx.c:3702:7: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses] if (!qla2x00_eh_wait_for_pending_commands(vha, 0, 0, ^ drivers/scsi/qla2xxx/qla_nx.c:3702:7: note: add parentheses after the '!' to evaluate the comparison first if (!qla2x00_eh_wait_for_pending_commands(vha, 0, 0, ^ ( drivers/scsi/qla2xxx/qla_nx.c:3702:7: note: add parentheses around left hand side expression to silence this warning if (!qla2x00_eh_wait_for_pending_commands(vha, 0, 0, ^ ( 1 warning generated. It assumes the author might have made a mistake in their logic: if (!a == b) -> if (!(a == b)) Sometimes that is the case; other times, it's just a super convoluted way of saying 'if (a)' when b = 0: if (!1 == 0) -> if (0 == 0) -> if (true) Alternatively: if (!1 == 0) -> if (!!1) -> if (1) Simplify this comparison so that Clang doesn't complain. Link: https://github.com/ClangBuiltLinux/linux/issues/80 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16Merge branch 'nfp-improve-bpf-offload'Alexei Starovoitov
Jakub Kicinski says: ==================== this set adds check to make sure offload behaviour is correct. First when atomic counters are used, we must make sure the map does not already contain data we did not prepare for holding atomics. Second patch double checks vNIC capabilities for program offload in case program is shared by multiple vNICs with different constraints. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16nfp: bpf: double check vNIC capabilities after object sharingJakub Kicinski
Program translation stage checks that program can be offloaded to the netdev which was passed during the load (bpf_attr->prog_ifindex). After program sharing was introduced, however, the netdev on which program is loaded can theoretically be different, and therefore we should recheck the program size and max stack size at load time. This was found by code inspection, AFAIK today all vNICs have identical caps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16nfp: bpf: protect against mis-initializing atomic countersJakub Kicinski
Atomic operations on the NFP are currently always in big endian. The driver keeps track of regions of memory storing atomic values and byte swaps them accordingly. There are corner cases where the map values may be initialized before the driver knows they are used as atomic counters. This can happen either when the datapath is performing the update and the stack contents are unknown or when map is updated before the program which will use it for atomic values is loaded. To avoid situation where user initializes the value to 0 1 2 3 and then after loading a program which uses the word as an atomic counter starts reading 3 2 1 0 - only allow atomic counters to be initialized to endian-neutral values. For updates from the datapath the stack information may not be as precise, so just allow initializing such values to 0. Example code which would break: struct bpf_map_def SEC("maps") rxcnt = { .type = BPF_MAP_TYPE_HASH, .key_size = sizeof(__u32), .value_size = sizeof(__u64), .max_entries = 1, }; int xdp_prog1() { __u64 nonzeroval = 3; __u32 key = 0; __u64 *value; value = bpf_map_lookup_elem(&rxcnt, &key); if (!value) bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY); else __sync_fetch_and_add(value, 1); return XDP_PASS; } $ offload bpftool map dump key: 00 00 00 00 value: 00 00 00 03 00 00 00 00 should be: $ offload bpftool map dump key: 00 00 00 00 value: 03 00 00 00 00 00 00 00 Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16scsi: bfa: Remove unused functionsNathan Chancellor
Clang warns when a variable is assigned to itself. drivers/scsi/bfa/bfa_fcbuild.c:199:6: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] len = len; ~~~ ^ ~~~ drivers/scsi/bfa/bfa_fcbuild.c:838:6: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] len = len; ~~~ ^ ~~~ drivers/scsi/bfa/bfa_fcbuild.c:917:6: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] len = len; ~~~ ^ ~~~ drivers/scsi/bfa/bfa_fcbuild.c:981:6: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] len = len; ~~~ ^ ~~~ drivers/scsi/bfa/bfa_fcbuild.c:1008:6: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign] len = len; ~~~ ^ ~~~ 5 warnings generated. This construct is usually used to avoid unused variable warnings, which I assume is the case here. -Wunused-parameter is hidden behind -Wextra with GCC 4.6, which is the minimum version to compile the kernel as of commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6"). However, upon further inspection, these functions aren't actually used anywhere; they're just defined. Rather than just removing the self assignments, remove all of this dead code. Link: https://github.com/ClangBuiltLinux/linux/issues/148 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: qla2xxx: Remove unnecessary self assignmentNathan Chancellor
Clang warns when a variable is assigned to itself. drivers/scsi/qla2xxx/qla_mbx.c:1514:4: warning: explicitly assigning value of variable of type 'uint64_t' (aka 'unsigned long long') to itself [-Wself-assign] l = l; ~ ^ ~ 1 warning generated. This construct is usually used to avoid unused variable warnings, which I assume is the case here. -Wunused-parameter is hidden behind -Wextra with GCC 4.6, which is the minimum version to compile the kernel as of commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6"). Just remove this line to silence Clang. Link: https://github.com/ClangBuiltLinux/linux/issues/83 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16libbpf: Per-symbol visibility for DSOAndrey Ignatov
Make global symbols in libbpf DSO hidden by default with -fvisibility=hidden and export symbols that are part of ABI explicitly with __attribute__((visibility("default"))). This is common practice that should prevent from accidentally exporting a symbol, that is not supposed to be a part of ABI what, in turn, improves both libbpf developer- and user-experiences. See [1] for more details. Export control becomes more important since more and more projects use libbpf. The patch doesn't export a bunch of netlink related functions since as agreed in [2] they'll be reworked. That doesn't break bpftool since bpftool links libbpf statically. [1] https://www.akkadia.org/drepper/dsohowto.pdf (2.2 Export Control) [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg251434.html Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16scsi: arcmsr: Remove set but not used variables 'id, lun'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/scsi/arcmsr/arcmsr_hba.c: In function 'arcmsr_drain_donequeue': drivers/scsi/arcmsr/arcmsr_hba.c:1320:10: warning: variable 'lun' set but not used [-Wunused-but-set-variable] drivers/scsi/arcmsr/arcmsr_hba.c:1320:6: warning: variable 'id' set but not used [-Wunused-but-set-variable] Never used since introduction in commit ae52e7f09ff5 ("arcmsr: Support 1024 scatter-gather list entries and improve AP while FW trapped and behaviors of EHs"). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: megaraid_sas: fix a missing-check bugWenwen Wang
In megasas_mgmt_compat_ioctl_fw(), to handle the structure compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket 'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle the packet. Since the two data structures have different fields, the data is copied from 'cioc' to 'ioc' field by field. In the copy process, 'sense_ptr' is prepared if the field 'sense_len' is not null, because it will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and saved to kernel-space variables 'local_sense_off' and 'user_sense_off' respectively. Given that 'ioc->sense_off' is also copied from 'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the same value. However, 'cioc' is in the user space and a malicious user can race to change the value of 'cioc->sense_off' after it is copied to 'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing so, the attacker can inject different values into 'local_sense_off' and 'user_sense_off'. This can cause undefined behavior in the following execution, because the two variables are supposed to be same. This patch enforces a check on the two kernel variables 'local_sense_off' and 'user_sense_off' to make sure they are the same after the copy. In case they are not, an error code EINVAL will be returned. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: be2iscsi: fix spelling mistake "Retreiving" -> "Retrieving"Colin Ian King
Trivial fix to spelling mistake in beiscsi_log message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: lpfc: fix spelling mistake "Resrouce" -> "Resource"Colin Ian King
Trivial fix to spelling mistake in lpfc_printf_log message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: hisi_sas: Fix spin lock management in slot_index_alloc_quirk_v2_hw()John Garry
Currently a spin_unlock_irqrestore() call is missing on the error path, so add it. Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: sg: remove bad blk_end_request_all() callJens Axboe
We just need to free the request here. Additionally, this is currently wrong for a queue that's using MQ currently, it'll crash. Cc: Doug Gilbert <dgilbert@interlog.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: osd: initiator should use mq variant of request endingJens Axboe
This is currently wrong since it isn't dependent on if we're using mq or not. At least now it'll be correct when we force mq. Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16scsi: ips: fix missing break in switchGustavo A. R. Silva
Add missing break statement in order to prevent the code from falling through to case TEST_UNIT_READY. Addresses-Coverity-ID: 1357338 ("Missing break in switch") Suggested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16tcp, ulp: remove socket lock assertion on ULP cleanupDaniel Borkmann
Eric reported that syzkaller triggered a splat in tcp_cleanup_ulp() where assertion sock_owned_by_me() failed. This happened through inet_csk_prepare_forced_close() first releasing the socket lock, then calling into tcp_done(newsk) which is called after the inet_csk_prepare_forced_close() and therefore without the socket lock held. The sock_owned_by_me() assertion can generally be removed as the only place where tcp_cleanup_ulp() is called from now is out of inet_csk_destroy_sock() -> sk->sk_prot->destroy() where socket is in dead state and unreachable. Therefore, add a comment why the check is not needed instead. Fixes: 8b9088f806e1 ("tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/mlx5: Expose DC scatter to CQE capability bitYonatan Cohen
dc_req_scat_data_cqe capability bit determines if requester scatter to cqe is available for 64 bytes CQE over DC transport type. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-10-16RDMA/umad: Use kernel API to allocate umad indexesLeon Romanovsky
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/uverbs: Use kernel API to allocate uverbs indexesLeon Romanovsky
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Increase total number of RDMA ports across all devicesLeon Romanovsky
IDA adds overhead to store IDs bitmap with maximal value of IDA can be upto 2099202 (IDA_MAX = 0x80000000U / IDA_BITMAP_BITS - 1). However, there is no need to add such enormous number of devices and it is enough for now to limit it to be 8192. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16dm: remove unnecessary unlikely() around WARN_ON_ONCE()Igor Stoppa
WARN_ON() already contains an unlikely(), so it's not necessary to wrap it into another. Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-16MAINTAINERS: Add entry for Broadcom SPI controllerFlorian Fainelli
Add an entry for the Broadcom SPI controller in the MAINTAINERS file. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-10-16dm zoned: target: use refcount_t for dm zoned reference countersJohn Pittman
The API surrounding refcount_t should be used in place of atomic_t when variables are being used as reference counters. This API can prevent issues such as counter overflows and use-after-free conditions. Within the dm zoned target stack, the atomic_t API is used for bioctx->ref and cw->refcount. Change these to use refcount_t, avoiding the issues mentioned. Signed-off-by: John Pittman <jpittman@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-16dm thin: use refcount_t for thin_c reference countingJohn Pittman
The API surrounding refcount_t should be used in place of atomic_t when variables are being used as reference counters. It can potentially prevent reference counter overflows and use-after-free conditions. In the dm thin layer, one such example is tc->refcount. Change this from the atomic_t API to the refcount_t API to prevent mentioned conditions. Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-16IB/mlx4: Add port and TID to MAD debug printHåkon Bugge
Add said information and make the debug print format consistent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16IB/mlx4: Enable debug print of SMPsHåkon Bugge
IB Subnet Management Packets (SMPs) were excluded from debug prints. Fixed by enabling print even on QP0 MADs. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Rename ports_parent to ports_kobjParav Pandit
Normally kobj objects have kobj suffix to reflect it. Rename ports_parent to ports_kobj. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Do not expose unsupported countersParav Pandit
If the provider driver (such as rdma_rxe) doesn't support pma counters, avoid exposing its directory similar to optional hw_counters directory. If core fails to read the PMA counter, return an error so that user can retry later if needed. Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16IB/mlx4: Refer to the device kobject instead of ports_parentParav Pandit
iov sysfs tree is created under ib device at /sys/class/infiniband/mlx4_0/iov. And, ibdev->ports_parent->parent = &ibdev->dev. Therefore, refer to device's kobject directly instead of indirect access to it. Additionally, iov entries are created under device kobject and deleted before device is removed. There is no need to hold additional reference to device kobject in provider driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16perf tools: Pass build flags to traceevent buildJiri Olsa
So the extra user build flags are propagated to libtraceevent. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: "Herton R. Krzesinski" <herton@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Link: http://lkml.kernel.org/r/20181016150614.21260-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>