git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-01-31	kvm: x86: remove efer_reload entry in kvm_vcpu_stat	Longpeng(Mike)
	The efer_reload is never used since commit 26bb0981b3ff ("KVM: VMX: Use shared msr infrastructure"), so remove it. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-31	KVM: x86: AMD Processor Topology Information	Stanislav Lanci
	This patch allow to enable x86 feature TOPOEXT. This is needed to provide information about SMT on AMD Zen CPUs to the guest. Signed-off-by: Stanislav Lanci <pixo@polepetko.eu> Tested-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-31	x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when ↵	Vitaly Kuznetsov
	running nested I was investigating an issue with seabios >= 1.10 which stopped working for nested KVM on Hyper-V. The problem appears to be in handle_ept_violation() function: when we do fast mmio we need to skip the instruction so we do kvm_skip_emulated_instruction(). This, however, depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS. However, this is not the case. Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when EPT MISCONFIG occurs. While on real hardware it was observed to be set, some hypervisors follow the spec and don't set it; we end up advancing IP with some random value. I checked with Microsoft and they confirmed they don't fill VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG. Fix the issue by doing instruction skip through emulator when running nested. Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-31	kvm: embed vcpu id to dentry of vcpu anon inode	Masatake YAMATO
	All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means it is impossible to know the mapping between fds for vcpu and vcpu from userland. # LC_ALL=C ls -l /proc/617/fd \| grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu It is also impossible to know the mapping between vma for kvm_run structure and vcpu from userland. # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu This change adds vcpu id to d-entries for vcpu. With this change you can get the following output: # LC_ALL=C ls -l /proc/617/fd \| grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu:0 lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu:1 # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:0 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:1 With the mappings known from the output, a tool like strace can report more details of qemu-kvm process activities. Here is the strace output of my local prototype: # ./strace -KK -f -p 617 2>&1 \| grep 'KVM_RUN\\| K' ... [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 ... Signed-off-by: Masatake YAMATO <yamato@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-31	kvm: Map PFN-type memory regions as writable (if possible)	KarimAllah Ahmed
	For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-31	Merge branch 'work.misc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of misc stuff, without any unifying topic, from various people. Neil's d_anon patch, several bugfixes, introduction of kvmalloc analogue of kmemdup_user(), extending bitfield.h to deal with fixed-endians, assorted cleanups all over the place..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) alpha: osf_sys.c: use timespec64 where appropriate alpha: osf_sys.c: fix put_tv32 regression jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path dcache: delete unused d_hash_mask dcache: subtract d_hash_shift from 32 in advance fs/buffer.c: fold init_buffer() into init_page_buffers() fs: fold __inode_permission() into inode_permission() fs: add RWF_APPEND sctp: use vmemdup_user() rather than badly open-coding memdup_user() snd_ctl_elem_init_enum_names(): switch to vmemdup_user() replace_user_tlv(): switch to vmemdup_user() new primitive: vmemdup_user() memdup_user(): switch to GFP_USER eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd: convert to use anon_inode_getfd() nfs4file: get rid of pointless include of btrfs.h uvc_v4l2: clean copyin/copyout up vme_user: don't use __copy_..._user() usx2y: don't bother with memdup_user() for 16-byte structure ...
2018-01-31	Merge tag 'gfs2-4.16.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got 30 patches for this merge window. These generally fall into five categories: - code cleanups - patches related to adding PUNCH_HOLE support to GFS2 - support for new fields in resource group headers - a few bug fixes - support for new fields in journal log headers. These new fields, which were previously unused, are designed to make it easier to track down file system corruption, and allow fsck.gfs2 to make more intelligent decisions when finding and fixing file system corruption. Details: - Two patches from Abhi Das, to trim the ordered writes list, which used to grow uncontrollably until unmount. - Several patches from Andreas Gruenbacher: remove an unused parameter from function gfs2_write_jdata_pagevec, remove a pointless BUG_ON, clean up an error patch in trunc_start, remove some unused parameters from truncate, make gfs2_journaled_truncate more efficient, clean up the support functions for truncate, fix metadata read-ahead for truncate to make it faster, fix up the non-recursive truncate code, rework and rename gfs2_block_truncate_page, generalize the non-recursive truncate code so it can take a range of values for punch_hole support, introduce new PUNCH_HOLE support that take advantage of the previous patches, add fallocate support with PUNCH_HOLE, fix some typos in the comments, add the function gfs2_max_stuffed_size to replace a piece of code that was needlessly repeated throughout GFS2, a minor cleanup to function gfs2_page_add_databufs, get rid of function gfs2_log_header_in in preparation for the new log header fields, and also fix up some missing newlines in kernel messages. - Andy Price added a new field to resource groups to indicate where the next one should be, to allow fsck.gfs2 to make better repairs. He also added new rindex fields for consistency checking, and added a crc field to resource group headers for consistency checking. - I reduced redundancy in functions common to freeing dinodes, and when writing log headers between the journalling code and journal recovery code. Also added new fields to journal log headers based on a prototype from Steve Whitehouse, and log the source of journal log headers so we can better track down journal corruption. Minor comment typo fix and a fix for a BUG in an unlink error path. - Steve Whitehouse contributed a patch to fix an incorrect use of the gfs2_blk2rgrpd function. - Tetsuo Handa contributed a patch that fixes incorrect error handling in function init_gfs2_fs" * tag 'gfs2-4.16.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits) gfs2: Add a few missing newlines in messages gfs2: Remove inode from ordered write list in gfs2_write_inode() GFS2: Don't try to end a non-existent transaction in unlink GFS2: Fix minor comment typo GFS2: Log the reason for log flushes in every log header GFS2: Introduce new gfs2_log_header_v2 gfs2: Get rid of gfs2_log_header_in gfs2: Minor gfs2_page_add_databufs cleanup gfs2: Add gfs2_max_stuffed_size gfs2: Typo fixes gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE) gfs2: Turn trunc_dealloc into punch_hole gfs2: Generalize truncate code Turn gfs2_block_truncate_page into gfs2_block_zero_range gfs2: Improve non-recursive delete algorithm gfs2: Fix metadata read-ahead during truncate gfs2: Clean up {lookup,fillup}_metapath gfs2: Remove minor gfs2_journaled_truncate inefficiencies gfs2: truncate: Remove unnecessary oldsize parameters gfs2: Clean up trunc_start error path ...
2018-01-31	devpts: fix error handling in devpts_mntget()	Eric Biggers
	If devpts_ptmx_path() returns an error code, then devpts_mntget() dereferences an ERR_PTR(): BUG: unable to handle kernel paging request at fffffffffffffff5 IP: devpts_mntget+0x13f/0x280 fs/devpts/inode.c:173 Fix it by returning early in the error paths. Reproducer: #define _GNU_SOURCE #include <fcntl.h> #include <sched.h> #include <sys/ioctl.h> #define TIOCGPTPEER _IO('T', 0x41) int main() { for (;;) { int fd = open("/dev/ptmx", 0); unshare(CLONE_NEWNS); ioctl(fd, TIOCGPTPEER, 0); } } Fixes: 311fc65c9fb9 ("pty: Repair TIOCGPTPEER") Reported-by: syzbot <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31	iversion: make inode_cmp_iversion{+raw} return bool instead of s64	Jeff Layton
	As Linus points out: The inode_cmp_iversion{+raw}() functions are pure and utter crap. Why? You say that they return 0/negative/positive, but they do so in a completely broken manner. They return that ternary value as the sequence number difference in a 's64', which means that if you actually care about that ternary value, and do the sane thing that the kernel-doc of the function implies is the right thing, you would do int cmp = inode_cmp_iversion(inode, old); if (cmp < 0 ... and as a result you get code that looks sane, but that doesn't actually WORK right. Since none of the callers actually care about the ternary value here, convert the inode_cmp_iversion{+raw} functions to just return a boolean value (false for matching, true for non-matching). This matches the existing use of these functions just fine, and makes it simple to convert them to return a ternary value in the future if we grow callers that need it. With this change we can also reimplement inode_cmp_iversion in a simpler way using inode_peek_iversion. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31	Merge remote-tracking branch 'lorenzo/pci/cadence' into next	Bjorn Helgaas
	* lorenzo/pci/cadence: PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller PCI: endpoint: Fix EPF device name to support multi-function devices PCI: endpoint: Add the function number as argument to EPC ops PCI: cadence: Add host driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller PCI: Add vendor ID for Cadence PCI: Add generic function to probe PCI host controllers PCI: generic: fix missing call of pci_free_resource_list() PCI: OF: Add generic function to parse and allocate PCI resources PCI: Regroup all PCI related entries into drivers/pci/Makefile Conflicts: drivers/pci/of.c include/linux/pci.h
2018-01-31	Merge branch 'pci/virtualization' into next	Bjorn Helgaas
	* pci/virtualization: PCI: Expose ari_enabled in sysfs PCI: Add function 1 DMA alias quirk for Marvell 9128 PCI: Mark Ceton InfiniTV4 INTx masking as broken xen/pci: Use acpi_noirq_set() helper to avoid #ifdef
2018-01-31	Merge branch 'pci/trivial' into next	Bjorn Helgaas
	* pci/trivial: PCI: Clean up whitespace in linux/pci.h, pci/pci.h PCI: Tidy up pci/probe.c comments
2018-01-31	Merge branch 'pci/switchtec' into next	Bjorn Helgaas
	* pci/switchtec: switchtec: Add device IDs for PSX 24xG3 and PSX 48xG3 switchtec: Add Global Fabric Manager Server (GFMS) event
2018-01-31	Merge branch 'pci/resource' into next	Bjorn Helgaas
	* pci/resource: PCI: tegra: Remove PCI_REASSIGN_ALL_BUS use on Tegra resource: Set type when reserving new regions resource: Set type of "reserve=" user-specified resources irqchip/i8259: Set I/O port resource types correctly powerpc: Set I/O port resource types correctly MIPS: Set I/O port resource types correctly vgacon: Set VGA struct resource types PCI: Use dev_info() rather than dev_err() for ROM validation PCI: Remove PCI_REASSIGN_ALL_RSRC use on arm and arm64 PCI: Remove sysfs resource mmap warning Conflicts: drivers/pci/rom.c
2018-01-31	Merge branch 'pci/msi' into next	Bjorn Helgaas
	* pci/msi: PCI: Disable MSI for HiSilicon Hip06/Hip07 only in Root Port mode
2018-01-31	Merge branch 'pci/misc' into next	Bjorn Helgaas
	* pci/misc: PCI: Add dummy pci_irqd_intx_xlate() for CONFIG_PCI=n build PCI: Add wrappers for dev_printk() PCI: Remove unnecessary messages for memory allocation failures PCI: Add #defines for Completion Timeout Disable feature hinic: Replace PCI pool old API net: e100: Replace PCI pool old API block: DAC960: Replace PCI pool old API MAINTAINERS: Include more PCI files PCI: Remove unneeded kallsyms include powerpc/pci: Unroll two pass loop when scanning bridges powerpc/pci: Use for_each_pci_bridge() helper
2018-01-31	Merge branch 'pci/hotplug' into next	Bjorn Helgaas
	* pci/hotplug: PCI: pciehp: Assume NoCompl+ for Thunderbolt ports PCI: hotplug: Drop checking of PCI_BRIDGE_CONTROL in *_unconfigure_device()
2018-01-31	Merge branch 'pci/enumeration' into next	Bjorn Helgaas
	* pci/enumeration: RDMA/qedr: Use pci_enable_atomic_ops_to_root() PCI: Add pci_enable_atomic_ops_to_root() PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports
2018-01-31	Merge branch 'pci/dt-resources' into next	Bjorn Helgaas
	* pci/dt-resources: PCI: Make of_irq_parse_pci() static powerpc/pci: Use of_irq_parse_and_map_pci() helper PCI: Move OF-related PCI functions into PCI core
2018-01-31	Merge branch 'pci/dpc' into next	Bjorn Helgaas
	* pci/dpc: PCI/DPC: Reformat DPC register definitions PCI/DPC: Add and use DPC Status register field definitions PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error() PCI/DPC: Remove unnecessary RP PIO register structs PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info() PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info() PCI/DPC: Make RP PIO log size check more generic PCI/DPC: Rename local "status" to "dpc_status" PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error() PCI/DPC: Process RP PIO details only if RP PIO extensions supported PCI/DPC: Read RP PIO Log Size once at probe PCI/DPC: Rename struct dpc_dev.rp to rp_extensions PCI/DPC: Add local variable for DPC capability offset PCI/DPC: Rename interrupt_event_handler() to dpc_work() PCI/DPC: Fix interrupt message number print PCI/DPC: Enable DPC only if AER is available PCI/DPC: Fix shared interrupt handling
2018-01-31	Merge branch 'pci/dma' into next	Bjorn Helgaas
	* pci/dma: PCI: Remove NULL device handling from PCI DMA API net: tsi108: Use DMA API properly media: ttusb-dec: Remove pci_zalloc_coherent() abuse media: ttusb-budget: Remove pci_zalloc_coherent() abuse
2018-01-31	Merge branch 'pci/deprecate-get-bus-and-slot' into next	Bjorn Helgaas
	* pci/deprecate-get-bus-and-slot: video: fbdev: riva: deprecate pci_get_bus_and_slot() video: fbdev: nvidia: deprecate pci_get_bus_and_slot() video: fbdev: intelfb: deprecate pci_get_bus_and_slot() openprom: Deprecate pci_get_bus_and_slot() xen/pcifront: Deprecate pci_get_bus_and_slot() PCI: Deprecate pci_get_bus_and_slot() PCI: ibmphp: Deprecate pci_get_bus_and_slot() PCI: cpqhp: Deprecate pci_get_bus_and_slot() pch_gbe: Deprecate pci_get_bus_and_slot() bnx2x: Deprecate pci_get_bus_and_slot() powerpc/via-pmu: Deprecate pci_get_bus_and_slot() iommu/amd: Deprecate pci_get_bus_and_slot() sl82c105: deprecate pci_get_bus_and_slot() drm/nouveau: deprecate pci_get_bus_and_slot() drm/gma500: Deprecate pci_get_bus_and_slot() ibft: Deprecate pci_get_bus_and_slot() edd: Deprecate pci_get_bus_and_slot() agp: sworks: Deprecate pci_get_bus_and_slot() agp: nvidia: Deprecate pci_get_bus_and_slot() ata: Deprecate pci_get_bus_and_slot() x86/PCI: Deprecate pci_get_bus_and_slot() powerpc/PCI: Deprecate pci_get_bus_and_slot() alpha/PCI: Deprecate pci_get_bus_and_slot()
2018-01-31	Merge branch 'pci/aspm' into next	Bjorn Helgaas
	* pci/aspm: PCI/ASPM: Unexport internal ASPM interfaces PCI/ASPM: Enable Latency Tolerance Reporting when supported PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics
2018-01-31	Merge branch 'pci/aer' into next	Bjorn Helgaas
	* pci/aer: PCI/AER: Return error if AER is not supported PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI
2018-01-31	netfilter: on sockopt() acquire sock lock only in the required scope	Paolo Abeni
	Syzbot reported several deadlocks in the netfilter area caused by rtnl lock and socket lock being acquired with a different order on different code paths, leading to backtraces like the following one: ====================================================== WARNING: possible circular locking dependency detected 4.15.0-rc9+ #212 Not tainted ------------------------------------------------------ syzkaller041579/3682 is trying to acquire lock: (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock include/net/sock.h:1463 [inline] (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 but task is already holding lock: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607 tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845 check_target net/ipv6/netfilter/ip6_tables.c:538 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:580 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749 do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 -> #0 (sk_lock-AF_INET6){+.+.}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914 lock_sock_nested+0xc2/0x110 net/core/sock.c:2780 lock_sock include/net/sock.h:1463 [inline] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET6); lock(rtnl_mutex); lock(sk_lock-AF_INET6); * DEADLOCK * 1 lock held by syzkaller041579/3682: #0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 The problem, as Florian noted, is that nf_setsockopt() is always called with the socket held, even if the lock itself is required only for very tight scopes and only for some operation. This patch addresses the issues moving the lock_sock() call only where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt() does not need anymore to acquire both locks. Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31	Merge branch 'for-4.16/remove-immediate' into for-linus	Jiri Kosina
	Pull 'immediate' feature removal from Miroslav Benes.
2018-01-31	tls: Add support for encryption using async offload accelerator	Vakul Garg
	Async crypto accelerators (e.g. drivers/crypto/caam) support offloading GCM operation. If they are enabled, crypto_aead_encrypt() return error code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a completion till the time the response for crypto offload request is received. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	ip6mr: fix stale iterator	Nikolay Aleksandrov
	When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ #128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	net/sched: kconfig: Remove blank help texts	Ulf Magnusson
	Blank help texts are probably either a typo, a Kconfig misunderstanding, or some kind of half-committing to adding a help text (in which case a TODO comment would be clearer, if the help text really can't be added right away). Best to remove them, IMO. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	openvswitch: meter: Use 64-bit arithmetic instead of 32-bit	Gustavo A. R. Silva
	Add suffix LL to constant 1000 in order to give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type long long int (64 bits, signed). The expression (band->burst_size + band->rate) * 1000 is currently being evaluated using 32-bit arithmetic. Addresses-Coverity-ID: 1461563 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	tcp_nv: fix potential integer overflow in tcpnv_acked	Gustavo A. R. Silva
	Add suffix ULL to constant 80000 in order to avoid a potential integer overflow and give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type u64. The current cast to u64 effectively applies to the whole expression as an argument of type u64 to be passed to div64_u64, but it does not prevent it from being evaluated using 32-bit arithmetic instead of 64-bit arithmetic. Also, once the expression is properly evaluated using 64-bit arithmentic, there is no need for the parentheses and the external cast to u64. Addresses-Coverity-ID: 1357588 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	r8169: fix RTL8168EP take too long to complete driver initialization.	Chunhao Lin
	Driver check the wrong register bit in rtl_ocp_tx_cond() that keep driver waiting until timeout. Fix this by waiting for the right register bit. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	qmi_wwan: Add support for Quectel EP06	Kristian Evensen
	The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK	Christian Brauner
	- Backwards Compatibility: If userspace wants to determine whether RTM_NEWLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_NEWLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31	Merge branches 'for-4.16/upstream' and 'for-4.15/upstream-fixes' into for-linus	Jiri Kosina
	Pull assorted small fixes queued for merge window.
2018-01-31	Merge branch 'for-4.16/i2c-hid' into for-linus	Jiri Kosina

2018-01-31	Merge branch 'for-4.16/wacom' into for-linus	Jiri Kosina
	Pull Wacom device driver updates. These don't have to go on top of the hid_have_special_driver[] revamp, as the whole group is assumed to have a special driver based on VID.
2018-01-31	Merge branch 'for-4.16/elo' into for-linus	Jiri Kosina
	Pull hid-elo device detection fix
2018-01-31	Merge branches 'for-4.16/hid-quirks-cleanup/asus', ↵	Jiri Kosina
	'for-4.16/hid-quirks-cleanup/elecom', 'for-4.16/hid-quirks-cleanup/ish', 'for-4.16/hid-quirks-cleanup/multitouch', 'for-4.16/hid-quirks-cleanup/pixart', 'for-4.16/hid-quirks-cleanup/rmi', 'for-4.16/hid-quirks-cleanup/sony' and 'for-4.16/hid-quirks-cleanup/toshiba' into for-linus Pull assorted device driver fixes (ASUS, Elecom, Intel-ISH, Multitouch, PixArt, RMI, Sony and Toshiba) based on top the hid-quirks revamp.
2018-01-31	Merge branch 'for-4.16/hid-quirks-cleanup/_base' into for-linus	Jiri Kosina
	This series from Benjamin Tissoires finally removes one of the big PITAs in the hid-core, which is the absolute need of having added all the new device IDs into the horrid hid_have_special_driver[]
2018-01-31	netfilter: ipt_CLUSTERIP: fix out-of-bounds accesses in clusterip_tg_check()	Dmitry Vyukov
	Commit 136e92bbec0a switched local_nodes from an array to a bitmask but did not add proper bounds checks. As the result clusterip_config_init_nodelist() can both over-read ipt_clusterip_tgt_info.local_nodes and over-write clusterip_config.local_nodes. Add bounds checks for both. Fixes: 136e92bbec0a ("[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31	netfilter: x_tables: fix pointer leaks to userspace	Dmitry Vyukov
	Several netfilter matches and targets put kernel pointers into info objects, but don't set usersize in descriptors. This leads to kernel pointer leaks if a match/target is set and then read back to userspace. Properly set usersize for these matches/targets. Found with manual code inspection. Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31	netfilter: ipset: Fix wraparound in hash:net types	Jozsef Kadlecsik
	Fix wraparound bug which could lead to memory exhaustion when adding an x.x.x.x-255.255.255.255 range to any hash:net types. Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark. Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31	Merge tag 'kvm-arm-for-v4.16' of ↵	Radim Krčmář
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.16 The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes.
2018-01-31	Merge tag 'kvm-s390-next-4.16-3' of ↵	Radim Krčmář
	git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: update maintainers
2018-01-31	PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller	Cyrille Pitchen
	This patch adds support to the Cadence PCIe controller in endpoint mode. Since pieces of source code are shared with the host driver (Root Complex mode), we create a new directory under drivers/pci dedicated to the Cadence PCIe controller. The common code is placed into drivers/pci/cadence/pcie-cadence.c and used by both the host and endpoint controller drivers. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-01-31	dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller	Cyrille Pitchen
	This patch documents the DT bindings for the Cadence PCIe controller when configured in endpoint mode. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org>
2018-01-31	PCI: endpoint: Fix EPF device name to support multi-function devices	Cyrille Pitchen
	Fix the pci_epf_make() function so it can now bind many EPF devices to the same EPF driver. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-01-31	PCI: endpoint: Add the function number as argument to EPC ops	Cyrille Pitchen
	This patch updates the prototype of most handlers from 'struct pci_epc_ops' so the EPC library can now support multi-function devices. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-01-31	PCI: cadence: Add host driver for Cadence PCIe controller	Cyrille Pitchen
	This patch adds support to the Cadence PCIe controller in host mode. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>