Age | Commit message (Collapse) | Author |
|
The efer_reload is never used since
commit 26bb0981b3ff ("KVM: VMX: Use shared msr infrastructure"),
so remove it.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
This patch allow to enable x86 feature TOPOEXT. This is needed to provide
information about SMT on AMD Zen CPUs to the guest.
Signed-off-by: Stanislav Lanci <pixo@polepetko.eu>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
running nested
I was investigating an issue with seabios >= 1.10 which stopped working
for nested KVM on Hyper-V. The problem appears to be in
handle_ept_violation() function: when we do fast mmio we need to skip
the instruction so we do kvm_skip_emulated_instruction(). This, however,
depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
However, this is not the case.
Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it; we end up advancing
IP with some random value.
I checked with Microsoft and they confirmed they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
Fix the issue by doing instruction skip through emulator when running
nested.
Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means
it is impossible to know the mapping between fds for vcpu and vcpu
from userland.
# LC_ALL=C ls -l /proc/617/fd | grep vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu
It is also impossible to know the mapping between vma for kvm_run
structure and vcpu from userland.
# LC_ALL=C grep vcpu /proc/617/maps
7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu
7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu
This change adds vcpu id to d-entries for vcpu. With this change
you can get the following output:
# LC_ALL=C ls -l /proc/617/fd | grep vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu:0
lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu:1
# LC_ALL=C grep vcpu /proc/617/maps
7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:0
7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:1
With the mappings known from the output, a tool like strace can report more details
of qemu-kvm process activities. Here is the strace output of my local prototype:
# ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K'
...
[pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO)
K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
[pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO)
K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
...
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
For EPT-violations that are triggered by a read, the pages are also mapped with
write permissions (if their memory region is also writable). That would avoid
getting yet another fault on the same page when a write occurs.
This optimization only happens when you have a "struct page" backing the memory
region. So also enable it for memory regions that do not have a "struct page".
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"All kinds of misc stuff, without any unifying topic, from various
people.
Neil's d_anon patch, several bugfixes, introduction of kvmalloc
analogue of kmemdup_user(), extending bitfield.h to deal with
fixed-endians, assorted cleanups all over the place..."
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
alpha: osf_sys.c: use timespec64 where appropriate
alpha: osf_sys.c: fix put_tv32 regression
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
dcache: delete unused d_hash_mask
dcache: subtract d_hash_shift from 32 in advance
fs/buffer.c: fold init_buffer() into init_page_buffers()
fs: fold __inode_permission() into inode_permission()
fs: add RWF_APPEND
sctp: use vmemdup_user() rather than badly open-coding memdup_user()
snd_ctl_elem_init_enum_names(): switch to vmemdup_user()
replace_user_tlv(): switch to vmemdup_user()
new primitive: vmemdup_user()
memdup_user(): switch to GFP_USER
eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget()
eventfd: fold eventfd_ctx_read() into eventfd_read()
eventfd: convert to use anon_inode_getfd()
nfs4file: get rid of pointless include of btrfs.h
uvc_v4l2: clean copyin/copyout up
vme_user: don't use __copy_..._user()
usx2y: don't bother with memdup_user() for 16-byte structure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Bob Peterson:
"We've got 30 patches for this merge window. These generally fall into
five categories:
- code cleanups
- patches related to adding PUNCH_HOLE support to GFS2
- support for new fields in resource group headers
- a few bug fixes
- support for new fields in journal log headers. These new fields,
which were previously unused, are designed to make it easier to
track down file system corruption, and allow fsck.gfs2 to make more
intelligent decisions when finding and fixing file system
corruption.
Details:
- Two patches from Abhi Das, to trim the ordered writes list, which
used to grow uncontrollably until unmount.
- Several patches from Andreas Gruenbacher: remove an unused
parameter from function gfs2_write_jdata_pagevec, remove a
pointless BUG_ON, clean up an error patch in trunc_start, remove
some unused parameters from truncate, make gfs2_journaled_truncate
more efficient, clean up the support functions for truncate, fix
metadata read-ahead for truncate to make it faster, fix up the
non-recursive truncate code, rework and rename
gfs2_block_truncate_page, generalize the non-recursive truncate
code so it can take a range of values for punch_hole support,
introduce new PUNCH_HOLE support that take advantage of the
previous patches, add fallocate support with PUNCH_HOLE, fix some
typos in the comments, add the function gfs2_max_stuffed_size to
replace a piece of code that was needlessly repeated throughout
GFS2, a minor cleanup to function gfs2_page_add_databufs, get rid
of function gfs2_log_header_in in preparation for the new log
header fields, and also fix up some missing newlines in kernel
messages.
- Andy Price added a new field to resource groups to indicate where
the next one should be, to allow fsck.gfs2 to make better repairs.
He also added new rindex fields for consistency checking, and added
a crc field to resource group headers for consistency checking.
- I reduced redundancy in functions common to freeing dinodes, and
when writing log headers between the journalling code and journal
recovery code. Also added new fields to journal log headers based
on a prototype from Steve Whitehouse, and log the source of journal
log headers so we can better track down journal corruption. Minor
comment typo fix and a fix for a BUG in an unlink error path.
- Steve Whitehouse contributed a patch to fix an incorrect use of the
gfs2_blk2rgrpd function.
- Tetsuo Handa contributed a patch that fixes incorrect error
handling in function init_gfs2_fs"
* tag 'gfs2-4.16.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
gfs2: Add a few missing newlines in messages
gfs2: Remove inode from ordered write list in gfs2_write_inode()
GFS2: Don't try to end a non-existent transaction in unlink
GFS2: Fix minor comment typo
GFS2: Log the reason for log flushes in every log header
GFS2: Introduce new gfs2_log_header_v2
gfs2: Get rid of gfs2_log_header_in
gfs2: Minor gfs2_page_add_databufs cleanup
gfs2: Add gfs2_max_stuffed_size
gfs2: Typo fixes
gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE)
gfs2: Turn trunc_dealloc into punch_hole
gfs2: Generalize truncate code
Turn gfs2_block_truncate_page into gfs2_block_zero_range
gfs2: Improve non-recursive delete algorithm
gfs2: Fix metadata read-ahead during truncate
gfs2: Clean up {lookup,fillup}_metapath
gfs2: Remove minor gfs2_journaled_truncate inefficiencies
gfs2: truncate: Remove unnecessary oldsize parameters
gfs2: Clean up trunc_start error path
...
|
|
If devpts_ptmx_path() returns an error code, then devpts_mntget()
dereferences an ERR_PTR():
BUG: unable to handle kernel paging request at fffffffffffffff5
IP: devpts_mntget+0x13f/0x280 fs/devpts/inode.c:173
Fix it by returning early in the error paths.
Reproducer:
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/ioctl.h>
#define TIOCGPTPEER _IO('T', 0x41)
int main()
{
for (;;) {
int fd = open("/dev/ptmx", 0);
unshare(CLONE_NEWNS);
ioctl(fd, TIOCGPTPEER, 0);
}
}
Fixes: 311fc65c9fb9 ("pty: Repair TIOCGPTPEER")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As Linus points out:
The inode_cmp_iversion{+raw}() functions are pure and utter crap.
Why?
You say that they return 0/negative/positive, but they do so in a
completely broken manner. They return that ternary value as the
sequence number difference in a 's64', which means that if you
actually care about that ternary value, and do the *sane* thing that
the kernel-doc of the function implies is the right thing, you would
do
int cmp = inode_cmp_iversion(inode, old);
if (cmp < 0 ...
and as a result you get code that looks sane, but that doesn't
actually *WORK* right.
Since none of the callers actually care about the ternary value here,
convert the inode_cmp_iversion{+raw} functions to just return a boolean
value (false for matching, true for non-matching).
This matches the existing use of these functions just fine, and makes it
simple to convert them to return a ternary value in the future if we
grow callers that need it.
With this change we can also reimplement inode_cmp_iversion in a simpler
way using inode_peek_iversion.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* lorenzo/pci/cadence:
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
Conflicts:
drivers/pci/of.c
include/linux/pci.h
|
|
* pci/virtualization:
PCI: Expose ari_enabled in sysfs
PCI: Add function 1 DMA alias quirk for Marvell 9128
PCI: Mark Ceton InfiniTV4 INTx masking as broken
xen/pci: Use acpi_noirq_set() helper to avoid #ifdef
|
|
* pci/trivial:
PCI: Clean up whitespace in linux/pci.h, pci/pci.h
PCI: Tidy up pci/probe.c comments
|
|
* pci/switchtec:
switchtec: Add device IDs for PSX 24xG3 and PSX 48xG3
switchtec: Add Global Fabric Manager Server (GFMS) event
|
|
* pci/resource:
PCI: tegra: Remove PCI_REASSIGN_ALL_BUS use on Tegra
resource: Set type when reserving new regions
resource: Set type of "reserve=" user-specified resources
irqchip/i8259: Set I/O port resource types correctly
powerpc: Set I/O port resource types correctly
MIPS: Set I/O port resource types correctly
vgacon: Set VGA struct resource types
PCI: Use dev_info() rather than dev_err() for ROM validation
PCI: Remove PCI_REASSIGN_ALL_RSRC use on arm and arm64
PCI: Remove sysfs resource mmap warning
Conflicts:
drivers/pci/rom.c
|
|
* pci/msi:
PCI: Disable MSI for HiSilicon Hip06/Hip07 only in Root Port mode
|
|
* pci/misc:
PCI: Add dummy pci_irqd_intx_xlate() for CONFIG_PCI=n build
PCI: Add wrappers for dev_printk()
PCI: Remove unnecessary messages for memory allocation failures
PCI: Add #defines for Completion Timeout Disable feature
hinic: Replace PCI pool old API
net: e100: Replace PCI pool old API
block: DAC960: Replace PCI pool old API
MAINTAINERS: Include more PCI files
PCI: Remove unneeded kallsyms include
powerpc/pci: Unroll two pass loop when scanning bridges
powerpc/pci: Use for_each_pci_bridge() helper
|
|
* pci/hotplug:
PCI: pciehp: Assume NoCompl+ for Thunderbolt ports
PCI: hotplug: Drop checking of PCI_BRIDGE_CONTROL in *_unconfigure_device()
|
|
* pci/enumeration:
RDMA/qedr: Use pci_enable_atomic_ops_to_root()
PCI: Add pci_enable_atomic_ops_to_root()
PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports
|
|
* pci/dt-resources:
PCI: Make of_irq_parse_pci() static
powerpc/pci: Use of_irq_parse_and_map_pci() helper
PCI: Move OF-related PCI functions into PCI core
|
|
* pci/dpc:
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
PCI/DPC: Process RP PIO details only if RP PIO extensions supported
PCI/DPC: Read RP PIO Log Size once at probe
PCI/DPC: Rename struct dpc_dev.rp to rp_extensions
PCI/DPC: Add local variable for DPC capability offset
PCI/DPC: Rename interrupt_event_handler() to dpc_work()
PCI/DPC: Fix interrupt message number print
PCI/DPC: Enable DPC only if AER is available
PCI/DPC: Fix shared interrupt handling
|
|
* pci/dma:
PCI: Remove NULL device handling from PCI DMA API
net: tsi108: Use DMA API properly
media: ttusb-dec: Remove pci_zalloc_coherent() abuse
media: ttusb-budget: Remove pci_zalloc_coherent() abuse
|
|
* pci/deprecate-get-bus-and-slot:
video: fbdev: riva: deprecate pci_get_bus_and_slot()
video: fbdev: nvidia: deprecate pci_get_bus_and_slot()
video: fbdev: intelfb: deprecate pci_get_bus_and_slot()
openprom: Deprecate pci_get_bus_and_slot()
xen/pcifront: Deprecate pci_get_bus_and_slot()
PCI: Deprecate pci_get_bus_and_slot()
PCI: ibmphp: Deprecate pci_get_bus_and_slot()
PCI: cpqhp: Deprecate pci_get_bus_and_slot()
pch_gbe: Deprecate pci_get_bus_and_slot()
bnx2x: Deprecate pci_get_bus_and_slot()
powerpc/via-pmu: Deprecate pci_get_bus_and_slot()
iommu/amd: Deprecate pci_get_bus_and_slot()
sl82c105: deprecate pci_get_bus_and_slot()
drm/nouveau: deprecate pci_get_bus_and_slot()
drm/gma500: Deprecate pci_get_bus_and_slot()
ibft: Deprecate pci_get_bus_and_slot()
edd: Deprecate pci_get_bus_and_slot()
agp: sworks: Deprecate pci_get_bus_and_slot()
agp: nvidia: Deprecate pci_get_bus_and_slot()
ata: Deprecate pci_get_bus_and_slot()
x86/PCI: Deprecate pci_get_bus_and_slot()
powerpc/PCI: Deprecate pci_get_bus_and_slot()
alpha/PCI: Deprecate pci_get_bus_and_slot()
|
|
* pci/aspm:
PCI/ASPM: Unexport internal ASPM interfaces
PCI/ASPM: Enable Latency Tolerance Reporting when supported
PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics
|
|
* pci/aer:
PCI/AER: Return error if AER is not supported
PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI
|
|
Syzbot reported several deadlocks in the netfilter area caused by
rtnl lock and socket lock being acquired with a different order on
different code paths, leading to backtraces like the following one:
======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc9+ #212 Not tainted
------------------------------------------------------
syzkaller041579/3682 is trying to acquire lock:
(sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock
include/net/sock.h:1463 [inline]
(sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
but task is already holding lock:
(rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (rtnl_mutex){+.+.}:
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
find_check_entry.isra.7+0x935/0xcf0
net/ipv6/netfilter/ip6_tables.c:580
translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0
-> #0 (sk_lock-AF_INET6){+.+.}:
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
lock_sock include/net/sock.h:1463 [inline]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
*** DEADLOCK ***
1 lock held by syzkaller041579/3682:
#0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
The problem, as Florian noted, is that nf_setsockopt() is always
called with the socket held, even if the lock itself is required only
for very tight scopes and only for some operation.
This patch addresses the issues moving the lock_sock() call only
where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
does not need anymore to acquire both locks.
Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull 'immediate' feature removal from Miroslav Benes.
|
|
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
GCM operation. If they are enabled, crypto_aead_encrypt() return error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion till the time the response for crypto offload request is
received.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.
Here's syzbot's trace:
WARNING: bad unlock balance detected!
4.15.0-rc3+ #128 Not tainted
syzkaller971460/3195 is trying to release lock (mrt_lock) at:
[<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syzkaller971460/3195:
#0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
fs/seq_file.c:165
stack backtrace:
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
__lock_release kernel/locking/lockdep.c:3775 [inline]
lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
__raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
_raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
traverse+0x3bc/0xa00 fs/seq_file.c:135
seq_read+0x96a/0x13d0 fs/seq_file.c:189
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
BUG: sleeping function called from invalid context at lib/usercopy.c:25
in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
INFO: lockdep is turned off.
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
__might_sleep+0x95/0x190 kernel/sched/core.c:6013
__might_fault+0xab/0x1d0 mm/memory.c:4525
_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
copy_to_user include/linux/uaccess.h:155 [inline]
seq_read+0xcb4/0x13d0 fs/seq_file.c:279
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
lib/usercopy.c:26
Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Blank help texts are probably either a typo, a Kconfig misunderstanding,
or some kind of half-committing to adding a help text (in which case a
TODO comment would be clearer, if the help text really can't be added
right away).
Best to remove them, IMO.
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add suffix LL to constant 1000 in order to give the compiler
complete information about the proper arithmetic to use. Notice
that this constant is used in a context that expects an expression
of type long long int (64 bits, signed).
The expression (band->burst_size + band->rate) * 1000 is currently
being evaluated using 32-bit arithmetic.
Addresses-Coverity-ID: 1461563 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add suffix ULL to constant 80000 in order to avoid a potential integer
overflow and give the compiler complete information about the proper
arithmetic to use. Notice that this constant is used in a context that
expects an expression of type u64.
The current cast to u64 effectively applies to the whole expression
as an argument of type u64 to be passed to div64_u64, but it does
not prevent it from being evaluated using 32-bit arithmetic instead
of 64-bit arithmetic.
Also, once the expression is properly evaluated using 64-bit arithmentic,
there is no need for the parentheses and the external cast to u64.
Addresses-Coverity-ID: 1357588 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Driver check the wrong register bit in rtl_ocp_tx_cond() that keep driver
waiting until timeout.
Fix this by waiting for the right register bit.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as
the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Backwards Compatibility:
If userspace wants to determine whether RTM_NEWLINK supports the
IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
does not include IFLA_IF_NETNSID userspace should assume that
IFLA_IF_NETNSID is not supported on this kernel.
If the reply does contain an IFLA_IF_NETNSID property userspace
can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive
EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
with RTM_NEWLINK. Userpace should then fallback to other means.
- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull assorted small fixes queued for merge window.
|
|
|
|
Pull Wacom device driver updates. These don't have to go on top of the
hid_have_special_driver[] revamp, as the whole group is assumed to
have a special driver based on VID.
|
|
Pull hid-elo device detection fix
|
|
'for-4.16/hid-quirks-cleanup/elecom', 'for-4.16/hid-quirks-cleanup/ish', 'for-4.16/hid-quirks-cleanup/multitouch', 'for-4.16/hid-quirks-cleanup/pixart', 'for-4.16/hid-quirks-cleanup/rmi', 'for-4.16/hid-quirks-cleanup/sony' and 'for-4.16/hid-quirks-cleanup/toshiba' into for-linus
Pull assorted device driver fixes (ASUS, Elecom, Intel-ISH, Multitouch, PixArt, RMI,
Sony and Toshiba) based on top the hid-quirks revamp.
|
|
This series from Benjamin Tissoires finally removes one of the big PITAs
in the hid-core, which is the absolute need of having added all the new
device IDs into the horrid hid_have_special_driver[]
|
|
Commit 136e92bbec0a switched local_nodes from an array to a bitmask
but did not add proper bounds checks. As the result
clusterip_config_init_nodelist() can both over-read
ipt_clusterip_tgt_info.local_nodes and over-write
clusterip_config.local_nodes.
Add bounds checks for both.
Fixes: 136e92bbec0a ("[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Several netfilter matches and targets put kernel pointers into
info objects, but don't set usersize in descriptors.
This leads to kernel pointer leaks if a match/target is set
and then read back to userspace.
Properly set usersize for these matches/targets.
Found with manual code inspection.
Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix wraparound bug which could lead to memory exhaustion when adding an
x.x.x.x-255.255.255.255 range to any hash:*net* types.
Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark.
Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Changes for v4.16
The changes for this version include icache invalidation optimizations
(improving VM startup time), support for forwarded level-triggered
interrupts (improved performance for timers and passthrough platform
devices), a small fix for power-management notifiers, and some cosmetic
changes.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: update maintainers
|
|
This patch adds support to the Cadence PCIe controller in endpoint mode.
Since pieces of source code are shared with the host driver (Root
Complex mode), we create a new directory under drivers/pci dedicated to
the Cadence PCIe controller. The common code is placed into
drivers/pci/cadence/pcie-cadence.c and used by both the host and
endpoint controller drivers.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
This patch documents the DT bindings for the Cadence PCIe controller
when configured in endpoint mode.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
|
|
Fix the pci_epf_make() function so it can now bind many EPF devices to the
same EPF driver.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
This patch updates the prototype of most handlers from 'struct
pci_epc_ops' so the EPC library can now support multi-function devices.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
This patch adds support to the Cadence PCIe controller in host mode.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|