Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
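The build-time size check described in this commit can be sketched in userspace as follows. This is only an illustration: the struct and function names are hypothetical stand-ins, and C11 static_assert is used where kernel code would typically use BUILD_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel-side and compat-side lock records. */
struct kernel_flock_sketch {
    int64_t l_start;
    int64_t l_len;
};

struct compat_flock64_sketch {
    int64_t l_start;
    int64_t l_len;
};

/* Fail the build if a source field is wider than its destination. */
static_assert(sizeof(((struct kernel_flock_sketch *)0)->l_start) <=
              sizeof(((struct compat_flock64_sketch *)0)->l_start),
              "l_start would be truncated");
static_assert(sizeof(((struct kernel_flock_sketch *)0)->l_len) <=
              sizeof(((struct compat_flock64_sketch *)0)->l_len),
              "l_len would be truncated");

/* With the sizes proven compatible, a plain copy needs no capping. */
void copy_flock_fields_sketch(struct compat_flock64_sketch *dst,
                              const struct kernel_flock_sketch *src)
{
    dst->l_start = src->l_start;
    dst->l_len = src->l_len;
}
```

If either destination field were narrower than its source, compilation would stop at the assertion instead of silently truncating at run time.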
|
|
Currently we just return err here, but we need to put the fd reference
first.
Fixes: 94073ad77fff (fs/locks: don't mess with the address limit in compat_fcntl64)
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|
syzbot easily found a regression added in our latest patches [1]
No longer set tp->highest_sack to the head of the send queue since
this is not logical and error prone.
Only sack processing should maintain the pointer to an skb from rtx queue.
We might in the future only remember the sequence instead of a pointer to skb,
since rb-tree should allow a fast lookup.
[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1706 [inline]
BUG: KASAN: use-after-free in tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537
Read of size 4 at addr ffff8801c154faa8 by task syz-executor4/12860
CPU: 0 PID: 12860 Comm: syz-executor4 Not tainted 4.14.0-next-20171113+ #41
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
tcp_highest_sack_seq include/net/tcp.h:1706 [inline]
tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537
tcp_rcv_established+0x672/0x18a0 net/ipv4/tcp_input.c:5439
tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x124/0x360 net/core/sock.c:2264
release_sock+0xa4/0x2a0 net/core/sock.c:2778
tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
sock_sendmsg_nosec net/socket.c:632 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:642
___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048
__sys_sendmsg+0xe5/0x210 net/socket.c:2082
SYSC_sendmsg net/socket.c:2093 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2089
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x452879
RSP: 002b:00007fc9761bfbe8 EFLAGS: 00000212 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452879
RDX: 0000000000000000 RSI: 0000000020917fc8 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006ee3a0
R13: 00000000ffffffff R14: 00007fc9761c06d4 R15: 0000000000000000
Allocated by task 12860:
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc_node+0x144/0x760 mm/slab.c:3638
__alloc_skb+0xf1/0x780 net/core/skbuff.c:193
alloc_skb_fclone include/linux/skbuff.h:1023 [inline]
sk_stream_alloc_skb+0x11d/0x900 net/ipv4/tcp.c:870
tcp_sendmsg_locked+0x1341/0x3b80 net/ipv4/tcp.c:1299
tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1461
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
sock_sendmsg_nosec net/socket.c:632 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:642
SYSC_sendto+0x358/0x5a0 net/socket.c:1749
SyS_sendto+0x40/0x50 net/socket.c:1717
entry_SYSCALL_64_fastpath+0x1f/0x96
Freed by task 12860:
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3492 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3750
kfree_skbmem+0xdd/0x1d0 net/core/skbuff.c:603
__kfree_skb+0x1d/0x20 net/core/skbuff.c:642
sk_wmem_free_skb include/net/sock.h:1419 [inline]
tcp_rtx_queue_unlink_and_free include/net/tcp.h:1682 [inline]
tcp_clean_rtx_queue net/ipv4/tcp_input.c:3111 [inline]
tcp_ack+0x1b17/0x4fd0 net/ipv4/tcp_input.c:3593
tcp_rcv_established+0x672/0x18a0 net/ipv4/tcp_input.c:5439
tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x124/0x360 net/core/sock.c:2264
release_sock+0xa4/0x2a0 net/core/sock.c:2778
tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
sock_sendmsg_nosec net/socket.c:632 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:642
___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048
__sys_sendmsg+0xe5/0x210 net/socket.c:2082
SYSC_sendmsg net/socket.c:2093 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2089
entry_SYSCALL_64_fastpath+0x1f/0x96
The buggy address belongs to the object at ffff8801c154fa80
which belongs to the cache skbuff_fclone_cache of size 456
The buggy address is located 40 bytes inside of
456-byte region [ffff8801c154fa80, ffff8801c154fc48)
The buggy address belongs to the page:
page:ffffea00070553c0 count:1 mapcount:0 mapping:ffff8801c154f080 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801c154f080 0000000000000000 0000000100000006
raw: ffffea00070a5a20 ffffea0006a18360 ffff8801d9ca0500 0000000000000000
page dumped because: kasan: bad access detected
Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
geneve->sock4/6 were added with geneve_open and released with geneve_stop.
So when the geneve link is down, we are not able to show the remote address and
checksum info after commit 11387fe4a98 ("geneve: fix fill_info when using
collect_metadata").
Fix this by not passing *_REMOTE{,6} for COLLECT_METADATA since they are
mutually exclusive, and always show UDP_ZERO_CSUM6_RX info.
Fixes: 11387fe4a98 ("geneve: fix fill_info when using collect_metadata")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pcpu_freelist_pop() needs the same lockdep awareness as
pcpu_freelist_populate() to avoid a false positive.
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
(&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300
and this task is already holding:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
which would create a new lock dependency:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
[<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
[<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
[<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
[<ffffffff9e19886d>] ip_output+0x7d/0x260
[<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
[<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
[<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
[<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
[<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
[<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
[<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
[<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
[<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
[<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
[<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
[<ffffffff9e191e65>] ip_rcv+0x295/0x510
[<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
[<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
[<ffffffff9e1306ff>] process_backlog+0x6f/0x230
[<ffffffff9e132129>] net_rx_action+0x229/0x420
[<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
[<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
[<ffffffff9dafc2f5>] do_softirq+0x55/0x60
[<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
[<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
[<ffffffff9daab333>] start_secondary+0x113/0x140
to a SOFTIRQ-irq-unsafe lock:
(&head->lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
... [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
[<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
[<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
[<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17
other info that might help us debug this:
Chain exists of:
dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&head->lock);
                               local_irq_disable();
                               lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
*** DEADLOCK ***
Fixes: e19494edab82 ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The GetNtbFormat and SetNtbFormat requests operate on 16 bit little
endian values. We get away with ignoring this most of the time, because
we only care about USB_CDC_NCM_NTB16_FORMAT which is 0x0000. This
fails for USB_CDC_NCM_NTB32_FORMAT.
Fix comparison between LE value from device and constant by converting
the constant to LE.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Fixes: 2b02c20ce0c2 ("cdc_ncm: Set NTB format again after altsetting switch for Huawei devices")
Cc: Enrico Mioso <mrkiko.rs@gmail.com>
Cc: Christian Panton <christian@panton.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-By: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
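A minimal userspace sketch of the comparison described in this commit: the device value stays in its little-endian wire form and the constant is converted to little-endian before comparing. The helper names are hypothetical, and the value 0x0001 for the 32-bit format is an assumption here (the message only states that the 16-bit format is 0x0000).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTB16_FORMAT_SKETCH 0x0000u /* stated in the commit message */
#define NTB32_FORMAT_SKETCH 0x0001u /* assumed value for illustration */

/* Return a 16-bit value whose in-memory bytes are little-endian,
 * regardless of host byte order (userspace analogue of cpu_to_le16). */
uint16_t cpu_to_le16_sketch(uint16_t v)
{
    uint8_t bytes[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t out;

    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* wire[] holds the two bytes exactly as the device sent them. */
int ntb_format_matches(const uint8_t wire[2], uint16_t format)
{
    uint16_t raw;

    memcpy(&raw, wire, sizeof(raw));
    /* Compare LE with LE: convert the constant, not the device value. */
    return raw == cpu_to_le16_sketch(format);
}
```

For 0x0000 the conversion is a no-op on any host, which is why skipping it goes unnoticed until a non-zero format is compared on a big-endian machine.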
|
|
- High resolution mode for Dell Canvas support, from Benjamin Tissoires
- A lot of improvements to pen handling in the Wacom driver, from Jason Gerecke
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
- usbhid: conversion to timer_setup() and from_timer() from Kees Cook
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
- cp2112: GPIO error handling and Kconfig fixes from Sébastien Szymanski
- i2c-hid: fixup / quirk for Apollo-Lake based laptops, from Hans de Goede
- Input/Core: add eraser tool support, from Ping Cheng
- small assorted code fixes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
- SHANWAN PS3 rumble fix from Bastien Nocera
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
- make sure that we forward MSC_TIMESTAMP in accordance with the specification,
from Nicolas Boichat
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
- small code fixes for Logitech driver from Colin Ian King
|
|
- trivial printk() line termination fix for HyperV
|
|
- Asus laptop fixes (fn keys, backlight), from Mustafa Kuscu and
Maxime Bellengé
|
|
- New ALPS touchpad (T4, found currently on HP EliteBook 1000, ZBook Studio
and HP EliteBook x360) support from Masaki Ota
|
|
- Wacom: recognize PEN application collection properly, from Jason Gerecke
- RMI: avoid confusion caused by RMI functions being called by mistake on
non-RMI devices, from Andrew Duggan
- small device-ID-specific quirks/fixes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
This pulls in an infrastructure/API that allows livepatch writers to
register pre-patch and post-patch callbacks, which run any glue code
needed to finalize the patching.
Conflicts:
kernel/livepatch/core.c
- trivial conflict by adding a callback call into
module going notifier vs. moving that code block
to klp_cleanup_module_patches_limited()
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Shadow variables allow callers to associate new shadow fields to existing data
structures. This is intended to be used by livepatch modules seeking to
emulate additions to data structure definitions.
|
|
Don't crash in case of allocation failure in dax_alloc_inode.
syzkaller hit the following crash on e4880bc5dfb1
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
[..]
RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348
Call Trace:
alloc_inode+0x65/0x180 fs/inode.c:208
new_inode_pseudo+0x69/0x190 fs/inode.c:890
new_inode+0x1c/0x40 fs/inode.c:919
mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261
mount_pseudo include/linux/fs.h:2137 [inline]
dax_mount+0x2e/0x40 drivers/dax/super.c:388
mount_fs+0x66/0x2d0 fs/super.c:1223
Cc: <stable@vger.kernel.org>
Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider...")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In case of error returned by 'q6v5_xfer_mem_ownership', we must free
some resources before returning.
In 'q6v5_mpss_init_image()', add a new label to undo a previous
'dma_alloc_attrs()'.
In 'q6v5_mpss_load()', re-use the already existing error handling code to
undo a previous 'request_firmware()', as already done in the other error
handling paths of the function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
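The error-handling shape described in this commit (a label that undoes an earlier allocation when a later step fails) can be sketched in userspace as below. All names are hypothetical; malloc() and a flag stand in for dma_alloc_attrs() and the q6v5_xfer_mem_ownership() result.

```c
#include <assert.h>
#include <stdlib.h>

/* live_allocs tracks outstanding buffers so a leak is observable. */
int init_image_sketch(int xfer_fails, int *live_allocs)
{
    int ret;
    void *buf = malloc(64); /* stands in for dma_alloc_attrs() */

    if (!buf)
        return -1;
    (*live_allocs)++;

    ret = xfer_fails ? -1 : 0; /* stands in for the ownership transfer */
    if (ret)
        goto free_buf; /* previously a bare return here leaked buf */

    /* ... further work with buf would go here ... */

free_buf:
    free(buf);
    (*live_allocs)--;
    return ret;
}
```

Both the success path and the failure path fall through the same label, so the buffer is released exactly once either way.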
|
|
The qcom_glink_native driver is missing a MODULE_LICENSE(), correct
this.
Fixes: 835764ddd9af ("rpmsg: glink: Move the common glink protocol implementation to glink_native.c")
Cc: stable@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
This reverts commit c9f3f813d462c72dbe412cee6a5cbacf13c4ad5e.
This commit breaks transport mode when the policy template
has wildcard addresses configured, so revert it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
For a PUD hugepage entry, we need to propagate bits [32:22]
from virtual address to resolve at 4M granularity. However,
the current code was incorrectly propagating bits [29:19].
This bug can cause incorrect data to be returned for pages
backed with 16G hugepages.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
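The difference between the two bit ranges named in this commit can be made concrete with a small mask helper. This is an illustrative sketch with hypothetical function names, not the sparc64 TLB-miss code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask covering bits [hi:lo], inclusive. */
uint64_t bit_range_mask(unsigned hi, unsigned lo)
{
    return ((UINT64_C(1) << (hi - lo + 1)) - 1) << lo;
}

/* Bits of the virtual address that select a 4M chunk inside a PUD
 * hugepage entry, per the commit message: bits [32:22]. */
uint64_t pud_hugepage_offset_bits(uint64_t vaddr)
{
    return vaddr & bit_range_mask(32, 22);
}
```

The correct mask is 0x1FFC00000, while the buggy range [29:19] gives 0x3FF80000: it drops bits 30 to 32 and wrongly includes bits 19 to 21.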
|
|
Switch to using the new timer_setup() and from_timer()
in LDOM Virtual I/O handshake.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.
The variable mdesc_handle.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
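The properties listed in this commit (start at 1, free at zero, no increments from zero, no wrap-around) can be sketched with C11 atomics. This is a simplified userspace illustration with hypothetical names; the kernel's refcount_t implementation differs in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refcount_sketch {
    atomic_uint refs;
};

void refcount_sketch_set(struct refcount_sketch *r, unsigned n)
{
    atomic_store(&r->refs, n);
}

/* Take a reference unless the object is already dead (count == 0). */
bool refcount_sketch_inc_not_zero(struct refcount_sketch *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == 0)
            return false; /* never resurrect a freed object */
        if (old == UINT_MAX)
            return true;  /* saturated: stay pinned instead of wrapping */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
    return true;
}

/* Drop a reference; returns true when the caller must free the object. */
bool refcount_sketch_dec_and_test(struct refcount_sketch *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == 0 || old == UINT_MAX)
            return false; /* underflow or saturated: leave untouched */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

A plain atomic counter would happily go from 0 to 1 or from UINT_MAX to 0; refusing both transitions is what closes the use-after-free window.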
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a static variable to hold timeout
value.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
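The idea behind passing the timer pointer to the callback can be sketched in userspace: the callback recovers its enclosing structure from the embedded timer, which is the pattern from_timer() expresses. The types and names below are hypothetical and only mimic the shape of the kernel API.

```c
#include <assert.h>
#include <stddef.h>

struct timer_sketch {
    void (*function)(struct timer_sketch *);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device_sketch {
    int timeouts;
    struct timer_sketch timer; /* embedded, so no separate data pointer */
};

void timer_setup_sketch(struct timer_sketch *t,
                        void (*fn)(struct timer_sketch *))
{
    t->function = fn;
}

/* The callback receives the timer itself, not an opaque argument. */
void device_timeout_sketch(struct timer_sketch *t)
{
    struct device_sketch *dev =
        container_of_sketch(t, struct device_sketch, timer);

    dev->timeouts++;
}

/* Stand-in for the timer core invoking an expired timer. */
void timer_fire_sketch(struct timer_sketch *t)
{
    t->function(t);
}
```

Because the callback argument is always the timer pointer, there is no integer-to-pointer cast for the callback data.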
|
|
Vijay Kumar says:
====================
sparc64: Optimize fls and __fls
SPARC provides the lzcnt instruction (with VIS3), which can be used to
optimize the fls, __fls and fls64 functions. For systems that support the
lzcnt instruction, we now do boot time patching to use the sparc
optimized fls, __fls and fls64 functions.
v3->v4:
- Fixed a typo.
v2->v3:
- Using ENTRY(), ENDPROC() for assembler functions.
- Removed BITS_PER_LONG from __fls.
- Using generic fls64().
- Replaced lzcnt instruction with .word directive.
v1->v2:
- Fixed delay slot issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
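The relationship between a leading-zero count and fls can be sketched in portable C. The loop below stands in for the lzcnt instruction, and the function names are hypothetical; the semantics follow the usual kernel convention that fls(0) is 0 and __fls returns a zero-based bit index.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of leading zeros in a 32-bit word; 32 when x == 0. */
unsigned clz32_sketch(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* fls: 1-based position of the most significant set bit, 0 for x == 0. */
int fls_sketch(uint32_t x)
{
    return 32 - (int)clz32_sketch(x);
}

/* __fls-style: 0-based index of the most significant set bit.
 * The caller must not pass 0. */
unsigned fls_index_sketch(uint32_t x)
{
    return 31 - clz32_sketch(x);
}
```

With a hardware leading-zero count, each function reduces to one instruction plus a subtraction, which is the motivation for patching it in at boot.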
|
|
For T4 and above, patch fls and __fls functions
at the boot time to use lzcnt instruction.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defined SPARC optimized __fls using lzcnt opcode.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Defined SPARC optimized fls using lzcnt opcode.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following patch is based on work done by Nick Alcock on 64-bit vDSO for sparc
in Oracle Linux. I have extended it to include support for 32-bit vDSO for sparc
on a 64-bit kernel.
vDSO for sparc is based on the x86 implementation. This patch
provides vDSO support for both 64-bit and 32-bit programs on a 64-bit kernel.
vDSO will be disabled on the 32-bit Linux kernel on sparc.
*) vclock_gettime.c contains all the vdso functions. Since the data page is mapped
before the vdso code page, the pointer to the data page is obtained by subtracting
an offset from an address in the vdso code page. The return address stored in
%i7 is used for this purpose.
*) During compilation, both 32-bit and 64-bit vdso images are compiled and are
converted into raw bytes by the vdso2c program to be ready for mapping into the
process. 32-bit images are compiled only if CONFIG_COMPAT is enabled. vdso2c
generates two files, vdso-image-64.c and vdso-image-32.c, which contain the
respective vDSO image in a C structure.
*) During vdso initialization, required number of vdso pages are allocated and
raw bytes are copied into the pages.
*) During every exec, these pages are mapped into the process through
arch_setup_additional_pages and the location of mapping is passed on to the
process through aux vector AT_SYSINFO_EHDR which is used by glibc.
*) A new update_vsyscall routine for sparc is added to keep the data page in
vdso updated.
*) As vDSO cannot contain dynamically relocatable references, a new version of
cpu_relax is added for the use of vDSO.
This change also requires a putback to glibc to use vDSO. For testing,
programs planning to try vDSO can be compiled against the generated
vdso(64/32).so in the source.
Testing:
========
[root@localhost ~]# cat vdso_test.c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main() {
    struct timespec tv_start, tv_end;
    struct timeval tv_tmp;
    int i;
    int count = 1 * 1000 * 10000;
    long long diff;

    /* Time 10 million gettimeofday() calls with clock_gettime(). */
    clock_gettime(0, &tv_start);
    for (i = 0; i < count; i++)
        gettimeofday(&tv_tmp, NULL);
    clock_gettime(0, &tv_end);
    diff = (long long)(tv_end.tv_sec -
        tv_start.tv_sec)*(1*1000*1000*1000);
    diff += (tv_end.tv_nsec - tv_start.tv_nsec);
    printf("Start sec: %ld\n", (long)tv_start.tv_sec);
    printf("End sec : %ld\n", (long)tv_end.tv_sec);
    printf("%d cycles in %lld ns = %f ns/cycle\n", count, diff,
        (double)diff / (double)count);
    return 0;
}
[root@localhost ~]# cc vdso_test.c -o t32_without_fix -m32 -lrt
[root@localhost ~]# ./t32_without_fix
Start sec: 1502396130
End sec : 1502396140
10000000 cycles in 9565148528 ns = 956.514853 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t32_with_fix -m32 ./vdso32.so.dbg
[root@localhost ~]# ./t32_with_fix
Start sec: 1502396168
End sec : 1502396169
10000000 cycles in 798141262 ns = 79.814126 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_without_fix -m64 -lrt
[root@localhost ~]# ./t64_without_fix
Start sec: 1502396208
End sec : 1502396218
10000000 cycles in 9846091800 ns = 984.609180 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_with_fix -m64 ./vdso64.so.dbg
[root@localhost ~]# ./t64_with_fix
Start sec: 1502396257
End sec : 1502396257
10000000 cycles in 380984048 ns = 38.098405 ns/cycle
V1 to V2 Changes:
=================
Added hot patching code to switch the read stick instruction to read
tick instruction based on the hardware.
V2 to V3 Changes:
=================
Merged latest changes from sparc-next and moved the initialization
of clocksource_tick.archdata.vclock_mode to time_init_early. Disabled
queued spinlock and rwlock configuration when simulating 32-bit config
to compile 32-bit VDSO.
V3 to V4 Changes:
=================
Hardcoded the page size as 8192 in linker script for both 64-bit and
32-bit binaries. Removed unused variables in vdso2c.h. Added -mv8plus flag to
Makefile to prevent the generation of relocation entries for __lshrdi3 in 32-bit
vdso binary.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It seems that the intention of the code is to null check the value
returned by function genlmsg_put. But the current code is null
checking the address of the pointer that holds the value returned
by genlmsg_put.
Fix this by properly null checking the value returned by function
genlmsg_put in order to avoid a potential null pointer dereference.
Addresses-Coverity-ID: 1461561 ("Dereference before null check")
Addresses-Coverity-ID: 1461562 ("Dereference null return value")
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
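The mistake described in this commit can be sketched in userspace, assuming the result is stored through an out-parameter. All names are hypothetical; put_header_sketch() merely stands in for a call like genlmsg_put() that may return NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a call that returns NULL when the message has no room. */
void *put_header_sketch(int fail, void *buf)
{
    return fail ? NULL : buf;
}

int build_reply_sketch(void **reply_header, int fail, void *buf)
{
    *reply_header = put_header_sketch(fail, buf);

    /*
     * Buggy form: if (!reply_header)
     * That tests the address of the out-parameter, which was already
     * dereferenced above and is never NULL here, so a failed call
     * goes undetected.
     */
    if (!*reply_header) /* correct: test the returned value */
        return -1;
    return 0;
}
```

The two checks differ by a single `*`, which is why static analysis flagged the pattern as both a dereference before a null check and an unchecked null return.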
|
|
Stephen Hemminger says:
====================
netem: fix compilation on 32 bit
A couple of places where 64 bit CPU was being assumed incorrectly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix compilation on 32 bit platforms (where doing modulus operation
with 64 bit requires extra glibc functions) by truncation.
The jitter for table distribution is limited to a 32 bit value
because random numbers are scaled as 32 bit value.
Also fix some whitespace.
Fixes: 99803171ef04 ("netem: add uapi to express delay and jitter in nanoseconds")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since times are now expressed in nanoseconds, we now need to do a
true 64 bit divide. The old code would truncate the rate at 32 bits.
Rename the function to better express current usage.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
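The effect of truncating the rate to 32 bits can be seen in a small userspace sketch. The function names and the bytes-per-second unit are assumptions for illustration; in-kernel code on 32-bit platforms would use a helper such as div64_u64() rather than the plain `/` operator.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH UINT64_C(1000000000)

/* Transmission time in ns for len bytes at rate bytes per second,
 * using a full 64-bit divisor. The caller must pass a non-zero rate. */
uint64_t packet_time_ns_sketch(uint64_t len, uint64_t rate)
{
    return len * NSEC_PER_SEC_SKETCH / rate;
}

/* What happens if the divisor is truncated to 32 bits first. The caller
 * must ensure the low 32 bits of rate are non-zero. */
uint64_t packet_time_ns_truncated_sketch(uint64_t len, uint64_t rate)
{
    return len * NSEC_PER_SEC_SKETCH / (uint32_t)rate;
}
```

For a rate just above 2^32 the truncated divisor collapses to a small number, so the computed delay is wrong by many orders of magnitude.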
|
|
Make the default TCP congestion control a per-namespace
value. This changes the default congestion control to a pointer to congestion ops
(rather than implicitly being the first element of the available list).
The congestion control setting of new namespaces is inherited
from the current setting of the root namespace.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is at least one unlocked deletion of net->ipv4.fib_notifier_ops
from net::fib_notifier_ops:
ip_fib_net_exit()
rtnl_unlock()
fib4_notifier_exit()
fib_notifier_ops_unregister(net->ipv4.notifier_ops)
list_del_rcu(&ops->list)
So fib_seq_sum() can't use rtnl_lock() only for protection.
The possible solution could be to use rtnl_lock()
in fib_notifier_ops_unregister(), but this adds
a possible delay during net namespace creation,
so we better use rcu_read_lock() till someone
really needs the mutex (if that happens).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag
is also taken into account (default value is 1). If either global or
per-interface flag is non-zero, DAD will be enabled on a given interface.
This is not backward compatible: before those patches, the user could
disable DAD just by setting the per-interface flag to 0. Now, the
user instead needs to set both flags to 0 to actually disable DAD.
Restore the previous behaviour by setting the default for the global
'accept_dad' flag to 0. This way, DAD is still enabled by default,
as per-interface flags are set to 1 on device creation, but setting
them to 0 is enough to disable DAD on a given interface.
- Before 35e015e1f57a7 and a2d3f3e33853:
            global    per-interface    DAD enabled
[default]     1             1              yes
              X             0              no
              X             1              yes
- After 35e015e1f577 and a2d3f3e33853:
            global    per-interface    DAD enabled
[default]     1             1              yes
              0             0              no
              0             1              yes
              1             0              yes
- After this fix:
            global    per-interface    DAD enabled
              1             1              yes
              0             0              no
[default]     0             1              yes
              1             0              yes
Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real")
CC: Stefano Brivio <sbrivio@redhat.com>
CC: Matteo Croce <mcroce@redhat.com>
CC: Erik Kline <ek@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
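The tables above reduce to two small predicates, sketched here with hypothetical function names. The fix itself only changes the default of the global flag; the "either flag" rule stays as it was after the two earlier commits.

```c
#include <assert.h>
#include <stdbool.h>

/* Rule before the two earlier commits: only the per-interface flag counts. */
bool dad_enabled_before_sketch(int iface_accept_dad)
{
    return iface_accept_dad != 0;
}

/* Rule after them (and after this fix): either flag enables DAD. */
bool dad_enabled_sketch(int global_accept_dad, int iface_accept_dad)
{
    return global_accept_dad != 0 || iface_accept_dad != 0;
}
```

With the global default at 0, `dad_enabled_sketch(0, iface)` behaves exactly like `dad_enabled_before_sketch(iface)`, which is how the old user-visible behaviour is restored.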
|
|
Move inclusion of a private kernel header <net/tcp.h>
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:
/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
As up to this point uapi/linux/tls.h was totally unusable for userspace,
clean up this header file further by moving other redundant includes
to net/tls.h.
Fixes: 3c4d7559159b ("tls: kernel TLS support")
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
iOS devices require the host to be "trusted" before servicing network
packets. Establishing trust requires the user to confirm a dialog on the
iOS device. Until trust is established, the iOS device will silently discard
network packets from the host. Currently, the ipheth driver does not detect
whether an iOS device has established trust with the host, and immediately
sets up the transmit queues.
This causes the following problems:
- Kernel taint due to WARN() in netdev watchdog.
- Dmesg spam ("TX timeout").
- Disruption of user space networking activity (dhcpd, etc.) when a new
interface comes up but cannot be used.
- Unnecessary host and device wakeups and USB traffic.
Example dmesg output:
[ 1101.319778] NETDEV WATCHDOG: eth1 (ipheth): transmit queue 0 timed out
[ 1101.319817] ------------[ cut here ]------------
[ 1101.319828] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x20f/0x220
[ 1101.319831] Modules linked in: ipheth usbmon nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) iwlmvm mac80211 iwlwifi btusb btrtl btbcm btintel qmi_wwan bluetooth cfg80211 ecdh_generic thinkpad_acpi rfkill [last unloaded: ipheth]
[ 1101.319861] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.13.12.1 #1
[ 1101.319864] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W (1.35 ) 11/10/2016
[ 1101.319867] task: ffffffff81e11500 task.stack: ffffffff81e00000
[ 1101.319873] RIP: 0010:dev_watchdog+0x20f/0x220
[ 1101.319876] RSP: 0018:ffff8810a3c03e98 EFLAGS: 00010292
[ 1101.319880] RAX: 000000000000003a RBX: 0000000000000000 RCX: 0000000000000000
[ 1101.319883] RDX: ffff8810a3c15c48 RSI: ffffffff81ccbfc2 RDI: 00000000ffffffff
[ 1101.319886] RBP: ffff880c04ebc41c R08: 0000000000000000 R09: 0000000000000379
[ 1101.319889] R10: 00000100696589d0 R11: 0000000000000378 R12: ffff880c04ebc000
[ 1101.319892] R13: 0000000000000000 R14: 0000000000000001 R15: ffff880c2865fc80
[ 1101.319896] FS: 0000000000000000(0000) GS:ffff8810a3c00000(0000) knlGS:0000000000000000
[ 1101.319899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1101.319902] CR2: 00007f3ff24ac000 CR3: 0000000001e0a000 CR4: 00000000003406f0
[ 1101.319905] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1101.319908] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1101.319910] Call Trace:
[ 1101.319914] <IRQ>
[ 1101.319921] ? dev_graft_qdisc+0x70/0x70
[ 1101.319928] ? dev_graft_qdisc+0x70/0x70
[ 1101.319934] ? call_timer_fn+0x2e/0x170
[ 1101.319939] ? dev_graft_qdisc+0x70/0x70
[ 1101.319944] ? run_timer_softirq+0x1ea/0x440
[ 1101.319951] ? timerqueue_add+0x54/0x80
[ 1101.319956] ? enqueue_hrtimer+0x38/0xa0
[ 1101.319963] ? __do_softirq+0xed/0x2e7
[ 1101.319970] ? irq_exit+0xb4/0xc0
[ 1101.319976] ? smp_apic_timer_interrupt+0x39/0x50
[ 1101.319981] ? apic_timer_interrupt+0x8c/0xa0
[ 1101.319983] </IRQ>
[ 1101.319992] ? cpuidle_enter_state+0xfa/0x2a0
[ 1101.319999] ? do_idle+0x1a3/0x1f0
[ 1101.320004] ? cpu_startup_entry+0x5f/0x70
[ 1101.320011] ? start_kernel+0x444/0x44c
[ 1101.320017] ? early_idt_handler_array+0x120/0x120
[ 1101.320023] ? x86_64_start_kernel+0x145/0x154
[ 1101.320028] ? secondary_startup_64+0x9f/0x9f
[ 1101.320033] Code: 20 04 00 00 eb 9f 4c 89 e7 c6 05 59 44 71 00 01 e8 a7 df fd ff 89 d9 4c 89 e6 48 c7 c7 70 b7 cd 81 48 89 c2 31 c0 e8 97 64 90 ff <0f> ff eb bf 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[ 1101.320103] ---[ end trace 0cc4d251e2b57080 ]---
[ 1101.320110] ipheth 1-5:4.2: ipheth_tx_timeout: TX timeout
The last message "TX timeout" is repeated every 5 seconds until trust is
established or the device is disconnected, filling up dmesg.
The proposed patch eliminates the problem by, upon connection, keeping the
TX queue and carrier disabled until a packet is first received from the iOS
device. This is reflected by the confirmed_pairing variable in the device
structure. Only after at least one packet has been received from the iOS
device are the transmit queue and carrier brought up during the periodic
device poll in ipheth_carrier_set. Because the iOS device will always send
a packet immediately upon trust being established, this should not delay
the interface becoming usable. To prevent failed URBs in
ipheth_rcvbulk_callback from perpetually re-enabling the queue if it was
disabled, a new check is added so only successful transfers re-enable the
queue, whereas failed transfers only trigger an immediate poll.
This has the added benefit of removing the periodic control requests to the
iOS device until trust has been established and thus should reduce wakeup
events on both the host and the iOS device.
Signed-off-by: Alexander Kappner <agk@godking.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We always poll tx for the socket. This is suboptimal since it
slightly increases the waitqueue traversal time and, more importantly,
vhost could not benefit from commit 9e641bdcfa4e ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency"): even if
we've stopped rx polling during handle_rx(), the tx poll was still left
in the waitqueue.
Pktgen from a remote host to a VM over mlx4 on two 2.00GHz Xeon E5-2650
shows an 11.7% improvement in rx PPS (from 1.28Mpps to 1.44Mpps).
Cc: Wei Xu <wexu@redhat.com>
Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Consistently use types provided by <linux/types.h> to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16'
u16 srx_service; /* service desired */
/usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16'
u16 transport_type; /* type of transport socket (SOCK_DGRAM) */
/usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16'
u16 transport_len; /* length of transport address */
Use __kernel_sa_family_t instead of sa_family_t the same way
as uapi/linux/in.h does, to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t'
sa_family_t srx_family; /* address family */
/usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t'
sa_family_t family; /* transport address family */
Fixes: 727f8914477e ("rxrpc: Expose UAPI definitions to userspace")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PMD faults on a zero length file on a file system mounted with -o dax
will not generate SIGBUS as expected.
fd = open(...O_TRUNC);
addr = mmap(NULL, 2*1024*1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*addr = 'a';
<expect SIGBUS>
The problem is this code in dax_iomap_pmd_fault:
max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
If the inode size is zero, we end up with a max_pgoff that is way larger
than 0. :) Fix it by using DIV_ROUND_UP, as is done elsewhere in the
kernel.
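The wrap-around can be reproduced outside the kernel. The sketch below
assumes 4 KiB pages; the helper names and the exact comparison operators
are assumptions made for illustration, not the code in fs/dax.c.

```c
#define PAGE_SHIFT 12				/* assumes 4 KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Old computation: index of the last page. For a size of 0 this
 * evaluates (0 - 1) >> PAGE_SHIFT on a signed value, which becomes a
 * huge number once stored in an unsigned page offset.
 */
unsigned long max_pgoff_old(long long i_size)
{
	return (unsigned long)((i_size - 1) >> PAGE_SHIFT);
}

/* Rounded-up computation: number of pages in the file, 0 when empty. */
unsigned long max_pgoff_new(long long i_size)
{
	return (unsigned long)DIV_ROUND_UP((unsigned long long)i_size,
					   PAGE_SIZE);
}

/* Hypothetical bounds checks deciding whether a fault gets SIGBUS. */
int sigbus_old(unsigned long pgoff, long long i_size)
{
	return pgoff > max_pgoff_old(i_size);
}

int sigbus_new(unsigned long pgoff, long long i_size)
{
	return pgoff >= max_pgoff_new(i_size);
}
```

With a zero-length file, sigbus_old(0, 0) reports no error because the
limit has wrapped, while sigbus_new(0, 0) correctly reports one.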
I tested this with some simple test code that ensured that SIGBUS was
received where expected.
Cc: <stable@vger.kernel.org>
Fixes: 642261ac995e ("dax: add struct iomap based DAX PMD support")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Recently we added a CPU feature for Power9 DD2.0, to capture the fact
that some workarounds are required only on Power9 DD1 and DD2.0 but
not DD2.1 or later.
Then in commit 9d2f510a66ec ("powerpc/64s/idle: avoid POWER9 DD1 and
DD2.0 ERAT workaround on DD2.1") and commit e3646330cf66
("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:
BEGIN_FTR_SECTION
PPC_INVALIDATE_ERAT
END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
Unfortunately, although this reads as "if set DD1 or DD2.0", the or is
a bitwise or and actually generates a mask of both bits. The code that
does the feature patching then checks that the value of the CPU
features masked with that mask is equal to the mask.
So the end result is we're checking for DD1 and DD20 both being set,
which never happens. Yes the API is terrible.
Removing the ERAT workaround on DD2.0 results in random SEGVs: the
system tends to boot, but things randomly die, including sometimes
dhclient, udev etc.
To fix the problem and hopefully avoid it in future, we remove the
DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
allows us to easily express that the workarounds are required if DD2.1
is not set.
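The mask semantics can be illustrated in plain C. This is a sketch only:
the bit values and function names below are hypothetical and stand in for
the real CPU feature bits and the feature-patching code.

```c
/* Hypothetical feature bits, for illustration only. */
#define FTR_DD1   0x1UL
#define FTR_DD20  0x2UL
#define FTR_DD2_1 0x4UL

/* "If set" check as described above: every bit in mask must be set. */
int section_ifset(unsigned long features, unsigned long mask)
{
	return (features & mask) == mask;
}

/* "If clear" check: every bit in mask must be clear. */
int section_ifclr(unsigned long features, unsigned long mask)
{
	return (features & mask) == 0;
}
```

On a CPU with only FTR_DD20 set, section_ifset(features, FTR_DD1 |
FTR_DD20) is false, so the workaround is skipped. Testing a single
"DD2.1 or later" bit with section_ifclr() avoids combining bits at all.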
At some point we will drop the DD1 workarounds entirely and some of
this can be cleaned up.
Fixes: 9d2f510a66ec ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
Fixes: e3646330cf66 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
BFQ currently creates, and updates, its own instance of the whole
set of blkio statistics that cfq creates. Yet, from the comments
of Tejun Heo in [1], it turned out that most of these statistics
are meant/useful only for debugging. This commit makes BFQ create
the latter, debugging statistics only if the option
CONFIG_DEBUG_BLK_CGROUP is set.
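A minimal sketch of this kind of compile-time gating follows. The struct,
its counters and the function names are hypothetical and do not mirror
the actual BFQ code; the option is left undefined here, as in a
non-debug build.

```c
/* Stand-in for the kernel option; leave undefined for a non-debug build. */
/* #define CONFIG_DEBUG_BLK_CGROUP 1 */

struct group_stats {
	unsigned long serviced;		/* hypothetical always-on counter */
#ifdef CONFIG_DEBUG_BLK_CGROUP
	unsigned long queued;		/* hypothetical debug-only counters */
	unsigned long merged;
#endif
};

void stats_update_serviced(struct group_stats *s)
{
	s->serviced++;
}

#ifdef CONFIG_DEBUG_BLK_CGROUP
void stats_update_io_add(struct group_stats *s)
{
	s->queued++;
}
#else
/* Without the option the update is an empty stub and costs nothing. */
void stats_update_io_add(struct group_stats *s)
{
	(void)s;
}
#endif
```

Callers invoke stats_update_io_add() unconditionally; whether it does any
work is decided once, at build time.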
By doing so, this commit also enables BFQ to enjoy a high performance
boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then
BFQ has to update far fewer statistics, and, in particular, not the
heaviest to update. To give an idea of the benefits, if
CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and
with 8 threads doing random I/O in parallel on null_blk (configured
with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS
(+30%). We have measured similar or even much higher boosts with other
CPUs: e.g., +45% with an ARM Cortex-A53 Octa-core. Our results have
been obtained and can be reproduced very easily with the script in [1].
[1] https://www.spinics.net/lists/linux-block/msg18943.html
Suggested-by: Tejun Heo <tj@kernel.org>
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bfq invokes various blkg_*stats_* functions to update the statistics
contained in the special files blkio.bfq.* in the blkio controller
groups, i.e., the I/O accounting related to the proportional-share
policy provided by bfq. The execution of these functions takes a
considerable percentage, about 40%, of the total per-request execution
time of bfq (i.e., of the sum of the execution time of all the bfq
functions that have to be executed to process an I/O request from its
creation to its destruction). This reduces the request-processing
rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
the bfq functions that invoke blkg_*stats_* functions cannot be
executed in parallel with the rest of the code of bfq, because both
are executed under the same per-device scheduler lock.
To reduce this slowdown, this commit moves, wherever possible, the
invocation of these functions (more precisely, of the bfq functions
that invoke blkg_*stats_* functions) outside the critical sections
protected by the scheduler lock.
With this change, and with all blkio.bfq.* statistics enabled, the
throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
i7-4850HQ, in case of 8 threads doing random I/O in parallel on
null_blk, with the latter configured with 0 latency. We obtained the
same or higher throughput boosts, up to +30%, with other processors
(some figures are reported in the documentation). For our tests, we
used the script [1], with which our results can be easily reproduced.
NOTE. This commit still protects the invocation of blkg_*stats_*
functions with the request_queue lock, because the group these
functions are invoked on may otherwise disappear before or while these
functions are executed. Fortunately, tests without even this lock
show, by difference, that the serialization caused by this lock has
little impact (at most ~5% of throughput reduction).
[1] https://github.com/Algodev-github/IOSpeed
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bfqg_stats_update_io_add and bfqg_stats_update_io_remove are to be
invoked, respectively, when an I/O request enters and when an I/O
request exits the scheduler. Unfortunately, bfq does not fully comply
with this scheme, because it does not invoke these functions for
requests that are inserted into or extracted from its priority
dispatch list. This commit fixes this mistake.
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have investigated more deeply the performance of BFQ, in terms of
number of IOPS that can be processed by the CPU when BFQ is used as
I/O scheduler. In more detail, using the script [1], we have measured
the number of IOPS reached on top of a null block device configured
with zero latency, as a function of the workload (sequential read,
sequential write, random read, random write) and of the system (we
considered desktops, laptops and embedded systems).
Based on the resulting figures, with this commit we update the
current, conservative IOPS range reported in BFQ documentation. In
particular, the documentation now reports, for each of three different
systems, the lowest number of IOPS obtained for that system with the
above test (namely, the value obtained with the workload leading to
the lowest IOPS).
[1] https://github.com/Algodev-github/IOSpeed
Reviewed-by: Lee Tibbert <lee.tibbert@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|