Age | Commit message (Collapse) | Author |
|
Proper support of indirect stack access with variable offset in
unprivileged mode (!root) requires corresponding support in Spectre
masking for stack ALU in retrieve_ptr_limit().
There are no use-case for variable offset in unprivileged mode though so
make verifier reject such accesses for simplicity.
Pointer arithmetics is one (and only?) way to cause variable offset and
it's already rejected in unpriv mode so that verifier won't even get to
helper function whose argument contains variable offset, e.g.:
0: (7a) *(u64 *)(r10 -16) = 0
1: (7a) *(u64 *)(r10 -8) = 0
2: (61) r2 = *(u32 *)(r1 +0)
3: (57) r2 &= 4
4: (17) r2 -= 16
5: (0f) r2 += r10
variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2
stack pointer arithmetic goes out of range, prohibited for !root
Still it looks like a good idea to reject variable offset indirect stack
access for unprivileged mode in check_stack_boundary() explicitly.
Fixes: 2011fccfb61b ("bpf: Support variable offset stack access from helpers")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Test that verifier rejects indirect access to uninitialized stack with
variable offset.
Example of output:
# ./test_verifier
...
#859/p indirect variable-offset stack access, uninitialized OK
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
It's hard to guarantee that whole memory is marked as initialized on
helper return if uninitialized stack is accessed with variable offset
since specific bounds are unknown to verifier. This may cause
uninitialized stack leaking.
Reject such an access in check_stack_boundary to prevent possible
leaking.
There are no known use-cases for indirect uninitialized stack access
with variable offset so it shouldn't break anything.
Fixes: 2011fccfb61b ("bpf: Support variable offset stack access from helpers")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
clang started to error on invalid asm clobber usage in x86 headers
and many bpf program samples failed to build with the message:
CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
In file included from ../include/linux/in.h:23:
In file included from ../include/uapi/linux/in.h:24:
In file included from ../include/linux/socket.h:8:
In file included from ../include/linux/uio.h:14:
In file included from ../include/crypto/hash.h:16:
In file included from ../include/linux/crypto.h:26:
In file included from ../include/linux/uaccess.h:5:
In file included from ../include/linux/sched.h:15:
In file included from ../include/linux/sem.h:5:
In file included from ../include/uapi/linux/sem.h:5:
In file included from ../include/linux/ipc.h:9:
In file included from ../include/linux/refcount.h:72:
../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
r->refs.counter, e, "er", i, "cx");
^
../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
r->refs.counter, e, "cx");
^
2 errors generated.
Override volatile() to workaround the problem.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
After removing the start and count arguments of syscall_get_arguments() it
seems reasonable to remove them from syscall_set_arguments(). Note, as of
today, there are no users of syscall_set_arguments(). But we are told that
there will be soon. But for now, at least make it consistent with
syscall_get_arguments().
Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dave Martin <dave.martin@arm.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
function call syscall_get_arguments() implemented in x86 was horribly
written and not optimized for the standard case of passing in 0 and 6 for
the starting index and the number of system calls to get. When looking at
all the users of this function, I discovered that all instances pass in only
0 and 6 for these arguments. Instead of having this function handle
different cases that are never used, simply rewrite it to return the first 6
arguments of a system call.
This should help out the performance of tracing system calls by ptrace,
ftrace and perf.
Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dave Martin <dave.martin@arm.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When CONFIG_SPARSE_IRQ is disable, the request_mutex in struct irq_desc
is not initialized which causes malfunction.
Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
|
|
Currently, when irq_domain_add_linear() fails, the error code does not get
set so it returns zero which is wrong. Fix it by setting the appropriate
error code.
Fixes: 9e543e22e204 ("irqchip: Add driver for Loongson-1 interrupt controller")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20190329062136.GQ32613@kadam
|
|
When ioctl calls are made with non-null-terminated userspace strings,
strlcpy causes an OOB-read from within strlen. Fix by changing to use
strscpy instead.
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
This fixes the following warning seen on GCC 7.3:
arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_regs()
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/3418ebf5a5a9f6ed7e80954c741c0b904b67b5dc.1554398240.git.jpoimboe@redhat.com
|
|
Currently resolutions with pixel clock higher than 340 MHz don't work
with H6 HDMI controller. They just produce a blank screen.
Limit maximum pixel clock rate to 340 MHz until scrambling is supported.
Cc: stable@vger.kernel.org # 5.0
Fixes: 40bb9d3147b2 ("drm/sun4i: Add support for H6 DW HDMI controller")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190324190609.32721-1-jernej.skrabec@siol.net
|
|
This reverts commit a3f98bb22cbfaaf67717e156f79e2bfeb42d4cac.
Patch "Documentation/gpu/meson: Remove link to meson_canvas.c" was
incorrectly applied on the wrong branch not containing the fixed
commit 2bf6b5b0e374 ("drm/meson: exclusively use the canvas provider module")
Acked-by: Sean Paul <sean@poorly.run>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190404144342.15238-1-narmstrong@baylibre.com
|
|
The "call" variable comes from the user in privcmd_ioctl_hypercall().
It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
elements. We need to put an upper bound on it to prevent an out of
bounds access.
Cc: stable@vger.kernel.org
Fixes: 1246ae0bb992 ("xen: add variable hypercall caller")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
struct privcmd_buf_vma_private has a zero-sized array at the end
(pages), use the new struct_size() helper to determine the proper
allocation size and avoid potential type mistakes.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Pull drm fixes from Dave Airlie:
"Pretty quiet week, just some amdgpu and i915 fixes.
i915:
- deadlock fix
- gvt fixes
amdgpu:
- PCIE dpm feature fix
- Powerplay fixes"
* tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
drm/i915/gvt: Correct the calculation of plane size
drm/amdgpu: remove unnecessary rlc reset function on gfx9
drm/i915: Always backoff after a drm_modeset_lock() deadlock
drm/i915/gvt: do not let pin count of shadow mm go negative
drm/i915/gvt: do not deliver a workload if its creation fails
drm/amd/display: VBIOS can't be light up HDMI when restart system
drm/amd/powerplay: fix possible hang with 3+ 4K monitors
drm/amd/powerplay: correct data type to avoid overflow
drm/amd/powerplay: add ECC feature bit
drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
|
|
Pull networking fixes from David Miller:
1) Several hash table refcount fixes in batman-adv, from Sven
Eckelmann.
2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
5) ila rhashtable corruption fix from Herbert Xu.
6) Don't allow upper-devices to be added to vrf devices, from Sabrina
Dubroca.
7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
from Alexander Lobakin.
9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
10) Fix false connection termination in ktls, from Jakub Kicinski.
11) Work around some ASPM issues with r8169 by disabling rx interrupt
coalescing on certain chips. From Heiner Kallweit.
12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
Abeni.
13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
14) Various BPF flow dissector fixes from Stanislav Fomichev.
15) Divide by zero in act_sample, from Davide Caratti.
16) Fix bridging multicast regression introduced by rhashtable
conversion, from Nikolay Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
ibmvnic: Fix completion structure initialization
ipv6: sit: reset ip header pointer in ipip6_rcv
net: bridge: always clear mcast matching struct on reports and leaves
libcxgb: fix incorrect ppmax calculation
vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
sch_cake: Make sure we can write the IP header before changing DSCP bits
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
tcp: Ensure DCTCP reacts to losses
net/sched: act_sample: fix divide by zero in the traffic path
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
net: hns: Fix sparse: some warnings in HNS drivers
net: hns: Fix WARNING when remove HNS driver with SMMU enabled
net: hns: fix ICMP6 neighbor solicitation messages discard problem
net: hns: Fix probabilistic memory overwrite when HNS driver initialized
net: hns: Use NAPI_POLL_WEIGHT for hns driver
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
flow_dissector: rst'ify documentation
ipv6: Fix dangling pointer when ipv6 fragment
net-gro: Fix GRO flush when receiving a GSO packet.
flow_dissector: document BPF flow dissector environment
...
|
|
Topology is not unloaded in the core during unregister_component()
anymore. So, add the remove() callback that will unload the
topology.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The chips main power supplies VA and VP are enabled during probe but
then never disabled, this will cause warnings from the regulator
framework on driver removal. Fix this by adding a remove callback and
disabling the supplies, whilst doing so follow best practice and put the
chip back into reset as well.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use %lu instead of %zu to fix the following warning introduced with
recent memblock refactoring:
xtensa/mm/mmu.c:36:9: warning: format '%zu' expects argument of type
'size_t', but argument 3 has type 'long unsigned int
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
|
|
Fix device initialization completion handling for vNIC adapters.
Initialize the completion structure on probe and reinitialize when needed.
This also fixes a race condition during kdump where the driver can attempt
to access the completion struct before it is initialized:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0000000081acbe0
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288 XER: 00000006
CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
LR [c0000000081ad964] complete+0x64/0xa0
Call Trace:
[c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
[c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
[c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
[c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
[c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
[c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
[c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
[c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
[c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
[c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
[c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipip6 tunnels run iptunnel_pull_header on received skbs. This can
determine the following use-after-free accessing iph pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)
[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
[ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
[ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
[ 706.771839] Call trace:
[ 706.801159] dump_backtrace+0x0/0x2f8
[ 706.845079] show_stack+0x24/0x30
[ 706.884833] dump_stack+0xe0/0x11c
[ 706.925629] print_address_description+0x68/0x260
[ 706.982070] kasan_report+0x178/0x340
[ 707.025995] __asan_report_load1_noabort+0x30/0x40
[ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit]
[ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4]
[ 707.185940] ip_local_deliver_finish+0x3b8/0x988
[ 707.241338] ip_local_deliver+0x144/0x470
[ 707.289436] ip_rcv_finish+0x43c/0x14b0
[ 707.335447] ip_rcv+0x628/0x1138
[ 707.374151] __netif_receive_skb_core+0x1670/0x2600
[ 707.432680] __netif_receive_skb+0x28/0x190
[ 707.482859] process_backlog+0x1d0/0x610
[ 707.529913] net_rx_action+0x37c/0xf68
[ 707.574882] __do_softirq+0x288/0x1018
[ 707.619852] run_ksoftirqd+0x70/0xa8
[ 707.662734] smpboot_thread_fn+0x3a4/0x9e8
[ 707.711875] kthread+0x2c8/0x350
[ 707.750583] ret_from_fork+0x10/0x18
[ 707.811302] Allocated by task 16982:
[ 707.854182] kasan_kmalloc.part.1+0x40/0x108
[ 707.905405] kasan_kmalloc+0xb4/0xc8
[ 707.948291] kasan_slab_alloc+0x14/0x20
[ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0
[ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0
[ 708.108280] __alloc_skb+0xd8/0x400
[ 708.150139] sk_stream_alloc_skb+0xa4/0x638
[ 708.200346] tcp_sendmsg_locked+0x818/0x2b90
[ 708.251581] tcp_sendmsg+0x40/0x60
[ 708.292376] inet_sendmsg+0xf0/0x520
[ 708.335259] sock_sendmsg+0xac/0xf8
[ 708.377096] sock_write_iter+0x1c0/0x2c0
[ 708.424154] new_sync_write+0x358/0x4a8
[ 708.470162] __vfs_write+0xc4/0xf8
[ 708.510950] vfs_write+0x12c/0x3d0
[ 708.551739] ksys_write+0xcc/0x178
[ 708.592533] __arm64_sys_write+0x70/0xa0
[ 708.639593] el0_svc_handler+0x13c/0x298
[ 708.686646] el0_svc+0x8/0xc
[ 708.739019] Freed by task 17:
[ 708.774597] __kasan_slab_free+0x114/0x228
[ 708.823736] kasan_slab_free+0x10/0x18
[ 708.868703] kfree+0x100/0x3d8
[ 708.905320] skb_free_head+0x7c/0x98
[ 708.948204] skb_release_data+0x320/0x490
[ 708.996301] pskb_expand_head+0x60c/0x970
[ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0
[ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit]
[ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4]
[ 709.200195] ip_local_deliver_finish+0x3b8/0x988
[ 709.255596] ip_local_deliver+0x144/0x470
[ 709.303692] ip_rcv_finish+0x43c/0x14b0
[ 709.349705] ip_rcv+0x628/0x1138
[ 709.388413] __netif_receive_skb_core+0x1670/0x2600
[ 709.446943] __netif_receive_skb+0x28/0x190
[ 709.497120] process_backlog+0x1d0/0x610
[ 709.544169] net_rx_action+0x37c/0xf68
[ 709.589131] __do_softirq+0x288/0x1018
[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580
which belongs to the cache kmalloc-1024 of size 1024
[ 709.804356] The buggy address is located 117 bytes inside of
1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
[ 709.946340] The buggy address belongs to the page:
[ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
[ 710.099914] flags: 0xfffff8000000100(slab)
[ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
[ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
[ 710.334966] page dumped because: kasan: bad access detected
Fix it resetting iph pointer after iptunnel_pull_header
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Tested-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tuong Lien says:
====================
tipc: improve TIPC unicast link throughput
The series introduces an algorithm to improve TIPC throughput especially
in terms of packet loss, also tries to reduce packet duplication due to
overactive NACK sending mechanism.
The link failover situation is also covered by the patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit 0ae955e2656d ("tipc: improve TIPC throughput by Gap ACK
blocks"), we enhance the link transmq by releasing as many packets as
possible with the multi-ACKs from peer node. This also means the queue
is now non-linear and the peer link deferdq becomes vital.
Whereas, in the case of link failover, all messages in the link transmq
need to be transmitted as tunnel messages in such a way that message
sequentiality and cardinality per sender is preserved. This requires us
to maintain the link deferdq somehow, so that when the tunnel messages
arrive, the inner user messages along with the ones in the deferdq will
be delivered to upper layer correctly.
The commit accomplishes this by defining a new queue in the TIPC link
structure to hold the old link deferdq when link failover happens and
process it upon receipt of tunnel messages.
Also, in the case of link syncing, the link deferdq will not be purged
to avoid unnecessary retransmissions that in the worst case will fail
because the packets might have been freed on the sending side.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For unicast transmission, the current NACK sending althorithm is over-
active that forces the sending side to retransmit a packet that is not
really lost but just arrived at the receiving side with some delay, or
even retransmit same packets that have already been retransmitted
before. As a result, many duplicates are observed also under normal
condition, ie. without packet loss.
One example case is: node1 transmits 1 2 3 4 10 5 6 7 8 9, when node2
receives packet #10, it puts into the deferdq. When the packet #5 comes
it sends NACK with gap [6 - 9]. However, shortly after that, when
packet #6 arrives, it pulls out packet #10 from the deferfq, but it is
still out of order, so it makes another NACK with gap [7 - 9] and so on
... Finally, node1 has to retransmit the packets 5 6 7 8 9 a number of
times, but in fact all the packets are not lost at all, so duplicates!
This commit reduces duplicates by changing the condition to send NACK,
also restricting the retransmissions on individual packets via a timer
of about 1ms. However, it also needs to say that too tricky condition
for NACKs or too long timeout value for retransmissions will result in
performance reducing! The criterias in this commit are found to be
effective for both the requirements to reduce duplicates but not affect
performance.
The tipc_link_rcv() is also improved to only dequeue skb from the link
deferdq if it is expected (ie. its seqno <= rcv_nxt).
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During unicast link transmission, it's observed very often that because
of one or a few lost/dis-ordered packets, the sending side will fastly
reach the send window limit and must wait for the packets to be arrived
at the receiving side or in the worst case, a retransmission must be
done first. The sending side cannot release a lot of subsequent packets
in its transmq even though all of them might have already been received
by the receiving side.
That is, one or two packets dis-ordered/lost and dozens of packets have
to wait, this obviously reduces the overall throughput!
This commit introduces an algorithm to overcome this by using "Gap ACK
blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
that describes the link deferdq where packets have been got by the
receiving side but with gaps, for example:
link deferdq: [1 2 3 4 10 11 13 14 15 20]
--> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0>
The Gap ACK blocks will be sent to the sending side along with the
traditional ACK or NACK message. Immediately when receiving the message
the sending side will now not only release from its transmq the packets
ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can
be enqueued and transmitted.
In addition, the sending side can now do "multi-retransmissions"
according to the Gaps reported in the Gap ACK blocks.
The new algorithm as verified helps greatly improve the TIPC throughput
especially under packet loss condition.
So far, a maximum of 32 blocks is quite enough without any "Too few Gap
ACK blocks" reports with a 5.0% packet loss rate, however this number
can be increased in the furture if needed.
Also, the patch is backward compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them
in the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
- A fix for the rv32 port that corrects the 64-bit user accesor's
fixup label address.
- A fix for a regression introduced during the merge window that
broke medlow configurations at run time. This patch also includes a
fix that disables ftrace for the same set of functions, which was
found by inspection at the same time.
- A modification of the memory map to avoid overlapping the FIXMAP
and VMALLOC regions on systems with small memory maps.
- A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that testing like I said I would"
* tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
RISC-V: Always compile mm/init.c with cmodel=medany and notrace
riscv: fix accessing 8-byte variable from RV32
|
|
Heiner Kallweit says:
====================
net: phy: use generic PHY ability readers if callback get_features isn't set
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust a comment in patch 1 to match the code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that phylib uses genphy_read_abilities() as fallback, we don't have
to set callback get_features any longer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust the comment in phy_driver_register to match the code
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the mcast conversion to rhashtable this function has been unused, so
remove it.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to be careful and always zero the whole br_ip struct when it is
used for matching since the rhashtable change. This patch fixes all the
places which didn't properly clear it which in turn might've caused
mismatches.
Thanks for the great bug report with reproducing steps and bisection.
Steps to reproduce (from the bug report):
ip link add br0 type bridge mcast_querier 1
ip link set br0 up
ip link add v2 type veth peer name v3
ip link set v2 master br0
ip link set v2 up
ip link set v3 up
ip addr add 3.0.0.2/24 dev v3
ip netns add test
ip link add v1 type veth peer name v1 netns test
ip link set v1 master br0
ip link set v1 up
ip -n test link set v1 up
ip -n test addr add 3.0.0.1/24 dev v1
# Multicast receiver
ip netns exec test socat
UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
# Multicast sender
echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
Reported-by: liam.mcbirnie@boeing.com
Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up the intel_pstate driver after recent changes to prevent
it from printing pointless messages and update the turbostat utility
(mostly fixes and new hardware support).
Specifics:
- Make intel_pstate only load on Intel processors and prevent it from
printing pointless failure messages (Borislav Petkov).
- Update the turbostat utility:
* Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
* Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
Walton).
* Support for Intel Icelake and for systems with more than one die
per package (Len Brown).
* Cleanups (Len Brown)"
* tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/intel_pstate: Load only on Intel hardware
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Prevent stale GPE events from triggering spurious system wakeups from
suspend-to-idle (Furquan Shaikh)"
* tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Clear status of GPEs before enabling them
|
|
This reverts commit 6578229d4efb7ea6287861bfc2bd306140458e07.
netif_receive_skb_list() doesn't support GRO, therefore we may have
scenarios with decreased performance. See discussion here [0].
[0] https://marc.info/?t=155403847400001&r=1&w=2
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Only one fix for DSC (backoff after drm_modeset_lock deadlock)
and GVT's fixes including vGPU display plane size calculation,
shadow mm pin count, error recovery path for workload create
and one kerneldoc fix.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190404161116.GA14522@intel.com
|
|
Linux currently disable ECN for incoming connections when the SYN
requests ECN and the IP header has ECT(0)/ECT(1) set, as some
networks were reportedly mangling the ToS byte, hence could later
trigger false congestion notifications.
RFC8311 §4.3 relaxes RFC3168's requirements such that ECT can be set
one TCP control packets (including SYNs). The main benefit of this
is the decreased probability of losing a SYN in a congested
ECN-capable network (i.e., it avoids the initial 1s timeout).
Additionally, this allows the development of newer TCP extensions,
such as AccECN.
This patch relaxes the previous check, by enabling ECN on incoming
connections using SYN+ECT if at least one bit of the reserved flags
of the TCP header is set. Such bit would indicate that the sender of
the SYN is using a newer TCP feature than what the host implements,
such as AccECN, and is thus implementing RFC8311. This enables
end-hosts not supporting such extensions to still negociate ECN, and
to have some of the benefits of using ECN on control packets.
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Suggested-by: Bob Briscoe <research@bobbriscoe.net>
Cc: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull mfd fixes from Lee Jones:
- Fix failed reads due to enabled IRQs when suspended; twl-core
- Fix driver registration when using DT; sprd-sc27xx-spi
- Fix `make allyesconfig` on x86_64; SUN6I_PRCM
* tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
mfd: sc27xx: Use SoC compatible string for PMIC devices
mfd: twl-core: Disable IRQ while suspended
|
|
Jiri Pirko says:
====================
net: extend devlink port attrs with switch ID
Extend devlink port attrs to contain switch ID and change drivers that
register devlink ports to use that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently if the driver registers devlink port instance, he should set
the devlink port attributes as well. Then the devlink core is able to
obtain switch id itself, no need for driver to implement the ndo.
Once all drivers will implement devlink port registration, this ndo
should be removed. This warning guides new drivers to do things as
they should be done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get(). Leave
ndo_get_port_parent_id implementation only for legacy.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Obtain HW id and pass it down to mlxsw_core_port_init() as it would be
used as switch_id in devlink and exposed to user.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the switch_id is being only initialized when switching eswitch
mode from "legacy" to "switchdev". However, nothing prevents the id to
be initialized from the very beginning. Physical ports can show it even
in "legacy" mode.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce devlink_compat_switch_id_get() helper which fills up switch_id
according to passed netdev pointer. Call it directly from
dev_get_port_parent_id() as a fallback when ndo_get_port_parent_id
is not defined for given netdev.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend devlink_port_attrs_set() to pass switch ID for ports which are
part of switch and store it in port attrs. For other ports, this is
NULL.
Note that this allows the driver to group devlink ports into one or more
switches according to the actual topology.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|