summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-24strparser: initialize all callbacksEric Biggers
commit bbb03029a899 ("strparser: Generalize strparser") added more function pointers to 'struct strp_callbacks'; however, kcm_attach() was not updated to initialize them. This could cause the ->lock() and/or ->unlock() function pointers to be set to garbage values, causing a crash in strp_work(). Fix the bug by moving the callback structs into static memory, so unspecified members are zeroed. Also constify them while we're at it. This bug was found by syzkaller, which encountered the following splat: IP: 0x55 PGD 3b1ca067 P4D 3b1ca067 PUD 3b12f067 PMD 0 Oops: 0010 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: kstrp strp_work task: ffff88006bb0e480 task.stack: ffff88006bb10000 RIP: 0010:0x55 RSP: 0018:ffff88006bb17540 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000 RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48 RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000 R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48 R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000 FS: 0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098 worker_thread+0x223/0x1860 kernel/workqueue.c:2233 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Code: Bad RIP value. RIP: 0x55 RSP: ffff88006bb17540 CR2: 0000000000000055 ---[ end trace f0e4920047069cee ]--- Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and CONFIG_AF_KCM=y): #include <linux/bpf.h> #include <linux/kcm.h> #include <linux/types.h> #include <stdint.h> #include <sys/ioctl.h> #include <sys/socket.h> #include <sys/syscall.h> #include <unistd.h> static const struct bpf_insn bpf_insns[3] = { { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */ { .code = 0x95 }, /* BPF_EXIT_INSN() */ }; static const union bpf_attr bpf_attr = { .prog_type = 1, .insn_cnt = 2, .insns = (uintptr_t)&bpf_insns, .license = (uintptr_t)"", }; int main(void) { int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &bpf_attr, sizeof(bpf_attr)); int inet_fd = socket(AF_INET, SOCK_STREAM, 0); int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0); ioctl(kcm_fd, SIOCKCMATTACH, &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd }); } Fixes: bbb03029a899 ("strparser: Generalize strparser") Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24hv_netvsc: Fix rndis_filter_close error during netvsc_removeHaiyang Zhang
We now remove rndis filter before unregister_netdev(), which calls device close. It involves closing rndis filter already removed. This patch fixes this error. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge branch 'tipc-buffer-reassignment-fixes'David S. Miller
Parthasarathy Bhuvaragan says: ==================== tipc: buffer reassignment fixes This series contains fixes for buffer reassignments and a context imbalance. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: context imbalance at node read unlockParthasarathy Bhuvaragan
If we fail to find a valid bearer in tipc_node_get_linkname(), node_read_unlock() is called without holding the node read lock. This commit fixes this error. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: reassign pointers after skb reallocation / linearizationParthasarathy Bhuvaragan
In tipc_msg_reverse(), we assign skb attributes to local pointers in stack at startup. This is followed by skb_linearize() and for cloned buffers we perform skb relocation using pskb_expand_head(). Both these methods may update the skb attributes and thus making the pointers incorrect. In this commit, we fix this error by ensuring that the pointers are re-assigned after any of these skb operations. Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header when needed") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24tipc: perform skb_linearize() before parsing the inner headerParthasarathy Bhuvaragan
In tipc_rcv(), we linearize only the header and usually the packets are consumed as the nodes permit direct reception. However, if the skb contains tunnelled message due to fail over or synchronization we parse it in tipc_node_check_state() without performing linearization. This will cause link disturbances if the skb was non linear. In this commit, we perform linearization for the above messages. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge tag 'mlx5-updates-2017-08-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-08-24 This series includes updates to mlx5 core driver. From Gal and Saeed, three cleanup patches. From Matan, Low level flow steering improvements and optimizations, - Use more efficient data structures for flow steering objects handling. - Add tracepoints to flow steering operations. - Overall these patches improve flow steering rule insertion rate by a factor of seven in large scales (~50K rules or more). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24hinic: uninitialized variable in hinic_api_cmd_init()Dan Carpenter
We never set the error code in this function. Fixes: eabf0fad81d5 ("net-next/hinic: Initialize api cmd resources") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net_sched: fix a refcount_t issue with noop_qdiscEric Dumazet
syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net: mv643xx_eth: Be drop monitor friendlyFlorian Fainelli
txq_reclaim() does the normal transmit queue reclamation and rxq_deinit() does the RX ring cleanup, none of these are packet drops, so use dev_consume_skb() for both locations. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24mtd: nand: atmel: Relax tADL_min constraintBoris Brezillon
Version 4 of the ONFI spec mandates that tADL be at least 400 nanoseconds, but, depending on the master clock rate, 400 ns may not fit in the tADL field of the SMC reg. We need to relax the check and accept the -ERANGE return code. Note that previous versions of the ONFI spec had a lower tADL_min (100 or 200 ns). It's not clear why this timing constraint got increased but it seems most NANDs are fine with values lower than 400ns, so we should be safe. Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-08-24mtd: nandsim: remove debugfs entries in error pathUwe Kleine-König
The debugfs entries must be removed before an error is returned in the probe function. Otherwise another try to load the module fails and when the debugfs files are accessed without the module loaded, the kernel still tries to call a function in that module. Fixes: 5346c27c5fed ("mtd: nandsim: Introduce debugfs infrastructure") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-08-24tg3: Be drop monitor friendlyFlorian Fainelli
tg3_tx() does the normal packet TX completion, tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a new SKB that is suitable to workaround HW bugs, and finally tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for these 3 locations to be SKB drop monitor friendly. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net: systemport: Free DMA coherent descriptors on errorsFlorian Fainelli
In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we would return with an error, and call bcm_sysport_fini_tx_ring() and it would see that ring->cbs is NULL and do nothing. This would leak the coherent DMA descriptor area, so we need to free it on error before returning. Reported-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net: bcmgenet: Be drop monitor friendlyFlorian Fainelli
There are 3 spots where we call dev_kfree_skb() but we are actually just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring setup and bcmgenet_free_rx_buffers() during RX ring cleanup. Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()") Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24bpf: fix bpf_setsockopts return valueYuchung Cheng
This patch fixes a bug causing any sock operations to always return EINVAL. Fixes: a5192c52377e ("bpf: fix to bpf_setsockops"). Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net: systemport: Be drop monitor friendlyFlorian Fainelli
Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is used when a TX packet is completed, as well as when the RX ring is cleaned on shutdown. None of these two cases are packet drops, so be drop monitor friendly. Suggested-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge branch 'ipv6-Route-ICMPv6-errors-with-the-flow-when-ECMP-in-use'David S. Miller
Jakub Sitnicki says: ==================== ipv6: Route ICMPv6 errors with the flow when ECMP in use This patch set is another take at making Path MTU Discovery work when server nodes are behind a router employing multipath routing in a load-balance or anycast setup (that is, when not every end-node can be reached by every path). The problem has been well described in RFC 7690 [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed to be routed back to the server node that sent a reply that exceeds path MTU. The proposed solution is two-fold: (1) on the server side - reflect the Flow Label [2]. This can be done without modifying the application using a new per-netns sysctl knob that has been proposed independently of this patchset in the patch entitled "ipv6: Add sysctl for per namespace flow label reflection" [3]. (2) on the ECMP router - make the ipv6 routing subsystem look into the ICMPv6 error packets and compute the flow-hash from its payload, i.e. the offending packet that triggered the error. This is the same behavior as ipv4 stack has already. With both parts in place Path MTU Discovery can work past the ECMP router when using IPv6. [1] https://tools.ietf.org/html/rfc7690 [2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01 [3] http://patchwork.ozlabs.org/patch/804870/ v1 -> v2: - don't use "extern" in external function declaration in header file - style change, put as many arguments as possible on the first line of a function call, and align consecutive lines to the first argument - expand the cover letter based on the feedback v2 -> v3: - switch to computing flow-hash using flow dissector to align with recent changes to multipath routing in ipv4 stack - add a sysctl knob for enabling flow label reflection per netns --- Testing has covered multipath routing of ICMPv6 PTB errors in forward and local output path in a simple use-case of an HTTP server sending a reply which is over the path MTU size [3]. I have also checked if the flows get evenly spread over multiple paths (i.e. if there are no regressions) [4]. [3] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud [4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24ipv6: Use multipath hash from flow info if availableJakub Sitnicki
Allow our callers to influence the choice of ECMP link by honoring the hash passed together with the flow info. This allows for special treatment of ICMP errors which we would like to route over the same path as the IPv6 datagram that triggered the error. Also go through rt6_multipath_hash(), in the usual case when we aren't dealing with an ICMP error, so that there is one central place where multipath hash is computed. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24ipv6: Fold rt6_info_hash_nhsfn() into its only callerJakub Sitnicki
Commit 644d0e656958 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes sense to keep it around. Also remove the accompanying comment that has become outdated. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24ipv6: Compute multipath hash for ICMP errors from offending packetJakub Sitnicki
When forwarding or sending out an ICMPv6 error, look at the embedded packet that triggered the error and compute a flow hash over its headers. This let's us route the ICMP error together with the flow it belongs to when multipath (ECMP) routing is in use, which in turn makes Path MTU Discovery work in ECMP load-balanced or anycast setups (RFC 7690). Granted, end-hosts behind the ECMP router (aka servers) need to reflect the IPv6 Flow Label for PMTUD to work. The code is organized to be in parallel with ipv4 stack: ip_multipath_l3_keys -> ip6_multipath_l3_keys fib_multipath_hash -> rt6_multipath_hash Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net: Extend struct flowi6 with multipath hashJakub Sitnicki
Allow for functions that fill out the IPv6 flow info to also pass a hash computed over the skb contents. The hash value will drive the multipath routing decisions. This is intended for special treatment of ICMPv6 errors, where we would like to make a routing decision based on the flow identifying the offending IPv6 datagram that triggered the error, rather than the flow of the ICMP error itself. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24devlink: Fix devlink_dpipe_table_register() stub signature.David S. Miller
One too many arguments compared to the non-stub version. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: ffd3cdccf214 ("devlink: Add support for dynamic table size") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24ipv6: Add sysctl for per namespace flow label reflectionJakub Sitnicki
Reflecting IPv6 Flow Label at server nodes is useful in environments that employ multipath routing to load balance the requests. As "IPv6 Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error messages generated in response to a downstream packets from the server can be routed by a load balancer back to the original server without looking at transport headers, if the server applies the flow label reflection. This enables the Path MTU Discovery past the ECMP router in load-balance or anycast environments where each server node is reachable by only one path. Introduce a sysctl to enable flow label reflection per net namespace for all newly created sockets. Same could be earlier achieved only per socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR socket option. [1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01 Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25Merge tag 'drm-misc-fixes-2017-08-24' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-fixes Core Changes: - Release driver tracking before making the object available again (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> * tag 'drm-misc-fixes-2017-08-24' of git://anongit.freedesktop.org/git/drm-misc: drm: Release driver tracking before making the object available again
2017-08-25Merge tag 'drm-intel-fixes-2017-08-24' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-intel into drm-fixes drm/i915 fixes for v4.13-rc7 * tag 'drm-intel-fixes-2017-08-24' of git://anongit.freedesktop.org/git/drm-intel: drm/i915/gvt: Fix the kernel null pointer error drm/i915: Clear lost context-switch interrupts across reset drm/i915/bxt: use NULL for GPIO connection ID drm/i915/cnl: Fix LSPCON support. drm/i915/vbt: ignore extraneous child devices for a port drm/i915: Initialize 'data' in intel_dsi_dcs_backlight.c
2017-08-24Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpadMasaki Ota
Fixed the issue that two finger scroll does not work correctly on V8 protocol. The cause is that V8 protocol X-coordinate decode is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition. Mote notes: the problem manifests itself by the commit e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)"), where a fix for the V8+ protocol was applied. Although the culprit must have been present beforehand, the two-finger scroll worked casually even with the wrongly reported values by some reason. It got broken by the commit above just because it changed x_max value, and this made libinput correctly figuring the MT events. Since the X coord is reported as falsely doubled, the events on the right-half side go outside the boundary, thus they are no longer handled. This resulted as a broken two-finger scroll. One finger event is decoded differently, and it didn't suffer from this problem. The problem was only about MT events. --tiwai Fixes: e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)") Signed-off-by: Masaki Ota <masaki.ota@jp.alps.com> Tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-08-24Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma fixes from Doug Ledford: "Well, I thought we were going to be done for this -rc cycle. I should have known better than to say so though. We have four additional items that trickled in. One was a simple mistake on my part. I took a patch into my for-next thinking that the issue was less severe than it was. I was then notified that it needed to be in my -rc area instead. The other three were just found late in testing. Summary: - One core fix accidentally applied first to for-next and then cherry picked back because it needed to be in the -rc cycles instead - Another core fix - Two mlx5 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Always return success for RoCE modify port IB/mlx5: Fix Raw Packet QP event handler assignment IB/core: Avoid accessing non-allocated memory when inferring port type RDMA/uverbs: Initialize cq_context appropriately
2017-08-24net: sunrpc: svcsock: fix NULL-pointer exceptionVadim Lomovtsev
While running nfs/connectathon tests kernel NULL-pointer exception has been observed due to races in svcsock.c. Race is appear when kernel accepts connection by kernel_accept (which creates new socket) and start queuing ingress packets to new socket. This happens in ksoftirq context which could run concurrently on a different core while new socket setup is not done yet. The fix is to re-order socket user data init sequence and add write/read barrier calls to be sure that we got proper values for callback pointers before actually calling them. Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations. Crash log: ---<-snip->--- [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 6708.647093] pgd = ffff0000094e0000 [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000 [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops [ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021] [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1 [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017 [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000 [ 6708.773822] PC is at 0x0 [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc] [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145 [ 6708.789248] sp : ffff810ffbad3900 [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58 [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00 [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28 [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000 [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00 [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2 [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000 [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001 [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000 [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000 [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000 [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000 [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000 [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00 [ 6708.872084] [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000) [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000) [ 6708.886075] Call trace: [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840) [ 6708.894942] 3700: ffff810012412400 0001000000000000 [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000 [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000 [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000 [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140 [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000 [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028 [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000 [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000 [ 6708.973117] [< (null)>] (null) [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58 [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254 [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4 [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8 [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84 [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8 [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf] [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf] [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464 [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308 [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174 [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4 [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190 [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20) [ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000 [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18 [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80 [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4 [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50 [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000 [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000 [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20 [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145 [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238 [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140 [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30 [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4 [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30 [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160 [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4 [ 6709.207967] Code: bad PC value [ 6709.211061] SMP: stopping secondary CPUs [ 6709.218830] Starting crashdump kernel... [ 6709.222749] Bye! ---<-snip>--- Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd: Limit end of page list when decoding NFSv4 WRITEChuck Lever
When processing an NFSv4 WRITE operation, argp->end should never point past the end of the data in the final page of the page list. Otherwise, nfsd4_decode_compound can walk into uninitialized memory. More critical, nfsd4_decode_write is failing to increment argp->pagelen when it increments argp->pagelist. This can cause later xdr decoders to assume more data is available than really is, which can cause server crashes on malformed requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24Merge tag 'acpi-4.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two recent regressions (in ACPICA and in the ACPI EC driver) and one bug in code introduced during the 4.12 cycle (ACPI device properties library routine). Specifics: - Fix a regression in the ACPI EC driver causing a kernel to crash during initialization on some systems due to a code ordering issue exposed by a recent change (Lv Zheng). - Fix a recent regression in ACPICA due to a change of the behavior of a library function in a way that is not backwards compatible with some existing callers of it (Rafael Wysocki). - Fix a coding mistake in a library function related to the handling of ACPI device properties introduced during the 4.12 cycle (Sakari Ailus)" * tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value() ACPICA: Fix acpi_evaluate_object_typed() ACPI: EC: Fix regression related to wrong ECDT initialization order
2017-08-24Merge tag 'kbuild-fixes-v4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix linker script regression caused by dead code elimination support - fix typos and outdated comments - specify kselftest-clean as a PHONY target - fix "make dtbs_install" when $(srctree) includes shell special characters like '~' - Move -fshort-wchar to the global option list because defining it partially emits warnings * tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: update comments of Makefile.asm-generic kbuild: Do not use hyphen in exported variable name Makefile: add kselftest-clean to PHONY target list Kbuild: use -fshort-wchar globally fixdep: trivial: typo fix and correction kbuild: trivial cleanups on the comments kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
2017-08-24Merge branch 'for-4.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "We have one more fixup that stems from the blk_status_t conversion that did not quite cover everything. The normal cases were not affected because the code is 0, but any error and retries could mix up new and old values" * 'for-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix blk_status_t/errno confusion
2017-08-24Merge tag 'trace-v4.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various bug fixes: - Two small memory leaks in error paths. - A missed return error code on an error path. - A fix to check the tracing ring buffer CPU when it doesn't exist (caused by setting maxcpus on the command line that is less than the actual number of CPUs, and then onlining them manually). - A fix to have the reset of boot tracers called by lateinit_sync() instead of just lateinit(). As some of the tracers register via lateinit(), and if the clear happens before the tracer is registered, it will never start even though it was told to via the kernel command line" * tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix freeing of filter in create_filter() when set_str is false tracing: Fix kmemleak in tracing_map_array_free() ftrace: Check for null ret_stack on profile function graph entry function ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU tracing: Missing error code in tracer_alloc_buffers() tracing: Call clear_boot_tracer() at lateinit_sync
2017-08-24tipc: Fix tipc_sk_reinit handling of -EAGAINBob Peterson
In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit had additional logic added to loop in the event that function rhashtable_walk_next() returned -EAGAIN. No worries. However, if rhashtable_walk_start returns -EAGAIN, it does "continue", and therefore skips the call to rhashtable_walk_stop(). That has the effect of calling rcu_read_lock() without its paired call to rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem may not be apparent for a while, especially since resize events may be rare. But the comments to rhashtable_walk_start() state: * ...Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. This patch replaces the continue with a goto and label to ensure a matching call to rhashtable_walk_stop(). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "A small number of bugfixes, again nothing serious. - Alexander Dahl found multiple bugs in the Atmel memory interface driver - A randconfig build fix for at91 was incomplete, the second attempt fixes the remaining corner case - One fix for the TI Keystone queue handler - The Odroid XU4 HDMI port (added in 4.13) needs a small DT fix" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: exynos: add needs-hpd for Odroid-XU3/4 ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create memory: atmel-ebi: Fix smc cycle xlate converter memory: atmel-ebi: Allow t_DF timings of zero ns memory: atmel-ebi: Fix smc timing return value evaluation
2017-08-24qlge: avoid memcpy buffer overflowArnd Bergmann
gcc-8.0.0 (snapshot) points out that we copy a variable-length string into a fixed length field using memcpy() with the destination length, and that ends up copying whatever follows the string: inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2: drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=] memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1); Changing it to use strncpy() will instead zero-pad the destination, which seems to be the right thing to do here. The bug is probably harmless, but it seems like a good idea to address it in stable kernels as well, if only for the purpose of building with gcc-8 without warnings. Fixes: a61f80261306 ("qlge: Add ethtool register dump function.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24pty: Repair TIOCGPTPEEREric W. Biederman
The implementation of TIOCGPTPEER has two issues. When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong vfsmount is passed to dentry_open. Which results in the kernel displaying the wrong pathname for the peer. The second is simply by caching the vfsmount and dentry of the peer it leaves them open, in a way they were not previously Which because of the inreased reference counts can cause unnecessary behaviour differences resulting in regressions. To fix these move the ioctl into tty_io.c at a generic level allowing the ioctl to have access to the struct file on which the ioctl is being called. This allows the path of the slave to be derived when opening the slave through TIOCGPTPEER instead of requiring the path to the slave be cached. Thus removing the need for caching the path. A new function devpts_ptmx_path is factored out of devpts_acquire and used to implement a function devpts_mntget. The new function devpts_mntget takes a filp to perform the lookup on and fsi so that it can confirm that the superblock that is found by devpts_ptmx_path is the proper superblock. v2: Lots of fixes to make the code actually work v3: Suggestions by Linus - Removed the unnecessary initialization of filp in ptm_open_peer - Simplified devpts_ptmx_path as gotos are no longer required [ This is the fix for the issue that was reverted in commit 143c97cc6529, but this time without breaking 'pbuilder' due to increased reference counts - Linus ] Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl") Reported-by: Christian Brauner <christian.brauner@canonical.com> Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-24Merge branches 'acpica-fix', 'acpi-ec-fix' and 'acpi-properties-fix'Rafael J. Wysocki
* acpica-fix: ACPICA: Fix acpi_evaluate_object_typed() * acpi-ec-fix: ACPI: EC: Fix regression related to wrong ECDT initialization order * acpi-properties-fix: ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
2017-08-24um: Fix check for _xstate for older hostsFlorian Fainelli
Commit 0a987645672e ("um: Allow building and running on older hosts") attempted to check for PTRACE_{GET,SET}REGSET under the premise that these ptrace(2) parameters were directly linked with the presence of the _xstate structure. After Richard's commit 61e8d462457f ("um: Correctly check for PTRACE_GETRESET/SETREGSET") which properly included linux/ptrace.h instead of asm/ptrace.h, we could get into the original build failure that I reported: arch/x86/um/user-offsets.c: In function 'foo': arch/x86/um/user-offsets.c:54: error: invalid application of 'sizeof' to incomplete type 'struct _xstate' On this particular host, we do have PTRACE_GETREGSET and PTRACE_SETREGSET defined in linux/ptrace.h, but not the structure _xstate that should be pulled from the following include chain: signal.h -> bits/sigcontext.h. Correctly fix this by checking for FP_XSTATE_MAGIC1 which is the correct way to see if struct _xstate is available or not on the host. Fixes: 61e8d462457f ("um: Correctly check for PTRACE_GETRESET/SETREGSET") Fixes: 0a987645672e ("um: Allow building and running on older hosts") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-08-24IB/mlx5: Always return success for RoCE modify portMajd Dibbiny
CM layer calls ib_modify_port() regardless of the link layer. For the Ethernet ports, qkey violation and Port capabilities are meaningless. Therefore, always return success for ib_modify_port calls on the Ethernet ports. Cc: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/mlx5: Fix Raw Packet QP event handler assignmentMajd Dibbiny
In case we have SQ and RQ for Raw Packet QP, the SQ's event handler wasn't assigned. Fixing this by assigning event handler for each WQ after creation. [ 1877.145243] Call Trace: [ 1877.148644] <IRQ> [ 1877.150580] [<ffffffffa07987c5>] ? mlx5_rsc_event+0x105/0x210 [mlx5_core] [ 1877.159581] [<ffffffffa0795bd7>] ? mlx5_cq_event+0x57/0xd0 [mlx5_core] [ 1877.167137] [<ffffffffa079208e>] mlx5_eq_int+0x53e/0x6c0 [mlx5_core] [ 1877.174526] [<ffffffff8101a679>] ? sched_clock+0x9/0x10 [ 1877.180753] [<ffffffff810f717e>] handle_irq_event_percpu+0x3e/0x1e0 [ 1877.188014] [<ffffffff810f735d>] handle_irq_event+0x3d/0x60 [ 1877.194567] [<ffffffff810f9fe7>] handle_edge_irq+0x77/0x130 [ 1877.201129] [<ffffffff81014c3f>] handle_irq+0xbf/0x150 [ 1877.207244] [<ffffffff815ed78a>] ? atomic_notifier_call_chain+0x1a/0x20 [ 1877.214829] [<ffffffff815f434f>] do_IRQ+0x4f/0xf0 [ 1877.220498] [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d [ 1877.227025] <EOI> [ 1877.228967] [<ffffffff814834e2>] ? cpuidle_enter_state+0x52/0xc0 [ 1877.236990] [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200 [ 1877.243676] [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30 [ 1877.249831] [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290 [ 1877.256513] [<ffffffff815cfee1>] start_secondary+0x265/0x27b [ 1877.263111] Code: Bad RIP value. [ 1877.267296] RIP [< (null)>] (null) [ 1877.273264] RSP <ffff88046fd63df8> [ 1877.277531] CR2: 0000000000000000 Fixes: 19098df2da78 ("IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/core: Avoid accessing non-allocated memory when inferring port typeNoa Osherovich
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") introduced the concept of type in ah_attr: * During ib_register_device, each port is checked for its type which is stored in ib_device's port_immutable array. * During uverbs' modify_qp, the type is inferred using the port number in ib_uverbs_qp_dest struct (address vector) by accessing the relevant port_immutable array and the type is passed on to providers. IB spec (version 1.3) enforces a valid port value only in Reset to Init. During Init to RTR, the address vector must be valid but port number is not mentioned as a field in the address vector, so its value is not validated, which leads to accesses to a non-allocated memory when inferring the port type. Save the real port number in ib_qp during modify to Init (when the comp_mask indicates that the port number is valid) and use this value to infer the port type. Avoid copying the address vector fields if the matching bit is not set in the attr_mask. Address vector can't be modified before the port, so no valid flow is affected. Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24net/mlx5e: make mlx5e_profile constBhumika Goyal
Make this const as it is only passed as an argument to the function mlx5e_create_netdev and the corresponding argument is of type const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net/mlx4_core: make mlx4_profile constBhumika Goyal
Make these const as they are only used in a copy operation. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24Merge branch 'xdp-more-work-on-xdp-tracepoints'David S. Miller
Jesper Dangaard Brouer says: ==================== xdp: more work on xdp tracepoints More work on streamlining and performance optimizing the tracepoints for XDP. I've created a simple xdp_monitor application that uses this tracepoint, and prints statistics. Available at github: https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_kern.c https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_user.c The improvement over tracepoint with strcpy: 9810372 - 8428762 = +1381610 pps faster - (1/9810372 - 1/8428762)*10^9 = -16.7 nanosec - 100-(8428762/9810372*100) = strcpy-trace is 14.08% slower - 981037/8428762*100 = removing strcpy made it 11.64% faster V3: Fix merge conflict with commit e4a8e817d3cb ("bpf: misc xdp redirect cleanups") V2: Change trace_xdp_redirect() to align with args of trace_xdp_exception() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24xdp: get tracepoints xdp_exception and xdp_redirect in syncJesper Dangaard Brouer
Remove the net_device string name from the xdp_exception tracepoint, like the xdp_redirect tracepoint. Align the TP_STRUCT to have common entries between these two tracepoint. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24xdp: remove net_device names from xdp_redirect tracepointJesper Dangaard Brouer
There is too much overhead in the current trace_xdp_redirect tracepoint as it does strcpy and strlen on the net_device names. Besides, exposing the ifindex/index is actually the information that is needed in the tracepoint to diagnose issues. When a lookup fails (either ifindex or devmap index) then there is a need for saying which to_index that have issues. V2: Adjust args to be aligned with trace_xdp_exception. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24ixgbe: use return codes from ndo_xdp_xmit that are distinguishableJesper Dangaard Brouer
For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is used in three different cases. (1) When the index or ifindex lookup fails, and in the ixgbe driver (2) when link is down and (3) when XDP have not been enabled. The return code can be picked up by the tracepoint xdp:xdp_redirect for diagnosing why XDP_REDIRECT isn't working. Thus, there is a need different return codes to tell the issues apart. I'm considering using a specific err-code scheme for XDP_REDIRECT instead of using these errno codes. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24xdp: make generic xdp redirect use tracepoint trace_xdp_redirectJesper Dangaard Brouer
If the xdp_do_generic_redirect() call fails, it trigger the trace_xdp_exception tracepoint. It seems better to use the same tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>