Age | Commit message | Author
2019-12-07 | net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() | Eric Dumazet
Use the new tcf_proto_check_kind() helper to make sure user provided value is well formed. BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline] BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668 CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245 string_nocheck lib/vsprintf.c:606 [inline] string+0x4be/0x600 lib/vsprintf.c:668 vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510 __request_module+0x2b1/0x11c0 kernel/kmod.c:143 tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139 tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline] tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmsg net/socket.c:2356 [inline] __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2363 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45a649 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 
0000000000000003 RCX: 000000000045a649 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4 R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline] kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132 kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86 slab_alloc_node mm/slub.c:2773 [inline] __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x306/0xa10 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmsg net/socket.c:2356 [inline] __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2363 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 6f96c3c6904c ("net_sched: fix backward compatibility for TCA_KIND") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
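A minimal user-space sketch of the kind of check described above, assuming a length-delimited attribute payload that may lack a terminating NUL. The names copy_kind and KIND_MAX are hypothetical and this is not the kernel helper itself; it only illustrates why the string must be bounded before it reaches a "%s" format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KIND_MAX 16 /* assumed bound for the illustration */

/* Copy a length-delimited payload into a fixed buffer that is always
 * NUL-terminated. Returns false if the string does not fit. */
static bool copy_kind(const char *payload, size_t payload_len, char name[KIND_MAX])
{
	size_t len = 0;

	assert(payload != NULL || payload_len == 0);

	/* Never read past payload_len: bytes beyond it may be uninitialized. */
	while (len < payload_len && payload[len] != '\0')
		len++;

	if (len >= KIND_MAX)
		return false;

	memcpy(name, payload, len);
	memset(name + len, 0, KIND_MAX - len);
	return true;
}
```

A caller would reject the request when copy_kind() returns false, and only then pass name on to anything that formats or looks up the string.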
2019-12-07 | r8169: add missing RX enabling for WoL on RTL8125 | Heiner Kallweit
RTL8125 also requires RX to be enabled for WoL. v2: add missing Fixes tag Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-07 | vhost/vsock: accept only packets with the right dst_cid | Stefano Garzarella
When we receive a new packet from the guest, we check if the src_cid is correct, but we forgot to check the dst_cid. The host should accept only packets where dst_cid is equal to the host CID. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
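The address check described above can be sketched as follows. The struct and function names are hypothetical and trimmed down for illustration; only the rule itself (source must be the guest, destination must be the host) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced packet header: only the two fields the check needs. */
struct pkt_hdr {
	uint64_t src_cid;
	uint64_t dst_cid;
};

/* Accept a packet from the guest only if both ends of the address match. */
static bool pkt_addr_ok(const struct pkt_hdr *hdr, uint64_t guest_cid, uint64_t host_cid)
{
	return hdr->src_cid == guest_cid && hdr->dst_cid == host_cid;
}
```

Checking only src_cid, as before the fix, would let a guest address packets to an arbitrary destination CID.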
2019-12-07 | net: phy: dp83867: fix hfs boot in rgmii mode | Grygorii Strashko
Commit ef87f7da6b28 ("net: phy: dp83867: move dt parsing to probe") causes a regression on TI dra71x-evm and dra72x-evm, where the DP83867 PHY is used in "rgmii-id" mode: networking stops working. Unfortunately, it's not enough to just move the DT parsing code to .probe(), as it depends on the phydev->interface value, which is set to the correct value after .probe() has completed and before .config_init() is called. So, the RGMII configuration can't be loaded from DT at probe time. To fix the issue: - move the RGMII validation code to .config_init() - parse RGMII parameters in dp83867_of_init(), but consider them optional. Fixes: ef87f7da6b28 ("net: phy: dp83867: move dt parsing to probe") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-07 | net: ethernet: ti: cpsw: fix extra rx interrupt | Grygorii Strashko
Now the RX interrupt is triggered twice every time, because in cpsw_rx_interrupt() it is acked first and then disabled. So there will always be a pending interrupt when the RX interrupt is enabled again in the NAPI handler. Fix it by first disabling the IRQ and then acking it. Fixes: 870915feabdc ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-07 | inet: protect against too small mtu values. | Eric Dumazet
syzbot was once again able to crash a host by setting a very small mtu on loopback device. Let's make inetdev_valid_mtu() available in include/net/ip.h, and use it in ip_setup_cork(), so that we protect both ip_append_page() and __ip_append_data() Also add a READ_ONCE() when the device mtu is read. Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(), even if other code paths might write over this field. Add a big comment in include/linux/netdevice.h about dev->mtu needing READ_ONCE()/WRITE_ONCE() annotations. Hopefully we will add the missing ones in followup patches. [1] refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89 RSP: 0018:ffff88809689f550 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1 R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001 R13: 0000000000040100 
R14: ffff888099041104 R15: ffff888218d96e40 refcount_add include/linux/refcount.h:193 [inline] skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821 kernel_sendpage+0x92/0xf0 net/socket.c:3794 sock_sendpage+0x8b/0xc0 net/socket.c:936 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636 splice_from_pipe+0x108/0x170 fs/splice.c:671 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842 do_splice_from fs/splice.c:861 [inline] direct_splice_actor+0x123/0x190 fs/splice.c:1035 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441409 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005 RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010 R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180 R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. 
Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
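A user-space sketch of the idea, assuming the minimum acceptable IPv4 MTU is 68 bytes (the RFC 791 minimum). The names valid_mtu and setup_fragsize, and the choice of -ENETUNREACH as the error, are assumptions made for illustration; the volatile read stands in for READ_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MIN_IPV4_MTU 68u /* assumed lower bound, per RFC 791 */

static bool valid_mtu(unsigned int mtu)
{
	return mtu >= MIN_IPV4_MTU;
}

/* Hypothetical stand-in for the cork setup step: take one snapshot of the
 * device MTU and refuse to continue if it is too small. */
static int setup_fragsize(const volatile unsigned int *dev_mtu, unsigned int *fragsize)
{
	unsigned int mtu = *dev_mtu; /* single read; later uses see the same value */

	if (!valid_mtu(mtu))
		return -ENETUNREACH; /* error code chosen for the sketch */

	*fragsize = mtu;
	return 0;
}
```

Validating the snapshot once at setup time is what lets every later append path rely on the same, already checked value even if the device MTU changes concurrently.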
2019-12-07 | gre: refetch erspan header from skb->data after pskb_may_pull() | Cong Wang
After pskb_may_pull() we should always refetch the header pointers from the skb->data in case it got reallocated. In gre_parse_header(), the erspan header is still fetched from the 'options' pointer which is fetched before pskb_may_pull(). Found this during code review of a KMSAN bug report. Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup") Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: William Tu <u9012063@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
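The rule above (refetch pointers after any call that may reallocate the buffer) can be modelled in user space with realloc(). Everything here is a hypothetical analogue: buf_may_pull() plays the role of pskb_may_pull(), and parse_option() shows the safe pattern of keeping an offset and recomputing the pointer afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	unsigned char *data;
	size_t len;
};

/* Make sure `need` bytes are available. The storage may move. */
static int buf_may_pull(struct buf *b, size_t need)
{
	unsigned char *p;

	if (need <= b->len)
		return 1;
	p = realloc(b->data, need);
	if (!p)
		return 0;
	memset(p + b->len, 0, need - b->len);
	b->data = p;
	b->len = need;
	return 1;
}

/* Read the first byte of an option located at opt_off. */
static int parse_option(struct buf *b, size_t opt_off, size_t opt_len, unsigned char *first)
{
	assert(opt_len > 0);

	/* The position is kept as an offset, never as a saved pointer. */
	if (!buf_may_pull(b, opt_off + opt_len))
		return 0;

	/* Recompute the pointer from the (possibly new) base address. */
	const unsigned char *opt = b->data + opt_off;
	*first = opt[0];
	return 1;
}
```

A pointer computed from b->data before the buf_may_pull() call could refer to freed memory afterwards, which is the class of bug the commit fixes.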
2019-12-07 | pppoe: remove redundant BUG_ON() check in pppoe_pernet | Aditya Pakki
Passing NULL to pppoe_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | Merge branch 'tcp-fix-handling-of-stale-syncookies-timestamps' | David S. Miller
Guillaume Nault says: ==================== tcp: fix handling of stale syncookies timestamps The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are only refreshed when the syncookie protection triggers. Therefore, their value can become very far apart from jiffies if no synflood happens for a long time. If jiffies grows too much and wraps while the synflood timestamp isn't refreshed, then time_after32() might consider the latter to be in the future. This can trick tcp_synq_no_recent_overflow() into returning erroneous values and rejecting valid ACKs. Patch 1 handles the case of ACKs using legitimate syncookies. Patch 2 handles the case of stray ACKs. Patch 3 annotates lockless timestamp operations with READ_ONCE() and WRITE_ONCE(). Changes from v3: - Fix description of time_between32() (found by Eric Dumazet). - Use more accurate Fixes tag in patch 3 (suggested by Eric Dumazet). Changes from v2: - Define and use time_between32() instead of a pair of time_before32/time_after32 (suggested by Eric Dumazet). - Use 'last_overflow - HZ' as lower bound in tcp_synq_no_recent_overflow(), to accommodate for concurrent timestamp updates (found by Eric Dumazet). - Add a third patch to annotate lockless accesses to .ts_recent_stamp. Changes from v1: - Initialising timestamps at socket creation time is not enough because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet). Handle stale timestamps in tcp_synq_overflow() and tcp_synq_no_recent_overflow() instead. - Rework commit description. - Add a second patch to handle the case of stray ACKs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() | Guillaume Nault
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the timestamp of the last synflood. Protect them with READ_ONCE() and WRITE_ONCE() since reads and writes aren't serialised. Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from struct tcp_sock"). But unprotected accesses were already there when timestamp was stored in .last_synq_overflow. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | tcp: tighten acceptance of ACKs not matching a child socket | Guillaume Nault
When no synflood occurs, the synflood timestamp isn't updated. Therefore it can be so old that time_after32() can consider it to be in the future. That's a problem for tcp_synq_no_recent_overflow() as it may report that a recent overflow occurred while, in fact, it's just that jiffies has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31. Spurious detection of recent overflows leads to extra syncookie verification in cookie_v[46]_check(). At that point, the verification should fail and the packet should be dropped. But we should have dropped the packet earlier as we didn't even send a syncookie. Let's refine tcp_synq_no_recent_overflow() to report a recent overflow only if jiffies is within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This way, no spurious recent overflow is reported when jiffies wraps and 'last_overflow' appears to be in the future from the point of view of time_after32(). However, if jiffies wraps and enters the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with 'last_overflow' being a stale synflood timestamp), then tcp_synq_no_recent_overflow() still erroneously reports an overflow. In such cases, we have to rely on syncookie verification to drop the packet. We unfortunately have no way to differentiate between a fresh and a stale syncookie timestamp. In practice, using last_overflow as lower bound is problematic. If the synflood timestamp is concurrently updated between the time we read jiffies and the moment we store the timestamp in 'last_overflow', then 'now' becomes smaller than 'last_overflow' and tcp_synq_no_recent_overflow() returns true, potentially dropping a valid syncookie. Reading jiffies after loading the timestamp could fix the problem, but that'd require a memory barrier. Let's just accommodate for potential timestamp growth instead and extend the interval using 'last_overflow - HZ' as lower bound. 
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
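The interval test described above can be modelled in user space with wrapping 32-bit arithmetic. This is a sketch, not the kernel source: between32() mirrors the idea of time_between32(), and HZ = 1000 and a two-minute validity window are assumed values chosen to match the scenario used elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u
#define SYNCOOKIE_VALID (120u * HZ) /* assumed: two minutes */

/* True if t lies in [lo, hi], with all values on a wrapping 32-bit clock. */
static bool between32(uint32_t t, uint32_t lo, uint32_t hi)
{
	return (uint32_t)(hi - lo) >= (uint32_t)(t - lo);
}

/* Report "no recent overflow" unless now falls inside the window
 * [last_overflow - HZ, last_overflow + SYNCOOKIE_VALID]. */
static bool no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
	return !between32(now, last_overflow - HZ, last_overflow + SYNCOOKIE_VALID);
}
```

Because both sides of the comparison are distances from the same lower bound, the test keeps working when the window itself straddles the 32-bit wrap. The extra HZ of slack below last_overflow covers the case where the timestamp is refreshed just after 'now' was sampled.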
2019-12-06 | tcp: fix rejected syncookies due to stale timestamps | Guillaume Nault
If no synflood happens for a long enough period of time, then the synflood timestamp isn't refreshed and jiffies can advance so much that time_after32() can't accurately compare them any more. Therefore, we can end up in a situation where time_after32(now, last_overflow + HZ) returns false, just because these two values are too far apart. In that case, the synflood timestamp isn't updated as it should be, which can trick tcp_synq_no_recent_overflow() into rejecting valid syncookies. For example, let's consider the following scenario on a system with HZ=1000: * The synflood timestamp is 0, either because that's the timestamp of the last synflood or, more commonly, because we're working with a freshly created socket. * We receive a new SYN, which triggers synflood protection. Let's say that this happens when jiffies == 2147484649 (that is, 'synflood timestamp' + HZ + 2^31 + 1). * Then tcp_synq_overflow() doesn't update the synflood timestamp, because time_after32(2147484649, 1000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 1000: the value of 'last_overflow' + HZ. * A bit later, we receive the ACK completing the 3WHS. But cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow() says that we're not under synflood. That's because time_after32(2147484649, 120000) returns true. With: - 2147484649: the value of jiffies, aka. 'now'. - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID. Of course, in reality jiffies would have increased a bit, but this condition will last for the next 119 seconds, which is long enough to accommodate for jiffies' growth. Fix this by updating the overflow timestamp whenever jiffies isn't within the [last_overflow, last_overflow + HZ] range. That shouldn't have any performance impact since the update still happens at most once per second. Now we're guaranteed to have fresh timestamps while under synflood, so tcp_synq_no_recent_overflow() can safely use it with time_after32() in such situations. 
Stale timestamps can still make tcp_synq_no_recent_overflow() return the wrong verdict when not under synflood. This will be handled in the next patch. For 64 bits architectures, the problem was introduced with the conversion of ->tw_ts_recent_stamp to 32 bits integer by commit cca9bab1b72c ("tcp: use monotonic timestamps for PAWS"). The problem has always been there on 32 bits architectures. Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
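The first step of the scenario above can be checked with a small user-space model. after32() follows the usual signed-difference definition of time_after32(); old_should_refresh() and new_should_refresh() are hypothetical names for the refresh rule before and after the fix, with HZ = 1000 as in the example. The cast from an unsigned difference to int32_t relies on two's-complement conversion, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u /* as in the scenario above */

/* True if a is after b on a wrapping 32-bit clock. */
static bool after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* True if t lies in [lo, hi] on a wrapping 32-bit clock. */
static bool between32(uint32_t t, uint32_t lo, uint32_t hi)
{
	return (uint32_t)(hi - lo) >= (uint32_t)(t - lo);
}

/* Old rule: refresh only if now is "after" last_overflow + HZ. */
static bool old_should_refresh(uint32_t now, uint32_t last_overflow)
{
	return after32(now, last_overflow + HZ);
}

/* New rule: refresh whenever now is outside [last_overflow, last_overflow + HZ]. */
static bool new_should_refresh(uint32_t now, uint32_t last_overflow)
{
	return !between32(now, last_overflow, last_overflow + HZ);
}
```

With now = 2147484649 and last_overflow = 0, the difference 1000 - 2147484649 wraps to 2^31 - 1, which is non-negative as a signed value, so the old rule declines to refresh. The new rule sees that 'now' is far outside the one-second window and refreshes, while still leaving a timestamp that is less than a second old untouched.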
2019-12-06 | Merge tag 'mlx5-fixes-2019-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-12-05 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.19: ('net/mlx5e: Query global pause state before setting prio2buffer') For -stable v5.3 ('net/mlx5e: Fix SFF 8472 eeprom length') ('net/mlx5e: Fix translation of link mode into speed') ('net/mlx5e: Fix freeing flow with kfree() and not kvfree()') ('net/mlx5e: ethtool, Fix analysis of speed setting') ('net/mlx5e: Fix TXQ indices to be sequential') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | lpc_eth: kernel BUG on remove | Bruno Carneiro da Cunha
We may have found a bug in the nxp/lpc_eth.c driver. The function platform_set_drvdata() is called twice, the second time it is called, in lpc_mii_init(), it overwrites the struct net_device which should be at pdev->dev->driver_data with pldat->mii_bus. When trying to remove the driver, in lpc_eth_drv_remove(), platform_get_drvdata() will return the pldat->mii_bus pointer and try to use it as a struct net_device pointer. This causes unregister_netdev to segfault and generate a kernel BUG. Is this reproducible? Signed-off-by: Daniel Martinez <linux@danielsmartinez.com> Signed-off-by: Bruno Carneiro da Cunha <brunocarneirodacunha@usp.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | tcp: md5: fix potential overestimation of TCP option space | Eric Dumazet
Back in 2008, Adam Langley fixed the corner case of packets for flows having all of the following options: MD5, TS, SACK. Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block can be cooked from the remaining 8 bytes. tcp_established_options() correctly sets opts->num_sack_blocks to zero, but returns 36 instead of 32. This means TCP cooks packets with 4 extra bytes at the end of options, containing uninitialized bytes. Fixes: 33ad798c924b ("tcp: options clean up") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
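The arithmetic above can be worked through in a small sketch. The 20 and 12 byte sizes come from the commit message; the 40-byte option space, the 4-byte aligned SACK header and the 8 bytes per SACK block are assumed here as the usual TCP values. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OPT_SPACE      40u /* assumed: maximum TCP option bytes */
#define OPT_MD5        20u /* from the text above */
#define OPT_TS         12u /* from the text above */
#define SACK_BASE       4u /* assumed: aligned SACK option header */
#define SACK_PER_BLOCK  8u /* assumed: one SACK block */

/* Return the option bytes used; *num_sacks gets the blocks that fit. */
static unsigned int options_size(bool md5, bool ts, unsigned int want_sacks,
				 unsigned int *num_sacks)
{
	unsigned int size = 0;

	*num_sacks = 0;
	if (md5)
		size += OPT_MD5;
	if (ts)
		size += OPT_TS;

	if (want_sacks) {
		unsigned int remaining = OPT_SPACE - size;
		unsigned int fit;

		/* Not even one block fits: do not account for the header either. */
		if (remaining < SACK_BASE + SACK_PER_BLOCK)
			return size;

		fit = (remaining - SACK_BASE) / SACK_PER_BLOCK;
		*num_sacks = want_sacks < fit ? want_sacks : fit;
		size += SACK_BASE + *num_sacks * SACK_PER_BLOCK;
	}
	return size;
}
```

With MD5 and TS both present, 40 - 20 - 12 leaves 8 bytes, which is less than the 12 needed for a header plus one block, so the size stays at 32. Adding the 4-byte header unconditionally is what produced 36.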
2019-12-06 | Merge branch 'net-tc-indirect-block-relay' | David S. Miller
John Hurley says: ==================== Ensure egress un/bind are relayed with indirect blocks On register and unregister for indirect blocks, a command is called that sends a bind/unbind event to the registering driver. This command assumes that the bind to indirect block will be on ingress. However, drivers such as NFP have allowed binding to clsact qdiscs as well as ingress qdiscs from mainline Linux 5.2. A clsact qdisc binds to an ingress and an egress block. Rather than assuming that an indirect bind is always ingress, modify the function names to remove the ingress tag (patch 1). In cls_api, which is used by NFP to offload TC flower, generate bind/unbind message for both ingress and egress blocks on the event of indirectly registering/unregistering from that block. Doing so mimics the behaviour of both ingress and clsact qdiscs on initialise and destroy. This now ensures that drivers such as NFP receive the correct binder type for the indirect block registration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | net: sched: allow indirect blocks to bind to clsact in TC | John Hurley
When a device is bound to a clsact qdisc, bind events are triggered to registered drivers for both ingress and egress. However, if a driver registers to such a device using the indirect block routines then it is assumed that it is only interested in ingress offload and so only replays ingress bind/unbind messages. The NFP driver supports the offload of some egress filters when registering to a block with qdisc of type clsact. However, on unregister, if the block is still active, it will not receive an unbind egress notification which can prevent proper cleanup of other registered callbacks. Modify the indirect block callback command in TC to send messages of ingress and/or egress bind depending on the qdisc in use. NFP currently supports egress offload for TC flower offload so the changes are only added to TC. Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | net: core: rename indirect block ingress cb function | John Hurley
With indirect blocks, a driver can register for callbacks from a device that it does not 'own', for example, a tunnel device. When registering to or unregistering from a new device, a callback is triggered to generate a bind/unbind event. This, in turn, allows the driver to receive any existing rules or to properly clean up installed rules. When first added, it was assumed that all indirect block registrations would be for ingress offloads. However, the NFP driver can, in some instances, support clsact qdisc binds for egress offload. Change the name of the indirect block callback command in flow_offload to remove the 'ingress' identifier from it. While this does not change functionality, a follow up patch will implement a more generic callback than the current ones, which only support ingress offload. Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | net-sysfs: Call dev_hold always in netdev_queue_add_kobject | Jouni Hogander
dev_hold() must always be called in netdev_queue_add_kobject. Otherwise the usage count drops below 0 if kobject_init_and_add fails. Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject") Reported-by: Hulk Robot <hulkci@huawei.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Miller <davem@davemloft.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
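A toy model of the reference-count balance described above. All names are hypothetical, and the assumption that the failure path ends in a release callback which always drops one reference is a simplification made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_dev {
	int refcnt;
};

/* Release callback: always drops one reference, on every path. */
static void queue_release(struct fake_dev *dev)
{
	dev->refcnt--;
}

/* Add a queue object. The reference is taken before anything can fail,
 * so the unconditional drop in queue_release() stays balanced. */
static int queue_add(struct fake_dev *dev, bool init_fails)
{
	dev->refcnt++;

	if (init_fails) {
		queue_release(dev);
		return -1;
	}
	return 0;
}
```

If the increment happened only after a successful init, the failure path would still run queue_release() and push the count one below where it started.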
2019-12-06 | net: dsa: fix flow dissection on Tx path | Alexander Lobakin
Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an ability to override protocol and network offset during flow dissection for DSA-enabled devices (i.e. controllers shipped as switch CPU ports) in order to fix skb hashing for RPS on Rx path. However, skb_get_hash() and the added part of code can be invoked not only on Rx, but also on Tx path if we have a multi-queued device and: - kernel is running on UP system or - XPS is not configured. The call stack in these two cases will be like: dev_queue_xmit() -> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() -> skb_tx_hash() -> skb_get_hash(). The problem is that skbs queued for Tx have both network offset and correct protocol already set up even after inserting a CPU tag by DSA tagger, so calling tag_ops->flow_dissect() on this path actually only breaks flow dissection and hashing. This can be observed by adding debug prints just before and right after tag_ops->flow_dissect() call to the related block of code:
Before the patch:
Rx path (RPS):
[ 19.240001] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.244271] tag_ops->flow_dissect()
[ 19.247811] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 19.215435] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.219746] tag_ops->flow_dissect()
[ 19.223241] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 18.654057] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 18.658332] tag_ops->flow_dissect()
[ 18.661826] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 18.759560] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 18.763933] tag_ops->flow_dissect()
[ 18.767485] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 22.800020] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 22.804392] tag_ops->flow_dissect()
[ 22.807921] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 16.898342] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
[ 16.902705] tag_ops->flow_dissect()
[ 16.906227] Tx: proto: 0x920b, nhoff: 34 /* junk */
After:
Rx path (RPS):
[ 16.520993] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 16.525260] tag_ops->flow_dissect()
[ 16.528808] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 15.484807] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 15.490417] tag_ops->flow_dissect()
[ 15.495223] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 17.134621] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 17.138895] tag_ops->flow_dissect()
[ 17.142388] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 15.499558] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 20.664689] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 18.565782] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)' to prevent code from calling tag_ops->flow_dissect() on Tx. I also decided to initialize the 'offset' variable so tagger callbacks can now safely leave it untouched without causing chaos. Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection") Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
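The guard described above can be sketched in user space. The struct, the tagger callback and its fixed results (IPv4, 8-byte tag) are hypothetical and chosen to match the Rx debug output; only the 'proto == htons(ETH_P_XDSA)' condition and the zero-initialised offset come from the commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define PROTO_XDSA 0x00F8 /* ETH_P_XDSA, as shown in the debug output */
#define PROTO_IP   0x0800 /* ETH_P_IP */

struct flow_pos {
	uint16_t proto; /* network byte order */
	int nhoff;      /* offset of the network header */
};

/* Hypothetical tagger callback: reports the real protocol and tag length. */
static void tag_flow_dissect(uint16_t *proto, int *offset)
{
	*proto = htons(PROTO_IP);
	*offset = 8;
}

static void dissect(struct flow_pos *pos)
{
	/* Only frames still carrying the switch tag need the override.
	 * Tx frames already have the real protocol and offset set. */
	if (pos->proto == htons(PROTO_XDSA)) {
		int offset = 0; /* a callback may leave this untouched */

		tag_flow_dissect(&pos->proto, &offset);
		pos->nhoff += offset;
	}
}
```

An Rx frame enters with proto 0x00f8 and nhoff 0 and leaves with the real protocol and nhoff 8, while a Tx frame with proto 0x0800 and nhoff 26 passes through unchanged.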
2019-12-06 | net/tls: Fix return values to avoid ENOTSUPP | Valentin Vidic
ENOTSUPP is not available in userspace, for example: setsockopt failed, 524, Unknown error 524 Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
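A small sketch of the distinction, assuming the user-visible replacement is EOPNOTSUPP (an assumption; the message above only says ENOTSUPP must be avoided). The value 524 is taken from the error output above. The helper to_user_errno() is hypothetical and only illustrates the mapping; a fix of this kind would normally change the returned values at their source rather than translate them.

```c
#include <assert.h>
#include <errno.h>

#define KERNEL_ENOTSUPP 524 /* kernel-internal value seen in the message above */

/* Map the kernel-internal code to one that user-space libc can describe. */
static int to_user_errno(int err)
{
	return err == KERNEL_ENOTSUPP ? EOPNOTSUPP : err;
}
```

User-space strerror() has no entry for 524, which is why the caller sees "Unknown error 524" instead of a readable message.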
2019-12-06 | net: avoid an indirect call in ____sys_recvmsg() | Eric Dumazet
CONFIG_RETPOLINE=y made indirect calls expensive. gcc seems to add an indirect call in ____sys_recvmsg(). Rewriting the code slightly makes sure to avoid this indirection. Alternative would be to not call sock_recvmsg() and instead use security_socket_recvmsg() and sock_recvmsg_nosec(), but this is less readable IMO. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Laight <David.Laight@aculab.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | phy: mdio-thunder: add missed pci_release_regions in remove | Chuhong Yuan
The driver forgets to call pci_release_regions() in remove, as is already done on probe failure. Add the missed call to fix it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06 | tipc: fix ordering of tipc module init and exit routine | Taehee Yoo
In order to set/get/dump, the tipc uses the generic netlink infrastructure. So, when tipc module is inserted, init function calls genl_register_family(). After genl_register_family(), set/get/dump commands are immediately allowed and these callbacks internally use the net_generic. net_generic is allocated by register_pernet_device() but this is called after genl_register_family() in the __init function. So, these callbacks would use un-initialized net_generic. Test commands: #SHELL1 while : do modprobe tipc modprobe -rv tipc done #SHELL2 while : do tipc link list done Splat looks like: [ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled [ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194 [ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc] [ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00 [ 59.622550][ T2780] NET: Registered protocol family 30 [ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202 [ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907 [ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149 [ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1 [ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40 [ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328 [ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.625875][ T2780] tipc: Started in single node mode [ 59.626128][ T2788] CR2: 
00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0 [ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 59.636478][ T2788] Call Trace: [ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc] [ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0 [ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc] [ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc] [ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380 [ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc] [ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc] [ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270 [ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0 [ 59.643050][ T2788] netlink_dump+0x49c/0xed0 [ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0 [ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800 [ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670 [ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800 [ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90 [ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0 [ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320 [ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0 [ 59.647821][ T2788] ? genl_unlock+0x20/0x20 [ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0 [ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0 [ 59.649276][ T2788] ? genl_rcv+0x15/0x40 [ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0 [ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350 [ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320 [ 59.651491][ T2788] ? netlink_ack+0x940/0x940 [ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0 [ 59.652449][ T2788] genl_rcv+0x24/0x40 [ 59.652841][ T2788] netlink_unicast+0x421/0x600 [ ... 
] Fixes: 7e4369057806 ("tipc: fix a slab object leak") Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06mqprio: Fix out-of-bounds access in mqprio_dumpVladyslav Tarasiuk
When the user runs a command like

  tc qdisc add dev eth1 root mqprio

a KASAN stack-out-of-bounds warning is emitted. Currently, the NLA_ALIGN macro used in mqprio_dump passes too large a buffer size as the argument to nla_put, and from there to memcpy further down the call stack. The flow looks like this:

1. nla_put expects the exact object size as an argument;
2. Later it provides this size to memcpy;
3. To calculate the correct padding for the SKB, nla_put applies the NLA_ALIGN macro itself.

Therefore, NLA_ALIGN should not be applied to the nla_put parameter. Otherwise it will lead to out-of-bounds memory access in memcpy.

Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
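The size mismatch described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `demo_opt`, `demo_put()` and `overread_if_prealigned()` are hypothetical names, and `demo_put()` only imitates the behaviour the message attributes to nla_put (copy exactly the given length, then pad). The NLA_ALIGN macro is redefined locally with the usual 4-byte rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the 4-byte rounding rule used for netlink attributes. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))

/* Hypothetical payload whose size (82) is not a multiple of 4. */
struct demo_opt {
    unsigned char bytes[82];
};

/*
 * Hypothetical stand-in for nla_put(): it reads exactly attrlen bytes
 * from data and adds the alignment padding to the destination itself.
 * Returns the number of bytes consumed in dst.
 */
static size_t demo_put(unsigned char *dst, const void *data, size_t attrlen)
{
    size_t padded = NLA_ALIGN(attrlen);

    assert(dst != NULL && data != NULL);
    memcpy(dst, data, attrlen);                 /* reads attrlen bytes of data */
    memset(dst + attrlen, 0, padded - attrlen); /* padding lives in dst only */
    return padded;
}

/*
 * Number of bytes that would be read past the end of the source object
 * if the caller passed NLA_ALIGN(object_size) instead of object_size.
 */
static size_t overread_if_prealigned(size_t object_size)
{
    return NLA_ALIGN(object_size) - object_size;
}
```

With an 82-byte object, passing the pre-aligned size would make the copy read 2 bytes beyond the object, which is the kind of access KASAN reports. Passing `sizeof(struct demo_opt)` keeps the read inside the object while the destination is still padded to 84 bytes.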
2019-12-06net: stmmac: reset Tx desc base address before restarting TxJongsung Kim
Refer to the databook of DesignWare Cores Ethernet MAC Universal:

6.2.1.5 Register 4 (Transmit Descriptor List Address Register)

If this register is not changed when the ST bit is set to 0, then the DMA takes the descriptor address where it was stopped earlier.

stmmac_tx_err() zeroes the indices into the Tx descriptors, but does not reset the HW current Tx descriptor address. To fix this inconsistency, the base address of the Tx descriptors should be rewritten before restarting Tx.

Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06enetc: disable EEE autoneg by defaultYangbo Lu
The EEE support has not been enabled on ENETC, but it may connect to a PHY which supports EEE and advertises EEE by default, while its link partner also advertises EEE. If this happens, the PHY enters low power mode when the traffic rate is low and causes packet loss. This patch disables EEE advertisement by default for any PHY that ENETC connects to, to prevent the above unwanted outcome. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says:

====================
pull-request: bpf 2019-12-05

The following pull-request contains BPF updates for your *net* tree.

We've added 6 non-merge commits during the last 1 day(s) which contain a total of 14 files changed, 116 insertions(+), 37 deletions(-).

The main changes are:

1) three selftests fixes, from Stanislav.
2) one samples fix, from Jesper.
3) one verifier fix, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05ppp: fix out-of-bounds access in bpf_prog_create()Eric Biggers
sock_fprog_kern::len is in units of struct sock_filter, not bytes. Fixes: 3e859adf3643 ("compat_ioctl: unify copy-in of ppp filters") Reported-by: syzbot+eb853b51b10f1befa0b7@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
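The unit mismatch named in this commit can be sketched in user space as follows. The structures below are local, hypothetical mirrors (`demo_sock_filter`, `demo_fprog`), not the kernel definitions; the only assumption taken from the message is that the length field counts filter instructions rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction layout (8 bytes). */
struct demo_sock_filter {
    uint16_t code;
    uint8_t jt;
    uint8_t jf;
    uint32_t k;
};

/* Hypothetical program descriptor: len counts instructions, not bytes. */
struct demo_fprog {
    uint16_t len;
    const struct demo_sock_filter *filter;
};

/* Size in bytes of the instruction array described by p. */
static size_t demo_fprog_bytes(const struct demo_fprog *p)
{
    assert(p != NULL);
    return (size_t)p->len * sizeof(struct demo_sock_filter);
}

/*
 * Number of instructions a consumer would walk if len were mistakenly
 * filled with the byte count of an insn_count-instruction program.
 */
static size_t demo_insns_walked_if_len_is_bytes(uint16_t insn_count)
{
    return (size_t)insn_count * sizeof(struct demo_sock_filter);
}
```

For a 4-instruction program the array occupies 32 bytes. If 32 were stored in the length field, a consumer that treats it as an instruction count would walk 32 entries of a 4-entry array, which is an out-of-bounds access.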
2019-12-05Merge branch 'hns3-fixes'David S. Miller
Huazhong Tan says:

====================
net: hns3: fixes for -net

This patchset includes misc fixes for the HNS3 ethernet driver.

[patch 1/3] fixes a TX queue not restarted problem.
[patch 2/3] fixes a use-after-free issue.
[patch 3/3] fixes a VF ID issue for setting VF VLAN.

change log:
V1->V2: keeps 'ring' as parameter in hns3_nic_maybe_stop_tx() in [patch 1/3], suggested by David.
        rewrites [patch 2/3]'s commit log to make it easier to understand, suggested by David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net: hns3: fix VF ID issue for setting VF VLANJian Shen
Previously, when setting the VF VLAN with the command "ip link set <pf name> vf <vf id> vlan <vlan id>", VF ID 0 was incorrectly handled as the PF, although it should be the first VF. This patch fixes it.

Fixes: 21e043cd8124 ("net: hns3: fix set port based VLAN for PF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net: hns3: fix a use after free problem in hns3_nic_maybe_stop_tx()Yunsheng Lin
Currently, hns3_nic_maybe_stop_tx() uses skb_copy() to linearize an SKB if the BD num required by the SKB does not meet the hardware limitation. It does so by allocating a new linearized SKB and freeing the old SKB. If hns3_nic_maybe_stop_tx() then returns -EBUSY because there is not enough space in the ring to send the linearized SKB to hardware, sch_direct_xmit() still holds a reference to the old SKB and tries to retransmit it when dev_hard_start_xmit() returns TX_BUSY, which may cause a use-after-free.

This patch fixes it by using __skb_linearize() to linearize the SKB in hns3_nic_maybe_stop_tx().

Fixes: 51e8439f3496 ("net: hns3: add 8 BD limit for tx flow")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net: hns3: fix for TX queue not restarted problemYunsheng Lin
There is a timing window between the ring_space check and netif_stop_subqueue when transmitting an SKB, and the TX BD cleaning may be executed during that window, which may cause the TX queue not to be restarted.

This patch fixes it by rechecking the ring_space after netif_stop_subqueue to make sure the TX queue is restarted. Also, ring->next_to_clean is updated even when pkts is zero, because all the TX BDs cleaned may be non-SKB, so it needs to check whether the TX queue needs to be restarted.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net: ethernet: ti: cpsw_switchdev: fix unmet direct dependencies detected ↵Grygorii Strashko
for NET_SWITCHDEV

Replace "select NET_SWITCHDEV" with "depends on NET_SWITCHDEV" to fix the Kconfig warning with CONFIG_COMPILE_TEST=y:

WARNING: unmet direct dependencies detected for NET_SWITCHDEV
  Depends on [n]: NET [=y] && INET [=n]
  Selected by [y]:
  - TI_CPSW_SWITCHDEV [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && (ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST [=y])

because TI_CPSW_SWITCHDEV blindly selects NET_SWITCHDEV even though INET is not set/enabled, while NET_SWITCHDEV depends on INET.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tagParav Pandit
In the cited commit, when prio tag mode is enabled, FTE creation fails due to a missing group with valid match criteria.

Hence,
(a) create the prio tag group metadata_prio_tag_grp when prio tag is enabled, with match criteria for the vlan push FTE.
(b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose.

Also, when priority tag is enabled, delete the metadata settings after deleting the ingress rules which are using them.

Tidy up the rest of the ingress config code by removing unnecessary labels.

Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: ethtool, Fix analysis of speed settingAya Levin
When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is configured while the user, who has an advanced HW which supports extended PTYS, expects also 50G*2 to be configured. With this patch, when extended PTYS mode is available, configure PTYS via extended fields. Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix translation of link mode into speedAya Levin
Add a missing value in translation of PTYS ext_eth_proto_oper to its corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool shows unknown speed. With this fix, ethtool shows speed is 100G as expected. Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix free peer_flow when refcount is 0Roi Dayan
The neigh update flow may have taken a refcount on the peer flow, so sometimes the peer flow cannot be released even though the parent flow is being freed now.

Fixes: 5a7e5bcb663d ("net/mlx5e: Extend tc flow struct with reference counter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix freeing flow with kfree() and not kvfree()Roi Dayan
Flows are allocated with kzalloc() so free with kfree(). Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix SFF 8472 eeprom lengthEran Ben Elisha
SFF 8472 eeprom length is 512 bytes. Fix module info return value to support 512 bytes read. Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Query global pause state before setting prio2bufferHuy Nguyen
When the user changes prio2buffer mapping while global pause is enabled, mlx5 driver incorrectly sets all active buffers (buffer that has at least one priority mapped) to lossy. Solution: If global pause is enabled, set all the active buffers to lossless in prio2buffer command. Also, add error message when buffer size is not enough to meet xoff threshold. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix TXQ indices to be sequentialEran Ben Elisha
The cited patch changed the (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing the number of channels or TCs.

For 32 channels (OOB) and 8 TCs, the real num of TXQs is 256. When reducing the number of channels to 8, the real num of TXQs is changed to 64. This indexing method is buggy:
- Channel #0, TC 3, the TXQ index is 96.
- Index 8 is not valid, as there is no such TXQ from the driver's perspective (it represents channel #8, TC 0, which is not valid with the above configuration).

As part of the driver's select queue, it calls netdev_pick_tx, which returns an index in the range of the real number of TXQs. Depending on the return value, with the examples above, the driver could have returned an index larger than the real number of TX queues, or crashed the kernel as it tries to read the invalid address of an SQ which was not allocated.

Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index.

The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated at run time. Delete its definition and updates.

Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
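The index arithmetic in this commit can be checked with a short sketch. The formulas are an assumption inferred from the two examples in the message (channel #0, TC 3 giving 96, and index 8 meaning channel #8, TC 0, with 32 maximum channels); the function names are hypothetical and this is not the mlx5e code.

```c
#include <assert.h>

/*
 * Static mapping (assumed form): the stride is the maximum number of
 * channels, so indices do not move when the channel count changes.
 */
static int demo_static_txq(int ch, int tc, int max_channels)
{
    assert(ch >= 0 && tc >= 0 && max_channels > 0);
    return tc * max_channels + ch;
}

/*
 * Sequential mapping (assumed form): the stride is the current number
 * of channels, so all indices fall in [0, num_channels * num_tc).
 */
static int demo_sequential_txq(int ch, int tc, int num_channels)
{
    assert(ch >= 0 && ch < num_channels && tc >= 0);
    return tc * num_channels + ch;
}

/* Real number of TX queues for a given configuration. */
static int demo_real_num_txqs(int num_channels, int num_tc)
{
    return num_channels * num_tc;
}
```

With 8 channels and 8 TCs there are 64 real TXQs. The static mapping places channel #0, TC 3 at index 96, which is outside that range, while the sequential mapping places it at index 24 and the last queue (channel #7, TC 7) at index 63.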
2019-12-05Merge branch 's390-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2019-12-05 please apply the following fixes to your net tree. The first two patches target the RX data path, the third fixes a memory leak when shutting down a qeth device. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05s390/qeth: fix dangling IO buffers after halt/clearJulian Wiedmann
The cio layer's intparm logic does not align itself well with how qeth manages cmd IOs. When an active IO gets terminated via halt/clear, the corresponding IRQ's intparm does not reflect the cmd buffer but rather the intparm that was passed to ccw_device_halt() / ccw_device_clear(). This behaviour was recently clarified in commit b91d9e67e50b ("s390/cio: fix intparm documentation"). As a result, qeth_irq() currently doesn't cancel a cmd that was terminated via halt/clear. This primarily causes us to leak card->read_cmd after the qeth device is removed, since our IO path still holds a refcount for this cmd. For qeth this means that we need to keep track of which IO is pending on a device ('active_cmd'), and use this as the intparm when calling halt/clear. Otherwise qeth_irq() can't match the subsequent IRQ to its cmd buffer. Since we now keep track of the _expected_ intparm, we can also detect any mismatch; this would constitute a bug somewhere in the lower layers. In this case cancel the active cmd - we effectively "lost" the IRQ and should not expect any further notification for this IO. Fixes: 405548959cc7 ("s390/qeth: add support for dynamically allocated cmds") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05s390/qeth: ensure linear access to packet headersJulian Wiedmann
When the RX path builds non-linear skbs, the packet headers can currently spill over into page fragments. Depending on the packet type and what fields we need to access in the headers, this could cause us to go past the end of skb->data. So for non-linear packets, copy precisely the length of the necessary headers ('linear_len') into skb->data. And don't copy more, upper-level protocols will peel whatever additional packet headers they need. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05s390/qeth: guard against runt packetsJulian Wiedmann
Depending on a packet's type, the RX path needs to access fields in the packet headers and thus requires a minimum packet length. Enforce this length when building the skb. On the other hand a single runt packet is no reason to drop the whole RX buffer. So just skip it, and continue processing on the next packet. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05net: thunderx: start phy before starting autonegotiationMian Yousaf Kaukab
Since commit 2b3e88ea6528 ("net: phy: improve phy state checking") phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start() before calling phy_start_aneg() during probe so that autonegotiation is initiated. As phy_start() takes care of calling phy_start_aneg(), drop the explicit call to phy_start_aneg(). Network fails without this patch on Octeon TX. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-05hsr: fix a NULL pointer dereference in hsr_dev_xmit()Taehee Yoo
hsr_dev_xmit() calls hsr_port_get_hsr() to find master node and that would return NULL if master node is not existing in the list. But hsr_dev_xmit() doesn't check return pointer so a NULL dereference could occur. Test commands: ip netns add nst ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link set veth1 netns nst ip link set veth3 netns nst ip link set veth0 up ip link set veth2 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec nst ip link set veth1 up ip netns exec nst ip link set veth3 up ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3 ip netns exec nst ip a a 192.168.100.2/24 dev hsr1 ip netns exec nst ip link set hsr1 up hping3 192.168.100.2 -2 --flood & modprobe -rv hsr Splat looks like: [ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled [ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192 [ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr] [ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b [ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202 [ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002 [ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010 [ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001 [ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000 [ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00 [ 217.383094][ T1635] FS: 00007f060d9d1740(0000) 
GS:ffff8880da000000(0000) knlGS:0000000000000000 [ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0 [ 217.385940][ T1635] Call Trace: [ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740 [ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10 [ 217.388118][ T1635] ? check_object+0xaf/0x260 [ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500 [ 217.392017][ T1635] ? init_object+0x6b/0x80 [ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0 [ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500 [ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0 [ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40 [ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0 [ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0 [ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0 [ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0 [ 217.401118][ T1635] ? memset+0x1f/0x40 [ 217.402529][ T1635] ? __alloc_skb+0x317/0x500 [ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0 [ ... ] Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04selftests/bpf: Add a fexit/bpf2bpf test with target bpf prog no calleesYonghong Song
The existing fexit_bpf2bpf test covers the target program with callees. This patch adds a test for the target program without callees.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191205010607.177904-1-yhs@fb.com
2019-12-04bpf: Fix a bug when getting subprog 0 jited image in check_attach_btf_idYonghong Song
For a jited bpf program, if the subprogram count is 1, i.e., there are no callees in the program, prog->aux->func will be NULL and prog->bpf_func points to the image address of the program.

If there is more than one subprogram, prog->aux->func is populated, and subprogram 0 can be accessed through either prog->bpf_func or prog->aux->func[0]. Other subprograms should be accessed through prog->aux->func[subprog_id].

This patch fixes a bug in check_attach_btf_id(), where prog->aux->func[subprog_id] is used to access any subprogram, which caused a segfault like below:

[79162.619208] BUG: kernel NULL pointer dereference, address: 0000000000000000
......
[79162.634255] Call Trace:
[79162.634974] ? _cond_resched+0x15/0x30
[79162.635686] ? kmem_cache_alloc_trace+0x162/0x220
[79162.636398] ? selinux_bpf_prog_alloc+0x1f/0x60
[79162.637111] bpf_prog_load+0x3de/0x690
[79162.637809] __do_sys_bpf+0x105/0x1740
[79162.638488] do_syscall_64+0x5b/0x180
[79162.639147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
......

Fixes: 5b92a28aae4d ("bpf: Support attaching tracing BPF program to other BPF programs")
Reported-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191205010606.177774-1-yhs@fb.com
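The access rule described in this commit can be sketched in user space as follows. The structures are simplified, hypothetical mirrors (`demo_prog`, `demo_aux`) that keep only the fields named in the message, and `demo_subprog_image()` is an illustration of the rule, not the actual check_attach_btf_id() change.

```c
#include <assert.h>
#include <stddef.h>

struct demo_prog;

/* Hypothetical mirror of prog->aux: func is NULL when there are no callees. */
struct demo_aux {
    struct demo_prog **func;
    unsigned int func_cnt;
};

/* Hypothetical mirror of a jited program: bpf_func is the image address. */
struct demo_prog {
    void *bpf_func;
    struct demo_aux *aux;
};

/*
 * Image address of a subprogram. Subprogram 0 is always reachable via
 * prog->bpf_func, which also covers the case where aux->func is NULL.
 * Other subprograms go through aux->func[subprog]. Returns NULL for an
 * index that does not exist.
 */
static void *demo_subprog_image(const struct demo_prog *prog,
                                unsigned int subprog)
{
    assert(prog != NULL);
    if (subprog == 0)
        return prog->bpf_func;
    if (!prog->aux || !prog->aux->func || subprog >= prog->aux->func_cnt)
        return NULL;
    return prog->aux->func[subprog]->bpf_func;
}
```

Indexing `aux->func[0]` unconditionally would dereference a NULL array for a program without callees, which matches the NULL pointer dereference in the trace above.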