summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-24netfilter: nft_fwd_netdev: validate family and chain typePablo Neira Ayuso
Make sure the forward action is only used from ingress. Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_rbtree: Detect partial overlaps on insertionStefano Brivio
...and return -ENOTEMPTY to the front-end in this case, instead of proceeding. Currently, nft takes care of checking for these cases and not sending them to the kernel, but if we drop the set_overlap() call in nft we can end up in situations like: # nft add table t # nft add set t s '{ type inet_service ; flags interval ; }' # nft add element t s '{ 1 - 5 }' # nft add element t s '{ 6 - 10 }' # nft add element t s '{ 4 - 7 }' # nft list set t s table ip t { set s { type inet_service flags interval elements = { 1-3, 4-5, 6-7 } } } This change has the primary purpose of making the behaviour consistent with nft_set_pipapo, but is also functional to avoid inconsistent behaviour if userspace sends overlapping elements for any reason. v2: When we meet the same key data in the tree, as start element while inserting an end element, or as end element while inserting a start element, actually check that the existing element is active, before resetting the overlap flag (Pablo Neira Ayuso) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()Stefano Brivio
Replace negations of nft_rbtree_interval_end() with a new helper, nft_rbtree_interval_start(), wherever this helps to visualise the problem at hand, that is, for all the occurrences except for the comparison against given flags in __nft_rbtree_get(). This gets especially useful in the next patch. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_pipapo: Separate partial and complete overlap cases on ↵Stefano Brivio
insertion ...and return -ENOTEMPTY to the front-end on collision, -EEXIST if an identical element already exists. Together with the previous patch, element collision will now be returned to the user as -EEXIST. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nf_tables: Allow set back-ends to report partial overlaps on ↵Pablo Neira Ayuso
insertion Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it might indicate that a given element (including intervals) already exists as such, or that the new element would clash with existing ones. If identical elements already exist, the front-end is ignoring this without returning error, in case NLM_F_EXCL is not set. However, if the new element can't be inserted due an overlap, we should report this to the user. To this purpose, allow set back-ends to return -ENOTEMPTY on collision with existing elements, translate that to -EEXIST, and return that to userspace, no matter if NLM_F_EXCL was set. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-21net: phy: dp83867: w/a for fld detect threshold bootstrapping issueGrygorii Strashko
When the DP83867 PHY is strapped to enable Fast Link Drop (FLD) feature STRAP_STS2.STRAP_ FLD (reg 0x006F bit 10), the Energy Lost Threshold for FLD Energy Lost Mode FLD_THR_CFG.ENERGY_LOST_FLD_THR (reg 0x002e bits 2:0) will be defaulted to 0x2. This may cause the phy link to be unstable. The new DP83867 DM recommends to always restore ENERGY_LOST_FLD_THR to 0x1. Hence, restore default value of FLD_THR_CFG.ENERGY_LOST_FLD_THR to 0x1 when FLD is enabled by bootstrapping as recommended by DM. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21net: stmmac: dwmac-rk: fix error path in rk_gmac_probeEmil Renner Berthing
Make sure we clean up devicetree related configuration also when clock init fails. Fixes: fecd4d7eef8b ("net: stmmac: dwmac-rk: Add integrated PHY support") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21slcan: not call free_netdev before rtnl_unlock in slcan_openOliver Hartkopp
As the description before netdev_run_todo, we cannot call free_netdev before rtnl_unlock, fix it by reorder the code. This patch is a 1:1 copy of upstream slip.c commit f596c87005f7 ("slip: not call free_netdev before rtnl_unlock in slip_open"). Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21ionic: make spdxcheck.py happyLukas Bulwahn
Headers ionic_if.h and ionic_regs.h are licensed under three alternative licenses and the used SPDX-License-Identifier expression makes ./scripts/spdxcheck.py complain: drivers/net/ethernet/pensando/ionic/ionic_if.h: 1:52 Syntax error: OR drivers/net/ethernet/pensando/ionic/ionic_regs.h: 1:52 Syntax error: OR As OR is associative, it is irrelevant if the parentheses are put around the first or the second OR-expression. Simply add parentheses to make spdxcheck.py happy. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hsr: fix general protection fault in hsr_addr_is_self()Taehee Yoo
The port->hsr is used in the hsr_handle_frame(), which is a callback of rx_handler. hsr master and slaves are initialized in hsr_add_port(). This function initializes several pointers, which includes port->hsr after registering rx_handler. So, in the rx_handler routine, un-initialized pointer would be used. In order to fix this, pointers should be initialized before registering rx_handler. Test commands: ip netns del left ip netns del right modprobe -rv veth modprobe -rv hsr killall ping modprobe hsr ip netns add left ip netns add right ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link add veth4 type veth peer name veth5 ip link set veth1 netns left ip link set veth3 netns right ip link set veth4 netns left ip link set veth5 netns right ip link set veth0 up ip link set veth2 up ip link set veth0 address fc:00:00:00:00:01 ip link set veth2 address fc:00:00:00:00:02 ip netns exec left ip link set veth1 up ip netns exec left ip link set veth4 up ip netns exec right ip link set veth3 up ip netns exec right ip link set veth5 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec left ip link add hsr1 type hsr slave1 veth1 slave2 veth4 ip netns exec left ip a a 192.168.100.2/24 dev hsr1 ip netns exec left ip link set hsr1 up ip netns exec left ip n a 192.168.100.1 dev hsr1 lladdr \ fc:00:00:00:00:01 nud permanent ip netns exec left ip n r 192.168.100.1 dev hsr1 lladdr \ fc:00:00:00:00:01 nud permanent for i in {1..100} do ip netns exec left ping 192.168.100.1 & done ip netns exec left hping3 192.168.100.1 -2 --flood & ip netns exec right ip link add hsr2 type hsr slave1 veth3 slave2 veth5 ip netns exec right ip a a 192.168.100.3/24 dev hsr2 ip netns exec right ip link set hsr2 up ip netns exec right ip n a 192.168.100.1 dev hsr2 lladdr \ fc:00:00:00:00:02 nud permanent ip netns exec right ip n r 192.168.100.1 dev hsr2 lladdr \ fc:00:00:00:00:02 nud permanent for i in {1..100} do ip netns exec right ping 192.168.100.1 & done ip netns exec right hping3 192.168.100.1 -2 --flood & while : do ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip link del hsr0 done Splat looks like: [ 120.954938][ C0] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1]I [ 120.957761][ C0] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] [ 120.959064][ C0] CPU: 0 PID: 1511 Comm: hping3 Not tainted 5.6.0-rc5+ #460 [ 120.960054][ C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 120.962261][ C0] RIP: 0010:hsr_addr_is_self+0x65/0x2a0 [hsr] [ 120.963149][ C0] Code: 44 24 18 70 73 2f c0 48 c1 eb 03 48 8d 04 13 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2 f2 4 [ 120.966277][ C0] RSP: 0018:ffff8880d9c09af0 EFLAGS: 00010206 [ 120.967293][ C0] RAX: 0000000000000006 RBX: 1ffff1101b38135f RCX: 0000000000000000 [ 120.968516][ C0] RDX: dffffc0000000000 RSI: ffff8880d17cb208 RDI: 0000000000000000 [ 120.969718][ C0] RBP: 0000000000000030 R08: ffffed101b3c0e3c R09: 0000000000000001 [ 120.972203][ C0] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 0000000000000000 [ 120.973379][ C0] R13: ffff8880aaf80100 R14: ffff8880aaf800f2 R15: ffff8880aaf80040 [ 120.974410][ C0] FS: 00007f58e693f740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000 [ 120.979794][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 120.980773][ C0] CR2: 00007ffcb8b38f29 CR3: 00000000afe8e001 CR4: 00000000000606f0 [ 120.981945][ C0] Call Trace: [ 120.982411][ C0] <IRQ> [ 120.982848][ C0] ? hsr_add_node+0x8c0/0x8c0 [hsr] [ 120.983522][ C0] ? rcu_read_lock_held+0x90/0xa0 [ 120.984159][ C0] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 120.984944][ C0] hsr_handle_frame+0x1db/0x4e0 [hsr] [ 120.985597][ C0] ? hsr_nl_nodedown+0x2b0/0x2b0 [hsr] [ 120.986289][ C0] __netif_receive_skb_core+0x6bf/0x3170 [ 120.992513][ C0] ? check_chain_key+0x236/0x5d0 [ 120.993223][ C0] ? do_xdp_generic+0x1460/0x1460 [ 120.993875][ C0] ? register_lock_class+0x14d0/0x14d0 [ 120.994609][ C0] ? __netif_receive_skb_one_core+0x8d/0x160 [ 120.995377][ C0] __netif_receive_skb_one_core+0x8d/0x160 [ 120.996204][ C0] ? __netif_receive_skb_core+0x3170/0x3170 [ ... ] Reported-by: syzbot+fcf5dd39282ceb27108d@syzkaller.appspotmail.com Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21Merge branch 'hinic-BugFixes'David S. Miller
Luo bin says: ==================== hinic: BugFixes Fix a number of bugs which have been present since the first commit. The bugs fixed in these patchs are hardly exposed unless given very specific conditions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hinic: fix wrong value of MIN_SKB_LENLuo bin
the minimum value of skb len that hw supports is 32 rather than 17 Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hinic: fix wrong para of wait_for_completion_timeoutLuo bin
the second input parameter of wait_for_completion_timeout should be jiffies instead of millisecond Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hinic: fix out-of-order excution in arm cpuLuo bin
add read barrier in driver code to keep from reading other fileds in dma memory which is writable for hw until we have verified the memory is valid for driver Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hinic: fix the bug of clearing event queueLuo bin
should disable eq irq before freeing it, must clear event queue depth in hw before freeing relevant memory to avoid illegal memory access and update consumer idx to avoid invalid interrupt Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-21hinic: fix a bug of waitting for IO stoppedLuo bin
it's unreliable for fw to check whether IO is stopped, so driver wait for enough time to ensure IO process is done in hw before freeing resources Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20tcp: also NULL skb->dev when copy was neededFlorian Westphal
In rare cases retransmit logic will make a full skb copy, which will not trigger the zeroing added in recent change b738a185beaa ("tcp: ensure skb->dev is NULL before leaving TCP stack"). Cc: Eric Dumazet <edumazet@google.com> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Fixes: 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Refetch IP header pointer after pskb_may_pull() in flowtable, from Haishuang Yan. 2) Fix memleak in flowtable offload in nf_flow_table_free(), from Paul Blakey. 3) Set control.addr_type mask in flowtable offload, from Edward Cree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19tcp: ensure skb->dev is NULL before leaving TCP stackEric Dumazet
skb->rbnode is sharing three skb fields : next, prev, dev When a packet is sent, TCP keeps the original skb (master) in a rtx queue, which was converted to rbtree a while back. __tcp_transmit_skb() is responsible to clone the master skb, and add the TCP header to the clone before sending it to network layer. skb_clone() already clears skb->next and skb->prev, but copies the master oskb->dev into the clone. We need to clear skb->dev, otherwise lower layers could interpret the value as a pointer to a netdev. This old bug surfaced recently when commit 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING") was merged. Before this netfilter commit, skb->dev value was ignored and changed before reaching dev_queue_xmit() Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Fixes: 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Martin Zaharinov <micron10@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19cxgb4: fix Txq restart check during backpressureRahul Lakkireddy
Driver reclaims descriptors in much smaller batches, even if hardware indicates more to reclaim, during backpressure. So, fix the check to restart the Txq during backpressure, by looking at how many descriptors hardware had indicated to reclaim, and not on how many descriptors that driver had actually reclaimed. Once the Txq is restarted, driver will reclaim even more descriptors when Tx path is entered again. Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19cxgb4: fix throughput drop during Tx backpressureRahul Lakkireddy
commit 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page") reverted back to getting Tx CIDX updates via DMA, instead of interrupts, introduced by commit d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer") However, it missed reverting back several code changes where Tx CIDX updates are not explicitly requested during backpressure when using interrupt mode. These missed changes cause slow recovery during backpressure because the corresponding interrupt no longer comes and hence results in Tx throughput drop. So, revert back these missed code changes, as well, which will allow explicitly requesting Tx CIDX updates when backpressure happens. This enables the corresponding interrupt with Tx CIDX update message to get generated and hence speed up recovery and restore back throughput. Fixes: 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page") Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19net: dsa: mt7530: Change the LINK bit to reflect the link statusRené van Dorst
Andrew reported: After a number of network port link up/down changes, sometimes the switch port gets stuck in a state where it thinks it is still transmitting packets but the cpu port is not actually transmitting anymore. In this state you will see a message on the console "mtk_soc_eth 1e100000.ethernet eth0: transmit timed out" and the Tx counter in ifconfig will be incrementing on virtual port, but not incrementing on cpu port. The issue is that MAC TX/RX status has no impact on the link status or queue manager of the switch. So the queue manager just queues up packets of a disabled port and sends out pause frames when the queue is full. Change the LINK bit to reflect the link status. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Reported-by: Andrew Smith <andrew.smith@digi.com> Signed-off-by: René van Dorst <opensource@vdorst.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19Merge tag 'rxrpc-fixes-20200319' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc, afs: Interruptibility fixes Here are a number of fixes for AF_RXRPC and AFS that make AFS system calls less interruptible and so less likely to leave the filesystem in an uncertain state. There's also a miscellaneous patch to make tracing consistent. (1) Firstly, abstract out the Tx space calculation in sendmsg. Much the same code is replicated in a number of places that subsequent patches are going to alter, including adding another copy. (2) Fix Tx interruptibility by allowing a kernel service, such as AFS, to request that a call be interruptible only when waiting for a call slot to become available (ie. the call has not taken place yet) or that a call be not interruptible at all (e.g. when we want to do writeback and don't want a signal interrupting a VM-induced writeback). (3) Increase the minimum delay on MSG_WAITALL for userspace sendmsg() when waiting for Tx buffer space as a 2*RTT delay is really small over 10G ethernet and a 1 jiffy timeout might be essentially 0 if at the end of the jiffy period. (4) Fix some tracing output in AFS to make it consistent with rxrpc. (5) Make sure aborted asynchronous AFS operations are tidied up properly so we don't end up with stuck rxrpc calls. (6) Make AFS client calls uninterruptible in the Rx phase. If we don't wait for the reply to be fully gathered, we can't update the local VFS state and we end up in an indeterminate state with respect to the server. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19mlxsw: pci: Only issue reset when system is readyIdo Schimmel
During initialization the driver issues a software reset command and then waits for the system status to change back to "ready" state. However, before issuing the reset command the driver does not check that the system is actually in "ready" state. On Spectrum-{1,2} systems this was always the case as the hardware initialization time is very short. On Spectrum-3 systems this is no longer the case. This results in the software reset command timing-out and the driver failing to load: [ 6.347591] mlxsw_spectrum3 0000:06:00.0: Cmd exec timed-out (opcode=40(ACCESS_REG),opcode_mod=0,in_mod=0) [ 6.358382] mlxsw_spectrum3 0000:06:00.0: Reg cmd access failed (reg_id=9023(mrsr),type=write) [ 6.368028] mlxsw_spectrum3 0000:06:00.0: cannot register bus device [ 6.375274] mlxsw_spectrum3: probe of 0000:06:00.0 failed with error -110 Fix this by waiting for the system to become ready both before issuing the reset command and afterwards. In case of failure, print the last system status to aid in debugging. Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19netfilter: flowtable: populate addr_type maskEdward Cree
nf_flow_rule_match() sets control.addr_type in key, so needs to also set the corresponding mask. An exact match is wanted, so mask is all ones. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: flowtable: Fix flushing of offloaded flows on freePaul Blakey
Freeing a flowtable with offloaded flows, the flow are deleted from hardware but are not deleted from the flow table, leaking them, and leaving their offload bit on. Add a second pass of the disabled gc to delete the these flows from the flow table before freeing it. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6}Haishuang Yan
Since pskb_may_pull may change skb->data, so we need to reload ip{v6}h at the right place. Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table") Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19netfilter: flowtable: reload ip{v6}h in nf_flow_nat_ip{v6}Haishuang Yan
Since nf_flow_snat_port and nf_flow_snat_ip{v6} call pskb_may_pull() which may change skb->data, so we need to reload ip{v6}h at the right place. Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table") Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-18Merge branch 'wireguard-fixes'David S. Miller
Jason A. Donenfeld says: ==================== wireguard fixes for 5.6-rc7 I originally intended to spend this cycle working on fun optimizations and architecture for WireGuard for 5.7, but I've been a bit neurotic about having 5.6 ship without any show stopper bugs. WireGuard has been stable for a long time now, but that doesn't make me any less nervous about the real deal in 5.6. To that end, I've been doing code reviews and having discussions, and we also had a security firm audit the code. That audit didn't turn up any vulnerabilities, but they did make a good defense-in-depth suggestion. This series contains: 1) Removal of a duplicated header, from YueHaibing. 2) Testing with 64-bit time in our test suite. 3) Account for skb->protocol==0 due to AF_PACKET sockets, suggested by Florian Fainelli. 4) Clean up some code in an unreachable switch/case branch, suggested by Florian Fainelli. 5) Better handling of low-order points, discussed with Mathias Hall-Andersen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18wireguard: noise: error out precomputed DH during handshake rather than configJason A. Donenfeld
We precompute the static-static ECDH during configuration time, in order to save an expensive computation later when receiving network packets. However, not all ECDH computations yield a contributory result. Prior, we were just not letting those peers be added to the interface. However, this creates a strange inconsistency, since it was still possible to add other weird points, like a valid public key plus a low-order point, and, like points that result in zeros, a handshake would not complete. In order to make the behavior more uniform and less surprising, simply allow all peers to be added. Then, we'll error out later when doing the crypto if there's an issue. This also adds more separation between the crypto layer and the configuration layer. Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18wireguard: receive: remove dead code from default packet type caseJason A. Donenfeld
The situation in which we wind up hitting the default case here indicates a major bug in earlier parsing code. It is not a usual thing that should ever happen, which means a "friendly" message for it doesn't make sense. Rather, replace this with a WARN_ON, just like we do earlier in the file for a similar situation, so that somebody sends us a bug report and we can fix it. Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18wireguard: queueing: account for skb->protocol==0Jason A. Donenfeld
We carry out checks to the effect of: if (skb->protocol != wg_examine_packet_protocol(skb)) goto err; By having wg_skb_examine_untrusted_ip_hdr return 0 on failure, this means that the check above still passes in the case where skb->protocol is zero, which is possible to hit with AF_PACKET: struct sockaddr_pkt saddr = { .spkt_device = "wg0" }; unsigned char buffer[5] = { 0 }; sendto(socket(AF_PACKET, SOCK_PACKET, /* skb->protocol = */ 0), buffer, sizeof(buffer), 0, (const struct sockaddr *)&saddr, sizeof(saddr)); Additional checks mean that this isn't actually a problem in the code base, but I could imagine it becoming a problem later if the function is used more liberally. I would prefer to fix this by having wg_examine_packet_protocol return a 32-bit ~0 value on failure, which will never match any value of skb->protocol, which would simply change the generated code from a mov to a movzx. However, sparse complains, and adding __force casts doesn't seem like a good idea, so instead we just add a simple helper function to check for the zero return value. Since wg_examine_packet_protocol itself gets inlined, this winds up not adding an additional branch to the generated code, since the 0 return value already happens in a mergable branch. Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18wireguard: selftests: test using new 64-bit time_tJason A. Donenfeld
In case this helps expose bugs with the newer 64-bit time_t types, we do our testing with the newer musl that supports this as well as CONFIG_COMPAT_32BIT_TIME=n. This matters to us, since wireguard does in fact deal with timestamps. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18wireguard: selftests: remove duplicated include <sys/types.h>YueHaibing
This commit removes a duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18vxlan: check return value of gro_cells_init()Taehee Yoo
gro_cells_init() returns error if memory allocation is failed. But the vxlan module doesn't check the return value of gro_cells_init(). Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")` Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18net/sched: act_ct: Fix leak of ct zone template on replacePaul Blakey
Currently, on replace, the previous action instance params is swapped with a newly allocated params. The old params is only freed (via kfree_rcu), without releasing the allocated ct zone template related to it. Call tcf_ct_params_free (via call_rcu) for the old params, so it will release it. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: core: dev.c: fix a documentation warningMauro Carvalho Chehab
There's a markup for link with is "foo_". On this kernel-doc comment, we don't want this, but instead, place a literal reference. So, escape the literal with ``foo``, in order to avoid this warning: ./net/core/dev.c:5195: WARNING: Unknown target name: "page_is". Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: phy: sfp-bus.c: get rid of docs warningsMauro Carvalho Chehab
The indentation for the returned values are weird, causing those warnings: ./drivers/net/phy/sfp-bus.c:579: WARNING: Unexpected indentation. ./drivers/net/phy/sfp-bus.c:619: WARNING: Unexpected indentation. Use a list and change the identation for it to be properly parsed by the documentation toolchain. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17Merge branch 'ENA-driver-bug-fixes'David S. Miller
Arthur Kiyanovski says: ==================== ENA driver bug fixes ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: ena: fix continuous keep-alive resetsArthur Kiyanovski
last_keep_alive_jiffies is updated in probe and when a keep-alive event is received. In case the driver times-out on a keep-alive event, it has high chances of continuously timing-out on keep-alive events. This is because when the driver recovers from the keep-alive-timeout reset the value of last_keep_alive_jiffies is very old, and if a keep-alive event is not received before the next timer expires, the value of last_keep_alive_jiffies will cause another keep-alive-timeout reset and so forth in a loop. Solution: Update last_keep_alive_jiffies whenever the device is restored after reset. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Noam Dagan <ndagan@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: ena: avoid memory access violation by validating req_id properlyArthur Kiyanovski
Rx req_id is an index in struct ena_eth_io_rx_cdesc_base. The driver should validate that the Rx req_id it received from the device is in range [0, ring_size -1]. Failure to do so could yield to potential memory access violoation. The validation was mistakenly done when refilling the Rx submission queue and not in Rx completion queue. Fixes: ad974baef2a1 ("net: ena: add support for out of order rx buffers refill") Signed-off-by: Noam Dagan <ndagan@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: ena: fix request of incorrect number of IRQ vectorsArthur Kiyanovski
Bug: In short the main issue is caused by the fact that the number of queues is changed using ethtool after ena_probe() has been called and before ena_up() was executed. Here is the full scenario in detail: * ena_probe() is called when the driver is loaded, the driver is not up yet at the end of ena_probe(). * The number of queues is changed -> io_queue_count is changed as well - ena_up() is not called since the "dev_was_up" boolean in ena_update_queue_count() is false. * ena_up() is called by the kernel (it's called asynchronously some time after ena_probe()). ena_setup_io_intr() is called by ena_up() and it uses io_queue_count to get the suitable irq lines for each msix vector. The function ena_request_io_irq() is called right after that and it uses msix_vecs - This value only changes during ena_probe() and ena_restore() - to request the irq vectors. This results in "Failed to request I/O IRQ" error for i > io_queue_count. Numeric example: * After ena_probe() io_queue_count = 8, msix_vecs = 9. * The number of queues changes to 4 -> io_queue_count = 4, msix_vecs = 9. * ena_up() is executed for the first time: ** ena_setup_io_intr() inits the vectors only up to io_queue_count. ** ena_request_io_irq() calls request_irq() and fails for i = 5. How to reproduce: simply run the following commands: sudo rmmod ena && sudo insmod ena.ko; sudo ethtool -L eth1 combined 3; Fix: Use ENA_MAX_MSIX_VEC(adapter->num_io_queues + adapter->xdp_num_queues) instead of adapter->msix_vecs. We need to take XDP queues into consideration as they need to have msix vectors assigned to them as well. Note that the XDP cannot be attached before the driver is up and running but in XDP mode the issue might occur when the number of queues changes right after a reset trigger. The ENA_MAX_MSIX_VEC simply adds one to the argument since the first msix vector is reserved for management queue. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: ena: fix incorrect setting of the number of msix vectorsArthur Kiyanovski
Overview: We don't frequently change the msix vectors throughout the life cycle of the driver. We do so in two functions: ena_probe() and ena_restore(). ena_probe() is only called when the driver is loaded. ena_restore() on the other hand is called during device reset / resume operations. We use num_io_queues for calculating and allocating the number of msix vectors. At ena_probe() this value is equal to max_num_io_queues and thus this is not an issue, however ena_restore() might be called after the number of io queues has changed. A possible bug scenario is as follows: * Change number of queues from 8 to 4. (num_io_queues = 4, max_num_io_queues = 8, msix_vecs = 9,) * Trigger reset occurs -> ena_restore is called. (num_io_queues = 4, max_num_io_queues =8 , msix_vecs = 5) * Change number of queues from 4 to 6. (num_io_queues = 6, max_num_io_queues = 8, msix_vecs = 5) * The driver will reset due to failure of check_for_rx_interrupt_queue() Fix: This can be easily fixed by always using max_num_io_queues to init the msix_vecs, since this number won't change as opposed to num_io_queues. Fixes: 4d19266022ec ("net: ena: multiple queue creation related cleanups") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: phy: mdio-mux-bcm-iproc: check clk_prepare_enable() return valueRayagonda Kokatanur
Check clk_prepare_enable() return value. Fixes: 2c7230446bc9 ("net: phy: Add pm support to Broadcom iProc mdio mux driver") Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17Merge branch 'net-bcmgenet-revisit-MAC-reset'David S. Miller
Doug Berger says: ==================== net: bcmgenet: revisit MAC reset Commit 3a55402c9387 ("net: bcmgenet: use RGMII loopback for MAC reset") was intended to resolve issues with reseting the UniMAC core within the GENET block by providing better control over the clocks used by the UniMAC core. Unfortunately, it is not compatible with all of the supported system configurations so an alternative method must be applied. This commit set provides such an alternative. The first commit reverts the previous change and the second commit provides the alternative reset sequence that addresses the concerns observed with the previous implementation. This replacement implementation should be applied to the stable branches wherever commit 3a55402c9387 ("net: bcmgenet: use RGMII loopback for MAC reset") has been applied. Unfortunately, reverting that commit may conflict with some restructuring changes introduced by commit 4f8d81b77e66 ("net: bcmgenet: Refactor register access in bcmgenet_mii_config"). The first commit in this set has been manually edited to resolve the conflict on net/master. I would be happy to help stable maintainers with resolving any such conflicts if they occur. However, I do not expect that commit to have been backported to stable branch so hopefully the revert can be applied cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: bcmgenet: keep MAC in reset until PHY is upDoug Berger
As noted in commit 28c2d1a7a0bf ("net: bcmgenet: enable loopback during UniMAC sw_reset") the UniMAC must be clocked at least 5 cycles while the sw_reset is asserted to ensure a clean reset. That commit enabled local loopback to provide an Rx clock from the GENET sourced Tx clk. However, when connected in MII mode the Tx clk is sourced by the PHY so if an EPHY is not supplying clocks (e.g. when the link is down) the UniMAC does not receive the necessary clocks. This commit extends the sw_reset window until the PHY reports that the link is up thereby ensuring that the clocks are being provided to the MAC to produce a clean reset. One consequence is that if the system attempts to enter a Wake on LAN suspend state when the PHY link has not been active the MAC may not have had a chance to initialize cleanly. In this case, we remove the sw_reset and enable the WoL reception path as normal with the hope that the PHY will provide the necessary clocks to drive the WoL blocks if the link becomes active after the system has entered suspend. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17Revert "net: bcmgenet: use RGMII loopback for MAC reset"Doug Berger
This reverts commit 3a55402c93877d291b0a612d25edb03d1b4b93ac. This is not a good solution when connecting to an external switch that may not support the isolation of the TXC signal resulting in output driver contention on the pin. A different solution is necessary. Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17Merge branch 'net-mvmdio-avoid-error-message-for-optional-IRQ'David S. Miller
Chris Packham says: ==================== net: mvmdio: avoid error message for optional IRQ I've gone ahead an sent a revert. This is the same as the original v1 except I've added Andrew's review to the commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17net: mvmdio: avoid error message for optional IRQChris Packham
Per the dt-binding the interrupt is optional so use platform_get_irq_optional() instead of platform_get_irq(). Since commit 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()") platform_get_irq() produces an error message orion-mdio f1072004.mdio: IRQ index 0 not found which is perfectly normal if one hasn't specified the optional property in the device tree. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17Revert "net: mvmdio: avoid error message for optional IRQ"Chris Packham
This reverts commit e1f550dc44a4d535da4e25ada1b0eaf8f3417929. platform_get_irq_optional() will still return -ENXIO when no interrupt is provided so the additional error handling caused the driver prone to fail when no interrupt was specified. Revert the change so we can apply the correct minimal fix. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>