summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-10-22net: sched: taprio: fix -Wmissing-prototypes warningsYi Wang
We get one warnings when build kernel W=1: net/sched/sch_taprio.c:1155:6: warning: no previous prototype for ‘taprio_offload_config_changed’ [-Wmissing-prototypes] Make the function static to fix this. Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading") Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: remove dsa_switch_alloc helperVivien Didelot
Now that ports are dynamically listed in the fabric, there is no need to provide a special helper to allocate the dsa_switch structure. This will give more flexibility to drivers to embed this structure as they wish in their private structure. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: allocate ports on touchVivien Didelot
Allocate the struct dsa_port the first time it is accessed with dsa_port_touch, and remove the static dsa_port array from the dsa_switch structure. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to setup default CPU portVivien Didelot
Use the new ports list instead of iterating over switches and their ports when setting up the default CPU port. Unassign it on teardown. Now that we can iterate over multiple CPU ports, remove dst->cpu_dp. At the same time, provide a better error message for CPU-less tree. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to find first CPU portVivien Didelot
Use the new ports list instead of iterating over switches and their ports when looking up the first CPU port in the tree. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to setup multiple master devicesVivien Didelot
Now that we have a potential list of CPU ports, make use of it instead of only configuring the master device of an unique CPU port. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to find a port by nodeVivien Didelot
Use the new ports list instead of iterating over switches and their ports to find a port from a given node. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list for routing table setupVivien Didelot
Use the new ports list instead of accessing the dsa_switch array of ports when iterating over DSA ports of a switch to set up the routing table. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to setup switchesVivien Didelot
Use the new ports list instead of iterating over switches and their ports when setting up the switches and their ports. At the same time, provide setup states and messages for ports and switches as it is done for the trees. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use ports list to find slaveVivien Didelot
Use the new ports list instead of iterating over switches and their ports when looking for a slave device from a given master interface. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: add ports list in the switch fabricVivien Didelot
Add a list of switch ports within the switch fabric. This will help the lookup of a port inside the whole fabric, and it is the first step towards supporting multiple CPU ports, before deprecating the usage of the unique dst->cpu_dp pointer. In preparation for a future allocation of the dsa_port structures, return -ENOMEM in case no structure is returned, even though this error cannot be reached yet. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net: dsa: use dsa_to_port helper everywhereVivien Didelot
Do not let the drivers access the ds->ports static array directly while there is a dsa_to_port helper for this purpose. At the same time, un-const this helper since the SJA1105 driver assigns the priv member of the returned dsa_port structure. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: remove close abort workerUrsula Braun
With the introduction of the link group termination worker there is no longer a need to postpone smc_close_active_abort() to a worker. To protect socket destruction due to normal and abnormal socket closing, the socket refcount is increased. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: introduce link group termination workerUrsula Braun
Use a worker for link group termination to guarantee process context. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: improve abnormal termination of link groupsUrsula Braun
If a link group and its connections must be terminated, * wake up socket waiters * do not enable buffer reuse A linkgroup might be terminated while normal connection closing is running. Avoid buffer reuse and its related LLC DELETE RKEY call, if linkgroup termination has started. And use the earliest indication of linkgroup termination possible, namely the removal from the linkgroup list. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: tell peers about abnormal link group terminationUrsula Braun
There are lots of link group termination scenarios. Most of them still allow to inform the peer of the terminating sockets about aborting. This patch tries to call smc_close_abort() for terminating sockets. And the internal TCP socket is reset with tcp_abort(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: improve link group freeingUrsula Braun
Usually link groups are freed delayed to enable quick connection creation for a follow-on SMC socket. Terminated link groups are freed faster. This patch makes sure, fast schedule of link group freeing is not rescheduled by a delayed schedule. And it makes sure link group freeing is not rescheduled, if the real freeing is already running. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: improve abnormal termination lockingUrsula Braun
Locking hierarchy requires that the link group conns_lock can be taken if the socket lock is held, but not vice versa. Nevertheless socket termination during abnormal link group termination should be protected by the socket lock. This patch reduces the time segments the link group conns_lock is held to enable usage of lock_sock in smc_lgr_terminate(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: terminate link group without holding lgr lockUrsula Braun
When a link group is to be terminated, it is sufficient to hold the lgr lock when unlinking the link group from its list. Move the lock-protected link group unlinking into smc_lgr_terminate(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22net/smc: cancel send and receive for terminated socketUrsula Braun
The resources for a terminated socket are being cleaned up. This patch makes sure * no more data is received for an actively terminated socket * no more data is sent for an actively or passively terminated socket Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-21net/sched: act_police: re-use tcf_tm_dump()Davide Caratti
Use tcf_tm_dump(), instead of an open coded variant (no functional change in this patch). Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21ipv4: fix IPSKB_FRAG_PMTU handling with fragmentationEric Dumazet
This patch removes the iph field from the state structure, which is not properly initialized. Instead, add a new field to make the "do we want to set DF" be the state bit and move the code to set the DF flag from ip_frag_next(). Joint work with Pablo and Linus. Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators") Reported-by: Patrick Schönthaler <patrick@notvads.ovh> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Several cases of overlapping changes which were for the most part trivially resolvable. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "I was battling a cold after some recent trips, so quite a bit piled up meanwhile, sorry about that. Highlights: 1) Fix fd leak in various bpf selftests, from Brian Vazquez. 2) Fix crash in xsk when device doesn't support some methods, from Magnus Karlsson. 3) Fix various leaks and use-after-free in rxrpc, from David Howells. 4) Fix several SKB leaks due to confusion of who owns an SKB and who should release it in the llc code. From Eric Biggers. 5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet. 6) Jumbo packets don't work after resume on r8169, as the BIOS resets the chip into non-jumbo mode during suspend. From Heiner Kallweit. 7) Corrupt L2 header during MPLS push, from Davide Caratti. 8) Prevent possible infinite loop in tc_ctl_action, from Eric Dumazet. 9) Get register bits right in bcmgenet driver, based upon chip version. From Florian Fainelli. 10) Fix mutex problems in microchip DSA driver, from Marek Vasut. 11) Cure race between route lookup and invalidation in ipv4, from Wei Wang. 12) Fix performance regression due to false sharing in 'net' structure, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits) net: reorder 'struct net' fields to avoid false sharing net: dsa: fix switch tree list net: ethernet: dwmac-sun8i: show message only when switching to promisc net: aquantia: add an error handling in aq_nic_set_multicast_list net: netem: correct the parent's backlog when corrupted packet was dropped net: netem: fix error path for corrupted GSO frames macb: propagate errors when getting optional clocks xen/netback: fix error path of xenvif_connect_data() net: hns3: fix mis-counting IRQ vector numbers issue net: usb: lan78xx: Connect PHY before registering MAC vsock/virtio: discard packets if credit is not respected vsock/virtio: send a credit update when buffer size is changed mlxsw: spectrum_trap: Push Ethernet header before reporting trap net: ensure correct skb->tstamp in various fragmenters net: bcmgenet: reset 40nm EPHY on energy detect net: bcmgenet: soft reset 40nm EPHYs before MAC init net: phy: bcm7xxx: define soft_reset for 40nm EPHY net: bcmgenet: don't set phydev->link from MAC net: Update address for MediaTek ethernet driver in MAINTAINERS ipv4: fix race condition between route lookup and invalidation ...
2019-10-19net: dsa: fix switch tree listVivien Didelot
If there are multiple switch trees on the device, only the last one will be listed, because the arguments of list_add_tail are swapped. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: netem: correct the parent's backlog when corrupted packet was droppedJakub Kicinski
If packet corruption failed we jump to finish_segs and return NET_XMIT_SUCCESS. Seeing success will make the parent qdisc increment its backlog, that's incorrect - we need to return NET_XMIT_DROP. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: netem: fix error path for corrupted GSO framesJakub Kicinski
To corrupt a GSO frame we first perform segmentation. We then proceed using the first segment instead of the full GSO skb and requeue the rest of the segments as separate packets. If there are any issues with processing the first segment we still want to process the rest, therefore we jump to the finish_segs label. Commit 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") started using the pointer to the first segment in the "rest of segments processing", but as mentioned above the first segment may had already been freed at this point. Backlog corrections for parent qdiscs have to be adjusted. Fixes: 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-18vsock/virtio: discard packets if credit is not respectedStefano Garzarella
If the remote peer doesn't respect the credit information (buf_alloc, fwd_cnt), sending more data than it can send, we should drop the packets to prevent a malicious peer from using all of our memory. This is patch follows the VIRTIO spec: "VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has sufficient free buffer space for the payload" Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-18vsock/virtio: send a credit update when buffer size is changedStefano Garzarella
When the user application set a new buffer size value, we should update the remote peer about this change, since it uses this information to calculate the credit available. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-18net: ensure correct skb->tstamp in various fragmentersEric Dumazet
Thomas found that some forwarded packets would be stuck in FQ packet scheduler because their skb->tstamp contained timestamps far in the future. We thought we addressed this point in commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") but there is still an issue when/if a packet needs to be fragmented. In order to meet EDT requirements, we have to make sure all fragments get the original skb->tstamp. Note that this original skb->tstamp should be zero in forwarding path, but might have a non zero value in output path if user decided so. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thomas Bartschies <Thomas.Bartschies@cvk.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17ipv4: fix race condition between route lookup and invalidationWei Wang
Jesse and Ido reported the following race condition: <CPU A, t0> - Received packet A is forwarded and cached dst entry is taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set() <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables from multiple ISPs"), route is added / deleted and rt_cache_flush() is called <CPU B, t2> - Received packet B tries to use the same cached dst entry from t0, but rt_cache_valid() is no longer true and it is replaced in rt_cache_route() by the newer one. This calls dst_dev_put() on the original dst entry which assigns the blackhole netdev to 'dst->dev' <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due to 'dst->dev' being the blackhole netdev There are 2 issues in the v4 routing code: 1. A per-netns counter is used to do the validation of the route. That means whenever a route is changed in the netns, users of all routes in the netns needs to redo lookup. v6 has an implementation of only updating fn_sernum for routes that are affected. 2. When rt_cache_valid() returns false, rt_cache_route() is called to throw away the current cache, and create a new one. This seems unnecessary because as long as this route does not change, the route cache does not need to be recreated. To fully solve the above 2 issues, it probably needs quite some code changes and requires careful testing, and does not suite for net branch. So this patch only tries to add the deleted cached rt into the uncached list, so user could still be able to use it to receive packets until it's done. Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly") Signed-off-by: Wei Wang <weiwan@google.com> Reported-by: Ido Schimmel <idosch@idosch.org> Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org> Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17ipv4: Return -ENETUNREACH if we can't create route but saddr is validStefano Brivio
...instead of -EINVAL. An issue was found with older kernel versions while unplugging a NFS client with pending RPCs, and the wrong error code here prevented it from recovering once link is back up with a configured address. Incidentally, this is not an issue anymore since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context"), included in 5.2-rc7, had the effect of decoupling the forwarding of this error by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin Coddington. To the best of my knowledge, this isn't currently causing any further issue, but the error code doesn't look appropriate anyway, and we might hit this in other paths as well. In detail, as analysed by Gonzalo Siero, once the route is deleted because the interface is down, and can't be resolved and we return -EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(), as the socket error seen by tcp_write_err(), called by tcp_retransmit_timer(). In turn, tcp_write_err() indirectly calls xs_error_report(), which wakes up the RPC pending tasks with a status of -EINVAL. This is then seen by call_status() in the SUN RPC implementation, which aborts the RPC call calling rpc_exit(), instead of handling this as a potentially temporary condition, i.e. as a timeout. Return -EINVAL only if the input parameters passed to ip_route_output_key_hash_rcu() are actually invalid (this is the case if the specified source address is multicast, limited broadcast or all zeroes), but return -ENETUNREACH in all cases where, at the given moment, the given source address doesn't allow resolving the route. While at it, drop the initialisation of err to -ENETUNREACH, which was added to __ip_route_output_key() back then by commit 0315e3827048 ("net: Fix behaviour of unreachable, blackhole and prohibit routes"), but actually had no effect, as it was, and is, overwritten by the fib_lookup() return code assignment, and anyway ignored in all other branches, including the if (fl4->saddr) one: I find this rather confusing, as it would look like -ENETUNREACH is the "default" error, while that statement has no effect. Also note that after commit fc75fc8339e7 ("ipv4: dont create routes on down devices"), we would get -ENETUNREACH if the device is down, but -EINVAL if the source address is specified and we can't resolve the route, and this appears to be rather inconsistent. Reported-by: Stefan Walter <walteste@inf.ethz.ch> Analysed-by: Benjamin Coddington <bcodding@redhat.com> Analysed-by: Gonzalo Siero <gsierohu@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17net: sched: Avoid using yield() in a busy waiting loopMarc Kleine-Budde
With threaded interrupts enabled, the interrupt thread runs as SCHED_RR with priority 50. If a user application with a higher priority preempts the interrupt thread and tries to shutdown the network interface then it will loop forever. The kernel will spin in the loop waiting for the device to become idle and the scheduler will never consider the interrupt thread because its priority is lower. Avoid the problem by sleeping for a jiffy giving other tasks, including the interrupt thread, a chance to run and make progress. In the original thread it has been suggested to use wait_event() and properly waiting for the state to occur. DaveM explained that this would require to add expensive checks in the fast paths of packet processing. Link: https://lkml.kernel.org/r/1393976987-23555-1-git-send-email-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> [bigeasy: Rewrite commit message, add comment, use schedule_timeout_uninterruptible()] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17net/rds: Remove unnecessary null checkYueHaibing
Null check before dma_pool_destroy is redundant, so remove it. This is detected by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17pktgen: remove unnecessary assignment in pktgen_xmit()Yunsheng Lin
variable ret is not used after jumping to "unlock" label, so the assignment is redundant. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17bpf: Check types of arguments passed into helpersAlexei Starovoitov
Introduce new helper that reuses existing skb perf_event output implementation, but can be called from raw_tracepoint programs that receive 'struct sk_buff *' as tracepoint argument or can walk other kernel data structures to skb pointer. In order to do that teach verifier to resolve true C types of bpf helpers into in-kernel BTF ids. The type of kernel pointer passed by raw tracepoint into bpf program will be tracked by the verifier all the way until it's passed into helper function. For example: kfree_skb() kernel function calls trace_kfree_skb(skb, loc); bpf programs receives that skb pointer and may eventually pass it into bpf_skb_output() bpf helper which in-kernel is implemented via bpf_skb_event_output() kernel function. Its first argument in the kernel is 'struct sk_buff *'. The verifier makes sure that types match all the way. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
2019-10-17netfilter: nft_tproxy: Fix typo in IPv6 module description.Norman Rasmussen
Signed-off-by: Norman Rasmussen <norman@rasmussen.co.za> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17netfilter: add and use nf_hook_slow_list()Florian Westphal
At this time, NF_HOOK_LIST() macro will iterate the list and then calls nf_hook() for each individual skb. This makes it so the entire list is passed into the netfilter core. The advantage is that we only need to fetch the rule blob once per list instead of per-skb. NF_HOOK_LIST now only works for ipv4 and ipv6, as those are the only callers. v2: use skb_list_del_init() instead of list_del (Edward Cree) Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17netfilter: conntrack: free extension area immediatelyFlorian Westphal
Instead of waiting for rcu grace period just free it directly. This is safe because conntrack lookup doesn't consider extensions. Other accesses happen while ct->ext can't be free'd, either because a ct refcount was taken or because the conntrack hash bucket lock or the dying list spinlock have been taken. This allows to remove __krealloc in a followup patch, netfilter was the only user. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracksFlorian Westphal
When dumping the unconfirmed lists, the cpu that is processing the ct entry can reallocate ct->ext at any time. Right now accessing the extensions from another CPU is ok provided we're holding rcu read lock: extension reallocation does use rcu. Once RCU isn't used anymore this becomes unsafe, so skip extensions for the unconfirmed list. Dumping the extension area for confirmed or dying conntracks is fine: no reallocations are allowed and list iteration holds appropriate locks that prevent ct (and this ct->ext) from getting free'd. v2: fix compiler warnings due to misue of 'const' and missing return statement (kbuild robot). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17Merge tag 'ipvs-next-for-v5.5' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Pablo Neira Ayuso says: ==================== IPVS updates for v5.5 1) Two patches to speedup ipvs netns dismantle, from Haishuang Yan. 2) Three patches to add selftest script for ipvs, also from Haishuang Yan. 3) Simplify __ip_vs_get_out_rt() from zhang kai. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17netfilter: ecache: document extension area access rulesFlorian Westphal
Once ct->ext gets free'd via kfree() rather than kfree_rcu we can't access the extension area anymore without owning the conntrack. This is a special case: The worker is walking the pcpu dying list while holding dying list lock: Neither ct nor ct->ext can be free'd until after the walk has completed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17Bluetooth: hci_core: fix init for HCI_USER_CHANNELMattijs Korpershoek
During the setup() stage, HCI device drivers expect the chip to acknowledge its setup() completion via vendor specific frames. If userspace opens() such HCI device in HCI_USER_CHANNEL [1] mode, the vendor specific frames are never tranmitted to the driver, as they are filtered in hci_rx_work(). Allow HCI devices which operate in HCI_USER_CHANNEL mode to receive frames if the HCI device is is HCI_INIT state. [1] https://www.spinics.net/lists/linux-bluetooth/msg37345.html Fixes: 23500189d7e0 ("Bluetooth: Introduce new HCI socket channel for user operation") Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-16rxrpc: use rcu protection while reading sk->sk_user_dataEric Dumazet
We need to extend the rcu_read_lock() section in rxrpc_error_report() and use rcu_dereference_sk_user_data() instead of plain access to sk->sk_user_data to make sure all rules are respected. The compiler wont reload sk->sk_user_data at will, and RCU rules prevent memory beeing freed too soon. Fixes: f0308fb07080 ("rxrpc: Fix possible NULL pointer access in ICMP handling") Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-16Bluetooth: Workaround directed advertising bug in Broadcom controllersSzymon Janc
It appears that some Broadcom controllers (eg BCM20702A0) reject LE Set Advertising Parameters command if advertising intervals provided are not within range for undirected and low duty directed advertising. Workaround this bug by populating min and max intervals with 'valid' values. < HCI Command: LE Set Advertising Parameters (0x08|0x0006) plen 15 Min advertising interval: 0.000 msec (0x0000) Max advertising interval: 0.000 msec (0x0000) Type: Connectable directed - ADV_DIRECT_IND (high duty cycle) (0x01) Own address type: Public (0x00) Direct address type: Random (0x01) Direct address: E2:F0:7B:9F:DC:F4 (Static) Channel map: 37, 38, 39 (0x07) Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00) > HCI Event: Command Complete (0x0e) plen 4 LE Set Advertising Parameters (0x08|0x0006) ncmd 1 Status: Invalid HCI Command Parameters (0x12) Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Tested-by: Sören Beye <linux@hypfer.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-16Bluetooth: missed cpu_to_le16 conversion in hci_init4_reqBen Dooks (Codethink)
It looks like in hci_init4_req() the request is being initialised from cpu-endian data but the packet is specified to be little-endian. This causes an warning from sparse due to __le16 to u16 conversion. Fix this by using cpu_to_le16() on the two fields in the packet. net/bluetooth/hci_core.c:845:27: warning: incorrect type in assignment (different base types) net/bluetooth/hci_core.c:845:27: expected restricted __le16 [usertype] tx_len net/bluetooth/hci_core.c:845:27: got unsigned short [usertype] le_max_tx_len net/bluetooth/hci_core.c:846:28: warning: incorrect type in assignment (different base types) net/bluetooth/hci_core.c:846:28: expected restricted __le16 [usertype] tx_time net/bluetooth/hci_core.c:846:28: got unsigned short [usertype] le_max_tx_time Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-16Revert "blackhole_netdev: fix syzkaller reported issue"Mahesh Bandewar
This reverts commit b0818f80c8c1bc215bba276bd61c216014fab23b. Started seeing weird behavior after this patch especially in the IPv6 code path. Haven't root caused it, but since this was applied to net branch, taking a precautionary measure to revert it and look / analyze those failures Revert this now and I'll send a better fix after analysing / fixing the weirdness observed. CC: Eric Dumazet <edumazet@google.com> CC: Wei Wang <weiwan@google.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-16Bluetooth: remove set but not used variable 'smp'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: net/bluetooth/smp.c: In function 'smp_irk_matches': net/bluetooth/smp.c:505:18: warning: variable 'smp' set but not used [-Wunused-but-set-variable] net/bluetooth/smp.c: In function 'smp_generate_rpa': net/bluetooth/smp.c:526:18: warning: variable 'smp' set but not used [-Wunused-but-set-variable] It is not used since commit 28a220aac596 ("bluetooth: switch to AES library") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-10-15sctp: change sctp_prot .no_autobind with trueXin Long
syzbot reported a memory leak: BUG: memory leak, unreferenced object 0xffff888120b3d380 (size 64): backtrace: [...] slab_alloc mm/slab.c:3319 [inline] [...] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483 [...] sctp_bucket_create net/sctp/socket.c:8523 [inline] [...] sctp_get_port_local+0x189/0x5a0 net/sctp/socket.c:8270 [...] sctp_do_bind+0xcc/0x200 net/sctp/socket.c:402 [...] sctp_bindx_add+0x4b/0xd0 net/sctp/socket.c:497 [...] sctp_setsockopt_bindx+0x156/0x1b0 net/sctp/socket.c:1022 [...] sctp_setsockopt net/sctp/socket.c:4641 [inline] [...] sctp_setsockopt+0xaea/0x2dc0 net/sctp/socket.c:4611 [...] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3147 [...] __sys_setsockopt+0x10f/0x220 net/socket.c:2084 [...] __do_sys_setsockopt net/socket.c:2100 [inline] It was caused by when sending msgs without binding a port, in the path: inet_sendmsg() -> inet_send_prepare() -> inet_autobind() -> .get_port/sctp_get_port(), sp->bind_hash will be set while bp->port is not. Later when binding another port by sctp_setsockopt_bindx(), a new bucket will be created as bp->port is not set. sctp's autobind is supposed to call sctp_autobind() where it does all things including setting bp->port. Since sctp_autobind() is called in sctp_sendmsg() if the sk is not yet bound, it should have skipped the auto bind. THis patch is to avoid calling inet_autobind() in inet_send_prepare() by changing sctp_prot .no_autobind with true, also remove the unused .get_port. Reported-by: syzbot+d44f7bbebdea49dbc84a@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15sched: etf: Fix ordering of packets with same txtimeVinicius Costa Gomes
When a application sends many packets with the same txtime, they may be transmitted out of order (different from the order in which they were enqueued). This happens because when inserting elements into the tree, when the txtime of two packets are the same, the new packet is inserted at the left side of the tree, causing the reordering. The only effect of this change should be that packets with the same txtime will be transmitted in the order they are enqueued. The application in question (the AVTP GStreamer plugin, still in development) is sending video traffic, in which each video frame have a single presentation time, the problem is that when packetizing, multiple packets end up with the same txtime. The receiving side was rejecting packets because they were being received out of order. Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc") Reported-by: Ederson de Souza <ederson.desouza@intel.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>