summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-12-20xsk: Consolidate to one single cached producer pointerMagnus Karlsson
Currently, the xsk ring code has two cached producer pointers: prod_head and prod_tail. This patch consolidates these two into a single one called cached_prod to make the code simpler and easier to maintain. This will be in line with the user space part of the the code found in libbpf, that only uses a single cached pointer. The Rx path only uses the two top level functions xskq_produce_batch_desc and xskq_produce_flush_desc and they both use prod_head and never prod_tail. So just move them over to cached_prod. The Tx XDP_DRV path uses xskq_produce_addr_lazy and xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail and prod_head, so move them over to just use cached_prod by skipping the intermediate step of updating prod_tail. The Tx path in XDP_SKB mode uses xskq_reserve_addr and xskq_produce_addr. They currently use both cached pointers, but we can operate on the global producer pointer in xskq_produce_addr since it has to be updated anyway, thus eliminating the use of both cached pointers. We can also remove the xskq_nb_free in xskq_produce_addr since it is already called in xskq_reserve_addr. No need to do it twice. When there is only one cached producer pointer, we can also simplify xskq_nb_free by removing one argument. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-4-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Simplify detection of empty and full ringsMagnus Karlsson
In order to set the correct return flags for poll, the xsk code has to check if the Rx queue is empty and if the Tx queue is full. This code was unnecessarily large and complex as it used the functions that are used to update the local state from the global state (xskq_nb_free and xskq_nb_avail). Since we are not doing this nor updating any data dependent on this state, we can simplify the functions. Another benefit from this is that we can also simplify the xskq_nb_free and xskq_nb_avail functions in a later commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-3-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Eliminate the lazy update thresholdMagnus Karlsson
The lazy update threshold was introduced to keep the producer and consumer some distance apart in the completion ring. This was important in the beginning of the development of AF_XDP as the ring format as that point in time was very sensitive to the producer and consumer being on the same cache line. This is not the case anymore as the current ring format does not degrade in any noticeable way when this happens. Moreover, this threshold makes it impossible to run the system with rings that have less than 128 entries. So let us remove this threshold and just get one entry from the ring as in all other functions. This will enable us to remove this function in a later commit. Note that xskq_produce_addr_lazy followed by xskq_produce_flush_addr_n are still not the same function as xskq_produce_addr() as it operates on another cached pointer. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-2-git-send-email-magnus.karlsson@intel.com
2019-12-20rxrpc: Fix missing security check on incoming callsDavid Howells
Fix rxrpc_new_incoming_call() to check that we have a suitable service key available for the combination of service ID and security class of a new incoming call - and to reject calls for which we don't. This causes an assertion like the following to appear: rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false kernel BUG at net/rxrpc/call_object.c:456! Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than RXRPC_CALL_COMPLETE (12). Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-12-20rxrpc: Don't take call->user_mutex in rxrpc_new_incoming_call()David Howells
Standard kernel mutexes cannot be used in any way from interrupt or softirq context, so the user_mutex which manages access to a call cannot be a mutex since on a new call the mutex must start off locked and be unlocked within the softirq handler to prevent userspace interfering with a call we're setting up. Commit a0855d24fc22d49cdc25664fb224caee16998683 ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") causes big warnings to be splashed in dmesg for each a new call that comes in from the server. Whilst it *seems* like it should be okay, since the accept path uses trylock, there are issues with PI boosting and marking the wrong task as the owner. Fix this by not taking the mutex in the softirq path at all. It's not obvious that there should be any need for it as the state is set before the first notification is generated for the new call. There's also no particular reason why the link-assessing ping should be triggered inside the mutex. It's not actually transmitted there anyway, but rather it has to be deferred to a workqueue. Further, I don't think that there's any particular reason that the socket notification needs to be done from within rx->incoming_lock, so the amount of time that lock is held can be shortened too and the ping prepared before the new call notification is sent. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Signed-off-by: David Howells <dhowells@redhat.com> cc: Peter Zijlstra (Intel) <peterz@infradead.org> cc: Ingo Molnar <mingo@redhat.com> cc: Will Deacon <will@kernel.org> cc: Davidlohr Bueso <dave@stgolabs.net>
2019-12-20rxrpc: Unlock new call in rxrpc_new_incoming_call() rather than the callerDavid Howells
Move the unlock and the ping transmission for a new incoming call into rxrpc_new_incoming_call() rather than doing it in the caller. This makes it clearer to see what's going on. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> cc: Ingo Molnar <mingo@redhat.com> cc: Will Deacon <will@kernel.org> cc: Davidlohr Bueso <dave@stgolabs.net>
2019-12-19xdp: Simplify __bpf_tx_xdp_map()Björn Töpel
The explicit error checking is not needed. Simply return the error instead. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191219061006.21980-9-bjorn.topel@gmail.com
2019-12-19xdp: Remove map_to_flush and map swap detectionBjörn Töpel
Now that all XDP maps that can be used with bpf_redirect_map() tracks entries to be flushed in a global fashion, there is not need to track that the map has changed and flush from xdp_do_generic_map() anymore. All entries will be flushed in xdp_do_flush_map(). This means that the map_to_flush can be removed, and the corresponding checks. Moving the flush logic to one place, xdp_do_flush_map(), give a bulking behavior and performance boost. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191219061006.21980-8-bjorn.topel@gmail.com
2019-12-19xdp: Make cpumap flush_list common for all map instancesBjörn Töpel
The cpumap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all devmaps, which simplifies __cpu_map_flush() and cpu_map_alloc(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191219061006.21980-7-bjorn.topel@gmail.com
2019-12-19xdp: Make devmap flush_list common for all map instancesBjörn Töpel
The devmap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all devmaps, which simplifies __dev_map_flush() and dev_map_init_map(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com
2019-12-19xsk: Make xskmap flush_list common for all map instancesBjörn Töpel
The xskmap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all xskmaps, which simplifies __xsk_map_flush() and xsk_map_alloc(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191219061006.21980-5-bjorn.topel@gmail.com
2019-12-19net/sched: cls_u32: fix refcount leak in the error path of u32_change()Davide Caratti
when users replace cls_u32 filters with new ones having wrong parameters, so that u32_change() fails to validate them, the kernel doesn't roll-back correctly, and leaves semi-configured rules. Fix this in u32_walk(), avoiding a call to the walker function on filters that don't have a match rule connected. The side effect is, these "empty" filters are not even dumped when present; but that shouldn't be a problem as long as we are restoring the original behaviour, where semi-configured filters were not even added in the error path of u32_change(). Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net/tls: add helper for testing if socket is RX offloadedJakub Kicinski
There is currently no way for driver to reliably check that the socket it has looked up is in fact RX offloaded. Add a helper. This allows drivers to catch misbehaving firmware. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20netfilter: nft_tproxy: Fix port selector on Big EndianPhil Sutter
On Big Endian architectures, u16 port value was extracted from the wrong parts of u32 sreg_port, just like commit 10596608c4d62 ("netfilter: nf_tables: fix mismatch in big-endian system") describes. Fixes: 4ed8eb6570a49 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-20netfilter: ebtables: compat: reject all padding in matches/watchersFlorian Westphal
syzbot reported following splat: BUG: KASAN: vmalloc-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] BUG: KASAN: vmalloc-out-of-bounds in compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 Read of size 4 at addr ffffc900004461f4 by task syz-executor267/7937 CPU: 1 PID: 7937 Comm: syz-executor267 Not tainted 5.5.0-rc1-syzkaller #0 size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 compat_do_replace+0x344/0x720 net/bridge/netfilter/ebtables.c:2249 compat_do_ebt_set_ctl+0x22f/0x27e net/bridge/netfilter/ebtables.c:2333 [..] Because padding isn't considered during computation of ->buf_user_offset, "total" is decremented by fewer bytes than it should. Therefore, the first part of if (*total < sizeof(*entry) || entry->next_offset < sizeof(*entry)) will pass, -- it should not have. This causes oob access: entry->next_offset is past the vmalloced size. Reject padding and check that computed user offset (sum of ebt_entry structure plus all individual matches/watchers/targets) is same value that userspace gave us as the offset of the next entry. Reported-by: syzbot+f68108fed972453a0ad4@syzkaller.appspotmail.com Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-20netfilter: nf_flow_table: fix big-endian integer overflowArnd Bergmann
In some configurations, gcc reports an integer overflow: net/netfilter/nf_flow_table_offload.c: In function 'nf_flow_rule_match': net/netfilter/nf_flow_table_offload.c:80:21: error: unsigned conversion from 'int' to '__be16' {aka 'short unsigned int'} changes value from '327680' to '0' [-Werror=overflow] mask->tcp.flags = TCP_FLAG_RST | TCP_FLAG_FIN; ^~~~~~~~~~~~ From what I can tell, we want the upper 16 bits of these constants, so they need to be shifted in cpu-endian mode. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net, sysctl: Fix compiler warning when only cBPF is presentAlexander Lobakin
proc_dointvec_minmax_bpf_restricted() has been firstly introduced in commit 2e4a30983b0f ("bpf: restrict access to core bpf sysctls") under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations"), because a new sysctl, bpf_jit_limit, made use of it. Finally, this parameter has become long instead of integer with fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K") and thus, a new proc_dolongvec_minmax_bpf_restricted() has been added. With this last change, we got back to that proc_dointvec_minmax_bpf_restricted() is used only under CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been brought back. So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n since v4.20 we have: CC net/core/sysctl_net_core.o net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function] 292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again. Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K") Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
2019-12-19xsk: Add rcu_read_lock around the XSK wakeupMaxim Mikityanskiy
The XSK wakeup callback in drivers makes some sanity checks before triggering NAPI. However, some configuration changes may occur during this function that affect the result of those checks. For example, the interface can go down, and all the resources will be destroyed after the checks in the wakeup function, but before it attempts to use these resources. Wrap this callback in rcu_read_lock to allow driver to synchronize_rcu before actually destroying the resources. xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup wrapped into the RCU lock. After this commit, xsk_poll starts using xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide ndo_xsk_wakeup should be called. It also fixes a bug introduced with the need_wakeup feature: a non-zero-copy socket may be used with a driver supporting zero-copy, and in this case ndo_xsk_wakeup should not be called, so the xs->zc check is the correct one. Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
2019-12-18bpf: Allow to change skb mark in test_runNikita V. Shirokov
allow to pass skb's mark field into bpf_prog_test_run ctx for BPF_PROG_TYPE_SCHED_CLS prog type. that would allow to test bpf programs which are doing decision based on this field Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-18net: sch_ets: Make the ETS qdisc offloadablePetr Machata
Add hooks at appropriate points to make it possible to offload the ETS Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: sch_ets: Add a new QdiscPetr Machata
Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is PRIO-like in how it is configured, meaning one needs to specify how many bands there are, how many are strict and how many are dwrr, quanta for the latter, and priomap. The new Qdisc operates like the PRIO / DRR combo would when configured as per the standard. The strict classes, if any, are tried for traffic first. When there's no traffic in any of the strict queues, the ETS ones (if any) are treated in the same way as in DRR. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18sch_cake: drop unused variable tin_quantum_prioKevin 'ldir' Darbyshire-Bryant
Turns out tin_quantum_prio isn't used anymore and is a leftover from a previous implementation of diffserv tins. Since the variable isn't used in any calculations it can be eliminated. Drop variable and places where it was set. Rename remaining variable and consolidate naming of intermediate variables that set it. Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18net: nfc: nci: fix a possible sleep-in-atomic-context bug in ↵Jia-Ju Bai
nci_uart_tty_receive() The kernel may sleep while holding a spinlock. The function call path (from bottom to top) in Linux 4.19 is: net/nfc/nci/uart.c, 349: nci_skb_alloc in nci_uart_default_recv_buf net/nfc/nci/uart.c, 255: (FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive net/nfc/nci/uart.c, 254: spin_lock in nci_uart_tty_receive nci_skb_alloc(GFP_KERNEL) can sleep at runtime. (FUNC_PTR) means a function pointer is called. To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for nci_skb_alloc(). This bug is found by a static analysis tool STCheck written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18nfs: use time64_t internallyArnd Bergmann
The timestamps for the cache are all in boottime seconds, so they don't overflow 32-bit values, but the use of time_t is deprecated because it generally does overflow when used with wall-clock time. There are multiple possible ways of avoiding it: - leave time_t, which is safe here, but forces others to look into this code to determine that it is over and over. - use a more generic type, like 'int' or 'long', which is known to be sufficient here but loses the documentation of referring to timestamps - use ktime_t everywhere, and convert into seconds in the few places where we want realtime-seconds. The conversion is sometimes expensive, but not more so than the conversion we do today. - use time64_t to clarify that this code is safe. Nothing would change for 64-bit architectures, but it is slightly less efficient on 32-bit architectures. Without a clear winner of the three approaches above, this picks the last one, favouring readability over a small performance loss on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-18sunrpc: convert to time64_t for expiryArnd Bergmann
Using signed 32-bit types for UTC time leads to the y2038 overflow, which is what happens in the sunrpc code at the moment. This changes the sunrpc code over to use time64_t where possible. The one exception is the gss_import_v{1,2}_context() function for kerberos5, which uses 32-bit timestamps in the protocol. Here, we can at least treat the numbers as 'unsigned', which extends the range from 2038 to 2106. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-18packet: clarify timestamp overflowArnd Bergmann
The memory mapped packet socket data structure in version 1 through 3 all contain 32-bit second values for the packet time stamps, which makes them suffer from the overflow of time_t in y2038 or y2106 (depending on whether user space interprets the value as signed or unsigned). The implementation uses the deprecated getnstimeofday() function. In order to get rid of that, this changes the code to use ktime_get_real_ts64() as a replacement, documenting the nature of the overflow. As long as the user applications treat the timestamps as unsigned, or only use the difference between timestamps, they are fine, and changing the timestamps to 64-bit wouldn't require a more invasive user space API change. Note: a lot of other APIs suffer from incompatible structures when time_t gets redefined to 64-bit in 32-bit user space, but this one does not. Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/lkml/CAF=yD-Jomr-gWSR-EBNKnSpFL46UeG564FLfqTCMNEm-prEaXA@mail.gmail.com/T/#u Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-17net-sysfs: Call dev_hold always in rx_queue_add_kobjectJouni Hogander
Dev_hold has to be called always in rx_queue_add_kobject. Otherwise usage count drops below 0 in case of failure in kobject_init_and_add. Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject") Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Miller <davem@davemloft.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: dsa: make unexported dsa_link_touch() staticBen Dooks (Codethink)
dsa_link_touch() is not exported, or defined outside of the file it is in so make it static to avoid the following warning: net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static? Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: annotate lockless accesses to sk->sk_pacing_shiftEric Dumazet
sk->sk_pacing_shift can be read and written without lock synchronization. This patch adds annotations to document this fact and avoid future syzbot complains. This might also avoid unexpected false sharing in sk_pacing_shift_update(), as the compiler could remove the conditional check and always write over sk->sk_pacing_shift : if (sk->sk_pacing_shift != val) sk->sk_pacing_shift = val; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17sctp: fix memleak on err handling of stream initializationMarcelo Ricardo Leitner
syzbot reported a memory leak when an allocation fails within genradix_prealloc() for output streams. That's because genradix_prealloc() leaves initialized members initialized when the issue happens and SCTP stack will abort the current initialization but without cleaning up such members. The fix here is to always call genradix_free() when genradix_prealloc() fails, for output and also input streams, as it suffers from the same issue. Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com Fixes: 2075e50caf5e ("sctp: convert to genradix") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17net: dsa: Make PHYLINK related function static againFlorian Fainelli
Commit 77373d49de22 ("net: dsa: Move the phylink driver calls into port.c") moved and exported a bunch of symbols, but they are not used outside of net/dsa/port.c at the moment, so no reason to export them. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17tipc: don't send gap blocks in ACK messagesJon Maloy
In the commit referred to below we eliminated sending of the 'gap' indicator in regular ACK messages, reserving this to explicit NACK ditto. Unfortunately we missed to also eliminate building of the 'gap block' area in ACK messages. This area is meant to report gaps in the received packet sequence following the initial gap, so that lost packets can be retransmitted earlier and received out-of-sequence packets can be released earlier. However, the interpretation of those blocks is dependent on a complete and correct sequence of gaps and acks. Hence, when the initial gap indicator is missing a single gap block will be interpreted as an acknowledgment of all preceding packets. This may lead to packets being released prematurely from the sender's transmit queue, with easily predicatble consequences. We now fix this by not building any gap block area if there is no initial gap to report. Fixes: commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17netfilter: conntrack: remove two export symbolsFlorian Westphal
Not used anywhere, remove them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: nft_tunnel: add the missing nla_nest_cancel()Xin Long
When nla_put_xxx() fails under nla_nest_start_noflag(), nla_nest_cancel() should be called, so that the skb can be trimmed properly. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: nft_tunnel: also dump OPTS_ERSPAN/VXLANXin Long
This patch is to add the nest attr OPTS_ERSPAN/VXLAN when dumping KEY_OPTS, and it would be helpful when parsing in userpace. Also, this is needed for supporting multiple geneve opts in the future patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: nft_tunnel: also dump ERSPAN_VERSIONXin Long
This is not necessary, but it'll be easier to parse in userspace, also given that other places like act_tunnel_key, cls_flower and ip_tunnel_core are also doing so. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policyXin Long
ERSPAN_VERSION is an attribute parsed in kernel side, nla_policy type should be added for it, like other attributes. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: nft_tunnel: no need to call htons() when dumping portsXin Long
info->key.tp_src and tp_dst are __be16, when using nla_put_be16() to dump them, htons() is not needed, so remove it in this patch. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-17netfilter: Clean up unnecessary #ifdefLukas Wunner
If CONFIG_NETFILTER_INGRESS is not enabled, nf_ingress() becomes a no-op because it solely contains an if-clause calling nf_hook_ingress_active(), for which an empty inline stub exists in <linux/netfilter_ingress.h>. All the symbols used in the if-clause's body are still available even if CONFIG_NETFILTER_INGRESS is not enabled. The additional "#ifdef CONFIG_NETFILTER_INGRESS" in nf_ingress() is thus unnecessary, so drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-16Merge tag 'mac80211-for-net-2019-10-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A handful of fixes: * disable AQL on most drivers, addressing the iwlwifi issues * fix double-free on network namespace changes * fix TID field in frames injected through monitor interfaces * fix ieee80211_calc_rx_airtime() * fix NULL pointer dereference in rfkill (and remove BUG_ON) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Remove old route notifications and convert listenersIdo Schimmel
Unlike mlxsw, the other listeners to the FIB notification chain do not require any special modifications as they never considered multiple identical routes. This patch removes the old route notifications and converts all the listeners to use the new replace / delete notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Only Replay routes of interest to new listenersIdo Schimmel
When a new listener is registered to the FIB notification chain it receives a dump of all the available routes in the system. Instead, make sure to only replay the IPv4 routes that are actually used in the data path and are of any interest to the new listener. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Handle route deletion notification during flushIdo Schimmel
In a similar fashion to previous patch, when a route is deleted as part of table flushing, promote the next route in the list, if exists. Otherwise, simply emit a delete notification. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Handle route deletion notificationIdo Schimmel
When a route is deleted we potentially need to promote the next route in the FIB alias list (e.g., with an higher metric). In case we find such a route, a replace notification is emitted. Otherwise, a delete notification for the deleted route. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify newly added route if should be offloadedIdo Schimmel
When a route is added, it should only be notified in case it is the first route in the FIB alias list with the given {prefix, prefix length, table ID}. Otherwise, it is not used in the data path and should not be considered by switch drivers. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify route if replacing currently offloaded oneIdo Schimmel
When replacing a route, its replacement should only be notified in case the replaced route is of any interest to listeners. In other words, if the replaced route is currently used in the data path, which means it is the first route in the FIB alias list with the given {prefix, prefix length, table ID}. v2: * Convert to use fib_find_alias() instead of fib_find_first_alias() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Extend FIB alias find functionIdo Schimmel
Extend the function with another argument, 'find_first'. When set, the function returns the first FIB alias with the matching {prefix, prefix length, table ID}. The TOS and priority parameters are ignored. Current callers are converted to pass 'false' in order to maintain existing behavior. This will be used by subsequent patches in the series. v2: * New patch Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16ipv4: Notify route after insertion to the routing tableIdo Schimmel
Currently, a new route is notified in the FIB notification chain before it is inserted to the FIB alias list. Subsequent patches will use the placement of the new route in the ordered FIB alias list in order to determine if the route should be notified or not. As a preparatory step, change the order so that the route is first inserted into the FIB alias list and only then notified. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16vsock/virtio: add WARN_ON check on virtio_transport_get_ops()Stefano Garzarella
virtio_transport_get_ops() and virtio_transport_send_pkt_info() can only be used on connecting/connected sockets, since a socket assigned to a transport is required. This patch adds a WARN_ON() on virtio_transport_get_ops() to check this requirement, a comment and a returned error on virtio_transport_send_pkt_info(), Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>