path: root/net
Age | Commit message | Author
2019-09-11nl80211: Fix possible Spectre-v1 for CQM RSSI thresholdsMasashi Honma
commit 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") was incomplete and requires one more fix to prevent access to rssi_thresholds[n], because the user can control the rssi_thresholds[i] values so that i reaches n. For example, rssi_thresholds = {-400, -300, -200, -100} when last is -34. Cc: stable@vger.kernel.org Fixes: 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Link: https://lore.kernel.org/r/20190908005653.17433-1-masashi.honma@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
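The overrun described in this message can be reproduced in a small userspace sketch. The function names (find_threshold_index, clamp_index, high_bound) are hypothetical, and the mask-based clamp only mimics the intent of an index sanitizer; it is not the kernel's array_index_nospec() and makes no claim about how the actual patch is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first index whose threshold exceeds `last`, or n if none does. */
size_t find_threshold_index(const int32_t *thresholds, size_t n, int32_t last)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (last < thresholds[i])
            break;
    return i; /* may equal n: one past the end of the array */
}

/* Hypothetical clamp: yields idx when idx < n, otherwise 0, via a mask. */
size_t clamp_index(size_t idx, size_t n)
{
    size_t mask = (size_t)0 - (size_t)(idx < n);

    return idx & mask;
}

/* Only read thresholds[i] after checking i < n and clamping the index. */
int32_t high_bound(const int32_t *thresholds, size_t n, int32_t last)
{
    size_t i = find_threshold_index(thresholds, n, last);

    if (i < n)
        return thresholds[clamp_index(i, n)];
    return INT32_MAX; /* no threshold above `last` */
}
```

With the thresholds from the message and last = -34, the search returns n, so the guarded lookup falls back to the maximum instead of reading past the array.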
2019-09-11mac80211: allow drivers to set max MTUWen Gong
Make it possible for drivers to adjust the default max_mtu by storing it in the hardware struct and using that value for all interfaces. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1567738137-31748-1-git-send-email-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: Do not compare with boolean in nl80211_common_reg_change_eventzhong jiang
With the help of boolinit.cocci, we use !nl80211_reg_change_event_fill instead of (nl80211_reg_change_event_fill == false). Meanwhile, clean up the code. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Link: https://lore.kernel.org/r/1567657537-65472-1-git-send-email-zhongjiang@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: IBSS: send deauth when expiring inactive STAsJohannes Berg
When we expire an inactive station, try to send it a deauth. This helps if it's actually still around, and just has issues with beacon distribution (or we do), and it will not also remove us. Then, if we have shared state, this may not be reset properly, causing problems; for example, we saw a case where aggregation sessions weren't removed properly (due to the TX start being offloaded to firmware and it relying on deauth for stop), causing a lot of traffic to get lost due to the SN reset after remove/add of the peer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-9-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: don't check if key is NULL in ieee80211_key_link()Luca Coelho
We already assume that key is not NULL and dereference it in a few other places before we check whether it is NULL, so the check is unnecessary. Remove it. Fixes: 96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-8-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: clear crypto tx tailroom counter upon keys enableLior Cohen
In case we got a fw restart while roaming from an encrypted AP to a non-encrypted one, we might end up hitting a warning on the pending counter crypto_tx_tailroom_pending_dec having a non-zero value. The following comment taken from net/mac80211/key.c explains the rationale for the delayed tailroom needed: /* * The reason for the delayed tailroom needed decrementing is to * make roaming faster: during roaming, all keys are first deleted * and then new keys are installed. The first new key causes the * crypto_tx_tailroom_needed_cnt to go from 0 to 1, which invokes * the cost of synchronize_net() (which can be slow). Avoid this * by deferring the crypto_tx_tailroom_needed_cnt decrementing on * key removal for a while, so if we roam the value is larger than * zero and no 0->1 transition happens. * * The cost is that if the AP switching was from an AP with keys * to one without, we still allocate tailroom while it would no * longer be needed. However, in the typical (fast) roaming case * within an ESS this usually won't happen. */ The following flow leads to the warning that was eventually reported as a bug: 1. Disconnect from encrypted AP 2. Set crypto_tx_tailroom_pending_dec = 1 for the key 3. Schedule work 4. Reconnect to non-encrypted AP 5. Add a new key, setting the tailroom counter = 1 6. Got FW restart while pending counter is set ---> hit the warning While at it, the ieee80211_reset_crypto_tx_tailroom() func was merged into its single caller ieee80211_reenable_keys (previously called ieee80211_enable_keys). Also, we reset the crypto_tx_tailroom_pending_dec and remove the counters warning as we just reset both. Signed-off-by: Lior Cohen <lior2.cohen@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-7-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: remove unnecessary key conditionJohannes Berg
When we reach this point, the key cannot be NULL. Remove the condition that suggests otherwise. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-6-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: list features in WEP/TKIP disable in better orderJohannes Berg
"HE/HT/VHT" is a bit confusing since really the order of development (and possible support) is different - change this to "HT/VHT/HE". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-4-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: always shut down on HW rfkillJohannes Berg
When the RFKILL subsystem isn't available, then rfkill_blocked() always returns false. In the case of hardware rfkill this will be wrong though, as if the hardware reported being killed then it cannot operate any longer. Since we only ever call the rfkill_sync work in this case, just rename it to rfkill_block and always pass "true" for the blocked parameter, rather than passing rfkill_blocked(). We rely on the underlying driver to still reject any new attempt to bring up the device by itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-2-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: vht: add support VHT EXT NSS BW in parsing VHTMordechay Goodstein
This fix was missed when parsing the VHT capabilities max bandwidth support. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Fixes: e80d642552a3 ("mac80211: copy VHT EXT NSS BW Support/Capable data to station") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830114057.22197-1-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: fix boundary value in ieee80211_frequency_to_channel()Arend van Spriel
The boundary value used for the 6G band was incorrect as it would result in invalid 6G channel number for certain frequencies. Reported-by: Amar Singhal <asinghal@codeaurora.org> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1567510772-24263-1-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-10netfilter: nft_{fwd,dup}_netdev: add offload supportPablo Neira Ayuso
This patch adds support for packet mirroring and redirection. The nft_fwd_dup_netdev_offload() function configures the flow_action object for the fwd and the dup actions. Extend nft_flow_rule_destroy() to release the net_device object when the flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps the net_device reference counter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2019-09-10netfilter: nft_synproxy: add synproxy stateful object supportFernando Fernandez Mancera
Register a new synproxy stateful object type into the stateful object infrastructure. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-10sctp: fix the missing put_user when dumping transport thresholdsXin Long
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10sch_hhf: ensure quantum and hhf_non_hh_weight are non-zeroCong Wang
If TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, the loop in hhf_dequeue() makes no progress and the kernel gets stuck. Fix this by checking this corner case in hhf_change(). Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
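Why a zero value stalls the dequeue loop can be seen in a simplified deficit-round-robin model. This is a sketch under assumptions, not the hhf code: struct drr_params, drr_validate() and drr_rounds_needed() are hypothetical names, and the per-round credit of weight times quantum is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical parameter block for a deficit-round-robin scheduler. */
struct drr_params {
    uint32_t quantum;       /* bytes of credit per round */
    uint32_t non_hh_weight; /* multiplier applied to the quantum */
};

/* Reject parameters that would add zero credit per round. */
int drr_validate(const struct drr_params *p)
{
    if (p->quantum == 0 || p->non_hh_weight == 0)
        return -EINVAL;
    return 0;
}

/* Count the rounds needed before a packet of pkt_len bytes can be sent. */
uint32_t drr_rounds_needed(const struct drr_params *p, uint32_t pkt_len)
{
    uint64_t credit = (uint64_t)p->quantum * p->non_hh_weight;
    uint64_t deficit = 0;
    uint32_t rounds = 0;

    assert(credit != 0); /* with zero credit the loop below never ends */
    while (deficit < pkt_len) {
        deficit += credit;
        rounds++;
    }
    return rounds;
}
```

Validating at configuration time keeps the hot path free of extra checks, which matches the message's choice of fixing the corner case where parameters are changed rather than in the dequeue loop.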
2019-09-10net_sched: check cops->tcf_block in tc_bind_tclass()Cong Wang
At least sch_red and sch_tbf don't implement ->tcf_block() while still having a non-zero tc "class". Instead of adding nop implementations to each such qdisc, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filters anyway. Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10devlink: add 'reset_dev_on_drv_probe' paramDirk van der Merwe
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10bridge/mdb: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-08netfilter: nf_tables_offload: move indirect flow_block callback logic to corePablo Neira Ayuso
Add nft_offload_init() and nft_offload_exit() functions to deal with the init and the exit path of the offload infrastructure. Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08netfilter: nf_tables_offload: avoid excessive stack usageArnd Bergmann
The nft_offload_ctx structure is much too large to put on the stack: net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=] Use dynamic allocation here, as we do elsewhere in the same function. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handlingDan Carpenter
The "newobj" is an error pointer so we can't pass it to kfree(). It doesn't need to be freed so we can remove that and I also renamed the error label. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-07net/tls: align non temporal copy to cache linesJakub Kicinski
Unlike normal TCP code TLS has to touch the cache lines it copies into to fill header info. On memory-heavy workloads having non temporal stores and normal accesses targeting the same cache line leads to significant overhead. Measured 3% overhead running 3600 round robin connections with additional memory heavy workload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: remove the record tail optimizationJakub Kicinski
For TLS device offload the tag/message authentication code are filled in by the device. The kernel merely reserves space for them. Because the device overwrites it, the contents of the tag do not matter. Current code tries to save space by reusing the header as the tag. This, however, leads to an additional frag being created and defeats buffer coalescing (which trickles all the way down to the drivers). Remove this optimization, and try to allocate the space for the tag in the usual way, leaving the memory uninitialized. If memory allocation fails, rewind the record pointer so that we use the already copied user data as tag. Note that the optimization was actually buggy, as the tag for TLS 1.2 is 16 bytes, but the header is just 13, so the reuse may have looked past the end of the page. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: use RCU for the adder to the offload record listJakub Kicinski
All modifications to TLS record list happen under the socket lock. Since records form an ordered queue readers are only concerned about elements being removed, additions can happen concurrently. Use RCU primitives to ensure the correct access types (READ_ONCE/WRITE_ONCE). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: unref frags in orderJakub Kicinski
It's generally more cache friendly to walk arrays in order, especially those which are likely not in cache. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller
Johan Hedberg says: ==================== pull request: bluetooth-next 2019-09-06 Here's the main bluetooth-next pull request for the 5.4 kernel. - Cleanups & fixes to btrtl driver - Fixes for Realtek devices in btusb, e.g. for suspend handling - Firmware loading support for BCM4345C5 - hidp_send_message() return value handling fixes - Added support for utilizing Fast Advertising Interval - Various other minor cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_listShmulik Ladkani
Historically, support for frag_list packets entering skb_segment() was limited to frag_list members terminating on exact same gso_size boundaries. This is verified with a BUG_ON since commit 89319d3801d1 ("net: Add frag_list support to skb_segment"), quote: As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"), the "exact MSS boundaries" assumption no longer holds: An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but leaves the frag_list members as originally merged by GRO with the original 'gso_size'. Examples of such programs are bpf-based NAT46 or NAT64. This leads to a kernel BUG_ON for flows involving: - GRO generating a frag_list skb - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room() - skb_segment() of the skb See example BUG_ON reports in [0]. In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"), skb_segment() was modified to support the "gso_size mangling" case of a frag_list GRO'ed skb, but *only* for frag_list members having head_frag==true (having a page-fragment head). Alas, GRO packets having frag_list members with a linear kmalloced head (head_frag==false) still hit the BUG_ON. This commit adds support to skb_segment() for a 'head_skb' packet having a frag_list whose members are *non* head_frag, with gso_size mangled, by disabling SG and thus falling back to copying the data from the given 'head_skb' into the generated segmented skbs - as suggested by Willem de Bruijn [1].
Since this approach involves the penalty of skb_copy_and_csum_bits() when building the segments, care was taken in order to enable this solution only when required: - untrusted gso_size, by testing SKB_GSO_DODGY is set (SKB_GSO_DODGY is set by any gso_size mangling functions in net/core/filter.c) - the frag_list is non empty, its item is a non head_frag, *and* the headlen of the given 'head_skb' does not match the gso_size. [0] https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/ https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/ [1] https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/ Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07ipmr: remove hard code cache_resolve_queue_len limitHangbin Liu
This is a re-post of a previous patch written by David Miller [1]. Phil Karn reported [2] that on busy networks with lots of unresolved multicast routing entries, the creation of new multicast group routes can be extremely slow and unreliable. The reason is that we hard-coded the limit on multicast route entries with unresolved source addresses (cache_resolve_queue_len) to 10. If some multicast route never resolves and the number of unresolved source addresses increases, there will be no ability to create new multicast route cache entries. To resolve this issue, we need to either add a sysctl entry to make cache_resolve_queue_len configurable, or just remove the cache_resolve_queue_len limit directly, as we already have the socket receive queue limits of the mrouted socket, as pointed out by David. From my side, I'd prefer to remove the cache_resolve_queue_len limit instead of creating two more (IPv4 and IPv6) sysctl entries. [1] https://lkml.org/lkml/2018/7/22/11 [2] https://lkml.org/lkml/2018/7/21/343 v3: instead of removing cache_resolve_queue_len entirely, let's only remove the hard-coded limit when allocating the unresolved cache, as Eric Dumazet suggested, so we don't need to re-count it in other places. v2: hold the mfc_unres_lock while walking the unresolved list in queue_count(), as Nikolay Aleksandrov reminded. Reported-by: Phil Karn <karn@ka9q.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR()Maciej Żenczykowski
Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e374a94 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
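The difference between a NULL check and an error-pointer check can be illustrated with a userspace imitation of the error-pointer convention. The helpers below (err_ptr, ptr_err, is_err, route_create) are hypothetical lowercase stand-ins written for this sketch; they follow the general idea of encoding a negative errno in the top of the address range and are not the kernel's own macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True when ptr lies in the last MAX_ERRNO values of the address space. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_route;

/* Hypothetical allocator that reports failure as an error pointer. */
void *route_create(int fail)
{
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_route;
}
```

A failed route_create() returns a non-NULL value, so a caller that tests only for NULL treats the failure as success and later dereferences the encoded errno; the same encoding is why an error pointer must not be handed to a free routine.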
2019-09-07tcp: ulp: fix possible crash in tcp_diag_get_aux_size()Eric Dumazet
tcp_diag_get_aux_size() can be called with sockets in any state. icsk_ulp_ops is only present for full sockets. For SYN_RECV or TIME_WAIT ones we would access garbage. Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Luke Hsiao <lukehsiao@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net: fib_notifier: move fib_notifier_ops from struct net into per-net structJiri Pirko
No need for fib_notifier_ops to be in struct net. It is used only by fib_notifier as a private data. Use net_generic to introduce per-net fib_notifier struct and move fib_notifier_ops there. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi. 2) Time matching support, also from Ander Juaristi. 3) VLAN support for nfnetlink_log, from Michael Braun. 4) Support for set element deletions from the packet path, also from Ander. 5) Remove __read_mostly from conntrack spinlock, from Li RongQing. 6) Support for updating stateful objects, this also includes the initial client for this infrastructure: the quota extension. A follow up fix for the control plane also comes in this batch. Patches from Fernando Fernandez Mancera. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06kcm: use BPF_PROG_RUNSami Tolvanen
Instead of invoking struct bpf_prog::bpf_func directly, use the BPF_PROG_RUN macro. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver are located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianness to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insn for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. 
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Bluetooth: hidp: Fix assumptions on the return value of hidp_send_messageDan Elkouby
hidp_send_message was changed to return non-zero values on success, which some other bits did not expect. This caused spurious errors to be propagated through the stack, breaking some drivers, such as hid-sony for the Dualshock 4 in Bluetooth mode. As pointed out by Dan Carpenter, hid-microsoft directly relied on that assumption as well. Fixes: 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes") Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-06net: sched: fix reordering issuesEric Dumazet
Whenever MQ is not used on a multiqueue device, we experience serious reordering problems. Bisection found the cited commit. The issue can be described this way : - A single qdisc hierarchy is shared by all transmit queues. (eg : tc qdisc replace dev eth0 root fq_codel) - When/if try_bulk_dequeue_skb_slow() dequeues a packet targeting a different transmit queue than the one used to build a packet train, we stop building the current list and save the 'bad' skb (P1) in a special queue. (bad_txq) - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this skb (P1), it checks if the associated transmit queue is still in frozen state. If the queue is still blocked (by BQL or NIC tx ring full), we leave the skb in bad_txq and return NULL. - dequeue_skb() calls q->dequeue() to get another packet (P2) The other packet can target the problematic queue (that we found in frozen state for the bad_txq packet), but another cpu just ran TX completion and made room in the txq that is now ready to accept new packets. - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent at next round. In practice P2 is the lead of a big packet train (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/ To solve this problem, we have to block the dequeue process as long as the first packet in bad_txq can not be sent. Reordering issues disappear and no side effects have been seen. Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller
Steffen Klassert says: ==================== pull request (net): ipsec 2019-09-05 1) Several xfrm interface fixes from Nicolas Dichtel: - Avoid an interface ID corruption on changelink. - Fix wrong interface names in the logs. - Fix a list corruption when changing network namespaces. - Fix unregistration of the underlying phydev. 2) Fix a potential warning when merging xfrm_policy nodes. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrateDavid Dai
A high speed adapter like the Mellanox CX-5 card can reach up to 100 Gbits per second of bandwidth. Currently htb already supports 64bit rate in the tc utility. However, the police action rate and peakrate are still limited to a 32bit value (up to 32 Gbits per second). Add 2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 in the kernel for 64bit support so that the tc utility can use them for 64bit rate and peakrate values to break the 32bit limit, and still keep backward binary compatibility. Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
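The 32bit ceiling mentioned in this message can be checked with a few lines of arithmetic. The sketch assumes that the rate is carried as an unsigned 32bit count of bytes per second, which is an assumption made here for illustration; bps_to_bytes_per_sec() and rate_fits_u32() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a rate in bits per second to bytes per second. */
uint64_t bps_to_bytes_per_sec(uint64_t bits_per_sec)
{
    return bits_per_sec / 8;
}

/* Non-zero when the byte rate can be stored in an unsigned 32bit field. */
int rate_fits_u32(uint64_t bytes_per_sec)
{
    return bytes_per_sec <= UINT32_MAX;
}
```

Under that assumption a 10 Gbit/s rate (1,250,000,000 bytes per second) still fits, while 100 Gbit/s (12,500,000,000 bytes per second) does not, which is the gap the 64bit attributes are meant to close.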
2019-09-06net: openvswitch: Set OvS recirc_id from tc chain indexPaul Blakey
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel through tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the processing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05new helper: get_tree_keyed()Al Viro
For vfs_get_keyed_super users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-05Bluetooth: mgmt: Use struct_size() helperGustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct mgmt_rp_get_connections { ... struct mgmt_addr_info addr[0]; } __packed; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(*rp) + (i * sizeof(struct mgmt_addr_info)); with: struct_size(rp, addr, i) Also, notice that, in this case, variable rp_len is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
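The size calculation this message describes can be sketched in userspace as follows. The struct layouts and the flex_size() helper are hypothetical illustrations, not the kernel's struct_size() macro; the sketch only shows the idea of computing header plus count times element size while saturating instead of wrapping on overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type stored in the trailing array. */
struct addr_info {
    uint8_t bdaddr[6];
    uint8_t type;
};

/* Hypothetical reply with a flexible array member at the end. */
struct conn_reply {
    uint16_t conn_count;
    struct addr_info addr[];
};

/* header + count * elem, saturating to SIZE_MAX if it would overflow. */
size_t flex_size(size_t header, size_t elem, size_t count)
{
    if (elem != 0 && count > (SIZE_MAX - header) / elem)
        return SIZE_MAX;
    return header + count * elem;
}
```

A saturated result makes a following allocation fail cleanly, whereas the open-coded sum could wrap around and yield a buffer that is too small for the requested number of elements.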
2019-09-05Bluetooth: 6lowpan: Make variable header_ops constantNishka Dasgupta
Static variable header_ops, of type header_ops, is used only once, when it is assigned to field header_ops of a variable having type net_device. This corresponding field is declared as const in the definition of net_device. Hence make header_ops constant as well to protect it from unnecessary modification. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-05Bluetooth: Add support for utilizing Fast Advertising IntervalSpoorthi Ravishankar Koppad
Changes made to add support for fast advertising interval as per core 4.1 specification, section 9.3.11.2.

A peripheral device entering any of the following GAP modes and sending either non-connectable advertising events or scannable undirected advertising events should use adv_fast_interval2 (100ms - 150ms) for adv_fast_period(30s).

    - Non-Discoverable Mode
    - Non-Connectable Mode
    - Limited Discoverable Mode
    - General Discoverable Mode

Signed-off-by: Spoorthi Ravishankar Koppad <spoorthix.k@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
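As a worked example of the interval range above: HCI advertising parameters express intervals in units of 0.625 ms, so the 100 ms to 150 ms window has to be converted before it is handed to the controller. The helper name adv_interval_from_ms() is hypothetical, and the sketch does not claim to show how the patch itself stores these values.

```c
#include <stdint.h>

/*
 * Convert milliseconds to advertising interval units of 0.625 ms.
 * 1 ms = 1000 us and one unit = 625 us, so units = ms * 1000 / 625 = ms * 8 / 5.
 * The result is truncated when ms is not a multiple of 0.625 ms.
 */
static uint16_t adv_interval_from_ms(uint32_t ms)
{
	return (uint16_t)(ms * 8u / 5u);
}
```

With this conversion, 100 ms corresponds to 160 units (0x00A0) and 150 ms to 240 units (0x00F0).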
2019-09-05xsk: lock the control mutex in sock_diag interfaceBjörn Töpel
When accessing the members of an XDP socket, the control mutex should be held. This commit fixes that. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05xsk: use state member for socket synchronizationBjörn Töpel
Before the state variable was introduced by Ilya, the dev member was used to determine whether the socket was bound or not. However, when dev was read, proper SMP barriers and READ_ONCE were missing. In order to address the missing barriers and READ_ONCE, we start using the state variable as a point of synchronization. The state member read/write is paired with proper SMP barriers, and from this it follows that the members described above do not need READ_ONCE if used in conjunction with a state check.

In all syscalls and the xsk_rcv path we check if state is XSK_BOUND. If that is the case we do an SMP read barrier, and this implies that the dev, umem and all rings are correctly set up. Note that no READ_ONCE is needed for these variables if used when state is XSK_BOUND (plus the read barrier).

To summarize: the struct xdp_sock members dev, queue_id, umem, fq, cq, tx, rx, and state were read lock-less, with incorrect barriers and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state are read lock-less. When these members are updated, WRITE_ONCE is used. When they are read, READ_ONCE is only used for reads outside the control mutex (e.g. mmap) or reads not synchronized with the state member (XSK_BOUND plus smp_rmb()).

Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due to the introduced state synchronization (XSK_BOUND plus smp_rmb()).

Introducing the state check also fixes a race, found by syzkaller, in xsk_poll() where umem could be accessed when stale.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
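The publish-then-check pattern described above can be sketched in userspace with C11 atomics. This is an analogue, not the kernel code: a release store stands in for the write barrier plus WRITE_ONCE on state, and an acquire load stands in for the state read plus smp_rmb(). The names demo_sock, demo_bind(), demo_get_queue() and the DEMO_* states are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket states, loosely modelled on the XSK_BOUND idea. */
enum demo_state { DEMO_READY, DEMO_BOUND, DEMO_UNBOUND };

struct demo_sock {
	int queue_id;       /* plain field, only valid once state is DEMO_BOUND */
	void *umem;         /* plain field, only valid once state is DEMO_BOUND */
	_Atomic int state;  /* the single point of synchronization */
};

/* Writer: set up the plain fields first, then publish them via state. */
static void demo_bind(struct demo_sock *s, int queue_id, void *umem)
{
	s->queue_id = queue_id;
	s->umem = umem;
	/* Release: all stores above are visible to a reader that sees DEMO_BOUND. */
	atomic_store_explicit(&s->state, DEMO_BOUND, memory_order_release);
}

/* Reader: touch the plain fields only after observing DEMO_BOUND. */
static bool demo_get_queue(struct demo_sock *s, int *queue_id)
{
	/* Acquire: pairs with the release store in demo_bind(). */
	if (atomic_load_explicit(&s->state, memory_order_acquire) != DEMO_BOUND)
		return false;
	*queue_id = s->queue_id;
	return true;
}
```

Because every reader goes through the state check, the plain fields need no per-access annotation, which mirrors the commit's point about dev and queue_id.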
2019-09-05xsk: avoid store-tearing when assigning umemBjörn Töpel
The umem member of struct xdp_sock is read outside of the control mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid potential store-tearing. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05xsk: avoid store-tearing when assigning queuesBjörn Töpel
Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid potential store-tearing. These members are read outside of the control mutex in the mmap implementation. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05netfilter: nf_tables: fix possible null-pointer dereference in object updateFernando Fernandez Mancera
Not all objects have an update operation. If the object type doesn't implement an update operation and the user tries to update it, the request will fail with EOPNOTSUPP. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
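The guard described above follows a common pattern for optional callbacks in an ops table: check the pointer before calling through it. A minimal sketch, with hypothetical names (demo_obj, demo_obj_ops, demo_obj_update(), demo_quota_update()) that are not the nf_tables structures:

```c
#include <errno.h>
#include <stddef.h>

struct demo_obj;

/* Hypothetical ops table; update is optional and may be NULL. */
struct demo_obj_ops {
	int (*update)(struct demo_obj *obj, int new_value);
};

struct demo_obj {
	const struct demo_obj_ops *ops;
	int value;
};

/* Reject the update up front instead of dereferencing a NULL callback. */
static int demo_obj_update(struct demo_obj *obj, int new_value)
{
	if (!obj->ops->update)
		return -EOPNOTSUPP;
	return obj->ops->update(obj, new_value);
}

/* Example update callback for an object type that supports updates. */
static int demo_quota_update(struct demo_obj *obj, int new_value)
{
	obj->value = new_value;
	return 0;
}
```

An object type that leaves update unset is left untouched and the caller gets -EOPNOTSUPP.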
2019-09-05net: Properly update v4 routes with v6 nexthopDonald Sharp
When creating a v4 route that uses a v6 nexthop from a nexthop group, allow the kernel to properly send the nexthop as v6 via the RTA_VIA attribute.

Broken behavior:

    $ ip nexthop add via fe80::9 dev eth0
    $ ip nexthop show
    id 1 via fe80::9 dev eth0 scope link
    $ ip route add 4.5.6.7/32 nhid 1
    $ ip route show
    default via 10.0.2.2 dev eth0
    4.5.6.7 nhid 1 via 254.128.0.0 dev eth0
    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
    $

Fixed behavior:

    $ ip nexthop add via fe80::9 dev eth0
    $ ip nexthop show
    id 1 via fe80::9 dev eth0 scope link
    $ ip route add 4.5.6.7/32 nhid 1
    $ ip route show
    default via 10.0.2.2 dev eth0
    4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0
    10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
    $

v2, v3: Addresses code review comments from David Ahern

Fixes: dcb1ecb50edf ("ipv4: Prepare for fib6_nh from a nexthop object")
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
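The 254.128.0.0 gateway in the broken output is what one gets when the first four bytes of fe80::9 (fe 80 00 00) are read as an IPv4 address. That reading is an inference from the output shown in the commit message, not something the message states. The userspace sketch below reproduces the arithmetic; the function name first_four_as_ipv4() is hypothetical.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse an IPv6 address, reinterpret its first four bytes as an IPv4
 * address, and format that into out. Returns 0 on success, -1 on error.
 */
static int first_four_as_ipv4(const char *v6_text, char *out, size_t out_len)
{
	struct in6_addr v6;
	struct in_addr v4;

	if (inet_pton(AF_INET6, v6_text, &v6) != 1)
		return -1;
	/* s6_addr is in network byte order, as is in_addr. */
	memcpy(&v4, v6.s6_addr, sizeof(v4));
	if (!inet_ntop(AF_INET, &v4, out, (socklen_t)out_len))
		return -1;
	return 0;
}
```

Called with "fe80::9" and a buffer of INET_ADDRSTRLEN bytes, the helper produces "254.128.0.0", matching the gateway shown in the broken output.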
2019-09-05pppoatm: use %*ph to print small bufferAndy Shevchenko
Use %*ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
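For readers unfamiliar with it, %*ph is a kernel printk extension that prints a small buffer as space-separated hex bytes; it is not available in userspace printf. The sketch below emulates that output format in plain C. The helper name hex_dump_small() is hypothetical and this is not the kernel's implementation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format len bytes of buf as lowercase hex separated by single spaces,
 * e.g. "de ad 01". Returns the number of characters written (excluding
 * the terminating NUL), or -1 if out is too small.
 */
static int hex_dump_small(char *out, size_t out_len,
			  const unsigned char *buf, size_t len)
{
	size_t pos = 0;

	if (out_len == 0)
		return -1;
	out[0] = '\0';

	for (size_t i = 0; i < len; i++) {
		int n = snprintf(out + pos, out_len - pos, "%s%02x",
				 i ? " " : "", buf[i]);

		/* snprintf reports the length it wanted; reject truncation. */
		if (n < 0 || (size_t)n >= out_len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

In kernel code the equivalent call would be a single format specifier with the length passed as the field width, which is why the commit can drop any open-coded loop.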