summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-06-02batman-adv: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: containg ==> containing dont ==> don't datas ==> data brodcast ==> broadcast Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Support for SCTP chunks matching on nf_tables, from Phil Sutter. 2) Skip LDMXCSR, we don't need a valid MXCSR state. From Stefano Brivio. 3) CONFIG_RETPOLINE for nf_tables set lookups, from Florian Westphal. 4) A few Kconfig leading spaces removal, from Juerg Haefliger. 5) Remove spinlock from xt_limit, from Jason Baron. 6) Remove useless initialization in xt_CT, oneliner from Yang Li. 7) Tree-wide replacement of netlink_unicast() by nfnetlink_unicast(). 8) Reduce footprint of several structures: xt_action_param, nft_pktinfo and nf_hook_state, from Florian. 10) Add nft_thoff() and nft_sk() helpers and use them, also from Florian. 11) Fix documentation in nf_tables pipapo avx2, from Florian Westphal. 12) Fix clang-12 fmt string warnings, also from Florian. ====================
2021-06-01net: Return the correct errno codeZheng Yongjun
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net: dcb: Return the correct errno codeZheng Yongjun
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/sched: act_vlan: No dump for unset priorityBoris Sukholitko
Dump vlan priority only if it has been previously set. Fix the tests accordingly. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/sched: act_vlan: Fix modify to allow 0Boris Sukholitko
Currently vlan modification action checks existence of vlan priority by comparing it to 0. Therefore it is impossible to modify existing vlan tag to have priority 0. For example, the following tc command will change the vlan id but will not affect vlan priority: tc filter add dev eth1 ingress matchall action vlan modify id 300 \ priority 0 pipe mirred egress redirect dev eth2 The incoming packet on eth1: ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4 will be changed to: ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4 although the user has intended to have p == 0. The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params and rely on it when deciding to set the priority. Fixes: 45a497f2d149a4a8061c (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/tls: Fix use-after-free after the TLS device goes down and upMaxim Mikityanskiy
When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context. This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode. On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets. The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del). A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver). Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/tls: Replace TLS_RX_SYNC_RUNNING with RCUMaxim Mikityanskiy
RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01NFC: nci: Remove redundant assignment to lenYang Li
Variable 'len' is set to conn_info->max_pkt_payload_len but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: net/nfc/nci/hci.c:164:3: warning: Value stored to 'len' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net: sock: fix in-kernel mark settingAlexander Aring
This patch fixes the in-kernel mark setting by doing an additional sk_dst_reset() which was introduced by commit 50254256f382 ("sock: Reset dst when changing sk_mark via setsockopt"). The code is now shared to avoid any further suprises when changing the socket mark value. Fixes: 84d1c617402e ("net: sock: add sock_set_mark") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01netpoll: don't require irqs disabled in rt kernelsWander Lairson Costa
write_msg(netconsole.c:836) calls netpoll_send_udp after a call to spin_lock_irqsave, which normally disables interrupts; but in PREEMPT_RT this call just locks an rt_mutex without disabling irqs. In this case, netpoll_send_udp is called with interrupts enabled. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANsVladimir Oltean
When using sub-VLANs in the range of 1-7, the resulting value from: rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan); is wrong according to the description from tag_8021q.c: | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | +-----------+-----+-----------------+-----------+-----------------------+ | DIR | SVL | SWITCH_ID | SUBVLAN | PORT | +-----------+-----+-----------------+-----------+-----------------------+ For example, when ds->index == 0, port == 3 and subvlan == 1, dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for subvlan == 0, but it should have returned 1043. This is because the low portion of the subvlan bits are not masked properly when writing into the 12-bit VLAN value. They are masked into bits 4:3, but they should be masked into bits 5:4. Fixes: 3eaae1d05f2b ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01netfilter: fix clang-12 fmt string warningsFlorian Westphal
nf_conntrack_h323_main.c:198:6: warning: format specifies type 'unsigned short' but xt_AUDIT.c:121:9: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-01netfilter: nft_set_pipapo_avx2: fix up description warningsFlorian Westphal
W=1: net/netfilter/nft_set_pipapo_avx2.c:159: warning: Excess function parameter 'len' description in 'nft_pipapo_avx2_refill' net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Function parameter or member 'key' not described in 'nft_pipapo_avx2_lookup' net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Excess function parameter 'elem' description in 'nft_pipapo_avx2_lookup' Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-01xfrm: remove the fragment check for ipv6 beet modeXin Long
In commit 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets"), it tried to fix the issue that in TX side the packet is fragmented before the ESP encapping while in the RX side the fragments always get reassembled before decapping with ESP. This is not true for IPv6. IPv6 is different, and it's using exthdr to save fragment info, as well as the ESP info. Exthdrs are added in TX and processed in RX both in order. So in the above case, the ESP decapping will be done earlier than the fragment reassembling in TX side. Here just remove the fragment check for the IPv6 inner packets to recover the fragments support for BEET mode. Fixes: 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-01xfrm: policy: Read seqcount outside of rcu-read side in ↵Varad Gautam
xfrm_policy_lookup_bytype xfrm_policy_lookup_bytype loops on seqcount mutex xfrm_policy_hash_generation within an RCU read side critical section. Although ill advised, this is fine if the loop is bounded. xfrm_policy_hash_generation wraps mutex hash_resize_mutex, which is used to serialize writers (xfrm_hash_resize, xfrm_hash_rebuild). This is fine too. On PREEMPT_RT=y, the read_seqcount_begin call within xfrm_policy_lookup_bytype emits a mutex lock/unlock for hash_resize_mutex. Mutex locking is fine, since RCU read side critical sections are allowed to sleep with PREEMPT_RT. xfrm_hash_resize can, however, block on synchronize_rcu while holding hash_resize_mutex. This leads to the following situation on PREEMPT_RT, where the writer is blocked on RCU grace period expiry, while the reader is blocked on a lock held by the writer: Thead 1 (xfrm_hash_resize) Thread 2 (xfrm_policy_lookup_bytype) rcu_read_lock(); mutex_lock(&hash_resize_mutex); read_seqcount_begin(&xfrm_policy_hash_generation); mutex_lock(&hash_resize_mutex); // block xfrm_bydst_resize(); synchronize_rcu(); // block <RCU stalls in xfrm_policy_lookup_bytype> Move the read_seqcount_begin call outside of the RCU read side critical section, and do an rcu_read_unlock/retry if we got stale data within the critical section. On non-PREEMPT_RT, this shortens the time spent within RCU read side critical section in case the seqcount needs a retry, and avoids unbounded looping. Fixes: 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock") Signed-off-by: Varad Gautam <varad.gautam@suse.com> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org # v4.9 Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Florian Westphal <fw@strlen.de> Cc: "Ahmed S. Darwish" <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Ahmed S. Darwish <a.darwish@linutronix.de>
2021-05-31sctp: sm_statefuns: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: genereate ==> generate correclty ==> correctly boundries ==> boundaries failes ==> fails isses ==> issues assocition ==> association signe ==> sign assocaition ==> association managemement ==> management restransmissions ==> retransmission sideffect ==> sideeffect bomming ==> booming chukns ==> chunks SHUDOWN ==> SHUTDOWN violationg ==> violating explcitly ==> explicitly CHunk ==> Chunk Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Link: https://lore.kernel.org/r/20210601020801.3625358-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31rds: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: alloced ==> allocated Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Link: https://lore.kernel.org/r/20210531063617.3018637-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31net: sched: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: sevaral ==> several sugestion ==> suggestion unregster ==> unregister suplied ==> supplied cirsumstances ==> circumstances Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Link: https://lore.kernel.org/r/20210531020048.2920054-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31nfc: hci: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: occured ==> occurred negociate ==> negotiate Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20210531020019.2919799-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connectKrzysztof Kozlowski
It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP): BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000 Cc: <stable@vger.kernel.org> Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31ipv6: align code with contextRocco Yue
The Tab key is used three times, causing the code block to be out of alignment with the context. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Link: https://lore.kernel.org/r/20210530113811.8817-1-rocco.yue@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31ipv6: use prandom_u32() for ID generationWilly Tarreau
This is a complement to commit aa6dd211e4b1 ("inet: use bigger hash table for IP ID generation"), but focusing on some specific aspects of IPv6. Contary to IPv4, IPv6 only uses packet IDs with fragments, and with a minimum MTU of 1280, it's much less easy to force a remote peer to produce many fragments to explore its ID sequence. In addition packet IDs are 32-bit in IPv6, which further complicates their analysis. On the other hand, it is often easier to choose among plenty of possible source addresses and partially work around the bigger hash table the commit above permits, which leaves IPv6 partially exposed to some possibilities of remote analysis at the risk of weakening some protocols like DNS if some IDs can be predicted with a good enough probability. Given the wide range of permitted IDs, the risk of collision is extremely low so there's no need to rely on the positive increment algorithm that is shared with the IPv4 code via ip_idents_reserve(). We have a fast PRNG, so let's simply call prandom_u32() and be done with it. Performance measurements at 10 Gbps couldn't show any difference with the previous code, even when using a single core, because due to the large fragments, we're limited to only ~930 kpps at 10 Gbps and the cost of the random generation is completely offset by other operations and by the network transfer time. In addition, this change removes the need to update a shared entry in the idents table so it may even end up being slightly faster on large scale systems where this matters. The risk of at least one collision here is about 1/80 million among 10 IDs, 1/850k among 100 IDs, and still only 1/8.5k among 1000 IDs, which remains very low compared to IPv4 where all IDs are reused every 4 to 80ms on a 10 Gbps flow depending on packet sizes. Reported-by: Amit Klein <aksecurity@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210529110746.6796-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31mac80211: Fix NULL ptr deref for injected rate infoMathy Vanhoef
The commit cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") moved the code to validate the radiotap header from ieee80211_monitor_start_xmit to ieee80211_parse_tx_radiotap. This made is possible to share more code with the new Tx queue selection code for injected frames. But at the same time, it now required the call of ieee80211_parse_tx_radiotap at the beginning of functions which wanted to handle the radiotap header. And this broke the rate parser for radiotap header parser. The radiotap parser for rates is operating most of the time only on the data in the actual radiotap header. But for the 802.11a/b/g rates, it must also know the selected band from the chandef information. But this information is only written to the ieee80211_tx_info at the end of the ieee80211_monitor_start_xmit - long after ieee80211_parse_tx_radiotap was already called. The info->band information was therefore always 0 (NL80211_BAND_2GHZ) when the parser code tried to access it. For a 5GHz only device, injecting a frame with 802.11a rates would cause a NULL pointer dereference because local->hw.wiphy->bands[NL80211_BAND_2GHZ] would most likely have been NULL when the radiotap parser searched for the correct rate index of the driver. Cc: stable@vger.kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Fixes: cb17ed29a7a5 ("mac80211: parse radiotap header when selecting Tx queue") Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> [sven@narfation.org: added commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20210530133226.40587-1-sven@narfation.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-05-31mac80211: fix skb length check in ieee80211_scan_rx()Du Cheng
Replace hard-coded compile-time constants for header length check with dynamic determination based on the frame type. Otherwise, we hit a validation WARN_ON in cfg80211 later. Fixes: cd418ba63f0c ("mac80211: convert S1G beacon to scan results") Reported-by: syzbot+405843667e93b9790fc1@syzkaller.appspotmail.com Signed-off-by: Du Cheng <ducheng2@gmail.com> Link: https://lore.kernel.org/r/20210510041649.589754-1-ducheng2@gmail.com [style fixes, reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-05-31cfg80211: call cfg80211_leave_ocb when switching away from OCBDu Cheng
If the userland switches back-and-forth between NL80211_IFTYPE_OCB and NL80211_IFTYPE_ADHOC via send_msg(NL80211_CMD_SET_INTERFACE), there is a chance where the cleanup cfg80211_leave_ocb() is not called. This leads to initialization of in-use memory (e.g. init u.ibss while in-use by u.ocb) due to a shared struct/union within ieee80211_sub_if_data: struct ieee80211_sub_if_data { ... union { struct ieee80211_if_ap ap; struct ieee80211_if_vlan vlan; struct ieee80211_if_managed mgd; struct ieee80211_if_ibss ibss; // <- shares address struct ieee80211_if_mesh mesh; struct ieee80211_if_ocb ocb; // <- shares address struct ieee80211_if_mntr mntr; struct ieee80211_if_nan nan; } u; ... } Therefore add handling of otype == NL80211_IFTYPE_OCB, during cfg80211_change_iface() to perform cleanup when leaving OCB mode. link to syzkaller bug: https://syzkaller.appspot.com/bug?id=0612dbfa595bf4b9b680ff7b4948257b8e3732d5 Reported-by: syzbot+105896fac213f26056f9@syzkaller.appspotmail.com Signed-off-by: Du Cheng <ducheng2@gmail.com> Link: https://lore.kernel.org/r/20210428063941.105161-1-ducheng2@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-05-31mac80211: remove warning in ieee80211_get_sband()Johannes Berg
Syzbot reports that it's possible to hit this from userspace, by trying to add a station before any other connection setup has been done. Instead of trying to catch this in some other way simply remove the warning, that will appropriately reject the call from userspace. Reported-by: syzbot+7716dbc401d9a437890d@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20210517164715.f537da276d17.Id05f40ec8761d6a8cc2df87f1aa09c651988a586@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-05-31Bluetooth: use correct lock to prevent UAF of hdev objectLin Ma
The hci_sock_dev_event() function will cleanup the hdev object for sockets even if this object may still be in used within the hci_sock_bound_ioctl() function, result in UAF vulnerability. This patch replace the BH context lock to serialize these affairs and prevent the race condition. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-05-31Merge 5.13-rc4 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-30batman-adv: Remove the repeated declarationShaokun Zhang
Function 'batadv_bla_claim_dump' is declared twice, so remove the repeated declaration. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-05-30batman-adv: mcast: add MRD + routable IPv4 multicast with bridges supportLinus Lüssing
This adds support for routable IPv4 multicast addresses (224.0.0.0/4, excluding 224.0.0.0/24) in bridged setups. This utilizes the Multicast Router Discovery (MRD, RFC4286) support in the Linux bridge. batman-adv will now query the Linux bridge for IPv4 multicast routers, which the bridge has previously learned about via MRD. This allows us to then safely send routable IPv4 multicast packets in bridged setups to multicast listeners and multicast routers only. Before we had to flood such packets to avoid potential multicast packet loss to IPv4 multicast routers, which we were not able to detect before. With the bridge MRD integration, we are now also able to perform more fine-grained detection of IPv6 multicast routers in bridged setups: Before we were "guessing" IPv6 multicast routers by looking up multicast listeners for the link-local All Routers multicast address (ff02::2), which every IPv6 multicast router is listening to. However this would also include more nodes than necessary: For instance nodes which are just a router for unicast, but not multicast would be included, too. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-05-29netfilter: nf_tables: remove xt_action_param from nft_pktinfoFlorian Westphal
Init it on demand in the nft_compat expression. This reduces size of nft_pktinfo from 48 to 24 bytes on x86_64. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: remove unused arg in nft_set_pktinfo_unspec()Florian Westphal
The functions pass extra skb arg, but either its not used or the helpers can already access it via pkt->skb. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: add and use nft_thoff helperFlorian Westphal
This allows to change storage placement later on without changing readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: add and use nft_sk helperFlorian Westphal
This allows to change storage placement later on without changing readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: x_tables: reduce xt_action_param by 8 byteFlorian Westphal
The fragment offset in ipv4/ipv6 is a 16bit field, so use u16 instead of unsigned int. On 64bit: 40 bytes to 32 bytes. By extension this also reduces nft_pktinfo (56 to 48 byte). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: use nfnetlink_unicast()Pablo Neira Ayuso
Replace netlink_unicast() calls by nfnetlink_unicast() which already deals with translating EAGAIN to ENOBUFS as the nfnetlink core expects. nfnetlink_unicast() calls nlmsg_unicast() which returns zero in case of success, otherwise the netlink core function netlink_rcv_skb() turns err > 0 into an acknowlegment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: xt_CT: Remove redundant assignment to retYang Li
Variable 'ret' is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed Clean up the following clang-analyzer warning: net/netfilter/xt_CT.c:175:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: x_tables: improve limit_mt scalabilityJason Baron
We've seen this spin_lock show up high in profiles. Let's introduce a lockless version. I've tested this using pktgen_sample01_simple.sh. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: Remove leading spaces in KconfigJuerg Haefliger
Remove leading spaces before tabs in Kconfig file(s) by running the following command: $ find net/netfilter -name 'Kconfig*' | xargs sed -r -i 's/^[ ]+\t/\t/' Signed-off-by: Juerg Haefliger <juergh@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: prefer direct calls for set lookupsFlorian Westphal
Extend nft_set_do_lookup() to use direct calls when retpoline feature is enabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28mptcp: restrict values of 'enabled' sysctlMatthieu Baerts
To avoid confusions, it seems better to parse this sysctl parameter as a boolean. We use it as a boolean, no need to parse an integer and bring confusions if we see a value different from 0 and 1, especially with this parameter name: enabled. It seems fine to do this modification because the default value is 1 (enabled). Then the only other interesting value to set is 0 (disabled). All other values would not have changed the default behaviour. Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: support SYSCTL only if enabledMatthieu Baerts
Since the introduction of the sysctl support in MPTCP with commit 784325e9f037 ("mptcp: new sysctl to control the activation per NS"), we don't check CONFIG_SYSCTL. Until now, that was not an issue: the register and unregister functions were replaced by NO-OP one if SYSCTL was not enabled in the config. The only thing we could have avoid is not to reserve memory for the table but that's for the moment only a small table per net-ns. But the following commit is going to use SYSCTL_ZERO and SYSCTL_ONE which are not be defined if SYSCTL is not enabled in the config. This causes 'undefined reference' errors from the linker. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: make sure flag signal is set when add addr with portJianguo Wu
When add address with port, it is mean to create a listening socket, and send an ADD_ADDR to remote, so it must have flag signal set, add this check in mptcp_pm_parse_addr(). Fixes: a77e9179c7651 ("mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlink") Acked-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: remove redundant initialization in pm_nl_init_net()Jianguo Wu
Memory of struct pm_nl_pernet{} is allocated by kzalloc() in setup_net()->ops_init(), so it's no need to reset counters and zero bitmap in pm_nl_init_net(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: generate subflow hmac after mptcp_finish_join()Jianguo Wu
For outgoing subflow join, when recv SYNACK, in subflow_finish_connect(), the mptcp_finish_join() may return false in some cases, and send a RESET to remote, and no local hmac is required. So generate subflow hmac after mptcp_finish_join(). Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: using TOKEN_MAX_RETRIES instead of magic numberJianguo Wu
We have macro TOKEN_MAX_RETRIES for the number of token generate retries, so using TOKEN_MAX_RETRIES in subflow_check_req(). And rename TOKEN_MAX_RETRIES to MPTCP_TOKEN_MAX_RETRIES as it is now exposed. Fixes: 535fb8152f31 ("mptcp: token: move retry to caller") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: fix pr_debug in mptcp_token_new_connectJianguo Wu
After commit 2c5ebd001d4f ("mptcp: refactor token container"), pr_debug() is called before mptcp_crypto_key_gen_sha() in mptcp_token_new_connect(), so the output local_key, token and idsn are 0, like: MPTCP: ssk=00000000f6b3c4a2, local_key=0, token=0, idsn=0 Move pr_debug() after mptcp_crypto_key_gen_sha(). Fixes: 2c5ebd001d4f ("mptcp: refactor token container") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: do not reset MP_CAPABLE subflow on mapping errorsPaolo Abeni
When some mapping related errors occurs we close the main MPC subflow with a RST. We should instead fallback gracefully to TCP, and do the reset only for MPJ subflows. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: always parse mptcp options for MPC reqskPaolo Abeni
In subflow_syn_recv_sock() we currently skip options parsing for OoO packet, given that such packets may not carry the relevant MPC option. If the peer generates an MPC+data TSO packet and some of the early segments are lost or get reorder, we server will ignore the peer key, causing transient, unexpected fallback to TCP. The solution is always parsing the incoming MPTCP options, and do the fallback only for in-order packets. This actually cleans the existing code a bit. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>