summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-03-23SUNRPC: fix shutdown of NFS TCP client socketSiddharth Kawar
NFS server Duplicate Request Cache (DRC) algorithms rely on NFS clients reconnecting using the same local TCP port. Unique NFS operations are identified by the per-TCP connection set of XIDs. This prevents file corruption when non-idempotent NFS operations are retried. Currently, NFS client TCP connections are using different local TCP ports when reconnecting to NFS servers. After an NFS server initiates shutdown of the TCP connection, the NFS client's TCP socket is set to NULL after the socket state has reached TCP_LAST_ACK(9). When reconnecting, the new socket attempts to reuse the same local port but fails with EADDRNOTAVAIL (99). This forces the socket to use a different local TCP port to reconnect to the remote NFS server. State Transition and Events: TCP_CLOSE_WAIT(8) TCP_LAST_ACK(9) connect(fail EADDRNOTAVAIL(99)) TCP_CLOSE(7) bind on new port connect success dmesg excerpts showing reconnect switching from TCP local port of 926 to 763 after commit 7c81e6a9d75b: [13354.947854] NFS call mkdir testW ... [13405.654781] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654813] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1 [13405.654826] RPC: xs_data_ready... [13405.654892] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654895] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.654899] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654900] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.654950] RPC: xs_connect scheduled xprt 00000000037d0f03 [13405.654975] RPC: xs_bind 0.0.0.0:926: ok (0) [13405.654980] RPC: worker connecting xprt 00000000037d0f03 via tcp to 10.101.6.228 (port 2049) [13405.654991] RPC: 00000000037d0f03 connect status 99 connected 0 sock state 7 [13405.655001] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.655002] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.655024] RPC: xs_connect scheduled xprt 00000000037d0f03 [13405.655038] RPC: xs_bind 0.0.0.0:763: ok (0) [13405.655041] RPC: worker connecting xprt 00000000037d0f03 via tcp to 10.101.6.228 (port 2049) [13405.655065] RPC: 00000000037d0f03 connect status 115 connected 0 sock state 2 State Transition and Events with patch applied: TCP_CLOSE_WAIT(8) TCP_LAST_ACK(9) TCP_CLOSE(7) connect(reuse of port succeeds) dmesg excerpts showing reconnect on same TCP local port of 936 with patch applied: [ 257.139935] NFS: mkdir(0:59/560857152), testQ [ 257.139937] NFS call mkdir testQ ... [ 307.822702] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1 [ 307.822714] RPC: xs_data_ready... [ 307.822817] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.822821] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.822825] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.822826] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823606] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.823609] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823629] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.823632] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823676] RPC: xs_connect scheduled xprt 00000000ce702f14 [ 307.823704] RPC: xs_bind 0.0.0.0:936: ok (0) [ 307.823709] RPC: worker connecting xprt 00000000ce702f14 via tcp to 10.101.1.30 (port 2049) [ 307.823748] RPC: 00000000ce702f14 connect status 115 connected 0 sock state 2 ... [ 314.916193] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 314.916251] RPC: xs_connect scheduled xprt 00000000ce702f14 [ 314.916282] RPC: xs_bind 0.0.0.0:936: ok (0) [ 314.916292] RPC: worker connecting xprt 00000000ce702f14 via tcp to 10.101.1.30 (port 2049) [ 314.916342] RPC: 00000000ce702f14 connect status 115 connected 0 sock state 2 Fixes: 7c81e6a9d75b ("SUNRPC: Tweak TCP socket shutdown in the RPC client") Signed-off-by: Siddharth Rajendra Kawar <sikawar@microsoft.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-23net/sched: act_api: use the correct TCA_ACT attributes in dumpPedro Tammela
4 places in the act api code are using 'TCA_' definitions where they should be using 'TCA_ACT_', which is confusing for the reader, although functionally they are equivalent. Cc: Hangbin Liu <haliu@redhat.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-23net: ipv4: Allow changing IPv4 address protocolPetr Machata
When IP address protocol field was added in commit 47f0bd503210 ("net: Add new protocol attribute to IP addresses"), the semantics included the ability to change the protocol for IPv6 addresses, but not for IPv4 addresses. It seems this was not deliberate, but rather by accident. A userspace that wants to change the protocol of an address might drop and recreate the address, but that disrupts routing and is just impractical. So in this patch, when an IPv4 address is replaced (through RTM_NEWADDR request with NLM_F_REPLACE flag), update the proto at the address to the one given in the request, or zero if none is given. This matches the behavior of IPv6. Previously, any new value given was simply ignored. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-22bpf: Update the struct_ops of a bpf_link.Kui-Feng Lee
By improving the BPF_LINK_UPDATE command of bpf(), it should allow you to conveniently switch between different struct_ops on a single bpf_link. This would enable smoother transitions from one struct_ops to another. The struct_ops maps passing along with BPF_LINK_UPDATE should have the BPF_F_LINK flag. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22bpf: Create links for BPF struct_ops maps.Kui-Feng Lee
Make bpf_link support struct_ops. Previously, struct_ops were always used alone without any associated links. Upon updating its value, a struct_ops would be activated automatically. Yet other BPF program types required to make a bpf_link with their instances before they could become active. Now, however, you can create an inactive struct_ops, and create a link to activate it later. With bpf_links, struct_ops has a behavior similar to other BPF program types. You can pin/unpin them from their links and the struct_ops will be deactivated when its link is removed while previously need someone to delete the value for it to be deactivated. bpf_links are responsible for registering their associated struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag set to create a bpf_link, while a structs without this flag behaves in the same manner as before and is registered upon updating its value. The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it used to craft the links for BPF struct_ops programs, but also to create links for BPF struct_ops them-self. Since the links of BPF struct_ops programs are only used to create trampolines internally, they are never seen in other contexts. Thus, they can be reused for struct_ops themself. To maintain a reference to the map supporting this link, we add bpf_struct_ops_link as an additional type. The pointer of the map is RCU and won't be necessary until later in the patchset. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22net: Update an existing TCP congestion control algorithm.Kui-Feng Lee
This feature lets you immediately transition to another congestion control algorithm or implementation with the same name. Once a name is updated, new connections will apply this new algorithm. The purpose is to update a customized algorithm implemented in BPF struct_ops with a new version on the flight. The following is an example of using the userspace API implemented in later BPF patches. link = bpf_map__attach_struct_ops(skel->maps.ca_update_1); ....... err = bpf_link__update_map(link, skel->maps.ca_update_2); We first load and register an algorithm implemented in BPF struct_ops, then swap it out with a new one using the same name. After that, newly created connections will apply the updated algorithm, while older ones retain the previous version already applied. This patch also takes this chance to refactor the ca validation into the new tcp_validate_congestion_control() function. Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230323032405.3735486-3-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22net/sched: remove two skb_mac_header() usesEric Dumazet
tcf_mirred_act() and tcf_mpls_act() can use skb_network_offset() instead of relying on skb_mac_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22sch_cake: do not use skb_mac_header() in cake_overhead()Eric Dumazet
We want to remove our use of skb_mac_header() in tx paths, eg remove skb_reset_mac_header() from __dev_queue_xmit(). Idea is that ndo_start_xmit() can get the mac header simply looking at skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22net: do not use skb_mac_header() in qdisc_pkt_len_init()Eric Dumazet
We want to remove our use of skb_mac_header() in tx paths, eg remove skb_reset_mac_header() from __dev_queue_xmit(). Idea is that ndo_start_xmit() can get the mac header simply looking at skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22net: Catch invalid index in XPS mappingNick Child
When setting the XPS value of a TX queue, warn the user once if the index of the queue is greater than the number of allocated TX queues. Previously, this scenario went uncaught. In the best case, it resulted in unnecessary allocations. In the worst case, it resulted in out-of-bounds memory references through calls to `netdev_get_tx_queue( dev, index)`. Therefore, it is important to inform the user but not worth returning an error and risk downing the netdevice. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Link: https://lore.kernel.org/r/20230321150725.127229-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22Merge tag 'ipsec-libreswan-mlx5' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== Extend packet offload to fully support libreswan The following patches are an outcome of Raed's work to add packet offload support to libreswan [1]. The series includes: * Priority support to IPsec policies * Statistics per-SA (visible through "ip -s xfrm state ..." command) * Support to IKE policy holes * Fine tuning to acquire logic. [1] https://github.com/libreswan/libreswan/pull/986 Link: https://lore.kernel.org/all/cover.1678714336.git.leon@kernel.org * tag 'ipsec-libreswan-mlx5' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5e: Update IPsec per SA packets/bytes count net/mlx5e: Use one rule to count all IPsec Tx offloaded traffic net/mlx5e: Support IPsec acquire default SA net/mlx5e: Allow policies with reqid 0, to support IKE policy holes xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics xfrm: add new device offload acquire flag net/mlx5e: Use chains for IPsec policy priority offload net/mlx5: fs_core: Allow ignore_flow_level on TX dest net/mlx5: fs_chains: Refactor to detach chains from tc usage ==================== Link: https://lore.kernel.org/r/20230320094722.1009304-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-22Bluetooth: Remove "Power-on" check from Mesh featureBrian Gix
The Bluetooth mesh experimental feature enable was requiring the controller to be powered off in order for the Enable to work. Mesh is supposed to be enablable regardless of the controller state, and created an unintended requirement that the mesh daemon be started before the classic bluetoothd daemon. Fixes: af6bcc1921ff ("Bluetooth: Add experimental wrapper for MGMT based mesh") Signed-off-by: Brian Gix <brian.gix@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-22Bluetooth: Fix race condition in hci_cmd_sync_clearMin Li
There is a potential race condition in hci_cmd_sync_work and hci_cmd_sync_clear, and could lead to use-after-free. For instance, hci_cmd_sync_work is added to the 'req_workqueue' after cancel_work_sync The entry of 'cmd_sync_work_list' may be freed in hci_cmd_sync_clear, and causing kernel panic when it is used in 'hci_cmd_sync_work'. Here's the call trace: dump_stack_lvl+0x49/0x63 print_report.cold+0x5e/0x5d3 ? hci_cmd_sync_work+0x282/0x320 kasan_report+0xaa/0x120 ? hci_cmd_sync_work+0x282/0x320 __asan_report_load8_noabort+0x14/0x20 hci_cmd_sync_work+0x282/0x320 process_one_work+0x77b/0x11c0 ? _raw_spin_lock_irq+0x8e/0xf0 worker_thread+0x544/0x1180 ? poll_idle+0x1e0/0x1e0 kthread+0x285/0x320 ? process_one_work+0x11c0/0x11c0 ? kthread_complete_and_exit+0x30/0x30 ret_from_fork+0x22/0x30 </TASK> Allocated by task 266: kasan_save_stack+0x26/0x50 __kasan_kmalloc+0xae/0xe0 kmem_cache_alloc_trace+0x191/0x350 hci_cmd_sync_queue+0x97/0x2b0 hci_update_passive_scan+0x176/0x1d0 le_conn_complete_evt+0x1b5/0x1a00 hci_le_conn_complete_evt+0x234/0x340 hci_le_meta_evt+0x231/0x4e0 hci_event_packet+0x4c5/0xf00 hci_rx_work+0x37d/0x880 process_one_work+0x77b/0x11c0 worker_thread+0x544/0x1180 kthread+0x285/0x320 ret_from_fork+0x22/0x30 Freed by task 269: kasan_save_stack+0x26/0x50 kasan_set_track+0x25/0x40 kasan_set_free_info+0x24/0x40 ____kasan_slab_free+0x176/0x1c0 __kasan_slab_free+0x12/0x20 slab_free_freelist_hook+0x95/0x1a0 kfree+0xba/0x2f0 hci_cmd_sync_clear+0x14c/0x210 hci_unregister_dev+0xff/0x440 vhci_release+0x7b/0xf0 __fput+0x1f3/0x970 ____fput+0xe/0x20 task_work_run+0xd4/0x160 do_exit+0x8b0/0x22a0 do_group_exit+0xba/0x2a0 get_signal+0x1e4a/0x25b0 arch_do_signal_or_restart+0x93/0x1f80 exit_to_user_mode_prepare+0xf5/0x1a0 syscall_exit_to_user_mode+0x26/0x50 ret_from_fork+0x15/0x30 Fixes: 6a98e3836fa2 ("Bluetooth: Add helper for serialized HCI command execution") Cc: stable@vger.kernel.org Signed-off-by: Min Li <lm0963hack@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-22Bluetooth: ISO: fix timestamped HCI ISO data packet parsingPauli Virtanen
Use correct HCI ISO data packet header struct when the packet has timestamp. The timestamp, when present, goes before the other fields (Core v5.3 4E 5.4.5), so the structs are not compatible. Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-22Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packetLuiz Augusto von Dentz
Because some transports don't have a dedicated type for ISO packets (see 14202eff214e1e941fefa0366d4c3bc4b1a0d500) they may use ACL type when in fact they are ISO packets. In the past this was left for the driver to detect such thing but it creates a problem when using the likes of btproxy when used by a VM as the host would not be aware of the connection the guest is doing it won't be able to detect such behavior, so this make bt_recv_frame detect when it happens as it is the common interface to all drivers including guest VMs. Fixes: 14202eff214e ("Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-22Bluetooth: hci_sync: Resume adv with no RPA when active scanZhengping Jiang
The address resolution should be disabled during the active scan, so all the advertisements can reach the host. The advertising has to be paused before disabling the address resolution, because the advertising will prevent any changes to the resolving list and the address resolution status. Skipping this will cause the hci error and the discovery failure. According to the bluetooth specification: "7.8.44 LE Set Address Resolution Enable command This command shall not be used when: - Advertising (other than periodic advertising) is enabled, - Scanning is enabled, or - an HCI_LE_Create_Connection, HCI_LE_Extended_Create_Connection, or HCI_LE_Periodic_Advertising_Create_Sync command is outstanding." If the host is using RPA, the controller needs to generate RPA for the advertising, so the advertising must remain paused during the active scan. If the host is not using RPA, the advertising can be resumed after disabling the address resolution. Fixes: 9afc675edeeb ("Bluetooth: hci_sync: allow advertise when scan without RPA") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-22bpf: return long from bpf_map_ops funcsJP Kobryn
This patch changes the return types of bpf_map_ops functions to long, where previously int was returned. Using long allows for bpf programs to maintain the sign bit in the absence of sign extension during situations where inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative error is returned. The definitions of the helper funcs are generated from comments in the bpf uapi header at `include/uapi/linux/bpf.h`. The return type of these helpers was previously changed from int to long in commit bdb7b79b4ce8. For any case where one of the map helpers call the bpf_map_ops funcs that are still returning 32-bit int, a compiler might not include sign extension instructions to properly convert the 32-bit negative value a 64-bit negative value. For example: bpf assembly excerpt of an inlined helper calling a kernel function and checking for a specific error: ; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST); ... 46: call 0xffffffffe103291c ; htab_map_update_elem ; if (err && err != -EEXIST) { 4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax kernel function assembly excerpt of return value from `htab_map_update_elem` returning 32-bit int: movl $0xffffffef, %r9d ... movl %r9d, %eax ...results in the comparison: cmp $0xffffffffffffffef, $0x00000000ffffffef Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long") Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-22SUNRPC: Fix a crash in gss_krb5_checksum()Chuck Lever
Anna says: > KASAN reports [...] a slab-out-of-bounds in gss_krb5_checksum(), > and it can cause my client to panic when running cthon basic > tests with krb5p. > Running faddr2line gives me: > > gss_krb5_checksum+0x4b6/0x630: > ahash_request_free at > /home/anna/Programs/linux-nfs.git/./include/crypto/hash.h:619 > (inlined by) gss_krb5_checksum at > /home/anna/Programs/linux-nfs.git/net/sunrpc/auth_gss/gss_krb5_crypto.c:358 My diagnosis is that the memcpy() at the end of gss_krb5_checksum() reads past the end of the buffer containing the checksum data because the callers have ignored gss_krb5_checksum()'s API contract: * Caller provides the truncation length of the output token (h) in * cksumout.len. Instead they provide the fixed length of the hmac buffer. This length happens to be larger than the value returned by crypto_ahash_digestsize(). Change these errant callers to work like krb5_etm_{en,de}crypt(). As a defensive measure, bound the length of the byte copy at the end of gss_krb5_checksum(). Kunit sez: Testing complete. Ran 68 tests: passed: 68 Elapsed time: 81.680s total, 5.875s configuring, 75.610s building, 0.103s running Reported-by: Anna Schumaker <schumaker.anna@gmail.com> Fixes: 8270dbfcebea ("SUNRPC: Obscure Kerberos integrity keys") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-22netfilter: keep conntrack reference until IPsecv6 policy checks are doneMadhu Koriginja
Keep the conntrack reference until policy checks have been performed for IPsec V6 NAT support, just like ipv4. The reference needs to be dropped before a packet is queued to avoid having the conntrack module unloadable. Fixes: 58a317f1061c ("netfilter: ipv6: add IPv6 NAT support") Signed-off-by: Madhu Koriginja <madhu.koriginja@nxp.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-22xtables: move icmp/icmpv6 logic to xt_tcpudpFlorian Westphal
icmp/icmp6 matches are baked into ip(6)_tables.ko. This means that even if iptables-nft is used, a rule like "-p icmp --icmp-type 1" will load the ip(6)tables modules. Move them to xt_tcpdudp.ko instead to avoid this. This will also allow to eventually add kconfig knobs to build kernels that support iptables-nft but not iptables-legacy (old set/getsockopt interface). Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-22netfilter: xtables: disable 32bit compat interface by defaultFlorian Westphal
This defaulted to 'y' because before this knob existed the 32bit compat layer was always compiled in if CONFIG_COMPAT was set. 32bit iptables on 64bit kernel isn't common anymore, so remove the default-y now. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-22netfilter: nft_masq: deduplicate eval call-backsJeremy Sowden
nft_masq has separate ipv4 and ipv6 call-backs which share much of their code, and an inet one switch containing a switch that calls one of the others based on the family of the packet. Merge the ipv4 and ipv6 ones into the inet one in order to get rid of the duplicate code. Const-qualify the `priv` pointer since we don't need to write through it. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-22netfilter: nft_redir: use `struct nf_nat_range2` throughout and deduplicate ↵Jeremy Sowden
eval call-backs `nf_nat_redirect_ipv4` takes a `struct nf_nat_ipv4_multi_range_compat`, but converts it internally to a `struct nf_nat_range2`. Change the function to take the latter, factor out the code now shared with `nf_nat_redirect_ipv6`, move the conversion to the xt_REDIRECT module, and update the ipv4 range initialization in the nft_redir module. Replace a bare hex constant for 127.0.0.1 with a macro. Remove `WARN_ON`. `nf_nat_setup_info` calls `nf_ct_is_confirmed`: /* Can't setup nat info for confirmed ct. */ if (nf_ct_is_confirmed(ct)) return NF_ACCEPT; This means that `ct` cannot be null or the kernel will crash, and implies that `ctinfo` is `IP_CT_NEW` or `IP_CT_RELATED`. nft_redir has separate ipv4 and ipv6 call-backs which share much of their code, and an inet one switch containing a switch that calls one of the others based on the family of the packet. Merge the ipv4 and ipv6 ones into the inet one in order to get rid of the duplicate code. Const-qualify the `priv` pointer since we don't need to write through it. Assign `priv->flags` to the range instead of OR-ing it in. Set the `NF_NAT_RANGE_PROTO_SPECIFIED` flag once during init, rather than on every eval. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-22xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver supportJesper Dangaard Brouer
When driver doesn't implement a bpf_xdp_metadata kfunc the fallback implementation returns EOPNOTSUPP, which indicate device driver doesn't implement this kfunc. Currently many drivers also return EOPNOTSUPP when the hint isn't available, which is ambiguous from an API point of view. Instead change drivers to return ENODATA in these cases. There can be natural cases why a driver doesn't provide any hardware info for a specific hint, even on a frame to frame basis (e.g. PTP). Lets keep these cases as separate return codes. When describing the return values, adjust the function kernel-doc layout to get proper rendering for the return values. Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata") Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata") Fixes: 306531f0249f ("veth: Support RX XDP metadata") Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/167940675120.2718408.8176058626864184420.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-22net-zerocopy: Reduce compound page head accessXiaoyan Li
When compound pages are enabled, although the mm layer still returns an array of page pointers, a subset (or all) of them may have the same page head since a max 180kb skb can span 2 hugepages if it is on the boundary, be a mix of pages and 1 hugepage, or fit completely in a hugepage. Instead of referencing page head on all page pointers, use page length arithmetic to only call page head when referencing a known different page head to avoid touching a cold cacheline. Tested: See next patch with changes to tcp_mmap Correntess: On a pair of separate hosts as send with MSG_ZEROCOPY will force a copy on tx if using loopback alone, check that the SHA on the message sent is equivalent to checksum on the message received, since the current program already checks for the length. echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages ./tcp_mmap -s -z ./tcp_mmap -H $DADDR -z SHA256 is correct received 2 MB (100 % mmap'ed) in 0.005914 s, 2.83686 Gbit cpu usage user:0.001984 sys:0.000963, 1473.5 usec per MB, 10 c-switches Performance: Run neper between adjacent hosts with the same config tcp_stream -Z --skip-rx-copy -6 -T 20 -F 1000 --stime-use-proc --test-length=30 Before patch: stime_end=37.670000 After patch: stime_end=30.310000 Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230321081202.2370275-1-lixiaoyan@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-22wifi: mac80211: generate EMA beacons in AP modeAloka Dixit
Add APIs to generate an array of beacons for an EMA AP (enhanced multiple BSSID advertisements), each including a single MBSSID element. EMA profile periodicity equals the count of elements. - ieee80211_beacon_get_template_ema_list() - Generate and return all EMA beacon templates. Drivers must call ieee80211_beacon_free_ema_list() to free the memory. No change in the prototype for the existing API, ieee80211_beacon_get_template(), which should be used for non-EMA AP. - ieee80211_beacon_get_template_ema_index() - Generate a beacon which includes the multiple BSSID element at the given index. Drivers can use this function in a loop until NULL is returned which indicates end of available MBSSID elements. - ieee80211_beacon_free_ema_list() - free the memory allocated for the list of EMA beacon templates. Modify existing functions ieee80211_beacon_get_ap(), ieee80211_get_mbssid_beacon_len() and ieee80211_beacon_add_mbssid() to accept a new parameter for EMA index. Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Co-developed-by: John Crispin <john@phrozen.org> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20221206005040.3177-2-quic_alokad@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: fix mesh path discovery based on unicast packetsFelix Fietkau
If a packet has reached its intended destination, it was bumped to the code that accepts it, without first checking if a mesh_path needs to be created based on the discovered source. Fix this by moving the destination address check further down. Cc: stable@vger.kernel.org Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: fix qos on mesh interfacesFelix Fietkau
When ieee80211_select_queue is called for mesh, the sta pointer is usually NULL, since the nexthop is looked up much later in the tx path. Explicitly check for unicast address in that case in order to make qos work again. Cc: stable@vger.kernel.org Fixes: 50e2ab392919 ("wifi: mac80211: fix queue selection for mesh/OCB interfaces") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: implement support for yet another mesh A-MSDU formatFelix Fietkau
MT7996 hardware supports mesh A-MSDU subframes in hardware, but uses a big-endian length field Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/20230314095956.62085-7-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: add mesh fast-rx supportFelix Fietkau
This helps bring down rx CPU usage by avoiding calls to the rx handlers in the slow path. Supports forwarding and local rx, including A-MSDU. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/20230314095956.62085-6-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: use mesh header cache to speed up mesh forwardingFelix Fietkau
Significantly reduces mesh forwarding path CPU usage and enables the direct use of iTXQ. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/20230314095956.62085-5-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: mesh fast xmit supportFelix Fietkau
Previously, fast xmit only worked on interface types where initially a sta lookup is performed, and a cached header can be attached to the sta, requiring only some fields to be updated at runtime. This technique is not directly applicable for a mesh device type due to the dynamic nature of the topology and protocol. There are more addresses that need to be filled, and there is an extra header with a dynamic length based on the addressing mode. Change the code to cache entries contain a copy of the mesh subframe header + bridge tunnel header, as well as an embedded struct ieee80211_fast_tx, which contains the information for building the 802.11 header. Add a mesh specific early fast xmit call, which looks up a cached entry and adds only the mesh subframe header, before passing it over to the generic fast xmit code. To ensure the changes in network are reflected in these cached headers, flush affected cached entries on path changes, as well as other conditions that currently trigger a fast xmit check in other modes (key changes etc.) This code is loosely based on a previous implementation by: Sriram R <quic_srirrama@quicinc.com> Cc: Sriram R <quic_srirrama@quicinc.com> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-4-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: fix race in mesh sequence number assignmentFelix Fietkau
Since the sequence number is shared across different tx queues, it needs to be atomic in order to avoid accidental duplicate assignment Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: add support for letting drivers register tc offload supportFelix Fietkau
On newer MediaTek SoCs (e.g. MT7986), WLAN->WLAN or WLAN->Ethernet flows can be offloaded by the SoC. In order to support that, the .ndo_setup_tc op is needed. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230321091248.30947-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22net-sysfs: display two backlog queue len separatelyJason Xing
Sometimes we need to know which one of backlog queue can be exactly long enough to cause some latency when debugging this part is needed. Thus, we can then separate the display of both. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230321015746.96994-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-22virtio/vsock: check transport before skb allocationArseniy Krasnov
Pointer to transport could be checked before allocation of skbuff, thus there is no need to free skbuff when this pointer is NULL. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/08d61bef-0c11-c7f9-9266-cb2109070314@sberdevices.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-21tools: ynl: skip the explicit op array size when not neededJakub Kicinski
Jiri suggests it reads more naturally to skip the explicit array size when possible. When we export the symbol we want to make sure that the size is right but for statics its not needed. Link: https://lore.kernel.org/r/20230321044159.1031040-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21neighbour: switch to standard rcu, instead of rcu_bhEric Dumazet
rcu_bh is no longer a win, especially for objects freed with standard call_rcu(). Switch neighbour code to no longer disable BH when not necessary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21ipv6: flowlabel: do not disable BH where not neededEric Dumazet
struct ip6_flowlabel are rcu managed, and call_rcu() is used to delay fl_free_rcu() after RCU grace period. There is no point disabling BH for pure RCU lookups. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21erspan: do not use skb_mac_header() in ndo_start_xmit()Eric Dumazet
Drivers should not assume skb_mac_header(skb) == skb->data in their ndo_start_xmit(). Use skb_network_offset() and skb_transport_offset() which better describe what is needed in erspan_fb_xmit() and ip6erspan_tunnel_xmit() syzbot reported: WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 skb_mac_header include/linux/skbuff.h:2873 [inline] WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Modules linked in: CPU: 0 PID: 5083 Comm: syz-executor406 Not tainted 6.3.0-rc2-syzkaller-00866-gd4671cb96fa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 RIP: 0010:skb_mac_header include/linux/skbuff.h:2873 [inline] RIP: 0010:ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Code: 04 02 41 01 de 84 c0 74 08 3c 03 0f 8e 1c 0a 00 00 45 89 b4 24 c8 00 00 00 c6 85 77 fe ff ff 01 e9 33 e7 ff ff e8 b4 27 a1 f8 <0f> 0b e9 b6 e7 ff ff e8 a8 27 a1 f8 49 8d bf f0 0c 00 00 48 b8 00 RSP: 0018:ffffc90003b2f830 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000 RDX: ffff888021273a80 RSI: ffffffff88e1bd4c RDI: 0000000000000003 RBP: ffffc90003b2f9d8 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000000 R12: ffff88802b28da00 R13: 00000000000000d0 R14: ffff88807e25b6d0 R15: ffff888023408000 FS: 0000555556a61300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e5b11eb6e8 CR3: 0000000027c1b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __netdev_start_xmit include/linux/netdevice.h:4900 [inline] netdev_start_xmit include/linux/netdevice.h:4914 [inline] __dev_direct_xmit+0x504/0x730 net/core/dev.c:4300 dev_direct_xmit include/linux/netdevice.h:3088 [inline] packet_xmit+0x20a/0x390 net/packet/af_packet.c:285 packet_snd net/packet/af_packet.c:3075 [inline] packet_sendmsg+0x31a0/0x5150 net/packet/af_packet.c:3107 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 __sys_sendto+0x23a/0x340 net/socket.c:2142 __do_sys_sendto net/socket.c:2154 [inline] __se_sys_sendto net/socket.c:2150 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2150 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f123aaa1039 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc15d12058 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f123aaa1039 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000020000040 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f123aa648c0 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 Fixes: 1baf5ebf8954 ("erspan: auto detect truncated packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230320163427.8096-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: dsa: tag_brcm: legacy: fix daisy-chained switchesÁlvaro Fernández Rojas
When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: skbuff: rename __pkt_vlan_present_offset to __mono_tc_offsetJakub Kicinski
vlan_present is gone since commit 354259fa73e2 ("net: remove skb->vlan_present") rename the offset field to what BPF is currently looking for in this byte - mono_delivery_time and tc_at_ingress. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-20xfrm: copy_to_user_state fetch offloaded SA packets/bytes statisticsRaed Salem
Both in RX and TX, the traffic that performs IPsec packet offload transformation is accounted by HW only. Consequently, the HW should be queried for packets/bytes statistics when user asks for such transformation data. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/d90ec74186452b1509ee94875d942cb777b7181e.1678714336.git.leon@kernel.org Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20xfrm: add new device offload acquire flagRaed Salem
During XFRM acquire flow, a default SA is created to be updated later, once acquire netlink message is handled in user space. When the relevant policy is offloaded this default SA is also offloaded to IPsec offload supporting driver, however this SA does not have context suitable for offloading in HW, nor is interesting to offload to HW, consequently needs a special driver handling apart from other offloaded SA(s). Add a special flag that marks such SA so driver can handle it correctly. Signed-off-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/f5da0834d8c6b82ab9ba38bd4a0c55e71f0e3dab.1678714336.git.leon@kernel.org Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20net: dsa: report rx_bytes unadjusted for ETH_HLENVladimir Oltean
We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-19net/packet: remove po->xmitEric Dumazet
Use PACKET_SOCK_QDISC_BYPASS atomic bit instead of a pointer. This removes one indirect call in fast path, and READ_ONCE()/WRITE_ONCE() annotations as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-18tcp: preserve const qualifier in tcp_sk()Eric Dumazet
We can change tcp_sk() to propagate its argument const qualifier, thanks to container_of_const(). We have two places where a const sock pointer has to be upgraded to a write one. We have been using const qualifier for lockless listeners to clearly identify points where writes could happen. Add tcp_sk_rw() helper to better document these. tcp_inbound_md5_hash(), __tcp_grow_window(), tcp_reset_check() and tcp_rack_reo_wnd() get an additional const qualififer for their @tp local variables. smc_check_reset_syn_req() also needs a similar change. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-18mptcp: preserve const qualifier in mptcp_sk()Eric Dumazet
We can change mptcp_sk() to propagate its argument const qualifier, thanks to container_of_const(). We need to change few things to avoid build errors: mptcp_set_datafin_timeout() and mptcp_rtx_head() have to accept non-const sk pointers. @msk local variable in mptcp_pending_tail() must be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-18smc: preserve const qualifier in smc_sk()Eric Dumazet
We can change smc_sk() to propagate its argument const qualifier, thanks to container_of_const(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Karsten Graul <kgraul@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-18af_packet: preserve const qualifier in pkt_sk()Eric Dumazet
We can change pkt_sk() to propagate const qualifier of its argument, thanks to container_of_const() This should avoid some potential errors caused by accidental (const -> not_const) promotion. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>