summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-06-14net/sched: qdisc_destroy() old ingress and clsact Qdiscs before graftingPeilin Ye
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs access this per-net_device pointer in mini_qdisc_pair_swap(). Similar for clsact Qdiscs and miniq_egress. Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER requests (thanks Hillf Danton for the hint), when replacing ingress or clsact Qdiscs, for example, the old Qdisc ("@old") could access the same miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), causing race conditions [1] including a use-after-free bug in mini_qdisc_pair_swap() reported by syzbot: BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901 ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319 print_report mm/kasan/report.c:430 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:536 mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 tcf_chain_head_change_item net/sched/cls_api.c:495 [inline] tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509 tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline] tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline] tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266 ... @old and @new should not affect each other. In other words, @old should never modify miniq_{in,e}gress after @new, and @new should not update @old's RCU state. Fixing without changing sch_api.c turned out to be difficult (please refer to Closes: for discussions). Instead, make sure @new's first call always happen after @old's last call (in {ingress,clsact}_destroy()) has finished: In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, and call qdisc_destroy() for @old before grafting @new. Introduce qdisc_refcount_dec_if_one() as the counterpart of qdisc_refcount_inc_nz() used for filter requests. Introduce a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, just like qdisc_put() etc. Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs". [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched: sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8 transmission queues: Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then adds a flower filter X to A. Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and b2) to replace A, then adds a flower filter Y to B. Thread 1 A's refcnt Thread 2 RTM_NEWQDISC (A, RTNL-locked) qdisc_create(A) 1 qdisc_graft(A) 9 RTM_NEWTFILTER (X, RTNL-unlocked) __tcf_qdisc_find(A) 10 tcf_chain0_head_change(A) mini_qdisc_pair_swap(A) (1st) | | RTM_NEWQDISC (B, RTNL-locked) RCU sync 2 qdisc_graft(B) | 1 notify_and_destroy(A) | tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked) qdisc_destroy(A) tcf_chain0_head_change(B) tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd) mini_qdisc_pair_swap(A) (3rd) | ... ... Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets on eth0 will not find filter Y in sch_handle_ingress(). This is just one of the possible consequences of concurrently accessing miniq_{in,e}gress pointers. Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops") Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops") Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/ Cc: Hillf Danton <hdanton@sina.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14net/sched: Refactor qdisc_graft() for ingress and clsact QdiscsPeilin Ye
Grafting ingress and clsact Qdiscs does not need a for-loop in qdisc_graft(). Refactor it. No functional changes intended. Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14net/sched: act_ct: Fix promotion of offloaded unreplied tuplePaul Blakey
Currently UNREPLIED and UNASSURED connections are added to the nf flow table. This causes the following connection packets to be processed by the flow table which then skips conntrack_in(), and thus such the connections will remain UNREPLIED and UNASSURED even if reply traffic is then seen. Even still, the unoffloaded reply packets are the ones triggering hardware update from new to established state, and if there aren't any to triger an update and/or previous update was missed, hardware can get out of sync with sw and still mark packets as new. Fix the above by: 1) Not skipping conntrack_in() for UNASSURED packets, but still refresh for hardware, as before the cited patch. 2) Try and force a refresh by reply-direction packets that update the hardware rules from new to established state. 3) Remove any bidirectional flows that didn't failed to update in hardware for re-insertion as bidrectional once any new packet arrives. Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-13ethtool: ioctl: account for sopass diff in set_wolJustin Chen
sopass won't be set if wolopt doesn't change. This means the following will fail to set the correct sopass. ethtool -s eth0 wol s sopass 11:22:33:44:55:66 ethtool -s eth0 wol s sopass 22:44:55:66:77:88 Make sure we call into the driver layer set_wol if sopass is different. Fixes: 55b24334c0f2 ("ethtool: ioctl: improve error checking for set_wol") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Link: https://lore.kernel.org/r/1686605822-34544-1-git-send-email-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Send multiple frags in one sendmsg()David Howells
Rewrite the AF_KCM transmission loop to send all the fragments in a single skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list of fragments in each skb is conveniently a bio_vec[] that can just be attached to a BVEC iter. Note: I'm working out the size of each fragment-skb by adding up bv_len for all the bio_vecs in skb->frags[] - but surely this information is recorded somewhere? For the skbs in head->frag_list, this is equal to skb->data_len, but not for the head. head->data_len includes all the tail frags too. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12kcm: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into the transport socket using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than using sendpage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES)David Howells
Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(MSG_SPLICE_PAGES) rather than a loop calling tcp_sendpage(). sendpage() will be removed in the future. Signed-off-by: David Howells <dhowells@redhat.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpageDavid Howells
When transmitting data, call down into TCP using sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing sendpage calls to transmit header, data pages and trailer. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flower: add support for matching cfm fieldsZahari Doychev
Add support to the tc flower classifier to match based on fields in CFM information elements like level and opcode. tc filter add dev ens6 ingress protocol 802.1q \ flower vlan_id 698 vlan_ethtype 0x8902 cfm mdl 5 op 46 \ action drop Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12net: flow_dissector: add support for cfm packetsZahari Doychev
Add support for dissecting cfm packets. The cfm packet header fields maintenance domain level and opcode can be dissected. Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-12svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")Chuck Lever
Get rid of the completion wait in svc_rdma_sendto(), and release pages in the send completion handler again. A subsequent patch will handle releasing those pages more efficiently. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")Chuck Lever
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing ↵Chuck Lever
rq_res.head[0].iov_base") Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_rw_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a rw_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_send_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a send_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_recv_ctxtChuck Lever
The physical device's favored NUMA node ID is available when allocating a recv_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. This clean up eliminates the hack of destroying recv_ctxts that were not created by the receive CQ thread -- recv_ctxts are now always allocated on a "good" node. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Allocate new transports on device's NUMA nodeChuck Lever
The physical device's NUMA node ID is available when allocating an svc_xprt for an incoming connection. Use that value to ensure the svc_xprt structure is allocated on the NUMA node closest to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12tcp: remove size parameter from tcp_stream_alloc_skb()Eric Dumazet
Now all tcp_stream_alloc_skb() callers pass @size == 0, we can remove this parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tcp: remove some dead codeEric Dumazet
Now all skbs in write queue do not contain any payload in skb->head, we can remove some dead code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tcp: let tcp_send_syn_data() build headless packetsEric Dumazet
tcp_send_syn_data() is the last component in TCP transmit path to put payload in skb->head. Switch it to use page frags, so that we can remove dead code later. This allows to put more payload than previous implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: ethtool: don't require empty header nestsJakub Kicinski
Ethtool currently requires a header nest (which is used to carry the common family options) in all requests including dumps. $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 64 (48) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'request header missing'} $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \ --json '{"header":{}}'; ) [{'combined-count': 1, 'combined-max': 1, 'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}] Requiring the header nest to always be there may seem nice from the consistency perspective, but it's not serving any practical purpose. We shouldn't burden the user like this. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12netlink: support extack in dump ->start()Jakub Kicinski
Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers") added extack support to netlink dumps. It was focused on rtnl and since rtnl does not use ->start(), ->done() callbacks it ignored those. Genetlink on the other hand uses ->start() extensively, for parsing and input validation. Pass the extact in via struct netlink_dump_control and link it to cb for the time of ->start(). Both struct netlink_dump_control and extack itself live on the stack so we can't keep the same extack for the duration of the dump. This means that the extack visible in ->start() and each ->dump() callbacks will be different. Corner cases like reporting a warning message in DONE across dump calls are still not supported. We could put the extack (for dumps) in the socket struct, but layering makes it slightly awkward (extack pointer is decided before the DO / DUMP split). The genetlink dump error extacks are now surfaced: $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 64 (48) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'request header missing'} Previously extack was missing: $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 36 (20) nl_flags = 0x100 nl_type = 2 error: -22 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12af_unix: Kconfig: make CONFIG_UNIX boolAlexander Mikhalitsyn
Let's make CONFIG_UNIX a bool instead of a tristate. We've decided to do that during discussion about SCM_PIDFD patchset [1]. [1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/ Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: core: add getsockopt SO_PEERPIDFDAlexander Mikhalitsyn
Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd. This thing is direct analog of SO_PEERCRED which allows to get plain PID. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Stanislav Fomichev <sdf@google.com> Cc: bpf@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Tested-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12scm: add SO_PASSPIDFD and SCM_PIDFDAlexander Mikhalitsyn
Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, but it contains pidfd instead of plain pid, which allows programmers not to care about PID reuse problem. We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules. Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: openvswitch: add support for l4 symmetric hashingAaron Conole
Since its introduction, the ovs module execute_hash action allowed hash algorithms other than the skb->l4_hash to be used. However, additional hash algorithms were not implemented. This means flows requiring different hash distributions weren't able to use the kernel datapath. Now, introduce support for symmetric hashing algorithm as an alternative hash supported by the ovs module using the flow dissector. Output of flow using l4_sym hash: recirc_id(0),in_port(3),eth(),eth_type(0x0800), ipv4(dst=64.0.0.0/192.0.0.0,proto=6,frag=no), packets:30473425, bytes:45902883702, used:0.000s, flags:SP., actions:hash(sym_l4(0)),recirc(0xd) Some performance testing with no GRO/GSO, two veths, single flow: hash(l4(0)): 4.35 GBits/s hash(l4_sym(0)): 4.24 GBits/s Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net/sched: taprio: report class offload stats per TXQ, not per TCVladimir Oltean
The taprio Qdisc creates child classes per netdev TX queue, but taprio_dump_class_stats() currently reports offload statistics per traffic class. Traffic classes are groups of TXQs sharing the same dequeue priority, so this is incorrect and we shouldn't be bundling up the TXQ stats when reporting them, as we currently do in enetc. Modify the API from taprio to drivers such that they report TXQ offload stats and not TC offload stats. There is no change in the UAPI or in the global Qdisc stats. Fixes: 6c1adb650c8d ("net/sched: taprio: add netlink reporting for offload statistics counters") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12xfrm: Use xfrm_state selector for BEET inputHerbert Xu
For BEET the inner address and therefore family is stored in the xfrm_state selector. Use that when decapsulating an input packet instead of incorrectly relying on a non-existent tunnel protocol. Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path") Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-12sctp: fix an error code in sctp_sf_eat_auth()Dan Carpenter
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition values and returning a kernel error code will cause issues in the caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12sctp: handle invalid error codes without calling BUG()Dan Carpenter
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition values but if the call to sctp_ulpevent_make_authkey() fails, it returns -ENOMEM. This results in calling BUG() inside the sctp_side_effects() function. Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE() instead. This code predates git. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net/sched: act_pedit: Use kmemdup() to replace kmalloc + memcpyJiapeng Chong
./net/sched/act_pedit.c:245:21-28: WARNING opportunity for kmemdup. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5478 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12wifi: mac80211: fragment per STA profile correctlyBenjamin Berg
When fragmenting the ML per STA profile, the element ID should be IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT. Change the helper function to take the to be used element ID and pass the appropriate value for each of the fragmentation levels. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-11NFSD: Hoist rq_vec preparation into nfsd_read() [step two]Chuck Lever
Now that the preparation of an rq_vec has been removed from the generic read path, nfsd_splice_read() no longer needs to reset rq_next_page. nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I can ascertain, resetting rq_next_page for NFSv4 splice reads is unnecessary because rq_next_page is already set correctly. Moreover, resetting it might even be incorrect if previous operations in the COMPOUND have already consumed at least a page of the send buffer. I would expect that the result would be encoding the READ payload over previously-encoded results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-10Merge tag 'nf-23-06-08' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-06-08 Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Add commit and abort set operation to pipapo set abort path. 2) Bail out immediately in case of ENOMEM in nfnetlink batch. 3) Incorrect error path handling when creating a new rule leads to dangling pointer in set transaction list. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10netlabel: fix shift wrapping bug in netlbl_catmap_setlong()Dmitry Mastykin
There is a shift wrapping bug in this code on 32-bit architectures. NETLBL_CATMAP_MAPTYPE is u64, bitmap is unsigned long. Every second 32-bit word of catmap becomes corrupted. Signed-off-by: Dmitry Mastykin <dmastykin@astralinux.ru> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10net: move gso declarations and functions to their own filesEric Dumazet
Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10mptcp: unify pm set_flags interfacesGeliang Tang
This patch unifies the three PM set_flags() interfaces: mptcp_pm_nl_set_flags() in mptcp/pm_netlink.c for the in-kernel PM and mptcp_userspace_pm_set_flags() in mptcp/pm_userspace.c for the userspace PM. They'll be switched in the common PM infterface mptcp_pm_set_flags() in mptcp/pm.c based on whether token is NULL or not. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10mptcp: unify pm get_flags_and_ifindex_by_idGeliang Tang
This patch unifies the three PM get_flags_and_ifindex_by_id() interfaces: mptcp_pm_nl_get_flags_and_ifindex_by_id() in mptcp/pm_netlink.c for the in-kernel PM and mptcp_userspace_pm_get_flags_and_ifindex_by_id() in mptcp/pm_userspace.c for the userspace PM. They'll be switched in the common PM infterface mptcp_pm_get_flags_and_ifindex_by_id() in mptcp/pm.c based on whether mptcp_pm_is_userspace() or not. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10mptcp: unify pm get_local_id interfacesGeliang Tang
This patch unifies the three PM get_local_id() interfaces: mptcp_pm_nl_get_local_id() in mptcp/pm_netlink.c for the in-kernel PM and mptcp_userspace_pm_get_local_id() in mptcp/pm_userspace.c for the userspace PM. They'll be switched in the common PM infterface mptcp_pm_get_local_id() in mptcp/pm.c based on whether mptcp_pm_is_userspace() or not. Also put together the declarations of these three functions in protocol.h. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10mptcp: export local_addressGeliang Tang
Rename local_address() with "mptcp_" prefix and export it in protocol.h. This function will be re-used in the common PM code (pm.c) in the following commit. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09Merge tag 'wireless-next-2023-06-09' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.5 The second pull request for v6.5. We have support for three new Realtek chipsets, all from different generations. Shows how active Realtek development is right now, even older generations are being worked on. Note: We merged wireless into wireless-next to avoid complex conflicts between the trees. Major changes: rtl8xxxu - RTL8192FU support rtw89 - RTL8851BE support rtw88 - RTL8723DS support ath11k - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode iwlwifi - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature cfg80211/mac80211 - more Multi-Link Operation (MLO) support such as hardware restart - fixes for a potential work/mutex deadlock and with it beginnings of the previously discussed locking simplifications * tag 'wireless-next-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (162 commits) wifi: rtlwifi: remove misused flag from HAL data wifi: rtlwifi: remove unused dualmac control leftovers wifi: rtlwifi: remove unused timer and related code wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled wifi: brcmfmac: Detect corner error case earlier with log wifi: rtw89: 8852c: update RF radio A/B parameters to R63 wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (3 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (2 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (1 of 3) wifi: rtw89: process regulatory for 6 GHz power type wifi: rtw89: regd: update regulatory map to R64-R40 wifi: rtw89: regd: judge 6 GHz according to chip and BIOS wifi: rtw89: refine clearing supported bands to check 2/5 GHz first wifi: rtw89: 8851b: configure CRASH_TRIGGER feature for 8851B wifi: rtw89: set TX power without precondition during setting channel wifi: rtw89: debug: txpwr table access only valid page according to chip wifi: rtw89: 8851b: enable hw_scan support wifi: cfg80211: move scan done work to wiphy work wifi: cfg80211: move sched scan stop to wiphy work ... ==================== Link: https://lore.kernel.org/r/87bkhohkbg.fsf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09wifi: mac80211: Use active_links instead of valid_links in TxIlan Peer
Fix few places on the Tx path where the valid_links were used instead of active links. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.e24832691fc8.I9ac10dc246d7798a8d26b1a94933df5668df63fc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09wifi: cfg80211: remove links only on APJohannes Berg
Since links are only controlled by userspace via cfg80211 in AP mode, also only remove them from the driver in that case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.ed65b94916fa.I2458c46888284cc5ce30715fe642bc5fc4340c8f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09wifi: mac80211: take lock before setting vif linksBenjamin Berg
ieee80211_vif_set_links requires the sdata->local->mtx lock to be held. Add the appropriate locking around the calls in both the link add and remove handlers. This causes a warning when e.g. ieee80211_link_release_channel is called via ieee80211_link_stop from ieee80211_vif_update_links. Fixes: 0d8c4a3c8688 ("wifi: mac80211: implement add/del interface link callbacks") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.fa0c6597fdad.I83dd70359f6cda30f86df8418d929c2064cf4995@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09wifi: cfg80211: fix link del callback to call correct handlerBenjamin Berg
The wrapper function was incorrectly calling the add handler instead of the del handler. This had no negative side effect as the default handlers are essentially identical. Fixes: f2a0290b2df2 ("wifi: cfg80211: add optional link add/remove callbacks") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.ebd00e000459.Iaff7dc8d1cdecf77f53ea47a0e5080caa36ea02a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09wifi: mac80211: fix link activation settings orderJohannes Berg
In the normal MLME code we always call ieee80211_mgd_set_link_qos_params() before ieee80211_link_info_change_notify() and some drivers, notably iwlwifi, rely on that as they don't do anything (but store the data) in their conf_tx. Fix the order here to be the same as in the normal code paths, so this isn't broken. Fixes: 3d9011029227 ("wifi: mac80211: implement link switching") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230608163202.a2a86bba2f80.Iac97e04827966d22161e63bb6e201b4061e9651b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()Dan Carpenter
The locking was changed recently so now the caller holds the wiphy_lock() lock. Taking the lock inside the reg_wdev_chan_valid() function will lead to a deadlock. Fixes: f7e60032c661 ("wifi: cfg80211: fix locking in regulatory disconnect") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/40c4114a-6cb4-4abf-b013-300b598aba65@moroto.mountain Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09net/sched: cls_u32: Fix reference counter leak leading to overflowLee Jones
In the event of a failure in tcf_change_indev(), u32_set_parms() will immediately return without decrementing the recently incremented reference counter. If this happens enough times, the counter will rollover and the reference freed, leading to a double free which can be used to do 'bad things'. In order to prevent this, move the point of possible failure above the point where the reference counter is incremented. Also save any meaningful return values to be applied to the return data at the appropriate point in time. This issue was caught with KASAN. Fixes: 705c7091262d ("net: sched: cls_u32: no need to call tcf_exts_change for newly allocated struct") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txqZhengchao Shao
As shown in [1], out-of-bounds access occurs in two cases: 1)when the qdisc of the taprio type is used to replace the previously configured taprio, count and offset in tc_to_txq can be set to 0. In this case, the value of *txq in taprio_next_tc_txq() will increases continuously. When the number of accessed queues exceeds the number of queues on the device, out-of-bounds access occurs. 2)When packets are dequeued, taprio can be deleted. In this case, the tc rule of dev is cleared. The count and offset values are also set to 0. In this case, out-of-bounds access is also caused. Now the restriction on the queue number is added. [1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode") Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Tested-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>