summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-08-25wifi: cfg80211: reject connect response with MLO params for WEPVeerendranath Jakkam
MLO connections are not supposed to use WEP security. Reject connect response of MLO connection if WEP security mode is used. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20220730052643.1959111-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: fix use-after-freeJohannes Berg
We've already freed the assoc_data at this point, so need to use another copy of the AP (MLD) address instead. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: use link in TXQ parameter configurationShaul Triebitz
Configure the correct link per the passed parameters. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: cfg80211: add link id to txq paramsShaul Triebitz
The Tx queue parameters are per link, so add the link ID from nl80211 parameters to the API. While at it, lock the wdev when calling into the driver so it (and we) can check the link ID appropriately. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: set link BSSIDShaul Triebitz
For an AP interface, set the link BSSID when the link is initialized. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: cfg80211: get correct AP link chandefShaul Triebitz
When checking for channel regulatory validity, use the AP link chandef (and not mesh's chandef). Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: cfg80211: Update RNR parsing to align with Draft P802.11be_D2.0Ilan Peer
Based on changes in the specification the TBTT information in the RNR can include MLD information, so update the parsing to allow extracting the short SSID information in such a case. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: properly set old_links when removing a linkShaul Triebitz
In ieee80211_sta_remove_link, valid_links is set to the new_links before calling drv_change_sta_links, but is used for the old_links. Fixes: cb71f1d136a6 ("wifi: mac80211: add sta link addition/removal") Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: accept STA changes without link changesJohannes Berg
If there's no link ID, then check that there are no changes to the link, and if so accept them, unless a new link is created. While at it, reject creating a new link without an address. This fixes authorizing an MLD (peer) that has no link 0. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25net: gro: skb_gro_header helper functionRichard Gobert
Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fetch it from the skb when necessary. This leads to the pattern skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow recurring many times throughout GRO code. This patch replaces these patterns with a single inlined function call, improving code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220823071034.GA56142@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25wifi: mac80211: potential NULL dereference in ieee80211_tx_control_port()Dan Carpenter
The ieee80211_lookup_ra_sta() function will sometimes set "sta" to NULL so add this NULL check to prevent an Oops. Fixes: 9dd1953846c7 ("wifi: nl80211/mac80211: clarify link ID in control port TX") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuKcTAyO94YOy0Bu@kili Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()Dan Carpenter
The return type is supposed to be ssize_t, which is signed long, but "r" was declared as unsigned int. This means that on 64 bit systems we return positive values instead of negative error codes. Fixes: 80a3511d70e8 ("cfg80211: add debugfs HT40 allow map") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YutvOQeJm0UjLhwU@kili Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnectedSiddh Raman Pant
When we are not connected to a channel, sending channel "switch" announcement doesn't make any sense. The BSS list is empty in that case. This causes the for loop in cfg80211_get_bss() to be bypassed, so the function returns NULL (check line 1424 of net/wireless/scan.c), causing the WARN_ON() in ieee80211_ibss_csa_beacon() to get triggered (check line 500 of net/mac80211/ibss.c), which was consequently reported on the syzkaller dashboard. Thus, check if we have an existing connection before generating the CSA beacon in ieee80211_ibss_finish_csa(). Cc: stable@vger.kernel.org Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode") Link: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6 Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com Signed-off-by: Siddh Raman Pant <code@siddh.me> Tested-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20220814151512.9985-1-code@siddh.me Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: fix possible leak in ieee80211_tx_control_port()Yang Yingliang
Add missing dev_kfree_skb() in an error path in ieee80211_tx_control_port() to avoid a memory leak. Fixes: dd820ed6336a ("wifi: mac80211: return error from control port TX for drops") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220818043349.4168835-1-yangyingliang@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: always free sta in __sta_info_alloc in case of errorLorenzo Bianconi
Free sta pointer in __sta_info_alloc routine if sta_info_alloc_link() fails. Fixes: 246b39e4a1ba5 ("wifi: mac80211: refactor some sta_info link handling") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/a3d079208684cddbc25289f7f7e0fed795b0cad4.1661260857.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-25wifi: mac80211: Fix UAF in ieee80211_scan_rx()Siddh Raman Pant
ieee80211_scan_rx() tries to access scan_req->flags after a null check, but a UAF is observed when the scan is completed and __ieee80211_scan_completed() executes, which then calls cfg80211_scan_done() leading to the freeing of scan_req. Since scan_req is rcu_dereference()'d, prevent the racing in __ieee80211_scan_completed() by ensuring that from mac80211's POV it is no longer accessed from an RCU read critical section before we call cfg80211_scan_done(). Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Siddh Raman Pant <code@siddh.me> Link: https://lore.kernel.org/r/20220819200340.34826-1-code@siddh.me Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-24net: Add a bhash2 table hashed by port and addressJoanne Koong
The current bind hashtable (bhash) is hashed by port only. In the socket bind path, we have to check for bind conflicts by traversing the specified port's inet_bind_bucket while holding the hashbucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()). In instances where there are tons of sockets hashed to the same port at different addresses, the bind conflict check is time-intensive and can cause softirq cpu lockups, as well as stops new tcp connections since __inet_inherit_port() also contests for the spinlock. This patch adds a second bind table, bhash2, that hashes by port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6). Searching the bhash2 table leads to significantly faster conflict resolution and less time holding the hashbucket spinlock. Please note a few things: * There can be the case where the a socket's address changes after it has been bound. There are two cases where this happens: 1) The case where there is a bind() call on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will assign the socket an address when it handles the connect() 2) In inet_sk_reselect_saddr(), which is called when rebuilding the sk header and a few pre-conditions are met (eg rerouting fails). In these two cases, we need to update the bhash2 table by removing the entry for the old address, and add a new entry reflecting the updated address. * The bhash2 table must have its own lock, even though concurrent accesses on the same port are protected by the bhash lock. Bhash2 must have its own lock to protect against cases where sockets on different ports hash to different bhash hashbuckets but to the same bhash2 hashbucket. This brings up a few stipulations: 1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock will always be acquired after the bhash lock and released before the bhash lock is released. 2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always acquired+released before another bhash2 lock is acquired+released. * The bhash table cannot be superseded by the bhash2 table because for bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket bound to that port must be checked for a potential conflict. The bhash table is the only source of port->socket associations. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix crash with malformed ebtables blob which do not provide all entry points, from Florian Westphal. 2) Fix possible TCP connection clogging up with default 5-days timeout in conntrack, from Florian. 3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian. 4) Do not allow to update implicit chains. 5) Make table handle allocation per-netns to fix data race. 6) Do not truncated payload length and offset, and checksum offset. Instead report EINVAl. 7) Enable chain stats update via static key iff no error occurs. 8) Restrict osf expression to ip, ip6 and inet families. 9) Restrict tunnel expression to netdev family. 10) Fix crash when trying to bind again an already bound chain. 11) Flowtable garbage collector might leave behind pending work to delete entries. This patch comes with a previous preparation patch as dependency. 12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases netfilter: flowtable: fix stuck flows on cleanup due to pending work netfilter: flowtable: add function to invoke garbage collection immediately netfilter: nf_tables: disallow binding to already bound chain netfilter: nft_tunnel: restrict it to netdev family netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nf_tables: do not leave chain stats enabled on error netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nft_payload: report ERANGE for too long offset and length netfilter: nf_tables: make table handle allocation per-netns friendly netfilter: nf_tables: disallow updates of implicit chain netfilter: nft_tproxy: restrict to prerouting hook netfilter: conntrack: work around exceeded receive window netfilter: ebtables: reject blobs that don't provide all entry points ==================== Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24net: Fix a data-race around sysctl_somaxconn.Kuniyuki Iwashima
While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_unregister_timeout_secs.Kuniyuki Iwashima
While reading netdev_unregister_timeout_secs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_devconf_inherit_init_net.Kuniyuki Iwashima
While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_budget_usecs.Kuniyuki Iwashima
While reading netdev_budget_usecs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_max_skb_frags.Kuniyuki Iwashima
While reading sysctl_max_skb_frags, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_budget.Kuniyuki Iwashima
While reading netdev_budget, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_net_busy_read.Kuniyuki Iwashima
While reading sysctl_net_busy_read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_tstamp_allow_data.Kuniyuki Iwashima
While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_optmem_max.Kuniyuki Iwashima
While reading sysctl_optmem_max, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around netdev_tstamp_prequeue.Kuniyuki Iwashima
While reading netdev_tstamp_prequeue, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around netdev_max_backlog.Kuniyuki Iwashima
While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. While at it, we remove the unnecessary spaces in the doc. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around weight_p and dev_weight_[rt]x_bias.Kuniyuki Iwashima
While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight(). Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_[rw]mem_(max|default).Kuniyuki Iwashima
While reading sysctl_[rw]mem_(max|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net/core/skbuff: Check the return value of skb_copy_bits()lily
skb_copy_bits() could fail, which requires a check on the return value. Signed-off-by: Li Zhong <floridsleeves@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24tcp: annotate data-race around tcp_md5sig_pool_populatedEric Dumazet
tcp_md5sig_pool_populated can be read while another thread changes its value. The race has no consequence because allocations are protected with tcp_md5sig_mutex. This patch adds READ_ONCE() and WRITE_ONCE() to document the race and silence KCSAN. Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: neigh: don't call kfree_skb() under spin_lock_irqsave()Yang Yingliang
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: skb: prevent the split of kfree_skb_reason() by gccMenglong Dong
Sometimes, gcc will optimize the function by spliting it to two or more functions. In this case, kfree_skb_reason() is splited to kfree_skb_reason and kfree_skb_reason.part.0. However, the function/tracepoint trace_kfree_skb() in it needs the return address of kfree_skb_reason(). This split makes the call chains becomes: kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() which makes the return address that passed to trace_kfree_skb() be kfree_skb(). Therefore, introduce '__fix_address', which is the combination of '__noclone' and 'noinline', and apply it to kfree_skb_reason() to prevent to from being splited or made inline. (Is it better to simply apply '__noclone oninline' to kfree_skb_reason? I'm thinking maybe other functions have the same problems) Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks it is likely return true and splits kfree_skb_reason(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increasesEric Dumazet
Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered. I found this issue while investigating a probable kernel issue causing flakes in tools/testing/selftests/net/ip_defrag.sh In particular, these sysctl changes were ignored: ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1 ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1 This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net") Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: fix stuck flows on cleanup due to pending workPablo Neira Ayuso
To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: add function to invoke garbage collection immediatelyPablo Neira Ayuso
Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow binding to already bound chainPablo Neira Ayuso
Update nft_data_init() to report EINVAL if chain is already bound. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_tunnel: restrict it to netdev familyPablo Neira Ayuso
Only allow to use this expression from NFPROTO_NETDEV family. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet familiesPablo Neira Ayuso
As it was originally intended, restrict extension to supported families. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: do not leave chain stats enabled on errorPablo Neira Ayuso
Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: do not truncate csum_offset and csum_typePablo Neira Ayuso
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: report ERANGE for too long offset and lengthPablo Neira Ayuso
Instead of offset and length are truncation to u8, report ERANGE. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: make table handle allocation per-netns friendlyPablo Neira Ayuso
mutex is per-netns, move table_netns to the pernet area. *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow updates of implicit chainPablo Neira Ayuso
Updates on existing implicit chain make no sense, disallow this. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-23bpf: Use cgroup_{common,current}_func_proto in more hooksStanislav Fomichev
The following hooks are per-cgroup hooks but they are not using cgroup_{common,current}_func_proto, fix it: * BPF_PROG_TYPE_CGROUP_SKB (cg_skb) * BPF_PROG_TYPE_CGROUP_SOCK_ADDR (cg_sock_addr) * BPF_PROG_TYPE_CGROUP_SOCK (cg_sock) * BPF_PROG_TYPE_LSM+BPF_LSM_CGROUP Also: * move common func_proto's into cgroup func_proto handlers * make sure bpf_{g,s}et_retval are not accessible from recvmsg, getpeername and getsockname (return/errno is ignored in these places) * as a side effect, expose get_current_pid_tgid, get_current_comm_proto, get_current_ancestor_cgroup_id, get_cgroup_classid to more cgroup hooks Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf, test_run: Propagate bpf_flow_dissect's retval to user's ↵Shmulik Ladkani
bpf_attr.test.retval Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK was set into 'bpf_attr.test.retval'. Augment this, so users can check the actual return code of the dissector program under test. Existing prog_tests/flow_dissector*.c tests were correspondingly changed to check against each test's expected retval. Also, tests' resulting 'flow_keys' are verified only in case the expected retval is BPF_OK. This allows adding new tests that expect non BPF_OK. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
2022-08-23bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progsShmulik Ladkani
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely replaces the flow-dissector logic with custom dissection logic. This forces implementors to write programs that handle dissection for any flows expected in the namespace. It makes sense for flow-dissector BPF programs to just augment the dissector with custom logic (e.g. dissecting certain flows or custom protocols), while enjoying the broad capabilities of the standard dissector for any other traffic. Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF programs may return this to indicate no dissection was made, and fallback to the standard dissector is requested. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com