Age | Commit message (Collapse) | Author |
|
There is no need to store the result of the addition back to variable count
after the addition. The store is redundant, replace += with just +
Cleans up clang scan build warning:
warning: Although the value stored to 'count' is used in the enclosing
expression, the value is never actually read from 'count'
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220628145406.183527-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As reported by syzbot, we should not use rcu_dereference()
when rcu_read_lock() is not held.
WARNING: suspicious RCU usage
5.19.0-rc2-syzkaller #0 Not tainted
net/ipv6/addrconf.c:5175 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz-executor326/3617:
#0: ffffffff8d5848e8 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xae/0xc20 net/netlink/af_netlink.c:2223
stack backtrace:
CPU: 0 PID: 3617 Comm: syz-executor326 Not tainted 5.19.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
in6_dump_addrs+0x12d1/0x1790 net/ipv6/addrconf.c:5175
inet6_dump_addr+0x9c1/0xb50 net/ipv6/addrconf.c:5300
netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275
__netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380
netlink_dump_start include/linux/netlink.h:245 [inline]
rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
___sys_sendmsg+0xf3/0x170 net/socket.c:2546
__sys_sendmsg net/socket.c:2575 [inline]
__do_sys_sendmsg net/socket.c:2584 [inline]
__se_sys_sendmsg net/socket.c:2582 [inline]
__x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20220628121248.858695-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for pause stats
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Restore set counter when one of the CPU loses race to add elements
to sets.
2) After NF_STOLEN, skb might be there no more, update nftables trace
infra to avoid access to skb in this case. From Florian Westphal.
3) nftables bridge might register a prerouting hook with zero priority,
br_netfilter incorrectly skips it. Also from Florian.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: br_netfilter: do not skip all hooks with 0 priority
netfilter: nf_tables: avoid skb access on nf_stolen
netfilter: nft_dynset: restore set element counter when failing to update
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I don't see how to make it nice without introducing btf id lists
for the hooks where these helpers are allowed. Some LSM hooks
work on the locked sockets, some are triggering early and
don't grab any locks, so have two lists for now:
1. LSM hooks which trigger under socket lock - minority of the hooks,
but ideal case for us, we can expose existing BTF-based helpers
2. LSM hooks which trigger without socket lock, but they trigger
early in the socket creation path where it should be safe to
do setsockopt without any locks
3. The rest are prohibited. I'm thinking that this use-case might
be a good gateway to sleeping lsm cgroup hooks in the future.
We can either expose lock/unlock operations (and add tracking
to the verifier) or have another set of bpf_setsockopt
wrapper that grab the locks and might sleep.
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220628174314.1216643-7-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Free sk in case tipc_sk_insert() fails.
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of 4way handshake offload, cfg80211_port_authorized
enables driver to indicate successful 4way handshake to cfg80211 layer.
Currently this path of port authorization is restricted to
interface type NL80211_IFTYPE_STATION. This patch extends
the use of port authorization API for P2P client as well.
Signed-off-by: Vinayak Yadawad <vinayak.yadawad@broadcom.com>
Link: https://lore.kernel.org/r/ef25cb49fcb921df2e5d99e574f65e8a009cc52c.1655905440.git.vinayak.yadawad@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When a vif is being removed and sdata->bss is cleared, __ieee80211_wake_txqs
can still be called on it, which crashes as soon as sdata->bss is being
dereferenced.
To fix this properly, check for SDATA_STATE_RUNNING before waking queues,
and take the fq lock when setting it (to ensure that __ieee80211_wake_txqs
observes the change when running on a different CPU)
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20220531190824.60019-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add a missing skb_shared check into 802.3 path to prevent potential
use-after-free from happening. This also uses skb_share_check()
instead of open-coding in tx path.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Link: https://lore.kernel.org/r/e7a73aaf7742b17e43421c56625646dfc5c4d2cb.1653571902.git.ryder.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Introduce the capability to specify gfp_t parameter to
ieeee80211_obss_color_collision_notify routine since it runs in
interrupt context in ieee80211_rx_check_bss_color_collision().
Fixes: 6d945a33f2b0a ("mac80211: introduce BSS color collision detection")
Co-developed-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/02c990fb3fbd929c8548a656477d20d6c0427a13.1655419135.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
codespell finds some spelling mistake in mptcp:
net/mptcp/subflow.c:1624: interaces ==> interfaces
net/mptcp/pm_netlink.c:1130: regarless ==> regardless
Just fix them.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220627121626.1595732-1-imagedong@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, "ss -K -ta ..." does not support TIME_WAIT sockets.
Issue has been raised at least two times in the past [1] [2]
it is time to fix it.
[1] https://lore.kernel.org/netdev/ba65f579-4e69-ae0d-4770-bc6234beb428@gmail.com/
[2] https://lore.kernel.org/netdev/CANn89i+R9RgmD=AQ4vX1Vb_SQAj4c3fi7-ZtQz-inYY4Sq4CMQ@mail.gmail.com/T/
While we are at it, use inet_sk_state_load() while tcp_abort()
does not hold a lock on the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20220627121038.226500-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As of commit 5801f064e351 ("net: ipv6: unexport __init-annotated seg6_hmac_init()"),
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
This remove the EXPORT_SYMBOL to fix modpost warning:
WARNING: modpost: vmlinux.o(___ksymtab+seg6_hmac_net_init+0x0): Section mismatch in reference from the variable __ksymtab_seg6_hmac_net_init to the function .init.text:seg6_hmac_net_init()
The symbol seg6_hmac_net_init is exported and annotated __init
Fix this by removing the __init annotation of seg6_hmac_net_init or drop the export.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20220628033134.21088-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When kcalloc fails, ipip6_tunnel_get_prl() should return -ENOMEM.
Move the position of label "out" to return correctly.
Addresses-Coverity: ("Unused value")
Fixes: 300aaeeaab5f ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.")
Signed-off-by: katrinzhou <katrinzhou@tencent.com>
Reviewed-by: Eric Dumazet<edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220628035030.1039171-1-zys.zljxml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While setting up init_net's sysctl table, we need not duplicate the
global table and can use it directly as ipv4_sysctl_init_net() does.
Unlike IPv4, AF_UNIX does not have a huge sysctl table for now, so it
cannot be a problem, but this patch makes code consistent.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220627233627.51646-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the listener socket owning the relevant request is closed,
it frees the unaccepted subflows and that causes later deletion
of the paired MPTCP sockets.
The mptcp socket's worker can run in the time interval between such delete
operations. When that happens, any access to msk->first will cause an UaF
access, as the subflow cleanup did not cleared such field in the mptcp
socket.
Address the issue explicitly traversing the listener socket accept
queue at close time and performing the needed cleanup on the pending
msk.
Note that the locking is a bit tricky, as we need to acquire the msk
socket lock, while still owning the subflow socket one.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the MPTCP receive path reach a non fatal fall-back condition, e.g.
when the MPC sockets must fall-back to TCP, the existing code is a little
self-inconsistent: it reports that new data is available - return true -
but sets the MPC flag to the opposite value.
As the consequence read operations in some exceptional scenario may block
unexpectedly.
Address the issue setting the correct MPC read status. Additionally avoid
some code duplication in the fatal fall-back scenario.
Fixes: 9c81be0dbc89 ("mptcp: add MP_FAIL response support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the MPTCP socket shutdown happens before a fallback
to TCP, and all the pending data have been already spooled,
we never close the TCP connection.
Address the issue explicitly checking for critical condition
at fallback time.
Fixes: 1e39e5a32ad7 ("mptcp: infinite mapping sending")
Fixes: 0348c690ed37 ("mptcp: add the fallback check")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mptcp_mp_fail_no_response shouldn't be invoked on each worker run, it
should be invoked only when MP_FAIL response timeout occurs.
This patch refactors the MP_FAIL response logic.
It leverages the fact that only the MPC/first subflow can gracefully
fail to avoid unneeded subflows traversal: the failing subflow can
be only msk->first.
A new 'fail_tout' field is added to the subflow context to record the
MP_FAIL response timeout and use such field to reliably share the
timeout timer between the MP_FAIL event and the MPTCP socket close
timeout.
Finally, a new ack is generated to send out MP_FAIL notification as soon
as we hit the relevant condition, instead of waiting a possibly unbound
time for the next data packet.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/281
Fixes: d9fb797046c5 ("mptcp: Do not traverse the subflow connection list without lock")
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This allow moving a couple of conditional out of the fast path,
making the code more easy to follow and will simplify the next
patch.
Fixes: ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current accounting for MP_FAIL and FASTCLOSE is not very
accurate: both can be increased even when the related option is
not really sent. Move the accounting into the correct place.
Fixes: eb7f33654dc1 ("mptcp: add the mibs for MP_FAIL")
Fixes: 1e75629cb964 ("mptcp: add the mibs for MP_FASTCLOSE")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a XSK pool gets mapped, xp_check_dma_contiguity() adds bit 0x1
to pages' DMA addresses that go in ascending order and at 4K stride.
The problem is that the bit does not get cleared before doing unmap.
As a result, a lot of warnings from iommu_dma_unmap_page() are seen
in dmesg, which indicates that lookups by iommu_iova_to_phys() fail.
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220628091848.534803-1-ivan.malov@oktetlabs.ru
|
|
The addrconf_verify_rtnl() function uses a big if/elseif/elseif/... block
to categorize each address by what type of attention it needs. An
about-to-expire (RFC 4941) temporary address is one such category, but the
previous elseif branch catches addresses that have already run out their
prefered_lft. This means that if addrconf_verify_rtnl() fails to run in
the necessary time window (i.e. REGEN_ADVANCE time units before the end of
the prefered_lft), the temporary address will never be regenerated, and no
temporary addresses will be available until each one's valid_lft runs out
and manage_tempaddrs() begins anew.
Fix this by moving the entire temporary address regeneration case out of
that block. That block is supposed to implement the "destructive" part of
an address's lifecycle, and regenerating a fresh temporary address is not,
semantically speaking, actually tied to any particular lifecycle stage.
The age test is also changed from `age >= prefered_lft - regen_advance`
to `age + regen_advance >= prefered_lft` instead, to ensure no underflow
occurs if the system administrator increases the regen_advance to a value
greater than the already-set prefered_lft.
Note that this does not fix the problem of addrconf_verify_rtnl() sometimes
not running in time, resulting in the race condition described in RFC 4941
section 3.4 - it only ensures that the address is regenerated. Fixing THAT
problem may require either using jiffies instead of seconds for all time
arithmetic here, or always rounding up when regen_advance is converted to
seconds.
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Link: https://lore.kernel.org/r/20220623181103.7033-1-CFSworks@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot reproduced the bug ...
BUG: sleeping function called from invalid context at kernel/workqueue.c:3010
... with the following stack trace fragment ...
start_flush_work kernel/workqueue.c:3010 [inline]
__flush_work+0x109/0xb10 kernel/workqueue.c:3074
__cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3162
sk_psock_stop+0x4cb/0x630 net/core/skmsg.c:802
sock_map_destroy+0x333/0x760 net/core/sock_map.c:1581
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1130
__tcp_close+0xd5b/0x12b0 net/ipv4/tcp.c:2897
tcp_close+0x29/0xc0 net/ipv4/tcp.c:2909
... introduced by d8616ee2affc. Do a quick trace of the code path and the
bug is obvious:
inet_csk_destroy_sock(sk)
sk_prot->destroy(sk); <--- sock_map_destroy
sk_psock_stop(, true); <--- true so cancel workqueue
cancel_work_sync() <--- splat, because *_bh_disable()
We can not call cancel_work_sync() from inside destroy path. So mark
the sk_psock_stop call to skip this cancel_work_sync(). This will avoid
the BUG, but means we may run sk_psock_backlog after or during the
destroy op. We zapped the ingress_skb queue in sk_psock_stop (safe to
do with local_bh_disable) so its empty and the sk_psock_backlog work
item will not find any pkts to process here. However, because we are
not going to wait for it or clear its ->state its possible it kicks off
or is already running. This should be 'safe' up until psock drops its
refcnt to psock->sk. The sock_put() that drops this reference is only
done at psock destroy time from sk_psock_destroy(). This is done through
workqueue when sk_psock_drop() is called on psock refnt reaches 0.
And importantly sk_psock_destroy() does a cancel_work_sync(). So trivial
fix works.
I've had hit or miss luck reproducing this caught it once or twice with
the provided reproducer when running with many runners. However, syzkaller
is very good at reproducing so relying on syzkaller to verify fix.
Fixes: d8616ee2affc ("bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues")
Reported-by: syzbot+140186ceba0c496183bc@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/bpf/20220628035803.317876-1-john.fastabend@gmail.com
|
|
When routes corresponding to addresses are restored by
fixup_permanent_addr(), the dst_nopolicy parameter was not set.
The typical use case is a user that configures an address on a down
interface and then put this interface up.
Let's take care of this flag in addrconf_f6i_alloc(), so that every callers
benefit ont it.
CC: stable@kernel.org
CC: David Forster <dforster@brocade.com>
Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Reported-by: Siwar Zitouni <siwar.zitouni@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220623120015.32640-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If during an action flush operation one of the actions is still being
referenced, the flush operation is aborted and the kernel returns to
user space with an error. However, if the kernel was able to flush, for
example, 3 actions and failed on the fourth, the kernel will not notify
user space that it deleted 3 actions before failing.
This patch fixes that behaviour by notifying user space of how many
actions were deleted before flush failed and by setting extack with a
message describing what happened.
Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
When br_netfilter module is loaded, skbs may be diverted to the
ipv4/ipv6 hooks, just like as if we were routing.
Unfortunately, bridge filter hooks with priority 0 may be skipped
in this case.
Example:
1. an nftables bridge ruleset is loaded, with a prerouting
hook that has priority 0.
2. interface is added to the bridge.
3. no tcp packet is ever seen by the bridge prerouting hook.
4. flush the ruleset
5. load the bridge ruleset again.
6. tcp packets are processed as expected.
After 1) the only registered hook is the bridge prerouting hook, but its
not called yet because the bridge hasn't been brought up yet.
After 2), hook order is:
0 br_nf_pre_routing // br_netfilter internal hook
0 chain bridge f prerouting // nftables bridge ruleset
The packet is diverted to br_nf_pre_routing.
If call-iptables is off, the nftables bridge ruleset is called as expected.
But if its enabled, br_nf_hook_thresh() will skip it because it assumes
that all 0-priority hooks had been called previously in bridge context.
To avoid this, check for the br_nf_pre_routing hook itself, we need to
resume directly after it, even if this hook has a priority of 0.
Unfortunately, this still results in different packet flow.
With this fix, the eval order after in 3) is:
1. br_nf_pre_routing
2. ip(6)tables (if enabled)
3. nftables bridge
but after 5 its the much saner:
1. nftables bridge
2. br_nf_pre_routing
3. ip(6)tables (if enabled)
Unfortunately I don't see a solution here:
It would be possible to move br_nf_pre_routing to a higher priority
so that it will be called later in the pipeline, but this also impacts
ebtables evaluation order, and would still result in this very ordering
problem for all nftables-bridge hooks with the same priority as the
br_nf_pre_routing one.
Searching back through the git history I don't think this has
ever behaved in any other way, hence, no fixes-tag.
Reported-by: Radim Hrazdil <rhrazdil@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When verdict is NF_STOLEN, the skb might have been freed.
When tracing is enabled, this can result in a use-after-free:
1. access to skb->nf_trace
2. access to skb->mark
3. computation of trace id
4. dump of packet payload
To avoid 1, keep a cached copy of skb->nf_trace in the
trace state struct.
Refresh this copy whenever verdict is != STOLEN.
Avoid 2 by skipping skb->mark access if verdict is STOLEN.
3 is avoided by precomputing the trace id.
Only dump the packet when verdict is not "STOLEN".
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch fixes a race condition.
nft_rhash_update() might fail for two reasons:
- Element already exists in the hashtable.
- Another packet won race to insert an entry in the hashtable.
In both cases, new() has already bumped the counter via atomic_add_unless(),
therefore, decrement the set element counter.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
vif_dev_read() should be used from RCU protected sections only.
ipmr_rtm_dumplink() is holding RTNL, so the data structures
can not be changed.
syzbot reported:
net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz-executor.4/3068:
stack backtrace:
CPU: 1 PID: 3068 Comm: syz-executor.4 Not tainted 5.19.0-rc3-syzkaller-00565-g5d04b0b634bb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
vif_dev_read net/ipv4/ipmr.c:84 [inline]
vif_dev_read net/ipv4/ipmr.c:82 [inline]
ipmr_fill_vif net/ipv4/ipmr.c:2756 [inline]
ipmr_rtm_dumplink+0x1343/0x18c0 net/ipv4/ipmr.c:2866
netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275
__netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380
netlink_dump_start include/linux/netlink.h:245 [inline]
rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x334/0x810 net/socket.c:2489
___sys_sendmsg+0xf3/0x170 net/socket.c:2543
__sys_sendmmsg+0x195/0x470 net/socket.c:2629
__do_sys_sendmmsg net/socket.c:2658 [inline]
__se_sys_sendmmsg net/socket.c:2655 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2655
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fefd8a89109
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fefd9ca6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007fefd8b9bf60 RCX: 00007fefd8a89109
RDX: 0000000004924b68 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007fefd8ae305d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc346febaf R14: 00007fefd9ca6300 R15: 0000000000022000
</TASK>
Fixes: ebc3197963fc ("ipmr: add rcu protection over (struct vif_device)->dev")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Shuang Li reported a NULL pointer dereference crash:
[] BUG: kernel NULL pointer dereference, address: 0000000000000068
[] RIP: 0010:tipc_link_is_up+0x5/0x10 [tipc]
[] Call Trace:
[] <IRQ>
[] tipc_bcast_rcv+0xa2/0x190 [tipc]
[] tipc_node_bc_rcv+0x8b/0x200 [tipc]
[] tipc_rcv+0x3af/0x5b0 [tipc]
[] tipc_udp_recv+0xc7/0x1e0 [tipc]
It was caused by the 'l' passed into tipc_bcast_rcv() is NULL. When it
creates a node in tipc_node_check_dest(), after inserting the new node
into hashtable in tipc_node_create(), it creates the bc link. However,
there is a gap between this insert and bc link creation, a bc packet
may come in and get the node from the hashtable then try to dereference
its bc link, which is NULL.
This patch is to fix it by moving the bc link creation before inserting
into the hashtable.
Note that for a preliminary node becoming "real", the bc link creation
should also be called before it's rehashed, as we don't create it for
preliminary nodes.
Fixes: 4cbf8ac2fe5a ("tipc: enable creating a "preliminary" node")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently added debug in commit f9aefd6b2aa3 ("net: warn if mac header
was not set") caught a bug in skb_tunnel_check_pmtu(), as shown
in this syzbot report [1].
In ndo_start_xmit() paths, there is really no need to use skb->mac_header,
because skb->data is supposed to point at it.
[1] WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_mac_header_len include/linux/skbuff.h:2784 [inline]
WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
Modules linked in:
CPU: 1 PID: 8604 Comm: syz-executor.3 Not tainted 5.19.0-rc2-syzkaller-00443-g8720bd951b8e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_mac_header_len include/linux/skbuff.h:2784 [inline]
RIP: 0010:skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
Code: 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 80 3c 02 00 0f 84 b9 fe ff ff 4c 89 ff e8 7c 0f d7 f9 e9 ac fe ff ff e8 c2 13 8a f9 <0f> 0b e9 28 fc ff ff e8 b6 13 8a f9 48 8b 54 24 70 48 b8 00 00 00
RSP: 0018:ffffc90002e4f520 EFLAGS: 00010212
RAX: 0000000000000324 RBX: ffff88804d5fd500 RCX: ffffc90005b52000
RDX: 0000000000040000 RSI: ffffffff87f05e3e RDI: 0000000000000003
RBP: ffffc90002e4f650 R08: 0000000000000003 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000000 R12: 000000000000ffff
R13: 0000000000000000 R14: 000000000000ffcd R15: 000000000000001f
FS: 00007f3babba9700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 0000000075319000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
geneve_xmit_skb drivers/net/geneve.c:927 [inline]
geneve_xmit+0xcf8/0x35d0 drivers/net/geneve.c:1107
__netdev_start_xmit include/linux/netdevice.h:4805 [inline]
netdev_start_xmit include/linux/netdevice.h:4819 [inline]
__dev_direct_xmit+0x500/0x730 net/core/dev.c:4309
dev_direct_xmit include/linux/netdevice.h:3007 [inline]
packet_direct_xmit+0x1b8/0x2c0 net/packet/af_packet.c:282
packet_snd net/packet/af_packet.c:3073 [inline]
packet_sendmsg+0x21f4/0x55d0 net/packet/af_packet.c:3104
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2489
___sys_sendmsg+0xf3/0x170 net/socket.c:2543
__sys_sendmsg net/socket.c:2572 [inline]
__do_sys_sendmsg net/socket.c:2581 [inline]
__se_sys_sendmsg net/socket.c:2579 [inline]
__x64_sys_sendmsg+0x132/0x220 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f3baaa89109
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3babba9168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3baab9bf60 RCX: 00007f3baaa89109
RDX: 0000000000000000 RSI: 0000000020000a00 RDI: 0000000000000003
RBP: 00007f3baaae305d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe74f2543f R14: 00007f3babba9300 R15: 0000000000022000
</TASK>
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-06-25
this is a pull request of 22 patches for net-next/master.
The first 2 patches target the xilinx driver. Srinivas Neeli's patch
adds Transmitter Delay Compensation (TDC) support, a patch by me fixes
a typo.
The next patch is by me and fixes a typo in the m_can driver.
Another patch by me allows the configuration of fixed bit rates
without need for do_set_bittiming callback.
The following 7 patches are by Vincent Mailhol and refactor the
can-dev module and Kbuild, de-inline the can_dropped_invalid_skb()
function, which has grown over the time, and drop outgoing skbs if the
controller is in listen only mode.
Max Staudt's patch fixes a reference in the networking/can.rst
documentation.
Vincent Mailhol provides 2 patches with cleanups for the etas_es58x
driver.
Conor Dooley adds bindings for the mpfs-can to the PolarFire SoC dtsi.
Another patch by me allows the configuration of fixed data bit rates
without need for do_set_data_bittiming callback.
The last 5 patches are by Frank Jungclaus. They prepare the esd_usb
driver to add support for the the CAN-USB/3 device in a later series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The switch that is present on the Renesas RZ/N1 SoC uses a specific
VLAN value followed by 6 bytes which contains forwarding configuration.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to allow dsa drivers to specify the .get_rmon_stats()
operation.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some drivers might report that they are unable to bridge ports by
returning -EOPNOTSUPP, but still wants to override extack message.
In order to do so, in dsa_slave_changeupper(), if port_bridge_join()
returns -EOPNOTSUPP, check if extack message is set and if so, do not
override it.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I accidentally broke IPv4 traceroute, by swapping iph->saddr
and iph->daddr.
Probably because raw_icmp_error() and raw_v4_input()
use different order for iph->saddr and iph->daddr.
Fixes: ba44f8182ec2 ("raw: use more conventional iterators")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220623193540.2851799-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the third packet of 3WHS connection establishment
contains payload, it is added into socket receive queue
without the XFRM check and the drop of connection tracking
context.
This means that if the data is left unread in the socket
receive queue, conntrack module can not be unloaded.
As most applications usually reads the incoming data
immediately after accept(), bug has been hiding for
quite a long time.
Commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") exposed this bug because
even if the application reads this data, the skb
with nfct state could stay in a per-cpu cache for
an arbitrary time, if said cpu no longer process RX softirqs.
Many thanks to Ilya Maximets for reporting this issue,
and for testing various patches:
https://lore.kernel.org/netdev/20220619003919.394622-1-i.maximets@ovn.org/
Note that I also added a missing xfrm4_policy_check() call,
although this is probably not a big issue, as the SYN
packet should have been dropped earlier.
Fixes: b59c270104f0 ("[NETFILTER]: Keep conntrack reference until IPsec policy checks are done")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/r/20220623050436.1290307-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the len fields manipulation in the skbs to a helper function.
There is a comment specifically requesting this and there are several
other areas in the code displaying the same pattern which can be
refactored.
This improves code readability.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Link: https://lore.kernel.org/r/20220622160853.GA6478@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mrt_lock is only held in write mode, from process context only.
We can switch to a mere spinlock, and avoid blocking BH.
Also, vif_dev_read() is always called under standard rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mrt_lock is only held in write mode, from process context only.
We can switch to a mere spinlock, and avoid blocking BH.
Also, vif_dev_read() is always called under standard rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can use standard rcu_read_lock(), to get rid
of last read_lock(&mrt_lock) call points.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We no longer need to acquire mrt_lock() in mr_dump,
using rcu_read_lock() is enough.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like ipmr_get_route(), we can use standard RCU here.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip6_mr_forward() uses standard RCU protection already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rcu_read_lock() protection is good enough.
ip6mr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rcu_read_lock() protection is good enough.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rcu_read_lock() protection is more than enough.
vif_dev_read() supports either mrt_lock or rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip6mr_cache_report() first argument can be marked const, and we change
the caller convention about which lock needs to be held.
Instead of read_lock(&mrt_lock), we can use rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|