Age | Commit message (Collapse) | Author |
|
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) will become null-dereference.
This patch adds a check for the edge case and switch to use the net_device
from the rtable when skb->dev is NULL.
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them.
It is less verbose and it improves the type checking and semantic.
While at it, add missing header inclusion (should be bitops.h,
but with the above change it becomes bitmap.h).
Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 21 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 450 insertions(+), 36 deletions(-).
The main changes are:
1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao.
2) Check whether override is allowed in kprobe mult, from Jiri Olsa.
3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick.
4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann,
Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor,
Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bpf_nf testcase fails on s390x: bpf_skb_ct_lookup() cannot find the entry
that was added by bpf_ct_insert_entry() within the same BPF function.
The reason is that this entry is deleted by nf_ct_gc_expired().
The CT timeout starts ticking after the CT confirmation; therefore
nf_conn.timeout is initially set to the timeout value, and
__nf_conntrack_confirm() sets it to the deadline value.
bpf_ct_insert_entry() sets IPS_CONFIRMED_BIT, but does not adjust the
timeout, making its value meaningless and causing false positives.
Fix the problem by making bpf_ct_insert_entry() adjust the timeout,
like __nf_conntrack_confirm().
Fixes: 2cdaa3eefed8 ("netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/20230830011128.1415752-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
netfilter pull request 23-09-13
====================
The following patchset contains Netfilter fixes for net:
1) Do not permit to remove rules from chain binding, otherwise
double rule release is possible, triggering UaF. This rule
deletion support does not make sense and userspace does not use
this. Problem exists since the introduction of chain binding support.
2) rbtree GC worker only collects the elements that have expired.
This operation is not destructive, therefore, turn write into
read spinlock to avoid datapath contention due to GC worker run.
This was not fixed in the recent GC fix batch in the 6.5 cycle.
3) pipapo set backend performs sync GC, therefore, catchall elements
must use sync GC queue variant. This bug was introduced in the
6.5 cycle with the recent GC fixes.
4) Stop GC run if memory allocation fails in pipapo set backend,
otherwise access to NULL pointer to GC transaction object might
occur. This bug was introduced in the 6.5 cycle with the recent
GC fixes.
5) rhash GC run uses an iterator that might hit EAGAIN to rewind,
triggering double-collection of the same element. This bug was
introduced in the 6.5 cycle with the recent GC fixes.
6) Do not permit to remove elements in anonymous sets, this type of
sets are populated once and then bound to rules. This fix is
similar to the chain binding patch coming first in this batch.
API permits since the very beginning but it has no use case from
userspace.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a PTP ethernet raw frame with a size of more than 256 bytes followed
by a 0xff pattern is sent to __skb_flow_dissect, nhoff value calculation
is wrong. For example: hdr->message_length takes the wrong value (0xffff)
and it does not replicate real header length. In this case, 'nhoff' value
was overridden and the PTP header was badly dissected. This leads to a
kernel crash.
net/core: flow_dissector
net/core flow dissector nhoff = 0x0000000e
net/core flow dissector hdr->message_length = 0x0000ffff
net/core flow dissector nhoff = 0x0001000d (u16 overflow)
...
skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88
skb frag: 00000000: f7 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Using the size of the ptp_header struct will allow the corrected
calculation of the nhoff value.
net/core flow dissector nhoff = 0x0000000e
net/core flow dissector nhoff = 0x00000030 (sizeof ptp_header)
...
skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 f7 ff ff
skb linear: 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
skb linear: 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
skb frag: 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Kernel trace:
[ 74.984279] ------------[ cut here ]------------
[ 74.989471] kernel BUG at include/linux/skbuff.h:2440!
[ 74.995237] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 75.001098] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U 5.15.85-intel-ese-standard-lts #1
[ 75.011629] Hardware name: Intel Corporation A-Island (CPU:AlderLake)/A-Island (ID:06), BIOS SB_ADLP.01.01.00.01.03.008.D-6A9D9E73-dirty Mar 30 2023
[ 75.026507] RIP: 0010:eth_type_trans+0xd0/0x130
[ 75.031594] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
[ 75.052612] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
[ 75.058473] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
[ 75.066462] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
[ 75.074458] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
[ 75.082466] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
[ 75.090461] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
[ 75.098464] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
[ 75.107530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.113982] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
[ 75.121980] PKRU: 55555554
[ 75.125035] Call Trace:
[ 75.127792] <IRQ>
[ 75.130063] ? eth_get_headlen+0xa4/0xc0
[ 75.134472] igc_process_skb_fields+0xcd/0x150
[ 75.139461] igc_poll+0xc80/0x17b0
[ 75.143272] __napi_poll+0x27/0x170
[ 75.147192] net_rx_action+0x234/0x280
[ 75.151409] __do_softirq+0xef/0x2f4
[ 75.155424] irq_exit_rcu+0xc7/0x110
[ 75.159432] common_interrupt+0xb8/0xd0
[ 75.163748] </IRQ>
[ 75.166112] <TASK>
[ 75.168473] asm_common_interrupt+0x22/0x40
[ 75.173175] RIP: 0010:cpuidle_enter_state+0xe2/0x350
[ 75.178749] Code: 85 c0 0f 8f 04 02 00 00 31 ff e8 39 6c 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 50 02 00 00 31 ff e8 52 b0 6d ff fb 45 85 f6 <0f> 88 b1 00 00 00 49 63 ce 4c 2b 2c 24 48 89 c8 48 6b d1 68 48 c1
[ 75.199757] RSP: 0018:ffff9948c013bea8 EFLAGS: 00000202
[ 75.205614] RAX: ffff8e4e8fb00000 RBX: ffffb948bfd23900 RCX: 000000000000001f
[ 75.213619] RDX: 0000000000000004 RSI: ffffffff94206161 RDI: ffffffff94212e20
[ 75.221620] RBP: 0000000000000004 R08: 000000117568973a R09: 0000000000000001
[ 75.229622] R10: 000000000000afc8 R11: ffff8e4e8fb29ce4 R12: ffffffff945ae980
[ 75.237628] R13: 000000117568973a R14: 0000000000000004 R15: 0000000000000000
[ 75.245635] ? cpuidle_enter_state+0xc7/0x350
[ 75.250518] cpuidle_enter+0x29/0x40
[ 75.254539] do_idle+0x1d9/0x260
[ 75.258166] cpu_startup_entry+0x19/0x20
[ 75.262582] secondary_startup_64_no_verify+0xc2/0xcb
[ 75.268259] </TASK>
[ 75.270721] Modules linked in: 8021q snd_sof_pci_intel_tgl snd_sof_intel_hda_common tpm_crb snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress iTCO_wdt ac97_bus intel_pmc_bxt mei_hdcp iTCO_vendor_support snd_hda_codec_hdmi pmt_telemetry intel_pmc_core pmt_class snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel snd_pcm snd_timer kvm snd mei_me soundcore tpm_tis irqbypass i2c_i801 mei tpm_tis_core pcspkr intel_rapl_msr tpm i2c_smbus intel_pmt thermal sch_fq_codel uio uhid i915 drm_buddy video drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm fuse configfs
[ 75.342736] ---[ end trace 3785f9f360400e3a ]---
[ 75.347913] RIP: 0010:eth_type_trans+0xd0/0x130
[ 75.352984] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
[ 75.373994] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
[ 75.379860] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
[ 75.387856] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
[ 75.395864] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
[ 75.403857] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
[ 75.411863] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
[ 75.419875] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
[ 75.428946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.435403] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
[ 75.443410] PKRU: 55555554
[ 75.446477] Kernel panic - not syncing: Fatal exception in interrupt
[ 75.453738] Kernel Offset: 0x11c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 75.465794] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: 4f1cc51f3488 ("net: flow_dissector: Parse PTP L2 packet header")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720
("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by
updating kcm_tx_msg(head)->last_skb if partial data is copied so that the
following sendmsg() will resume from the skb.
However, we cannot know how many bytes were copied when we get the error.
Thus, we could mess up the MSG_MORE queue.
When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we
do so for UDP by udp_flush_pending_frames().
Even without this change, when the error occurred, the following sendmsg()
resumed from a wrong skb and the queue was messed up. However, we have
yet to get such a report, and only syzkaller stumbled on it. So, this
can be changed safely.
Note this does not change SOCK_SEQPACKET behaviour.
Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()")
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The value in idx and the number of rules handled in that particular
__nf_tables_dump_rules() call is not identical. The former is a cursor
to pick up from if multiple netlink messages are needed, so its value is
ever increasing. Fixing this is not just a matter of subtracting s_idx
from it, though: When resetting rules in multiple chains,
__nf_tables_dump_rules() is called for each and cb->args[0] is not
adjusted in between. Introduce a dedicated counter to record the number
of rules reset in this call in a less confusing way.
While being at it, prevent the direct return upon buffer exhaustion: Any
rules previously dumped into that skb would evade audit logging
otherwise.
Fixes: 9b5ba5c9c5109 ("netfilter: nf_tables: Unbreak audit log reset")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The size table is incorrect due to copypaste error,
this reserves more size than needed.
TSTAMP reserved 32 instead of 16 bytes.
TIMEOUT reserved 16 instead of 8 bytes.
Fixes: 5f31edc0676b ("netfilter: conntrack: move extension sizes into core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently, when GETDEVICEINFO returns multiple locations where each
is a different IP but the server's identity is same as MDS, then
nfs4_set_ds_client() finds the existing nfs_client structure which
has the MDS's max_connect value (and if it's 1), then the 1st IP
on the DS's list will get dropped due to MDS trunking rules. Other
IPs would be added as they fall under the pnfs trunking rules.
For the list of IPs the 1st goes thru calling nfs4_set_ds_client()
which will eventually call nfs4_add_trunk() and call into
rpc_clnt_test_and_add_xprt() which has the check for MDS trunking.
The other IPs (after the 1st one), would call rpc_clnt_add_xprt()
which doesn't go thru that check.
nfs4_add_trunk() is called when MDS trunking is happening and it
needs to enforce the usage of max_connect mount option of the
1st mount. However, this shouldn't be applied to pnfs flow.
Instead, this patch proposed to treat MDS=DS as DS trunking and
make sure that MDS's max_connect limit does not apply to the
1st IP returned in the GETDEVICEINFO list. It does so by
marking the newly created client with a new flag NFS_CS_PNFS
which then used to pass max_connect value to use into the
rpc_clnt_test_and_add_xprt() instead of the existing rpc
client's max_connect value set by the MDS connection.
For example, mount was done without max_connect value set
so MDS's rpc client has cl_max_connect=1. Upon calling into
rpc_clnt_test_and_add_xprt() and using rpc client's value,
the caller passes in max_connect value which is previously
been set in the pnfs path (as a part of handling
GETDEVICEINFO list of IPs) in nfs4_set_ds_client().
However, when NFS_CS_PNFS flag is not set and we know we
are doing MDS trunking, comparing a new IP of the same
server, we then set the max_connect value to the
existing MDS's value and pass that into
rpc_clnt_test_and_add_xprt().
Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724.
The premise of this commit was incorrect. There are exactly 2 cases
where rpcauth_checkverf() will return an error:
1) If there was an XDR decode problem (i.e. garbage data).
2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC.
In the second case, there are again 2 subcases:
a) The GSS context expires, in which case gss_validate() will force a
new context negotiation on retry by invalidating the cred.
b) The sequence number check failed because an RPC call timed out, and
the client retransmitted the request using a new sequence number,
as required by RFC2203.
In neither subcase is this a fatal error.
Reported-by: Russell Cattelan <cattelan@thebarn.com>
Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
If the server rejects the credential as being stale, or bad, then we
should mark it for revalidation before retransmitting.
Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Since the changed field size was increased to u64, mesh_bss_info_changed
pulls invalid bits from the first 3 bytes of the mesh id, clears them, and
passes them on to ieee80211_link_info_change_notify, because
ifmsh->mbss_changed was not updated to match its size.
Fix this by turning into ifmsh->mbss_changed into an unsigned long array with
64 bit size.
Fixes: 15ddba5f4311 ("wifi: mac80211: consistently use u64 for BSS changes")
Reported-by: Thomas Hühn <thomas.huehn@hs-nordhausen.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230913050134.53536-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since bhash2 was introduced, the example below does not work as expected.
These two bind() should conflict, but the 2nd bind() now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM)
s1.bind(('::ffff:127.0.0.1', 0))
s2 = socket(AF_INET, SOCK_STREAM)
s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find()
fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates
a new tb2 for the 2nd socket. Then, we call inet_csk_bind_conflict() that
checks conflicts in the new tb2 by inet_bhash2_conflict(). However, the
new tb2 does not include the 1st socket, thus the bind() finally succeeds.
In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has
the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find()
returns the 1st socket's tb2.
Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1,
the 2nd bind() fails properly for the same reason mentinoed in the previous
commit.
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrei Vagin reported bind() regression with strace logs.
If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
from socket import *
s1 = socket(AF_INET6, SOCK_STREAM)
s1.bind(('::ffff:0.0.0.0', 0))
s2 = socket(AF_INET, SOCK_STREAM)
s2.bind(('127.0.0.1', s1.getsockname()[1]))
During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
if tb has the v4-mapped-v6 wildcard address.
The example above does not work after commit 5456262d2baa ("net: Fix
incorrect address comparison when searching for a bind2 bucket"), but
the blamed change is not the commit.
Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
as 0.0.0.0, and the sequence above worked by chance. Technically, this
case has been broken since bhash2 was introduced.
Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
the 2nd bind() fails properly because we fall back to using bhash to
detect conflicts for the v4-mapped-v6 address.
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Andrei Vagin <avagin@google.com>
Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_bind2_bucket_match(_addr_any).
This is a prep patch to make the following patches cleaner that touch
inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any().
Both functions have duplicated comparison for netns, port, and l3mdev.
Let's factorise them.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I got the below warning when do fuzzing test:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
dump_backtrace+0x0/0x420
show_stack+0x34/0x44
dump_stack+0x1d0/0x248
__kasan_report+0x138/0x140
kasan_report+0x44/0x6c
__asan_load4+0x94/0xd0
scatterwalk_copychunks+0x320/0x470
skcipher_next_slow+0x14c/0x290
skcipher_walk_next+0x2fc/0x480
skcipher_walk_first+0x9c/0x110
skcipher_walk_aead_common+0x380/0x440
skcipher_walk_aead_encrypt+0x54/0x70
ccm_encrypt+0x13c/0x4d0
crypto_aead_encrypt+0x7c/0xfc
pcrypt_aead_enc+0x28/0x84
padata_parallel_worker+0xd0/0x2dc
process_one_work+0x49c/0xbdc
worker_thread+0x124/0x880
kthread+0x210/0x260
ret_from_fork+0x10/0x18
This is because the value of rec_seq of tls_crypto_info configured by the
user program is too large, for example, 0xffffffffffffff. In addition, TLS
is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, the UAF problem occurs during the asynchronous processing of the
encryption module.
If the operation is asynchronous and the encryption module returns
EINPROGRESS, do not free the record information.
Fixes: 635d93981786 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Max Schulze reports crashes with brcmfmac. The reason seems
to be a race between userspace removing the CQM config and
the driver calling cfg80211_cqm_rssi_notify(), where if the
data is freed while cfg80211_cqm_rssi_notify() runs it will
crash since it assumes wdev->cqm_config is set. This can't
be fixed with a simple non-NULL check since there's nothing
we can do for locking easily, so use RCU instead to protect
the pointer, but that requires pulling the updates out into
an asynchronous worker so they can sleep and call back into
the driver.
Since we need to change the free anyway, also change it to
go back to the old settings if changing the settings fails.
Reported-and-tested-by: Max Schulze <max.schulze@online.de>
Closes: https://lore.kernel.org/r/ac96309a-8d8d-4435-36e6-6d152eb31876@online.de
Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Many regulatories can have HE/EHT Operation as not permitted. In such
cases, AP should not be allowed to start if it is using a channel
having the no operation flag set. However, currently there is no such
check in place.
Fix this issue by validating such IEs sent during start AP against the
channel flags.
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230905064857.1503-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When connect to MLO AP with more than one link, and the assoc response of
AP is not success, then cfg80211_unhold_bss() is not called for all the
links' cfg80211_bss except the primary link which means the link used by
the latest successful association request. Thus the hold value of the
cfg80211_bss is not reset to 0 after the assoc fail, and then the
__cfg80211_unlink_bss() will not be called for the cfg80211_bss by
__cfg80211_bss_expire().
Then the AP always looks exist even the AP is shutdown or reconfigured
to another type, then it will lead error while connecting it again.
The detail info are as below.
When connect with muti-links AP, cfg80211_hold_bss() is called by
cfg80211_mlme_assoc() for each cfg80211_bss of all the links. When
assoc response from AP is not success(such as status_code==1), the
ieee80211_link_data of non-primary link(sdata->link[link_id]) is NULL
because ieee80211_assoc_success()->ieee80211_vif_update_links() is
not called for the links.
Then struct cfg80211_rx_assoc_resp resp in cfg80211_rx_assoc_resp() and
struct cfg80211_connect_resp_params cr in __cfg80211_connect_result()
will only have the data of the primary link, and finally function
cfg80211_connect_result_release_bsses() only call cfg80211_unhold_bss()
for the primary link. Then cfg80211_bss of the other links will never free
because its hold is always > 0 now.
Hence assign value for the bss and status from assoc_data since it is
valid for this case. Also assign value of addr from assoc_data when the
link is NULL because the addrs of assoc_data and link both represent the
local link addr and they are same value for success connection.
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230825070055.28164-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Anonymous sets need to be populated once at creation and then they are
bound to rule since 938154b93be8 ("netfilter: nf_tables: reject unbound
anonymous set before commit phase"), otherwise transaction reports
EINVAL.
Userspace does not need to delete elements of anonymous sets that are
not yet bound, reject this with EOPNOTSUPP.
From flush command path, skip anonymous sets, they are expected to be
bound already. Otherwise, EINVAL is hit at the end of this transaction
for unbound sets.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot reported a memory leak like below:
BUG: memory leak
unreferenced object 0xffff88810b088c00 (size 240):
comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
hex dump (first 32 bytes):
00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
[<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
[<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
[<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
[<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
[<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
[<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
[<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
[<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
[<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
newly allocated skbs to 'head'. If some bytes are copied, an error occurred,
and jumped to out_error label, 'last_skb' is left unmodified. A later
kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the
'head' frag_list and causing the leak.
This patch fixes this issue by properly updating the last allocated skb in
'last_skb'.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzbot reports the following uninit-value access problem.
=====================================================
BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline]
BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
fill_frame_info net/hsr/hsr_forward.c:601 [inline]
hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560
__dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
sock_sendmsg net/socket.c:753 [inline]
__sys_sendto+0x781/0xa30 net/socket.c:2176
__do_sys_sendto net/socket.c:2188 [inline]
__se_sys_sendto net/socket.c:2184 [inline]
__ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Uninit was created at:
slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
slab_alloc_node mm/slub.c:3478 [inline]
kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559
__alloc_skb+0x318/0x740 net/core/skbuff.c:644
alloc_skb include/linux/skbuff.h:1286 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794
packet_alloc_skb net/packet/af_packet.c:2936 [inline]
packet_snd net/packet/af_packet.c:3030 [inline]
packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
sock_sendmsg net/socket.c:753 [inline]
__sys_sendto+0x781/0xa30 net/socket.c:2176
__do_sys_sendto net/socket.c:2188 [inline]
__se_sys_sendto net/socket.c:2184 [inline]
__ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
It is because VLAN not yet supported in hsr driver. Return error
when protocol is ETH_P_8021Q in fill_frame_info() now to fix it.
Fixes: 451d8123f897 ("net: prp: add packet handling support")
Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smcr_port_add
While doing smcr_port_add, there maybe linkgroup add into or delete
from smc_lgr_list.list at the same time, which may result kernel crash.
So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in
smcr_port_add.
The crash calltrace show below:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Workqueue: events smc_ib_port_event_work [smc]
RIP: 0010:smcr_port_add+0xa6/0xf0 [smc]
RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918
R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4
R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08
FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0
PKRU: 55555554
Call Trace:
smc_ib_port_event_work+0x18f/0x380 [smc]
process_one_work+0x19b/0x340
worker_thread+0x30/0x370
? process_one_work+0x340/0x340
kthread+0x114/0x130
? __kthread_cancel_work+0x50/0x50
ret_from_fork+0x1f/0x30
Fixes: 1f90a05d9ff9 ("net/smc: add smcr_port_add() and smcr_link_up() processing")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used
to determin whether to increase the v1 statistic or the v2
statistic. It is correct for SMCD. But for SMCR, smcr_version
should be used.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I got the below warning when do fuzzing test:
unregister_netdevice: waiting for bond0 to become free. Usage count = 2
It can be repoduced via:
ip link add bond0 type bond
sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
ip addr add 4.117.174.103/0 scope 0x40 dev bond0
ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
ip addr del 4.117.174.103/0 scope 0x40 dev bond0
ip link delete bond0 type bond
In this reproduction test case, an incorrect 'last_prim' is found in
__inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40)
is lost. The memory of the secondary address is leaked and the reference of
in_device and net_device is leaked.
Fix this problem:
Look for 'last_prim' starting at location of the deleted IP and inserting
the promoted IP into the location of 'last_prim'.
Fixes: 0ff60a45678e ("[IPV4]: Fix secondary IP addresses after promotion")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"
* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...
|
|
Skip GC run if iterator rewinds to the beginning with EAGAIN, otherwise GC
might collect the same element more than once.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nft_trans_gc_queue_sync() enqueues the GC transaction and it allocates a
new one. If this allocation fails, then stop this GC sync run and retry
later.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
pipapo needs to enqueue GC transactions for catchall elements through
nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and
nft_trans_gc_catchall_async() to handle GC transaction queueing
accordingly.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
rbtree GC does not modify the datastructure, instead it collects expired
elements and it enqueues a GC transaction. Use a read spinlock instead
to avoid data contention while GC worker is running.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Chain binding only requires the rule addition/insertion command within
the same transaction. Removal of rules from chain bindings within the
same transaction makes no sense, userspace does not utilize this
feature. Replace nft_chain_is_bound() check to nft_chain_binding() in
rule deletion commands. Replace command implies a rule deletion, reject
this command too.
Rule flush command can also safely rely on this nft_chain_binding()
check because unbound chains are not allowed since 62e1e94b246e
("netfilter: nf_tables: reject unbound chain set before commit phase").
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
This time almost all fixes are for old bugs:
First patch fixes a 4-byte stack OOB write, from myself.
This was broken ever since nftables was switches from 128 to 32bit
register addressing in v4.1.
2nd patch fixes an out-of-bounds read.
This has been broken ever since xt_osf got added in 2.6.31, the bug
was then just moved around during refactoring, from Wander Lairson Costa.
3rd patch adds a missing enum description, from Phil Sutter.
4th patch fixes a UaF inftables that occurs when userspace adds
elements with a timeout so small that expiration happens while the
transaction is still in progress. Fix from Pablo Neira Ayuso.
Patch 5 fixes a memory out of bounds access, this was
broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik.
Patch 6 fixes another bogus memory access when building audit
record. Bug added in the previous pull request, fix from Pablo.
netfilter pull request 2023-09-06
* tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
====================
Link: https://lore.kernel.org/r/20230906162525.11079-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-09-06
We've added 9 non-merge commits during the last 6 day(s) which contain
a total of 12 files changed, 189 insertions(+), 44 deletions(-).
The main changes are:
1) Fix bpf_sk_storage to address an invalid wait context lockdep
report and another one to address missing omem uncharge,
from Martin KaFai Lau.
2) Two BPF recursion detection related fixes,
from Sebastian Andrzej Siewior.
3) Fix tailcall limit enforcement in trampolines for s390 JIT,
from Ilya Leoshkevich.
4) Fix a sockmap refcount race where skbs in sk_psock_backlog can
be referenced after user space side has already skb_consumed them,
from John Fastabend.
5) Fix BPF CI flake/race wrt sockmap vsock write test where
the transport endpoint is not connected, from Xu Kuohai.
6) Follow-up doc fix to address a cross-link warning,
from Eduard Zingerman.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
bpf, sockmap: Fix skb refcnt race after locking changes
docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst
selftests/bpf: Fix a CI failure caused by vsock write
====================
Link: https://lore.kernel.org/r/20230906095117.16941-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull ceph updates from Ilya Dryomov:
"Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS! The list of things which don't work with
encryption should be fairly short, mostly around the edges: fallocate
(not supported well in CephFS to begin with), copy_file_range
(requires re-encryption), non-default striping patterns.
This was a multi-year effort principally by Jeff Layton with
assistance from Xiubo Li, Luís Henriques and others, including several
dependant changes in the MDS, netfs helper library and fscrypt
framework itself"
* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
ceph: make num_fwd and num_retry to __u32
ceph: make members in struct ceph_mds_request_args_ext a union
rbd: use list_for_each_entry() helper
libceph: do not include crypto/algapi.h
ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
ceph: fix updating i_truncate_pagecache_size for fscrypt
ceph: wait for OSD requests' callbacks to finish when unmounting
ceph: drop messages from MDS when unmounting
ceph: update documentation regarding snapshot naming limitations
ceph: prevent snapshot creation in encrypted locked directories
ceph: add support for encrypted snapshot names
ceph: invalidate pages when doing direct/sync writes
ceph: plumb in decryption during reads
ceph: add encryption support to writepage and writepages
ceph: add read/modify/write to ceph_sync_write
ceph: align data in pages in ceph_sync_write
ceph: don't use special DIO path for encrypted inodes
ceph: add truncate size handling support for fscrypt
ceph: add object version support for sync read
libceph: allow ceph_osdc_new_request to accept a multi-op read
...
|
|
Deliver audit log from __nf_tables_dump_rules(), table dereference at
the end of the table list loop might point to the list head, leading to
this crash.
[ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50
[ 4137.407357] #PF: supervisor read access in kernel mode
[ 4137.407359] #PF: error_code(0x0000) - not-present page
[ 4137.407360] PGD 0 P4D 0
[ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277
[ 4137.407369] RIP: 0010:string+0x49/0xd0
[ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81
[ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286
[ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04
[ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000
[ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000
[ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709
[ 4137.407385] FS: 00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000
[ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0
[ 4137.407390] Call Trace:
[ 4137.407392] <TASK>
[ 4137.407393] ? __die+0x1b/0x60
[ 4137.407397] ? page_fault_oops+0x6b/0xa0
[ 4137.407399] ? exc_page_fault+0x60/0x120
[ 4137.407403] ? asm_exc_page_fault+0x22/0x30
[ 4137.407408] ? string+0x49/0xd0
[ 4137.407410] vsnprintf+0x257/0x4f0
[ 4137.407414] kvasprintf+0x3e/0xb0
[ 4137.407417] kasprintf+0x3e/0x50
[ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables]
[ 4137.407439] ? __alloc_skb+0xc3/0x170
[ 4137.407442] netlink_dump+0x170/0x330
[ 4137.407447] __netlink_dump_start+0x227/0x300
[ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables]
Deliver audit log only once at the end of the rule dump+reset for
consistency with the set dump+reset.
Ensure audit reset access to table under rcu read side lock. The table
list iteration holds rcu read lock side, but recent audit code
dereferences table object out of the rcu read lock side.
Fixes: ea078ae9108e ("netfilter: nf_tables: Audit log rule reset")
Fixes: 7e9be1124dbe ("netfilter: nf_tables: Audit log setelem reset")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
ip_set_hash_netportnet.c
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.
Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
New elements in this transaction might expired before such transaction
ends. Skip sync GC for such elements otherwise commit path might walk
over an already released object. Once transaction is finished, async GC
will collect such expired element.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The opt_num field is controlled by user mode and is not currently
validated inside the kernel. An attacker can take advantage of this to
trigger an OOB read and potentially leak information.
BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
Read of size 2 at addr ffff88804bc64272 by task poc/6431
CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
Call Trace:
nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
expr_call_ops_eval net/netfilter/nf_tables_core.c:214
nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
[..]
Also add validation to genre, subtype and version fields.
Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.
This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.
The bug was added in 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.
Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
__skb_get_hash_symmetric() was added to compute a symmetric hash over
the protocol, addresses and transport ports, by commit eb70db875671
("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
IPv4 addresses, IPv6 addresses and ports. However, it should not specify
the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
dissection when an IPv6 flow label is encountered, making transport
ports not being incorporated in such case.
As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
3-tuple for IPv6 when flow label is present. It caused a few problems,
e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
to perform load balancing as different L4 flows between two given IPv6
addresses would always get the same symmetric hash, leading to uneven
traffic distribution.
Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
Fixes: eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
Reported-by: Lars Ekman <uablrek@gmail.com>
Closes: https://github.com/antrea-io/antrea/issues/5457
Signed-off-by: Quan Tian <qtian@vmware.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a follow up of commit 915d975b2ffa ("net: deal with integer
overflows in kmalloc_reserve()") based on David Laight feedback.
Back in 2010, I failed to realize malicious users could set dev->mtu
to arbitrary values. This mtu has been since limited to 0x7fffffff but
regardless of how big dev->mtu is, it makes no sense for igmpv3_newpack()
to allocate more than IP_MAX_MTU and risk various skb fields overflows.
Fixes: 57e1ab6eaddc ("igmp: refine skb allocations")
Link: https://lore.kernel.org/netdev/d273628df80f45428e739274ab9ecb72@AcuMS.aculab.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Cc: Kyle Zeng <zengyhkyle@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kcm_exit_net() should call mutex_destroy() on knet->mutex. This is especially
needed if CONFIG_DEBUG_MUTEXES is enabled.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20230902170708.1727999-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the plug qdisc is used as a class of the qfq qdisc it could trigger a
UAF. This issue can be reproduced with following commands:
tc qdisc add dev lo root handle 1: qfq
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
tc qdisc add dev lo parent 1:1 handle 2: plug
tc filter add dev lo parent 1: basic classid 1:1
ping -c1 127.0.0.1
and boom:
[ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[ 285.355903]
[ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 285.358376] Call Trace:
[ 285.358773] <IRQ>
[ 285.359109] dump_stack_lvl+0x44/0x60
[ 285.359708] print_address_description.constprop.0+0x2c/0x3c0
[ 285.360611] kasan_report+0x10c/0x120
[ 285.361195] ? qfq_dequeue+0xa7/0x7f0
[ 285.361780] qfq_dequeue+0xa7/0x7f0
[ 285.362342] __qdisc_run+0xf1/0x970
[ 285.362903] net_tx_action+0x28e/0x460
[ 285.363502] __do_softirq+0x11b/0x3de
[ 285.364097] do_softirq.part.0+0x72/0x90
[ 285.364721] </IRQ>
[ 285.365072] <TASK>
[ 285.365422] __local_bh_enable_ip+0x77/0x90
[ 285.366079] __dev_queue_xmit+0x95f/0x1550
[ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10
[ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10
[ 285.368259] ? __build_skb_around+0x129/0x190
[ 285.368960] ? ip_generic_getfrag+0x12c/0x170
[ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10
[ 285.370390] ? csum_partial+0x8/0x20
[ 285.370961] ? raw_getfrag+0xe5/0x140
[ 285.371559] ip_finish_output2+0x539/0xa40
[ 285.372222] ? __pfx_ip_finish_output2+0x10/0x10
[ 285.372954] ip_output+0x113/0x1e0
[ 285.373512] ? __pfx_ip_output+0x10/0x10
[ 285.374130] ? icmp_out_count+0x49/0x60
[ 285.374739] ? __pfx_ip_finish_output+0x10/0x10
[ 285.375457] ip_push_pending_frames+0xf3/0x100
[ 285.376173] raw_sendmsg+0xef5/0x12d0
[ 285.376760] ? do_syscall_64+0x40/0x90
[ 285.377359] ? __static_call_text_end+0x136578/0x136578
[ 285.378173] ? do_syscall_64+0x40/0x90
[ 285.378772] ? kasan_enable_current+0x11/0x20
[ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10
[ 285.380137] ? __sock_create+0x13e/0x270
[ 285.380673] ? __sys_socket+0xf3/0x180
[ 285.381174] ? __x64_sys_socket+0x3d/0x50
[ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.382425] ? __rcu_read_unlock+0x48/0x70
[ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380
[ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10
[ 285.384295] ? preempt_count_sub+0x14/0xc0
[ 285.384844] ? __list_del_entry_valid+0x76/0x140
[ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0
[ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10
[ 285.386645] ? release_sock+0xa0/0xd0
[ 285.387148] ? preempt_count_sub+0x14/0xc0
[ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0
[ 285.388341] ? aa_sk_perm+0x177/0x390
[ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10
[ 285.389441] ? check_stack_object+0x22/0x70
[ 285.390032] ? inet_send_prepare+0x2f/0x120
[ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10
[ 285.391172] sock_sendmsg+0xcc/0xe0
[ 285.391667] __sys_sendto+0x190/0x230
[ 285.392168] ? __pfx___sys_sendto+0x10/0x10
[ 285.392727] ? kvm_clock_get_cycles+0x14/0x30
[ 285.393328] ? set_normalized_timespec64+0x57/0x70
[ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40
[ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160
[ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10
[ 285.395908] ? _copy_to_user+0x3e/0x60
[ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120
[ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50
[ 285.397734] ? do_syscall_64+0x71/0x90
[ 285.398258] __x64_sys_sendto+0x74/0x90
[ 285.398786] do_syscall_64+0x64/0x90
[ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120
[ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50
[ 285.400605] ? do_syscall_64+0x71/0x90
[ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.401807] RIP: 0033:0x495726
[ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[ 285.410403] </TASK>
[ 285.410704]
[ 285.410929] Allocated by task 144:
[ 285.411402] kasan_save_stack+0x1e/0x40
[ 285.411926] kasan_set_track+0x21/0x30
[ 285.412442] __kasan_slab_alloc+0x55/0x70
[ 285.412973] kmem_cache_alloc_node+0x187/0x3d0
[ 285.413567] __alloc_skb+0x1b4/0x230
[ 285.414060] __ip_append_data+0x17f7/0x1b60
[ 285.414633] ip_append_data+0x97/0xf0
[ 285.415144] raw_sendmsg+0x5a8/0x12d0
[ 285.415640] sock_sendmsg+0xcc/0xe0
[ 285.416117] __sys_sendto+0x190/0x230
[ 285.416626] __x64_sys_sendto+0x74/0x90
[ 285.417145] do_syscall_64+0x64/0x90
[ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 285.418306]
[ 285.418531] Freed by task 144:
[ 285.418960] kasan_save_stack+0x1e/0x40
[ 285.419469] kasan_set_track+0x21/0x30
[ 285.419988] kasan_save_free_info+0x27/0x40
[ 285.420556] ____kasan_slab_free+0x109/0x1a0
[ 285.421146] kmem_cache_free+0x1c2/0x450
[ 285.421680] __netif_receive_skb_core+0x2ce/0x1870
[ 285.422333] __netif_receive_skb_one_core+0x97/0x140
[ 285.423003] process_backlog+0x100/0x2f0
[ 285.423537] __napi_poll+0x5c/0x2d0
[ 285.424023] net_rx_action+0x2be/0x560
[ 285.424510] __do_softirq+0x11b/0x3de
[ 285.425034]
[ 285.425254] The buggy address belongs to the object at ffff8880bad31280
[ 285.425254] which belongs to the cache skbuff_head_cache of size 224
[ 285.426993] The buggy address is located 40 bytes inside of
[ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[ 285.428572]
[ 285.428798] The buggy address belongs to the physical page:
[ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[ 285.431447] page_type: 0xffffffff()
[ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
[ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[ 285.433562] page dumped because: kasan: bad access detected
[ 285.434144]
[ 285.434320] Memory state around the buggy address:
[ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 285.436777] ^
[ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.438126] ==================================================================
[ 285.438662] Disabling lock debugging due to kernel taint
Fix this by:
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: valis <sec@valis.email>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be
read locklessly by unix_dgram_sendmsg().
Let's use READ_ONCE() for sk_err as well.
Note that the writer side is marked by commit cc04410af7de ("af_unix:
annotate lockless accesses to sk->sk_err").
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sk->sk_shutdown is changed under unix_state_lock(sk), but
unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly.
sock_alloc_send_pskb
`- sock_wait_for_wmem
Let's use READ_ONCE() there.
Note that the writer side was marked by commit e1d09c2c2f57 ("af_unix:
Fix data races around sk->sk_shutdown.").
BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1:
unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
unix_release+0x59/0x80 net/unix/af_unix.c:1053
__sock_release+0x7d/0x170 net/socket.c:654
sock_close+0x19/0x30 net/socket.c:1386
__fput+0x2a3/0x680 fs/file_table.c:384
____fput+0x15/0x20 fs/file_table.c:412
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0:
sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767
unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944
unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:748
____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
___sys_sendmsg+0xc6/0x140 net/socket.c:2548
__sys_sendmsg+0x94/0x140 net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
unix_tot_inflight is changed under spin_lock(unix_gc_lock), but
unix_release_sock() reads it locklessly.
Let's use READ_ONCE() for unix_tot_inflight.
Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix:
annote lockless accesses to unix_tot_inflight & gc_in_progress")
BUG: KCSAN: data-race in unix_inflight / unix_release_sock
write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1:
unix_inflight+0x130/0x180 net/unix/scm.c:64
unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123
unix_scm_to_skb net/unix/af_unix.c:1832 [inline]
unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:747
____sys_sendmsg+0x4e4/0x610 net/socket.c:2493
___sys_sendmsg+0xc6/0x140 net/socket.c:2547
__sys_sendmsg+0x94/0x140 net/socket.c:2576
__do_sys_sendmsg net/socket.c:2585 [inline]
__se_sys_sendmsg net/socket.c:2583 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2583
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0:
unix_release_sock+0x608/0x910 net/unix/af_unix.c:671
unix_release+0x59/0x80 net/unix/af_unix.c:1058
__sock_release+0x7d/0x170 net/socket.c:653
sock_close+0x19/0x30 net/socket.c:1385
__fput+0x179/0x5e0 fs/file_table.c:321
____fput+0x15/0x20 fs/file_table.c:349
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00000000 -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 9305cfa4443d ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
user->unix_inflight is changed under spin_lock(unix_gc_lock),
but too_many_unix_fds() reads it locklessly.
Let's annotate the write/read accesses to user->unix_inflight.
BUG: KCSAN: data-race in unix_attach_fds / unix_inflight
write to 0xffffffff8546f2d0 of 8 bytes by task 44798 on cpu 1:
unix_inflight+0x157/0x180 net/unix/scm.c:66
unix_attach_fds+0x147/0x1e0 net/unix/scm.c:123
unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:748
____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
___sys_sendmsg+0xc6/0x140 net/socket.c:2548
__sys_sendmsg+0x94/0x140 net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffffffff8546f2d0 of 8 bytes by task 44814 on cpu 0:
too_many_unix_fds net/unix/scm.c:101 [inline]
unix_attach_fds+0x54/0x1e0 net/unix/scm.c:110
unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:748
____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
___sys_sendmsg+0xc6/0x140 net/socket.c:2548
__sys_sendmsg+0x94/0x140 net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x000000000000000c -> 0x000000000000000d
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 44814 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 712f4aad406b ("unix: properly account for FDs passed over unix sockets")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.
The flow is the following:
while ((skb = skb_peek(&psock->ingress_skb))
sk_psock_handle_Skb(psock, skb, ..., ingress)
if (!ingress) ...
sk_psock_skb_ingress
sk_psock_skb_ingress_enqueue(skb)
msg->skb = skb
sk_psock_queue_msg(psock, msg)
skb_dequeue(&psock->ingress_skb)
The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.
But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.
The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.
The following splat was observed with 'test_progs -t sockmap_listen':
[ 1022.710250][ T2556] general protection fault, ...
[...]
[ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
[ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
[ 1022.713653][ T2556] Code: ...
[...]
[ 1022.720699][ T2556] Call Trace:
[ 1022.720984][ T2556] <TASK>
[ 1022.721254][ T2556] ? die_addr+0x32/0x80^M
[ 1022.721589][ T2556] ? exc_general_protection+0x25a/0x4b0
[ 1022.722026][ T2556] ? asm_exc_general_protection+0x22/0x30
[ 1022.722489][ T2556] ? skb_dequeue+0x4c/0x80
[ 1022.722854][ T2556] sk_psock_backlog+0x27a/0x300
[ 1022.723243][ T2556] process_one_work+0x2a7/0x5b0
[ 1022.723633][ T2556] worker_thread+0x4f/0x3a0
[ 1022.723998][ T2556] ? __pfx_worker_thread+0x10/0x10
[ 1022.724386][ T2556] kthread+0xfd/0x130
[ 1022.724709][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725066][ T2556] ret_from_fork+0x2d/0x50
[ 1022.725409][ T2556] ? __pfx_kthread+0x10/0x10
[ 1022.725799][ T2556] ret_from_fork_asm+0x1b/0x30
[ 1022.726201][ T2556] </TASK>
To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.
Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.
Fixes: 799aa7f98d53e ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
|
|
The existing code incorrectly casted a negative value (the result of a
subtraction) to an unsigned value without checking. For example, if
/proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
lifetime would jump to 4 billion seconds. On my machine and network the
shortest lifetime that avoided underflow was 3 seconds.
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|