Age | Commit message (Collapse) | Author |
|
Sabrina Dubroca says:
====================
tls: a few more fixes for async decrypt
The previous patchset [1] took care of "full async". This adds a few
fixes for cases where only part of the crypto operations go the async
route, found by extending my previous debug patch [2] to do N
synchronous operations followed by M asynchronous ops (with N and M
configurable).
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=823784&state=*
[2] https://lore.kernel.org/all/9d664093b1bf7f47497b2c40b3a085b45f3274a2.1694021240.git.sd@queasysnail.net/
====================
Link: https://lore.kernel.org/r/cover.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async
decryptions have completed. If one of them fails, tls_do_decryption
will return -EBADMSG and tls_decrypt_sg jumps to the error path,
releasing all the pages. But the pages have been passed to the async
callback, and have already been released by tls_decrypt_done.
The only true async case is when crypto_aead_decrypt returns
-EINPROGRESS. With -EBUSY, we already waited so we can tell
tls_sw_recvmsg that the data is available for immediate copy, but we
need to notify tls_decrypt_sg (via the new ->async_done flag) that the
memory has already been released.
Fixes: 859054147318 ("net: tls: handle backlogging of crypto requests")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we're not doing async, the handling is much simpler. There's no
reference counting, we just need to wait for the completion to wake us
up and return its result.
We should preferably also use a separate crypto_wait. I'm not seeing a
UAF as I did in the past, I think aec7961916f3 ("tls: fix race between
async notify and socket close") took care of it.
This will make the next fix easier.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we peek from 2 records with a currently empty rx_list, and the
first record is decrypted synchronously but the second record is
decrypted async, the following happens:
1. decrypt record 1 (sync)
2. copy from record 1 to the userspace's msg
3. queue the decrypted record to rx_list for future read(!PEEK)
4. decrypt record 2 (async)
5. queue record 2 to rx_list
6. call process_rx_list to copy data from the 2nd record
We currently pass copied=0 as skip offset to process_rx_list, so we
end up copying once again from the first record. We should skip over
the data we've already copied.
Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With mixed sync/async decryption, or failures of crypto_aead_decrypt,
we increment decrypt_pending but we never do the corresponding
decrement since tls_decrypt_done will not be called. In this case, we
should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue prequeue test gets stuck with mixed
modes (one async decrypt + one sync decrypt).
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The gtp_link_ops operations structure for the subsystem must be
registered after registering the gtp_net_ops pernet operations structure.
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
[ 1010.702740] gtp: GTP module unloaded
[ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI
[ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
[ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1
[ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014
[ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp]
[ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00
[ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203
[ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000
[ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282
[ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000
[ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80
[ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400
[ 1010.715953] FS: 00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000
[ 1010.715958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0
[ 1010.715968] PKRU: 55555554
[ 1010.715972] Call Trace:
[ 1010.715985] ? __die_body.cold+0x1a/0x1f
[ 1010.715995] ? die_addr+0x43/0x70
[ 1010.716002] ? exc_general_protection+0x199/0x2f0
[ 1010.716016] ? asm_exc_general_protection+0x1e/0x30
[ 1010.716026] ? gtp_newlink+0x4d7/0x9c0 [gtp]
[ 1010.716034] ? gtp_net_exit+0x150/0x150 [gtp]
[ 1010.716042] __rtnl_newlink+0x1063/0x1700
[ 1010.716051] ? rtnl_setlink+0x3c0/0x3c0
[ 1010.716063] ? is_bpf_text_address+0xc0/0x1f0
[ 1010.716070] ? kernel_text_address.part.0+0xbb/0xd0
[ 1010.716076] ? __kernel_text_address+0x56/0xa0
[ 1010.716084] ? unwind_get_return_address+0x5a/0xa0
[ 1010.716091] ? create_prof_cpu_mask+0x30/0x30
[ 1010.716098] ? arch_stack_walk+0x9e/0xf0
[ 1010.716106] ? stack_trace_save+0x91/0xd0
[ 1010.716113] ? stack_trace_consume_entry+0x170/0x170
[ 1010.716121] ? __lock_acquire+0x15c5/0x5380
[ 1010.716139] ? mark_held_locks+0x9e/0xe0
[ 1010.716148] ? kmem_cache_alloc_trace+0x35f/0x3c0
[ 1010.716155] ? __rtnl_newlink+0x1700/0x1700
[ 1010.716160] rtnl_newlink+0x69/0xa0
[ 1010.716166] rtnetlink_rcv_msg+0x43b/0xc50
[ 1010.716172] ? rtnl_fdb_dump+0x9f0/0x9f0
[ 1010.716179] ? lock_acquire+0x1fe/0x560
[ 1010.716188] ? netlink_deliver_tap+0x12f/0xd50
[ 1010.716196] netlink_rcv_skb+0x14d/0x440
[ 1010.716202] ? rtnl_fdb_dump+0x9f0/0x9f0
[ 1010.716208] ? netlink_ack+0xab0/0xab0
[ 1010.716213] ? netlink_deliver_tap+0x202/0xd50
[ 1010.716220] ? netlink_deliver_tap+0x218/0xd50
[ 1010.716226] ? __virt_addr_valid+0x30b/0x590
[ 1010.716233] netlink_unicast+0x54b/0x800
[ 1010.716240] ? netlink_attachskb+0x870/0x870
[ 1010.716248] ? __check_object_size+0x2de/0x3b0
[ 1010.716254] netlink_sendmsg+0x938/0xe40
[ 1010.716261] ? netlink_unicast+0x800/0x800
[ 1010.716269] ? __import_iovec+0x292/0x510
[ 1010.716276] ? netlink_unicast+0x800/0x800
[ 1010.716284] __sock_sendmsg+0x159/0x190
[ 1010.716290] ____sys_sendmsg+0x712/0x880
[ 1010.716297] ? sock_write_iter+0x3d0/0x3d0
[ 1010.716304] ? __ia32_sys_recvmmsg+0x270/0x270
[ 1010.716309] ? lock_acquire+0x1fe/0x560
[ 1010.716315] ? drain_array_locked+0x90/0x90
[ 1010.716324] ___sys_sendmsg+0xf8/0x170
[ 1010.716331] ? sendmsg_copy_msghdr+0x170/0x170
[ 1010.716337] ? lockdep_init_map_type+0x2c7/0x860
[ 1010.716343] ? lockdep_hardirqs_on_prepare+0x430/0x430
[ 1010.716350] ? debug_mutex_init+0x33/0x70
[ 1010.716360] ? percpu_counter_add_batch+0x8b/0x140
[ 1010.716367] ? lock_acquire+0x1fe/0x560
[ 1010.716373] ? find_held_lock+0x2c/0x110
[ 1010.716384] ? __fd_install+0x1b6/0x6f0
[ 1010.716389] ? lock_downgrade+0x810/0x810
[ 1010.716396] ? __fget_light+0x222/0x290
[ 1010.716403] __sys_sendmsg+0xea/0x1b0
[ 1010.716409] ? __sys_sendmsg_sock+0x40/0x40
[ 1010.716419] ? lockdep_hardirqs_on_prepare+0x2b3/0x430
[ 1010.716425] ? syscall_enter_from_user_mode+0x1d/0x60
[ 1010.716432] do_syscall_64+0x30/0x40
[ 1010.716438] entry_SYSCALL_64_after_hwframe+0x62/0xc7
[ 1010.716444] RIP: 0033:0x7fd1508cbd49
[ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48
[ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49
[ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006
[ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360
[ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0
[ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp
[ 1010.716674] virtio_ring intel_gtt virtio [last unloaded: gtp]
[ 1010.716693] ---[ end trace 04990a4ce61e174b ]---
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Ofitserov <oficerovas@altlinux.org>
Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.
There is a day 0 bug in br_netfilter when used with connection tracking.
Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.
For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.
This patch disables NAT and conntrack helpers for multicast packets.
Patch #3 adds a selftest to cover for the br_netfilter bug.
netfilter pull request 24-02-29
* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================
Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Current HSR implementation uses following supervisory frame (even for
HSRv1 the HSR tag is not is not present):
00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01
00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94
00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 00 00 00 00 00 00 00 00
The current code adds extra two bytes (i.e. sizeof(struct hsr_sup_tlv))
when offset for skb_pull() is calculated.
This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
already have 'struct hsr_sup_tag' defined in them, so there is no need
for adding extra two bytes.
This code was working correctly as with no RedBox support, the check for
HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to
zeroed padded bytes for minimal packet size.
Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The i211 requires the same PTP timestamp adjustments as the i210,
according to its datasheet. To ensure consistent timestamping across
different platforms, this change extends the existing adjustments to
include the i211.
The adjustment result are tested and comparable for i210 and i211 based
systems.
Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic
in the function `rtnl_bridge_setlink` to enable the loop to also check
the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment
removed the `break` statement and led to an error logic of the flags
writing back at the end of this function.
if (have_flags)
memcpy(nla_data(attr), &flags, sizeof(flags));
// attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS.
However, this is not necessarily true fow now as the updated loop will let
the attr point to the last NLA, even an invalid NLA which could cause
overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer
that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned
error logic.
Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.
As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:
$ ./samples/netdev
YNL: Multicast group 'mgmt' not found
Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.
Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add test case for multicast packet confirm race.
Without preceding patch, this should result in:
WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
Workqueue: events_unbound macvlan_process_broadcast
RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
? __nf_conntrack_confirm+0x3ed/0x5f0
nf_confirm+0x2ad/0x2d0
nf_hook_slow+0x36/0xd0
ip_local_deliver+0xce/0x110
__netif_receive_skb_one_core+0x4f/0x70
process_backlog+0x8c/0x130
[..]
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.
Example:
macvlan0
|
br0
/ \
ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing
an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
-> skb->_nfct now references a unconfirmed entry
2. skb is broad/mcast packet. bridge now passes clones out on each bridge
interface.
3. skb gets passed up the stack.
4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the
original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb->_nfct already confirmed (in
hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.
This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") added
some validation of NFPROTO_* families in the nft_compat module, but it broke
the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6
tables, with nftables it is possible to have dual-stack tables sharing the
rules. Moreover, it was possible to use rules based on legacy iptables
match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table
using an xt_bpf based rule, which looks like the following (the actual output
was generated with a patched nft tool as the current nft tool does not parse
dual stack tables with legacy match rules, so consider it for illustrative
purposes only):
table inet testfw {
chain input {
type filter hook prerouting priority filter; policy accept;
bytecode counter packets 0 bytes 0 accept
}
}
After d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") we get
EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also
restrict the functions to classic iptables hooks.
Changes in v3:
* clarify that upstream nft will not display such configuration properly and
that the output was generated with a patched nft tool
* remove example program from commit description and link to it instead
* no code changes otherwise
Changes in v2:
* restrict nft_(match/target)_validate() to classic iptables hooks
* rewrite example program to use unmodified libnftnl
Fixes: d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family")
Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec044c7a3398cc92fac7afea72 [1]
Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2]
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The fast path usage breakdown describes the detail for 'inet_sock', fix
the markup title.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.
This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.
To secure against this possible crash, set workqueue variable to NULL when
destroying workqueue.
Log/backtrace from crash goes as follows:
[88.031977]------------[ cut here ]------------
[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
<Skipping backtrace for watchdog timeout>
[88.032251]---[ end trace e70de432e4d5c2c0 ]---
[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
[88.036359]------------[ cut here ]------------
[88.036519]Call trace:
[88.036523] flush_workqueue+0x3e4/0x430
[88.036528] drain_workqueue+0xc4/0x160
[88.036533] destroy_workqueue+0x40/0x270
[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
[88.036541] stmmac_release+0x278/0x280
[88.036546] __dev_close_many+0xcc/0x158
[88.036551] dev_close_many+0xbc/0x190
[88.036555] dev_close.part.0+0x70/0xc0
[88.036560] dev_close+0x24/0x30
[88.036564] stmmac_service_task+0x110/0x140
[88.036569] process_one_work+0x1d8/0x4a0
[88.036573] worker_thread+0x54/0x408
[88.036578] kthread+0x164/0x170
[88.036583] ret_from_fork+0x10/0x20
[88.036588]---[ end trace e70de432e4d5c2c1 ]---
[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Correct type in the hsr_forward_do() comment.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a previous patch I added "select PHYLIB" at the wrong place for the
ADIN1110 driver symbol, so move it to its correct place under the
ADIN1110 kconfig symbol.
Fixes: a9f80df4f514 ("net: ethernet: adi: requires PHYLIB support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/lkml/77012b38-4b49-47f4-9a88-d773d52909ad@infradead.org/T/#m8ba397484738711edc0ad607b2c63ca02244e3c3
Cc: Lennart Franzen <lennart@lfdomain.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: Nuno Sa <nuno.sa@analog.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.8-rc7
Few remaining fixes, hopefully the last wireless pull request to v6.8.
Two fixes to the stack and two to iwlwifi but no high priority fixes
this time.
* tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: only call drv_sta_rc_update for uploaded stations
MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files
MAINTAINERS: wifi: update Jeff Johnson e-mail address
wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices
wifi: iwlwifi: mvm: ensure offloading TID queue exists
wifi: nl80211: reject iftype change with mesh ID change
====================
Link: https://lore.kernel.org/r/20240227135751.C5EC6C43390@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY
IANA allocation for IOAM", while RFC 9486 is available for some time
now. Just update the reference.
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Disable BH around the call to napi_schedule() to avoid following
error:
NOHZ tick-stop error: local softirq work is pending, handler #08!!!
Fixes: ec4c7e12396b ("lan78xx: Introduce NAPI polling support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240226110820.2113584-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently using plain XDP/ZC sockets on stmmac results in a kernel crash:
|[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
|[...]
|[ 255.822764] Call trace:
|[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
The program counter indicates xsk_tx_metadata_complete(). It works on
compl->tx_timestamp, which is not set by xsk_tx_metadata_to_compl() due to
missing meta data. Therefore, call xsk_tx_metadata_complete() only when
meta data is actually used.
Tested on imx93 without XDP, with XDP and with XDP/ZC.
Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
Suggested-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/netdev/87r0h7wg8u.fsf@kurt.kurt.home/
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20240222-stmmac_xdp-v2-1-4beee3a037e4@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MII code does not check the return value of mdio_read (among
others), and therefore no error code should be sent. A previous fix to
the use of an uninitialized variable propagates negative error codes,
that might lead to wrong operations by the MII library.
An example of such issues is the use of mii_nway_restart by the dm9601
driver. The mii_nway_restart function does not check the value returned
by mdio_read, which in this case might be a negative number which could
contain the exact bit the function checks (BMCR_ANENABLE = 0x1000).
Return zero in case of error, as it is common practice in users of
mdio_read to avoid wrong uses of the return value.
Fixes: 8f8abb863fa5 ("net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://lore.kernel.org/r/20240225-dm9601_ret_err-v1-1-02c1d959ea59@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
struct veth_rq is pretty large, 832B total without debug
options enabled. Since commit under Fixes we try to pre-allocate
enough queues for every possible CPU. Miao Wang reports that
this may lead to order-5 allocations which will fail in production.
Let the allocation fallback to vmalloc() and try harder.
These are the same flags we pass to netdev queue allocation.
Reported-and-tested-by: Miao Wang <shankerwangmiao@gmail.com>
Fixes: 9d3684c24a52 ("veth: create by default nr_possible_cpus queues")
Link: https://lore.kernel.org/all/5F52CAE2-2FB7-4712-95F1-3312FBBFA8DD@gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240223235908.693010-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Shannon Nelson says:
====================
ionic: PCI error handling fixes
These are a few things to make our PCI reset handling better.
====================
Link: https://lore.kernel.org/r/20240223222742.13923-1-shannon.nelson@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When rebuilding the lif after an FLR, be sure to restore the
current netdev features, not do the usual first time feature
init. This prevents losing user changes to things like TSO
or vlan tagging states.
Fixes: 45b84188a0a4 ("ionic: keep filters across FLR")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since we now have potential cases of NULL cmd_regs and info_regs
during a reset recovery, and left NULL if a reset recovery has
failed, we need to check that they exist before we use them.
Most of the cases were covered in the original patch where we
verify before doing the ioreadb() for health or cmd status.
However, we need to protect a few uses of io mem that could
be hit in error recovery or asynchronous threads calls as well
(e.g. ethtool or devlink handlers).
Fixes: 219e183272b4 ("ionic: no fw read when PCI reset failed")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
AER recovery handler can trigger a PCI Reset after tearing
down the device setup in the error detection handler. The PCI
Reset handler will also attempt to tear down the device setup,
and this second tear down needs to know that it doesn't need
to call pci_release_regions() a second time. We can clear
num_bars on tear down and use that to decide later if we need
to clear the resources. This prevents a harmless but disturbing
warning message
resource: Trying to free nonexistent resource <0xXXXXXXXXXX-0xXXXXXXXXXX>
Fixes: c3a910e1c47a ("ionic: fill out pci error handlers")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Matthieu Baerts says:
====================
mptcp: more misc. fixes for v6.8
This series includes 6 types of fixes:
- Patch 1 fixes v4 mapped in v6 addresses support for the userspace PM,
when asking to delete a subflow. It was done everywhere else, but not
there. Patch 2 validates the modification, thanks to a subtest in
mptcp_join.sh. These patches can be backported up to v5.19.
- Patch 3 is a small fix for a recent bug-fix patch, just to avoid
printing an irrelevant warning (pr_warn()) once. It can be backported
up to v5.6, alongside the bug-fix that has been introduced in the
v6.8-rc5.
- Patches 4 to 6 are fixes for bugs found by Paolo while working on
TCP_NOTSENT_LOWAT support for MPTCP. These fixes can improve the
performances in some cases. Patches can be backported up to v5.6,
v5.11 and v6.7 respectively.
- Patch 7 makes sure 'ss -M' is available when starting MPTCP Join
selftest as it is required for some subtests since v5.18.
- Patch 8 fixes a possible double-free on socket dismantle. The issue
always existed, but was unnoticed because it was not causing any
problem so far. This fix can be backported up to v5.6.
- Patch 9 is a fix for a very recent patch causing lockdep warnings in
subflow diag. The patch causing the regression -- which fixes another
issue present since v5.7 -- should be part of the future v6.8-rc6.
Patch 10 validates the modification, thanks to a new subtest in
diag.sh.
====================
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-0-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.
Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.
Be careful to cope with very slow environments.
Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking dependency detected
6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
syz-executor.2/24141 is trying to acquire lock:
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
but task is already holding lock:
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock
include/linux/spinlock.h:351 [inline]
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at:
inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743
inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261
__inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217
inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239
rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316
rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577
ops_init+0x352/0x610 net/core/net_namespace.c:136
__register_pernet_operations net/core/net_namespace.c:1214 [inline]
register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283
register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370
rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735
do_one_initcall+0x238/0x830 init/main.c:1236
do_initcall_level+0x157/0x210 init/main.c:1298
do_initcalls+0x3f/0x80 init/main.c:1314
kernel_init_freeable+0x42f/0x5d0 init/main.c:1551
kernel_init+0x1d/0x2a0 init/main.c:1441
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
-> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
lock_sock_fast include/net/sock.h:1723 [inline]
subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345
inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061
__inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263
inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371
netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264
__netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370
netlink_dump_start include/linux/netlink.h:338 [inline]
inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405
sock_diag_rcv_msg+0xe7/0x410
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
As noted by Eric we can break the lock dependency chain avoid
dumping any extended info for the mptcp subflow listener:
nothing actually useful is presented there.
Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag")
Cc: stable@vger.kernel.org
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TOD5Xiaw@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
when MPTCP server accepts an incoming connection, it clones its listener
socket. However, the pointer to 'inet_opt' for the new socket has the same
value as the original one: as a consequence, on program exit it's possible
to observe the following splat:
BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0
Free of addr ffff888485950880 by task swapper/25/0
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609
Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
print_report+0xca/0x620
kasan_report_invalid_free+0x64/0x90
__kasan_slab_free+0x1aa/0x1f0
kfree+0xed/0x2e0
inet_sock_destruct+0x54f/0x8b0
__sk_destruct+0x48/0x5b0
rcu_do_batch+0x34e/0xd90
rcu_core+0x559/0xac0
__do_softirq+0x183/0x5a4
irq_exit_rcu+0x12d/0x170
sysvec_apic_timer_interrupt+0x6b/0x80
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:cpuidle_enter_state+0x175/0x300
Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b
RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000
RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588
RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080
R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0
R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80
cpuidle_enter+0x4a/0xa0
do_idle+0x310/0x410
cpu_startup_entry+0x51/0x60
start_secondary+0x211/0x270
secondary_startup_64_no_verify+0x184/0x18b
</TASK>
Allocated by task 6853:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0xa6/0xb0
__kmalloc+0x1eb/0x450
cipso_v4_sock_setattr+0x96/0x360
netlbl_sock_setattr+0x132/0x1f0
selinux_netlbl_socket_post_create+0x6c/0x110
selinux_socket_post_create+0x37b/0x7f0
security_socket_post_create+0x63/0xb0
__sock_create+0x305/0x450
__sys_socket_create.part.23+0xbd/0x130
__sys_socket+0x37/0xb0
__x64_sys_socket+0x6f/0xb0
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 6858:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x12c/0x1f0
kfree+0xed/0x2e0
inet_sock_destruct+0x54f/0x8b0
__sk_destruct+0x48/0x5b0
subflow_ulp_release+0x1f0/0x250
tcp_cleanup_ulp+0x6e/0x110
tcp_v4_destroy_sock+0x5a/0x3a0
inet_csk_destroy_sock+0x135/0x390
tcp_fin+0x416/0x5c0
tcp_data_queue+0x1bc8/0x4310
tcp_rcv_state_process+0x15a3/0x47b0
tcp_v4_do_rcv+0x2c1/0x990
tcp_v4_rcv+0x41fb/0x5ed0
ip_protocol_deliver_rcu+0x6d/0x9f0
ip_local_deliver_finish+0x278/0x360
ip_local_deliver+0x182/0x2c0
ip_rcv+0xb5/0x1c0
__netif_receive_skb_one_core+0x16e/0x1b0
process_backlog+0x1e3/0x650
__napi_poll+0xa6/0x500
net_rx_action+0x740/0xbb0
__do_softirq+0x183/0x5a4
The buggy address belongs to the object at ffff888485950880
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
64-byte region [ffff888485950880, ffff8884859508c0)
The buggy address belongs to the physical page:
page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950
flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006
raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix
this by duplicating IP / IPv6 options after clone, so that
ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
Cc: stable@vger.kernel.org
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP
sockets. So it must be checked if ss tool supports MPTCP in this script.
Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After the blamed commit below, the send buffer auto-tuning can
happen after that the mptcp_propagate_sndbuf() completes - via
the delegated action infrastructure.
We must check for write space even after such change or we risk
missing the wake-up event.
Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Such value should be inherited from the first subflow, but
passive sockets always used 'rsk_rcv_wnd'.
Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
when inserting not contiguous data in the subflow write queue,
the protocol creates a new skb and prevent the TCP stack from
merging it later with already queued skbs by setting the EOR marker.
Still no push flag is explicitly set at the end of previous GSO
packet, making the aggregation on the receiver side sub-optimal -
and packetdrill self-tests less predictable.
Explicitly mark the end of not contiguous DSS with the push flag.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After the 'Fixes' commit mentioned below, the client side might print
the following warning once when a subflow is fully established at the
reception of any valid additional ack:
MPTCP: bogus mpc option on established client sk
That's a normal situation, and no warning should be printed for that. We
can then skip the check when the label is used.
Fixes: e4a0fa47e816 ("mptcp: corner case locking for rx path fields initialization")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now both a v4 address and a v4-mapped address are supported when
destroying a userspace pm subflow, this patch adds a second subflow
to "userspace pm add & remove address" test, and two subflows could
be removed two different ways, one with the v4mapped and one with v4.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Address family of server side mismatches with that of client side, like
in "userspace pm add & remove address" test:
userspace_pm_add_addr $ns1 10.0.2.1 10
userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED
That's because on the server side, the family is set to AF_INET6 and the
v4 address is mapped in a v6 one.
This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before
checking local address family with remote address family, map an IPv4
address to an IPv6 address if the pair is a v4-mapped address.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This fixes a possible UAF in if_nlmsg_size(),
which can run without RTNL.
Add rcu protection to "struct dpll_pin"
Move netdev_dpll_pin() from netdevice.h to dpll.h to
decrease name pollution.
Note: This looks possible to no longer acquire RTNL in
netdev_dpll_pin_assign() later in net-next.
v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko)
Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not
present or not flashed, LAN7850 will fail to sync the speed detected by the PHY
with the MAC. In case link speed is 100Mbit, it will accidentally work,
otherwise no data can be transferred.
Better way would be to implement link_up callback, or set auto speed
configuration unconditionally. But this changes would be more intrusive.
So, for now, set it only if no EEPROM is found.
Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It seems that if userspace provides a correct IFA_TARGET_NETNSID value
but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr()
returns -EINVAL with an elevated "struct net" refcount.
Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test that we keep GRO flag in sync when XDP is disabled while
the device is closed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
veth sets NETIF_F_GRO automatically when XDP is enabled,
because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which
is called both on ndo_stop and when XDP is turned off.
To avoid the flag from being cleared when the device is brought
down, the clearing is skipped when IFF_UP is not set.
Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when
XDP is disabled _while_ the device is down. And there's nothing
on the open path to bring the device features back into sync.
IOW if user enables XDP, disables it and then brings the device
up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath
won't crash. We will crash (or hang), however, next time features
are sync'ed (either by user via ethtool or peer changing its config).
The GRO flag will go away, and veth will try to disable the NAPIs.
But the open path never created them since XDP was off, the GRO flag
was a stray. If NAPI was initialized before we'll hang in napi_disable().
If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths,
instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a queue(tfile) is detached, we only update tfile's queue_index,
but do not update xdp_rxq_info's queue_index. This patch fixes it.
Fixes: 8bf5c4ee1889 ("tun: setup xdp_rxq_info")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Link: https://lore.kernel.org/r/1708398727-46308-1-git-send-email-wangyunjian@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
6.8-rc1 had a copy-and-paste error where the pointer that holds the
allocated SKB (struct gelic_descr.skb) was set to NULL after the SKB was
allocated. This resulted in a kernel panic when the SKB pointer was
accessed.
This fix moves the initialization of the gelic_descr to before the SKB
is allocated.
Reported-by: sambat goson <sombat3960@gmail.com>
Fixes: 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures")
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tree
Since commit 5d93cfcf7360 ("net: dpaa: Convert to phylink"), we support
the "10gbase-r" phy-mode through a driver-based conversion of "xgmii",
but we still don't actually support it when the device tree specifies
"10gbase-r" proper.
This is because boards such as LS1046A-RDB do not define pcs-handle-names
(for whatever reason) in the ethernet@f0000 device tree node, and the
code enters through this code path:
err = of_property_match_string(mac_node, "pcs-handle-names", "xfi");
// code takes neither branch and falls through
if (err >= 0) {
(...)
} else if (err != -EINVAL && err != -ENODATA) {
goto _return_fm_mac_free;
}
(...)
/* For compatibility, if pcs-handle-names is missing, we assume this
* phy is the first one in pcsphy-handle
*/
err = of_property_match_string(mac_node, "pcs-handle-names", "sgmii");
if (err == -EINVAL || err == -ENODATA)
pcs = memac_pcs_create(mac_node, 0); // code takes this branch
else if (err < 0)
goto _return_fm_mac_free;
else
pcs = memac_pcs_create(mac_node, err);
// A default PCS is created and saved in "pcs"
// This determination fails and mistakenly saves the default PCS
// memac->sgmii_pcs instead of memac->xfi_pcs, because at this
// stage, mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER.
if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
memac->xfi_pcs = pcs;
else
memac->sgmii_pcs = pcs;
In other words, in the absence of pcs-handle-names, the default
xfi_pcs assignment logic only works when in the device tree we have
PHY_INTERFACE_MODE_XGMII.
By reversing the order between the fallback xfi_pcs assignment and the
"xgmii" overwrite with "10gbase-r", we are able to support both values
in the device tree, with identical behavior.
Currently, it is impossible to make the s/xgmii/10gbase-r/ device tree
conversion, because it would break forward compatibility (new device
tree with old kernel). The only way to modify existing device trees to
phy-interface-mode = "10gbase-r" is to fix stable kernels to accept this
value and handle it properly.
One reason why the conversion is desirable is because with pre-phylink
kernels, the Aquantia PHY driver used to warn about the improper use
of PHY_INTERFACE_MODE_XGMII [1]. It is best to have a single (latest)
device tree that works with all supported stable kernel versions.
Note that the blamed commit does not constitute a regression per se.
Older stable kernels like 6.1 still do not work with "10gbase-r", but
for a different reason. That is a battle for another time.
[1] https://lore.kernel.org/netdev/20240214-ls1046-dts-use-10gbase-r-v1-1-8c2d68547393@concurrent-rt.com/
Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a station has not been uploaded yet, receiving SMPS or channel width
notification action frames can lead to rate_control_rate_update calling
drv_sta_rc_update with uninitialized driver private data.
Fix this by adding a missing check for sta->uploaded.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://msgid.link/20240221140535.16102-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently, mctp_local_output only takes ownership of skb on success, and
we may leak an skb if mctp_local_output fails in specific states; the
skb ownership isn't transferred until the actual output routing occurs.
Instead, make mctp_local_output free the skb on all error paths up to
the route action, so it always consumes the passed skb.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-20 (ice)
This series contains updates to ice driver only.
Yochai sets parent device to properly reflect connection state between
source DPLL and output pin.
Arkadiusz fixes additional issues related to DPLL; proper reporting of
phase_adjust value and preventing use/access of data while resetting.
Amritha resolves ASSERT_RTNL() being triggered on certain reset/rebuild
flows.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix ASSERT_RTNL() warning during certain scenarios
ice: fix pin phase adjust updates on PF reset
ice: fix dpll periodic work data updates on PF reset
ice: fix dpll and dpll_pin data access on PF reset
ice: fix dpll input pin phase_adjust value updates
ice: fix connection state of DPLL and out pin
====================
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240220214444.1039759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|