summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-02-15net: dsa: remove OF-based MDIO bus registration from DSA coreArınç ÜNAL
The code block under the "!ds->user_mii_bus && ds->ops->phy_read" check under dsa_switch_setup() populates ds->user_mii_bus. The use of ds->user_mii_bus is inappropriate when the MDIO bus of the switch is described on the device tree [1]. For this reason, use this code block only for switches [with MDIO bus] probed on platform_data, and OF which the switch MDIO bus isn't described on the device tree. Therefore, remove OF-based MDIO bus registration as it's useless for these cases. These subdrivers which control switches [with MDIO bus] probed on OF, will lose the ability to register the MDIO bus OF-based: drivers/net/dsa/b53/b53_common.c drivers/net/dsa/lan9303-core.c drivers/net/dsa/vitesse-vsc73xx-core.c These subdrivers let the DSA core driver register the bus: - ds->ops->phy_read() and ds->ops->phy_write() are present. - ds->user_mii_bus is not populated. The commit fe7324b93222 ("net: dsa: OF-ware slave_mii_bus") which brought OF-based MDIO bus registration on the DSA core driver is reasonably recent and, in this time frame, there have been no device trees in the Linux repository that started describing the MDIO bus, or dt-bindings defining the MDIO bus for the switches these subdrivers control. So I don't expect any devices to be affected. The logic we encourage is that all subdrivers should register the switch MDIO bus on their own [2]. And, for subdrivers which control switches [with MDIO bus] probed on OF, this logic must be followed to support all cases properly: No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO bus, set the interrupts for PHYs if "interrupt-controller" is defined at the switch node. This case should only be covered for the switches which their dt-bindings documentation didn't document the MDIO bus from the start. This is to keep supporting the device trees that do not describe the MDIO bus on the device tree but the MDIO bus is being used nonetheless. Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO bus, set the interrupts for PHYs if ["interrupt-controller" is defined at the switch node and "interrupts" is defined at the PHY nodes under the switch MDIO bus node]. Switch MDIO bus defined but explicitly disabled: If the device tree says status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all. Instead, just exit as early as possible and do not call any MDIO API. After all subdrivers that control switches with MDIO buses are made to register the MDIO buses on their own, we will be able to get rid of dsa_switch_ops :: phy_read() and :: phy_write(), and the code block for registering the MDIO bus on the DSA core driver. Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/ [1] Link: https://lore.kernel.org/netdev/20240103184459.dcbh57wdnlox6w7d@skbuf/ [2] Suggested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240213-for-netnext-dsa-mdio-bus-v2-1-0ff6f4823a9e@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15Merge branch 'for-thermal-genetlink-family-bind-unbind-callbacks'Jakub Kicinski
Stanislaw Gruszka says: ==================== thermal/netlink/intel_hfi: Enable HFI feature only when required The patchset introduces a new genetlink family bind/unbind callbacks and thermal/netlink notifications, which allow drivers to send netlink multicast events based on the presence of actual user-space consumers. This functionality optimizes resource usage by allowing disabling of features when not needed. v1: https://lore.kernel.org/linux-pm/20240131120535.933424-1-stanislaw.gruszka@linux.intel.com// v2: https://lore.kernel.org/linux-pm/20240206133605.1518373-1-stanislaw.gruszka@linux.intel.com/ v3: https://lore.kernel.org/linux-pm/20240209120625.1775017-1-stanislaw.gruszka@linux.intel.com/ ==================== Link: https://lore.kernel.org/r/20240212161615.161935-1-stanislaw.gruszka@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15genetlink: Add per family bind/unbind callbacksStanislaw Gruszka
Add genetlink family bind()/unbind() callbacks when adding/removing multicast group to/from netlink client socket via setsockopt() or bind() syscall. They can be used to track if consumers of netlink multicast messages emerge or disappear. Thus, a client implementing callbacks, can now send events only when there are active consumers, preventing unnecessary work when none exist. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15Merge tag 'net-6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can, wireless and netfilter. Current release - regressions: - af_unix: fix task hung while purging oob_skb in GC - pds_core: do not try to run health-thread in VF path Current release - new code bugs: - sched: act_mirred: don't zero blockid when net device is being deleted Previous releases - regressions: - netfilter: - nat: restore default DNAT behavior - nf_tables: fix bidirectional offload, broken when unidirectional offload support was added - openvswitch: limit the number of recursions from action sets - eth: i40e: do not allow untrusted VF to remove administratively set MAC address Previous releases - always broken: - tls: fix races and bugs in use of async crypto - mptcp: prevent data races on some of the main socket fields, fix races in fastopen handling - dpll: fix possible deadlock during netlink dump operation - dsa: lan966x: fix crash when adding interface under a lag when some of the ports are disabled - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Misc: - a handful of fixes and reliability improvements for selftests - fix sysfs documentation missing net/ in paths - finish the work of squashing the missing MODULE_DESCRIPTION() warnings in networking" * tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) net: fill in MODULE_DESCRIPTION()s for missing arcnet net: fill in MODULE_DESCRIPTION()s for mdio_devres net: fill in MODULE_DESCRIPTION()s for ppp net: fill in MODULE_DESCRIPTION()s for fddik/skfp net: fill in MODULE_DESCRIPTION()s for plip net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb net: fill in MODULE_DESCRIPTION()s for xen-netback net: ravb: Count packets instead of descriptors in GbEth RX path pppoe: Fix memory leak in pppoe_sendmsg() net: sctp: fix skb leak in sctp_inq_free() net: bcmasp: Handle RX buffer allocation failure net-timestamp: make sk_tskey more predictable in error path selftests: tls: increase the wait in poll_partial_rec_async ice: Add check for lport extraction to LAG init netfilter: nf_tables: fix bidirectional offload regression netfilter: nat: restore default DNAT behavior netfilter: nft_set_pipapo: fix missing : in kdoc igc: Remove temporary workaround igb: Fix string truncation warnings in igb_set_fw_version can: netlink: Fix TDCO calculation using the old data bittiming ...
2024-02-15net: sctp: fix skb leak in sctp_inq_free()Dmitry Antipov
In case of GSO, 'chunk->skb' pointer may point to an entry from fraglist created in 'sctp_packet_gso_append()'. To avoid freeing random fraglist entry (and so undefined behavior and/or memory leak), introduce 'sctp_inq_chunk_free()' helper to ensure that 'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head) before calling 'sctp_chunk_free()', and use the aforementioned helper in 'sctp_inq_pop()' as well. Reported-by: syzbot+8bb053b5d63595ab47db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=0d8351bbe54fd04a492c2daab0164138db008042 Fixes: 90017accff61 ("sctp: Add GSO support") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20240214082224.10168-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15net: ipv6/addrconf: clamp preferred_lft to the minimum requiredAlex Henrie
If the preferred lifetime was less than the minimum required lifetime, ipv6_create_tempaddr would error out without creating any new address. On my machine and network, this error happened immediately with the preferred lifetime set to 5 seconds or less, after a few minutes with the preferred lifetime set to 6 seconds, and not at all with the preferred lifetime set to 7 seconds. During my investigation, I found a Stack Exchange post from another person who seems to have had the same problem: They stopped getting new addresses if they lowered the preferred lifetime below 3 seconds, and they didn't really know why. The preferred lifetime is a preference, not a hard requirement. The kernel does not strictly forbid new connections on a deprecated address, nor does it guarantee that the address will be disposed of the instant its total valid lifetime expires. So rather than disable IPv6 privacy extensions altogether if the minimum required lifetime swells above the preferred lifetime, it is more in keeping with the user's intent to increase the temporary address's lifetime to the minimum necessary for the current network conditions. With these fixes, setting the preferred lifetime to 5 or 6 seconds "just works" because the extra fraction of a second is practically unnoticeable. It's even possible to reduce the time before deprecation to 1 or 2 seconds by setting /proc/sys/net/ipv6/conf/*/regen_min_advance and /proc/sys/net/ipv6/conf/*/dad_transmits to 0. I realize that that is a pretty niche use case, but I know at least one person who would gladly sacrifice performance and convenience to be sure that they are getting the maximum possible level of privacy. Link: https://serverfault.com/a/1031168/310447 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15net: ipv6/addrconf: introduce a regen_min_advance sysctlAlex Henrie
In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC does not permit the creation of temporary addresses with lifetimes shorter than that: > When processing a Router Advertisement with a > Prefix Information option carrying a prefix for the purposes of > address autoconfiguration (i.e., the A bit is set), the host MUST > perform the following steps: > 5. A temporary address is created only if this calculated preferred > lifetime is greater than REGEN_ADVANCE time units. However, some users want to change their IPv6 address as frequently as possible regardless of the RFC's arbitrary minimum lifetime. For the benefit of those users, add a regen_min_advance sysctl parameter that can be set to below or above 2 seconds. Link: https://datatracker.ietf.org/doc/html/rfc8981 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15net: ipv6/addrconf: ensure that regen_advance is at least 2 secondsAlex Henrie
RFC 8981 defines REGEN_ADVANCE as follows: REGEN_ADVANCE = 2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits * RetransTimer / 1000) Thus, allowing it to be less than 2 seconds is technically a protocol violation. Link: https://datatracker.ietf.org/doc/html/rfc8981#name-defined-protocol-parameters Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15tipc: Cleanup tipc_nl_bearer_add() error pathsShigeru Yoshida
Consolidate the error paths of tipc_nl_bearer_add() under the common label if the function holds rtnl_lock. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/20240213134058.386123-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15tcp: no need to use acceptable for conn_requestJason Xing
Since tcp_conn_request() always returns zero, there is no need to keep the dead code. Remove it then. Link: https://lore.kernel.org/netdev/CANn89iJwx9b2dUGUKFSV3PF=kN5o+kxz3A_fHZZsOS4AnXhBNw@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240213131205.4309-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15Merge tag 'nf-24-02-15' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Missing : in kdoc field in nft_set_pipapo. 2) Restore default DNAT behavior When a DNAT rule is configured via iptables with different port ranges, from Kyle Swenson. 3) Restore flowtable hardware offload for bidirectional flows by setting NF_FLOW_HW_BIDIRECTIONAL flag, from Felix Fietkau. netfilter pull request 24-02-15 * tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fix bidirectional offload regression netfilter: nat: restore default DNAT behavior netfilter: nft_set_pipapo: fix missing : in kdoc ==================== Link: https://lore.kernel.org/r/20240214233818.7946-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15Merge tag 'linux-can-fixes-for-6.8-20240214' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-02-14 this is a pull request of 3 patches for net/master. the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it fixes a potential deadlock by replacing the spinlock by an rwlock. Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a potential Use-After-Free in the CAN J1939's setsockopt(SO_J1939_FILTER). Maxime Jayat contributes a patch to fix the transceiver delay compensation (TDCO) calculation, which is needed for higher CAN-FD bit rates (usually 2Mbit/s). * tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: netlink: Fix TDCO calculation using the old data bittiming can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock ==================== Link: https://lore.kernel.org/r/20240214140348.2412776-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15net-timestamp: make sk_tskey more predictable in error pathVadim Fedorenko
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams, the sk_tskey can become unpredictable in case of any error happened during sendmsg(). Move increment later in the code and make decrement of sk_tskey in error path. This solution is still racy in case of multiple threads doing snedmsg() over the very same socket in parallel, but still makes error path much more predictable. Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams") Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240213110428.1681540-1-vadfed@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriatelyJohannes Berg
Even if that's the same as IEEE80211_MAX_SSID_LEN, we really should just use IEEE80211_MAX_MESH_ID_LEN for mesh, rather than having the BUILD_BUG_ON()s. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-15Merge wireless into wireless-nextJohannes Berg
There's a conflict already and some upcoming changes also depend on changes in wireless for being conflict- free, so pull wireless in to make all that easier. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-14Merge tag 'wireless-2024-02-14' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Valentine's day edition, with just few fixes because that's how we love it ;-) iwlwifi: - correct A3 in A-MSDUs - fix crash when operating as AP and running out of station slots to use - clear link ID to correct some later checks against it - fix error codes in SAR table loading - fix error path in PPAG table read mac80211: - reload a pointer after SKB may have changed (only in certain monitor inject mode scenarios) * tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: mvm: fix a crash when we run out of stations wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table() wifi: iwlwifi: Fix some error codes wifi: iwlwifi: clear link_id in time_event wifi: iwlwifi: mvm: use correct address 3 in A-MSDU wifi: mac80211: reload info pointer in ieee80211_tx_dequeue() ==================== Link: https://lore.kernel.org/r/20240214184326.132813-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15netfilter: nf_tables: fix bidirectional offload regressionFelix Fietkau
Commit 8f84780b84d6 ("netfilter: flowtable: allow unidirectional rules") made unidirectional flow offload possible, while completely ignoring (and breaking) bidirectional flow offload for nftables. Add the missing flag that was left out as an exercise for the reader :) Cc: Vlad Buslov <vladbu@nvidia.com> Fixes: 8f84780b84d6 ("netfilter: flowtable: allow unidirectional rules") Reported-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-15netfilter: nat: restore default DNAT behaviorKyle Swenson
When a DNAT rule is configured via iptables with different port ranges, iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010 -j DNAT --to-destination 192.168.0.10:21000-21010 we seem to be DNATing to some random port on the LAN side. While this is expected if --random is passed to the iptables command, it is not expected without passing --random. The expected behavior (and the observed behavior prior to the commit in the "Fixes" tag) is the traffic will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision with that destination. In that case, we expect the traffic to be instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of the range. This patch intends to restore the behavior observed prior to the "Fixes" tag. Fixes: 6ed5943f8735 ("netfilter: nat: remove l4 protocol port rovers") Signed-off-by: Kyle Swenson <kyle.swenson@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-15netfilter: nft_set_pipapo: fix missing : in kdocPablo Neira Ayuso
Add missing : in kdoc field names. Fixes: 8683f4b9950d ("nft_set_pipapo: Prepare for vectorised implementation: helpers") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-14can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)Oleksij Rempel
Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...) modifies jsk->filters while receiving packets. Following trace was seen on affected system: ================================================================== BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] Read of size 4 at addr ffff888012144014 by task j1939/350 CPU: 0 PID: 350 Comm: j1939 Tainted: G W OE 6.5.0-rc5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: print_report+0xd3/0x620 ? kasan_complete_mode_report_info+0x7d/0x200 ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] kasan_report+0xc2/0x100 ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] __asan_load4+0x84/0xb0 j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] j1939_sk_recv+0x20b/0x320 [can_j1939] ? __kasan_check_write+0x18/0x20 ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939] ? j1939_simple_recv+0x69/0x280 [can_j1939] ? j1939_ac_recv+0x5e/0x310 [can_j1939] j1939_can_recv+0x43f/0x580 [can_j1939] ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939] ? raw_rcv+0x42/0x3c0 [can_raw] ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939] can_rcv_filter+0x11f/0x350 [can] can_receive+0x12f/0x190 [can] ? __pfx_can_rcv+0x10/0x10 [can] can_rcv+0xdd/0x130 [can] ? __pfx_can_rcv+0x10/0x10 [can] __netif_receive_skb_one_core+0x13d/0x150 ? __pfx___netif_receive_skb_one_core+0x10/0x10 ? __kasan_check_write+0x18/0x20 ? _raw_spin_lock_irq+0x8c/0xe0 __netif_receive_skb+0x23/0xb0 process_backlog+0x107/0x260 __napi_poll+0x69/0x310 net_rx_action+0x2a1/0x580 ? __pfx_net_rx_action+0x10/0x10 ? __pfx__raw_spin_lock+0x10/0x10 ? handle_irq_event+0x7d/0xa0 __do_softirq+0xf3/0x3f8 do_softirq+0x53/0x80 </IRQ> <TASK> __local_bh_enable_ip+0x6e/0x70 netif_rx+0x16b/0x180 can_send+0x32b/0x520 [can] ? __pfx_can_send+0x10/0x10 [can] ? __check_object_size+0x299/0x410 raw_sendmsg+0x572/0x6d0 [can_raw] ? __pfx_raw_sendmsg+0x10/0x10 [can_raw] ? apparmor_socket_sendmsg+0x2f/0x40 ? __pfx_raw_sendmsg+0x10/0x10 [can_raw] sock_sendmsg+0xef/0x100 sock_write_iter+0x162/0x220 ? __pfx_sock_write_iter+0x10/0x10 ? __rtnl_unlock+0x47/0x80 ? security_file_permission+0x54/0x320 vfs_write+0x6ba/0x750 ? __pfx_vfs_write+0x10/0x10 ? __fget_light+0x1ca/0x1f0 ? __rcu_read_unlock+0x5b/0x280 ksys_write+0x143/0x170 ? __pfx_ksys_write+0x10/0x10 ? __kasan_check_read+0x15/0x20 ? fpregs_assert_state_consistent+0x62/0x70 __x64_sys_write+0x47/0x60 do_syscall_64+0x60/0x90 ? do_syscall_64+0x6d/0x90 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x79/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Allocated by task 348: kasan_save_stack+0x2a/0x50 kasan_set_track+0x29/0x40 kasan_save_alloc_info+0x1f/0x30 __kasan_kmalloc+0xb5/0xc0 __kmalloc_node_track_caller+0x67/0x160 j1939_sk_setsockopt+0x284/0x450 [can_j1939] __sys_setsockopt+0x15c/0x2f0 __x64_sys_setsockopt+0x6b/0x80 do_syscall_64+0x60/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 349: kasan_save_stack+0x2a/0x50 kasan_set_track+0x29/0x40 kasan_save_free_info+0x2f/0x50 __kasan_slab_free+0x12e/0x1c0 __kmem_cache_free+0x1b9/0x380 kfree+0x7a/0x120 j1939_sk_setsockopt+0x3b2/0x450 [can_j1939] __sys_setsockopt+0x15c/0x2f0 __x64_sys_setsockopt+0x6b/0x80 do_syscall_64+0x60/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol") Reported-by: Sili Luo <rootlab@huawei.com> Suggested-by: Sili Luo <rootlab@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-14can: j1939: prevent deadlock by changing j1939_socks_lock to rwlockZiqi Zhao
The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com> Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com [mkl: remove unrelated newline change] Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-14net: remove dev_base_lockEric Dumazet
dev_base_lock is not needed anymore, all remaining users also hold RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: remove dev_base_lock from register_netdevice() and friends.Eric Dumazet
RTNL already protects writes to dev->reg_state, we no longer need to hold dev_base_lock to protect the readers. unlist_netdevice() second argument can be removed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: remove dev_base_lock from do_setlink()Eric Dumazet
We hold RTNL here, and dev->link_mode readers already are using READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: add netdev_set_operstate() helperEric Dumazet
dev_base_lock is going away, add netdev_set_operstate() helper so that hsr does not have to know core internals. Remove dev_base_lock acquisition from rfc2863_policy() v3: use an "unsigned int" for dev->operstate, so that try_cmpxchg() can work on all arches. ( https://lore.kernel.org/oe-kbuild-all/202402081918.OLyGaea3-lkp@intel.com/ ) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net-sysfs: convert netstat_show() to RCUEric Dumazet
dev_get_stats() can be called from RCU, there is no need to acquire dev_base_lock. Change dev_isalive() comment to reflect we no longer use dev_base_lock from net/core/net-sysfs.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net-sysfs: convert dev->operstate reads to lockless onesEric Dumazet
operstate_show() can omit dev_base_lock acquisition only to read dev->operstate. Annotate accesses to dev->operstate. Writers still acquire dev_base_lock for mutual exclusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net-sysfs: use dev_addr_sem to remove races in address_show()Eric Dumazet
Using dev_base_lock is not preventing from reading garbage. Use dev_addr_sem instead. v4: place dev_addr_sem extern in net/core/dev.h (Jakub Kicinski) Link: https://lore.kernel.org/netdev/20240212175845.10f6680a@kernel.org/ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net-sysfs: convert netdev_show() to RCUEric Dumazet
Make clear dev_isalive() can be called with RCU protection. Then convert netdev_show() to RCU, to remove dev_base_lock dependency. Also add RCU to broadcast_show(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: convert dev->reg_state to u8Eric Dumazet
Prepares things so that dev->reg_state reads can be lockless, by adding WRITE_ONCE() on write side. READ_ONCE()/WRITE_ONCE() do not support bitfields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14dev: annotate accesses to dev->linkEric Dumazet
Following patch will read dev->link locklessly, annotate the write from do_setlink(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14ip_tunnel: annotate data-races around t->parms.linkEric Dumazet
t->parms.link is read locklessly, annotate these reads and opposite writes accordingly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: annotate data-races around dev->name_assign_typeEric Dumazet
name_assign_type_show() runs locklessly, we should annotate accesses to dev->name_assign_type. Alternative would be to grab devnet_rename_sem semaphore from name_assign_type_show(), but this would not bring more accuracy. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14net: smc: fix spurious error message from __sock_release()Dmitry Antipov
Commit 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback") leaves the socket's fasync list pointer within a container socket as well. When the latter is destroyed, '__sock_release()' warns about its non-empty fasync list, which is a dangling pointer to previously freed fasync list of an underlying TCP socket. Fix this spurious warning by nullifying fasync list of a container socket. Fixes: 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14Merge tag 'linux-can-next-for-6.9-20240213' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-6.9-20240213 this is a pull request of 23 patches for net-next/master. The first patch is by Nicolas Maier and targets the CAN Broadcast Manager (bcm), it adds message flags to distinguish between own local and remote traffic. Oliver Hartkopp contributes a patch for the CAN ISOTP protocol that adds dynamic flow control parameters. Stefan Mätje's patch series add support for the esd PCIe/402 CAN interface family. Markus Schneider-Pargmann contributes 14 patches for the m_can to optimize for the SPI attached tcan4x5x controller. A patch by Vincent Mailhol replaces Wolfgang Grandegger by Vincent Mailhol as the CAN drivers Co-Maintainer. Jimmy Assarsson's patch add support for the Kvaser M.2 PCIe 4xCAN adapter. A patch by Daniil Dulov removed a redundant NULL check in the softing driver. Oliver Hartkopp contributes a patch to add CANXL virtual CAN network identifier support. A patch by myself removes Naga Sureshkumar Relli as the maintainer of the xilinx_can driver, as their email bounces. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-13veth: rely on skb_pp_cow_data utility routineLorenzo Bianconi
Rely on skb_pp_cow_data utility routine and remove duplicated code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/029cc14cce41cb242ee7efdcf32acc81f1ce4e9f.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13xdp: add multi-buff support for xdp running in generic modeLorenzo Bianconi
Similar to native xdp, do not always linearize the skb in netif_receive_generic_xdp routine but create a non-linear xdp_buff to be processed by the eBPF program. This allow to add multi-buffer support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/1044d6412b1c3e95b40d34993fd5f37cd2f319fd.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13xdp: rely on skb pointer reference in do_xdp_generic and ↵Lorenzo Bianconi
netif_receive_generic_xdp Rely on skb pointer reference instead of the skb pointer in do_xdp_generic and netif_receive_generic_xdp routine signatures. This is a preliminary patch to add multi-buff support for xdp running in generic mode where we will need to reallocate the skb to avoid linearization and we will need to make it visible to do_xdp_generic() caller. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c09415b1f48c8620ef4d76deed35050a7bddf7c2.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13net: add generic percpu page_pool allocatorLorenzo Bianconi
Introduce generic percpu page_pools allocator. Moreover add page_pool_create_percpu() and cpuid filed in page_pool struct in order to recycle the page in the page_pool "hot" cache if napi_pp_put_page() is running on the same cpu. This is a preliminary patch to add xdp multi-buff support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/80bc4285228b6f4220cd03de1999d86e46e3fcbd.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13net: add netdev_lockdep_set_classes() to virtual driversEric Dumazet
Based on a syzbot report, it appears many virtual drivers do not yet use netdev_lockdep_set_classes(), triggerring lockdep false positives. WARNING: possible recursive locking detected 6.8.0-rc4-next-20240212-syzkaller #0 Not tainted syz-executor.0/19016 is trying to acquire lock: ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 but task is already holding lock: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(_xmit_ETHER#2); lock(_xmit_ETHER#2); *** DEADLOCK *** May be due to missing lock nesting notation 9 locks held by syz-executor.0/19016: #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x82c/0x1040 net/core/rtnetlink.c:6603 #1: ffffc90000a08c00 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1697 #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 stack backtrace: CPU: 1 PID: 19016 Comm: syz-executor.0 Not tainted 6.8.0-rc4-next-20240212-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_deadlock kernel/locking/lockdep.c:3062 [inline] validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __netif_tx_lock include/linux/netdevice.h:4452 [inline] sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 iptunnel_xmit+0x540/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x20ee/0x2960 net/ipv4/ip_tunnel.c:831 erspan_xmit+0x9de/0x1460 net/ipv4/ip_gre.c:720 __netdev_start_xmit include/linux/netdevice.h:4989 [inline] netdev_start_xmit include/linux/netdevice.h:5003 [inline] xmit_one net/core/dev.c:3555 [inline] dev_hard_start_xmit+0x242/0x770 net/core/dev.c:3571 sch_direct_xmit+0x2b6/0x5f0 net/sched/sch_generic.c:342 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 igmpv3_send_cr net/ipv4/igmp.c:723 [inline] igmp_ifc_timer_expire+0xb71/0xd90 net/ipv4/igmp.c:813 call_timer_fn+0x17e/0x600 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x621/0x830 kernel/time/timer.c:2038 run_timer_softirq+0x67/0xf0 kernel/time/timer.c:2051 __do_softirq+0x2bc/0x943 kernel/softirq.c:554 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1076 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:resched_offsets_ok kernel/sched/core.c:10127 [inline] RIP: 0010:__might_resched+0x16f/0x780 kernel/sched/core.c:10142 Code: 00 4c 89 e8 48 c1 e8 03 48 ba 00 00 00 00 00 fc ff df 48 89 44 24 38 0f b6 04 10 84 c0 0f 85 87 04 00 00 41 8b 45 00 c1 e0 08 <01> d8 44 39 e0 0f 85 d6 00 00 00 44 89 64 24 1c 48 8d bc 24 a0 00 RSP: 0018:ffffc9000ee069e0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880296a9e00 RDX: dffffc0000000000 RSI: ffff8880296a9e00 RDI: ffffffff8bfe8fa0 RBP: ffffc9000ee06b00 R08: ffffffff82326877 R09: 1ffff11002b5ad1b R10: dffffc0000000000 R11: ffffed1002b5ad1c R12: 0000000000000000 R13: ffff8880296aa23c R14: 000000000000062a R15: 1ffff92001dc0d44 down_write+0x19/0x50 kernel/locking/rwsem.c:1578 kernfs_activate fs/kernfs/dir.c:1403 [inline] kernfs_add_one+0x4af/0x8b0 fs/kernfs/dir.c:819 __kernfs_create_file+0x22e/0x2e0 fs/kernfs/file.c:1056 sysfs_add_file_mode_ns+0x24a/0x310 fs/sysfs/file.c:307 create_files fs/sysfs/group.c:64 [inline] internal_create_group+0x4f4/0xf20 fs/sysfs/group.c:152 internal_create_groups fs/sysfs/group.c:192 [inline] sysfs_create_groups+0x56/0x120 fs/sysfs/group.c:218 create_dir lib/kobject.c:78 [inline] kobject_add_internal+0x472/0x8d0 lib/kobject.c:240 kobject_add_varg lib/kobject.c:374 [inline] kobject_init_and_add+0x124/0x190 lib/kobject.c:457 netdev_queue_add_kobject net/core/net-sysfs.c:1706 [inline] netdev_queue_update_kobjects+0x1f3/0x480 net/core/net-sysfs.c:1758 register_queue_kobjects net/core/net-sysfs.c:1819 [inline] netdev_register_kobject+0x265/0x310 net/core/net-sysfs.c:2059 register_netdevice+0x1191/0x19c0 net/core/dev.c:10298 bond_newlink+0x3b/0x90 drivers/net/bonding/bond_netlink.c:576 rtnl_newlink_create net/core/rtnetlink.c:3506 [inline] __rtnl_newlink net/core/rtnetlink.c:3726 [inline] rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3739 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6606 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3c/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 __sys_sendto+0x3a4/0x4f0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2199 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc3fa87fa9c Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240212140700.2795436-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13net: bridge: use netdev_lockdep_set_classes()Eric Dumazet
br_set_lockdep_class() is missing many details. Use generic netdev_lockdep_set_classes() to not worry anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240212140700.2795436-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13vlan: use netdev_lockdep_set_classes()Eric Dumazet
vlan uses vlan_dev_set_lockdep_class() which lacks qdisc_tx_busylock initialization. Use generic netdev_lockdep_set_classes() to not worry anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240212140700.2795436-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()Eric Dumazet
Adopt net->dev_by_index as I did in commit 0e0939c0adf9 ("net-procfs: use xarray iterator to implement /proc/net/dev") This makes sure an existing device is always visible in the dump, regardless of concurrent insertions/deletions. v2: added suggestions from Jakub Kicinski and Ido Schimmel, thanks for the help ! Link: https://lore.kernel.org/all/20240209142441.6c56435b@kernel.org/ Link: https://lore.kernel.org/all/ZckR-XOsULLI9EHc@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240211214404.1882191-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13vlan: use xarray iterator to implement /proc/net/vlan/configEric Dumazet
Adopt net->dev_by_index as I did in commit 0e0939c0adf9 ("net-procfs: use xarray iterator to implement /proc/net/dev") Not only this removes quadratic behavior, it also makes sure an existing vlan device is always visible in the dump, regardless of concurrent net->dev_base_head changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240211214404.1882191-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13net: sched: codel replace GPLv2/BSD boilerplateStephen Hemminger
The prologue to codel is using BSD-3 clause and GPL-2 boiler plate language. Replace it by using SPDX. The automated treewide scan in commit d2912cb15bdd ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500") did not pickup dual licensed code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Dave Taht <dave.taht@gmail.com> Link: https://lore.kernel.org/r/20240211172532.6568-1-stephen@networkplumber.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-13can: canxl: add virtual CAN network identifier supportOliver Hartkopp
CAN XL data frames contain an 8-bit virtual CAN network identifier (VCID). A VCID value of zero represents an 'untagged' CAN XL frame. To receive and send these optional VCIDs via CAN_RAW sockets a new socket option CAN_RAW_XL_VCID_OPTS is introduced to define/access VCID content: - tx: set the outgoing VCID value by the kernel (one fixed 8-bit value) - tx: pass through VCID values from the user space (e.g. for traffic replay) - rx: apply VCID receive filter (value/mask) to be passed to the user space With the 'tx pass through' option CAN_RAW_XL_VCID_TX_PASS all valid VCID values can be sent, e.g. to replay full qualified CAN XL traffic. The VCID value provided for the CAN_RAW_XL_VCID_TX_SET option will override the VCID value in the struct canxl_frame.prio defined for CAN_RAW_XL_VCID_TX_PASS when both flags are set. With a rx_vcid_mask of zero all possible VCID values (0x00 - 0xFF) are passed to the user space when the CAN_RAW_XL_VCID_RX_FILTER flag is set. Without this flag only untagged CAN XL frames (VCID = 0x00) are delivered to the user space (default). The 8-bit VCID is stored inside the CAN XL prio element (only in CAN XL frames!) to not interfere with other CAN content or the CAN filters provided by the CAN_RAW sockets and kernel infrastruture. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20240212213550.18516-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-13af_unix: Fix task hung while purging oob_skb in GC.Kuniyuki Iwashima
syzbot reported a task hung; at the same time, GC was looping infinitely in list_for_each_entry_safe() for OOB skb. [0] syzbot demonstrated that the list_for_each_entry_safe() was not actually safe in this case. A single skb could have references for multiple sockets. If we free such a skb in the list_for_each_entry_safe(), the current and next sockets could be unlinked in a single iteration. unix_notinflight() uses list_del_init() to unlink the socket, so the prefetched next socket forms a loop itself and list_for_each_entry_safe() never stops. Here, we must use while() and make sure we always fetch the first socket. [0]: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 5065 Comm: syz-executor236 Not tainted 6.8.0-rc3-syzkaller-00136-g1f719a2f3fa6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline] RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x60 kernel/kcov.c:207 Code: cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 48 8b 14 25 40 c2 03 00 <65> 8b 05 b4 7c 78 7e a9 00 01 ff 00 48 8b 34 24 74 0f f6 c4 01 74 RSP: 0018:ffffc900033efa58 EFLAGS: 00000283 RAX: ffff88807b077800 RBX: ffff88807b077800 RCX: 1ffffffff27b1189 RDX: ffff88802a5a3b80 RSI: ffffffff8968488d RDI: ffff88807b077f70 RBP: ffffc900033efbb0 R08: 0000000000000001 R09: fffffbfff27a900c R10: ffffffff93d48067 R11: ffffffff8ae000eb R12: ffff88807b077800 R13: dffffc0000000000 R14: ffff88807b077e40 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564f4fc1e3a8 CR3: 000000000d57a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> unix_gc+0x563/0x13b0 net/unix/garbage.c:319 unix_release_sock+0xa93/0xf80 net/unix/af_unix.c:683 unix_release+0x91/0xf0 net/unix/af_unix.c:1064 __sock_release+0xb0/0x270 net/socket.c:659 sock_close+0x1c/0x30 net/socket.c:1421 __fput+0x270/0xb80 fs/file_table.c:376 task_work_run+0x14f/0x250 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa8a/0x2ad0 kernel/exit.c:871 do_group_exit+0xd4/0x2a0 kernel/exit.c:1020 __do_sys_exit_group kernel/exit.c:1031 [inline] __se_sys_exit_group kernel/exit.c:1029 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd5/0x270 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f9d6cbdac09 Code: Unable to access opcode bytes at 0x7f9d6cbdabdf. RSP: 002b:00007fff5952feb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d6cbdac09 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 RBP: 00007f9d6cc552b0 R08: ffffffffffffffb8 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 00007f9d6cc552b0 R13: 0000000000000000 R14: 00007f9d6cc55d00 R15: 00007f9d6cbabe70 </TASK> Reported-by: syzbot+4fa4a2d1f5a5ee06f006@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4fa4a2d1f5a5ee06f006 Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240209220453.96053-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-13net: sched: Remove NET_ACT_IPT from KconfigHarshit Mogalapalli
After this commit ba24ea129126 ("net/sched: Retire ipt action") NET_ACT_IPT is not needed anymore as the action is retired and the code is removed. Clean the Kconfig part as well. Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240209180656.867546-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-13net:rds: Fix possible deadlock in rds_message_putAllison Henderson
Functions rds_still_queued and rds_clear_recv_queue lock a given socket in order to safely iterate over the incoming rds messages. However calling rds_inc_put while under this lock creates a potential deadlock. rds_inc_put may eventually call rds_message_purge, which will lock m_rs_lock. This is the incorrect locking order since m_rs_lock is meant to be locked before the socket. To fix this, we move the message item to a local list or variable that wont need rs_recv_lock protection. Then we can safely call rds_inc_put on any item stored locally after rs_recv_lock is released. Fixes: bdbe6fbc6a2f ("RDS: recv.c") Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>