summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-18inet: frags: save a pair of atomic operations in reassemblyEric Dumazet
As mentioned in commit 648700f76b03 ("inet: frags: use rhashtables for reassembly units"): A followup patch will even remove the refcount hold/release left from prior implementation and save a couple of atomic operations. This patch implements this idea, seven years later. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250312082250.1803501-5-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18inet: frags: change inet_frag_kill() to defer refcount updatesEric Dumazet
In the following patch, we no longer assume inet_frag_kill() callers own a reference. Consuming two refcounts from inet_frag_kill() would lead in UAF. Propagate the pointer to the refs that will be consumed later by the final inet_frag_putn() call. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250312082250.1803501-4-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ipv4: frags: remove ipq_put()Eric Dumazet
Replace ipq_put() with inet_frag_putn() Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250312082250.1803501-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18inet: frags: add inet_frag_putn() helperEric Dumazet
inet_frag_putn() can release multiple references in one step. Use it in inet_frags_free_cb(). Replace inet_frag_put(X) with inet_frag_putn(X, 1) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250312082250.1803501-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18firmware: thead: add CONFIG_MAILBOX dependencyArnd Bergmann
Without this, the driver fails to build: ld: drivers/firmware/thead,th1520-aon.o: in function `th1520_aon_call_rpc': thead,th1520-aon.c:(.text+0x28): undefined reference to `mbox_send_message' ld: drivers/firmware/thead,th1520-aon.o: in function `th1520_aon_deinit': thead,th1520-aon.c:(.text+0x17e): undefined reference to `mbox_free_channel' ld: drivers/firmware/thead,th1520-aon.o: in function `th1520_aon_init': thead,th1520-aon.c:(.text+0x1d9): undefined reference to `mbox_request_channel_byname' Fixes: e4b3cbd840e5 ("firmware: thead: Add AON firmware protocol driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michal Wilczynski <m.wilczynski@samsung.com> Link: https://lore.kernel.org/r/20250314154816.4045334-1-arnd@kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-03-18firmware: thead,th1520-aon: Fix use after free in th1520_aon_init()Dan Carpenter
Record the error code before freeing "aon_chan" to avoid a use after free. Fixes: e4b3cbd840e5 ("firmware: thead: Add AON firmware protocol driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/f19be994-d355-48a6-ab45-d0f7e5955daf@stanley.mountain Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-03-18xfs: don't wake zone space waiters without m_zone_infoDarrick J. Wong
xfs_zoned_wake_all checks SB_ACTIVE to make sure it does the right thing when a shutdown happens during unmount, but it fails to account for the log recovery special case that sets SB_ACTIVE temporarily. Add a NULL check to cover both cases. Signed-off-by: Darrick J. Wong <djwong@kernel.org> [hch: added a commit log and comment] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-18xfs: don't increment m_generation for all errors in xfs_growfs_dataChristoph Hellwig
xfs_growfs_data needs to increment m_generation as soon as the primary superblock has been updated. As the update of the secondary superblocks was part of xfs_growfs_data_private that mean the incremented had to be done unconditionally once that was called. Later, commit 83a7f86e39ff ("xfs: separate secondary sb update in growfs") split the secondary superblock update into a separate helper, so now the increment on error can be limited to failed calls to xfs_update_secondary_sbs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-18xfs: fix a missing unlock in xfs_growfs_dataChristoph Hellwig
The newly added check for the internal RT device needs to unlock m_growlock just like all ther other error cases. Fixes: bdc03eb5f98f ("xfs: allow internal RT devices for zoned mode") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-18net: skbuff: Remove unused skb_add_data()Yue Haibing
Since commit a4ea4c477619 ("rxrpc: Don't use a ring buffer for call Tx queue") this function is not used anymore. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312063450.183652-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().Kuniyuki Iwashima
While creating a new IPv6, we could get a weird -ENOMEM when RTA_NH_ID is set and either of the conditions below is true: 1) CONFIG_IPV6_SUBTREES is enabled and rtm_src_len is specified 2) nexthop_get() fails e.g.) # strace ip -6 route add fe80::dead:beef:dead:beef nhid 1 from :: recvmsg(3, {msg_iov=[{iov_base=[...[ {error=-ENOMEM, msg=[... [...]]}, [{nla_len=49, nla_type=NLMSGERR_ATTR_MSG}, "Nexthops can not be used with so"...] ]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148 Let's set err explicitly after ip_fib_metrics_init() in ip6_route_info_create(). Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250312013854.61125-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().Kuniyuki Iwashima
fib_check_nh_v6_gw() expects that fib6_nh_init() cleans up everything when it fails. Commit 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh") moved fib_nh_common_init() before alloc_percpu_gfp() within fib6_nh_init() but forgot to add cleanup for fib6_nh->nh_common.nhc_pcpu_rth_output in case it fails to allocate fib6_nh->rt6i_pcpu, resulting in memleak. Let's call fib_nh_common_release() and clear nhc_pcpu_rth_output in the error path. Note that we can remove the fib6_nh_release() call in nh_create_ipv6() later in net-next.git. Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250312010333.56001-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge patch series "riscv: Add bfloat16 instruction support"Alexandre Ghiti
Inochi Amaoto <inochiama@gmail.com> says: Add description for the BFloat16 precision Floating-Point ISA extension, (Zfbfmin, Zvfbfmin, Zvfbfwma). which was ratified in commit 4dc23d62 ("Added Chapter title to BF16") of the riscv-isa-manual. * patches from https://lore.kernel.org/r/20250213003849.147358-1-inochiama@gmail.com: riscv: hwprobe: export bfloat16 ISA extension riscv: add ISA extension parsing for bfloat16 ISA extension dt-bindings: riscv: add bfloat16 ISA extension description Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250213003849.147358-1-inochiama@gmail.com
2025-03-18riscv: hwprobe: export bfloat16 ISA extensionInochi Amaoto
Export Zfbmin, Zvfbfmin, Zvfbfwma ISA extension through hwprobe. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20250213003849.147358-4-inochiama@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-18riscv: add ISA extension parsing for bfloat16 ISA extensionInochi Amaoto
Add parsing for Zfbmin, Zvfbfmin, Zvfbfwma ISA extension which were ratified in 4dc23d62 ("Added Chapter title to BF16") of the riscv-isa-manual. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20250213003849.147358-3-inochiama@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-18dt-bindings: riscv: add bfloat16 ISA extension descriptionInochi Amaoto
Add description for the BFloat16 precision Floating-Point ISA extension, (Zfbfmin, Zvfbfmin, Zvfbfwma). which was ratified in commit 4dc23d62 ("Added Chapter title to BF16") of the riscv-isa-manual. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250213003849.147358-2-inochiama@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-18arm64: dts: hi3660: Add property for fixing CPUIdleLeo Yan
During CPU low power modes, ETM components will lose their context. Add the "arm,coresight-loses-context-with-cpu" property to ETM nodes to save and restore ETM context for CPU idle states. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
2025-03-18Merge tag 'linux-can-next-for-6.15-20250314' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-03-14 this is a pull request of 4 patches for net-next/main. In the first 2 patches by Dimitri Fedrau add CAN transceiver support to the flexcan driver. Frank Li's patch adds i.MX94 support to the flexcan device tree bindings. The last patch is by Davide Caratti and adds protocol counter for AF_CAN sockets. linux-can-next-for-6.15-20250314 * tag 'linux-can-next-for-6.15-20250314' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: add protocol counter for AF_CAN sockets dt-bindings: can: fsl,flexcan: add i.MX94 support can: flexcan: add transceiver capabilities dt-bindings: can: fsl,flexcan: add transceiver capabilities ==================== Link: https://patch.msgid.link/20250314132327.2905693-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge tag 'linux-can-fixes-for-6.14-20250314' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-03-14 this is a pull request of 6 patches for net/main. The first patch is by Vincent Mailhol and fixes an out of bound read in strscpy() in the ucan driver. Oliver Hartkopp contributes a patch for the af_can statistics to use atomic access in the hot path. The next 2 patches are by Biju Das, target the rcar_canfd driver and fix the page entries in the AFL list. The 2 patches by Haibo Chen for the flexcan driver fix the suspend and resume functions. linux-can-fixes-for-6.14-20250314 * tag 'linux-can-fixes-for-6.14-20250314' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: flexcan: disable transceiver during system PM can: flexcan: only change CAN state when link up in system PM can: rcar_canfd: Fix page entries in the AFL list dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties for R-Car V4M can: statistics: use atomic access in hot path can: ucan: fix out of bound read in strscpy() source ==================== Link: https://patch.msgid.link/20250314130909.2890541-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge tag 'batadv-next-pullrequest-20250313' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - drop batadv_priv_debug_log struct, by Sven Eckelmann - adopt netdev_hold() / netdev_put(), by Eric Dumazet - add support for jumbo frames, by Sven Eckelmann - use consistent name for mesh interface, by Sven Eckelmann - cleanup B.A.T.M.A.N. IV OGM aggregation handling, by Sven Eckelmann (4 patches) - add missing newlines for log macros, by Sven Eckelmann * tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge: batman-adv: add missing newlines for log macros batman-adv: Limit aggregation size to outgoing MTU batman-adv: Use actual packet count for aggregated packets batman-adv: Switch to bitmap helper for aggregation handling batman-adv: Limit number of aggregated packets directly batman-adv: Use consistent name for mesh interface batman-adv: Add support for jumbo frames batman-adv: adopt netdev_hold() / netdev_put() batman-adv: Drop batadv_priv_debug_log struct batman-adv: Start new development cycle ==================== Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ahci: Marvell 88SE9215 controllers prefer DMA for ATAPIHuacai Chen
We use CD/DVD drives under Marvell 88SE9215 SATA controller on many Loongson-based machines. We found its PIO doesn't work well, and on the opposite its DMA seems work very well. We don't know the detail of the 88SE9215 SATA controller, but we have tested different CD/DVD drives and they all have problems under 88SE9215 (but they all work well under an Intel SATA controller). So, we consider this problem is bound to 88SE9215 SATA controller rather than bound to CD/DVD drives. As a solution, we define a new dedicated AHCI board id which is named board_ahci_yes_fbs_atapi_dma for 88SE9215, and for this id we set the AHCI_HFLAG_ATAPI_DMA_QUIRK and ATA_QUIRK_ATAPI_MOD16_DMA flags on the SATA controller in order to prefer ATAPI DMA. Reported-by: Yuli Wang <wangyuli@uniontech.com> Tested-by: Jie Fan <fanjie@uniontech.com> Tested-by: Erpeng Xu <xuerpeng@uniontech.com> Tested-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/r/20250318104314.2160526-1-chenhuacai@loongson.cn Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-03-18net: ipv6: fix TCP GSO segmentation with NATFelix Fietkau
When updating the source/destination address, the TCP/UDP checksum needs to be updated as well. Fixes: bee88cd5bd83 ("net: add support for segmenting TCP fraglist GSO packets") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20250311212530.91519-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge branch 'udp_tunnel-gro-optimizations'Paolo Abeni
Paolo Abeni says: ==================== udp_tunnel: GRO optimizations The UDP tunnel GRO stage is source of measurable overhead for workload based on UDP-encapsulated traffic: each incoming packets requires a full UDP socket lookup and an indirect call. In the most common setups a single UDP tunnel device is used. In such case we can optimize both the lookup and the indirect call. Patch 1 tracks per netns the active UDP tunnels and replaces the socket lookup with a single destination port comparison when possible. Patch 2 tracks the different types of UDP tunnels and replaces the indirect call with a static one when there is a single UDP tunnel type active. I measure ~5% performance improvement in TCP over UDP tunnel stream tests on top of this series. v3: https://lore.kernel.org/netdev/cover.1741632298.git.pabeni@redhat.com/ v2: https://lore.kernel.org/netdev/cover.1741338765.git.pabeni@redhat.com/ v1: https://lore.kernel.org/netdev/cover.1741275846.git.pabeni@redhat.com/ ==================== Link: https://patch.msgid.link/cover.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18udp_tunnel: use static call for GRO hooks when possiblePaolo Abeni
It's quite common to have a single UDP tunnel type active in the whole system. In such a case we can replace the indirect call for the UDP tunnel GRO callback with a static call. Add the related accounting in the control path and switch to static call when possible. To keep the code simple use a static array for the registered tunnel types, and size such array based on the kernel config. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/6fd1f9c7651151493ecab174e7b8386a1534170d.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18udp_tunnel: create a fastpath GRO lookup.Paolo Abeni
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such reference is not NULL, UDP tunnel GRO lookup just need to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended. Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/4d5c319c4471161829f50cb8436841de81a5edae.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18RDMA/mlx5: Fix calculation of total invalidated pagesChiara Meiohas
When invalidating an address range in mlx5, there is an optimization to do UMR operations in chunks. Previously, the invalidation counter was incorrectly updated for the same indexes within a chunk. Now, the invalidation counter is updated only when a chunk is complete and mlx5r_umr_update_xlt() is called. This ensures that the counter accurately represents the number of pages invalidated using UMR. Fixes: a3de94e3d61e ("IB/mlx5: Introduce ODP diagnostic counters") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/560deb2433318e5947282b070c915f3c81fef77f.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flowPatrisious Haddad
When cur_qp isn't NULL, in order to avoid fetching the QP from the radix tree again we check if the next cqe QP is identical to the one we already have. The bug however is that we are checking if the QP is identical by checking the QP number inside the CQE against the QP number inside the mlx5_ib_qp, but that's wrong since the QP number from the CQE is from FW so it should be matched against mlx5_core_qp which is our FW QP number. Otherwise we could use the wrong QP when handling a CQE which could cause the kernel trace below. This issue is mainly noticeable over QPs 0 & 1, since for now they are the only QPs in our driver whereas the QP number inside mlx5_ib_qp doesn't match the QP number inside mlx5_core_qp. BUG: kernel NULL pointer dereference, address: 0000000000000012 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP CPU: 0 UID: 0 PID: 7927 Comm: kworker/u62:1 Not tainted 6.14.0-rc3+ #189 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] RIP: 0010:mlx5_ib_poll_cq+0x4c7/0xd90 [mlx5_ib] Code: 03 00 00 8d 58 ff 21 cb 66 39 d3 74 39 48 c7 c7 3c 89 6e a0 0f b7 db e8 b7 d2 b3 e0 49 8b 86 60 03 00 00 48 c7 c7 4a 89 6e a0 <0f> b7 5c 98 02 e8 9f d2 b3 e0 41 0f b7 86 78 03 00 00 83 e8 01 21 RSP: 0018:ffff88810511bd60 EFLAGS: 00010046 RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88885fa1b3c0 RDI: ffffffffa06e894a RBP: 00000000000000b0 R08: 0000000000000000 R09: ffff88810511bc10 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810d593000 R13: ffff88810e579108 R14: ffff888105146000 R15: 00000000000000b0 FS: 0000000000000000(0000) GS:ffff88885fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 00000001077e6001 CR4: 0000000000370eb0 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x150/0x3e0 ? exc_page_fault+0x74/0x130 ? asm_exc_page_fault+0x22/0x30 ? mlx5_ib_poll_cq+0x4c7/0xd90 [mlx5_ib] __ib_process_cq+0x5a/0x150 [ib_core] ib_cq_poll_work+0x31/0x90 [ib_core] process_one_work+0x169/0x320 worker_thread+0x288/0x3a0 ? work_busy+0xb0/0xb0 kthread+0xd7/0x1f0 ? kthreads_online_cpu+0x130/0x130 ? kthreads_online_cpu+0x130/0x130 ret_from_fork+0x2d/0x50 ? kthreads_online_cpu+0x130/0x130 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/4ada09d41f1e36db62c44a9b25c209ea5f054316.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18net: mana: Support holes in device list reply msgHaiyang Zhang
According to GDMA protocol, holes (zeros) are allowed at the beginning or middle of the gdma_list_devices_resp message. The existing code cannot properly handle this, and may miss some devices in the list. To fix, scan the entire list until the num_of_devs are found, or until the end of the list. Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@microsoft.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741723974-1534-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18RDMA/mlx5: Fix page_size variable overflowMichael Guralnik
Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to unsigned long to support values larger than 31 and avoid overflow. For example: If we try to register 4GB of memory that is contiguous in physical memory, the driver will optimize the page_size and try to use an mkey with 4GB entity size. The 'unsigned int' page_size variable will overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr(). WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib] Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core] CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G W 6.14.0-rc4-dirty #43 Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib] Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41 RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246 RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001 RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600 RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240 R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00 FS: 00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x81/0x130 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib] ? report_bug+0xfc/0x1e0 ? handle_bug+0x55/0x90 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib] create_real_mr+0x54/0x150 [mlx5_ib] ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs] ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs] ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs] ? walk_system_ram_range+0x79/0xd0 ? ___pte_offset_map+0x1b/0x110 ? __pte_offset_map_lock+0x80/0x100 ib_uverbs_ioctl+0xac/0x110 [ib_uverbs] __x64_sys_ioctl+0x94/0xb0 do_syscall_64+0x50/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fb7ecf0737b Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4 R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4 R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150 </TASK> Fixes: cef7dde8836a ("net/mlx5: Expand mkey page size to support 6 bits") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc()Michael Guralnik
Drop the unused access_flags parameter. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/4d769c51eb012c62b3a92fd916b7886c25b56fbf.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Fix cache entry update on dereg errorMichael Guralnik
Fix double decrement of 'in_use' counter on push_mkey_locked() failure while deregistering an MR. If we fail to return an mkey to the cache in cache_ent_find_and_store() it'll update the 'in_use' counter. Its caller, revoke_mr(), also updates it, thus having double decrement. Wrong value of 'in_use' counter will be exposed through debugfs and can also cause wrong resizing of the cache when users try to set cache entry size using the 'size' debugfs. To address this issue, the 'in_use' counter is now decremented within mlx5_revoke_mr() also after a successful call to cache_ent_find_and_store() and not within cache_ent_find_and_store(). Other success or failure flows remains unchanged where it was also decremented. Fixes: 8c1185fef68c ("RDMA/mlx5: Change check for cacheable mkeys") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/97e979dff636f232ff4c83ce709c17c727da1fdb.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Fix MR cache initialization error flowMichael Guralnik
Destroy all previously created cache entries and work queue when rolling back the MR cache initialization upon an error. Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/c41d525fb3c72e28dd38511bf3aaccb5d584063e.1741875692.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Support optional-counters binding for QPsPatrisious Haddad
Add support to allow optional-counters binding to a QP, whereas when a bind operation is requested depending on the counter optional-counter binding state the driver will determine if to also add optional-counters to this QP binding. The optional-counter binding is done by simply adding a steering rule for the specific optional-counter condition with the additional match over that QP number. Note that optional-counters per QP rules are handled on an earlier prio than per device counters, and per device counter correctness is maintained by core whereas it is responsible to sum active counters when checking device counter and to add them to history count when they are deallocated. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/2cad1b891a6641ae61fe8d92f867e1059121813a.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS configPatrisious Haddad
Change mlx5 Makefile, fs.c and fs.h to support fs compilation regardless of INFINIBAND_USER_ACCESS config. In addition allow optional counters support regardless of the config. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/b8dd220456a91538b22c3aff150ab021d7b9e1bf.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/core: Pass port to counter bind/unbind operationsPatrisious Haddad
This will be useful for the next patches in the series since port number is needed for optional counters binding and unbinding. Note that this change is needed since when the operation is done qp->port isn't necessarily initialized yet and can't be used. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/b6f6797844acbd517358e8d2a270ea9b3e6ecba1.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/core: Add support to optional-counters binding configurationPatrisious Haddad
Whenever a new counter is created, save inside it the user requested configuration for optional-counters binding, for manual configuration it is requested directly by the user and for the automatic configuration it depends on if the automatic binding was enabled with or without optional-counters binding. This argument will later be used by the driver to determine if to bind the optional-counters as well or not when trying to bind this counter to a QP. It indicates that when binding counters to a QP we also want the currently enabled link optional-counters to be bound as well. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/82f1c357606a16932979ef9a5910122675c74a3a.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()Patrisious Haddad
Change rdma_counter allocation to use rdma_zalloc_drv_obj() instead of, explicitly allocating at core, in order to be contained inside driver specific structures. Adjust all drivers that use it to have their containing structure, and add driver specific initialization operation. This change is needed to allow upcoming patches to implement optional-counters binding whereas inside each driver specific counter struct his bound optional-counters will be maintained. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/a5a484f421fc2e5595158e61a354fba43272b02d.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytesPatrisious Haddad
Add the following optional counters: rdma_tx_packets,rdma_rx_bytes,rdma_rx_packets,rdma_tx_bytes. Which counts all RDMA packets/bytes sent and received per link. Note that since each direction packet and byte counter are shared, the counter is only reset when both counters of that direction are removed. But from user-perspective each can be enabled/disabled separately. The counters can be enabled using: sudo rdma stat set link rocep8s0f0/1 optional-counters rdma_tx_packets And can be seen using: rdma stat -j show link rocep8s0f0/1 Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/9f2753ad636f21704416df64b47395c8991d1123.1741875070.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-18net: ethernet: ti: am65-cpsw: Fix NAPI registration sequenceVignesh Raghavendra
Registering the interrupts for TX or RX DMA Channels prior to registering their respective NAPI callbacks can result in a NULL pointer dereference. This is seen in practice as a random occurrence since it depends on the randomness associated with the generation of traffic by Linux and the reception of traffic from the wire. Fixes: 681eb2beb3ef ("net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250311154259.102865-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18um: Store full CSGSFS and SS register from mcontextBenjamin Berg
Doing this allows using registers as retrieved from an mcontext to be pushed to a process using PTRACE_SETREGS. It is not entirely clear to me why CSGSFS was masked. Doing so creates issues when using the mcontext as process state in seccomp and simply copying the register appears to work perfectly fine for ptrace. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20250224181827.647129-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18Merge tag 'samsung-pinctrl-6.15' of ↵Linus Walleij
https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung into devel Samsung pinctrl drivers changes for v6.15 1. Add pin controller drivers for newly usptreamed Samsung Exynos2200 and Exynos7870. 2. Correct filter configuration offset of some of Google GS101 SoC pin banks, which later is supposed to be used during system suspend/resume. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-03-18um: virt-pci: Refactor virtio_pcidev into its own moduleTiwei Bie
Decouple virt-pci and virtio_pcidev, refactoring virtio_pcidev into its own module. Define a set of APIs for virt-pci. This allows for future addition of more PCI emulation implementations. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250315161910.4082396-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: work around sched_yield not yielding in time-travel modeBenjamin Berg
sched_yield by a userspace may not actually cause scheduling in time-travel mode as no time has passed. In the case seen it appears to be a badly implemented userspace spinlock in ASAN. Unfortunately, with time-travel it causes an extreme slowdown or even deadlock depending on the kernel configuration (CONFIG_UML_MAX_USERSPACE_ITERATIONS). Work around it by accounting time to the process whenever it executes a sched_yield syscall. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250314130815.226872-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18qed: remove cast to pointers passed to kfreeChen Ni
Remove unnecessary casts to pointer types passed to kfree. Issue detected by coccinelle: @@ type t1; expression *e; @@ -kfree((t1 *)e); +kfree(e); Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20250311070624.1037787-1-nichen@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18um/locking: Remove semicolon from "lock" prefixUros Bizjak
Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "lock" prefix, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "lock; insn" as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving lock prefix. Trying to assemble e.g. "lock andl %eax, %ebx" results in: Error: expecting lockable instruction after `lock' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://patch.msgid.link/20250228090058.2499163-1-ubizjak@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: Update min_low_pfn to match changes in uml_reservedTiwei Bie
When uml_reserved is updated, min_low_pfn must also be updated accordingly. Otherwise, min_low_pfn will not accurately reflect the lowest available PFN. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250221041855.1156109-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: use str_yes_no() to remove hardcoded "yes" and "no"Ethan Carter Edwards
Remove hard-coded strings by using the str_yes_no() helper function provided by <linux/string_choices.h>. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Link: https://patch.msgid.link/20250220-um_yes_no-v1-1-2a355ed2d225@ethancedwards.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: hostfs: avoid issues on inode number reuse by hostBenjamin Berg
Some file systems (e.g. ext4) may reuse inode numbers once the inode is not in use anymore. Usually hostfs will keep an FD open for each inode, but this is not always the case. In the case of sockets, this cannot even be done properly. As such, the following sequence of events was possible: * application creates and deletes a socket * hostfs creates/deletes the socket on the host * inode is still in the hostfs cache * hostfs creates a new file * ext4 on the outside reuses the inode number * hostfs finds the socket inode for the newly created file * application receives -ENXIO when opening the file As mentioned, this can only happen if the deleted file is a special file that is never opened on the host (i.e. no .open fop). As such, to prevent issues, it is sufficient to check that the inode has the expected type. That said, also add a check for the inode birth time, just to be on the safe side. Fixes: 74ce793bcbde ("hostfs: Fix ephemeral inodes") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Mickaël Salaün <mic@digikod.net> Tested-by: Mickaël Salaün <mic@digikod.net> Link: https://patch.msgid.link/20250214092822.1241575-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: Allocate vdso page pointer staticallyTiwei Bie
Instead of dynamically allocating the pointer to the vdso page during boot, we can just allocate it statically. Doing so will reduce error handling and make the code slightly more readable. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250212045756.164977-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: remove copy_from_kernel_nofault_allowedBenjamin Berg
There is no need to override the default version of this function anymore as UML now has proper _nofault memory access functions. Doing this also fixes the fact that the implementation was incorrect as using mincore() will incorrectly flag pages as inaccessible if they were swapped out by the host. Fixes: f75b1b1bedfb ("um: Implement probe_kernel_read()") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250210160926.420133-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>