summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-12netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flagsDavid Howells
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up in the request initialisation, but we the cache may be in the process of setting up still, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later. The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating. Fix this in the following way: (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request(). (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set. (3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set. (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so. (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state. Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be applied on top of that. Without this as well, things like: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { and: WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386 may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion. Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Hristo Venev <hristo@venev.name> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a ↵David Howells
second writeback flag" This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a. Revert the patch that removes the deprecated use of PG_private_2 in netfslib for the moment as Ceph is actually still using this to track data copied to the cache. Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12file: fix typo in take_fd() commentMathias Krause
The explanatory comment above take_fd() contains a typo, fix that to not confuse readers. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240809135035.748109-1-minipli@grsecurity.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12pidfd: prevent creation of pidfds for kthreadsChristian Brauner
It's currently possible to create pidfds for kthreads but it is unclear what that is supposed to mean. Until we have use-cases for it and we figured out what behavior we want block the creation of pidfds for kthreads. Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner Fixes: 32fcb426ec00 ("pid: add pidfd_open()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs: clean up after renaming FSCACHE_DEBUG configLukas Bulwahn
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First, NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas the actual intended config is called NETFS_SUPPORT. Second, the config renaming misses to adjust the documentation of the functionality of this config. Clean up those two points. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12libfs: fix infinite directory reads for offset diryangerkun
After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below). 1. create 5000 files(1 2 3...) under one dir 2. call readdir(man 3 readdir) once, and get one entry 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) 4. loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition) We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12nsfs: fix ioctl declarationChristian Brauner
The kernel is writing an object of type __u64, so the ioctl has to be defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5). Reported-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12fs/netfs/fscache_cookie: add missing "n_accesses" checkMax Kellermann
This fixes a NULL pointer dereference bug due to a data race which looks like this: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv. At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line: struct cachefiles_object *object = cachefiles_cres_object(cres); The next line crashes, obviously: struct cachefiles_cache *cache = object->volume->cache; During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero. The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE"). This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD. Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12filelock: fix name of file_lease slab cacheOmar Sandoval
When struct file_lease was split out from struct file_lock, the name of the file_lock slab cache was copied to the new slab cache for file_lease. This name conflict causes confusion in /proc/slabinfo and /sys/kernel/slab. In particular, it caused failures in drgn's test case for slab cache merging. Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81 Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock") Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs: Fault in smaller chunks for non-large folio mappingsMatthew Wilcox (Oracle)
As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large folio mappings"), we can see a performance loss for filesystems which have not yet been converted to large folios. Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12Merge tag 'platform-drivers-x86-v6.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "While the ideapad concurrency fix itself is relatively straightforward, it required moving code around and adding a bit of supporting infrastructure to have a clean inter-driver interface. This shows up in the diffstats. - ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex - amd/pmf: Query HPD data also when ALS is disabled" * tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc platform/x86: ideapad-laptop: introduce a generic notification chain platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
2024-08-12Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull fd bitmap fix from Al Viro: "Fix bitmap corruption on close_range() by cleaning up copy_fd_bitmaps()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
2024-08-12net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()Zheng Zhang
When there are multiple ap interfaces on one band and with WED on, turning the interface down will cause a kernel panic on MT798X. Previously, cb_priv was freed in mtk_wed_setup_tc_block() without marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too. Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL in mtk_wed_setup_tc_block_cb(). ---------- Unable to handle kernel paging request at virtual address 0072460bca32b4f5 Call trace: mtk_wed_setup_tc_block_cb+0x4/0x38 0xffffffc0794084bc tcf_block_playback_offloads+0x70/0x1e8 tcf_block_unbind+0x6c/0xc8 ... --------- Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Zheng Zhang <everything411@qq.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: mana: Fix RX buf alloc_size alignment and atomic op panicHaiyang Zhang
The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment is affected by the alloc_size passed into napi_build_skb(). The size needs to be aligned properly for better performance and atomic operations. Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic operations may panic on the skb_shinfo(skb)->dataref due to alignment fault. To fix this bug, add proper alignment to the alloc_size calculation. Sample panic info: [ 253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce [ 253.300900] Mem abort info: [ 253.301760] ESR = 0x0000000096000021 [ 253.302825] EC = 0x25: DABT (current EL), IL = 32 bits [ 253.304268] SET = 0, FnV = 0 [ 253.305172] EA = 0, S1PTW = 0 [ 253.306103] FSC = 0x21: alignment fault Call trace: __skb_clone+0xfc/0x198 skb_clone+0x78/0xe0 raw6_local_deliver+0xfc/0x228 ip6_protocol_deliver_rcu+0x80/0x500 ip6_input_finish+0x48/0x80 ip6_input+0x48/0xc0 ip6_sublist_rcv_finish+0x50/0x78 ip6_sublist_rcv+0x1cc/0x2b8 ipv6_list_rcv+0x100/0x150 __netif_receive_skb_list_core+0x180/0x220 netif_receive_skb_list_internal+0x198/0x2a8 __napi_poll+0x138/0x250 net_rx_action+0x148/0x330 handle_softirqs+0x12c/0x3a0 Cc: stable@vger.kernel.org Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property physFrank Li
Add missed property phys, which indicate how connect to serdes phy. Fix below warning: arch/arm64/boot/dts/freescale/fsl-lx2160a-honeycomb.dtb: fsl-mc@80c000000: dpmacs:ethernet@7: Unevaluated properties are not allowed ('phys' was unexpected) Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12Merge branch 'vsc73xx-fix-mdio-and-phy'David S. Miller
Pawel Dembicki says: ==================== net: dsa: vsc73xx: fix MDIO bus access and PHY opera This series are extracted patches from net-next series [0]. The VSC73xx driver has issues with PHY configuration. This patch series fixes most of them. The first patch synchronizes the register configuration routine with the datasheet recommendations. Patches 2-3 restore proper communication on the MDIO bus. Currently, the write value isn't sent to the MDIO register, and without a busy check, communication with the PHY can be interrupted. This causes the PHY to receive improper configuration and autonegotiation could fail. The fourth patch removes the PHY reset blockade, as it is no longer required. After fixing the MDIO operations, autonegotiation became possible. The last patch removes the blockade, which became unnecessary after the MDIO operations fix. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=874739&state=%2A&archive=both ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: phy: vitesse: repair vsc73xx autonegotiationPawel Dembicki
When the vsc73xx mdio bus work properly, the generic autonegotiation configuration works well. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: dsa: vsc73xx: allow phy resettingPawel Dembicki
Resetting the VSC73xx PHY was problematic because the MDIO bus, without a busy check, read and wrote incorrect register values. My investigation indicates that resetting the PHY only triggers changes in configuration. However, improper register values written earlier were only exposed after a soft reset. The reset itself wasn't the issue; rather, the problem stemmed from incorrect read and write operations. A 'soft_reset' can now proceed normally. There are no reasons to keep the VSC73xx from being reset. This commit removes the reset blockade in the 'vsc73xx_phy_write' function. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: dsa: vsc73xx: check busy flag in MDIO operationsPawel Dembicki
The VSC73xx has a busy flag used during MDIO operations. It is raised when MDIO read/write operations are in progress. Without it, PHYs are misconfigured and bus operations do not work as expected. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: dsa: vsc73xx: pass value in phy_write operationPawel Dembicki
In the 'vsc73xx_phy_write' function, the register value is missing, and the phy write operation always sends zeros. This commit passes the value variable into the proper register. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: dsa: vsc73xx: fix port MAC configuration in full duplex modePawel Dembicki
According to the datasheet description ("Port Mode Procedure" in 5.6.2), the VSC73XX_MAC_CFG_WEXC_DIS bit is configured only for half duplex mode. The WEXC_DIS bit is responsible for MAC behavior after an excessive collision. Let's set it as described in the datasheet. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: axienet: Fix register defines comment descriptionRadhey Shyam Pandey
In axiethernet header fix register defines comment description to be inline with IP documentation. It updates MAC configuration register, MDIO configuration register and frame filter control description. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12atm: idt77252: prevent use after free in dequeue_rx()Dan Carpenter
We can't dereference "skb" after calling vcc->push() because the skb is released. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11Linux 6.11-rc3v6.11-rc3Linus Torvalds
2024-08-11Merge tag 'x86-urgent-2024-08-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Fix 32-bit PTI for real. pti_clone_entry_text() is called twice, once before initcalls so that initcalls can use the user-mode helper and then again after text is set read only. Setting read only on 32-bit might break up the PMD mapping, which makes the second invocation of pti_clone_entry_text() find the mappings out of sync and failing. Allow the second call to split the existing PMDs in the user mapping and synchronize with the kernel mapping. - Don't make acpi_mp_wake_mailbox read-only after init as the mail box must be writable in the case that CPU hotplug operations happen after boot. Otherwise the attempt to start a CPU crashes with a write to read only memory. - Add a missing sanity check in mtrr_save_state() to ensure that the fixed MTRR MSRs are supported. Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but the WARN_ON() can bring systems down when panic on warn is set. * tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed MTRRs exist before saving them x86/paravirt: Fix incorrect virt spinlock setting on bare metal x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox x86/mm: Fix PTI for i386 some more
2024-08-11Merge tag 'timers-urgent-2024-08-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time keeping fixes from Thomas Gleixner: - Fix a couple of issues in the NTP code where user supplied values are neither sanity checked nor clamped to the operating range. This results in integer overflows and eventualy NTP getting out of sync. According to the history the sanity checks had been removed in favor of clamping the values, but the clamping never worked correctly under all circumstances. The NTP people asked to not bring the sanity checks back as it might break existing applications. Make the clamping work correctly and add it where it's missing - If adjtimex() sets the clock it has to trigger the hrtimer subsystem so it can adjust and if the clock was set into the future expire timers if needed. The caller should provide a bitmask to tell hrtimers which clocks have been adjusted. adjtimex() uses not the proper constant and uses CLOCK_REALTIME instead, which is 0. So hrtimers adjusts only the clocks, but does not check for expired timers, which might make them expire really late. Use the proper bitmask constant instead. * tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex() ntp: Safeguard against time_constant overflow ntp: Clamp maxerror and esterror to operating range
2024-08-11Merge tag 'irq-urgent-2024-08-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Three small fixes for interrupt core and drivers: - The interrupt core fails to honor caller supplied affinity hints for non-managed interrupts and uses the system default affinity on startup instead. Set the missing flag in the descriptor to tell the core to use the provided affinity. - Fix a shift out of bounds error in the Xilinx driver - Handle switching to level trigger correctly in the RISCV APLIC driver. It failed to retrigger the interrupt which causes it to become stale" * tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration irqchip/xilinx: Fix shift out of bounds genirq/irqdesc: Honor caller provided affinity in alloc_desc()
2024-08-11Merge tag 'usb-6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB driver fixes for reported issues for 6.11-rc3. Included in here are: - usb serial driver MODULE_DESCRIPTION() updates - usb serial driver fixes - typec driver fixes - usb-ip driver fix - gadget driver fixes - dt binding update All of these have been in linux-next with no reported issues" * tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common() usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt() usb: gadget: f_fs: restore ffs_func_disable() functionality USB: serial: debug: do not echo input by default usb: typec: tipd: Delete extra semi-colon usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch() usb: gadget: u_serial: Set start_delayed during suspend usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap() usb: typec: fsa4480: Check if the chip is really there usb: gadget: core: Check for unset descriptor usb: vhci-hcd: Do not drop references before new references are gained usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed. usb: gadget: midi2: Fix the response for FB info with block 0xff dt-bindings: usb: microchip,usb2514: Add USB2517 compatible USB: serial: garmin_gps: use struct_size() to allocate pkt USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by USB: serial: add missing MODULE_DESCRIPTION() macros USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
2024-08-11Merge tag 'tty-6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for reported problems for 6.11-rc3. Included in here are: - sc16is7xx serial driver fixes - uartclk bugfix for a divide by zero issue - conmakehash userspace build issue fix All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: vt: conmakehash: cope with abs_srctree no longer in env serial: sc16is7xx: fix invalid FIFO access with special register set serial: sc16is7xx: fix TX fifo corruption serial: core: check uartclk for zero to avoid divide by zero
2024-08-11Merge tag 'driver-core-6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / documentation fixes from Greg KH: "Here are some small fixes, and some documentation updates for 6.11-rc3. Included in here are: - embargoed hardware documenation updates based on a lot of review by legal-types in lots of companies to try to make the process a _bit_ easier for us to manage over time. - rust firmware documentation fix - driver detach race fix for the fix that went into 6.11-rc1 All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix uevent_show() vs driver detach race Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes rust: firmware: fix invalid rustdoc link
2024-08-11Merge tag 'char-misc-6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are some small char/misc/other driver fixes for 6.11-rc3 for reported issues. Included in here are: - binder driver fixes - fsi MODULE_DESCRIPTION() additions (people seem to love them...) - eeprom driver fix - Kconfig dependency fix to resolve build issues - spmi driver fixes All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: spmi: pmic-arb: add missing newline in dev_err format strings spmi: pmic-arb: Pass the correct of_node to irq_domain_add_tree binder_alloc: Fix sleeping function called from invalid context binder: fix descriptor lookup for context manager char: add missing NetWinder MODULE_DESCRIPTION() macros misc: mrvl-cn10k-dpi: add PCI_IOV dependency eeprom: ee1004: Fix locking issues in ee1004_probe() fsi: add missing MODULE_DESCRIPTION() macros
2024-08-11Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two core fixes: one to prevent discard type changes (seen on iSCSI) during intermittent errors and the other is fixing a lockdep problem caused by the queue limits change. And one driver fix in ufs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Keep the discard mode stable scsi: sd: Move sd_read_cpr() out of the q->limits_lock region scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
2024-08-11Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-qDavid S. Miller
ueue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-08-07 (igc) This series contains updates to igc driver only. Faizal adjusts the size of the MAC internal buffer on i226 devices to resolve an errata for leaking packet transmits. He also corrects a condition in which qbv_config_change_errors are incorrectly counted. Lastly, he adjusts the conditions for resetting the adapter when changing TSN Tx mode and corrects the conditions in which gtxoffset register is set. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11net: ethernet: use ip_hdrlen() instead of bit shiftMoon Yeounsu
`ip_hdr(skb)->ihl << 2` is the same as `ip_hdrlen(skb)` Therefore, we should use a well-defined function not a bit shift to find the header length. It also compresses two lines to a single line. Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com> Reviewed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-10Merge tag 'nfsd-6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Two minor fixes for recent changes * tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets sunrpc: avoid -Wformat-security warning
2024-08-10Merge tag 'i2c-for-6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - Two fixes for SMBusAlert handling in the I2C core: one to avoid an endless loop when scanning for handlers and one to make sure handlers are always called even if HW has broken behaviour - I2C header build fix for when ACPI is enabled but I2C isn't - The testunit gets a rename in the code to match the documentation - Two fixes for the Qualcomm GENI I2C controller are cleaning up the error exit patch in the runtime_resume() function. The first is disabling the clock, the second disables the icc on the way out * tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: testunit: match HostNotify test name with docs i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume i2c: Fix conditional for substituting empty ACPI functions i2c: smbus: Send alert notifications to all devices if source not found i2c: smbus: Improve handling of stuck alerts
2024-08-10Merge tag 'dma-mapping-6.11-2024-08-10' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - avoid a deadlock with dma-debug and netconsole (Rik van Riel) * tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: avoid deadlock between dma debug vs printk and netconsole
2024-08-10Merge tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull more bcachefs fixes from Kent Overstreet: "A couple last minute fixes for the new disk accounting - fix a bug that was causing ACLs to seemingly "disappear" - new on disk format version, bcachefs_metadata_version_disk_accounting_v3 bcachefs_metadata_version_disk_accounting_v2 accidentally included padding in disk_accounting_key; fortunately, 6.11 isn't out yet so we can fix this with another version bump" * tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs: bcachefs: bcachefs_metadata_version_disk_accounting_v3 bcachefs: improve bch2_dev_usage_to_text() bcachefs: bch2_accounting_invalid() bcachefs: Switch to .get_inode_acl()
2024-08-10wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletionJanne Grunau
wpa_supplicant 2.11 sends since 1efdba5fdc2c ("Handle PMKSA flush in the driver for SAE/OWE offload cases") SSID based PMKSA del commands. brcmfmac is not prepared and tries to dereference the NULL bssid and pmkid pointers in cfg80211_pmksa. PMKID_V3 operations support SSID based updates so copy the SSID. Fixes: a96202acaea4 ("wifi: brcmfmac: cfg80211: Add support for PMKID_V3 operations") Cc: stable@vger.kernel.org # 6.4.x Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Neal Gompa <neal@gompa.dev> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20240803-brcmfmac_pmksa_del_ssid-v1-1-4e85f19135e1@jannau.net
2024-08-10irqchip/riscv-aplic: Retrigger MSI interrupt on source configurationYong-Xuan Wang
The section 4.5.2 of the RISC-V AIA specification says that "any write to a sourcecfg register of an APLIC might (or might not) cause the corresponding interrupt-pending bit to be set to one if the rectified input value is high (= 1) under the new source mode." When the interrupt type is changed in the sourcecfg register, the APLIC device might not set the corresponding pending bit, so the interrupt might never become pending. To handle sourcecfg register changes for level-triggered interrupts in MSI mode, manually set the pending bit for retriggering interrupt so it gets retriggered if it was already asserted. Fixes: ca8df97fe679 ("irqchip/riscv-aplic: Add support for MSI-mode") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240809071049.2454-1-yongxuan.wang@sifive.com
2024-08-10irqchip/xilinx: Fix shift out of boundsRadhey Shyam Pandey
The device tree property 'xlnx,kind-of-intr' is sanity checked that the bitmask contains only set bits which are in the range of the number of interrupts supported by the controller. The check is done by shifting the mask right by the number of supported interrupts and checking the result for zero. The data type of the mask is u32 and the number of supported interrupts is up to 32. In case of 32 interrupts the shift is out of bounds, resulting in a mismatch warning. The out of bounds condition is also reported by UBSAN: UBSAN: shift-out-of-bounds in irq-xilinx-intc.c:332:22 shift exponent 32 is too large for 32-bit type 'unsigned int' Fix it by promoting the mask to u64 for the test. Fixes: d50466c90724 ("microblaze: intc: Refactor DT sanity check") Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/1723186944-3571957-1-git-send-email-radhey.shyam.pandey@amd.com
2024-08-09Merge branch 'mlx5-misc-fixes-2024-08-08'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc fixes 2024-08-08 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/20240808144107.2095424-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09net/mlx5e: Fix queue stats access to non-existing channels splatGal Pressman
The queue stats API queries the queues according to the real_num_[tr]x_queues, in case the device is down and channels were not yet created, don't try to query their statistics. To trigger the panic, run this command before the interface is brought up: ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml --dump qstats-get --json '{"ifindex": 4}' BUG: kernel NULL pointer dereference, address: 0000000000000c00 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 3 UID: 0 PID: 977 Comm: python3 Not tainted 6.10.0+ #40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core] Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08 RSP: 0018:ffff888116be37d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004 RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0 RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004 R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000 R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000 FS: 00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x1f/0x60 ? page_fault_oops+0x14e/0x3d0 ? exc_page_fault+0x73/0x130 ? asm_exc_page_fault+0x22/0x30 ? mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core] netdev_nl_stats_by_netdev+0x2a6/0x4c0 ? __rmqueue_pcplist+0x351/0x6f0 netdev_nl_qstats_get_dumpit+0xc4/0x1b0 genl_dumpit+0x2d/0x80 netlink_dump+0x199/0x410 __netlink_dump_start+0x1aa/0x2c0 genl_family_rcv_msg_dumpit+0x94/0xf0 ? __pfx_genl_start+0x10/0x10 ? __pfx_genl_dumpit+0x10/0x10 ? __pfx_genl_done+0x10/0x10 genl_rcv_msg+0x116/0x2b0 ? __pfx_netdev_nl_qstats_get_dumpit+0x10/0x10 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x21a/0x340 netlink_sendmsg+0x1f4/0x440 __sys_sendto+0x1b6/0x1c0 ? do_sock_setsockopt+0xc3/0x180 ? __sys_setsockopt+0x60/0xb0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x50/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f43757132b0 Code: c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 41 54 48 83 ec 20 RSP: 002b:00007ffd258da048 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffd258da0f8 RCX: 00007f43757132b0 RDX: 000000000000001c RSI: 00007f437464b850 RDI: 0000000000000003 RBP: 00007f4375085de0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f43751a6147 </TASK> Modules linked in: netconsole xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse [last unloaded: netconsole] CR2: 0000000000000c00 ---[ end trace 0000000000000000 ]--- RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core] Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08 RSP: 0018:ffff888116be37d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004 RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0 RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004 R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000 R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000 FS: 00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 7b66ae536a78 ("net/mlx5e: Add per queue netdev-genl stats") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240808144107.2095424-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09net/mlx5e: Correctly report errors for ethtool rx flowsCosmin Ratiu
Previously, an ethtool rx flow with no attrs would not be added to the NIC as it has no rules to configure the hw with, but it would be reported as successful to the caller (return code 0). This is confusing for the user as ethtool then reports "Added rule $num", but no rule was actually added. This change corrects that by instead reporting these wrong rules as -EINVAL. Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09net/mlx5e: Take state lock during tx timeout reporterDragos Tatulea
mlx5e_safe_reopen_channels() requires the state lock taken. The referenced changed in the Fixes tag removed the lock to fix another issue. This patch adds it back but at a later point (when calling mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the Fixes tag. Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/ Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09net/mlx5e: SHAMPO, Increase timeout to improve latencyDragos Tatulea
During latency tests (netperf TCP_RR) a 30% degradation of HW GRO vs SW GRO was observed. This is due to SHAMPO triggering timeout filler CQEs instead of delivering the CQE for the packet. Having a short timeout for SHAMPO doesn't bring any benefits as it is the driver that does the merging, not the hardware. On the contrary, it can have a negative impact: additional filler CQEs are generated due to the timeout. As there is no way to disable this timeout, this change sets it to the maximum value. Instead of using the packet_merge.timeout parameter which is also used for LRO, set the value directly when filling in the rest of the SHAMPO parameters in mlx5e_build_rq_param(). Fixes: 99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09net/mlx5: SD, Do not query MPIR register if no sd_groupTariq Toukan
Unconditionally calling the MPIR query on BF separate mode yields the FW syndrome below [1]. Do not call it unless admin clearly specified the SD group, i.e. expressing the intention of using the multi-PF netdev feature. This fix covers cases not covered in commit fca3b4791850 ("net/mlx5: Do not query MPIR on embedded CPU function"). [1] mlx5_cmd_out_err:808:(pid 8267): ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5) Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240808144107.2095424-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09Merge branch ↵Jakub Kicinski
'don-t-take-hw-uso-path-when-packets-can-t-be-checksummed-by-device' Jakub Sitnicki says: ==================== Don't take HW USO path when packets can't be checksummed by device This series addresses a recent regression report from syzbot [1]. After enabling UDP_SEGMENT for egress devices which don't support checksum offload [2], we need to tighten down the checks which let packets take the HW USO path. The fix consists of two parts: 1. don't let devices offer USO without checksum offload, and 2. force software USO fallback in presence of IPv6 extension headers. [1] https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10154dbded6d6a2fecaebdfda206609de0f121a9 v3: https://lore.kernel.org/r/20240807-udp-gso-egress-from-tunnel-v3-0-8828d93c5b45@cloudflare.com v2: https://lore.kernel.org/r/20240801-udp-gso-egress-from-tunnel-v2-0-9a2af2f15d8d@cloudflare.com v1: https://lore.kernel.org/r/20240725-udp-gso-egress-from-tunnel-v1-0-5e5530ead524@cloudflare.com ==================== Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-0-f5c5b4149ab9@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09selftests/net: Add coverage for UDP GSO with IPv6 extension headersJakub Sitnicki
After enabling UDP GSO for devices not offering checksum offload, we have hit a regression where a bad offload warning can be triggered when sending a datagram with IPv6 extension headers. Extend the UDP GSO IPv6 tests to cover this scenario. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-3-f5c5b4149ab9@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09udp: Fall back to software USO if IPv6 extension headers are presentJakub Sitnicki
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") we have intentionally allowed UDP GSO packets marked CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and checksummed by a software fallback when the egress device lacks these features. What was not taken into consideration is that a CHECKSUM_NONE skb can be handed over to the GSO stack also when the egress device advertises the tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature. This will happen when there are IPv6 extension headers present, which we check for in __ip6_append_data(). Syzbot has discovered this scenario, producing a warning as below: ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869) WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 Modules linked in: CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 [...] Call Trace: <TASK> __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661 __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137 ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222 ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958 udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292 udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xef/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 ___sys_sendmsg net/socket.c:2639 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725 __do_sys_sendmmsg net/socket.c:2754 [inline] __se_sys_sendmmsg net/socket.c:2751 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [...] </TASK> We are hitting the bad offload warning because when an egress device is capable of handling segmentation offload requested by skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce any segment skbs and return NULL. See the skb_gso_ok() branch in {__udp,tcp,sctp}_gso_segment helpers. To fix it, force a fallback to software USO when processing a packet with IPv6 extension headers, since we don't know if these can checksummed by all devices which offer USO. Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-2-f5c5b4149ab9@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>