summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-12igc: Clean the TX buffer and TX descriptor ringMuhammad Husaini Zulkifli
There could be a race condition during link down where interrupt being generated and igc_clean_tx_irq() been called to perform the TX completion. Properly clear the TX buffer/descriptor ring and disable the TX Queue ring in igc_free_tx_resources() to avoid that. Kernel trace: [ 108.237177] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLIFUI1.R00.4204.A00.2105270302 05/27/2021 [ 108.237178] RIP: 0010:refcount_warn_saturate+0x55/0x110 [ 108.242143] RSP: 0018:ffff9e7980003db0 EFLAGS: 00010286 [ 108.245555] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 20 8c 4d 01 00 75 ee 48 c7 c7 88 f4 03 ab c6 05 10 8c 4d 01 01 e8 0b 10 96 ff <0f> 0b c3 cc cc cc cc 80 3d fc 8b 4d 01 00 75 cb 48 c7 c7 b0 f4 03 [ 108.250434] [ 108.250434] RSP: 0018:ffff9e798125f910 EFLAGS: 00010286 [ 108.254358] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 108.259325] [ 108.259325] RAX: 0000000000000000 RBX: ffff8ddb935b8000 RCX: 0000000000000027 [ 108.261868] RDX: ffff8de250a28800 RSI: ffff8de250a1c580 RDI: ffff8de250a1c580 [ 108.265538] RDX: 0000000000000027 RSI: 0000000000000002 RDI: ffff8de250a9c588 [ 108.265539] RBP: ffff8ddb935b8000 R08: ffffffffab2655a0 R09: ffff9e798125f898 [ 108.267914] RBP: ffff8ddb8a5b8d80 R08: 0000005648eba354 R09: 0000000000000000 [ 108.270196] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff9e798125f948 [ 108.270197] R13: ffff9e798125fa1c R14: ffff8ddb8a5b8d80 R15: 7fffffffffffffff [ 108.273001] R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffff8ddb8a5b8ed4 [ 108.276410] FS: 00007f605851b740(0000) GS:ffff8de250a80000(0000) knlGS:0000000000000000 [ 108.280597] R13: 00000000000002ac R14: 00000000ffffff99 R15: ffff8ddb92561b80 [ 108.282966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.282967] CR2: 00007f053c039248 CR3: 0000000185850003 CR4: 0000000000f70ee0 [ 108.286206] FS: 0000000000000000(0000) GS:ffff8de250a00000(0000) knlGS:0000000000000000 [ 108.289701] PKRU: 55555554 [ 108.289702] Call Trace: [ 108.289704] <TASK> [ 108.293977] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.297562] sock_alloc_send_pskb+0x20c/0x240 [ 108.301494] CR2: 00007f053c03a168 CR3: 0000000184394002 CR4: 0000000000f70ef0 [ 108.301495] PKRU: 55555554 [ 108.306464] __ip_append_data.isra.0+0x96f/0x1040 [ 108.309441] Call Trace: [ 108.309443] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.314927] <IRQ> [ 108.314928] sock_wfree+0x1c7/0x1d0 [ 108.318078] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.320276] skb_release_head_state+0x32/0x90 [ 108.324812] ip_make_skb+0xf6/0x130 [ 108.327188] skb_release_all+0x16/0x40 [ 108.330775] ? udp_sendmsg+0x9f3/0xcb0 [ 108.332626] napi_consume_skb+0x48/0xf0 [ 108.334134] ? xfrm_lookup_route+0x23/0xb0 [ 108.344285] igc_poll+0x787/0x1620 [igc] [ 108.346659] udp_sendmsg+0x9f3/0xcb0 [ 108.360010] ? ttwu_do_activate+0x40/0x220 [ 108.365237] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.366744] ? try_to_wake_up+0x289/0x5e0 [ 108.376987] ? sock_sendmsg+0x81/0x90 [ 108.395698] ? __pfx_process_timeout+0x10/0x10 [ 108.395701] sock_sendmsg+0x81/0x90 [ 108.409052] __napi_poll+0x29/0x1c0 [ 108.414279] ____sys_sendmsg+0x284/0x310 [ 108.419507] net_rx_action+0x257/0x2d0 [ 108.438216] ___sys_sendmsg+0x7c/0xc0 [ 108.439723] __do_softirq+0xc1/0x2a8 [ 108.444950] ? finish_task_switch+0xb4/0x2f0 [ 108.452077] irq_exit_rcu+0xa9/0xd0 [ 108.453584] ? __schedule+0x372/0xd00 [ 108.460713] common_interrupt+0x84/0xa0 [ 108.467840] ? clockevents_program_event+0x95/0x100 [ 108.474968] </IRQ> [ 108.482096] ? do_nanosleep+0x88/0x130 [ 108.489224] <TASK> [ 108.489225] asm_common_interrupt+0x26/0x40 [ 108.496353] ? __rseq_handle_notify_resume+0xa9/0x4f0 [ 108.503478] RIP: 0010:cpu_idle_poll+0x2c/0x100 [ 108.510607] __sys_sendmsg+0x5d/0xb0 [ 108.518687] Code: 05 e1 d9 c8 00 65 8b 15 de 64 85 55 85 c0 7f 57 e8 b9 ef ff ff fb 65 48 8b 1c 25 00 cc 02 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 77 63 cd 00 85 c0 75 ed e8 ce ec ff ff [ 108.525817] do_syscall_64+0x44/0xa0 [ 108.531563] RSP: 0018:ffffffffab203e70 EFLAGS: 00000202 [ 108.538693] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 108.546775] [ 108.546777] RIP: 0033:0x7f605862b7f7 [ 108.549495] RAX: 0000000000000001 RBX: ffffffffab20c940 RCX: 000000000000003b [ 108.551955] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 108.554068] RDX: 4000000000000000 RSI: 000000002da97f6a RDI: 00000000002b8ff4 [ 108.559816] RSP: 002b:00007ffc99264058 EFLAGS: 00000246 [ 108.564178] RBP: 0000000000000000 R08: 00000000002b8ff4 R09: ffff8ddb01554c80 [ 108.571302] ORIG_RAX: 000000000000002e [ 108.571303] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f605862b7f7 [ 108.574023] R10: 000000000000015b R11: 000000000000000f R12: ffffffffab20c940 [ 108.574024] R13: 0000000000000000 R14: ffff8de26fbeef40 R15: ffffffffab20c940 [ 108.578727] RDX: 0000000000000000 RSI: 00007ffc992640a0 RDI: 0000000000000003 [ 108.578728] RBP: 00007ffc99264110 R08: 0000000000000000 R09: 175f48ad1c3a9c00 [ 108.581187] do_idle+0x62/0x230 [ 108.585890] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc992642d8 [ 108.585891] R13: 00005577814ab2ba R14: 00005577814addf0 R15: 00007f605876d000 [ 108.587920] cpu_startup_entry+0x1d/0x20 [ 108.591422] </TASK> [ 108.596127] rest_init+0xc5/0xd0 [ 108.600490] ---[ end trace 0000000000000000 ]--- Test Setup: DUT: - Change mac address on DUT Side. Ensure NIC not having same MAC Address - Running udp_tai on DUT side. Let udp_tai running throughout the test Example: ./udp_tai -i enp170s0 -P 100000 -p 90 -c 1 -t 0 -u 30004 Host: - Perform link up/down every 5 second. Result: Kernel panic will happen on DUT Side. Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12clk: mediatek: mt8365: Fix index issueAlexandre Mergnat
Before the patch [1], the clock probe was done directly in the clk-mt8365 driver. In this probe function, the array which stores the data clocks is sized using the higher defined numbers (*_NR_CLOCK) in the clock lists [2]. Currently, with the patch [1], the specific clk-mt8365 probe function is replaced by the mtk generic one [3], which size the clock data array by adding all the clock descriptor array size provided by the clk-mt8365 driver. Actually, all clock indexes come from the header file [2], that mean, if there are more clock (then more index) in the header file [2] than the number of clock declared in the clock descriptor arrays (which is the case currently), the clock data array will be undersized and then the generic probe function will overflow when it will try to write in "clk_data[CLK_INDEX]". Actually, instead of crashing at boot, the probe function returns an error in the log which looks like: "of_clk_hw_onecell_get: invalid index 135", then this clock isn't enabled. Solve this issue by adding in the driver the missing clocks declared in the header clock file [2]. [1]: Commit ffe91cb28f6a ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()") [2]: include/dt-bindings/clock/mediatek,mt8365-clk.h [3]: drivers/clk/mediatek/clk-mtk.c Fixes: ffe91cb28f6a ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()") Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com> Link: https://lore.kernel.org/r/20230517-fix-clk-index-v3-1-be4df46065c4@baylibre.com Tested-by: Markus Schneider-Pargmann <msp@baylibre.com> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-12zswap: do not shrink if cgroup may not zswapNhat Pham
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt. However, since the zswap's LRU is not memcg-aware, this can create the following pathological behavior: the cgroup whose zswap limit is 0 will evict pages from other cgroups continually, without lowering its own zswap usage. This means the shrinking will continue until the need for swap ceases or the pool becomes empty. As a result of this, we observe a disproportionate amount of zswap writeback and a perpetually small zswap pool in our experiments, even though the pool limit is never hit. More generally, a cgroup might unnecessarily evict pages from other cgroups before we drive the memcg back below its limit. This patch fixes the issue by rejecting zswap store attempt without shrinking the pool when obj_cgroup_may_zswap() returns false. [akpm@linux-foundation.org: fix return of unintialized value] [akpm@linux-foundation.org: s/ENOSPC/ENOMEM/] Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@gmail.com Fixes: f4840ccfca25 ("zswap: memcg accounting") Signed-off-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12page cache: fix page_cache_next/prev_miss off by oneMike Kravetz
Ackerley Tng reported an issue with hugetlbfs fallocate here[1]. The issue showed up after the conversion of hugetlb page cache lookup code to use page_cache_next_miss. Code in hugetlb fallocate, userfaultfd and GUP is now using page_cache_next_miss to determine if a page is present the page cache. The following statement is used. present = page_cache_next_miss(mapping, index, 1) != index; There are two issues with page_cache_next_miss when used in this way. 1) If the passed value for index is equal to the 'wrap-around' value, the same index will always be returned. This wrap-around value is 0, so 0 will be returned even if page is present at index 0. 2) If there is no gap in the range passed, the last index in the range will be returned. When passed a range of 1 as above, the passed index value will be returned even if the page is present. The end result is the statement above will NEVER indicate a page is present in the cache, even if it is. As noted by Ackerley in [1], users can see this by hugetlb fallocate incorrectly returning EEXIST if pages are already present in the file. In addition, hugetlb pages will not be included in core dumps if they need to be brought in via GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages already present in the cache. It may try to allocate a new page and potentially return ENOMEM as opposed to EEXIST. Both page_cache_next_miss and page_cache_prev_miss have similar issues. Fix by: - Check for index equal to 'wrap-around' value and do not exit early. - If no gap is found in range, return index outside range. - Update function description to say 'wrap-around' value could be returned if passed as index. [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/ Link: https://lkml.kernel.org/r/20230602225747.103865-2-mike.kravetz@oracle.com Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12ocfs2: check new file size on fallocate callLuís Henriques
When changing a file size with fallocate() the new size isn't being checked. In particular, the FSIZE ulimit isn't being checked, which makes fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes this issue. Link: https://lkml.kernel.org/r/20230529152645.32680-1-lhenriques@suse.de Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Mark Fasheh <mark@fasheh.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mailmap: add entry for John KeepingJohn Keeping
Map my corporate address to my personal one, as I am leaving the company. Link: https://lkml.kernel.org/r/20230531144839.1157112-1-john@keeping.me.uk Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()Kefeng Wang
If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide error, let's validate the values of them in damon_set_attrs() to fix it, which similar to others attrs check. Link: https://lkml.kernel.org/r/20230527032101.167788-1-wangkefeng.wang@huawei.com Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes") Reported-by: syzbot+841a46899768ec7bec67@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67 Link: https://lore.kernel.org/damon/00000000000055fc4e05fc975bc2@google.com/ Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12epoll: ep_autoremove_wake_function should use list_del_init_carefulBenjamin Segall
autoremove_wake_function uses list_del_init_careful, so should epoll's more aggressive variant. It only doesn't because it was copied from an older wait.c rather than the most recent. [bsegall@google.com: add comment] Link: https://lkml.kernel.org/r/xm26bki0ulsr.fsf_-_@google.com Link: https://lkml.kernel.org/r/xm26pm6hvfer.fsf@google.com Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively") Signed-off-by: Ben Segall <bsegall@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mm/gup_test: fix ioctl fail for compat taskHaibo Li
When tools/testing/selftests/mm/gup_test.c is compiled as 32bit, then run on arm64 kernel, it reports "ioctl: Inappropriate ioctl for device". Fix it by filling compat_ioctl in gup_test_fops Link: https://lkml.kernel.org/r/20230526022125.175728-1-haibo.li@mediatek.com Signed-off-by: Haibo Li <haibo.li@mediatek.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: reject devices with insufficient block countRyusuke Konishi
The current sanity check for nilfs2 geometry information lacks checks for the number of segments stored in superblocks, so even for device images that have been destructively truncated or have an unusually high number of segments, the mount operation may succeed. This causes out-of-bounds block I/O on file system block reads or log writes to the segments, the latter in particular causing "a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to hang. Fix this issue by checking the number of segments stored in the superblock and avoiding mounting devices that can cause out-of-bounds accesses. To eliminate the possibility of overflow when calculating the number of blocks required for the device from the number of segments, this also adds a helper function to calculate the upper bound on the number of segments and inserts a check using it. Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12ocfs2: fix use-after-free when unmounting read-only filesystemLuís Henriques
It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using fstest generic/452. After a read-only remount, quotas are suspended and ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting the filesystem, an UAF access to the oinfo will eventually cause a crash. BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0 Read of size 8 at addr ffff8880389a8208 by task umount/669 ... Call Trace: <TASK> ... timer_delete+0x54/0xc0 try_to_grab_pending+0x31/0x230 __cancel_work_timer+0x6c/0x270 ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2] ocfs2_dismount_volume+0xdd/0x450 [ocfs2] generic_shutdown_super+0xaa/0x280 kill_block_super+0x46/0x70 deactivate_locked_super+0x4d/0xb0 cleanup_mnt+0x135/0x1f0 ... </TASK> Allocated by task 632: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 ocfs2_local_read_info+0xe3/0x9a0 [ocfs2] dquot_load_quota_sb+0x34b/0x680 dquot_load_quota_inode+0xfe/0x1a0 ocfs2_enable_quotas+0x190/0x2f0 [ocfs2] ocfs2_fill_super+0x14ef/0x2120 [ocfs2] mount_bdev+0x1be/0x200 legacy_get_tree+0x6c/0xb0 vfs_get_tree+0x3e/0x110 path_mount+0xa90/0xe10 __x64_sys_mount+0x16f/0x1a0 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 650: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0xf9/0x150 __kmem_cache_free+0x89/0x180 ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2] dquot_disable+0x35f/0xa70 ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2] ocfs2_remount+0x150/0x580 [ocfs2] reconfigure_super+0x1a5/0x3a0 path_mount+0xc8a/0xe10 __x64_sys_mount+0x16f/0x1a0 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Link: https://lkml.kernel.org/r/20230522102112.9031-1-lhenriques@suse.de Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12lib/test_vmalloc.c: avoid garbage in page arrayLorenzo Stoakes
It turns out that alloc_pages_bulk_array() does not treat the page_array parameter as an output parameter, but rather reads the array and skips any entries that have already been allocated. This is somewhat unexpected and breaks this test, as we allocate the pages array uninitialised on the assumption it will be overwritten. As a result, the test was referencing uninitialised data and causing the PFN to not be valid and thus a WARN_ON() followed by a null pointer deref and panic. In addition, this is an array of pointers not of struct page objects, so we need only allocate an array with elements of pointer size. We solve both problems by simply using kcalloc() and referencing sizeof(struct page *) rather than sizeof(struct page). Link: https://lkml.kernel.org/r/20230524082424.10022-1-lstoakes@gmail.com Fixes: 869cb29a61a1 ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case") Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: fix possible out-of-bounds segment allocation in resize ioctlRyusuke Konishi
Syzbot reports that in its stress test for resize ioctl, the log writing function nilfs_segctor_do_construct hits a WARN_ON in nilfs_segctor_truncate_segments(). It turned out that there is a problem with the current implementation of the resize ioctl, which changes the writable range on the device (the range of allocatable segments) at the end of the resize process. This order is necessary for file system expansion to avoid corrupting the superblock at trailing edge. However, in the case of a file system shrink, if log writes occur after truncating out-of-bounds trailing segments and before the resize is complete, segments may be allocated from the truncated space. The userspace resize tool was fine as it limits the range of allocatable segments before performing the resize, but it can run into this issue if the resize ioctl is called alone. Fix this issue by changing nilfs_sufile_resize() to update the range of allocatable segments immediately after successful truncation of segment space in case of file system shrink. Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12riscv/purgatory: remove PGO flagsRicardo Ribalda
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-4-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: <stable@vger.kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12powerpc/purgatory: remove PGO flagsRicardo Ribalda
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-3-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: <stable@vger.kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12x86/purgatory: remove PGO flagsRicardo Ribalda
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Cc: <stable@vger.kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12kexec: support purgatories with .text.hot sectionsRicardo Ribalda
Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7. When upreving llvm I realised that kexec stopped working on my test platform. The reason seems to be that due to PGO there are multiple .text sections on the purgatory, and kexec does not supports that. This patch (of 4): Clang16 links the purgatory text in two sections when PGO is in use: [ 1] .text PROGBITS 0000000000000000 00000040 00000000000011a1 0000000000000000 AX 0 0 16 [ 2] .rela.text RELA 0000000000000000 00003498 0000000000000648 0000000000000018 I 24 1 8 ... [17] .text.hot. PROGBITS 0000000000000000 00003220 000000000000020b 0000000000000000 AX 0 0 1 [18] .rela.text.hot. RELA 0000000000000000 00004428 0000000000000078 0000000000000018 I 24 17 8 And both of them have their range [sh_addr ... sh_addr+sh_size] on the area pointed by `e_entry`. This causes that image->start is calculated twice, once for .text and another time for .text.hot. The second calculation leaves image->start in a random location. Because of this, the system crashes immediately after: kexec_core: Starting new kernel Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Philipp Rudo <prudo@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mm/uffd: allow vma to merge as much as possiblePeter Xu
We used to not pass in the pgoff correctly when register/unregister uffd regions, it caused incorrect behavior on vma merging and can cause mergeable vmas being separate after ioctls return. For example, when we have: vma1(range 0-9, with uffd), vma2(range 10-19, no uffd) Then someone unregisters uffd on range (5-9), it should logically become: vma1(range 0-4, with uffd), vma2(range 5-19, no uffd) But with current code we'll have: vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd) This patch allows such merge to happen correctly before ioctl returns. This behavior seems to have existed since the 1st day of uffd. Since pgoff for vma_merge() is only used to identify the possibility of vma merging, meanwhile here what we did was always passing in a pgoff smaller than what we should, so there should have no other side effect besides not merging it. Let's still tentatively copy stable for this, even though I don't see anything will go wrong besides vma being split (which is mostly not user visible). Link: https://lkml.kernel.org/r/20230517190916.3429499-3-peterx@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12mm/uffd: fix vma operation where start addr cuts part of vmaPeter Xu
Patch series "mm/uffd: Fix vma merge/split", v2. This series contains two patches that fix vma merge/split for userfaultfd on two separate issues. Patch 1 fixes a regression since 6.1+ due to something we overlooked when converting to maple tree apis. The plan is we use patch 1 to replace the commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring uffd vma operations back aligned with the rest code again. Patch 2 fixes a long standing issue that vma can be left unmerged even if we can for either uffd register or unregister. Many thanks to Lorenzo on either noticing this issue from the assert movement patch, looking at this problem, and also provided a reproducer on the unmerged vma issue [1]. [1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e This patch (of 2): It seems vma merging with uffd paths is broken with either register/unregister, where right now we can feed wrong parameters to vma_merge() and it's found by recent patch which moved asserts upwards in vma_merge() by Lorenzo Stoakes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/ It's possible that "start" is contained within vma but not clamped to its start. We need to convert this into either "cannot merge" case or "can merge" case 4 which permits subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA will be clamped to the start. This patch will eliminate the report and make sure vma_merge() calls will become legal again. One thing to mention is that the "Fixes: 29417d292bd0" below is there only to help explain where the warning can start to trigger, the real commit to fix should be 69dbe6daf104. Commit 29417d292bd0 helps us to identify the issue, but unfortunately we may want to keep it in Fixes too just to ease kernel backporters for easier tracking. Link: https://lkml.kernel.org/r/20230517190916.3429499-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230517190916.3429499-2-peterx@redhat.com Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/ Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12radix-tree: move declarations to headerArnd Bergmann
The xarray.c file contains the only call to radix_tree_node_rcu_free(), and it comes with its own extern declaration for it. This means the function definition causes a missing-prototype warning: lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes] Instead, move the declaration for this function to a new header that can be included by both, and do the same for the radix_tree_node_cachep variable that has the same underlying problem but does not cause a warning with gcc. [zhangpeng.00@bytedance.com: fix building radix tree test suite] Link: https://lkml.kernel.org/r/20230521095450.21332-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230516194212.548910-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()Ryusuke Konishi
A syzbot fault injection test reported that nilfs_btnode_create_block, a helper function that allocates a new node block for b-trees, causes a kernel BUG for disk images where the file system block size is smaller than the page size. This was due to unexpected flags on the newly allocated buffer head, and it turned out to be because the buffer flags were not cleared by nilfs_btnode_abort_change_key() after an error occurred during a b-tree update operation and the buffer was later reused in that state. Fix this issue by using nilfs_btnode_delete() to abandon the unused preallocated buffer in nilfs_btnode_abort_change_key(). Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b0a35a5c1f7e846d3b09@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12io_uring/io-wq: don't clear PF_IO_WORKER on exitJens Axboe
A recent commit gated the core dumping task exit logic on current->flags remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time. This exposed a problem with the io-wq handling of that, which explicitly clears PF_IO_WORKER before calling do_exit(). The reasons for this manual clear of PF_IO_WORKER is historical, where io-wq used to potentially trigger a sleep on exit. As the io-wq thread is exiting, it should not participate any further accounting. But these days we don't need to rely on current->flags anymore, so we can safely remove the PF_IO_WORKER clearing. Reported-by: Zorro Lang <zlang@redhat.com> Reported-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/ Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-12Merge tag 'for-6.4-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes and regression fixes: - in subpage mode, fix crash when repairing metadata at the end of a stripe - properly enable async discard when remounting from read-only to read-write - scrub regression fixes: - respect read-only scrub when attempting to do a repair - fix reporting of found errors, the stats don't get properly accounted after a stripe repair" * tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: also report errors hit during the initial read btrfs: scrub: respect the read-only flag during repair btrfs: properly enable async discard when switching from RO->RW btrfs: subpage: fix a crash in metadata repair path
2023-06-12cgroup: Do not corrupt task iteration when rebinding subsystemXiu Jianfeng
We found a refcount UAF bug as follows: refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148 Workqueue: events cpuset_hotplug_workfn Call trace: refcount_warn_saturate+0xa0/0x148 __refcount_add.constprop.0+0x5c/0x80 css_task_iter_advance_css_set+0xd8/0x210 css_task_iter_advance+0xa8/0x120 css_task_iter_next+0x94/0x158 update_tasks_root_domain+0x58/0x98 rebuild_root_domains+0xa0/0x1b0 rebuild_sched_domains_locked+0x144/0x188 cpuset_hotplug_workfn+0x138/0x5a0 process_one_work+0x1e8/0x448 worker_thread+0x228/0x3e0 kthread+0xe0/0xf0 ret_from_fork+0x10/0x20 then a kernel panic will be triggered as below: Unable to handle kernel paging request at virtual address 00000000c0000010 Call trace: cgroup_apply_control_disable+0xa4/0x16c rebind_subsystems+0x224/0x590 cgroup_destroy_root+0x64/0x2e0 css_free_rwork_fn+0x198/0x2a0 process_one_work+0x1d4/0x4bc worker_thread+0x158/0x410 kthread+0x108/0x13c ret_from_fork+0x10/0x18 The race that cause this bug can be shown as below: (hotplug cpu) | (umount cpuset) mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex) cpuset_hotplug_workfn | rebuild_root_domains | rebind_subsystems update_tasks_root_domain | spin_lock_irq(&css_set_lock) css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id] while(css_task_iter_next) | &dcgrp->e_csets[ss->id]); css_task_iter_end | spin_unlock_irq(&css_set_lock) mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex) Inside css_task_iter_start/next/end, css_set_lock is hold and then released, so when iterating task(left side), the css_set may be moved to another list(right side), then it->cset_head points to the old list head and it->cset_pos->next points to the head node of new list, which can't be used as struct css_set. To fix this issue, switch from all css_sets to only scgrp's css_sets to patch in-flight iterators to preserve correct iteration, and then update it->cset_head as well. Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://www.spinics.net/lists/cgroups/msg37935.html Suggested-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.com/ Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Fixes: 2d8f243a5e6e ("cgroup: implement cgroup->e_csets[]") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Tejun Heo <tj@kernel.org>
2023-06-12cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in ↵Tetsuo Handa
freezer_css_{online,offline}() syzbot is again reporting circular locking dependency between cpu_hotplug_lock and freezer_mutex. Do like what we did with commit 57dcd64c7e036299 ("cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex"). Reported-by: syzbot <syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=2ab700fe1829880a2ec6 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com> Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Tejun Heo <tj@kernel.org>
2023-06-12bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.Yonghong Song
The sysctl net/core/bpf_jit_enable does not work now due to commit 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The commit saved the jitted insns into 'rw_image' instead of 'image' which caused bpf_jit_dump not dumping proper content. With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run './test_progs -t fentry_test'. Without this patch, one of jitted image for one particular prog is: flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807 00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000050: cc cc cc cc cc cc cc cc cc cc cc cc With this patch, the jitte image for the same prog is: flen=17 proglen=92 pass=4 image=00000000b90254b7 from=test_progs pid=1809 00000000: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 00000010: 0f 1e fa 31 f6 48 8b 57 00 48 83 fa 07 75 2b 48 00000020: 8b 57 10 83 fa 09 75 22 48 8b 57 08 48 81 e2 ff 00000030: 00 00 00 48 83 fa 08 75 11 48 8b 7f 18 be 01 00 00000040: 00 00 48 83 ff 0a 74 02 31 f6 48 bf 18 d0 14 00 00000050: 00 c9 ff ff 48 89 77 00 31 c0 c9 c3 Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20230609005439.3173569-1-yhs@fb.com
2023-06-12mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916Stephan Gerhold
While SDHCI claims to support 64-bit DMA on MSM8916 it does not seem to be properly functional. It is not immediately obvious because SDHCI is usually used with IOMMU bypassed on this SoC, and all physical memory has 32-bit addresses. But when trying to enable the IOMMU it quickly fails with an error such as the following: arm-smmu 1e00000.iommu: Unhandled context fault: fsr=0x402, iova=0xfffff200, fsynr=0xe0000, cbfrsynra=0x140, cb=3 mmc1: ADMA error: 0x02000000 mmc1: sdhci: ============ SDHCI REGISTER DUMP =========== mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00002e02 mmc1: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000000 mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013 mmc1: sdhci: Present: 0x03f80206 | Host ctl: 0x00000019 mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000 mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 mmc1: sdhci: Timeout: 0x0000000a | Int stat: 0x00000001 mmc1: sdhci: Int enab: 0x03ff900b | Sig enab: 0x03ff100b mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 mmc1: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x00008007 mmc1: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000 mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x5b590000 mmc1: sdhci: Resp[2]: 0xe6487f80 | Resp[3]: 0x0a404094 mmc1: sdhci: Host ctl2: 0x00000008 mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0x0000000ffffff224 mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP ----------- mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x60006400 | DLL cfg2: 0x00000000 mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x00000000 mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88018a8 Vndr func3: 0x00000000 mmc1: sdhci: ============================================ mmc1: sdhci: fffffffff200: DMA 0x0000ffffffffe100, LEN 0x0008, Attr=0x21 mmc1: sdhci: fffffffff20c: DMA 0x0000000000000000, LEN 0x0000, Attr=0x03 Looking closely it's obvious that only the 32-bit part of the address (0xfffff200) arrives at the SMMU, the higher 16-bit (0xffff...) get lost somewhere. This might not be a limitation of the SDHCI itself but perhaps the bus/interconnect it is connected to, or even the connection to the SMMU. Work around this by setting SDHCI_QUIRK2_BROKEN_64_BIT_DMA to avoid using 64-bit addresses. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230518-msm8916-64bit-v1-1-5694b0f35211@gerhold.net Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-06-12xfrm: Use xfrm_state selector for BEET inputHerbert Xu
For BEET the inner address and therefore family is stored in the xfrm_state selector. Use that when decapsulating an input packet instead of incorrectly relying on a non-existent tunnel protocol. Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path") Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-12sctp: fix an error code in sctp_sf_eat_auth()Dan Carpenter
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition values and returning a kernel error code will cause issues in the caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12sctp: handle invalid error codes without calling BUG()Dan Carpenter
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition values but if the call to sctp_ulpevent_make_authkey() fails, it returns -ENOMEM. This results in calling BUG() inside the sctp_side_effects() function. Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE() instead. This code predates git. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12ipvlan: fix bound dev checking for IPv6 l3s modeHangbin Liu
The commit 59a0b022aa24 ("ipvlan: Make skb->skb_iif track skb->dev for l3s mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which is skb->skb_iif when there is no route. But for IPv6, the fix is not enough, because in ipv6_raw_deliver() -> raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part issue. Let's set IP6CB(skb)->iif to correct ifindex. BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6. Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2196710 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12wifi: mac80211: fragment per STA profile correctlyBenjamin Berg
When fragmenting the ML per STA profile, the element ID should be IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT. Change the helper function to take the to be used element ID and pass the appropriate value for each of the fragmentation levels. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-12net: ethtool: correct MAX attribute value for statsJakub Kicinski
When compiling YNL generated code compiler complains about array-initializer-out-of-bounds. Turns out the MAX value for STATS_GRP uses the value for STATS. This may lead to random corruptions in user space (kernel itself doesn't use this value as it never parses stats). Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-11cifs: fix max_credits implementationShyam Prasad N
The current implementation of max_credits on the client does not work because the CreditRequest logic for several commands does not take max_credits into account. Still, we can end up asking the server for more credits, depending on the number of credits in flight. For this, we need to limit the credits while parsing the responses too. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11cifs: fix sockaddr comparison in iface_cmpShyam Prasad N
iface_cmp used to simply do a memcmp of the two provided struct sockaddrs. The comparison needs to do more based on the address family. Similar logic was already present in cifs_match_ipaddr. Doing something similar now. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11smb/client: print "Unknown" instead of bogus link speed valueEnzo Matsumiya
The virtio driver for Linux guests will not set a link speed to its paravirtualized NICs. This will be seen as -1 in the ethernet layer, and when some servers (e.g. samba) fetches it, it's converted to an unsigned value (and multiplied by 1000 * 1000), so in client side we end up with: 1) Speed: 4294967295000000 bps in DebugData. This patch introduces a helper that returns a speed string (in Mbps or Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or "Unknown" otherwise. The reason to not change the value in iface->speed is because we don't know the real speed of the HW backing the server NIC, so let's keep considering these as the fastest NICs available. Also print "Capabilities: None" when the interface doesn't support any. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11cifs: print all credit counters in DebugDataShyam Prasad N
Output of /proc/fs/cifs/DebugData shows only the per-connection counter for the number of credits of regular type. i.e. the credits reserved for echo and oplocks are not displayed. There have been situations recently where having this info would have been useful. This change prints the credit counters of all three types: regular, echo, oplocks. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11cifs: fix status checks in cifs_tree_connectShyam Prasad N
The ordering of status checks at the beginning of cifs_tree_connect is wrong. As a result, a tcon which is good may stay marked as needing reconnect infinitely. Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11smb: remove obsolete comment鑫华
Because do_gettimeofday has been removed and replaced by ktime_get_real_ts64, So just remove the comment as it's not needed now. Signed-off-by: 鑫华 <jixianghua@xfusion.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-11blk-cgroup: Flush stats before releasing blkcg_gqMing Lei
As noted by Michal, the blkg_iostat_set's in the lockless list hold reference to blkg's to protect against their removal. Those blkg's hold reference to blkcg. When a cgroup is being destroyed, cgroup_rstat_flush() is only called at css_release_work_fn() which is called when the blkcg reference count reaches 0. This circular dependency will prevent blkcg and some blkgs from being freed after they are made offline. It is less a problem if the cgroup to be destroyed also has other controllers like memory that will call cgroup_rstat_flush() which will clean up the reference count. If block is the only controller that uses rstat, these offline blkcg and blkgs may never be freed leaking more and more memory over time. To prevent this potential memory leak: - flush blkcg per-cpu stats list in __blkg_release(), when no new stat can be added - add global blkg_stat_lock for covering concurrent parent blkg stat update - don't grab bio->bi_blkg reference when adding the stats into blkcg's per-cpu stat list since all stats are guaranteed to be consumed before releasing blkg instance, and grabbing blkg reference for stats was the most fragile part of original patch Based on Waiman's patch: https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Cc: stable@vger.kernel.org Reported-by: Jay Shin <jaeshin@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: mkoutny@suse.com Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230609234249.1412858-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-11Linux 6.4-rc6v6.4-rc6Linus Torvalds
2023-06-11IB/isert: Fix incorrect release of isert connectionSaravanan Vajravel
The ib_isert module is releasing the isert connection both in isert_wait_conn() handler as well as isert_free_conn() handler. In isert_wait_conn() handler, it is expected to wait for iSCSI session logout operation to complete. It should free the isert connection only in isert_free_conn() handler. When a bunch of iSER target is cleared, this issue can lead to use-after-free memory issue as isert conn is twice released Fixes: b02efbfc9a05 ("iser-target: Fix implicit termination of connections") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-4-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11IB/isert: Fix possible list corruption in CMA handlerSaravanan Vajravel
When ib_isert module receives connection error event, it is releasing the isert session and removes corresponding list node but it doesn't take appropriate mutex lock to remove the list node. This can lead to linked list corruption Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-3-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11IB/isert: Fix dead lock in ib_isertSaravanan Vajravel
- When a iSER session is released, ib_isert module is taking a mutex lock and releasing all pending connections. As part of this, ib_isert is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending CM events to CMA handler of ib_isert. This handler is taking same mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm modules. - For fix, created local list of pending connections and release the connection outside of mutex lock. Calltrace: --------- [ 1229.791410] INFO: task kworker/10:1:642 blocked for more than 120 seconds. [ 1229.791416] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791419] task:kworker/10:1 state:D stack: 0 pid: 642 ppid: 2 flags:0x80004000 [ 1229.791424] Workqueue: ib_cm cm_work_handler [ib_cm] [ 1229.791436] Call Trace: [ 1229.791438] __schedule+0x2d1/0x830 [ 1229.791445] ? select_idle_sibling+0x23/0x6f0 [ 1229.791449] schedule+0x35/0xa0 [ 1229.791451] schedule_preempt_disabled+0xa/0x10 [ 1229.791453] __mutex_lock.isra.7+0x310/0x420 [ 1229.791456] ? select_task_rq_fair+0x351/0x990 [ 1229.791459] isert_cma_handler+0x224/0x330 [ib_isert] [ 1229.791463] ? ttwu_queue_wakelist+0x159/0x170 [ 1229.791466] cma_cm_event_handler+0x25/0xd0 [rdma_cm] [ 1229.791474] cma_ib_handler+0xa7/0x2e0 [rdma_cm] [ 1229.791478] cm_process_work+0x22/0xf0 [ib_cm] [ 1229.791483] cm_work_handler+0xf4/0xf30 [ib_cm] [ 1229.791487] ? move_linked_works+0x6e/0xa0 [ 1229.791490] process_one_work+0x1a7/0x360 [ 1229.791491] ? create_worker+0x1a0/0x1a0 [ 1229.791493] worker_thread+0x30/0x390 [ 1229.791494] ? create_worker+0x1a0/0x1a0 [ 1229.791495] kthread+0x10a/0x120 [ 1229.791497] ? set_kthread_struct+0x40/0x40 [ 1229.791499] ret_from_fork+0x1f/0x40 [ 1229.791739] INFO: task targetcli:28666 blocked for more than 120 seconds. [ 1229.791740] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791742] task:targetcli state:D stack: 0 pid:28666 ppid: 5510 flags:0x00004080 [ 1229.791743] Call Trace: [ 1229.791744] __schedule+0x2d1/0x830 [ 1229.791746] schedule+0x35/0xa0 [ 1229.791748] schedule_preempt_disabled+0xa/0x10 [ 1229.791749] __mutex_lock.isra.7+0x310/0x420 [ 1229.791751] rdma_destroy_id+0x15/0x20 [rdma_cm] [ 1229.791755] isert_connect_release+0x115/0x130 [ib_isert] [ 1229.791757] isert_free_np+0x87/0x140 [ib_isert] [ 1229.791761] iscsit_del_np+0x74/0x120 [iscsi_target_mod] [ 1229.791776] lio_target_np_driver_store+0xe9/0x140 [iscsi_target_mod] [ 1229.791784] configfs_write_file+0xb2/0x110 [ 1229.791788] vfs_write+0xa5/0x1a0 [ 1229.791792] ksys_write+0x4f/0xb0 [ 1229.791794] do_syscall_64+0x5b/0x1a0 [ 1229.791798] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-2-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11Merge tag 'x86_urgent_for_v6.4_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Set up the kernel CS earlier in the boot process in case EFI boots the kernel after bypassing the decompressor and the CS descriptor used ends up being the EFI one which is not mapped in the identity page table, leading to early SEV/SNP guest communication exceptions resulting in the guest crashing * tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
2023-06-11Merge tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Five smb3 server fixes, all also for stable: - Fix four slab out of bounds warnings: improve checks for protocol id, and for small packet length, and for create context parsing, and for negotiate context parsing - Fix for incorrect dereferencing POSIX ACLs" * tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: validate smb request protocol id ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR() ksmbd: fix out-of-bound read in parse_lease_state() ksmbd: fix out-of-bound read in deassemble_neg_contexts()
2023-06-11RDMA/mlx5: Fix affinity assignmentMark Bloch
The cited commit aimed to ensure that Virtual Functions (VFs) assign a queue affinity to a Queue Pair (QP) to distribute traffic when the LAG master creates a hardware LAG. If the affinity was set while the hardware was not in LAG, the firmware would ignore the affinity value. However, this commit unintentionally assigned an affinity to QPs on the LAG master's VPORT even if the RDMA device was not marked as LAG-enabled. In most cases, this was not an issue because when the hardware entered hardware LAG configuration, the RDMA device of the LAG master would be destroyed and a new one would be created, marked as LAG-enabled. The problem arises when a user configures Equal-Cost Multipath (ECMP). In ECMP mode, traffic can be directed to different physical ports based on the queue affinity, which is intended for use by VPORTS other than the E-Switch manager. ECMP mode is supported only if both E-Switch managers are in switchdev mode and the appropriate route is configured via IP. In this configuration, the RDMA device is not destroyed, and we retain the RDMA device that is not marked as LAG-enabled. To ensure correct behavior, Send Queues (SQs) opened by the E-Switch manager through verbs should be assigned strict affinity. This means they will only be able to communicate through the native physical port associated with the E-Switch manager. This will prevent the firmware from assigning affinity and will not allow the SQs to be remapped in case of failover. Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode") Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11IB/uverbs: Fix to consider event queue closing also upon non-blocking modeYishai Hadas
Fix ib_uverbs_event_read() to consider event queue closing also upon non-blocking mode. Once the queue is closed (e.g. hot-plug flow) all the existing events are cleaned-up as part of ib_uverbs_free_event_queue(). An application that uses the non-blocking FD mode should get -EIO in that case to let it knows that the device was removed already. Otherwise, it can loose the indication that the device was removed and won't recover. As part of that, refactor the code to have a single flow with regards to 'is_closed' for both blocking and non-blocking modes. Fixes: 14e23bd6d221 ("RDMA/core: Fix locking in ib_uverbs_event_read") Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/97b00116a1e1e13f8dc4ec38a5ea81cf8c030210.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11RDMA/uverbs: Restrict usage of privileged QKEYsEdward Srouji
According to the IB specification rel-1.6, section 3.5.3: "QKEYs with the most significant bit set are considered controlled QKEYs, and a HCA does not allow a consumer to arbitrarily specify a controlled QKEY." Thus, block non-privileged users from setting such a QKEY. Cc: stable@vger.kernel.org Fixes: bc38a6abdd5a ("[PATCH] IB uverbs: core implementation") Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://lore.kernel.org/r/c00c809ddafaaf87d6f6cb827978670989a511b3.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11RDMA/cma: Always set static rate to 0 for RoCEMark Zhang
Set static rate to 0 as it should be discovered by path query and has no meaning for RoCE. This also avoid of using the rtnl lock and ethtool API, which is a bottleneck when try to setup many rdma-cm connections at the same time, especially with multiple processes. Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/f72a4f8b667b803aee9fa794069f61afb5839ce4.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>