Age | Commit message | Author
2022-12-09 | mmap: fix do_brk_flags() modifying obviously incorrect VMAs | Liam Howlett
Add more sanity checks to the VMA that do_brk_flags() will expand. Ensure the VMA matches basic merge requirements within the function before calling can_vma_merge_after(). Drop the duplicate checks from vm_brk_flags() since they will be enforced later. The old code would expand file VMAs on brk(), which is functionally wrong and also dangerous in terms of locking because the brk() path isn't designed for file VMAs and therefore doesn't lock the file mapping. Checking can_vma_merge_after() ensures that new anonymous VMAs can't be merged into file VMAs. See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 | mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit | David Hildenbrand
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to store a physical address. On a 64bit system, both are 64bit wide. However, on a 32bit system, the latter might be 64bit wide. This is, for example, the case on x86 with PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans 32bit.
The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses that case, and assumes that the maximum PFN is limited by a 32bit phys_addr_t. This implies that SWP_PFN_BITS will currently only be able to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.
Let's rely on the number of bits in phys_addr_t instead, but make sure to not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an unsigned long and the maximum swap offset shares that value with the swap type.
For example, on an 8 GiB x86 PAE system with a kernel config based on Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently fail removing migration entries (remove_migration_ptes()), because mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as swp_offset_pfn() wrongly masks off PFN bits. For example, split_huge_page_to_list()->...->remap_page() will leave migration entries in place and continue to unlock the page.
Later, when we stumble over these migration entries (e.g., via /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these migration entries shouldn't exist anymore and the page was unlocked.
  [ 33.067591] kernel BUG at include/linux/swapops.h:497!
  [ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  [ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16
  [ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
  [ 33.067606] EIP: pagemap_pmd_range+0x644/0x650
  [ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
  [ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
  [ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
  [ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
  [ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
  [ 33.067625] Call Trace:
  [ 33.067628] ? madvise_free_pte_range+0x720/0x720
  [ 33.067632] ? smaps_pte_range+0x4b0/0x4b0
  [ 33.067634] walk_pgd_range+0x325/0x720
  [ 33.067637] ? mt_find+0x1d6/0x3a0
  [ 33.067641] ? mt_find+0x1d6/0x3a0
  [ 33.067643] __walk_page_range+0x164/0x170
  [ 33.067646] walk_page_range+0xf9/0x170
  [ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340
  [ 33.067653] pagemap_read+0x124/0x280
  [ 33.067658] ? default_llseek+0x101/0x160
  [ 33.067662] ? smaps_account+0x1d0/0x1d0
  [ 33.067664] vfs_read+0x90/0x290
  [ 33.067667] ? do_madvise.part.0+0x24b/0x390
  [ 33.067669] ? debug_smp_processor_id+0x12/0x20
  [ 33.067673] ksys_pread64+0x58/0x90
  [ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20
  [ 33.067680] __do_fast_syscall_32+0x4c/0xc0
  [ 33.067683] do_fast_syscall_32+0x29/0x60
  [ 33.067686] do_SYSENTER_32+0x15/0x20
  [ 33.067689] entry_SYSENTER_32+0x98/0xf1
Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it readable and consistent.
[david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
[david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
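The PFN masking problem in the mm/swap entry above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: a 32-bit "unsigned long" is modelled with uint32_t, the page shift is 12 (4k pages), and the value 26 for the bits left to the swap offset is a hypothetical stand-in for the real limit. The helper names are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SK        12  /* 4k pages, as in the commit's example */
#define LONG_BITS_SK         32  /* models the old 32-bit assumption */
#define PHYS_ADDR_BITS_SK    64  /* models a 64-bit phys_addr_t (PAE) */
#define SWAP_OFFSET_BITS_SK  26  /* hypothetical: bits left for the swap offset */

/* Old assumption: the PFN width follows from a 32-bit physical address. */
static int pfn_bits_old(void)
{
    return LONG_BITS_SK - PAGE_SHIFT_SK;
}

/* Fixed idea: derive the width from phys_addr_t, capped by the swap offset. */
static int pfn_bits_new(void)
{
    int phys = PHYS_ADDR_BITS_SK - PAGE_SHIFT_SK;

    return phys < SWAP_OFFSET_BITS_SK ? phys : SWAP_OFFSET_BITS_SK;
}

/* Keep only the low "bits" bits of a PFN stored in a swap entry. */
static uint32_t mask_pfn(uint32_t pfn, int bits)
{
    assert(bits > 0 && bits < 32);
    return pfn & ((UINT32_C(1) << bits) - 1);
}
```

In this model a page at physical address 6 GiB has PFN 0x180000. Masking it with the old 20-bit width yields 0x80000, so a later PFN comparison cannot match, while the capped width keeps the PFN intact.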
2022-12-09 | tmpfs: fix data loss from failed fallocate | Hugh Dickins
Fix tmpfs data loss when the fallocate system call is interrupted by a signal, or fails for some other reason. The partial folio handling in shmem_undo_range() forgot to consider this unfalloc case, and was liable to erase or truncate out data which had already been committed earlier. It turns out that none of the partial folio handling there is appropriate for the unfalloc case, which just wants to proceed to removal of whole folios: which find_get_entries() provides, even when partially covered. Original patch by Rui Wang. Link: https://lore.kernel.org/linux-mm/33b85d82.7764.1842e9ab207.Coremail.chenguoqic@163.com/ Link: https://lkml.kernel.org/r/a5dac112-cf4b-7af-a33-f386e347fd38@google.com Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Guoqi Chen <chenguoqic@163.com> Link: https://lore.kernel.org/all/20221101032248.819360-1-kernel@hev.cc/ Cc: Rui Wang <kernel@hev.cc> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: <stable@vger.kernel.org> [5.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 | kselftests: cgroup: update kmem test precision tolerance | Michal Hocko
1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed the batch size while this test case has been left behind. This has led to a test failure reported by test bot: not ok 2 selftests: cgroup: test_kmem # exit=1 Update the tolerance for the pcp charges to reflect the MEMCG_CHARGE_BATCH change to fix this. [akpm@linux-foundation.org: update comments, per Roman] Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz Fixes: 1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Yujie Liu <yujie.liu@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 | mm: do not BUG_ON missing brk mapping, because userspace can unmap it | Jason A. Donenfeld
The following program will trigger the BUG_ON that this patch removes, because the user can munmap() mm->brk:
  #include <sys/syscall.h>
  #include <sys/mman.h>
  #include <assert.h>
  #include <unistd.h>

  static void *brk_now(void)
  {
    return (void *)syscall(SYS_brk, 0);
  }

  static void brk_set(void *b)
  {
    assert(syscall(SYS_brk, b) != -1);
  }

  int main(int argc, char *argv[])
  {
    void *b = brk_now();
    brk_set(b + 4096);
    assert(munmap(b - 4096, 4096 * 2) == 0);
    brk_set(b);
    return 0;
  }
Compile that with musl, since glibc actually uses brk(), and then execute it, and it'll hit this splat:
  kernel BUG at mm/mmap.c:229!
  invalid opcode: 0000 [#1] PREEMPT SMP
  CPU: 12 PID: 1379 Comm: a.out Tainted: G S U 6.1.0-rc7+ #419
  RIP: 0010:__do_sys_brk+0x2fc/0x340
  Code: 00 00 4c 89 ef e8 04 d3 fe ff eb 9a be 01 00 00 00 4c 89 ff e8 35 e0 fe ff e9 6e ff ff ff 4d 89 a7 20>
  RSP: 0018:ffff888140bc7eb0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 00000000007e7000 RCX: ffff8881020fe000
  RDX: ffff8881020fe001 RSI: ffff8881955c9b00 RDI: ffff8881955c9b08
  RBP: 0000000000000000 R08: ffff8881955c9b00 R09: 00007ffc77844000
  R10: 0000000000000000 R11: 0000000000000001 R12: 00000000007e8000
  R13: 00000000007e8000 R14: 00000000007e7000 R15: ffff8881020fe000
  FS: 0000000000604298(0000) GS:ffff88901f700000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000603fe0 CR3: 000000015ba9a005 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
  <TASK>
  do_syscall_64+0x2b/0x50
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
  RIP: 0033:0x400678
  Code: 10 4c 8d 41 08 4c 89 44 24 10 4c 8b 01 8b 4c 24 08 83 f9 2f 77 0a 4c 8d 4c 24 20 4c 01 c9 eb 05 48 8b>
  RSP: 002b:00007ffc77863890 EFLAGS: 00000212 ORIG_RAX: 000000000000000c
  RAX: ffffffffffffffda RBX: 000000000040031b RCX: 0000000000400678
  RDX: 00000000004006a1 RSI: 00000000007e6000 RDI: 00000000007e7000
  RBP: 00007ffc77863900 R08: 0000000000000000 R09: 00000000007e6000
  R10: 00007ffc77863930 R11: 0000000000000212 R12: 00007ffc77863978
  R13: 00007ffc77863988 R14: 0000000000000000 R15: 0000000000000000
  </TASK>
Instead, just return the old brk value if the original mapping has been removed.
[akpm@linux-foundation.org: fix changelog, per Liam]
Link: https://lkml.kernel.org/r/20221202162724.2009-1-Jason@zx2c4.com
Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 | mailmap: update Matti Vaittinen's email address | Matti Vaittinen
The email backend used by ROHM keeps labeling patches as spam. This can result in missing the patches. Switch my mail address from a company mail to a personal one. Link: https://lkml.kernel.org/r/8f4498b66fedcbded37b3b87e0c516e659f8f583.1669912977.git.mazziesaccount@gmail.com Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Anup Patel <anup@brainfault.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Atish Patra <atishp@atishpatra.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Bjorn Andersson <andersson@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Qais Yousef <qyousef@layalina.io> Cc: Vasily Averin <vasily.averin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-09 | riscv: Don't duplicate _ALTERNATIVE_CFG* macros | Andrew Jones
Reduce clutter by only defining the _ALTERNATIVE_CFG* macros once, rather than once for assembly and once for C. To do that, we need to add __ALTERNATIVE_CFG* macros to the assembly side, but those are one-liners. Also take the opportunity to do a bit of reformatting, taking full advantage of the fact checkpatch gives us 100 char lines. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221129150053.50464-5-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 | riscv: alternatives: Drop the underscores from the assembly macro names | Andrew Jones
The underscores aren't needed because there isn't anything already named without them and the _CFG extension. This is a bit of a cleanup by itself, but the real motivation is for a coming patch which would otherwise need to add two more underscores to these macro names, i.e. ____ALTERNATIVE_CFG, and that'd be gross. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221129150053.50464-4-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 | riscv: alternatives: Don't name unused macro parameters | Andrew Jones
Without CONFIG_RISCV_ALTERNATIVE only the first parameter of the ALTERNATIVE macros is needed. Use ... for the rest to cut down on clutter. While there, fix a couple space vs. tab issues. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221129150053.50464-3-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
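The idea of using "..." for unused macro parameters can be shown with a tiny C preprocessor sketch. ALTERNATIVE_SKETCH and default_insn() are hypothetical names invented for this illustration, and the arguments passed are placeholders; this is not the kernel's macro definition.

```c
/*
 * With alternatives disabled only the first parameter (the default
 * content) is used, so the remaining ones are swallowed by "..."
 * instead of being named and then ignored.
 */
#define ALTERNATIVE_SKETCH(old_content, ...) old_content

/* Expands to the default content only; the other arguments vanish. */
static const char *default_insn(void)
{
    return ALTERNATIVE_SKETCH("nop", "c.nop", 0, 0, "HYPOTHETICAL_CONFIG");
}
```

Callers keep passing the full argument list, so the call sites look the same whether or not the alternative mechanism is compiled in.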
2022-12-09 | riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2 | Andrew Jones
Build __ALTERNATIVE_CFG_2 by adding on to __ALTERNATIVE_CFG rather than duplicating it. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221129150053.50464-2-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 | RDMA/rxe: Enable RDMA FLUSH capability for rxe device | Li Zhijian
Now we are ready to enable RDMA FLUSH capability for RXE. It can support Global Visibility and Persistence placement types. Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/cm: Make QP FLUSHABLE for supported device | Li Zhijian
Similar to the RDMA and Atomic QP attributes that are enabled by default in CM, enable the FLUSH attribute for supported devices. Applications built with the rdma_create_ep and rdma_accept APIs then have the FLUSH QP attribute natively, which makes requesting a FLUSH operation simpler. Note that a FLUSH operation requires FLUSH to be supported by the device (HCA), the memory region (MR) and the QP at the same time, so it's safe to enable the FLUSH QP attribute by default here. The FLUSH attribute can be disabled via the modify_qp() interface. Link: https://lore.kernel.org/r/20221206130201.30986-10-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/rxe: Implement flush completion | Li Zhijian
Per IBA SPEC, FLUSH will ack in rdma read response with 0 length. Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH completion. Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/rxe: Implement flush execution in responder side | Li Zhijian
Only the requested placement types that are also registered in the destination memory region are acceptable. Otherwise, the responder will reply with a NAK "Remote Access Error" if it finds a placement type violation. We will persist data via arch_wb_cache_pmem(), which could be architecture specific. This commit also adds 2 helpers to update qp.resp from the incoming packet. Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
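The acceptance rule for placement types described in this entry amounts to a subset test on bit flags. The following is a minimal sketch, assuming made-up bit values and a hypothetical helper name; it does not reproduce the driver's actual constants or code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define PLT_GLOBAL_VISIBILITY_SK (1u << 0)
#define PLT_PERSISTENCE_SK       (1u << 1)

/*
 * True when every requested placement type was also registered on the
 * destination memory region; any extra requested bit is a violation.
 */
static bool placement_types_allowed(uint32_t requested, uint32_t registered)
{
    return (requested & ~registered) == 0;
}
```

When the helper returns false, the responder described above would answer with a "Remote Access Error" NAK instead of executing the flush.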
2022-12-09 | RDMA/rxe: Implement RC RDMA FLUSH service in requester side | Li Zhijian
Implement FLUSH request operation in the requester. Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/rxe: Extend rxe packet format to support flush | Li Zhijian
Extend rxe opcode tables, headers, helper and constants to support flush operations. Refer to IBA A19.4.1 for more details on the FETH definition. Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/rxe: Allow registering persistent flag for pmem MR only | Li Zhijian
A memory region can support at most 2 flush access flags: IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL. However, we only allow users to register the persistent flush flag on a pmem MR, which has the ability to persist data across power cycles. Registering a persistent access flag on a non-pmem MR will therefore be rejected. Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com CC: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
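The registration rule in this entry can be sketched as a small validation helper. The flag values, the helper name and the choice of -EINVAL as the error code are assumptions made for illustration; the commit text only says that such a registration is rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values standing in for the two flush access flags. */
#define ACCESS_FLUSH_GLOBAL_SK     (1u << 0)
#define ACCESS_FLUSH_PERSISTENT_SK (1u << 1)

/*
 * Reject the persistent flush flag unless the memory region is backed
 * by persistent memory. Returns 0 on success, a negative errno otherwise
 * (the specific errno is an assumption of this sketch).
 */
static int check_flush_access(uint32_t access, bool mr_is_pmem)
{
    if ((access & ACCESS_FLUSH_PERSISTENT_SK) && !mr_is_pmem)
        return -EINVAL;
    return 0;
}
```

The global-visibility flag is not restricted by this check, so a non-pmem MR can still be registered as flushable for global visibility.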
2022-12-09 | RDMA/rxe: Extend rxe user ABI to support flush | Li Zhijian
This commit extends the rxe user ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing rxe user ABI. The user API requests a flush by filling this structure. Link: https://lore.kernel.org/r/20221206130201.30986-4-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA: Extend RDMA kernel verbs ABI to support flush | Li Zhijian
This commit extends the RDMA kernel verbs ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing RDMA kernel verbs ABI. It lets the device/HCA support new FLUSH attributes/capabilities, and it also lets memory regions support new FLUSH access flags. Users can use ibv_reg_mr(3) to register flush access flags. Only the access flags that are also supported by the device's capabilities can be registered successfully. Once registered successfully, the MR is flushable. Similarly, a flushable MR should also have one or both of the GLOBAL_VISIBILITY and PERSISTENT attributes/capabilities, like the device/HCA. Link: https://lore.kernel.org/r/20221206130201.30986-3-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA: Extend RDMA user ABI to support flush | Li Zhijian
This commit extends the RDMA user ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing RDMA user ABI. Link: https://lore.kernel.org/r/20221206130201.30986-2-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | RDMA/rxe: Fix incorrect responder length checking | Bob Pearson
The code in rxe_resp.c at check_length() is incorrect as it compares pkt->opcode, an 8 bit value, against various mask bits which are all higher than 256, so nothing is ever reported. This patch rewrites this to compare against pkt->mask, which is correct. However, this now exposes another error. For UD send packets the value of the pmtu cannot be determined from qp->mtu. All that is required here is to later check if the payload fits into the posted receive buffer in that case. Fixes: 837a55847ead ("RDMA/rxe: Implement packet length validation on responder") Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
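Why the old comparison in this entry never fires can be seen in a few lines of C. The sketch below uses a hypothetical struct and a hypothetical mask bit (bit 11) standing in for the driver's packet mask flags; only the width argument (an 8-bit opcode versus flag bits at or above bit 8) is taken from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit above the 8-bit range. */
#define START_MASK_SK (1u << 11)

struct pkt_sketch {
    uint8_t  opcode; /* 8-bit wire opcode */
    uint32_t mask;   /* wider flag word derived from the opcode */
};

/* Buggy form: an 8-bit value can never have bit 11 set. */
static bool is_start_buggy(const struct pkt_sketch *pkt)
{
    return (pkt->opcode & START_MASK_SK) != 0;
}

/* Fixed form: test the flag word that actually carries the bit. */
static bool is_start_fixed(const struct pkt_sketch *pkt)
{
    return (pkt->mask & START_MASK_SK) != 0;
}
```

Even with every opcode bit set, the buggy test stays false, which matches the observation that nothing was ever reported.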
2022-12-09 | Documentation/rv: Add verification/rv man pages | Daniel Bristot de Oliveira
Add man pages for the rv command line, using the same scheme we used in rtla. Link: https://lkml.kernel.org/r/e841d7cfbdfc3ebdaf7cbd40278571940145d829.1668180100.git.bristot@kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 | tools/rv: Add in-kernel monitor interface | Daniel Bristot de Oliveira
Add the ability to control and trace in-kernel monitors. This is a generic interface, it will check for existing monitors and enable standard setup, like enabling reactors. For example:
  # rv list
  wip wakeup in preemptive per-cpu testing monitor. [OFF]
  wwnr wakeup while not running per-task testing model. [OFF]
  # rv mon wwnr --help
  rv version 6.1.0-rc4: help
  usage: rv mon wwnr [-h] [-q] [-r reactor] [-s] [-v]
  -h/--help: print this menu and the reactor list
  -r/--reactor 'reactor': enables the 'reactor'
  -s/--self: when tracing (-t), also trace rv command
  -t/--trace: trace monitor's event
  -v/--verbose: print debug messages
  available reactors: nop printk panic
  # rv mon wwnr --trace
  <TASK>-PID [CPU] TYPE ID STATE x EVENT -> NEXT_STATE FINAL
  | | | | | | | | |
  rv-3613 [001] event 3613 running x switch_out -> not_running Y
  sshd-1248 [005] event 1248 running x switch_out -> not_running Y
  <idle>-0 [005] event 71 not_running x wakeup -> not_running Y
  <idle>-0 [005] event 71 not_running x switch_in -> running N
  kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
  <idle>-0 [000] event 860 not_running x wakeup -> not_running Y
  <idle>-0 [000] event 860 not_running x switch_in -> running N
  systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
  <idle>-0 [000] event 860 not_running x wakeup -> not_running Y
  <idle>-0 [000] event 860 not_running x switch_in -> running N
  systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
  <idle>-0 [005] event 71 not_running x wakeup -> not_running Y
  <idle>-0 [005] event 71 not_running x switch_in -> running N
  kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
  <idle>-0 [000] event 860 not_running x wakeup -> not_running Y
  <idle>-0 [000] event 860 not_running x switch_in -> running N
  systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
  <idle>-0 [001] event 3613 not_running x wakeup -> not_running Y
Link: https://lkml.kernel.org/r/1e57547e3acadda6e23949b2672c89e76ec2ec42.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 | rv: Add rv tool | Daniel Bristot de Oliveira
This is the (user-space) runtime verification tool, named rv. This tool aims to be the interface for in-kernel rv monitors, as well as the home for monitors in user-space (online asynchronous), and in *eBPF.
The tool receives a command as the first argument, the current commands are:
  list - list all available monitors
  mon - run a given monitor
Each monitor is an independent piece of software inside the tool and can have their own arguments. There is no monitor implemented in this patch, it only adds the basic structure of the tool, based on rtla.
  # rv --help
  rv version 6.1.0-rc4: help
  usage: rv command [-h] [command_options]
  -h/--help: print this menu
  command: run one of the following command:
  list: list all available monitors
  mon: run a monitor
  [command options]: each command has its own set of options
  run rv command -h for further information
*dot2bpf is the next patch set, depends on this, doing cleanups.
Link: https://lkml.kernel.org/r/fb51184f3b95aea0d7bfdc33ec09f4153aee84fa.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 | rtla: Fix exit status when returning from calls to usage() | John Kacur
rtla_usage(), osnoise_usage() and timerlat_usage() all exit with an error status. However when these are called from help, they should exit with a non-error status. Fix this by passing the exit status to the functions. Note, although we remove the subsequent call to exit after calling usage, we leave it in at the end of a function to suppress the compiler warning "control reaches end of a non-void function". Link: https://lkml.kernel.org/r/20221107144313.22470-1-jkacur@redhat.com Signed-off-by: John Kacur <jkacur@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 | MIPS: OCTEON: warn only once if deprecated link status is being used | Ladislav Michl
Avoid flooding kernel log with warnings. Fixes: 2c0756d306c2 ("MIPS: OCTEON: warn if deprecated link status is being used") Signed-off-by: Ladislav Michl <ladis@linux-mips.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
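The "warn only once" pattern from this entry can be modelled in user space with a static flag. This is a sketch only: the function name and the counter are hypothetical, and it does not show which kernel helper the patch uses (the kernel offers once-only printing helpers such as pr_warn_once()).

```c
#include <stdbool.h>
#include <stdio.h>

/* Counter used only to observe the behaviour of this sketch. */
static int warnings_emitted;

/* Print the deprecation warning on the first call and stay quiet afterwards. */
static void warn_deprecated_once(void)
{
    static bool warned;

    if (warned)
        return;
    warned = true;
    warnings_emitted++;
    fprintf(stderr, "deprecated link status is being used\n");
}
```

However often the deprecated path is hit, the log receives a single line. This simple version is not thread-safe, which is acceptable for an illustration.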
2022-12-09 | MIPS: BCM63xx: Add check for NULL for clk in clk_enable | Anastasia Belova
Check clk for NULL before calling clk_enable_unlocked where clk is dereferenced. There is such check in other implementations of clk_enable. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: e7300d04bd08 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs.") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
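The shape of such a NULL guard is sketched below with stand-in types. Everything here is hypothetical: the struct, the function names with the _sketch suffix, and the choice to return 0 for a NULL clock are assumptions for illustration, not the BCM63xx code.

```c
#include <stddef.h>

/* Stand-in for the platform's clock structure. */
struct clk_sketch {
    int usage; /* enable count */
};

/* Dereferences clk, so it must never be called with NULL. */
static void clk_enable_unlocked_sketch(struct clk_sketch *clk)
{
    clk->usage++;
}

/* Guarded entry point: a NULL clock is treated as a successful no-op. */
static int clk_enable_sketch(struct clk_sketch *clk)
{
    if (!clk)
        return 0;
    clk_enable_unlocked_sketch(clk);
    return 0;
}
```

Placing the check in the outer function keeps the unlocked helper simple while protecting every caller that may pass an optional clock.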
2022-12-09 | dt-bindings: lcdif: Fix constraints for imx8mp | Alexander Stein
i.MX8MP uses 3 clocks, so soften the restrictions for clocks & clock-names. This SoC requires a power-domain for this peripheral to use. Add it as a required property. Fixes: f5419cb0743f ("dt-bindings: lcdif: Add compatible for i.MX8MP") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://lore.kernel.org/r/20221208140840.3227035-1-alexander.stein@ew.tq-group.com Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-09 | media: dt-bindings: atmel,isc: Drop unneeded unevaluatedProperties | Rob Herring
The 'port' node schema has both 'additionalProperties' and 'unevaluatedProperties', but only one is necessary. 'additionalProperties' works here, so drop 'unevaluatedProperties' and move 'additionalProperties' next to the $ref. Link: https://lore.kernel.org/r/20221207204406.2810864-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-09 | riscv: mm: call best_map_size many times during linear-mapping | Qinglin Pan
Modify the best_map_size function to give map_size many times instead of only once, so a memory region can be mapped by both PMD_SIZE and PAGE_SIZE. Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20221128023643.329091-1-panqinglin2020@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
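Choosing the map size per step rather than once per region can be sketched as a loop. The code below is a user-space model under assumptions: 4 KiB pages, a 2 MiB PMD size, and hypothetical names with an _sk or _sketch suffix; the selection rule (use the large size only when both addresses are aligned and enough bytes remain) is a plausible reading of the description, not a copy of the kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SK UINT64_C(0x1000)   /* assumed 4 KiB page */
#define PMD_SIZE_SK  UINT64_C(0x200000) /* assumed 2 MiB PMD mapping */

struct map_counts {
    unsigned pmd;  /* number of PMD-sized mappings */
    unsigned page; /* number of page-sized mappings */
};

/* Pick the largest size usable at this position in the region. */
static uint64_t best_map_size_sketch(uint64_t pa, uint64_t va, uint64_t size)
{
    if (((pa | va) & (PMD_SIZE_SK - 1)) == 0 && size >= PMD_SIZE_SK)
        return PMD_SIZE_SK;
    return PAGE_SIZE_SK;
}

/* Walk the region, asking for the best size again at every step. */
static struct map_counts map_region_sketch(uint64_t pa, uint64_t va, uint64_t size)
{
    struct map_counts c = { 0, 0 };

    assert(size % PAGE_SIZE_SK == 0); /* the sketch assumes whole pages */
    while (size > 0) {
        uint64_t step = best_map_size_sketch(pa, va, size);

        if (step == PMD_SIZE_SK)
            c.pmd++;
        else
            c.page++;
        pa += step;
        va += step;
        size -= step;
    }
    return c;
}
```

For a region that starts one page below a 2 MiB boundary and spans 4 KiB + 2 MiB + 8 KiB, the walk uses one small page to reach the boundary, one PMD-sized mapping, and two small pages for the tail.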
2022-12-09 | RDMA/rxe: Fix oops with zero length reads | Daisuke Matsuda
The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes a kernel crash. If the responder gets a zero-byte RDMA Read request, qp->resp.mr is not set in check_rkey() (see IBA C9-88). The mr is NULL in this case, and a NULL pointer dereference occurs as shown below.
  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP PTI
  CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
  Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
  RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
  RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
  RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
  RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
  R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
  FS: 00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
  Call Trace:
  <IRQ>
  read_reply+0xda/0x310 [rdma_rxe]
  rxe_responder+0x82d/0xe50 [rdma_rxe]
  do_task+0x84/0x170 [rdma_rxe]
  tasklet_action_common.constprop.0+0xa7/0x120
  __do_softirq+0xcb/0x2ac
  do_softirq+0x63/0x90
  </IRQ>
Support a NULL mr during read_reply().
Fixes: 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing")
Fixes: b5f9a01fae42 ("RDMA/rxe: Fix mr leak in RESPST_ERR_RNR")
Link: https://lore.kernel.org/r/20221209045926.531689-1-matsuda-daisuke@fujitsu.com
Link: https://lore.kernel.org/r/20221202145713.13152-1-lizhijian@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | Merge tag 'v6.1-rc8' into rdma.git for-next | Jason Gunthorpe
For dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | iommufd: Change the order of MSI setup | Jason Gunthorpe
Eric points out this is wrong for the rare case of someone using allow_unsafe_interrupts on ARM. We always have to set up the MSI window in the domain if the iommu driver asks for it. Move the iommu_get_msi_cookie() setup to the top of the function and always do it, regardless of the security mode. Add checks to iommufd_device_setup_msi() to ensure the driver is not doing something incomprehensible. No current driver will set both a HW and SW MSI window, or have more than one SW MSI window. Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices") Link: https://lore.kernel.org/r/3-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 | iommufd: Improve a few unclear bits of code | Jason Gunthorpe
Correct a few items noticed late in review:
 - We should assert that the math in batch_clear_carry() doesn't underflow
 - user->locked should be -1 not 0 since we just did mmput
 - npages should not have been recalculated, it already has that value
No functional change. Fixes: 8d160cd4d506 ("iommufd: Algorithms for PFN storage") Link: https://lore.kernel.org/r/2-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09iommufd: Fix comment typosJason Gunthorpe
Repair some typos in comments that were noticed late in the review cycle. Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages") Link: https://lore.kernel.org/r/1-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09Merge tag 'media/v6.1-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "A v4l-core fix related to validating DV timings related to video blanking values" * tag 'media/v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: v4l2-dv-timings.c: fix too strict blanking sanity checks
2022-12-09Merge tag 'soc-fixes-6.1-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fix from Arnd Bergmann: "One more last minute revert for a boot regression that was found on the popular colibri-imx7" * tag 'soc-fixes-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: Revert "ARM: dts: imx7: Fix NAND controller size-cells"
2022-12-09clk: nomadik: correct struct name kernel-doc warningRandy Dunlap
Use the correct struct name for the kernel-doc notation to prevent a kernel-doc warning: clk-nomadik.c:148: warning: expecting prototype for struct clk_pll1. Prototype was for struct clk_pll instead Fixes: ef6eb322ce57 ("clk: nomadik: implement the Nomadik clocks properly") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Michael Turquette <mturquette@baylibre.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: linux-clk@vger.kernel.org Link: https://lore.kernel.org/r/20221209002016.14776-1-rdunlap@infradead.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-12-09docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGEDonald Hunter
Add documentation for the BPF_MAP_TYPE_SK_STORAGE including kernel version introduced, usage and examples. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221209112401.69319-1-donald.hunter@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-09regmap-irq: Add handle_mask_sync() callbackWilliam Breathitt Gray
Provide a public callback handle_mask_sync() that drivers can use when they have more complex IRQ masking logic. The default implementation is regmap_irq_handle_mask_sync(), used if the chip doesn't provide its own callback. Cc: Mark Brown <broonie@kernel.org> Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/r/e083474b3d467a86e6cb53da8072de4515bd6276.1669100542.git.william.gray@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
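The "use the chip's callback if it has one, otherwise fall back to the default" rule can be sketched like this. It is a Python illustration under stated assumptions: the two-argument callback shape and the names `default_mask_sync()` and `resolve_mask_sync()` are hypothetical and are not the regmap-irq API.

```python
from typing import Callable, Optional

# Hypothetical callback shape: (register index, mask bits) -> description.
MaskSync = Callable[[int, int], str]


def default_mask_sync(index: int, mask_bits: int) -> str:
    """Stand-in for the default behaviour: write the mask bits as-is."""
    return f"default: reg {index} <- {mask_bits:#04x}"


def resolve_mask_sync(chip_callback: Optional[MaskSync]) -> MaskSync:
    """Pick the chip-specific callback when provided, else the default."""
    return chip_callback if chip_callback is not None else default_mask_sync
```

A driver with unusual masking logic would pass its own function, and every other driver would get the default without any extra code.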
2022-12-09spi: dt-bindings: Convert Synquacer SPI to DT schemaRob Herring
Convert the Socionext Synquacer SPI binding to DT format. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20221209171644.3351787-1-robh@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2022-12-09lsm: Fix description of fs_context_parse_paramRoberto Sassu
The fs_context_parse_param hook already has a description, which seems the right one according to the code. Fixes: 8eb687bc8069 ("lsm: Add/fix return values in lsm_hooks.h and fix formatting") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-12-09Merge tag 'timers-v6.2-rc1' of ↵Thomas Gleixner
https://git.linaro.org/people/daniel.lezcano/linux into timers/core

Pull clockevent/source driver updates from Daniel Lezcano:

 - Add DT bindings for the Rockchip rk3128 timer (Johan Jonker)
 - Change the DT bindings for the npcm7xx timer in order to specify multiple clocks and enable the clock for the timer1 on WPCM450 (Jonathan Neuschäfer)
 - Fix the timer duration being too long in the ARM architected timer in order to prevent an integer overflow leading to a negative value and an immediate interruption (Joe Korty)
 - Fix an unused pointer warning reported by lkp and some cleanups in the timer TI dm (Tony Lindgren)
 - Fix a missing call to clk_disable_unprepare() in the error path at init time on the timer TI dm (Yang Yingliang)
 - Use kstrtobool() instead of strtobool() in the ARM architected timer (Christophe JAILLET)
 - Add DT bindings for r8a779g0 on Renesas platform (Wolfram Sang)

Link: https://lore.kernel.org/all/3c4c3bb2-b849-0c87-0948-8a36984bdde4@linaro.org
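The overflow mentioned for the ARM architected timer can be illustrated with plain arithmetic. The sketch below assumes, purely for illustration, a countdown value that is interpreted as a 32-bit signed integer; the width, `MAX_SAFE_DELTA` and both function names are assumptions, not taken from the driver.

```python
MAX_SAFE_DELTA = 0x7FFFFFFF  # largest count that stays positive in 32 bits


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def clamp_delta(ticks: int) -> int:
    """Limit a requested duration so it cannot wrap to a negative count."""
    return min(ticks, MAX_SAFE_DELTA)
```

A request of `0x80000000` ticks reads back as a negative count, which a countdown timer treats as already expired; clamping first keeps the programmed value positive.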
2022-12-09x86/vdso: Conditionally export __vdso_sgx_enter_enclave()Nathan Chancellor
Recently, ld.lld moved from '--undefined-version' to '--no-undefined-version' as the default, which breaks building the vDSO when CONFIG_X86_SGX is not set: ld.lld: error: version script assignment of 'LINUX_2.6' to symbol '__vdso_sgx_enter_enclave' failed: symbol not defined __vdso_sgx_enter_enclave is only included in the vDSO when CONFIG_X86_SGX is set. Only export it if it will be present in the final object, which clears up the error. Fixes: 8466436952017 ("x86/vdso: Implement a vDSO for Intel SGX enclave call") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1756 Link: https://lore.kernel.org/r/20221109000306.1407357-1-nathan@kernel.org
2022-12-09udf: Fix extending file within last blockJan Kara
When extending file within last block it can happen that the extent is already rounded to the blocksize and thus contains the offset we want to grow up to. In such case we would mistakenly expand the last extent and make it one block longer than it should be, exposing unallocated block in a file and causing data corruption. Fix the problem by properly detecting this case and bailing out. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09udf: Discard preallocation before extending file with a holeJan Kara
When extending file with a hole, we tried to preserve existing preallocation for the file. However that is not very useful and complicates code because the previous extent may need to be rounded to block boundary as well (which we forgot to do thus causing data corruption for sequence like:

xfs_io -f -c "pwrite 0x75e63 11008" -c "truncate 0x7b24b" \
  -c "truncate 0xabaa3" -c "pwrite 0xac70b 22954" \
  -c "pwrite 0x93a43 11358" -c "pwrite 0xb8e65 52211" file

with 512-byte block size). Just discard preallocation before extending file to simplify things and also fix this data corruption.

CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09udf: Do not bother looking for prealloc extents if i_lenExtents matches i_sizeJan Kara
If block-rounded i_lenExtents matches block-rounded i_size, there are no preallocation extents. Do not bother walking the extent linked list. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
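The comparison described here is simple block rounding. A minimal Python sketch, assuming a power-of-two block size; `round_up()` and `may_have_prealloc()` are hypothetical helper names, and the sample values reuse the 512-byte block size and one offset from the xfs_io reproducer quoted earlier in this log.

```python
def round_up(value: int, blocksize: int) -> int:
    """Round value up to the next multiple of blocksize (a power of two)."""
    return (value + blocksize - 1) & ~(blocksize - 1)


def may_have_prealloc(len_extents: int, size: int, blocksize: int) -> bool:
    """True only when the extents cover more blocks than the file size.

    When both values round to the same block boundary there is nothing
    beyond the end of the file, so walking the extent list can be skipped.
    """
    return round_up(len_extents, blocksize) != round_up(size, blocksize)
```

For example, `round_up(0x7b24b, 512)` is `0x7b400`, so any extent length that also rounds to `0x7b400` rules out preallocation for a file of size `0x7b24b`.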
2022-12-09udf: Fix preallocation discarding at indirect extent boundaryJan Kara
When preallocation extent is the first one in the extent block, the code would corrupt extent tree header instead. Fix the problem and use udf_delete_aext() for deleting extent to avoid some code duplication. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09drivers: net: qlcnic: Fix potential memory leak in qlcnic_sriov_init()Yuan Can
If a vp allocation fails in qlcnic_sriov_init(), all previously allocated vps need to be freed. Fixes: f197a7aa6288 ("qlcnic: VF-PF communication channel implementation") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
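The unwinding rule (free everything already allocated when a later allocation fails) can be sketched as follows. This is a generic Python model, not the driver code: `alloc_all()` and its `alloc`/`free` parameters are hypothetical, and `MemoryError` stands in for a failed kernel allocation.

```python
from typing import Callable, List


def alloc_all(
    count: int,
    alloc: Callable[[int], object],
    free: Callable[[object], None],
) -> List[object]:
    """Allocate count items, releasing the earlier ones if any step fails."""
    allocated: List[object] = []
    try:
        for i in range(count):
            allocated.append(alloc(i))
    except MemoryError:
        # Unwind in reverse order so nothing allocated so far is leaked.
        for item in reversed(allocated):
            free(item)
        raise
    return allocated
```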
2022-12-09net: stmmac: fix possible memory leak in stmmac_dvr_probe()Gaosheng Cui
The bitmap_free() should be called to free priv->af_xdp_zc_qps when create_singlethread_workqueue() fails, otherwise there will be a memory leak, so we add the err path error_wq_init to fix it. Fixes: bba2556efad6 ("net: stmmac: Enable RX via AF_XDP zero-copy") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>