summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-09-01mm/hugetlb: enforce that PMD PT sharing has split PMD PT locksDavid Hildenbrand
Sharing page tables between processes but falling back to per-MM page table locks cannot possibly work. So, let's make sure that we do have split PMD locks by adding a new Kconfig option and letting that depend on CONFIG_SPLIT_PMD_PTLOCKS. Link: https://lkml.kernel.org/r/20240726150728.3159964-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: turn USE_SPLIT_PTE_PTLOCKS / USE_SPLIT_PTE_PTLOCKS into Kconfig optionsDavid Hildenbrand
Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications". This series is a follow up to the fixes: "[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking" When working on the fixes, I wondered why 8xx is fine (-> never uses split PT locks) and how PT locking even works properly with PMD page table sharing (-> always requires split PMD PT locks). Let's improve the split PT lock detection, make hugetlb properly depend on it and make 8xx bail out if it would ever get enabled by accident. As an alternative to patch #3 we could extend the Kconfig SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the code that actually implements it feels a bit nicer for documentation purposes, and there is no need to actually disable it because it should always be disabled (!SMP). Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks are still getting used where we would expect them. [1] https://lkml.kernel.org/r/20240725183955.2268884-1-david@redhat.com This patch (of 3): Let's clean that up a bit and prepare for depending on CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options. More cleanups would be reasonable (like the arch-specific "depends on" for CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day. Link: https://lkml.kernel.org/r/20240726150728.3159964-1-david@redhat.com Link: https://lkml.kernel.org/r/20240726150728.3159964-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: page_counters: initialize usage using ATOMIC_LONG_INIT() macroRoman Gushchin
When a page_counter structure is initialized, there is no need to use an atomic set operation to initialize the usage counter because at this point the structure is not visible to anybody else. ATOMIC_LONG_INIT() is what should be used in such cases. Link: https://lkml.kernel.org/r/20240726203110.1577216-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: page_counters: put page_counter_calculate_protection() under CONFIG_MEMCGRoman Gushchin
Put page_counter_calculate_protection() under CONFIG_MEMCG. The protection functionality (min/low limits) is not supported by any other cgroup subsystem, so page_counter_calculate_protection() and related static effective_protection() can be compiled out if CONFIG_MEMCG is not enabled. Link: https://lkml.kernel.org/r/20240726203110.1577216-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: memcg: don't call propagate_protected_usage() needlesslyRoman Gushchin
Patch series "mm: memcg: page counters optimizations", v3. This patchset contains 3 independent small optimizations of page counters. This patch (of 3): Memory protection (min/low) requires a constant tracking of protected memory usage. propagate_protected_usage() is called on each page counters update and does a number of operations even in cases when the actual memory protection functionality is not supported (e.g. hugetlb cgroups or memcg swap counters). It's obviously inefficient and leads to a waste of CPU cycles. It can be addressed by calling propagate_protected_usage() only for the counters which do support memory guarantees. As of now it's only memcg->memory - the unified memory memcg counter. Link: https://lkml.kernel.org/r/20240726203110.1577216-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01task_stack: uninline stack_not_usedPasha Tatashin
Given that stack_not_used() is not performance critical function uninline it. Link: https://lkml.kernel.org/r/20240730150158.832783-4-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240724203322.2765486-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01vmstat: kernel stack usage histogramPasha Tatashin
As part of the dynamic kernel stack project, we need to know the amount of data that can be saved by reducing the default kernel stack size [1]. Provide a kernel stack usage histogram to aid in optimizing kernel stack sizes and minimizing memory waste in large-scale environments. The histogram divides stack usage into power-of-two buckets and reports the results in /proc/vmstat. This information is especially valuable in environments with millions of machines, where even small optimizations can have a significant impact. The histogram data is presented in /proc/vmstat with entries like "kstack_1k", "kstack_2k", and so on, indicating the number of threads that exited with stack usage falling within each respective bucket. Example outputs: Intel: $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 ARM with 64K page_size: $ grep kstack /proc/vmstat kstack_1k 1 kstack_2k 340 kstack_4k 25212 kstack_8k 1659 kstack_16k 0 kstack_32k 0 kstack_64k 0 Note: once the dynamic kernel stack is implemented it will depend on the implementation the usability of this feature: On hardware that supports faults on kernel stacks, we will have other metrics that show the total number of pages allocated for stacks. On hardware where faults are not supported, we will most likely have some optimization where only some threads are extended, and for those, these metrics will still be very useful. [1] https://lwn.net/Articles/974367 Link: https://lkml.kernel.org/r/20240730150158.832783-3-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240724203322.2765486-3-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01memory tiering: introduce folio_use_access_time() checkZi Yan
If memory tiering mode is on and a folio is not in the top tier memory, folio's cpupid field is repurposed to store page access time. Instead of an open coded check, use a function to encapsulate the check. Link: https://lkml.kernel.org/r/20240724130115.793641-3-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: kmem: remove mem_cgroup_from_obj()Muchun Song
There is no user of mem_cgroup_from_obj(), remove it. Link: https://lkml.kernel.org/r/20240718091821.44740-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: shmem: move shmem_huge_global_enabled() into shmem_allowable_huge_orders()Baolin Wang
Move shmem_huge_global_enabled() into shmem_allowable_huge_orders(), so that shmem_allowable_huge_orders() can also help to find the allowable huge orders for tmpfs. Moreover the shmem_huge_global_enabled() can become static. While we are at it, passing the vma instead of mm for shmem_huge_global_enabled() makes code cleaner. No functional changes. Link: https://lkml.kernel.org/r/8e825146bb29ee1a1c7bd64d2968ff3e19be7815.1721626645.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: shmem: rename shmem_is_huge() to shmem_huge_global_enabled()Baolin Wang
shmem_is_huge() is now used to check if the top-level huge page is enabled, thus rename it to reflect its usage. Link: https://lkml.kernel.org/r/da53296e0ab6359aa083561d9dc01e4223d60fbe.1721626645.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: vmalloc: implement vrealloc()Danilo Krummrich
Patch series "Align kvrealloc() with krealloc()", v2. Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. In order to be able to get rid of kvrealloc()'s oldsize parameter, introduce vrealloc() and make use of it in kvrealloc(). Making use of vrealloc() in kvrealloc() also provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. Besides the above, those functions are required by Rust's allocator abstractons [1] (rework based on this series in [2]). With `Vec` or `KVec` respectively, potentially growing (and shrinking) data structures are rather common. [1] https://lore.kernel.org/lkml/20240704170738.3621-1-dakr@redhat.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=rust/mm This patch (of 2): Implement vrealloc() analogous to krealloc(). Currently, krealloc() requires the caller to pass the size of the previous memory allocation, which, instead, should be self-contained. We attempt to fix this in a subsequent patch which, in order to do so, requires vrealloc(). Besides that, we need realloc() functions for kernel allocators in Rust too. With `Vec` or `KVec` respectively, potentially growing (and shrinking) data structures are rather common. [dakr@kernel.org: fix missing nommu implementation] Link: https://lkml.kernel.org/r/20240725141227.13954-1-dakr@kernel.org [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: consider spare memory for __GFP_ZERO] Link: https://lkml.kernel.org/r/20240730185049.6244-3-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-4-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-1-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-2-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: add node_reclaim successes to VM event countersMatthew Cassell
/proc/vmstat currently shows the number of node_reclaim() failures when vm.zone_reclaim_mode is set appropriately. It would be convenient to have the number of successes right next to zone_reclaim_failed (similar to compaction and migration). While just a trivially addition to the vmstat file. It was helpful during benchmarking to not have to probe node_reclaim() to observe the success/failure ratio. Link: https://lkml.kernel.org/r/20240722171316.7517-1-mcassell411@gmail.com Signed-off-by: Matthew Cassell <mcassell411@gmail.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01Merge tag 'x86-urgent-2024-09-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even when the state is X2APIC_ON_LOCKED, which prevents the kernel to disable it thereby creating inconsistent state. Reorder the logic so it actually works correctly - The XSTATE logic for handling LBR is incorrect as it assumes that XSAVES supports LBR when the CPU supports LBR. In fact both conditions need to be true. Otherwise the enablement of LBR in the IA32_XSS MSR fails and subsequently the machine crashes on the next XRSTORS operation because IA32_XSS is not initialized. Cache the XSTATE support bit during init and make the related functions use this cached information and the LBR CPU feature bit to cure this. - Cure a long standing bug in KASLR KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. - Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of an initialized variabled on the stack to the VMM. The variable is only required as output value, so it does not have to exposed to the VMM in the first place. - Prevent an array overrun in the resource control code on systems with Sub-NUMA Clustering enabled because the code failed to adjust the index by the number of SNC nodes per L3 cache. * tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix arch_mbm_* array overrun on SNC x86/tdx: Fix data leak in mmio_read() x86/kaslr: Expose and use the end of the physical memory address space x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported x86/apic: Make x2apic_disable() work correctly
2024-09-01SUNRPC: make various functions static, or not exported.NeilBrown
Various functions are only used within the sunrpc module, and several are only use in the one file. So clean up: These are marked static, and any EXPORT is removed. svc_rcpb_setup() svc_rqst_alloc() svc_rqst_free() - also moved before first use svc_rpcbind_set_version() svc_drop() - also moved to svc.c These are now not EXPORTed, but are not static. svc_authenticate() svc_sock_update_bufs() Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01lockd: discard nlmsvc_timeoutNeilBrown
nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use that in the one place that nlmsvc_timeout is used. In truth it *might* not always be the same as nlmsvc_timeout is only set when lockd is started while nlm_timeout can be set at anytime via sysctl. I think this difference it not helpful so removing it is good. Also remove the test for nlm_timout being 0. This is not possible - unless a module parameter is used to set the minimum timeout to 0, and if that happens then it probably should be honoured. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01PCI: endpoint: Assign PCI domain number for endpoint controllersManivannan Sadhasivam
Right now, PCI endpoint subsystem doesn't assign PCI domain number for the PCI endpoint controllers. But this domain number could be useful to the EPC drivers to uniquely identify each controller based on the hardware instance when there are multiple ones present in an SoC (even multiple RC/EP). So let's make use of the existing pci_bus_find_domain_nr() API to allocate domain numbers based on either devicetree (linux,pci-domain) property or dynamic domain number allocation scheme. It should be noted that the domain number allocated by this API will be based on both RC and EP controllers in a SoC. If the 'linux,pci-domain' DT property is present, then the domain number represents the actual hardware instance of the PCI endpoint controller. If not, then the domain number will be allocated based on the PCI EP/RC controller probe order. If the architecture doesn't support CONFIG_PCI_DOMAINS_GENERIC (rare), then currently a warning is thrown to indicate that the architecture specific implementation is needed. Link: https://lore.kernel.org/linux-pci/20240828-pci-qcom-hotplug-v4-5-263a385fbbcb@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2024-09-01Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - copy_file_range fix - two read fixes including read past end of file rc fix and read retry crediting fix - falloc zero range fix * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region cifs: Fix copy offload to flush destination region netfs, cifs: Fix handling of short DIO read cifs: Fix lack of credit renegotiation on read retry
2024-09-01Merge tag 'arm-fixes-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There is a fairly large number of bug fixes for Qualcomm platforms, most of them addressing issues with the devicetree files for the newly added Snapdragon X1 based laptops to make them more reliable. The Qualcomm driver changes address a few build-time issues as well as runtime problems in the tzmem and scm firmware, the USB Type-C driver, and the cmd-db and pmic_glink soc drivers. The NXP i.MX usually gets a bunch of devicetree fixes that is proportional to the number of supported machines. This includes both warning fixes and correctness for the 64-bit i.MX9, i.MX8 and layerscape platforms, as well as a single fix for a 32-bit i.MX6 based board. The other changes are the usual minor changes, including an update to the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs (risc-v)" * tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits) firmware: microchip: fix incorrect error report of programming:timeout on success soc: qcom: pd-mapper: Fix singleton refcount firmware: qcom: tzmem: disable sdm670 platform soc: qcom: pmic_glink: Actually communicate when remote goes down usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Fix race during initialization firmware: qcom: qseecom: remove unused functions firmware: qcom: tzmem: fix virtual-to-physical address conversion firmware: qcom: scm: Mark get_wq_ctx() as atomic call arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 ...
2024-08-30cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1Chen Ridong
This patch introduces CONFIG_CPUSETS_V1 and guard cpuset-v1 code under CONFIG_CPUSETS_V1. The default value of CONFIG_CPUSETS_V1 is N, so that user who adopted v2 don't have 'pay' for cpuset v1. Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30dma-buf: Split out dma fence array create into alloc and arm functionsMatthew Brost
Useful to preallocate dma fence array and then arm in path of reclaim or a dma fence. v2: - s/arm/init (Christian) - Drop !array warn (Christian) v3: - Fix kernel doc typos (dim) Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826170144.2492062-2-matthew.brost@intel.com
2024-08-31Merge tag 'iommu-fixes-v6.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Fix a device-stall problem in bad io-page-fault setups (faults received from devices with no supporting domain attached). - Context flush fix for Intel VT-d. - Do not allow non-read+non-write mapping through iommufd as most implementations can not handle that. - Fix a possible infinite-loop issue in map_pages() path. - Add Jean-Philippe as reviewer for SMMUv3 SVA support * tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer iommu: Do not return 0 from map_pages if it doesn't do anything iommufd: Do not allow creating areas without READ or WRITE iommu/vt-d: Fix incorrect domain ID in context flush helper iommu: Handle iommu faults for a bad iopf setup
2024-08-30ARM: OMAP2+: Remove obsoleted declaration for gpmc_onenand_initGaosheng Cui
The gpmc_onenand_init() have been removed since commit 2514830b8b8c ("ARM: OMAP2+: Remove gpmc-onenand"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20240826035823.4043171-1-cuigaosheng1@huawei.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2024-08-30fbdev: Introduce devm_register_framebuffer()Thomas Weißschuh
Introduce a device-managed variant of register_framebuffer() which automatically unregisters the framebuffer on device destruction. This can simplify the error handling and resource management in drivers. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Helge Deller <deller@gmx.de>
2024-08-30arm64: smccc: Reserve block of KVM "vendor" services for pKVM hypercallsWill Deacon
pKVM relies on hypercalls to expose services such as memory sharing to protected guests. Tentatively allocate a block of 58 hypercalls (i.e. fill the remaining space in the first 64 function IDs) for pKVM usage, as future extensions such as pvIOMMU support, range-based memory sharing and validation of assigned devices will require additional services. Suggested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/86a5h5yg5y.wl-maz@kernel.org Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-8-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercallWill Deacon
Hook up pKVM's MMIO_GUARD hypercall so that ioremap() and friends will register the target physical address as MMIO with the hypervisor, allowing guest exits to that page to be emulated by the host with full syndrome information. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-7-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercallsWill Deacon
If we detect the presence of pKVM's SHARE and UNSHARE hypercalls, then register a backend implementation of the mem_encrypt API so that things like DMA buffers can be shared appropriately with the host. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-5-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Add initial support for running as a protected guestWill Deacon
Implement a pKVM protected guest driver to probe the presence of pKVM and determine the memory protection granule using the HYP_MEMINFO hypercall. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATORDouglas Anderson
When adding devm_regulator_bulk_get_const() I missed adding a stub for when CONFIG_REGULATOR is not enabled. Under certain conditions (like randconfig testing) this can cause the compiler to reports errors like: error: implicit declaration of function 'devm_regulator_bulk_get_const'; did you mean 'devm_regulator_bulk_get_enable'? Add the stub. Fixes: 1de452a0edda ("regulator: core: Allow drivers to define their init data as const") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408301813.TesFuSbh-lkp@intel.com/ Cc: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20240830073511.1.Ib733229a8a19fad8179213c05e1af01b51e42328@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2024-08-30fs: drop GFP_NOFAIL mode from alloc_page_buffersMichal Hocko
There is only one called of alloc_page_buffers and it doesn't require __GFP_NOFAIL so drop this allocation mode. Signed-off-by: Michal Hocko <mhocko@suse.com> Link: https://lore.kernel.org/r/20240829130640.1397970-1-mhocko@kernel.org Acked-by: Song Liu <song@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30iommu: Allow ATS to work on VFs when the PF uses IDENTITYJason Gunthorpe
PCI ATS has a global Smallest Translation Unit field that is located in the PF but shared by all of the VFs. The expectation is that the STU will be set to the root port's global STU capability which is driven by the IO page table configuration of the iommu HW. Today it becomes set when the iommu driver first enables ATS. Thus, to enable ATS on the VF, the PF must have already had the correct STU programmed, even if ATS is off on the PF. Unfortunately the PF only programs the STU when the PF enables ATS. The iommu drivers tend to leave ATS disabled when IDENTITY translation is being used. Thus we can get into a state where the PF is setup to use IDENTITY with the DMA API while the VF would like to use VFIO with a PAGING domain and have ATS turned on. This fails because the PF never loaded a PAGING domain and so it never setup the STU, and the VF can't do it. The simplest solution is to have the iommu driver set the ATS STU when it probes the device. This way the ATS STU is loaded immediately at boot time to all PFs and there is no issue when a VF comes to use it. Add a new call pci_prepare_ats() which should be called by iommu drivers in their probe_device() op for every PCI device if the iommu driver supports ATS. This will setup the STU based on whatever page size capability the iommu HW has. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-30drivers/perf: arm_spe: Use perf_allow_kernel() for permissionsJames Clark
Use perf_allow_kernel() for 'pa_enable' (physical addresses), 'pct_enable' (physical timestamps) and context IDs. This means that perf_event_paranoid is now taken into account and LSM hooks can be used, which is more consistent with other perf_event_open calls. For example PERF_SAMPLE_PHYS_ADDR uses perf_allow_kernel() rather than just perfmon_capable(). This also indirectly fixes the following error message which is misleading because perf_event_paranoid is not taken into account by perfmon_capable(): $ perf record -e arm_spe/pa_enable/ Error: Access to performance monitoring and observability operations is limited. Consider adjusting /proc/sys/kernel/perf_event_paranoid setting ... Suggested-by: Al Grant <al.grant@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240827145113.1224604-1-james.clark@linaro.org Link: https://lore.kernel.org/all/20240807120039.GD37996@noisy.programming.kicks-ass.net/ Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30mfd: ds1wm: Remove remaining header fileWilken Gottwalt
The driver was removed after kernel 6.2 rendering the header file unused. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net> Link: https://lore.kernel.org/r/ZpTbGHb6EX2Oe7ok@monster.localdomain Signed-off-by: Lee Jones <lee@kernel.org>
2024-08-30mfd: 88pm80x: Constify read-only regmap structsJavier Carrasco
`pm800_irq`, `pm805_irq` and `pm805_irq_chip` are not modified and can be declared as const to move their data to a read-only section. In order to keep the const modifier for the regmap_irq_chip structures, the pointer used to reference them must be converted to const as well. Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20240704-mfd-const-regmap_config-v2-8-0c8785b1331d@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-08-30Merge branches 'ib-mfd-for-iio-power-6.12' and 'ib-mfd-gpio-pwm-6.12' into ↵Lee Jones
ibs-for-mfd-merged
2024-08-30inode: make i_state a u32Christian Brauner
Now that we use the wait var event mechanism make i_state a u32 and free up 4 bytes. This means we currently have two 4 byte holes in struct inode which we can pack. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-6-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30inode: port __I_NEW to var eventChristian Brauner
Port the __I_NEW mechanism to use the new var event mechanism. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-4-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: reorder i_state bitsChristian Brauner
so that we can use the first bits to derive unique addresses from i_state. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-2-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: add i_state helpersChristian Brauner
The i_state member is an unsigned long so that it can be used with the wait bit infrastructure which expects unsigned long. This wastes 4 bytes which we're unlikely to ever use. Switch to using the var event wait mechanism using the address of the bit. Thanks to Linus for the address idea. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-1-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: s/__u32/u32/ for s_fsnotify_maskChristian Brauner
The underscore variants are for uapi whereas the non-underscore variants are for in-kernel consumers. Link: https://lore.kernel.org/r/20240822-anwerben-nutzung-1cd6c82a565f@brauner Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: remove unused path_put_init()Christian Brauner
This helper has been unused for a while now. Link: https://lore.kernel.org/r/20240822-bewuchs-werktag-46672b3c0606@brauner Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30inode: remove __I_DIO_WAKEUPChristian Brauner
Afaict, we can just rely on inode->i_dio_count for waiting instead of this awkward indirection through __I_DIO_WAKEUP. This survives LTP dio and xfstests dio tests. Link: https://lore.kernel.org/r/20240816-vfs-misc-dio-v1-1-80fe21a2c710@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: move FMODE_UNSIGNED_OFFSET to fop_flagsChristian Brauner
This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: only read fops once in fops_get/putMateusz Guzik
In do_dentry_open() the usage is: f->f_op = fops_get(inode->i_fop); In generated asm the compiler emits 2 reads from inode->i_fop instead of just one. This popped up due to false-sharing where loads from that offset end up bouncing a cacheline during parallel open. While this is going to be fixed, the spurious load does not need to be there. This makes do_dentry_open() go down from 1177 to 1154 bytes. fops_put() is patched to maintain some consistency. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240810064753.1211441-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: dodge smp_mb in break_lease and break_deleg in the common caseMateusz Guzik
These inlines show up in the fast path (e.g., in do_dentry_open()) and induce said full barrier regarding i_flctx access when in most cases the pointer is NULL. The pointer can be safely checked before issuing the barrier, dodging it in most cases as a result. It is plausible the consume fence would be sufficient, but I don't want to go audit all callers regarding what they before calling here. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240806172846.886570-1-mjguzik@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30Merge tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Another week, another set of GPU fixes. amdgpu and vmwgfx leading the charge, then i915 and xe changes along with v3d and some other bits. The TTM revert is due to some stuttering graphical apps probably due to longer stalls while prefaulting. Seems pretty much where I'd expect things, ttm: - revert prefault change, caused stutters aperture: - handle non-VGA devices bettter amdgpu: - SWSMU gaming stability fix - SMU 13.0.7 fix - SWSMU documentation alignment fix - SMU 14.0.x fixes - GC 12.x fix - Display fix - IP discovery fix - SMU 13.0.6 fix i915: - Fix #11195: The external display connect via USB type-C dock stays blank after re-connect the dock - Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F - Move ARL GuC firmware to correct version xe: - Invalidate media_gt TLBs - Fix HWMON i1 power setup write command vmwgfx: - prevent unmapping active read buffers - fix prime with external buffers - disable coherent dumb buffers without 3d v3d: - disable preemption while updating GPU stats" * tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel: drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16 drm/v3d: Disable preemption while updating GPU stats drm/amd/pm: Drop unsupported features on smu v14_0_2 drm/amd/pm: Add support for new P2S table revision drm/amdgpu: support for gc_info table v1.3 drm/amd/display: avoid using null object of framebuffer drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs drm/amd/pm: update message interface for smu v14.0.2/3 drm/amdgpu/swsmu: always force a state reprogram on init drm/amdgpu/smu13.0.7: print index for profiles drm/amdgpu: align pp_power_profile_mode with kernel docs drm/i915/dp_mst: Fix MST state after a sink reset drm/xe: Invalidate media_gt TLBs drm/i915: ARL requires a newer GSC firmware drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict video/aperture: optionally match the device in sysfb_disable() drm/vmwgfx: Disable coherent dumb buffers without 3d drm/vmwgfx: Fix prime with external buffers drm/vmwgfx: Prevent unmapping active read buffers Revert "drm/ttm: increase ttm pre-fault value to PMD size"
2024-08-30Merge tag 'drm-misc-fixes-2024-08-29' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A revert for a previous TTM commit causing stuttering, 3 fixes for vmwgfx related to buffer operations, a fix for video/aperture with non-VGA primary devices, and a preemption status fix for v3d Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240829-efficient-swift-from-lemuria-f60c05@houat
2024-08-29bpf: Add gen_epilogue to bpf_verifier_opsMartin KaFai Lau
This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar to the existing .gen_prologue. Instead of allowing a subsystem to run code at the beginning of a bpf prog, it allows the subsystem to run code just before the bpf prog exit. One of the use case is to allow the upcoming bpf qdisc to ensure that the skb->dev is the same as the qdisc->dev_queue->dev. The bpf qdisc struct_ops implementation could either fix it up or drop the skb. Another use case could be in bpf_tcp_ca.c to enforce snd_cwnd has sane value (e.g. non zero). The epilogue can do the useful thing (like checking skb->dev) if it can access the bpf prog's ctx. Unlike prologue, r1 may not hold the ctx pointer. This patch saves the r1 in the stack if the .gen_epilogue has returned some instructions in the "epilogue_buf". The existing .gen_prologue is done in convert_ctx_accesses(). The new .gen_epilogue is done in the convert_ctx_accesses() also. When it sees the (BPF_JMP | BPF_EXIT) instruction, it will be patched with the earlier generated "epilogue_buf". The epilogue patching is only done for the main prog. Only one epilogue will be patched to the main program. When the bpf prog has multiple BPF_EXIT instructions, a BPF_JA is used to goto the earlier patched epilogue. Majority of the archs support (BPF_JMP32 | BPF_JA): x86, arm, s390, risv64, loongarch, powerpc and arc. This patch keeps it simple and always use (BPF_JMP32 | BPF_JA). A new macro BPF_JMP32_A is added to generate the (BPF_JMP32 | BPF_JA) insn. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240829210833.388152-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29bpf: Move insn_buf[16] to bpf_verifier_envMartin KaFai Lau
This patch moves the 'struct bpf_insn insn_buf[16]' stack usage to the bpf_verifier_env. A '#define INSN_BUF_SIZE 16' is also added to replace the ARRAY_SIZE(insn_buf) usages. Both convert_ctx_accesses() and do_misc_fixup() are changed to use the env->insn_buf. It is a refactoring work for adding the epilogue_buf[16] in a later patch. With this patch, the stack size usage decreased. Before: ./kernel/bpf/verifier.c:22133:5: warning: stack frame size (2584) After: ./kernel/bpf/verifier.c:22184:5: warning: stack frame size (2264) Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240829210833.388152-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>