summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-02-24mm/hugetlb: fix some comment typosMiaohe Lin
Fix typos sasitfy to satisfy, reservtion to reservation, hugegpage to hugepage and uniprocesor to uniprocessor in comments. Link: https://lkml.kernel.org/r/20210128112028.64831-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/hugetlb: grab head page refcount once for group of subpagesJoao Martins
Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2. While looking at ZONE_DEVICE struct page reuse particularly the last patch[0], I found two possible improvements for follow_hugetlb_page() which is solely used for get_user_pages()/pin_user_pages(). The first patch batches page refcount updates while the second tidies up storing the subpages/vmas. Both together bring the cost of slow variant of gup() cost from ~87.6k usecs to ~5.8k usecs. libhugetlbfs tests seem to pass as well gup_test benchmarks with hugetlbfs vmas. This patch (of 2): follow_hugetlb_page() once it locks the pmd/pud, checks all its N subpages in a huge page and grabs a reference for each one. Similar to gup-fast, have follow_hugetlb_page() grab the head page refcount only after counting all its subpages that are part of the just faulted huge page. Consequently we reduce the number of atomics necessary to pin said huge page, which improves non-fast gup() considerably: - 16G with 1G huge page size gup_test -f /mnt/huge/file -m 16384 -r 10 -L -S -n 512 -w PIN_LONGTERM_BENCHMARK: ~87.6k us -> ~12.8k us Link: https://lkml.kernel.org/r/20210128182632.24562-1-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/20210128182632.24562-2-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/gfp: add kernel-doc for gfp_tMatthew Wilcox (Oracle)
The generated html will link to the definition of the gfp_t automatically once we define it. Move the one-paragraph overview of GFP flags from the documentation directory into gfp.h and pull gfp.h into the documentation. This generates warnings with clang (https://lkml.kernel.org/r/20210219195509.GA59987@24bbad8f3778), so use a #if 0 to hide it from the compiler for now. Link: https://lkml.kernel.org/r/20210215204909.3824509-1-willy@infradead.org Link: https://lkml.kernel.org/r/20210220003049.GZ2858050@casper.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: simplify free_highmem_page() and free_reserved_page()David Hildenbrand
adjust_managed_page_count() as called by free_reserved_page() properly handles pages in a highmem zone, so we can reuse it for free_highmem_page(). We can now get rid of totalhigh_pages_inc() and simplify free_reserved_page(). Link: https://lkml.kernel.org/r/20210126182113.19892-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: simplify parater of function memmap_init_zone()Baoquan He
As David suggested, simply passing 'struct zone *zone' is enough. We can get all needed information from 'struct zone*' easily. Link: https://lkml.kernel.org/r/20210122135956.5946-4-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: rename memmap_init() and memmap_init_zone()Baoquan He
The current memmap_init_zone() only handles memory region inside one zone, actually memmap_init() does the memmap init of one zone. So rename both of them accordingly. Link: https://lkml.kernel.org/r/20210122135956.5946-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: fix prototype warning from kernel test robotBaoquan He
Patch series "mm: clean up names and parameters of memmap_init_xxxx functions", v5. This patchset corrects inappropriate function names of memmap_init_xxx, and simplify parameters of functions in the code flow. And also fix a prototype warning reported by lkp. This patch (of 5); Kernel test robot calling make with 'W=1' is triggering warning like below for memmap_init_zone() function. mm/page_alloc.c:6259:23: warning: no previous prototype for 'memmap_init_zone' [-Wmissing-prototypes] 6259 | void __meminit __weak memmap_init_zone(unsigned long size, int nid, | ^~~~~~~~~~~~~~~~ Fix it by adding the function declaration in include/linux/mm.h. Since memmap_init_zone() has a generic version with '__weak', the declaratoin in ia64 header file can be simply removed. Link: https://lkml.kernel.org/r/20210122135956.5946-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20210122135956.5946-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24kasan: fix bug detection via ksize for HW_TAGS modeAndrey Konovalov
The currently existing kasan_check_read/write() annotations are intended to be used for kernel modules that have KASAN compiler instrumentation disabled. Thus, they are only relevant for the software KASAN modes that rely on compiler instrumentation. However there's another use case for these annotations: ksize() checks that the object passed to it is indeed accessible before unpoisoning the whole object. This is currently done via __kasan_check_read(), which is compiled away for the hardware tag-based mode that doesn't rely on compiler instrumentation. This leads to KASAN missing detecting some memory corruptions. Provide another annotation called kasan_check_byte() that is available for all KASAN modes. As the implementation rename and reuse kasan_check_invalid_free(). Use this new annotation in ksize(). To avoid having ksize() as the top frame in the reported stack trace pass _RET_IP_ to __kasan_check_byte(). Also add a new ksize_uaf() test that checks that a use-after-free is detected via ksize() itself, and via plain accesses that happen later. Link: https://linux-review.googlesource.com/id/Iaabf771881d0f9ce1b969f2a62938e99d3308ec5 Link: https://lkml.kernel.org/r/f32ad74a60b28d8402482a38476f02bb7600f620.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24kasan: move _RET_IP_ to inline wrappersAndrey Konovalov
Generic mm functions that call KASAN annotations that might report a bug pass _RET_IP_ to them as an argument. This allows KASAN to include the name of the function that called the mm function in its report's header. Now that KASAN has inline wrappers for all of its annotations, move _RET_IP_ to those wrappers to simplify annotation call sites. Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67 Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24fs: buffer: use raw page_memcg() on locked pageJohannes Weiner
alloc_page_buffers() currently uses get_mem_cgroup_from_page() for charging the buffers to the page owner, which does an rcu-protected page->memcg lookup and acquires a reference. But buffer allocation has the page lock held throughout, which pins the page to the memcg and thereby the memcg - neither rcu nor holding an extra reference during the allocation are necessary. Use a raw page_memcg() instead. This was the last user of get_mem_cgroup_from_page(), delete it. Link: https://lkml.kernel.org/r/20210209190126.97842-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: page_counter: re-layout structure to reduce false sharingFeng Tang
When checking a memory cgroup related performance regression [1], from the perf c2c profiling data, we found high false sharing for accessing 'usage' and 'parent'. On 64 bit system, the 'usage' and 'parent' are close to each other, and easy to be in one cacheline (for cacheline size == 64+ B). 'usage' is usally written, while 'parent' is usually read as the cgroup's hierarchical counting nature. So move the 'parent' to the end of the structure to make sure they are in different cache lines. Following are some performance data with the patch, against v5.11-rc1. [ In the data, A means a platform with 2 sockets 48C/96T, B is a platform of 4 sockests 72C/144T, and if a %stddev will be shown bigger than 2%, P100/P50 means number of test tasks equals to 100%/50% of nr_cpu] will-it-scale/malloc1 --------------------- v5.11-rc1 v5.11-rc1+patch A-P100 15782 ± 2% -0.1% 15765 ± 3% will-it-scale.per_process_ops A-P50 21511 +8.9% 23432 will-it-scale.per_process_ops B-P100 9155 +2.2% 9357 will-it-scale.per_process_ops B-P50 10967 +7.1% 11751 ± 2% will-it-scale.per_process_ops will-it-scale/pagefault2 ------------------------ v5.11-rc1 v5.11-rc1+patch A-P100 79028 +3.0% 81411 will-it-scale.per_process_ops A-P50 183960 ± 2% +4.4% 192078 ± 2% will-it-scale.per_process_ops B-P100 85966 +9.9% 94467 ± 3% will-it-scale.per_process_ops B-P50 198195 +9.8% 217526 will-it-scale.per_process_ops fio (4k/1M is block size) ------------------------- v5.11-rc1 v5.11-rc1+patch A-P50-r-4k 16881 ± 2% +1.2% 17081 ± 2% fio.read_bw_MBps A-P50-w-4k 3931 +4.5% 4111 ± 2% fio.write_bw_MBps A-P50-r-1M 15178 -0.2% 15154 fio.read_bw_MBps A-P50-w-1M 3924 +0.1% 3929 fio.write_bw_MBps [1].https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Link: https://lkml.kernel.org/r/1611040814-33449-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: kmem: make __memcg_kmem_(un)charge staticRoman Gushchin
I've noticed that __memcg_kmem_charge() and __memcg_kmem_uncharge() are not used anywhere except memcontrol.c. Yet they are not declared as non-static and are declared in memcontrol.h. This patch makes them static. Link: https://lkml.kernel.org/r/20210108020332.4096911-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcg: add swapcache stat for memcg v2Shakeel Butt
This patch adds swapcache stat for the cgroup v2. The swapcache represents the memory that is accounted against both the memory and the swap limit of the cgroup. The main motivation behind exposing the swapcache stat is for enabling users to gracefully migrate from cgroup v1's memsw counter to cgroup v2's memory and swap counters. Cgroup v1's memsw limit allows users to limit the memory+swap usage of a workload but without control on the exact proportion of memory and swap. Cgroup v2 provides separate limits for memory and swap which enables more control on the exact usage of memory and swap individually for the workload. With some little subtleties, the v1's memsw limit can be switched with the sum of the v2's memory and swap limits. However the alternative for memsw usage is not yet available in cgroup v2. Exposing per-cgroup swapcache stat enables that alternative. Adding the memory usage and swap usage and subtracting the swapcache will approximate the memsw usage. This will help in the transparent migration of the workloads depending on memsw usage and limit to v2' memory and swap counters. The reasons these applications are still interested in this approximate memsw usage are: (1) these applications are not really interested in two separate memory and swap usage metrics. A single usage metric is more simple to use and reason about for them. (2) The memsw usage metric hides the underlying system's swap setup from the applications. Applications with multiple instances running in a datacenter with heterogeneous systems (some have swap and some don't) will keep seeing a consistent view of their usage. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Link: https://lkml.kernel.org/r/20210108155813.2914586-3-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: convert NR_FILE_PMDMAPPED account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_PMDMAPPED account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-7-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_PMDMAPPED account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-6-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: convert NR_SHMEM_THPS account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: convert NR_FILE_THPS account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with if hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: convert NR_ANON_THPS account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_ANON_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcontrol: optimize per-lruvec stats counter memory usageMuchun Song
The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), Actually the threshold can be as big as MEMCG_CHARGE_BATCH * PAGE_SIZE. It still fits into s32. So introduce struct batched_lruvec_stat to optimize memory usage. The size of struct lruvec_stat is 304 bytes on 64 bit systems. As it is a per-cpu structure. So with this patch, we can save 304 / 2 * ncpu bytes per-memcg per-node where ncpu is the number of the possible CPU. If there are c memory cgroup (include dying cgroup) and n NUMA node in the system. Finally, we can save (152 * ncpu * c * n) bytes. [akpm@linux-foundation.org: fix typo in comment] Link: https://lkml.kernel.org/r/20201210042121.39665-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Down <chris@chrisdown.name> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNTRoman Gushchin
In general it's unknown in advance if a slab page will contain accounted objects or not. In order to avoid memory waste, an obj_cgroup vector is allocated dynamically when a need to account of a new object arises. Such approach is memory efficient, but requires an expensive cmpxchg() to set up the memcg/objcgs pointer, because an allocation can race with a different allocation on another cpu. But in some common cases it's known for sure that a slab page will contain accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT flag set. It includes such popular objects like vm_area_struct, anon_vma, task_struct, etc. In such cases we can pre-allocate the objcgs vector and simple assign it to the page without any atomic operations, because at this early stage the page is not visible to anyone else. A very simplistic benchmark (allocating 10000000 64-bytes objects in a row) shows ~15% win. In the real life it seems that most workloads are not very sensitive to the speed of (accounted) slab allocations. [guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes] Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com [akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch] Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/filemap: rename generic_file_buffered_read to filemap_readChristoph Hellwig
Rename generic_file_buffered_read to match the naming of filemap_fault, also update the written parameter to a more descriptive name and improve the kerneldoc comment. Link: https://lkml.kernel.org/r/20210122160140.223228-18-willy@infradead.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/filemap: pass a sleep state to put_and_wait_on_page_lockedMatthew Wilcox (Oracle)
This is prep work for the next patch, but I think at least one of the current callers would prefer a killable sleep to an uninterruptible one. Link: https://lkml.kernel.org/r/20210122160140.223228-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/filemap: remove unused parameter and change to void type for ↵Baolin Wang
replace_page_cache_page() Since commit 74d609585d8b ("page cache: Add and replace pages using the XArray") was merged, the replace_page_cache_page() can not fail and always return 0, we can remove the redundant return value and void it. Moreover remove the unused gfp_mask. Link: https://lkml.kernel.org/r/609c30e5274ba15d8b90c872fd0d8ac437a9b2bb.1610071401.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24Merge branch 'pci/ntb'Bjorn Helgaas
- Account for 64-bit BARs in pci_epc_get_first_free_bar() (Kishon Vijay Abraham I) - Add pci_epc_get_next_free_bar() helper (Kishon Vijay Abraham I) - Return error codes on failure of endpoint BAR interfaces (Kishon Vijay Abraham I) - Remove unused pci_epf_match_device() (Kishon Vijay Abraham I) - Add support for secondary endpoint controller to prepare for NTB endpoint functionality (Kishon Vijay Abraham I) - Add configfs support for secondary endpoint controller (Kishon Vijay Abraham I) - Add MSI address mapping ops for NTB doorbell support (Kishon Vijay Abraham I) - Add ops for endpoint function-specific attributes (Kishon Vijay Abraham I) - Allow configfs subdirectory for endpoint function configuration (Kishon Vijay Abraham I) - Implement cadence MSI address mapping ops (Kishon Vijay Abraham I) - Configure cadence LM_EP_FUNC_CFG based on epc->function_num_map (Kishon Vijay Abraham I) - Add endpoint-side driver to provide NTB functionality (Kishon Vijay Abraham I) - Add host-side driver for generic EPF NTB functionality (Kishon Vijay Abraham I) - Document NTB endpoint functionality (Kishon Vijay Abraham I) * pci/ntb: Documentation: PCI: Add PCI endpoint NTB function user guide Documentation: PCI: Add configfs binding documentation for pci-ntb endpoint function NTB: Add support for EPF PCI Non-Transparent Bridge PCI: Add TI J721E device to PCI IDs PCI: endpoint: Add EP function driver to provide NTB functionality PCI: cadence: Configure LM_EP_FUNC_CFG based on epc->function_num_map PCI: cadence: Implement ->msi_map_irq() ops PCI: endpoint: Allow user to create sub-directory of 'EPF Device' directory PCI: endpoint: Add pci_epf_ops to expose function-specific attrs PCI: endpoint: Add pci_epc_ops to map MSI IRQ PCI: endpoint: Add support in configfs to associate two EPCs with EPF PCI: endpoint: Add support to associate secondary EPC with EPF PCI: endpoint: Remove unused pci_epf_match_device() PCI: endpoint: Make *_free_bar() to return error codes on failure PCI: endpoint: Add helper API to get the 'next' unreserved BAR PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR Documentation: PCI: Add specification for the PCI NTB function device
2021-02-24Merge branch 'pci/misc'Bjorn Helgaas
- Align checking of syscall user config accessor return codes (Heiner Kallweit) - Fix "ordering" comment typos (Bjorn Helgaas) - Fix 'ARM/TEXAS INSTRUMENT KEYSTONE CLOCKSOURCE' capitalization in MAINTAINERS (Bjorn Helgaas) - Add Silicom Denmark vendor ID (Martin Hundebøll) - Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He) - Remove WARN_ON(in_interrupt()) (Sebastian Andrzej Siewior) * pci/misc: PCI: Remove WARN_ON(in_interrupt()) PCI: Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy PCI: Add Silicom Denmark vendor ID MAINTAINERS: Fix 'ARM/TEXAS INSTRUMENT KEYSTONE CLOCKSOURCE' capitalization Fix "ordering" comment typos PCI: Align checking of syscall user config accessors
2021-02-24Merge tag 'rpmsg-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc Pull rpmsg updates from Bjorn Andersson: "Fix two build issues in the GLINK driver and correct some kerneldoc in the same" * tag 'rpmsg-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: rpmsg: glink: add include of header file rpmsg: glink: Guard qcom_glink_ssr_notify() with correct config rpmsg: glink: fix some kerneldoc comments
2021-02-24Merge tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updatesfrom Alex Williamson: - Virtual address update handling (Steve Sistare) - s390/zpci fixes and cleanups (Max Gurtovoy) - Fixes for dirty bitmap handling, non-mdev page pinning, and improved pinned dirty scope tracking (Keqian Zhu) - Batched page pinning enhancement (Daniel Jordan) - Page access permission fix (Alex Williamson) * tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio: (21 commits) vfio/type1: Batch page pinning vfio/type1: Prepare for batched pinning with struct vfio_batch vfio/type1: Change success value of vaddr_get_pfn() vfio/type1: Use follow_pte() vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig vfio/iommu_type1: Fix duplicate included kthread.h vfio-pci/zdev: fix possible segmentation fault issue vfio-pci/zdev: remove unused vdev argument vfio/pci: Fix handling of pci use accessor return codes vfio/iommu_type1: Mantain a counter for non_pinned_groups vfio/iommu_type1: Fix some sanity checks in detach group vfio/iommu_type1: Populate full dirty when detach non-pinned group vfio/type1: block on invalid vaddr vfio/type1: implement notify callback vfio: iommu driver notify callback vfio/type1: implement interfaces to update vaddr vfio/type1: massage unmap iteration vfio: interfaces to update vaddr vfio/type1: implement unmap all vfio/type1: unmap cleanup ...
2021-02-24Merge tag 'sfi-removal-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki: "Drop support for depercated platforms using SFI, drop the entire support for SFI that has been long deprecated too and make some janitorial changes on top of that (Andy Shevchenko)" * tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/platform/intel-mid: Update Copyright year and drop file names x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co. x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h x86/PCI: Describe @reg for type1_access_ok() x86/PCI: Get rid of custom x86 model comparison sfi: Remove framework for deprecated firmware cpufreq: sfi-cpufreq: Remove driver for deprecated firmware media: atomisp: Remove unused header mfd: intel_msic: Remove driver for deprecated platform x86/apb_timer: Remove driver for deprecated platform x86/platform/intel-mid: Remove unused leftovers (vRTC) x86/platform/intel-mid: Remove unused leftovers (msic) x86/platform/intel-mid: Remove unused leftovers (msic_thermal) x86/platform/intel-mid: Remove unused leftovers (msic_power_btn) x86/platform/intel-mid: Remove unused leftovers (msic_gpio) x86/platform/intel-mid: Remove unused leftovers (msic_battery) x86/platform/intel-mid: Remove unused leftovers (msic_ocd) x86/platform/intel-mid: Remove unused leftovers (msic_audio) platform/x86: intel_scu_wdt: Drop mistakenly added const
2021-02-24Merge tag 'char-misc-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char/misc/whatever driver subsystem updates for 5.12-rc1. Over time it seems like this tree is collecting more and more tiny driver subsystems in one place, making it easier for those maintainers, which is why this is getting larger. Included in here are: - coresight driver updates - habannalabs driver updates - virtual acrn driver addition (proper acks from the x86 maintainers) - broadcom misc driver addition - speakup driver updates - soundwire driver updates - fpga driver updates - amba driver updates - mei driver updates - vfio driver updates - greybus driver updates - nvmeem driver updates - phy driver updates - mhi driver updates - interconnect driver udpates - fsl-mc bus driver updates - random driver fix - some small misc driver updates (rtsx, pvpanic, etc.) All of these have been in linux-next for a while, with the only reported issue being a merge conflict due to the dfl_device_id addition from the fpga subsystem in here" * tag 'char-misc-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) spmi: spmi-pmic-arb: Fix hw_irq overflow Documentation: coresight: Add PID tracing description coresight: etm-perf: Support PID tracing for kernel at EL2 coresight: etm-perf: Clarify comment on perf options ACRN: update MAINTAINERS: mailing list is subscribers-only regmap: sdw-mbq: use MODULE_LICENSE("GPL") regmap: sdw: use no_pm routines for SoundWire 1.2 MBQ regmap: sdw: use _no_pm functions in regmap_read/write soundwire: intel: fix possible crash when no device is detected MAINTAINERS: replace my with email with replacements mhi: Fix double dma free uapi: map_to_7segment: Update example in documentation uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue firewire: replace tricky statement by two simple ones vme: make remove callback return void firmware: google: make coreboot driver's remove callback return void firmware: xilinx: Use explicit values for all enum values sample/acrn: Introduce a sample of HSM ioctl interface usage virt: acrn: Introduce an interface for Service VM to control vCPU ...
2021-02-24Merge tag 'driver-core-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / debugfs update from Greg KH: "Here is the "big" driver core and debugfs update for 5.12-rc1 This set of driver core patches caused a bunch of problems in linux-next for the past few weeks, when Saravana tried to set fw_devlink=on as the default functionality. This caused a number of systems to stop booting, and lots of bugs were fixed in this area for almost all of the reported systems, but this option is not ready to be turned on just yet for the default operation based on this testing, so I've reverted that change at the very end so we don't have to worry about regressions in 5.12 We will try to turn this on for 5.13 if testing goes better over the next few months. Other than the fixes caused by the fw_devlink testing in here, there's not much more: - debugfs fixes for invalid input into debugfs_lookup() - kerneldoc cleanups - warn message if platform drivers return an error on their remove callback (a futile effort, but good to catch). All of these have been in linux-next for a while now, and the regressions have gone away with the revert of the fw_devlink change" * tag 'driver-core-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Revert "driver core: Set fw_devlink=on by default" of: property: fw_devlink: Ignore interrupts property for some configs debugfs: do not attempt to create a new file before the filesystem is initalized debugfs: be more robust at handling improper input in debugfs_lookup() driver core: auxiliary bus: Fix calling stage for auxiliary bus init of: irq: Fix the return value for of_irq_parse_one() stub of: irq: make a stub for of_irq_parse_one() clk: Mark fwnodes when their clock provider is added/removed PM: domains: Mark fwnodes when their powerdomain is added/removed irqdomain: Mark fwnodes when their irqdomain is added/removed driver core: fw_devlink: Handle suppliers that don't use driver core of: property: Add fw_devlink support for optional properties driver core: Add fw_devlink.strict kernel param of: property: Don't add links to absent suppliers driver core: fw_devlink: Detect supplier devices that will never be added driver core: platform: Emit a warning if a remove callback returned non-zero of: property: Fix fw_devlink handling of interrupts/interrupts-extended gpiolib: Don't probe gpio_device if it's not the primary device device.h: Remove bogus "the" in kerneldoc gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default ...
2021-02-24Merge tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - add support to emulate processing delays in the DMA API benchmark selftest (Barry Song) - remove support for non-contiguous noncoherent allocations, which aren't used and will be replaced by a different API * tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: remove the {alloc,free}_noncoherent methods dma-mapping: benchmark: pretend DMA is transmitting
2021-02-24Merge tag 'cxl-for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull initial support for CXL (Compute Express Link) from Dan Williams: "Introduce an initial driver for CXL 2.0 Type-3 Memory Devices. CXL is Compute Express Link which released the 2.0 specification in November. The Linux relevant changes in CXL 2.0 are support for an OS to dynamically assign address space to memory devices, support for switches, persistent memory, and hotplug. A Type-3 Memory Device is a PCI enumerated device presenting the CXL Memory Device Class Code and implementing the CXL.mem protocol. CXL.mem allows device to advertise CPU and I/O coherent memory to the system, i.e. typical "System RAM" and "Persistent Memory" in Linux /proc/iomem terms. In addition to the CXL.mem fast path there is an administrative command hardware mailbox interface for maintenance and provisioning. It is this command interface that is the focus of the initial driver. With this driver a CXL device that is mapped by the BIOS can be administered by Linux. Linux support for CXL PMEM and dynamic CXL address space management are to be implemented post v5.12" Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 4cdadfd5e0a7 ("cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints") 13237183c735 ("cxl/mem: Add a "RAW" send command") 472b1ce6e9d6 ("cxl/mem: Enable commands via CEL") 57ee605b976c ("cxl/mem: Add set of informational commands") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> 8adaf747c9f0 ("cxl/mem: Find device capabilities") b39cb1052a5c ("cxl/mem: Register CXL memX devices") * tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: cxl/mem: Fix potential memory leak cxl/mem: Return -EFAULT if copy_to_user() fails MAINTAINERS: Add maintainers of the CXL driver cxl/mem: Add set of informational commands cxl/mem: Enable commands via CEL cxl/mem: Add a "RAW" send command cxl/mem: Add basic IOCTL interface cxl/mem: Register CXL memX devices cxl/mem: Find device capabilities cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
2021-02-24Merge tag 'libnvdimm-for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and device-dax updates from Dan Williams: - Fix the error code polarity for the device-dax/mapping attribute - For the device-dax and libnvdimm bus implementations stop implementing a useless return code for the remove() callback. - Miscellaneous cleanups * tag 'libnvdimm-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax-device: Make remove callback return void device-dax: Drop an empty .remove callback device-dax: Fix error path in dax_driver_register device-dax: Properly handle drivers without remove callback device-dax: Prevent registering drivers without probe callback libnvdimm: Make remove callback return void libnvdimm/dimm: Simplify nvdimm_remove() device-dax: Fix default return code of range_parse()
2021-02-24dma-fence: allow signaling drivers to set fence timestampVeera Sundaram Sankaran
Some drivers have hardware capability to get the precise HW timestamp of certain events based on which the fences are triggered. The delta between the event HW timestamp & current HW reference timestamp can be used to calculate the timestamp in kernel's CLOCK_MONOTONIC time domain. This allows it to set accurate timestamp factoring out any software and IRQ latencies. Add a timestamp variant of fence signal function, dma_fence_signal_timestamp to allow drivers to update the precise timestamp for fences. Changes in v2: - Add a new fence signal variant instead of modifying fence struct Changes in v3: - Add timestamp domain information to commit-text and dma_fence_signal_timestamp documentation Signed-off-by: Veera Sundaram Sankaran <veeras@codeaurora.org> Reviewed-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> [sumits: minor parenthesis alignment] Link: https://patchwork.freedesktop.org/patch/msgid/1610757107-11892-1-git-send-email-veeras@codeaurora.org (cherry picked from commit 5a164ac4dbd21b82bcdc03186d40e455ff467fdc) Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2021-02-24dma-buf: heaps: Rework heap allocation hooks to return struct dma_buf ↵John Stultz
instead of fd Every heap needs to create a dmabuf and then export it to a fd via dma_buf_fd(), so to consolidate things a bit, have the heaps just return a struct dmabuf * and let the top level dma_heap_buffer_alloc() call handle creating the fd via dma_buf_fd(). Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Liam Mark <lmark@codeaurora.org> Cc: Laura Abbott <labbott@kernel.org> Cc: Brian Starkey <Brian.Starkey@arm.com> Cc: Hridya Valsaraju <hridya@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org> Cc: Ørjan Eide <orjan.eide@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Ezequiel Garcia <ezequiel@collabora.com> Cc: Simon Ser <contact@emersion.fr> Cc: James Jones <jajones@nvidia.com> Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> [sumits: minor reword of commit message] Link: https://patchwork.freedesktop.org/patch/msgid/20210119204508.9256-3-john.stultz@linaro.org (cherry picked from commit c7f59e3dd60313071a989227dcb69094f499d310) Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2021-02-24ACPI: platform: Add balanced-performance platform profileMaximilian Luz
Some devices, including most Microsoft Surface devices, have a platform profile somewhere inbetween balanced and performance. More specifically, adding this profile allows the following mapping on Surface devices: Vendor Name Platform Profile ------------------------------------------ Battery Saver low-power Recommended balanced Better Performance balanced-performance Best Performance performance Suggested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-24ACPI: platform: Fix file references in commentMaximilian Luz
The referenced files are named slightly different. Replace '-' with '_' and drop the .rst ending. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-23io_uring: ensure io-wq context is always destroyed for tasksJens Axboe
If the task ends up doing no IO, the context list is empty and we don't call into __io_uring_files_cancel() when the task exits. This can cause a leak of the io-wq structures. Ensure we always call __io_uring_files_cancel(), even if the task context list is empty. Fixes: 5aa75ed5b93f ("io_uring: tie async worker side to the task context") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23net: remove cmsg restriction from io_uring based send/recvmsg callsJens Axboe
No need to restrict these anymore, as the worker threads are direct clones of the original task. Hence we know for a fact that we can support anything that the regular task can. Since the only user of proto_ops->flags was to flag PROTO_CMSG_DATA_ONLY, kill the member and the flag definition too. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23Merge tag 'keys-misc-20210126' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring updates from David Howells: "Here's a set of minor keyrings fixes/cleanups that I've collected from various people for the upcoming merge window. A couple of them might, in theory, be visible to userspace: - Make blacklist_vet_description() reject uppercase letters as they don't match the all-lowercase hex string generated for a blacklist search. This may want reconsideration in the future, but, currently, you can't add to the blacklist keyring from userspace and the only source of blacklist keys generates lowercase descriptions. - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag. This isn't currently a problem as the blacklist keyring isn't currently writable by userspace. The rest of the patches are cleanups and I don't think they should have any visible effect" * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: watch_queue: rectify kernel-doc for init_watch() certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID certs: Fix blacklist flag type confusion PKCS#7: Fix missing include certs: Fix blacklisted hexadecimal hash string check certs/blacklist: fix kernel doc interface issue crypto: public_key: Remove redundant header file from public_key.h keys: remove trailing semicolon in macro definition crypto: pkcs7: Use match_string() helper to simplify the code PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one encrypted-keys: Replace HTTP links with HTTPS ones crypto: asymmetric_keys: fix some comments in pkcs7_parser.h KEYS: remove redundant memset security: keys: delete repeated words in comments KEYS: asymmetric: Fix kerneldoc security/keys: use kvfree_sensitive() watch_queue: Drop references to /dev/watch_queue keys: Remove outdated __user annotations security: keys: Fix fall-through warnings for Clang
2021-02-24s390/cpumf: Add support for complete counter set extractionThomas Richter
Add support to the CPU Measurement counter facility device driver to extract complete counter sets per CPU and per counter set from user space. This includes a new device named /dev/hwctr and support for the device driver functions open, close and ioctl. Other functions are not supported. The ioctl command supports 3 subcommands: S390_HWCTR_START: enables counter sets on a list of CPUs. S390_HWCTR_STOP: disables counter sets on a list of CPUs. S390_HWCTR_READ: reads counter sets on a list of CPUs. The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which returns data. It requires member data_bytes to be positive and indicates the maximum amount of data available to store counter set data. The other ioctl() subcommands do not use this member and it should be set to zero. The S390_HWCTR_READ subcommand returns the following data: The cpuset data is flattened using the following scheme, stored in member data: 0x0 0x8 0xc 0x10 0x10 0x18 0x20 0x28 0xU-1 +---------+-----+---------+-----+---------+-----+-----+------+------+ | no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +---------+-----+---------+-----+---------+-----+-----+------+------+ 0xU 0xU+4 0xU+8 0xU+10 0xV-1 +-----+---------+-----+-----+------+------+ | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+-----+------+------+ 0xV 0xV+4 0xV+8 0xV+c +-----+---------+-----+---------+-----+-----+------+------+ | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+---------+-----+-----+------+------+ U and V denote arbitrary hexadezimal addresses. The first integer represents the number of CPUs data was extracted from. This is followed by CPU number and number of counter sets extracted. Both are two integer values. This is followed by the set identifer and number of counters extracted. Both are two integer values. This is followed by the counter values, each element is eight bytes in size. The S390_HWCTR_READ ioctl subcommand is also limited to one call per minute. This ensures that an application does not read out the counter sets too often and reduces the overall CPU performance. The complete counter set extraction is an expensive operation. Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-23Merge tag 'dmaengine-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "We have couple of drivers removed a new driver and bunch of new device support and few updates to drivers for this round. New drivers/devices: - Intel LGM SoC DMA driver - Actions Semi S500 DMA controller - Renesas r8a779a0 dma controller - Ingenic JZ4760(B) dma controller - Intel KeemBay AxiDMA controller Removed: - Coh901318 dma driver - Zte zx dma driver - Sirfsoc dma driver Updates: - mmp_pdma, mmp_tdma gained module support - imx-sdma become modern and dropped platform data support - dw-axi driver gained slave and cyclic dma support" * tag 'dmaengine-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (58 commits) dmaengine: dw-axi-dmac: remove redundant null check on desc dmaengine: xilinx_dma: Alloc tx descriptors GFP_NOWAIT dmaengine: dw-axi-dmac: Virtually split the linked-list dmaengine: dw-axi-dmac: Set constraint to the Max segment size dmaengine: dw-axi-dmac: Add Intel KeemBay AxiDMA BYTE and HALFWORD registers dmaengine: dw-axi-dmac: Add Intel KeemBay AxiDMA handshake dmaengine: dw-axi-dmac: Add Intel KeemBay AxiDMA support dmaengine: drivers: Kconfig: add HAS_IOMEM dependency to DW_AXI_DMAC dmaengine: dw-axi-dmac: Add Intel KeemBay DMA register fields dt-binding: dma: dw-axi-dmac: Add support for Intel KeemBay AxiDMA dmaengine: dw-axi-dmac: Support burst residue granularity dmaengine: dw-axi-dmac: Support of_dma_controller_register() dmaegine: dw-axi-dmac: Support device_prep_dma_cyclic() dmaengine: dw-axi-dmac: Support device_prep_slave_sg dmaengine: dw-axi-dmac: Add device_config operation dmaengine: dw-axi-dmac: Add device_synchronize() callback dmaengine: dw-axi-dmac: move dma_pool_create() to alloc_chan_resources() dmaengine: dw-axi-dmac: simplify descriptor management dt-bindings: dma: Add YAML schemas for dw-axi-dmac dmaengine: ti: k3-psil: optimize struct psil_endpoint_config for size ...
2021-02-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID updates from Jiri Kosina: - support for "Unified Battery" feature on Logitech devices from Filipe Laíns - power management improvements for intel-ish driver from Zhang Lixu - support for Goodix devices from Douglas Anderson - improved handling of generic HID keyboard in order to make it easier for userspace to figure out the details of the device, from Dmitry Torokhov - Playstation DualSense support from Roderick Colenbrander - other assorted small fixes and device ID additions. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (49 commits) HID: playstation: add DualSense player LED support. HID: playstation: add microphone mute support for DualSense. HID: playstation: add initial DualSense lightbar support. HID: wacom: Ignore attempts to overwrite the touch_max value from HID HID: playstation: fix array size comparison (off-by-one) HID: playstation: fix unused variable in ps_battery_get_property. HID: playstation: report DualSense hardware and firmware version. HID: playstation: add DualSense classic rumble support. HID: playstation: add DualSense Bluetooth support. HID: playstation: track devices in list. HID: playstation: add DualSense accelerometer and gyroscope support. HID: playstation: add DualSense touchpad support. HID: playstation: add DualSense battery support. HID: playstation: use DualSense MAC address as unique identifier. HID: playstation: initial DualSense USB support. HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch 10E HID: Ignore battery for Elan touchscreen on HP Spectre X360 15-df0xxx HID: logitech-dj: add support for the new lightspeed connection iteration HID: intel-ish-hid: ipc: Add Tiger Lake H PCI device ID HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming ...
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-23Merge branch 'for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "Percpu had a cleanup come in that makes use of the cpu bitmask helpers instead of the current iterative approach. This clean up then had an adverse interaction when clang's inlining sensitivity is changed such that not all sites are inlined resulting in modpost being upset with section mismatch due to percpu setup being marked __init. That was fixed by introducing __flatten to compiler_attributes.h" * 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: fix clang modpost section mismatch percpu: reduce the number of cpu distance comparisons
2021-02-23PCI: Add TI J721E device to PCI IDsKishon Vijay Abraham I
Add TI J721E device to the PCI ID database. Since this device has a configurable PCIe endpoint, it could be used with different drivers. Link: https://lore.kernel.org/r/20210201195809.7342-15-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-02-23PCI: endpoint: Allow user to create sub-directory of 'EPF Device' directoryKishon Vijay Abraham I
Documentation/PCI/endpoint/pci-endpoint-cfs.rst explains how a user has to create a directory in-order to create a 'EPF Device' that can be configured/probed by 'EPF Driver'. Allow user to create a sub-directory of 'EPF Device' directory for any function specific attributes that has to be exposed to the user. Link: https://lore.kernel.org/r/20210201195809.7342-11-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-02-23PCI: endpoint: Add pci_epf_ops to expose function-specific attrsKishon Vijay Abraham I
In addition to the attributes that are generic across function drivers documented in Documentation/PCI/endpoint/pci-endpoint-cfs.rst, there could be function-specific attributes that has to be exposed by the function driver to be configured by the user. Add ->add_cfs() in pci_epf_ops to be populated by the function driver if it has to expose any function-specific attributes and pci_epf_type_add_cfs() to be invoked by pci-ep-cfs.c when sub-directory to main function directory is created. Link: https://lore.kernel.org/r/20210201195809.7342-10-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-02-23PCI: endpoint: Add pci_epc_ops to map MSI IRQKishon Vijay Abraham I
Add pci_epc_ops to map physical address to MSI address and return MSI data. The physical address is an address in the outbound region. This is required to implement doorbell functionality of NTB (non-transparent bridge) wherein EPC on either side of the interface (primary and secondary) can directly write to the physical address (in outbound region) of the other interface to ring doorbell using MSI. Link: https://lore.kernel.org/r/20210201195809.7342-9-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-02-23PCI: endpoint: Add support to associate secondary EPC with EPFKishon Vijay Abraham I
In the case of standard endpoint functions, only one endpoint controller (EPC) will be associated with an endpoint function (EPF). However for providing NTB (non transparent bridge) functionality, two EPCs should be associated with a single EPF. Add support to associate secondary EPC with EPF. This is in preparation for adding NTB endpoint function driver. Link: https://lore.kernel.org/r/20210201195809.7342-7-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>