summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-12-15mm: move free_unref_page to mm/internal.hMatthew Wilcox (Oracle)
Code outside mm/ should not be calling free_unref_page(). Also move free_unref_page_list(). Link: https://lkml.kernel.org/r/20201125034655.27687-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: mmap_lock: add tracepoints around lock acquisitionAxel Rasmussen
The goal of these tracepoints is to be able to debug lock contention issues. This lock is acquired on most (all?) mmap / munmap / page fault operations, so a multi-threaded process which does a lot of these can experience significant contention. We trace just before we start acquisition, when the acquisition returns (whether it succeeded or not), and when the lock is released (or downgraded). The events are broken out by lock type (read / write). The events are also broken out by memcg path. For container-based workloads, users often think of several processes in a memcg as a single logical "task", so collecting statistics at this level is useful. The end goal is to get latency information. This isn't directly included in the trace events. Instead, users are expected to compute the time between "start locking" and "acquire returned", using e.g. synthetic events or BPF. The benefit we get from this is simpler code. Because we use tracepoint_enabled() to decide whether or not to trace, this patch has effectively no overhead unless tracepoints are enabled at runtime. If tracepoints are enabled, there is a performance impact, but how much depends on exactly what e.g. the BPF program does. [axelrasmussen@google.com: fix use-after-free race and css ref leak in tracepoints] Link: https://lkml.kernel.org/r/20201130233504.3725241-1-axelrasmussen@google.com [axelrasmussen@google.com: v3] Link: https://lkml.kernel.org/r/20201207213358.573750-1-axelrasmussen@google.com [rostedt@goodmis.org: in-depth examples of tracepoint_enabled() usage, and per-cpu-per-context buffer design] Link: https://lkml.kernel.org/r/20201105211739.568279-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: cleanup: remove unused tsk arg from __access_remote_vmJohn Hubbard
Despite a comment that said that page fault accounting would be charged to whatever task_struct* was passed into __access_remote_vm(), the tsk argument was actually unused. Making page fault accounting actually use this task struct is quite a project, so there is no point in keeping the tsk argument. Delete both the comment, and the argument. [rppt@linux.ibm.com: changelog addition] Link: https://lkml.kernel.org/r/20201026074137.4147787-1-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcontrol: account pagetables per nodeShakeel Butt
For many workloads, pagetable consumption is significant and it makes sense to expose it in the memory.stat for the memory cgroups. However at the moment, the pagetables are accounted per-zone. Converting them to per-node and using the right interface will correctly account for the memory cgroups as well. [akpm@linux-foundation.org: export __mod_lruvec_page_state to modules for arch/mips/kvm/] Link: https://lkml.kernel.org/r/20201130212541.2781790-3-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: move lruvec stats update functions to vmstat.hShakeel Butt
Patch series "memcg: add pagetable comsumption to memory.stat", v2. Many workloads consumes significant amount of memory in pagetables. One specific use-case is the user space network driver which mmaps the application memory to provide zero copy transfer. This driver can consume a large amount memory in page tables. This patch series exposes the pagetable comsumption for each memory cgroup. This patch (of 2): This does not change any functionality and only move the functions which update the lruvec stats to vmstat.h from memcontrol.h. The main reason for this patch is to be able to use these functions in the page table contructor function which is defined in mm.h and we can not include the memcontrol.h in that file. Also this is a better place for this interface in general. The lruvec abstraction, while invented for memcg, isn't specific to memcg at all. Link: https://lkml.kernel.org/r/20201130212541.2781790-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_stateMuchun Song
The *_lruvec_slab_state is also suitable for pages allocated from buddy, not just for the slab objects. But the function name seems to tell us that only slab object is applicable. So we can rename the keyword of slab to kmem. Link: https://lkml.kernel.org/r/20201117085249.24319-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15cgroup: remove obsoleted broken_hierarchy and warned_broken_hierarchyRoman Gushchin
With the deprecation of the non-hierarchical mode of the memory controller there are no more examples of broken hierarchies left. Let's remove the cgroup core code which was supposed to print warnings about creating of broken hierarchies. Link: https://lkml.kernel.org/r/20201110220800.929549-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcg: deprecate the non-hierarchical modeRoman Gushchin
Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1. The non-hierarchical cgroup v1 mode is a legacy of early days of the memory controller and doesn't bring any value today. However, it complicates the code and creates many edge cases all over the memory controller code. It's a good time to deprecate it completely. This patchset removes the internal logic, adjusts the user interface and updates the documentation. The alt patch removes some bits of the cgroup core code, which become obsolete. Michal Hocko said: "All that we know today is that we have a warning in place to complain loudly when somebody relies on use_hierarchy=0 with a deeper hierarchy. For all those years we have seen _zero_ reports that would describe a sensible usecase. Moreover we (SUSE) have backported this warning into old distribution kernels (since 3.0 based kernels) to extend the coverage and didn't hear even for users who adopt new kernels only very slowly. The only report we have seen so far was a LTP test suite which doesn't really reflect any real life usecase" This patch (of 3): The non-hierarchical cgroup v1 mode is a legacy of early days of the memory controller and doesn't bring any value today. However, it complicates the code and creates many edge cases all over the memory controller code. It's a good time to deprecate it completely. Functionally this patch enabled is by default for all cgroups and forbids switching it off. Nothing changes if cgroup v2 is used: hierarchical mode was enforced from scratch. To protect the ABI memory.use_hierarchy interface is preserved with a limited functionality: reading always returns "1", writing of "1" passes silently, writing of any other value fails with -EINVAL and a warning to dmesg (on the first occasion). Link: https://lkml.kernel.org/r/20201110220800.929549-1-guro@fb.com Link: https://lkml.kernel.org/r/20201110220800.929549-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcg: fix obsolete code commentsRoman Gushchin
This patch fixes/removes some obsolete comments in the code related to the kernel memory accounting: - kmem_cache->memcg_params.memcg_caches has been removed by commit 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations") - memcg->kmemcg_id is not used as a gate for kmem accounting since commit 0b8f73e10428 ("mm: memcontrol: clean up alloc, online, offline, free functions") Link: https://lkml.kernel.org/r/20201110184615.311974-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/rmap: always do TTU_IGNORE_ACCESSShakeel Butt
Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2"), the code to check the secondary MMU's page table access bit is broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the secondary MMU's page table before the check. More specifically for those secondary MMUs which unmap the memory in mmu_notifier_invalidate_range_start() like kvm. However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the absence of TTU_IGNORE_ACCESS and it explicitly performs the page table access check before trying to unmap the page. So, at worst the reclaim will miss accesses in a very short window if we remove page table access check in unmapping code. There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg reclaim. From memcg reclaim the page_referenced() only account the accesses from the processes which are in the same memcg of the target page but the unmapping code is considering accesses from all the processes, so, decreasing the effectiveness of memcg reclaim. The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping code. Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcontrol: remove unused mod_memcg_obj_state()Muchun Song
Since commit 991e7673859e ("mm: memcontrol: account kernel stack per node") there is no user of the mod_memcg_obj_state(). So just remove it. Also rework type of the idx parameter of the mod_objcg_state() from int to enum node_stat_item. Link: https://lkml.kernel.org/r/20201013153504.92602-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/shmem.c: make shmem_mapping() inlineHui Su
shmem_mapping() isn't worth an out-of-line call from any callsite. So make it inline by - make shmem_aops global - export shmem_aops - inline the shmem_mapping() and replace the direct call 'shmem_aops' with shmem_mapping() in shmem.c. Link: https://lkml.kernel.org/r/20201115165207.GA265355@rlk Signed-off-by: Hui Su <sh_def@163.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: remove pagevec_lookup_range_nr_tag()Jeff Layton
With the merge of commit 2e1692966034 ("ceph: have ceph_writepages_start call pagevec_lookup_range_tag"), nothing calls this anymore. Link: https://lkml.kernel.org/r/20201021193926.101474-1-jlayton@kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/gup: remove the vma allocation from gup_longterm_locked()Jason Gunthorpe
Long ago there wasn't a FOLL_LONGTERM flag so this DAX check was done by post-processing the VMA list. These days it is trivial to just check each VMA to see if it is DAX before processing it inside __get_user_pages() and return failure if a DAX VMA is encountered with FOLL_LONGTERM. Removing the allocation of the VMA list is a significant speed up for many call sites. Add an IS_ENABLED to vma_is_fsdax so that code generation is unchanged when DAX is compiled out. Remove the dummy version of __gup_longterm_locked() as !CONFIG_CMA already makes memalloc_nocma_save(), check_and_migrate_cma_pages(), and memalloc_nocma_restore() into a NOP. Link: https://lkml.kernel.org/r/0-v1-5551df3ed12e+b8-gup_dax_speedup_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/gup: prevent gup_fast from racing with COW during forkJason Gunthorpe
Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") pages under a FOLL_PIN will not be write protected during COW for fork. This means that pages returned from pin_user_pages(FOLL_WRITE) should not become write protected while the pin is active. However, there is a small race where get_user_pages_fast(FOLL_PIN) can establish a FOLL_PIN at the same time copy_present_page() is write protecting it: CPU 0 CPU 1 get_user_pages_fast() internal_get_user_pages_fast() copy_page_range() pte_alloc_map_lock() copy_present_page() atomic_read(has_pinned) == 0 page_maybe_dma_pinned() == false atomic_set(has_pinned, 1); gup_pgd_range() gup_pte_range() pte_t pte = gup_get_pte(ptep) pte_access_permitted(pte) try_grab_compound_head() pte = pte_wrprotect(pte) set_pte_at(); pte_unmap_unlock() // GUP now returns with a write protected page The first attempt to resolve this by using the write protect caused problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid early COW write protect games during fork()") Instead wrap copy_p4d_range() with the write side of a seqcount and check the read side around gup_pgd_range(). If there is a collision then get_user_pages_fast() fails and falls back to slow GUP. Slow GUP is safe against this race because copy_page_range() is only called while holding the exclusive side of the mmap_lock on the src mm_struct. [akpm@linux-foundation.org: coding style fixes] Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> [seqcount_t parts] Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: fix page_owner initializing issue for arm32Zhenhua Huang
Page owner of pages used by page owner itself used is missing on arm32 targets. The reason is dummy_handle and failure_handle is not initialized correctly. Buddy allocator is used to initialize these two handles. However, buddy allocator is not ready when page owner calls it. This change fixed that by initializing page owner after buddy initialization. The working flow before and after this change are: original logic: 1. allocated memory for page_ext(using memblock). 2. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). 3. initialize buddy. after this change: 1. allocated memory for page_ext(using memblock). 2. initialize buddy. 3. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). with the change, failure/dummy_handle can get its correct value and page owner output for example has the one for page owner itself: Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0() init_page_owner+0x28/0x2f8 invoke_init_callbacks_flatmem+0x24/0x34 start_kernel+0x33c/0x5d8 Link: https://lkml.kernel.org/r/1603104925-5888-1-git-send-email-zhenhuah@codeaurora.org Signed-off-by: Zhenhua Huang <zhenhuah@codeaurora.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: slab: provide krealloc_array()Bartosz Golaszewski
When allocating an array of elements, users should check for multiplication overflow or preferably use one of the provided helpers like: kmalloc_array(). There's no krealloc_array() counterpart but there are many users who use regular krealloc() to reallocate arrays. Let's provide an actual krealloc_array() implementation. While at it: add some documentation regarding krealloc. Link: https://lkml.kernel.org/r/20201109110654.12547-3-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Knig <christian.koenig@amd.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: James Morse <james.morse@arm.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15cpufreq: Add special-purpose fast-switching callback for driversRafael J. Wysocki
First off, some cpufreq drivers (eg. intel_pstate) can pass hints beyond the current target frequency to the hardware and there are no provisions for doing that in the cpufreq framework. In particular, today the driver has to assume that it should not allow the frequency to fall below the one requested by the governor (or the required capacity may not be provided) which may not be the case and which may lead to excessive energy usage in some scenarios. Second, the hints passed by these drivers to the hardware need not be in terms of the frequency, so representing the utilization numbers coming from the scheduler as frequency before passing them to those drivers is not really useful. Address the two points above by adding a special-purpose replacement for the ->fast_switch callback, called ->adjust_perf, allowing the governor to pass abstract performance level (rather than frequency) values for the minimum (required) and target (desired) performance along with the CPU capacity to compare them to. Also update the schedutil governor to use the new callback instead of ->fast_switch if present and if the utilization mertics are frequency-invariant (that is requisite for the direct mapping between the utilization and the CPU performance levels to be a reasonable approximation). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-15Merge tag 'kvmarm-5.11' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.11 - PSCI relay at EL2 when "protected KVM" is enabled - New exception injection code - Simplification of AArch32 system register handling - Fix PMU accesses when no PMU is enabled - Expose CSV3 on non-Meltdown hosts - Cache hierarchy discovery fixes - PV steal-time cleanups - Allow function pointers at EL2 - Various host EL2 entry cleanups - Simplification of the EL2 vector allocation
2020-12-15Merge tag 'drm-misc-next-fixes-2020-12-15' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next Short summary of fixes pull (less than what git shortlog provides): * dma-buf: Fix docs * mxsfb: Silence invalid error message * radeon: Fix TTM multihop Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/X9i0X9mjHN9AZGD3@linux-uq9g
2020-12-15genirq: Provide kstat_irqdesc_cpu()Thomas Gleixner
Most users of kstat_irqs_cpu() have the irq descriptor already. No point in calling into the core code and looking it up once more. Use it in per_cpu_count_show() to start with. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194043.362094758@linutronix.de
2020-12-15genirq: Make kstat_irqs() staticThomas Gleixner
No more users outside the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194043.268774449@linutronix.de
2020-12-15genirq: Provide irq_get_effective_affinity()Thomas Gleixner
Provide an accessor to the effective interrupt affinity mask. Going to be used to replace open coded fiddling with the irq descriptor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.967177918@linutronix.de
2020-12-15genirq: Move irq_set_lockdep_class() to coreThomas Gleixner
irq_set_lockdep_class() is used from modules and requires irq_to_desc() to be exported. Move it into the core code which lifts another requirement for the export. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.860029489@linutronix.de
2020-12-15genirq: Move status flag checks to coreThomas Gleixner
These checks are used by modules and prevent the removal of the export of irq_to_desc(). Move the accessor into the core. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.703779349@linutronix.de
2020-12-15genirq: Move irq_has_action() into core codeThomas Gleixner
This function uses irq_to_desc() and is going to be used by modules to replace the open coded irq_to_desc() (ab)usage. The final goal is to remove the export of irq_to_desc() so driver cannot fiddle with it anymore. Move it into the core code and fixup the usage sites to include the proper header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.548936472@linutronix.de
2020-12-15Merge branches 'acpi-resources' and 'acpi-docs'Rafael J. Wysocki
* acpi-resources: Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks" resource: provide meaningful MODULE_LICENSE() in test suite ASoC: Intel: catpt: Replace open coded variant of resource_intersection() ACPI: watchdog: Replace open coded variant of resource_union() PCI/ACPI: Replace open coded variant of resource_union() resource: Add test cases for new resource API resource: Introduce resource_intersection() for overlapping resources resource: Introduce resource_union() for overlapping resources resource: Group resource_overlaps() with other inline helpers resource: Simplify region_intersects() by reducing conditionals * acpi-docs: Documentation: ACPI: enumeration: add PCI hierarchy representation Documentation: ACPI: _DSD: enable hyperlink in final references Documentation: ACPI: explain how to use gpio-line-names
2020-12-15Merge branches 'pm-devfreq' and 'pm-tools'Rafael J. Wysocki
* pm-devfreq: PM / devfreq: tegra30: Separate configurations per-SoC generation PM / devfreq: tegra30: Support interconnect and OPPs from device-tree PM / devfreq: tegra20: Deprecate in a favor of emc-stat based driver PM / devfreq: exynos-bus: Add registration of interconnect child device dt-bindings: devfreq: Add documentation for the interconnect properties soc/tegra: fuse: Add stub for tegra_sku_info soc/tegra: fuse: Export tegra_read_ram_code() clk: tegra: Export Tegra20 EMC kernel symbols PM / devfreq: tegra30: Silence deferred probe error PM / devfreq: tegra20: Relax Kconfig dependency PM / devfreq: tegra20: Silence deferred probe error PM / devfreq: Remove redundant governor_name from struct devfreq PM / devfreq: Add governor attribute flag for specifc sysfs nodes PM / devfreq: Add governor feature flag PM / devfreq: Add tracepoint for frequency changes PM / devfreq: Unify frequency change to devfreq_update_target func trace: events: devfreq: Use fixed indentation size to improve readability * pm-tools: pm-graph v5.8 cpupower: Provide online and offline CPU information
2020-12-15Merge branches 'pm-sleep', 'pm-acpi', 'pm-domains' and 'powercap'Rafael J. Wysocki
* pm-sleep: PM: sleep: Add dev_wakeup_path() helper PM / suspend: fix kernel-doc markup PM: sleep: Print driver flags for all devices during suspend/resume * pm-acpi: PM: ACPI: Refresh wakeup device power configuration every time PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup() PM: ACPI: reboot: Use S5 for reboot * pm-domains: PM: domains: create debugfs nodes when adding power domains PM: domains: replace -ENOTSUPP with -EOPNOTSUPP * powercap: powercap: Adjust printing the constraint name with new line powercap: RAPL: Add AMD Fam19h RAPL support powercap: Add AMD Fam17h RAPL support powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer x86/msr-index: sort AMD RAPL MSRs by address
2020-12-15Merge branches 'pm-cpuidle' and 'pm-em'Rafael J. Wysocki
* pm-cpuidle: cpuidle: Select polling interval based on a c-state with a longer target residency cpuidle: psci: Enable suspend-to-idle for PSCI OSI mode PM: domains: Enable dev_pm_genpd_suspend|resume() for suspend-to-idle PM: domains: Rename pm_genpd_syscore_poweroff|poweron() * pm-em: PM / EM: Micro optimization in em_cpu_energy PM: EM: Update Energy Model with new flag indicating power scale PM: EM: update the comments related to power scale PM: EM: Clarify abstract scale usage for power values in Energy Model
2020-12-15Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: (31 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE ...
2020-12-15dma-buf: Fix kerneldoc formattingDaniel Vetter
I wanted to look up something and noticed the hyperlink doesn't work. While fixing that also noticed a trivial kerneldoc comment typo in the same section, fix that too. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204200242.2671481-1-daniel.vetter@ffwll.ch
2020-12-15Merge tag 'irqchip-5.11' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for 5.11 from Marc Zyngier: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Remove the fasteoi IPI flow which has been proved useless - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
2020-12-15Merge tag 'drm-misc-next-2020-11-27-1' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.11: UAPI Changes: Cross-subsystem Changes: * char/agp: Disable frontend without CONFIG_DRM_LEGACY * mm: Fix fput in mmap error path; Introduce vma_set_file() to change vma->vm_file Core Changes: * dma-buf: Use sgtables in system heap; Move heap helpers to CMA-heap code; Skip sync for unmapped buffers; Alloc higher order pages is available; Respect num_fences when initializing shared fence list * doc: Improvements around DRM modes and SCALING_FILTER * Pass full state to connector atomic functions + callee updates * Cleanups * shmem: Map pages with caching by default; Cleanups * ttm: Fix DMA32 for global page pool * fbdev: Cleanups * fb-helper: Update framebuffer after userspace writes; Unmap console buffer during shutdown; Rework damage handling of shadow framebuffer Driver Changes: * amdgpu: Multi-hop fixes, Clenaups * imx: Fix rotation for Vivante tiled formats; Support nearest-neighour skaling; Cleanups * mcde: Fix RGB formats; Support DPI output; Cleanups * meson: HDMI clock fixes * panel: Add driver and bindings for Innolux N125HCE-GN1 * panel/s6e63m0: More backlight levels; Fix init; Cleanups * via: Clenunps * virtio: Use fence ID for handling fences; Cleanups Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20201127083055.GA29139@linux-uq9g
2020-12-14Merge tag 'x86-apic-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: "Yet another large set of x86 interrupt management updates: - Simplification and distangling of the MSI related functionality - Let IO/APIC construct the RTE entries from an MSI message instead of having IO/APIC specific code in the interrupt remapping drivers - Make the retrieval of the parent interrupt domain (vector or remap unit) less hardcoded and use the relevant irqdomain callbacks for selection. - Allow the handling of more than 255 CPUs without a virtualized IOMMU when the hypervisor supports it. This has made been possible by the above modifications and also simplifies the existing workaround in the HyperV specific virtual IOMMU. - Cleanup of the historical timer_works() irq flags related inconsistencies" * tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/ioapic: Cleanup the timer_works() irqflags mess iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select() iommu/amd: Fix IOMMU interrupt generation in X2APIC mode iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled iommu/amd: Fix union of bitfields in intcapxt support x86/ioapic: Correct the PCI/ISA trigger type selection x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available x86/apic: Support 15 bits of APIC ID in MSI where available x86/ioapic: Handle Extended Destination ID field in RTE iommu/vt-d: Simplify intel_irq_remapping_select() x86: Kill all traces of irq_remapping_get_irq_domain() x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain iommu/hyper-v: Implement select() method on remapping irqdomain iommu/vt-d: Implement select() method on remapping irqdomain iommu/amd: Implement select() method on remapping irqdomain x86/apic: Add select() method on vector irqdomain ...
2020-12-14Merge tag 'core-mm-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...
2020-12-14Merge tag 'sched-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - migrate_disable/enable() support which originates from the RT tree and is now a prerequisite for the new preemptible kmap_local() API which aims to replace kmap_atomic(). - A fair amount of topology and NUMA related improvements - Improvements for the frequency invariant calculations - Enhanced robustness for the global CPU priority tracking and decision making - The usual small fixes and enhancements all over the place * tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) sched/fair: Trivial correction of the newidle_balance() comment sched/fair: Clear SMT siblings after determining the core is not idle sched: Fix kernel-doc markup x86: Print ratio freq_max/freq_base used in frequency invariance calculations x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC x86, sched: Calculate frequency invariance for AMD systems irq_work: Optimize irq_work_single() smp: Cleanup smp_call_function*() irq_work: Cleanup sched: Limit the amount of NUMA imbalance that can exist at fork time sched/numa: Allow a floating imbalance between NUMA nodes sched: Avoid unnecessary calculation of load imbalance at clone time sched/numa: Rename nr_running and break out the magic number sched: Make migrate_disable/enable() independent of RT sched/topology: Condition EAS enablement on FIE support arm64: Rebuild sched domains on invariance status changes sched/topology,schedutil: Wrap sched domains rebuild sched/uclamp: Allow to reset a task uclamp constraint value sched/core: Fix typos in comments Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug ...
2020-12-14Merge tag 'timers-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers and timekeeping updates from Thomas Gleixner: "Core: - Robustness improvements for the NOHZ tick management - Fixes and consolidation of the NTP/RTC synchronization code - Small fixes and improvements in various places - A set of function documentation udpates and fixes Drivers: - Cleanups and improvements in various clocksoure/event drivers - Removal of the EZChip NPS clocksource driver as the platfrom support was removed from ARC - The usual set of new device tree binding and json conversions - The RTC driver which have been acked by the RTC maintainer: * fix a long standing bug in the MC146818 library code which can cause reading garbage during the RTC internal update. * changes related to the NTP/RTC consolidation work" * tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) ntp: Fix prototype in the !CONFIG_GENERIC_CMOS_UPDATE case tick/sched: Make jiffies update quick check more robust ntp: Consolidate the RTC update implementation ntp: Make the RTC sync offset less obscure ntp, rtc: Move rtc_set_ntp_time() to ntp code ntp: Make the RTC synchronization more reliable rtc: core: Make the sync offset default more realistic rtc: cmos: Make rtc_cmos sync offset correct rtc: mc146818: Reduce spinlock section in mc146818_set_time() rtc: mc146818: Prevent reading garbage clocksource/drivers/sh_cmt: Fix potential deadlock when calling runtime PM clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI clocksource/drivers/ingenic: Fix section mismatch clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent() dt-bindings: timer: renesas: tmu: Convert to json-schema dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path ...
2020-12-14Merge tag 'perf-kprobes-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf/kprobes updates from Thomas Gleixner: "Make kretprobes lockless to avoid the rp->lock performance and potential lock ordering issues" * tag 'perf-kprobes-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomics: Regenerate the atomics-check SHA1's kprobes: Replace rp->free_instance with freelist freelist: Implement lockless freelist asm-generic/atomic: Add try_cmpxchg() fallbacks kprobes: Remove kretprobe hash llist: Add nonatomic __llist_add() and __llist_dell_all()
2020-12-14Merge tag 'perf-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Better handling of page table leaves on archictectures which have architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. - Prevent a deadlock vs exec_update_mutex Architectures: - The related updates for page size calculation of leaf entries - The usual churn to support new CPUs - Small fixes and improvements all over the place" * tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/intel: Add Tremont Topdown support uprobes/x86: Fix fall-through warnings for Clang perf/x86: Fix fall-through warnings for Clang kprobes/x86: Fix fall-through warnings for Clang perf/x86/intel/lbr: Fix the return type of get_lbr_cycles() perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake x86/kprobes: Restore BTF if the single-stepping is cancelled perf: Break deadlock involving exec_update_mutex sparc64/mm: Implement pXX_leaf_size() support powerpc/8xx: Implement pXX_leaf_size() support arm64/mm: Implement pXX_leaf_size() support perf/core: Fix arch_perf_get_page_size() mm: Introduce pXX_leaf_size() mm/gup: Provide gup_get_pte() more generic perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY perf/x86/intel/uncore: Add Rocket Lake support perf/x86/msr: Add Rocket Lake CPU support perf/x86/cstate: Add Rocket Lake CPU support perf/x86/intel: Add Rocket Lake CPU support perf,mm: Handle non-page-table-aligned hugetlbfs ...
2020-12-14Merge tag 'locking-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A moderate set of locking updates: - A few extensions to the rwsem API and support for opportunistic spinning and lock stealing - lockdep selftest improvements - Documentation updates - Cleanups and small fixes all over the place" * tag 'locking-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) seqlock: kernel-doc: Specify when preemption is automatically altered seqlock: Prefix internal seqcount_t-only macros with a "do_" Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g locking/rwsem: Remove reader optimistic spinning locking/rwsem: Enable reader optimistic lock stealing locking/rwsem: Prevent potential lock starvation locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath() locking/rwsem: Fold __down_{read,write}*() locking/rwsem: Introduce rwsem_write_trylock() locking/rwsem: Better collate rwsem_read_trylock() rwsem: Implement down_read_interruptible rwsem: Implement down_read_killable_nested refcount: Fix a kernel-doc markup completion: Drop init_completion define atomic: Update MAINTAINERS atomic: Delete obsolete documentation seqlock: Rename __seqprop() users lockdep/selftest: Add spin_nest_lock test lockdep/selftests: Fix PROVE_RAW_LOCK_NESTING seqlock: avoid -Wshadow warnings ...
2020-12-14Merge tag 'core-rcu-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Thomas Gleixner: "RCU, LKMM and KCSAN updates collected by Paul McKenney. RCU: - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs - Lockdep-RCU updates reducing the need for __maybe_unused - Tasks-RCU updates - Miscellaneous fixes - Documentation updates - Torture-test updates KCSAN: - updates for selftests, avoiding setting watchpoints on NULL pointers - fix to watchpoint encoding LKMM: - updates for documentation along with some updates to example-code litmus tests" * tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) srcu: Take early exit on memory-allocation failure rcu/tree: Defer kvfree_rcu() allocation to a clean context rcu: Do not report strict GPs for outgoing CPUs rcu: Fix a typo in rcu_blocking_is_gp() header comment rcu: Prevent lockdep-RCU splats on lock acquisition/release rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs rcu,ftrace: Fix ftrace recursion rcu/tree: Make struct kernel_param_ops definitions const rcu/tree: Add a warning if CPU being onlined did not report QS already rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config rcu: Fix single-CPU check in rcu_blocking_is_gp() rcu: Implement rcu_segcblist_is_offloaded() config dependent list.h: Update comment to explicitly note circular lists rcu: Panic after fixed number of stalls x86/smpboot: Move rcu_cpu_starting() earlier rcu: Allow rcu_irq_enter_check_tick() from NMI tools/memory-model: Label MP tests' producers and consumers tools/memory-model: Use "buf" and "flag" for message-passing tests tools/memory-model: Add types to litmus tests tools/memory-model: Add a glossary of LKMM terms ...
2020-12-14Merge tag 'core-entry-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry/exit updates from Thomas Gleixner: "A set of updates for entry/exit handling: - More generalization of entry/exit functionality - The consolidation work to reclaim TIF flags on x86 and also for non-x86 specific TIF flags which are solely relevant for syscall related work and have been moved into their own storage space. The x86 specific part had to be merged in to avoid a major conflict. - The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal delivery mode of task work and results in an impressive performance improvement for io_uring. The non-x86 consolidation of this is going to come seperate via Jens. - The selective syscall redirection facility which provides a clean and efficient way to support the non-Linux syscalls of WINE by catching them at syscall entry and redirecting them to the user space emulation. This can be utilized for other purposes as well and has been designed carefully to avoid overhead for the regular fastpath. This includes the core changes and the x86 support code. - Simplification of the context tracking entry/exit handling for the users of the generic entry code which guarantee the proper ordering and protection. - Preparatory changes to make the generic entry code accomodate S390 specific requirements which are mostly related to their syscall restart mechanism" * tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) entry: Add syscall_exit_to_user_mode_work() entry: Add exit_to_user_mode() wrapper entry_Add_enter_from_user_mode_wrapper entry: Rename exit_to_user_mode() entry: Rename enter_from_user_mode() docs: Document Syscall User Dispatch selftests: Add benchmark for syscall user dispatch selftests: Add kselftest for syscall user dispatch entry: Support Syscall User Dispatch on common syscall entry kernel: Implement selective syscall userspace redirection signal: Expose SYS_USER_DISPATCH si_code type x86: vdso: Expose sigreturn address on vdso to the kernel MAINTAINERS: Add entry for common entry code entry: Fix boot for !CONFIG_GENERIC_ENTRY x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs sched: Detect call to schedule from critical entry code context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK x86: Reclaim unused x86 TI flags ...
2020-12-14Merge tag 'fixes-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull misc fixes from Christian Brauner: "This contains several fixes which felt worth being combined into a single branch: - Use put_nsproxy() instead of open-coding it switch_task_namespaces() - Kirill's work to unify lifecycle management for all namespaces. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This work allows us to unify the type of the counters and reduces maintenance cost by moving the counter in one place and indicating that basic lifetime management is identical for all namespaces. - Peilin's fix adding three byte padding to Dmitry's PTRACE_GET_SYSCALL_INFO uapi struct to prevent an info leak. - Two smal patches to convert from the /* fall through */ comment annotation to the fallthrough keyword annotation which I had taken into my branch and into -next before df561f6688fe ("treewide: Use fallthrough pseudo-keyword") made it upstream which fixed this tree-wide. Since I didn't want to invalidate all testing for other commits I didn't rebase and kept them" * tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: nsproxy: use put_nsproxy() in switch_task_namespaces() sys: Convert to the new fallthrough notation signal: Convert to the new fallthrough notation time: Use generic ns_common::count cgroup: Use generic ns_common::count mnt: Use generic ns_common::count user: Use generic ns_common::count pid: Use generic ns_common::count ipc: Use generic ns_common::count uts: Use generic ns_common::count net: Use generic ns_common::count ns: Add a common refcount into ns_common ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info()
2020-12-14Merge tag 'time-namespace-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull time namespace updates from Christian Brauner: "When time namespaces were introduced we missed to virtualize the 'btime' field in /proc/stat. This confuses tasks which are in another time namespace with a virtualized boottime which is common in some container workloads. This contains Michael's series to fix 'btime' which Thomas asked me to take through my tree. To fix 'btime' virtualization we simply subtract the offset of the time namespace's boottime from btime before printing the stats. Note that since start_boottime of processes are seconds since boottime and the boottime stamp is now shifted according to the time namespace's offset, the offset of the time namespace also needs to be applied before the process stats are given to userspace. This avoids that processes shown by tools such as 'ps' appear as time travelers in the corresponding time namespace. Selftests are included to verify that btime virtualization in /proc/stat works as expected" * tag 'time-namespace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: namespace: make timens_on_fork() return nothing selftests/timens: added selftest for /proc/stat btime fs/proc: apply the time namespace offset to /proc/stat btime timens: additional helper functions for boottime offset handling
2020-12-14Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Expose tag address bits in siginfo. The original arm64 ABI did not expose any of the bits 63:56 of a tagged address in siginfo. In the presence of user ASAN or MTE, this information may be useful. The implementation is generic to other architectures supporting tags (like SPARC ADI, subject to wiring up the arch code). The user will have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra bits, if available, become visible in si_addr. - Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the lowest 1GB to cope with the Raspberry Pi 4 limitations, to the detriment of other platforms. With these changes, the kernel scans the Device Tree dma-ranges and the ACPI IORT information before deciding on a smaller ZONE_DMA. - Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building with LTO, there is an increased risk of the compiler converting an address dependency headed by a READ_ONCE() invocation into a control dependency and consequently allowing for harmful reordering by the CPU. - Add CPPC FFH support using arm64 AMU counters. - set_fs() removal on arm64. This renders the User Access Override (UAO) ARMv8 feature unnecessary. - Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs identifier file for SMMUv3, stop event counters support for i.MX8MP, enable the perf events-based hard lockup detector. - Reorganise the kernel VA space slightly so that 52-bit VA configurations can use more virtual address space. - Improve the robustness of the arm64 memory offline event notifier. - Pad the Image header to 64K following the EFI header definition updated recently to increase the section alignment to 64K. - Support CONFIG_CMDLINE_EXTEND on arm64. - Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8 bits for PtrAuth. - Switch to vmapped shadow call stacks. - Miscellaneous clean-ups. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) perf/imx_ddr: Add system PMU identifier for userspace bindings: perf: imx-ddr: add compatible string arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE arm64: mark __system_matches_cap as __maybe_unused arm64: uaccess: remove vestigal UAO support arm64: uaccess: remove redundant PAN toggling arm64: uaccess: remove addr_limit_user_check() arm64: uaccess: remove set_fs() arm64: uaccess cleanup macro naming arm64: uaccess: split user/kernel routines arm64: uaccess: refactor __{get,put}_user arm64: uaccess: simplify __copy_user_flushcache() arm64: uaccess: rename privileged uaccess routines arm64: sdei: explicitly simulate PAN/UAO entry arm64: sdei: move uaccess logic to arch/arm64/ arm64: head.S: always initialize PSTATE arm64: head.S: cleanup SCTLR_ELx initialization arm64: head.S: rename el2_setup -> init_kernel_el arm64: add C wrappers for SET_PSTATE_*() ...
2020-12-14Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-12-14 1) Expose bpf_sk_storage_*() helpers to iterator programs, from Florent Revest. 2) Add AF_XDP selftests based on veth devs to BPF selftests, from Weqaar Janjua. 3) Support for finding BTF based kernel attach targets through libbpf's bpf_program__set_attach_target() API, from Andrii Nakryiko. 4) Permit pointers on stack for helper calls in the verifier, from Yonghong Song. 5) Fix overflows in hash map elem size after rlimit removal, from Eric Dumazet. 6) Get rid of direct invocation of llc in BPF selftests, from Andrew Delgadillo. 7) Fix xsk_recvmsg() to reorder socket state check before access, from Björn Töpel. 8) Add new libbpf API helper to retrieve ring buffer epoll fd, from Brendan Jackman. 9) Batch of minor BPF selftest improvements all over the place, from Florian Lehner, KP Singh, Jiri Olsa and various others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits) selftests/bpf: Add a test for ptr_to_map_value on stack for helper access bpf: Permits pointers on stack for helper calls libbpf: Expose libbpf ring_buffer epoll_fd selftests/bpf: Add set_attach_target() API selftest for module target libbpf: Support modules in bpf_program__set_attach_target() API selftests/bpf: Silence ima_setup.sh when not running in verbose mode. selftests/bpf: Drop the need for LLVM's llc selftests/bpf: fix bpf_testmod.ko recompilation logic samples/bpf: Fix possible hang in xdpsock with multiple threads selftests/bpf: Make selftest compilation work on clang 11 selftests/bpf: Xsk selftests - adding xdpxceiver to .gitignore selftests/bpf: Drop tcp-{client,server}.py from Makefile selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRV selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRV selftests/bpf: Xsk selftests - DRV POLL, NOPOLL selftests/bpf: Xsk selftests - SKB POLL, NOPOLL selftests/bpf: Xsk selftests framework bpf: Only provide bpf_sock_from_file with CONFIG_NET bpf: Return -ENOTSUPP when attaching to non-kernel BTF xsk: Validate socket state in xsk_recvmsg, prior touching socket members ... ==================== Link: https://lore.kernel.org/r/20201214214316.20642-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14rcu-tasks: Move RCU-tasks initialization to before early_initcall()Uladzislau Rezki (Sony)
PowerPC testing encountered boot failures due to RCU Tasks not being fully initialized until core_initcall() time. This commit therefore initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just before early_initcall() time, thus allowing waiting on RCU Tasks grace periods from early_initcall() handlers. Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/ Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-12-14libceph: drop ceph_auth_{create,update}_authorizer()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, ceph: implement msgr2.1 protocol (crc and secure modes)Ilya Dryomov
Implement msgr2.1 wire protocol, available since nautilus 14.2.11 and octopus 15.2.5. msgr2.0 wire protocol is not implemented -- it has several security, integrity and robustness issues and therefore considered deprecated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>