summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2020-09-02module: Add more error message for failed kernel module loadingQu Wenruo
When kernel module loading failed, user space only get one of the following error messages: - ENOEXEC This is the most confusing one. From corrupted ELF header to bad WRITE|EXEC flags check introduced by in module_enforce_rwx_sections() all returns this error number. - EPERM This is for blacklisted modules. But mod doesn't do extra explain on this error either. - ENOMEM The only error which needs no explain. This means, if a user got "Exec format error" from modprobe, it provides no meaningful way for the user to debug, and will take extra time communicating to get extra info. So this patch will add extra error messages for -ENOEXEC and -EPERM errors, allowing user to do better debugging and reporting. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-09-01 The following pull-request contains BPF updates for your *net-next* tree. There are two small conflicts when pulling, resolve as follows: 1) Merge conflict in tools/lib/bpf/libbpf.c between 88a82120282b ("libbpf: Factor out common ELF operations and improve logging") in bpf-next and 1e891e513e16 ("libbpf: Fix map index used in error message") in net-next. Resolve by taking the hunk in bpf-next: [...] scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); data = elf_sec_data(obj, scn); if (!scn || !data) { pr_warn("elf: failed to get %s map definitions for %s\n", MAPS_ELF_SEC, obj->path); return -EINVAL; } [...] 2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between 9647c57b11e5 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better performance") in bpf-next and e20f0dbf204f ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like: [...] xdp_set_data_meta_invalid(xdp); xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); net_prefetch(xdp->data); [...] We've added 133 non-merge commits during the last 14 day(s) which contain a total of 246 files changed, 13832 insertions(+), 3105 deletions(-). The main changes are: 1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper for tracing to reliably access user memory, from Alexei Starovoitov. 2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau. 3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa. 4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson. 5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko. 6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh. 7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer. 8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song. 9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant. 10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee. 11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua. 12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski. 13) Various smaller cleanups and improvements all over the place. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01bpf: {cpu,dev}map: Change various functions return type from int to voidBjörn Töpel
The functions bq_enqueue(), bq_flush_to_queue(), and bq_xmit_all() in {cpu,dev}map.c always return zero. Changing the return type from int to void makes the code easier to follow. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200901083928.6199-1-bjorn.topel@gmail.com
2020-09-01Merge branch 'master' into for-nextJiri Kosina
Sync with Linus' branch in order to be able to apply fixups of more recent patches.
2020-09-01tracepoint: Optimize using static_call()Steven Rostedt (VMware)
Currently the tracepoint site will iterate a vector and issue indirect calls to however many handlers are registered (ie. the vector is long). Using static_call() it is possible to optimize this for the common case of only having a single handler registered. In this case the static_call() can directly call this handler. Otherwise, if the vector is longer than 1, call a function that iterates the whole vector like the current code. [peterz: updated to new interface] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20200818135805.279421092@infradead.org
2020-09-01static_call: Allow early initPeter Zijlstra
In order to use static_call() to wire up x86_pmu, we need to initialize earlier, specifically before memory allocation works; copy some of the tricks from jump_label to enable this. Primarily we overload key->next to store a sites pointer when there are no modules, this avoids having to use kmalloc() to initialize the sites and allows us to run much earlier. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20200818135805.220737930@infradead.org
2020-09-01static_call: Handle tail-callsPeter Zijlstra
GCC can turn our static_call(name)(args...) into a tail call, in which case we get a JMP.d32 into the trampoline (which then does a further tail-call). Teach objtool to recognise and mark these in .static_call_sites and adjust the code patching to deal with this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20200818135805.101186767@infradead.org
2020-09-01static_call: Add simple self-test for static callsPeter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200818135804.922581202@infradead.org
2020-09-01static_call: Avoid kprobes on inline static_call()sPeter Zijlstra
Similar to how we disallow kprobes on any other dynamic text (ftrace/jump_label) also disallow kprobes on inline static_call()s. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200818135804.744920586@infradead.org
2020-09-01static_call: Add inline static call infrastructureJosh Poimboeuf
Add infrastructure for an arch-specific CONFIG_HAVE_STATIC_CALL_INLINE option, which is a faster version of CONFIG_HAVE_STATIC_CALL. At runtime, the static call sites are patched directly, rather than using the out-of-line trampolines. Compared to out-of-line static calls, the performance benefits are more modest, but still measurable. Steven Rostedt did some tracepoint measurements: https://lkml.kernel.org/r/20181126155405.72b4f718@gandalf.local.home This code is heavily inspired by the jump label code (aka "static jumps"), as some of the concepts are very similar. For more details, see the comments in include/linux/static_call.h. [peterz: simplified interface; merged trampolines] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20200818135804.684334440@infradead.org
2020-09-01jump_label,module: Fix module lifetime for __jump_label_mod_text_reserved()Peter Zijlstra
Nothing ensures the module exists while we're iterating mod->jump_entries in __jump_label_mod_text_reserved(), take a module reference to ensure the module sticks around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20200818135804.504501338@infradead.org
2020-09-01module: Properly propagate MODULE_STATE_COMING failurePeter Zijlstra
Now that notifiers got unbroken; use the proper interface to handle notifier errors and propagate them. There were already MODULE_STATE_COMING notifiers that failed; notably: - jump_label_module_notifier() - tracepoint_module_notify() - bpf_event_notify() By propagating this error, we fix those users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Jessica Yu <jeyu@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20200818135804.444372853@infradead.org
2020-09-01module: Fix up module_notifier return valuesPeter Zijlstra
While auditing all module notifiers I noticed a whole bunch of fail wrt the return value. Notifiers have a 'special' return semantics. As is; NOTIFY_DONE vs NOTIFY_OK is a bit vague; but notifier_from_errno(0) results in NOTIFY_OK and NOTIFY_DONE has a comment that says "Don't care". From this I've used NOTIFY_DONE when the function completely ignores the callback and notifier_to_error() isn't used. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Robert Richter <rric@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20200818135804.385360407@infradead.org
2020-09-01notifier: Fix broken error handling patternPeter Zijlstra
The current notifiers have the following error handling pattern all over the place: int err, nr; err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr); if (err & NOTIFIER_STOP_MASK) __foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL) And aside from the endless repetition thereof, it is broken. Consider blocking notifiers; both calls take and drop the rwsem, this means that the notifier list can change in between the two calls, making @nr meaningless. Fix this by replacing all the __foo_notifier_call_chain() functions with foo_notifier_call_chain_robust() that embeds the above pattern, but ensures it is inside a single lock region. Note: I switched atomic_notifier_call_chain_robust() to use the spinlock, since RCU cannot provide the guarantee required for the recovery. Note: software_resume() error handling was broken afaict. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org
2020-09-01mm: cma: use CMA_MAX_NAME to define the length of cma name arrayBarry Song
CMA_MAX_NAME should be visible to CMA's users as they might need it to set the name of CMA areas and avoid hardcoding the size locally. So this patch moves CMA_MAX_NAME from local header file to include/linux header file and removes the hardcode in both hugetlb.c and contiguous.c. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-01dma-contiguous: provide the ability to reserve per-numa CMABarry Song
Right now, drivers like ARM SMMU are using dma_alloc_coherent() to get coherent DMA buffers to save their command queues and page tables. As there is only one default CMA in the whole system, SMMUs on nodes other than node0 will get remote memory. This leads to significant latency. This patch provides per-numa CMA so that drivers like SMMU can get local memory. Tests show localizing CMA can decrease dma_unmap latency much. For instance, before this patch, SMMU on node2 has to wait for more than 560ns for the completion of CMD_SYNC in an empty command queue; with this patch, it needs 240ns only. A positive side effect of this patch would be improving performance even further for those users who are worried about performance more than DMA security and use iommu.passthrough=1 to skip IOMMU. With local CMA, all drivers can get local coherent DMA buffers. Also, this patch changes the default CONFIG_CMA_AREAS to 19 in NUMA. As 1+CONFIG_CMA_AREAS should be quite enough for most servers on the market even they enable both hugetlb_cma and pernuma_cma. 2 numa nodes: 2(hugetlb) + 2(pernuma) + 1(default global cma) = 5 4 numa nodes: 4(hugetlb) + 4(pernuma) + 1(default global cma) = 9 8 numa nodes: 8(hugetlb) + 8(pernuma) + 1(default global cma) = 17 Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-31bpf: Remove bpf_lsm_file_mprotect from sleepable list.Alexei Starovoitov
Technically the bpf programs can sleep while attached to bpf_lsm_file_mprotect, but such programs need to access user memory. So they're in might_fault() category. Which means they cannot be called from file_mprotect lsm hook that takes write lock on mm->mmap_lock. Adjust the test accordingly. Also add might_fault() to __bpf_prog_enter_sleepable() to catch such deadlocks early. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Fixes: e68a144547fc ("selftests/bpf: Add sleepable tests") Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200831201651.82447-1-alexei.starovoitov@gmail.com
2020-08-31bpf: Fix build without BPF_LSM.Alexei Starovoitov
resolve_btfids doesn't like empty set. Add unused ID when BPF_LSM is off. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Reported-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200831163132.66521-1-alexei.starovoitov@gmail.com
2020-08-30kcsan: Use tracing-safe version of prandomMarco Elver
In the core runtime, we must minimize any calls to external library functions to avoid any kind of recursion. This can happen even though instrumentation is disabled for called functions, but tracing is enabled. Most recently, prandom_u32() added a tracepoint, which can cause problems for KCSAN even if the rcuidle variant is used. For example: kcsan -> prandom_u32() -> trace_prandom_u32_rcuidle -> srcu_read_lock_notrace -> __srcu_read_lock -> kcsan ... While we could disable KCSAN in kcsan_setup_watchpoint(), this does not solve other unexpected behaviour we may get due recursing into functions that may not be tolerant to such recursion: __srcu_read_lock -> kcsan -> ... -> __srcu_read_lock Therefore, switch to using prandom_u32_state(), which is uninstrumented, and does not have a tracepoint. Link: https://lkml.kernel.org/r/20200821063043.1949509-1-elver@google.com Link: https://lkml.kernel.org/r/20200820172046.GA177701@elver.google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-30Merge tag 'x86-urgent-2020-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three interrupt related fixes for X86: - Move disabling of the local APIC after invoking fixup_irqs() to ensure that interrupts which are incoming are noted in the IRR and not ignored. - Unbreak affinity setting. The rework of the entry code reused the regular exception entry code for device interrupts. The vector number is pushed into the errorcode slot on the stack which is then lifted into an argument and set to -1 because that's regs->orig_ax which is used in quite some places to check whether the entry came from a syscall. But it was overlooked that orig_ax is used in the affinity cleanup code to validate whether the interrupt has arrived on the new target. It turned out that this vector check is pointless because interrupts are never moved from one vector to another on the same CPU. That check is a historical leftover from the time where x86 supported multi-CPU affinities, but not longer needed with the now strict single CPU affinity. Famous last words ... - Add a missing check for an empty cpumask into the matrix allocator. The affinity change added a warning to catch the case where an interrupt is moved on the same CPU to a different vector. This triggers because a condition with an empty cpumask returns an assignment from the allocator as the allocator uses for_each_cpu() without checking the cpumask for being empty. The historical inconsistent for_each_cpu() behaviour of ignoring the cpumask and unconditionally claiming that CPU0 is in the mask struck again. Sigh. plus a new entry into the MAINTAINER file for the HPE/UV platform" * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/matrix: Deal with the sillyness of for_each_cpu() on UP x86/irq: Unbreak interrupt affinity setting x86/hotplug: Silence APIC only after all interrupts are migrated MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
2020-08-30Merge tag 'locking-urgent-2020-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A set of fixes for lockdep, tracing and RCU: - Prevent recursion by using raw_cpu_* operations - Fixup the interrupt state in the cpu idle code to be consistent - Push rcu_idle_enter/exit() invocations deeper into the idle path so that the lock operations are inside the RCU watching sections - Move trace_cpu_idle() into generic code so it's called before RCU goes idle. - Handle raw_local_irq* vs. local_irq* operations correctly - Move the tracepoints out from under the lockdep recursion handling which turned out to be fragile and inconsistent" * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep,trace: Expose tracepoints lockdep: Only trace IRQ edges mips: Implement arch_irqs_disabled() arm64: Implement arch_irqs_disabled() nds32: Implement arch_irqs_disabled() locking/lockdep: Cleanup x86/entry: Remove unused THUNKs cpuidle: Move trace_cpu_idle() into generic code cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic sched,idle,rcu: Push rcu_idle deeper into the idle path cpuidle: Fixup IRQ state lockdep: Use raw_cpu_*() for per-cpu variables
2020-08-30genirq/matrix: Deal with the sillyness of for_each_cpu() on UPThomas Gleixner
Most of the CPU mask operations behave the same way, but for_each_cpu() and it's variants ignore the cpumask argument and claim that CPU0 is always in the mask. This is historical, inconsistent and annoying behaviour. The matrix allocator uses for_each_cpu() and can be called on UP with an empty cpumask. The calling code does not expect that this succeeds but until commit e027fffff799 ("x86/irq: Unbreak interrupt affinity setting") this went unnoticed. That commit added a WARN_ON() to catch cases which move an interrupt from one vector to another on the same CPU. The warning triggers on UP. Add a check for the cpumask being empty to prevent this. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2020-08-28bpf: Add bpf_copy_from_user() helper.Alexei Starovoitov
Sleepable BPF programs can now use copy_from_user() to access user memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
2020-08-28bpf: Introduce sleepable BPF programsAlexei Starovoitov
Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(); Updates to trampoline now has to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish. This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-28bpf: Relax max_entries check for most of the inner map typesMartin KaFai Lau
Most of the maps do not use max_entries during verification time. Thus, those map_meta_equal() do not need to enforce max_entries when it is inserted as an inner map during runtime. The max_entries check is removed from the default implementation bpf_map_meta_equal(). The prog_array_map and xsk_map are exception. Its map_gen_lookup uses max_entries to generate inline lookup code. Thus, they will implement its own map_meta_equal() to enforce max_entries. Since there are only two cases now, the max_entries check is not refactored and stays in its own .c file. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com
2020-08-28bpf: Add map_meta_equal map opsMartin KaFai Lau
Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28Merge tag 'drm-misc-next-2020-08-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.10: UAPI Changes: Cross-subsystem Changes: Core Changes: - ttm: various cleanups and reworks of the API Driver Changes: - ast: various cleanups - gma500: A few fixes, conversion to GPIOd API - hisilicon: Change of maintainer, various reworks - ingenic: Clock handling and formats support improvements - mcde: improvements to the DSI support - mgag200: Support G200 desktop cards - mxsfb: Support the i.MX7 and i.MX8M and the alpha plane - panfrost: support devfreq - ps8640: Retrieve the EDID from eDP control, misc improvements - tidss: Add a workaround for AM65xx YUV formats handling - virtio: a few cleanups, support for virtio-gpu exported resources - bridges: Support the chained bridges on more drivers, new bridges: Toshiba TC358762, Toshiba TC358775, Lontium LT9611 - panels: Convert to dev_ based logging, read orientation from the DT, various fixes, new panels: Mantix MLAF057WE51-X, Chefree CH101OLHLWH-002, Powertip PH800480T013, KingDisplay KD116N21-30NV-A010 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200827155517.do6emeacetpturli@gilmour.lan
2020-08-27dma-pool: Fix an uninitialized variable bug in atomic_pool_expand()Dan Carpenter
The "page" pointer can be used with out being initialized. Fixes: d7e673ec2c8e ("dma-pool: Only allocate from CMA when in same memory zone") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-26bpf: Fix a verifier failure with xorYonghong Song
bpf selftest test_progs/test_sk_assign failed with llvm 11 and llvm 12. Compared to llvm 10, llvm 11 and 12 generates xor instruction which is not handled properly in verifier. The following illustrates the problem: 16: (b4) w5 = 0 17: ... R5_w=inv0 ... ... 132: (a4) w5 ^= 1 133: ... R5_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ... ... 37: (bc) w8 = w5 38: ... R5=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ... ... 41: (bc) w3 = w8 42: ... R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ... 45: (56) if w3 != 0x0 goto pc+1 ... R3_w=inv0 ... 46: (b7) r1 = 34 47: R1_w=inv34 R7=pkt(id=0,off=26,r=38,imm=0) 47: (0f) r7 += r1 48: R1_w=invP34 R3_w=inv0 R7_w=pkt(id=0,off=60,r=38,imm=0) 48: (b4) w9 = 0 49: R1_w=invP34 R3_w=inv0 R7_w=pkt(id=0,off=60,r=38,imm=0) 49: (69) r1 = *(u16 *)(r7 +0) invalid access to packet, off=60 size=2, R7(id=0,off=60,r=38) R7 offset is outside of the packet At above insn 132, w5 = 0, but after w5 ^= 1, we give a really conservative value of w5. At insn 45, in reality the condition should be always false. But due to conservative value for w3, the verifier evaluates it could be true and this later leads to verifier failure complaining potential packet out-of-bound access. This patch implemented proper XOR support in verifier. In the above example, we have: 132: R5=invP0 132: (a4) w5 ^= 1 133: R5_w=invP1 ... 37: (bc) w8 = w5 ... 41: (bc) w3 = w8 42: R3_w=invP1 ... 45: (56) if w3 != 0x0 goto pc+1 47: R3_w=invP1 ... processed 353 insns ... and the verifier can verify the program successfully. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200825064608.2017937-1-yhs@fb.com
2020-08-26bpf: verifier: Use target program's type for access verificationsUdip Pant
This patch adds changes in verifier to make decisions such as granting of read / write access or enforcement of return code status based on the program type of the target program while using dynamic program extension (of type BPF_PROG_TYPE_EXT). The BPF_PROG_TYPE_EXT type can be used to extend types such as XDP, SKB and others. Since the BPF_PROG_TYPE_EXT program type on itself is just a placeholder for those, we need this extended check for those extended programs to actually work with proper access, while using this option. Specifically, it introduces following changes: - may_access_direct_pkt_data: allow access to packet data based on the target prog - check_return_code: enforce return code based on the target prog (currently, this check is skipped for EXT program) - check_ld_abs: check for 'may_access_skb' based on the target prog - check_map_prog_compatibility: enforce the map compatibility check based on the target prog - may_update_sockmap: allow sockmap update based on the target prog Some other occurrences of prog->type is left as it without replacing with the 'resolved' type: - do_check_common() and check_attach_btf_id(): already have specific logic to handle the EXT prog type - jit_subprogs() and bpf_check(): Not changed for jit compilation or while inferring env->ops Next few patches in this series include selftests for some of these cases. Signed-off-by: Udip Pant <udippant@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825232003.2877030-2-udippant@fb.com
2020-08-26audit: Remove redundant null checkXu Wang
Because kfree_skb already checked NULL skb parameter, so the additional check is unnecessary, just remove it. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-26lockdep: Take read/write status in consideration when generate chainkeyBoqun Feng
Currently, the chainkey of a lock chain is a hash sum of the class_idx of all the held locks, the read/write status are not taken in to consideration while generating the chainkey. This could result into a problem, if we have: P1() { read_lock(B); lock(A); } P2() { lock(A); read_lock(B); } P3() { lock(A); write_lock(B); } , and P1(), P2(), P3() run one by one. And when running P2(), lockdep detects such a lock chain A -> B is not a deadlock, then it's added in the chain cache, and then when running P3(), even if it's a deadlock, we could miss it because of the hit of chain cache. This could be confirmed by self testcase "chain cached mixed R-L/L-W ". To resolve this, we use concept "hlock_id" to generate the chainkey, the hlock_id is a tuple (hlock->class_idx, hlock->read), which fits in a u16 type. With this, the chainkeys are different is the lock sequences have the same locks but different read/write status. Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks array now store the "hlock_id"s rather than lock_class indexes. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com
2020-08-26lockdep: Add recursive read locks into dependency graphBoqun Feng
Since we have all the fundamental to handle recursive read locks, we now add them into the dependency graph. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com
2020-08-26lockdep: Fix recursive read lock related safe->unsafe detectionBoqun Feng
Currently, in safe->unsafe detection, lockdep misses the fact that a LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may cause deadlock too, for example: P1 P2 <irq disabled> write_lock(l1); <irq enabled> read_lock(l2); write_lock(l2); <in irq> read_lock(l1); Actually, all of the following cases may cause deadlocks: LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_* LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_* LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ To fix this, we need to 1) change the calculation of exclusive_mask() so that READ bits are not dropped and 2) always call usage() in mark_lock_irq() to check usage deadlocks, even when the new usage of the lock is READ. Besides, adjust usage_match() and usage_acculumate() to recursive read lock changes. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com
2020-08-26lockdep: Adjust check_redundant() for recursive read changeBoqun Feng
check_redundant() will report redundancy if it finds a path could replace the about-to-add dependency in the BFS search. With recursive read lock changes, we certainly need to change the match function for the check_redundant(), because the path needs to match not only the lock class but also the dependency kinds. For example, if the about-to-add dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)-> .. -(*R)->B in the dependency graph with __bfs() (for simplicity, we can also say we find an -(SR)-> path from A to B), we can not replace the dependency with that path in the BFS search. Because the -(SN)-> dependency can make a strong path with a following -(S*)-> dependency, however an -(SR)-> path cannot. Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that means if we find a path which is stronger than or equal to the about-to-add dependency, we can report the redundancy. By "stronger", it means both the start and the end of the path are not weaker than the start and the end of the dependency (E is "stronger" than S and N is "stronger" than R), so that we can replace the dependency with that path. To make sure we find a path whose start point is not weaker than the about-to-add dependency, we use a trick: the ->only_xr of the root (start point) of __bfs() is initialized as @prev-> == 0, therefore if @prev is E, __bfs() will pick only -(E*)-> for the first dependency, otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency. To make sure we find a path whose end point is not weaker than the about-to-add dependency, we replace the match function for __bfs() check_redundant(), we check for the case that either @next is R (anything is not weaker than it) or the end point of the path is N (which is not weaker than anything). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com
2020-08-26lockdep: Support deadlock detection for recursive read locks in ↵Boqun Feng
check_noncircular() Currently, lockdep only has limit support for deadlock detection for recursive read locks. This patch support deadlock detection for recursive read locks. The basic idea is: We are about to add dependency B -> A in to the dependency graph, we use check_noncircular() to find whether we have a strong dependency path A -> .. -> B so that we have a strong dependency circle (a closed strong dependency path): A -> .. -> B -> A , which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->. Since A -> .. -> B is already a strong dependency path, so if either B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B -> A is strong, otherwise not. So we introduce a new match function hlock_conflict() to replace the class_equal() for the deadlock check in check_noncircular(). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com
2020-08-26lockdep: Make __bfs(.match) return boolBoqun Feng
The "match" parameter of __bfs() is used for checking whether we hit a match in the search, therefore it should return a boolean value rather than an integer for better readability. This patch then changes the return type of the function parameter and the match functions to bool. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com
2020-08-26lockdep: Extend __bfs() to work with multiple types of dependenciesBoqun Feng
Now we have four types of dependencies in the dependency graph, and not all the pathes carry real dependencies (the dependencies that may cause a deadlock), for example: Given lock A and B, if we have: CPU1 CPU2 ============= ============== write_lock(A); read_lock(B); read_lock(B); write_lock(A); (assuming read_lock(B) is a recursive reader) then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a dependency path A -(ER)-> B -(SN)-> A. In lockdep w/o recursive locks, a dependency path from A to A means a deadlock. However, the above case is obviously not a deadlock, because no one holds B exclusively, therefore no one waits for the other to release B, so who get A first in CPU1 and CPU2 will run non-blockingly. As a result, dependency path A -(ER)-> B -(SN)-> A is not a real/strong dependency that could cause a deadlock. From the observation above, we know that for a dependency path to be real/strong, no two adjacent dependencies can be as -(*R)-> -(S*)->. Now our mission is to make __bfs() traverse only the strong dependency paths, which is simple: we record whether we only have -(*R)-> for the previous lock_list of the path in lock_list::only_xr, and when we pick a dependency in the traverse, we 1) filter out -(S*)-> dependency if the previous lock_list only has -(*R)-> dependency (i.e. ->only_xr is true) and 2) set the next lock_list::only_xr to true if we only have -(*R)-> left after we filter out dependencies based on 1), otherwise, set it to false. With this extension for __bfs(), we now need to initialize the root of __bfs() properly (with a correct ->only_xr), to do so, we introduce some helper functions, which also cleans up a little bit for the __bfs() root initialization code. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-8-boqun.feng@gmail.com
2020-08-26lockdep: Introduce lock_list::depBoqun Feng
To add recursive read locks into the dependency graph, we need to store the types of dependencies for the BFS later. There are four types of dependencies: * Exclusive -> Non-recursive dependencies: EN e.g. write_lock(prev) held and try to acquire write_lock(next) or non-recursive read_lock(next), which can be represented as "prev -(EN)-> next" * Shared -> Non-recursive dependencies: SN e.g. read_lock(prev) held and try to acquire write_lock(next) or non-recursive read_lock(next), which can be represented as "prev -(SN)-> next" * Exclusive -> Recursive dependencies: ER e.g. write_lock(prev) held and try to acquire recursive read_lock(next), which can be represented as "prev -(ER)-> next" * Shared -> Recursive dependencies: SR e.g. read_lock(prev) held and try to acquire recursive read_lock(next), which can be represented as "prev -(SR)-> next" So we use 4 bits for the presence of each type in lock_list::dep. Helper functions and macros are also introduced to convert a pair of locks into lock_list::dep bit and maintain the addition of different types of dependencies. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-7-boqun.feng@gmail.com
2020-08-26lockdep: Reduce the size of lock_list::distanceBoqun Feng
lock_list::distance is always not greater than MAX_LOCK_DEPTH (which is 48 right now), so a u16 will fit. This patch reduces the size of lock_list::distance to save space, so that we can introduce other fields to help detect recursive read lock deadlocks without increasing the size of lock_list structure. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-6-boqun.feng@gmail.com
2020-08-26lockdep: Make __bfs() visit every dependency until a matchBoqun Feng
Currently, __bfs() will do a breadth-first search in the dependency graph and visit each lock class in the graph exactly once, so for example, in the following graph: A ---------> B | ^ | | +----------> C a __bfs() call starts at A, will visit B through dependency A -> B and visit C through dependency A -> C and that's it, IOW, __bfs() will not visit dependency C -> B. This is OK for now, as we only have strong dependencies in the dependency graph, so whenever there is a traverse path from A to B in __bfs(), it means A has strong dependencies to B (IOW, B depends on A strongly). So no need to visit all dependencies in the graph. However, as we are going to add recursive-read lock into the dependency graph, as a result, not all the paths mean strong dependencies, in the same example above, dependency A -> B may be a weak dependency and traverse A -> C -> B may be a strong dependency path. And with the old way of __bfs() (i.e. visiting every lock class exactly once), we will miss the strong dependency path, which will result into failing to find a deadlock. To cure this for the future, we need to find a way for __bfs() to visit each dependency, rather than each class, exactly once in the search until we find a match. The solution is simple: We used to mark lock_class::lockdep_dependency_gen_id to indicate a class has been visited in __bfs(), now we change the semantics a little bit: we now mark lock_class::lockdep_dependency_gen_id to indicate _all the dependencies_ in its lock_{after,before} have been visited in the __bfs() (note we only take one direction in a __bfs() search). In this way, every dependency is guaranteed to be visited until we find a match. Note: the checks in mark_lock_accessed() and lock_accessed() are removed, because after this modification, we may call these two functions on @source_entry of __bfs(), which may not be the entry in "list_entries" Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-5-boqun.feng@gmail.com
2020-08-26lockdep: Demagic the return value of BFSBoqun Feng
__bfs() could return four magic numbers: 1: search succeeds, but none match. 0: search succeeds, find one match. -1: search fails because of the cq is full. -2: search fails because a invalid node is found. This patch cleans things up by using a enum type for the return value of __bfs() and its friends, this improves the code readability of the code, and further, could help if we want to extend the BFS. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-4-boqun.feng@gmail.com
2020-08-26locking: More accurate annotations for read_lock()Boqun Feng
On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive read lock, actually it's only recursive if in_interrupt() is true. So change the annotation accordingly to catch more deadlocks. Note we used to treat read_lock() as pure recursive read locks in lib/locking-seftest.c, and this is useful, especially for the lockdep development selftest, so we keep this via a variable to force switching lock annotation for read_lock(). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-2-boqun.feng@gmail.com
2020-08-26sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.hValentin Schneider
SD_DEGENERATE_GROUPS_MASK is only useful for sched/topology.c, but still gets defined for anyone who imports topology.h, leading to a flurry of unused variable warnings. Move it out of the header and place it next to the SD degeneration functions in sched/topology.c. Fixes: 4ee4ea443a5d ("sched/topology: Introduce SD metaflag for flags needing > 1 groups") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200825133216.9163-2-valentin.schneider@arm.com
2020-08-26sched/topology: Move sd_flag_debug out of linux/sched/topology.hValentin Schneider
Defining an array in a header imported all over the place clearly is a daft idea, that still didn't stop me from doing it. Leave a declaration of sd_flag_debug in topology.h and move its definition to sched/debug.c. Fixes: b6e862f38672 ("sched/topology: Define and assign sched_domain flag metadata") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200825133216.9163-1-valentin.schneider@arm.com
2020-08-26sched: Cache task_struct::flags in sched_submit_work()Sebastian Andrzej Siewior
sched_submit_work() is considered to be a hot path. The preempt_disable() instruction is a compiler barrier and forces the compiler to load task_struct::flags for the second comparison. By using a local variable, the compiler can load the value once and keep it in a register for the second comparison. Verified on x86-64 with gcc-10. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200819200025.lqvmyefqnbok5i4f@linutronix.de
2020-08-26sched/fair: Simplify the work when reweighting entityJiang Biao
The code in reweight_entity() can be simplified. For a sched entity on the rq, the entity accounting can be replaced by cfs_rq instantaneous load updates currently called from within the entity accounting. Even though an entity on the rq can't represent a task in reweight_entity() (a task is always dequeued before calling this function) and so the numa task accounting and the rq->cfs_tasks list management of the entity accounting are never called, the redundant cfs_rq->nr_running decrement/increment will be avoided. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200811113209.34057-1-benbjiang@tencent.com
2020-08-26sched/fair: Fix wrong negative conversion in find_energy_efficient_cpu()Lukasz Luba
In find_energy_efficient_cpu() 'cpu_cap' could be less that 'util'. It might be because of RT, DL (so higher sched class than CFS), irq or thermal pressure signal, which reduce the capacity value. In such situation the result of 'cpu_cap - util' might be negative but stored in the unsigned long. Then it might be compared with other unsigned long when uclamp_rq_util_with() reduced the 'util' such that is passes the fits_capacity() check. Prevent this situation and make the arithmetic more safe. Fixes: 1d42509e475cd ("sched/fair: Make EAS wakeup placement consider uclamp restrictions") Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20200810083004.26420-1-lukasz.luba@arm.com
2020-08-26sched/fair: Ignore cache hotness for SMT migrationJosh Don
SMT siblings share caches, so cache hotness should be irrelevant for cross-sibling migration. Signed-off-by: Josh Don <joshdon@google.com> Proposed-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200804193413.510651-1-joshdon@google.com
2020-08-26lockdep,trace: Expose tracepointsPeter Zijlstra
The lockdep tracepoints are under the lockdep recursion counter, this has a bunch of nasty side effects: - TRACE_IRQFLAGS doesn't work across the entire tracepoint - RCU-lockdep doesn't see the tracepoints either, hiding numerous "suspicious RCU usage" warnings. Pull the trace_lock_*() tracepoints completely out from under the lockdep recursion handling and completely rely on the trace level recusion handling -- also, tracing *SHOULD* not be taking locks in any case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org