summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2021-12-07rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()Frederic Weisbecker
Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp, instead using a separate ->exp_deferred_qs field to record the need for an expedited quiescent state. In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because preemptible RCU's expedited grace periods use other mechanisms to record quiescent states. This commit therefore removes the implicit rcu_qs() reference to ->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07sched/rt: Try to restart rt period timer when rt runtime exceededLi Hua
When rt_runtime is modified from -1 to a valid control value, it may cause the task to be throttled all the time. Operations like the following will trigger the bug. E.g: 1. echo -1 > /proc/sys/kernel/sched_rt_runtime_us 2. Run a FIFO task named A that executes while(1) 3. echo 950000 > /proc/sys/kernel/sched_rt_runtime_us When rt_runtime is -1, The rt period timer will not be activated when task A enqueued. And then the task will be throttled after setting rt_runtime to 950,000. The task will always be throttled because the rt period timer is not activated. Fixes: d0b27fa77854 ("sched: rt-group: synchonised bandwidth period") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Hua <hucool.lihua@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211203033618.11895-1-hucool.lihua@huawei.com
2021-12-07sched/fair: Document the slow path and fast path in select_task_rq_fairBarry Song
All People I know including myself took a long time to figure out that typical wakeup will always go to fast path and never go to slow path except WF_FORK and WF_EXEC. Vincent reminded me once in a linaro meeting and made me understand slow path won't happen for WF_TTWU. But my other friends repeatedly wasted a lot of time on testing this path like me before I reminded them. So obviously the code needs some document. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211016111109.5559-1-21cnbao@gmail.com
2021-12-07dma-direct: factor the swiotlb code out of __dma_direct_alloc_pagesChristoph Hellwig
Add a new helper to deal with the swiotlb case. This keeps the code nicely boundled and removes the not required call to dma_direct_optimal_gfp_mask for the swiotlb case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: drop two CONFIG_DMA_RESTRICTED_POOL conditionalsChristoph Hellwig
swiotlb_alloc and swiotlb_free are properly stubbed out if CONFIG_DMA_RESTRICTED_POOL is not set, so skip the extra checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: warn if there is no pool for force unencrypted allocationsChristoph Hellwig
Instead of blindly running into a blocking operation for a non-blocking gfp, return NULL and spew an error. Note that Kconfig prevents this for all currently relevant platforms, and this is just a debug check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: fail allocations that can't be made coherentChristoph Hellwig
If the architecture can't remap or set an address uncached there is no way to fullfill a request for a coherent allocation. Return NULL in that case. Note that this case currently does not happen, so this is a theoretical fixup and/or a preparation for eventually supporting platforms that can't support coherent allocations with the generic code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: refactor the !coherent checks in dma_direct_allocChristoph Hellwig
Add a big central !dev_is_dma_coherent(dev) block to deal with as much as of the uncached allocation schemes and document the schemes a bit better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocationsChristoph Hellwig
Split the code for DMA_ATTR_NO_KERNEL_MAPPING allocations into a separate helper to make dma_direct_alloc a little more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: David Rientjes <rientjes@google.com>
2021-12-07dma-direct: clean up the remapping checks in dma_direct_allocChristoph Hellwig
Add two local variables to track if we want to remap the returned address using vmap or call dma_set_uncached and use that to simplify the code flow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: always leak memory that can't be re-encryptedChristoph Hellwig
We must never let unencrypted memory go back into the general page pool. So if we fail to set it back to encrypted when freeing DMA memory, leak the memory instead and warn the user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-07dma-direct: don't call dma_set_decrypted for remapped allocationsChristoph Hellwig
Remapped allocations handle the encrypted bit through the pgprot passed to vmap, so there is no call dma_set_decrypted. Note that this case is currently entirely theoretical as no valid kernel configuration supports remapped allocations and memory encryption currently. Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-07dma-direct: factor out dma_set_{de,en}crypted helpersChristoph Hellwig
Factor out helpers the make dealing with memory encryption a little less cumbersome. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2021-12-06bpf: Silence purge_cand_cache build warning.Alexei Starovoitov
When CONFIG_DEBUG_INFO_BTF_MODULES is not set the following warning can be seen: kernel/bpf/btf.c:6588:13: warning: 'purge_cand_cache' defined but not used [-Wunused-function] Fix it. Fixes: 1e89106da253 ("bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211207014839.6976-1-alexei.starovoitov@gmail.com
2021-12-06tracing: Switch to kvfree_rcu() APIUladzislau Rezki (Sony)
Instead of invoking a synchronize_rcu() to free a pointer after a grace period we can directly make use of new API that does the same but in more efficient way. Link: https://lkml.kernel.org/r/20211124110308.2053-10-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Fix synth_event_add_val() kernel-doc commentQiujun Huang
It's named field here. Link: https://lkml.kernel.org/r/20210516022410.64271-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing/uprobes: Use trace_event_buffer_reserve() helperSteven Rostedt (VMware)
To be consistent with kprobes and eprobes, use trace_event_buffer_reserver() and trace_event_buffer_commit(). This will ensure that any updates to trace events will also be implemented on uprobe events. Link: https://lkml.kernel.org/r/20211206162440.69fbf96c@gandalf.local.home Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing/kprobes: Do not open code event reserve logicSteven Rostedt (VMware)
As kprobe events use trace_event_buffer_commit() to commit the event to the ftrace ring buffer, for consistency, it should use trace_event_buffer_reserve() to allocate it, as the two functions are related. Link: https://lkml.kernel.org/r/20211130024319.257430762@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Have eprobes use filtering logic of trace eventsSteven Rostedt (VMware)
The eprobes open code the reserving of the event on the ring buffer for ftrace instead of using the ftrace event wrappers, which means that it doesn't get affected by the filters, breaking the filtering logic on user space. Link: https://lkml.kernel.org/r/20211130024319.068451680@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Disable preemption when using the filter bufferSteven Rostedt (VMware)
In case trace_event_buffer_lock_reserve() is called with preemption enabled, the algorithm that defines the usage of the per cpu filter buffer may fail if the task schedules to another CPU after determining which buffer it will use. Disable preemption when using the filter buffer. And because that same buffer must be used throughout the call, keep preemption disabled until the filter buffer is released. This will also keep the semantics between the use case of when the filter buffer is used, and when the ring buffer itself is used, as that case also disables preemption until the ring buffer is released. Link: https://lkml.kernel.org/r/20211130024318.880190623@goodmis.org [ Fixed warning of assignment in if statement Reported-by: kernel test robot <lkp@intel.com> ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Use __this_cpu_read() in trace_event_buffer_lock_reserver()Steven Rostedt (VMware)
The value read by this_cpu_read() is used later and its use is expected to stay on the same CPU as being read. But this_cpu_read() does not warn if it is called without preemption disabled, where as __this_cpu_read() will check if preemption is disabled on CONFIG_DEBUG_PREEMPT Currently all callers have preemption disabled, but there may be new callers in the future that may not. Link: https://lkml.kernel.org/r/20211130024318.698165354@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Add '__rel_loc' using trace event macrosMasami Hiramatsu
Add '__rel_loc' using trace event macros. These macros are usually not used in the kernel, except for testing purpose. This also add "rel_" variant of macros for dynamic_array string, and bitmask. Link: https://lkml.kernel.org/r/163757342119.510314.816029622439099016.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Support __rel_loc relative dynamic data location attributeMasami Hiramatsu
Add '__rel_loc' new dynamic data location attribute which encodes the data location from the next to the field itself. The '__data_loc' is used for encoding the dynamic data location on the trace event record. But '__data_loc' is not useful if the writer doesn't know the event header (e.g. user event), because it records the dynamic data offset from the entry of the record, not the field itself. This new '__rel_loc' attribute encodes the data location relatively from the next of the field. For example, when there is a record like below (the number in the parentheses is the size of fields) |header(N)|common(M)|fields(K)|__data_loc(4)|fields(L)|data(G)| In this case, '__data_loc' field will be __data_loc = (G << 16) | (N+M+K+4+L) If '__rel_loc' is used, this will be |header(N)|common(M)|fields(K)|__rel_loc(4)|fields(L)|data(G)| where __rel_loc = (G << 16) | (L) This case shows L bytes after the '__rel_loc' attribute field, if there is no fields after the __rel_loc field, L must be 0. This is relatively easy (and no need to consider the kernel header change) when the event data fields are composed by user who doesn't know header and common fields. Link: https://lkml.kernel.org/r/163757341258.510314.4214431827833229956.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06tracing: Fix spelling mistake "aritmethic" -> "arithmetic"Colin Ian King
There is a spelling mistake in the tracing mini-HOWTO text. Fix it. Link: https://lkml.kernel.org/r/20211108201513.42876-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06bpf: Remove config check to enable bpf support for branch recordsKajol Jain
Branch data available to BPF programs can be very useful to get stack traces out of userspace application. Commit fff7b64355ea ("bpf: Add bpf_read_branch_records() helper") added BPF support to capture branch records in x86. Enable this feature also for other architectures as well by removing checks specific to x86. If an architecture doesn't support branch records, bpf_read_branch_records() still has appropriate checks and it will return an -EINVAL in that scenario. Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures should return -ENOENT in such case. Hence, update the appropriate check to return -ENOENT instead. Selftest 'perf_branches' result on power9 machine which has the branch stacks support: - Before this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:FAIL #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:FAIL Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:OK #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Selftest 'perf_branches' result on power9 machine which doesn't have branch stack report: - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:SKIP #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED Fixes: fff7b64355eac ("bpf: Add bpf_read_branch_records() helper") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
2021-12-06printk/console: Clean up boot console handling in register_console()Petr Mladek
The variable @bcon has two meanings. It is used several times for iterating the list of registered consoles. In the meantime, it holds the information whether a boot console is first in @console_drivers list. The information about the 1st console driver used to be important for the decision whether to install the new console by default or not. It allowed to re-evaluate the variable @need_default_console when a real console with tty binding has been unregistered in the meantime. The decision about the default console is not longer affected by @bcon variable. The current code checks whether the first driver is real and has tty binding directly. The information about the first console is still used for two more decisions: 1. It prevents duplicate output on non-boot consoles with CON_CONSDEV flag set. 2. Early/boot consoles are unregistered when a real console with CON_CONSDEV is registered and @keep_bootcon is not set. The behavior in the real life is far from obvious. @bcon is set according to the first console @console_drivers list. But the first position in the list is special: 1. Consoles with CON_CONSDEV flag are put at the beginning of the list. It is either the preferred console or any console with tty binding registered by default. 2. Another console might become the first in the list when the first console in the list is unregistered. It might happen either explicitly or automatically when boot consoles are unregistered. There is one more important rule: + Boot consoles can't be registered when any real console is already registered. It is a puzzle. The main complication is the dependency on the first position is the list and the complicated rules around it. Let's try to make it easier: 1. Add variable @bootcon_enabled and set it by iterating all registered consoles. The variable has obvious meaning and more predictable behavior. Any speed optimization and other tricks are not worth it. 2. Use a generic name for the variable that is used to iterate the list on registered console drivers. Behavior change: No, maybe surprisingly, there is _no_ behavior change! Let's provide the proof by contradiction. Both operations, duplicate output prevention and boot consoles removal, are done only when the newly added console has CON_CONSDEV flag set. The behavior would change when the new @bootcon_enabled has different value than the original @bcon. By other words, the behavior would change when the following conditions are true: + a console with CON_CONSDEV flag is added + a real (non-boot) console is the first in the list + a boot console is later in the list Now, a real console might be first in the list only when: + It was the first registered console. In this case, there can't be any boot console because any later ones were rejected. + It was put at the first position because it had CON_CONSDEV flag set. It was either the preferred console or it was a console with tty binding registered by default. We are interested only in a real consoles here. And real console with tty binding fulfills conditions of the default console. Now, there is always only one console that is either preferred or fulfills conditions of the default console. It can't be already in the list and being registered at the same time. As a result, the above three conditions could newer be "true" at the same time. Therefore the behavior can't change. Final dilemma: OK, the new code has the same behavior. But is the change in the right direction? What if the handling of @console_drivers is updated in the future? OK, let's look at it from another angle: 1. The ordering of @console_drivers list is important only in console_device() function. The first console driver with tty binding gets associated with /dev/console. 2. CON_CONSDEV flag is shown in /proc/consoles. And it should be set for the driver that is returned by console_device(). 3. A boot console is removed and the duplicated output is prevented when the real console with CON_CONSDEV flag is registered. Now, in the ideal world: + The driver associated with /dev/console should be either a console preferred via the command line, device tree, or SPCR. Or it should be the first real console with tty binding registered by default. + The code should match the related boot and real console drivers. It should unregister only the obsolete boot driver. And the duplicated output should be prevented only on the related real driver. It is clear that it is not guaranteed by the current code. Instead, the current code looks like a maze of heuristics that try to achieve the above. It is result of adding several features over last few decades. For example, a possibility to register more consoles, unregister consoles, boot consoles, consoles without tty binding, device tree, SPCR, braille consoles. Anyway, there is no reason why the decision, about removing boot consoles and preventing duplicated output, should depend on the first console in the list. The current code does the decisions primary by CON_CONSDEV flag that is used for the preferred console. It looks like a good compromise. And the change seems to be in the right direction. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211122132649.12737-6-pmladek@suse.com
2021-12-06printk/console: Remove need_default_console variablePetr Mladek
The variable @need_default_console is used to decide whether a newly registered console should get enabled by default. The logic is complicated. It can be modified in a register_console() call. But it is always re-evaluated in the next call by the following condition: if (need_default_console || bcon || !console_drivers) need_default_console = preferred_console < 0; In short, the value is updated when either of the condition is valid: + the value is still, or again, "true" + boot/early console is still the first in @console_driver list + @console_driver list is empty The value is updated according to @preferred_console. In particular, it is set to "false" when a @preferred_console was set by __add_preferred_console(). This happens when a non-braille console was added via the command line, device tree, or SPCR. It far from clear what this all means together. Let's look at @need_default_console from another angle: 1. The value is "true" by default. It means that it is always set according to @preferred_console during the first register_console() call. By other words, the first register_console() call will register the console by default only when none non-braille console was defined via the command line, device tree, or SPCR. 2. The value will always stay "false" when @preferred_console is set. By other words, try_enable_default_console() will never get called when a non-braille console is explicitly required. 4. The value might be set to "false" in try_enable_default_console() when a console with tty binding (driver) gets enabled. In this case CON_CONSDEV is set as well. It causes that the console will be inserted as first into the list @console_driver. It might be either real or boot/early console. 5. The value will be set _back_ to "true" in the next register_console() call when: + The console added by the previous register_console() had been a boot/early one. + The last console has been unregistered in the meantime and a boot/early console became first in @console_drivers list again. Or the list became empty. By other words, the value will stay "false" only when the last registered console was real, had tty binding, and was not removed in the mean time. The main logic looks clear: + Consoles are enabled by default only when no one is preferred via the command line, device tree, or SPCR. + By default, any console is enabled until a real console with tty binding gets registered. The behavior when the real console with tty binding is later removed is a bit unclear: + By default, any new console is registered again only when there is no console or the first console in the list is a boot one. The question is why the code is suddenly happy when a real console without tty binding is the first in the list. It looks like an overlook and bug. Conclusion: The state of @preferred_console and the first console in @console_driver list should be enough to decide whether we need to enable the given console by default. The rules are simple. New consoles are _not_ enabled by default when either of the following conditions is true: + @preferred_console is set. It means that a non-braille console is explicitly configured via the command line, device tree, or SPCR. + A real console with tty binding is registered. Such a console will have CON_CONSDEV flag set and will always be the first in @console_drivers list. Note: The new code does not use @bcon variable. The meaning of the variable is far from clear. The direct check of the first console in the list makes it more clear that only real console fulfills requirements of the default console. Behavior change: As already discussed above. There was one situation where the original code worked a strange way. Let's have: + console A: real console without tty binding + console B: real console with tty binding and do: register_console(A); /* 1st step */ register_console(B); /* 2nd step */ unregister_console(B); /* 3rd step */ register_console(B); /* 4th step */ The original code will not register the console B in the 4th step. @need_default_console is set to "false" in 2nd step. The real console with tty binding (driver) is then removed in the 3rd step. But @need_default_console will stay "false" in the 4th step because there is no boot/early console and @registered_consoles list is not empty. The new code will register the console B in the 4th step because it checks whether the first console has tty binding (->driver) This behavior change should acceptable: 1. The scenario requires manual intervention (console removal). The system should boot with the same consoles as before. 2. Console B is registered again probably because the user wants to use it. The most likely scenario is that the related module is reloaded. 3. It makes the behavior more consistent and predictable. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211122132649.12737-5-pmladek@suse.com
2021-12-06printk/console: Remove unnecessary need_default_console manipulationPetr Mladek
There is no need to clear @need_default_console when a console preferred by the command line, device tree, or SPCR, gets enabled. The code is called only when some non-braille console matched a console in @console_cmdline array. It means that a non-braille console was added in __add_preferred_console() and the variable preferred_console is set to a number >= 0. As a result, @need_default_console is always set to "false" in the magic condition: if (need_default_console || bcon || !console_drivers) need_default_console = preferred_console < 0; This is one small step in removing the above magic condition that is hard to follow. The patch removes one superfluous assignment and should not change the functionality. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211122132649.12737-4-pmladek@suse.com
2021-12-06printk/console: Rename has_preferred_console to need_default_consolePetr Mladek
The logic around the variable @has_preferred_console made my head spin many times. Part of the problem is the ambiguous name. There is the variable @preferred_console. It points to the last non-braille console in @console_cmdline array. This array contains consoles preferred via the command line, device tree, or SPCR. Then there is the variable @has_preferred_console. It is set to "true" when @preferred_console is enabled or when a console with tty binding gets enabled by default. It might get reset back by the magic condition: if (!has_preferred_console || bcon || !console_drivers) has_preferred_console = preferred_console >= 0; It is a puzzle. Dumb explanation is that it gets re-evaluated when: + it was not set before (see above when it gets set) + there is still an early console enabled (bcon) + there is no console enabled (!console_drivers) This is still a puzzle. It gets more clear when we see where the value is checked. The only meaning of the variable is to decide whether we should try to enable the new console by default. Rename the variable according to the single situation where the value is checked. The rename requires an inverted logic. Otherwise, it is a simple search & replace. It does not change the functionality. Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Link: https://lore.kernel.org/r/20211122132649.12737-3-pmladek@suse.com
2021-12-06printk/console: Split out code that enables default consolePetr Mladek
Put the code enabling a console by default into a separate function called try_enable_default_console(). Rename try_enable_new_console() to try_enable_preferred_console() to make the purpose of the different variants more clear. It is a code refactoring without any functional change. Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Link: https://lore.kernel.org/r/20211122132649.12737-2-pmladek@suse.com
2021-12-05Merge tag 'timers_urgent_for_v5.16_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Prevent a tick storm when a dedicated timekeeper CPU in nohz_full mode runs for prolonged periods with interrupts disabled and ends up programming the next tick in the past, leading to that storm * tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/nohz: Last resort update jiffies on nohz_full IRQ entry
2021-12-05Merge tag 'sched_urgent_for_v5.16_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Properly init uclamp_flags of a runqueue, on first enqueuing - Fix preempt= callback return values - Correct utime/stime resource usage reporting on nohz_full to return the proper times instead of shorter ones * tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/uclamp: Fix rq->uclamp_max not set on first enqueue preempt/dynamic: Fix setup_preempt_mode() return value sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
2021-12-04bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD)Hou Tao
BPF_LOG_KERNEL is only used internally, so disallow bpf_btf_load() to set log level as BPF_LOG_KERNEL. The same checking has already been done in bpf_check(), so factor out a helper to check the validity of log attributes and use it in both places. Fixes: 8580ac9404f6 ("bpf: Process in-kernel BTF") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211203053001.740945-1-houtao1@huawei.com
2021-12-04locking: Make owner_on_cpu() into <linux/sched.h>Kefeng Wang
Move the owner_on_cpu() from kernel/locking/rwsem.c into include/linux/sched.h with under CONFIG_SMP, then use it in the mutex/rwsem/rtmutex to simplify the code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211203075935.136808-2-wangkefeng.wang@huawei.com
2021-12-04lockdep: Remove softirq accounting on PREEMPT_RT.Thomas Gleixner
There is not really a softirq context on PREEMPT_RT. Softirqs on PREEMPT_RT are always invoked within the context of a threaded interrupt handler or within ksoftirqd. The "in-softirq" context is preemptible and is protected by a per-CPU lock to ensure mutual exclusion. There is no difference on PREEMPT_RT between spin_lock_irq() and spin_lock() because the former does not disable interrupts. Therefore if a lock is used in_softirq() and locked once with spin_lock_irq() then lockdep will report this with "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage". Teach lockdep that we don't really do softirqs on -RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-6-bigeasy@linutronix.de
2021-12-04locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().Sebastian Andrzej Siewior
The locking selftest for ww-mutex expects to operate directly on the base-mutex which becomes a rtmutex on PREEMPT_RT. Add a rtmutex based implementation of mutex_lock_nest_lock() and mutex_lock_killable() named rt_mutex_lock_nest_lock() abd rt_mutex_lock_killable(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-5-bigeasy@linutronix.de
2021-12-04locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.Peter Zijlstra
Similar to the issues in commits: 6467822b8cc9 ("locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes") a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") ww_rt_mutex_lock() should not return EDEADLK without first going through the __ww_mutex logic to set the required state. In fact, the chain-walk can deal with the spurious cycles (per the above commits) this check warns about and is trying to avoid. Therefore ignore this test for ww_rt_mutex and simply let things fall in place. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-4-bigeasy@linutronix.de
2021-12-04locking: Remove rt_rwlock_is_contended().Sebastian Andrzej Siewior
rt_rwlock_is_contended() has no users. It makes no sense to use it as rwlock_is_contended() because it is a sleeping lock on RT and preemption is possible. It reports always != 0 if used by a writer and even if there is a waiter then the lock might not be handed over if the current owner has the highest priority. Remove rt_rwlock_is_contended(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-3-bigeasy@linutronix.de
2021-12-04sched: Trigger warning if ->migration_disabled counter underflows.Sebastian Andrzej Siewior
If migrate_enable() is used more often than its counter part then it remains undetected and rq::nr_pinned will underflow, too. Add a warning if migrate_enable() is attempted if without a matching a migrate_disable(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-2-bigeasy@linutronix.de
2021-12-04sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacityVincent Donnefort
select_idle_sibling() has a special case for tasks woken up by a per-CPU kthread where the selected CPU is the previous one. For asymmetric CPU capacity systems, the assumption was that the wakee couldn't have a bigger utilization during task placement than it used to have during the last activation. That was not considering uclamp.min which can completely change between two task activations and as a consequence mandates the fitness criterion asym_fits_capacity(), even for the exit path described above. Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20211129173115.4006346-1-vincent.donnefort@arm.com
2021-12-04sched/fair: Fix detection of per-CPU kthreads waking a taskVincent Donnefort
select_idle_sibling() has a special case for tasks woken up by a per-CPU kthread, where the selected CPU is the previous one. However, the current condition for this exit path is incomplete. A task can wake up from an interrupt context (e.g. hrtimer), while a per-CPU kthread is running. A such scenario would spuriously trigger the special case described above. Also, a recent change made the idle task like a regular per-CPU kthread, hence making that situation more likely to happen (is_per_cpu_kthread(swapper) being true now). Checking for task context makes sure select_idle_sibling() will not interpret a wake up from any other context as a wake up by a per-CPU kthread. Fixes: 52262ee567ad ("sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20211201143450.479472-1-vincent.donnefort@arm.com
2021-12-04sched/uclamp: Fix rq->uclamp_max not set on first enqueueQais Yousef
Commit d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq") introduced a bug where uclamp_max of the rq is not reset to match the woken up task's uclamp_max when the rq is idle. The code was relying on rq->uclamp_max initialized to zero, so on first enqueue static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p, enum uclamp_id clamp_id) { ... if (uc_se->value > READ_ONCE(uc_rq->value)) WRITE_ONCE(uc_rq->value, uc_se->value); } was actually resetting it. But since commit d81ae8aac85c changed the default to 1024, this no longer works. And since rq->uclamp_flags is also initialized to 0, neither above code path nor uclamp_idle_reset() update the rq->uclamp_max on first wake up from idle. This is only visible from first wake up(s) until the first dequeue to idle after enabling the static key. And it only matters if the uclamp_max of this task is < 1024 since only then its uclamp_max will be effectively ignored. Fix it by properly initializing rq->uclamp_flags = UCLAMP_FLAG_IDLE to ensure uclamp_idle_reset() is called which then will update the rq uclamp_max value as expected. Fixes: d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20211202112033.1705279-1-qais.yousef@arm.com
2021-12-04preempt/dynamic: Fix setup_preempt_mode() return valueAndrew Halaney
__setup() callbacks expect 1 for success and 0 for failure. Correct the usage here to reflect that. Fixes: 826bfeb37bb4 ("preempt/dynamic: Support dynamic preempt with preempt= boot option") Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211203233203.133581-1-ahalaney@redhat.com
2021-12-03Merge SA_IMMUTABLE-fixes-for-v5.16-rc2Eric W. Biederman
I completed the first batch of signal changes for v5.17 against v5.16-rc1 before the SA_IMMUTABLE fixes where completed. Which leaves me with two lines of development that I want on my signal development branch both rooted at v5.16-rc1. Especially as I am hoping to reach the point of being able to remove SA_IMMUTABLE. Linus merged my SA_IMUTABLE fixes as: 7af959b5d5c8 ("Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace") To avoid rebasing the development changes that are currently complete I am merging the work I sent upstream to Linus to make my life simpler. The SA_IMMUTABLE changes as they are described in Linus's merge commit. Pull exit-vs-signal handling fixes from Eric Biederman: "This is a small set of changes where debuggers were no longer able to intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit cleanups. This is essentially the change you suggested with all of i's dotted and the t's crossed so that ptrace can intercept all of the cases it has been able to intercept the past, and all of the cases that made it to exit without giving ptrace a chance still don't give ptrace a chance" * 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Replace force_fatal_sig with force_exit_sig when in doubt signal: Don't always set SA_IMMUTABLE for forced signals Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-03libbpf: Reduce bpf_core_apply_relo_insn() stack usage.Alexei Starovoitov
Reduce bpf_core_apply_relo_insn() stack usage and bump BPF_CORE_SPEC_MAX_LEN limit back to 64. Fixes: 29db4bea1d10 ("bpf: Prepare relo_core.c for kernel duty.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211203182836.16646-1-alexei.starovoitov@gmail.com
2021-12-03bpf: Fix the off-by-two error in range markingsMaxim Mikityanskiy
The first commit cited below attempts to fix the off-by-one error that appeared in some comparisons with an open range. Due to this error, arithmetically equivalent pieces of code could get different verdicts from the verifier, for example (pseudocode): // 1. Passes the verifier: if (data + 8 > data_end) return early read *(u64 *)data, i.e. [data; data+7] // 2. Rejected by the verifier (should still pass): if (data + 7 >= data_end) return early read *(u64 *)data, i.e. [data; data+7] The attempted fix, however, shifts the range by one in a wrong direction, so the bug not only remains, but also such piece of code starts failing in the verifier: // 3. Rejected by the verifier, but the check is stricter than in #1. if (data + 8 >= data_end) return early read *(u64 *)data, i.e. [data; data+7] The change performed by that fix converted an off-by-one bug into off-by-two. The second commit cited below added the BPF selftests written to ensure than code chunks like #3 are rejected, however, they should be accepted. This commit fixes the off-by-two error by adjusting new_range in the right direction and fixes the tests by changing the range into the one that should actually fail. Fixes: fb2a311a31d3 ("bpf: fix off by one for range markings with L{T, E} patterns") Fixes: b37242c773b2 ("bpf: add test cases to bpf selftests to cover all access tests") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com
2021-12-02workqueue: Fix unbind_workers() VS wq_worker_sleeping() raceFrederic Weisbecker
At CPU-hotplug time, unbind_workers() may preempt a worker while it is going to sleep. In that case the following scenario can happen: unbind_workers() wq_worker_sleeping() -------------- ------------------- if (worker->flags & WORKER_NOT_RUNNING) return; //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_dec_and_test(&pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of -1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == -1", all sorts of hazards can happen, starting with queued works being ignored because no workers are awaken at insert_work() time. Fix this with checking again the worker flags while pool->lock is locked. Fixes: b945efcdd07d ("sched: Remove pointless preemption disable in sched_submit_work()") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-02workqueue: Fix unbind_workers() VS wq_worker_running() raceFrederic Weisbecker
At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-12-02bpf: Fix bpf_check_mod_kfunc_call for built-in modulesKumar Kartikeya Dwivedi
When module registering its set is built-in, THIS_MODULE will be NULL, hence we cannot return early in case owner is NULL. Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211122144742.477787-3-memxor@gmail.com
2021-12-02bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALLKumar Kartikeya Dwivedi
Vinicius Costa Gomes reported [0] that build fails when CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled. This leads to btf.c not being compiled, and then no symbol being present in vmlinux for the declarations in btf.h. Since BTF is not useful without enabling BPF subsystem, disallow this combination. However, theoretically disabling both now could still fail, as the symbol for kfunc_btf_id_list variables is not available. This isn't a problem as the compiler usually optimizes the whole register/unregister call, but at lower optimization levels it can fail the build in linking stage. Fix that by adding dummy variables so that modules taking address of them still work, but the whole thing is a noop. [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com