summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-10-27tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker togetherRobin H. Johnson
Running endpoint security solutions like Sentinel1 that use perf-based tracing heavily lead to this repeated dump complaining about dockerd. The default value of 2048 is nowhere near not large enough. Using the prior patch "tracing: show size of requested buffer", we get "perf buffer not large enough, wanted 6644, have 6144", after repeated up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all, so below is the trace for 6K. I'm wondering if this value should be selectable at boot time, but this is a good starting point. ``` ------------[ cut here ]------------ perf buffer not large enough, wanted 6644, have 6144 WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0 Modules linked in: [..] CPU: 1 PID: 4997 Comm: sh Tainted: G T 5.13.13-x86_64-00039-gb3959163488e #63 Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019 RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0 Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89 RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027 RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820 RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80 R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30 FS: 00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: kprobe_perf_func+0x11e/0x270 ? do_execveat_common.isra.0+0x1/0x1c0 ? do_execveat_common.isra.0+0x5/0x1c0 kprobe_ftrace_handler+0x10e/0x1d0 0xffffffffc03aa0c8 ? do_execveat_common.isra.0+0x1/0x1c0 do_execveat_common.isra.0+0x5/0x1c0 __x64_sys_execve+0x33/0x40 do_syscall_64+0x6b/0xc0 ? do_syscall_64+0x11/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f420dc1db37 Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37 RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00 RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00 R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60 ---[ end trace 83ab3e8e16275e49 ]--- ``` Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.org Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27ftrace: disable preemption when recursion locked王贇
As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21tracing: Explain the trace recursion transition bit betterSteven Rostedt (VMware)
The current text of the explanation of the transition bit in the trace recursion protection is not very clear. Improve the text, so that when all the archs no longer have the issue of tracing between a start of a new (interrupt) context and updating the preempt_count to reflect the new context, that it may be removed. Link: https://lore.kernel.org/all/20211018220203.064a42ed@gandalf.local.home/ Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21ftrace: Add multi direct modify interfaceJiri Olsa
Adding interface to modify registered direct function for ftrace_ops. Adding following function: modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) The function changes the currently registered direct function for all attached functions. Link: https://lkml.kernel.org/r/20211008091336.33616-8-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21ftrace: Add multi direct register/unregister interfaceJiri Olsa
Adding interface to register multiple direct functions within single call. Adding following functions: register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) The register_ftrace_direct_multi registers direct function (addr) with all functions in ops filter. The ops filter can be updated before with ftrace_set_filter_ip calls. All requested functions must not have direct function currently registered, otherwise register_ftrace_direct_multi will fail. The unregister_ftrace_direct_multi unregisters ops related direct functions. Link: https://lkml.kernel.org/r/20211008091336.33616-7-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-20x86/ftrace: Make function graph use ftrace directlySteven Rostedt (VMware)
We don't need special hook for graph tracer entry point, but instead we can use graph_ops::func function to install the return_hooker. This moves the graph tracing setup _before_ the direct trampoline prepares the stack, so the return_hooker will be called when the direct trampoline is finished. This simplifies the code, because we don't need to take into account the direct trampoline setup when preparing the graph tracer hooker and we can allow function graph tracer on entries registered with direct trampoline. Link: https://lkml.kernel.org/r/20211008091336.33616-4-jolsa@kernel.org [fixed compile error reported by kernel test robot <lkp@intel.com>] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19tracing/perf: Add interrupt_context_level() helperSteven Rostedt (VMware)
Now that there are three different instances of doing the addition trick to the preempt_count() and NMI_MASK, HARDIRQ_MASK and SOFTIRQ_OFFSET macros, it deserves a helper function defined in the preempt.h header. Add the interrupt_context_level() helper and replace the three instances that do that logic with it. Link: https://lore.kernel.org/all/20211015142541.4badd8a9@gandalf.local.home/ Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19tracing: Reuse logic from perf's get_recursion_context()Steven Rostedt (VMware)
Instead of having branches that adds noise to the branch prediction, use the addition logic to set the bit for the level of interrupt context that the state is currently in. This copies the logic from perf's get_recursion_context() function. Link: https://lore.kernel.org/all/20211015161702.GF174703@worktop.programming.kicks-ass.net/ Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19tracing: Use linker magic instead of recasting ftrace_ops_list_func()Steven Rostedt (VMware)
In an effort to enable -Wcast-function-type in the top-level Makefile to support Control Flow Integrity builds, all function casts need to be removed. This means that ftrace_ops_list_func() can no longer be defined as ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is to use that when an architecture calls ftrace_ops_list_func() with only two parameters (called from assembly). And to make sure there's no C side-effects, those archs call ftrace_ops_no_ops() which only has two parameters, as ftrace_ops_list_func() has four parameters. Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to arch_ftrace_ops_list_func() that will define the proper set of parameters. Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/ Requested-by: Oscar Carter <oscar.carter@gmx.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Cleanup dummy headers in tools/bootconfigMasami Hiramatsu
Cleanup dummy headers in tools/bootconfig/include except for tools/bootconfig/include/linux/bootconfig.h. For this change, I use __KERNEL__ macro to split kernel header #include and introduce xbc_alloc_mem() and xbc_free_mem(). Link: https://lkml.kernel.org/r/163187299574.2366983.18371329724128746091.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Replace u16 and u32 with uint16_t and uint32_tMasami Hiramatsu
Replace u16 and u32 with uint16_t and uint32_t so that the tools/bootconfig only needs <stdint.h>. Link: https://lkml.kernel.org/r/163187298835.2366983.9838262576854319669.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Remove unused debug functionMasami Hiramatsu
Remove unused xbc_debug_dump() from bootconfig for clean up the code. Link: https://lkml.kernel.org/r/163187297371.2366983.12943349701785875450.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Rename xbc_destroy_all() to xbc_exit()Masami Hiramatsu
Avoid using this noisy name and use more calm one. This is just a name change. No functional change. Link: https://lkml.kernel.org/r/163187295918.2366983.5231840238429996027.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Add xbc_get_info() for the node informationMasami Hiramatsu
Add xbc_get_info() API which allows user to get the number of used xbc_nodes and the size of bootconfig data. This is also useful for checking the bootconfig is initialized or not. Link: https://lkml.kernel.org/r/163177340877.682366.4360676589783197627.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-10bootconfig: Allocate xbc_data inside xbc_init()Masami Hiramatsu
Allocate 'xbc_data' in the xbc_init() so that it does not need to care about the ownership of the copied data. Link: https://lkml.kernel.org/r/163177339986.682366.898762699429769117.stgit@devnote2 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30x86/kprobes: Fixup return address in generic trampoline handlerMasami Hiramatsu
In x86, the fake return address on the stack saved by __kretprobe_trampoline() will be replaced with the real return address after returning from trampoline_handler(). Before fixing the return address, the real return address can be found in the 'current->kretprobe_instances'. However, since there is a window between updating the 'current->kretprobe_instances' and fixing the address on the stack, if an interrupt happens at that timing and the interrupt handler does stacktrace, it may fail to unwind because it can not get the correct return address from 'current->kretprobe_instances'. This will eliminate that window by fixing the return address right before updating 'current->kretprobe_instances'. Link: https://lkml.kernel.org/r/163163057094.489837.9044470370440745866.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30objtool: Add frame-pointer-specific function ignoreJosh Poimboeuf
Add a CONFIG_FRAME_POINTER-specific version of STACK_FRAME_NON_STANDARD() for the case where a function is intentionally missing frame pointer setup, but otherwise needs objtool/ORC coverage when frame pointers are disabled. Link: https://lkml.kernel.org/r/163163047364.489837.17377799909553689661.stgit@devnote2 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: Add kretprobe_find_ret_addr() for searching return addressMasami Hiramatsu
Introduce kretprobe_find_ret_addr() and is_kretprobe_trampoline(). These APIs will be used by the ORC stack unwinder and ftrace, so that they can check whether the given address points kretprobe trampoline code and query the correct return address in that case. Link: https://lkml.kernel.org/r/163163046461.489837.1044778356430293962.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Make it harder to refer kretprobe_trampoline directlyMasami Hiramatsu
Since now there is kretprobe_trampoline_addr() for referring the address of kretprobe trampoline code, we don't need to access kretprobe_trampoline directly. Make it harder to refer by renaming it to __kretprobe_trampoline(). Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2 Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()Masami Hiramatsu
The __kretprobe_trampoline_handler() callback, called from low level arch kprobes methods, has the 'trampoline_address' parameter, which is entirely superfluous as it basically just replicates: dereference_kernel_function_descriptor(kretprobe_trampoline) In fact we had bugs in arch code where it wasn't replicated correctly. So remove this superfluous parameter and use kretprobe_trampoline_addr() instead. Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Replace arch_deref_entry_point() with ↵Masami Hiramatsu
dereference_symbol_descriptor() ~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function: 3d7e33825d87: ("jprobes: make jprobes a little safer for users") But this is just open-coded dereference_symbol_descriptor() in essence, and its obscure nature was causing bugs. Just use the real thing and remove arch_deref_entry_point(). Link: https://lkml.kernel.org/r/163163043630.489837.7924988885652708696.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: Use bool type for functions which returns boolean valueMasami Hiramatsu
Use the 'bool' type instead of 'int' for the functions which returns a boolean value, because this makes clear that those functions don't return any error code. Link: https://lkml.kernel.org/r/163163041649.489837.17311187321419747536.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Use 'kprobe_opcode_t *' for the code address in ↵Masami Hiramatsu
get_optimized_kprobe() Since get_optimized_kprobe() is only used inside kprobes, it doesn't need to use 'unsigned long' type for 'addr' parameter. Make it use 'kprobe_opcode_t *' for the 'addr' parameter and subsequent call of arch_within_optimized_kprobe() also should use 'kprobe_opcode_t *'. Note that MAX_OPTIMIZED_LENGTH and RELATIVEJUMP_SIZE are defined by byte-size, but the size of 'kprobe_opcode_t' depends on the architecture. Therefore, we must be careful when calculating addresses using those macros. Link: https://lkml.kernel.org/r/163163040680.489837.12133032364499833736.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: Use IS_ENABLED() instead of kprobes_built_in()Masami Hiramatsu
Use IS_ENABLED(CONFIG_KPROBES) instead of kprobes_built_in(). This inline function is introduced only for avoiding #ifdef. But since now we have IS_ENABLED(), it is no longer needed. Link: https://lkml.kernel.org/r/163163038581.489837.2805250706507372658.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: Fix coding style issuesMasami Hiramatsu
Fix coding style issues reported by checkpatch.pl and update comments to quote variable names and add "()" to function name. One TODO comment in __disarm_kprobe() is removed because it has been done by following commit. Link: https://lkml.kernel.org/r/163163037468.489837.4282347782492003960.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: Make arch_check_ftrace_location staticPunit Agrawal
arch_check_ftrace_location() was introduced as a weak function in commit f7f242ff004499 ("kprobes: introduce weak arch_check_ftrace_location() helper function") to allow architectures to handle kprobes call site on their own. Recently, the only architecture (csky) to implement arch_check_ftrace_location() was migrated to using the common version. As a result, further cleanup the code to drop the weak attribute and rename the function to remove the architecture specific implementation. Link: https://lkml.kernel.org/r/163163035673.489837.2367816318195254104.stgit@devnote2 Signed-off-by: Punit Agrawal <punitagrawal@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobe: Simplify prepare_kprobe() by dropping redundant versionPunit Agrawal
The function prepare_kprobe() is called during kprobe registration and is responsible for ensuring any architecture related preparation for the kprobe is done before returning. One of two versions of prepare_kprobe() is chosen depending on the availability of KPROBE_ON_FTRACE in the kernel configuration. Simplify the code by dropping the version when KPROBE_ON_FTRACE is not selected - instead relying on kprobe_ftrace() to return false when KPROBE_ON_FTRACE is not set. No functional change. Link: https://lkml.kernel.org/r/163163033696.489837.9264661820279300788.stgit@devnote2 Signed-off-by: Punit Agrawal <punitagrawal@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-26Merge tag 'x86-urgent-2021-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Prevent sending the wrong signal when protection keys are enabled and the kernel handles a fault in the vsyscall emulation. - Invoke early_reserve_memory() before invoking e820_memory_setup() which is required to make the Xen dom0 e820 hooks work correctly. - Use the correct data type for the SETZ operand in the EMQCMDS instruction wrapper. - Prevent undefined behaviour to the potential unaligned accesss in the instruction decoder library" * tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses x86/asm: Fix SETZ size enqcmds() build failure x86/setup: Call early_reserve_memory() earlier x86/fault: Fix wrong signal when vsyscall fails with pkey
2021-09-26Merge tag 'irq-urgent-2021-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for interrupt chip drivers: - Work around a bad GIC integration on a Renesas platform which can't handle byte-sized MMIO access - Plug a potential memory leak in the GICv4 driver - Fix a regression in the Armada 370-XP IPI code which was caused by issuing EOI instack of ACK. - A couple of small fixes here and there" * tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Work around broken Renesas integration irqchip/renesas-rza1: Use semicolons instead of commas irqchip/gic-v3-its: Fix potential VPE leak on error irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build irqchip/mbigen: Repair non-kernel-doc notation irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent irqchip/armada-370-xp: Fix ack/eoi breakage Documentation: Fix irq-domain.rst build warning
2021-09-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "16 patches. Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts, lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache, debug, and pagemap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix uninitialized use in overcommit_policy_handler mm/memory_failure: fix the missing pte_unmap() call kasan: always respect CONFIG_KASAN_STACK sh: pgtable-3level: fix cast to pointer from integer of different size mm/debug: sync up latest migrate_reason to migrate_reason_names mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN mm: fs: invalidate bh_lrus for only cold path lib/zlib_inflate/inffast: check config in C to avoid unused function warning tools/vm/page-types: remove dependency on opt_file for idle page tracking scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error ocfs2: drop acl cache for directories too mm/shmem.c: fix judgment error in shmem_is_huge() xtensa: increase size of gcc stack frame check mm/damon: don't use strnlen() with known-bogus source length kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
2021-09-25Merge tag 'char-misc-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.15-rc3. Nothing huge in here, just fixes for a number of small issues that have been reported. These include: - habanalabs race conditions and other bugs fixed - binder driver fixes - fpga driver fixes - coresight build warning fix - nvmem driver fix - comedi memory leak fix - bcm-vk tty race fix - other tiny driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) comedi: Fix memory leak in compat_insnlist() nvmem: NVMEM_NINTENDO_OTP should depend on WII misc: bcm-vk: fix tty registration race fpga: dfl: Avoid reads to AFU CSRs during enumeration fpga: machxo2-spi: Fix missing error code in machxo2_write_complete() fpga: machxo2-spi: Return an error on failure habanalabs: expose a single cs seq in staged submissions habanalabs: fix wait offset handling habanalabs: rate limit multi CS completion errors habanalabs/gaudi: fix LBW RR configuration habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK" habanalabs: fail collective wait when not supported habanalabs/gaudi: use direct MSI in single mode habanalabs: fix kernel OOPs related to staged cs habanalabs: fix potential race in interrupt wait ioctl mcb: fix error handling in mcb_alloc_bus() misc: genwqe: Fixes DMA mask setting coresight: syscfg: Fix compiler warning nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM binder: make sure fd closes complete ...
2021-09-25Merge tag 'usb-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some USB driver fixes and new device ids for 5.15-rc3. They include: - usb-storage quirk additions - usb-serial new device ids - usb-serial driver fixes - USB roothub registration bugfix to resolve a long-reported issue - usb gadget driver fixes for a large number of small things - dwc2 driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits) USB: serial: option: add device id for Foxconn T99W265 USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter USB: serial: cp210x: add part-number debug printk USB: serial: cp210x: fix dropped characters with CP2102 MAINTAINERS: usb, update Peter Korsgaard's entries usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned() usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c Re-enable UAS for LaCie Rugged USB3-FW with fk quirk USB: serial: option: remove duplicate USB device ID USB: serial: mos7840: remove duplicated 0xac24 device ID arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval usb: gadget: f_uac2: Add missing companion descriptor for feedback EP usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd() xhci: Set HCD flag to defer primary roothub registration usb: core: hcd: Add support for deferring roothub registration usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave usb: dwc3: core: balance phy init and exit Revert "USB: bcma: Add a check for devm_gpiod_get" ...
2021-09-24mm/debug: sync up latest migrate_reason to migrate_reason_namesWeizhao Ouyang
Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt. Link: https://lkml.kernel.org/r/20210921064553.293905-3-o451686892@gmail.com Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Xu <weixugc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24mm: fs: invalidate bh_lrus for only cold pathMinchan Kim
The kernel test robot reported the regression of fio.write_iops[1] with commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration"). Since lru_add_drain is called frequently, invalidate bh_lrus there could increase bh_lrus cache miss ratio, which needs more IO in the end. This patch moves the bh_lrus invalidation from the hot path( e.g., zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all, lru_cache_disable). Zhengjun Xing confirmed "I test the patch, the regression reduced to -2.9%" [1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/ [2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24Merge tag 'irqchip-fixes-5.15-1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Work around a bad GIC integration on a Renesas platform, where the interconnect cannot deal with byte-sized MMIO accesses - Cleanup another Renesas driver abusing the comma operator - Fix a potential GICv4 memory leak on an error path - Make the type of 'size' consistent with the rest of the code in __irq_domain_add() - Fix a regression in the Armada 370-XP IPI path - Fix the build for the obviously unloved goldfish-pic - Some documentation fixes Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
2021-09-23Merge tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull rseq fixes from Paolo Bonzini: "A fix for a bug with restartable sequences and KVM. KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears the flag without informing rseq and leads to stale data in userspace's rseq struct" * tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Remove __NR_userfaultfd syscall fallback KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs tools: Move x86 syscall number fallbacks to .../uapi/ entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume() KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
2021-09-23Merge tag 'net-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Current release - regressions: - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports() Previous releases - regressions: - introduce a shutdown method to mdio device drivers, and make DSA switch drivers compatible with masters disappearing on shutdown; preventing infinite reference wait - fix issues in mdiobus users related to ->shutdown vs ->remove - virtio-net: fix pages leaking when building skb in big mode - xen-netback: correct success/error reporting for the SKB-with-fraglist - dsa: tear down devlink port regions when tearing down the devlink port on error - nexthop: fix division by zero while replacing a resilient group - hns3: check queue, vf, vlan ids range before using Previous releases - always broken: - napi: fix race against netpoll causing NAPI getting stuck - mlx4_en: ensure link operstate is updated even if link comes up before netdev registration - bnxt_en: fix TX timeout when TX ring size is set to the smallest - enetc: fix illegal access when reading affinity_hint; prevent oops on sysfs access - mtk_eth_soc: avoid creating duplicate offload entries Misc: - core: correct the sock::sk_lock.owned lockdep annotations" * tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits) atlantic: Fix issue in the pm resume flow. net/mlx4_en: Don't allow aRFS for encapsulated packets net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries nfc: st-nci: Add SPI ID matching DT compatible MAINTAINERS: remove Guvenc Gulce as net/smc maintainer nexthop: Fix memory leaks in nexthop notification chain listeners mptcp: ensure tx skbs always have the MPTCP ext qed: rdma - don't wait for resources under hw error recovery flow s390/qeth: fix deadlock during failing recovery s390/qeth: Fix deadlock in remove_discipline s390/qeth: fix NULL deref in qeth_clear_working_pool_list() net: dsa: realtek: register the MDIO bus under devres net: dsa: don't allocate the slave_mii_bus using devres Doc: networking: Fox a typo in ice.rst net: dsa: fix dsa_tree_setup error path net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work net/smc: add missing error check in smc_clc_prfx_set() net: hns3: fix a return value error in hclge_get_reset_status() net: hns3: check vlan id before using it ...
2021-09-22entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()Sean Christopherson
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now that the two function are always called back-to-back by architectures that have rseq. The rseq helper is stubbed out for architectures that don't support rseq, i.e. this is a nop across the board. Note, tracehook_notify_resume() is horribly named and arguably does not belong in tracehook.h as literally every line of code in it has nothing to do with tracing. But, that's been true since commit a42c6ded827d ("move key_repace_session_keyring() into tracehook_notify_resume()") first usurped tracehook_notify_resume() back in 2012. Punt cleaning that mess up to future patches. No functional change intended. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22irqdomain: Change the type of 'size' in __irq_domain_add() to be consistentBixuan Cui
The 'size' is used in struct_size(domain, revmap, size) and its input parameter type is 'size_t'(unsigned int). Changing the size to 'unsigned int' to make the type consistent. Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210916025203.44841-1-cuibixuan@huawei.com
2021-09-20x86/fault: Fix wrong signal when vsyscall fails with pkeyJiashuo Liang
The function __bad_area_nosemaphore() calls kernelmode_fixup_or_oops() with the parameter @signal being actually @pkey, which will send a signal numbered with the argument in @pkey. This bug can be triggered when the kernel fails to access user-given memory pages that are protected by a pkey, so it can go down the do_user_addr_fault() path and pass the !user_mode() check in __bad_area_nosemaphore(). Most cases will simply run the kernel fixup code to make an -EFAULT. But when another condition current->thread.sig_on_uaccess_err is met, which is only used to emulate vsyscall, the kernel will generate the wrong signal. Add a new parameter @pkey to kernelmode_fixup_or_oops() to fix this. [ bp: Massage commit message, fix build error as reported by the 0day bot: https://lkml.kernel.org/r/202109202245.APvuT8BX-lkp@intel.com ] Fixes: 5042d40a264c ("x86/fault: Bypass no_context() for implicit kernel faults from usermode") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jiashuo Liang <liangjs@pku.edu.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20210730030152.249106-1-liangjs@pku.edu.cn
2021-09-19Merge tag 'x86_urgent_for_v5.15_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Prevent a infinite loop in the MCE recovery on return to user space, which was caused by a second MCE queueing work for the same page and thereby creating a circular work list. - Make kern_addr_valid() handle existing PMD entries, which are marked not present in the higher level page table, correctly instead of blindly dereferencing them. - Pass a valid address to sanitize_phys(). This was caused by the mixture of inclusive and exclusive ranges. memtype_reserve() expect 'end' being exclusive, but sanitize_phys() wants it inclusive. This worked so far, but with end being the end of the physical address space the fail is exposed. - Increase the maximum supported GPIO numbers for 64bit. Newer SoCs exceed the previous maximum. * tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid infinite loop for copy from user recovery x86/mm: Fix kern_addr_valid() to cope with existing but not present entries x86/platform: Increase maximum GPIO number for X86_64 x86/pat: Pass valid address to sanitize_phys()
2021-09-19net: mdio: introduce a shutdown method to mdio device driversVladimir Oltean
MDIO-attached devices might have interrupts and other things that might need quiesced when we kexec into a new kernel. Things are even more creepy when those interrupt lines are shared, and in that case it is absolutely mandatory to disable all interrupt sources. Moreover, MDIO devices might be DSA switches, and DSA needs its own shutdown method to unlink from the DSA master, which is a new requirement that appeared after commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"). So introduce a ->shutdown method in the MDIO device driver structure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17Merge tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring iov_iter retry fixes from Jens Axboe: "This adds a helper to save/restore iov_iter state, and modifies io_uring to use it. After that is done, we can now kill the iter->truncated addition that we added for this release. The io_uring change is being overly cautious with the save/restore/advance, but better safe than sorry and we can always improve that and reduce the overhead if it proves to be of concern. The only case to be worried about in this regard is huge IO, where iteration can take a while to iterate segments. I spent some time writing test cases, and expanded the coverage quite a bit from the last posting of this. liburing carries this regression test case now: https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c which exercises all of this. It now also supports provided buffers, and explicitly tests for end-of-file/device truncation as well. On top of that, Pavel sanitized the IOPOLL retry path to follow the exact same pattern as normal IO" * tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block: io_uring: move iopoll reissue into regular IO path Revert "iov_iter: track truncated size" io_uring: use iov_iter state save/restore helpers iov_iter: add helper to save iov_iter state
2021-09-17net: update NXP copyright textVladimir Oltean
NXP Legal insists that the following are not fine: - Saying "NXP Semiconductors" instead of "NXP", since the company's registered name is "NXP" - Putting a "(c)" sign in the copyright string - Putting a comma in the copyright string The only accepted copyright string format is "Copyright <year-range> NXP". This patch changes the copyright headers in the networking files that were sent by me, or derived from code sent by me. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16Merge tag 'net-5.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - vhost_net: fix OoB on sendmsg() failure - mlx5: bridge, fix uninitialized variable usage - bnxt_en: fix error recovery regression Current release - new code bugs: - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset() Previous releases - regressions: - r6040: restore MDIO clock frequency after MAC reset - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() - dsa: flush switchdev workqueue before tearing down CPU/DSA ports Previous releases - always broken: - ptp: dp83640: don't define PAGE0, avoid compiler warning - igc: fix tunnel segmentation offloads - phylink: update SFP selected interface on advertising changes - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume - mlx5e: fix mutual exclusion between CQE compression and HW TS Misc: - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode - sfc: fallback for lack of xdp tx queues - hns3: add option to turn off page pool feature" * tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) mlxbf_gige: clear valid_polarity upon open igc: fix tunnel offloading net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert net: wan: wanxl: define CROSS_COMPILE_M68K selftests: nci: replace unsigned int with int net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Revert "net: phy: Uniform PHY driver access" net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup ptp: dp83640: don't define PAGE0 bnx2x: Fix enabling network interfaces without VFs Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() net-caif: avoid user-triggerable WARN_ON(1) bpf, selftests: Add test case for mixed cgroup v1/v2 bpf, selftests: Add cgroup v1 net_cls classid helpers bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode bpf: Add oversize check before call kvcalloc() net: hns3: fix the timing issue of VF clearing interrupt sources net: hns3: fix the exception when query imp info net: hns3: disable mac in flr process ...
2021-09-15Merge branch 'absolute-pointer' (patches from Guenter)Linus Torvalds
Merge absolute_pointer macro series from Guenter Roeck: "Kernel test builds currently fail for several architectures with error messages such as the following. drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe': arch/m68k/include/asm/string.h:72:25: error: '__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread] Such warnings may be reported by gcc 11.x for string and memory operations on fixed addresses if gcc's builtin functions are used for those operations. This series introduces absolute_pointer() to fix the problem. absolute_pointer() disassociates a pointer from its originating symbol type and context, and thus prevents gcc from making assumptions about pointers passed to memory operations" * emailed patches from Guenter Roeck <linux@roeck-us.net>: alpha: Use absolute_pointer to define COMMAND_LINE alpha: Move setup.h out of uapi net: i825xx: Use absolute_pointer for memcpy from fixed memory location compiler.h: Introduce absolute_pointer macro
2021-09-15compiler.h: Introduce absolute_pointer macroGuenter Roeck
absolute_pointer() disassociates a pointer from its originating symbol type and context. Use it to prevent compiler warnings/errors such as drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe': arch/m68k/include/asm/string.h:72:25: error: '__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread] Such warnings may be reported by gcc 11.x for string and memory operations on fixed addresses. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15Revert "iov_iter: track truncated size"Jens Axboe
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa. We no longer need to track the truncation count, the one user that did need it has been converted to using iov_iter_restore() instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14memblock: introduce saner 'memblock_free_ptr()' interfaceLinus Torvalds
The boot-time allocation interface for memblock is a mess, with 'memblock_alloc()' returning a virtual pointer, but then you are supposed to free it with 'memblock_free()' that takes a _physical_ address. Not only is that all kinds of strange and illogical, but it actually causes bugs, when people then use it like a normal allocation function, and it fails spectacularly on a NULL pointer: https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/ or just random memory corruption if the debug checks don't catch it: https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/ I really don't want to apply patches that treat the symptoms, when the fundamental cause is this horribly confusing interface. I started out looking at just automating a sane replacement sequence, but because of this mix or virtual and physical addresses, and because people have used the "__pa()" macro that can take either a regular kernel pointer, or just the raw "unsigned long" address, it's all quite messy. So this just introduces a new saner interface for freeing a virtual address that was allocated using 'memblock_alloc()', and that was kept as a regular kernel pointer. And then it converts a couple of users that are obvious and easy to test, including the 'xbc_nodes' case in lib/bootconfig.c that caused problems. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14iov_iter: add helper to save iov_iter stateJens Axboe
In an ideal world, when someone is passed an iov_iter and returns X bytes, then X bytes would have been consumed/advanced from the iov_iter. But we have use cases that always consume the entire iterator, a few examples of that are iomap and bdev O_DIRECT. This means we cannot rely on the state of the iov_iter once we've called ->read_iter() or ->write_iter(). This would be easier if we didn't always have to deal with truncate of the iov_iter, as rewinding would be trivial without that. We recently added a commit to track the truncate state, but that grew the iov_iter by 8 bytes and wasn't the best solution. Implement a helper to save enough of the iov_iter state to sanely restore it after we've called the read/write iterator helpers. This currently only works for IOVEC/BVEC/KVEC as that's all we need, support for other iterator types are left as an exercise for the reader. Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>