Age | Commit message (Collapse) | Author |
|
Running endpoint security solutions like Sentinel1 that use perf-based
tracing heavily lead to this repeated dump complaining about dockerd.
The default value of 2048 is nowhere near not large enough.
Using the prior patch "tracing: show size of requested buffer", we get
"perf buffer not large enough, wanted 6644, have 6144", after repeated
up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
so below is the trace for 6K.
I'm wondering if this value should be selectable at boot time, but this
is a good starting point.
```
------------[ cut here ]------------
perf buffer not large enough, wanted 6644, have 6144
WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0
Modules linked in: [..]
CPU: 1 PID: 4997 Comm: sh Tainted: G T 5.13.13-x86_64-00039-gb3959163488e #63
Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027
RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820
RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80
R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f
R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30
FS: 00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kprobe_perf_func+0x11e/0x270
? do_execveat_common.isra.0+0x1/0x1c0
? do_execveat_common.isra.0+0x5/0x1c0
kprobe_ftrace_handler+0x10e/0x1d0
0xffffffffc03aa0c8
? do_execveat_common.isra.0+0x1/0x1c0
do_execveat_common.isra.0+0x5/0x1c0
__x64_sys_execve+0x33/0x40
do_syscall_64+0x6b/0xc0
? do_syscall_64+0x11/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f420dc1db37
Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37
RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00
RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00
R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60
---[ end trace 83ab3e8e16275e49 ]---
```
Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.org
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.
And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.
This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.
Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com
CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The current text of the explanation of the transition bit in the trace
recursion protection is not very clear. Improve the text, so that when all
the archs no longer have the issue of tracing between a start of a new
(interrupt) context and updating the preempt_count to reflect the new
context, that it may be removed.
Link: https://lore.kernel.org/all/20211018220203.064a42ed@gandalf.local.home/
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Adding interface to modify registered direct function
for ftrace_ops. Adding following function:
modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
The function changes the currently registered direct
function for all attached functions.
Link: https://lkml.kernel.org/r/20211008091336.33616-8-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Adding interface to register multiple direct functions
within single call. Adding following functions:
register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
The register_ftrace_direct_multi registers direct function (addr)
with all functions in ops filter. The ops filter can be updated
before with ftrace_set_filter_ip calls.
All requested functions must not have direct function currently
registered, otherwise register_ftrace_direct_multi will fail.
The unregister_ftrace_direct_multi unregisters ops related direct
functions.
Link: https://lkml.kernel.org/r/20211008091336.33616-7-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
We don't need special hook for graph tracer entry point,
but instead we can use graph_ops::func function to install
the return_hooker.
This moves the graph tracing setup _before_ the direct
trampoline prepares the stack, so the return_hooker will
be called when the direct trampoline is finished.
This simplifies the code, because we don't need to take into
account the direct trampoline setup when preparing the graph
tracer hooker and we can allow function graph tracer on entries
registered with direct trampoline.
Link: https://lkml.kernel.org/r/20211008091336.33616-4-jolsa@kernel.org
[fixed compile error reported by kernel test robot <lkp@intel.com>]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Now that there are three different instances of doing the addition trick
to the preempt_count() and NMI_MASK, HARDIRQ_MASK and SOFTIRQ_OFFSET
macros, it deserves a helper function defined in the preempt.h header.
Add the interrupt_context_level() helper and replace the three instances
that do that logic with it.
Link: https://lore.kernel.org/all/20211015142541.4badd8a9@gandalf.local.home/
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Instead of having branches that adds noise to the branch prediction, use
the addition logic to set the bit for the level of interrupt context that
the state is currently in. This copies the logic from perf's
get_recursion_context() function.
Link: https://lore.kernel.org/all/20211015161702.GF174703@worktop.programming.kicks-ass.net/
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In an effort to enable -Wcast-function-type in the top-level Makefile to
support Control Flow Integrity builds, all function casts need to be
removed.
This means that ftrace_ops_list_func() can no longer be defined as
ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is to use that when
an architecture calls ftrace_ops_list_func() with only two parameters
(called from assembly). And to make sure there's no C side-effects, those
archs call ftrace_ops_no_ops() which only has two parameters, as
ftrace_ops_list_func() has four parameters.
Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to
arch_ftrace_ops_list_func() that will define the proper set of parameters.
Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com
Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home
Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/
Requested-by: Oscar Carter <oscar.carter@gmx.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Cleanup dummy headers in tools/bootconfig/include except
for tools/bootconfig/include/linux/bootconfig.h.
For this change, I use __KERNEL__ macro to split kernel
header #include and introduce xbc_alloc_mem() and
xbc_free_mem().
Link: https://lkml.kernel.org/r/163187299574.2366983.18371329724128746091.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Replace u16 and u32 with uint16_t and uint32_t so
that the tools/bootconfig only needs <stdint.h>.
Link: https://lkml.kernel.org/r/163187298835.2366983.9838262576854319669.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Remove unused xbc_debug_dump() from bootconfig for clean up
the code.
Link: https://lkml.kernel.org/r/163187297371.2366983.12943349701785875450.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Avoid using this noisy name and use more calm one.
This is just a name change. No functional change.
Link: https://lkml.kernel.org/r/163187295918.2366983.5231840238429996027.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Add xbc_get_info() API which allows user to get the
number of used xbc_nodes and the size of bootconfig
data. This is also useful for checking the bootconfig
is initialized or not.
Link: https://lkml.kernel.org/r/163177340877.682366.4360676589783197627.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Allocate 'xbc_data' in the xbc_init() so that it does
not need to care about the ownership of the copied
data.
Link: https://lkml.kernel.org/r/163177339986.682366.898762699429769117.stgit@devnote2
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In x86, the fake return address on the stack saved by
__kretprobe_trampoline() will be replaced with the real return
address after returning from trampoline_handler(). Before fixing
the return address, the real return address can be found in the
'current->kretprobe_instances'.
However, since there is a window between updating the
'current->kretprobe_instances' and fixing the address on the stack,
if an interrupt happens at that timing and the interrupt handler
does stacktrace, it may fail to unwind because it can not get
the correct return address from 'current->kretprobe_instances'.
This will eliminate that window by fixing the return address
right before updating 'current->kretprobe_instances'.
Link: https://lkml.kernel.org/r/163163057094.489837.9044470370440745866.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Add a CONFIG_FRAME_POINTER-specific version of
STACK_FRAME_NON_STANDARD() for the case where a function is
intentionally missing frame pointer setup, but otherwise needs
objtool/ORC coverage when frame pointers are disabled.
Link: https://lkml.kernel.org/r/163163047364.489837.17377799909553689661.stgit@devnote2
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Introduce kretprobe_find_ret_addr() and is_kretprobe_trampoline().
These APIs will be used by the ORC stack unwinder and ftrace, so that
they can check whether the given address points kretprobe trampoline
code and query the correct return address in that case.
Link: https://lkml.kernel.org/r/163163046461.489837.1044778356430293962.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Since now there is kretprobe_trampoline_addr() for referring the
address of kretprobe trampoline code, we don't need to access
kretprobe_trampoline directly.
Make it harder to refer by renaming it to __kretprobe_trampoline().
Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The __kretprobe_trampoline_handler() callback, called from low level
arch kprobes methods, has the 'trampoline_address' parameter, which is
entirely superfluous as it basically just replicates:
dereference_kernel_function_descriptor(kretprobe_trampoline)
In fact we had bugs in arch code where it wasn't replicated correctly.
So remove this superfluous parameter and use kretprobe_trampoline_addr()
instead.
Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
dereference_symbol_descriptor()
~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:
3d7e33825d87: ("jprobes: make jprobes a little safer for users")
But this is just open-coded dereference_symbol_descriptor() in essence, and
its obscure nature was causing bugs.
Just use the real thing and remove arch_deref_entry_point().
Link: https://lkml.kernel.org/r/163163043630.489837.7924988885652708696.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use the 'bool' type instead of 'int' for the functions which
returns a boolean value, because this makes clear that those
functions don't return any error code.
Link: https://lkml.kernel.org/r/163163041649.489837.17311187321419747536.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
get_optimized_kprobe()
Since get_optimized_kprobe() is only used inside kprobes,
it doesn't need to use 'unsigned long' type for 'addr' parameter.
Make it use 'kprobe_opcode_t *' for the 'addr' parameter and
subsequent call of arch_within_optimized_kprobe() also should use
'kprobe_opcode_t *'.
Note that MAX_OPTIMIZED_LENGTH and RELATIVEJUMP_SIZE are defined
by byte-size, but the size of 'kprobe_opcode_t' depends on the
architecture. Therefore, we must be careful when calculating
addresses using those macros.
Link: https://lkml.kernel.org/r/163163040680.489837.12133032364499833736.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use IS_ENABLED(CONFIG_KPROBES) instead of kprobes_built_in().
This inline function is introduced only for avoiding #ifdef.
But since now we have IS_ENABLED(), it is no longer needed.
Link: https://lkml.kernel.org/r/163163038581.489837.2805250706507372658.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix coding style issues reported by checkpatch.pl and update
comments to quote variable names and add "()" to function
name.
One TODO comment in __disarm_kprobe() is removed because
it has been done by following commit.
Link: https://lkml.kernel.org/r/163163037468.489837.4282347782492003960.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
arch_check_ftrace_location() was introduced as a weak function in
commit f7f242ff004499 ("kprobes: introduce weak
arch_check_ftrace_location() helper function") to allow architectures
to handle kprobes call site on their own.
Recently, the only architecture (csky) to implement
arch_check_ftrace_location() was migrated to using the common
version.
As a result, further cleanup the code to drop the weak attribute and
rename the function to remove the architecture specific
implementation.
Link: https://lkml.kernel.org/r/163163035673.489837.2367816318195254104.stgit@devnote2
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The function prepare_kprobe() is called during kprobe registration and
is responsible for ensuring any architecture related preparation for
the kprobe is done before returning.
One of two versions of prepare_kprobe() is chosen depending on the
availability of KPROBE_ON_FTRACE in the kernel configuration.
Simplify the code by dropping the version when KPROBE_ON_FTRACE is not
selected - instead relying on kprobe_ftrace() to return false when
KPROBE_ON_FTRACE is not set.
No functional change.
Link: https://lkml.kernel.org/r/163163033696.489837.9264661820279300788.stgit@devnote2
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Prevent sending the wrong signal when protection keys are enabled
and the kernel handles a fault in the vsyscall emulation.
- Invoke early_reserve_memory() before invoking e820_memory_setup()
which is required to make the Xen dom0 e820 hooks work correctly.
- Use the correct data type for the SETZ operand in the EMQCMDS
instruction wrapper.
- Prevent undefined behaviour to the potential unaligned accesss in
the instruction decoder library"
* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
x86/asm: Fix SETZ size enqcmds() build failure
x86/setup: Call early_reserve_memory() earlier
x86/fault: Fix wrong signal when vsyscall fails with pkey
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Work around a bad GIC integration on a Renesas platform which can't
handle byte-sized MMIO access
- Plug a potential memory leak in the GICv4 driver
- Fix a regression in the Armada 370-XP IPI code which was caused by
issuing EOI instack of ACK.
- A couple of small fixes here and there"
* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Work around broken Renesas integration
irqchip/renesas-rza1: Use semicolons instead of commas
irqchip/gic-v3-its: Fix potential VPE leak on error
irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
irqchip/mbigen: Repair non-kernel-doc notation
irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
irqchip/armada-370-xp: Fix ack/eoi breakage
Documentation: Fix irq-domain.rst build warning
|
|
Merge misc fixes from Andrew Morton:
"16 patches.
Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
debug, and pagemap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix uninitialized use in overcommit_policy_handler
mm/memory_failure: fix the missing pte_unmap() call
kasan: always respect CONFIG_KASAN_STACK
sh: pgtable-3level: fix cast to pointer from integer of different size
mm/debug: sync up latest migrate_reason to migrate_reason_names
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
mm: fs: invalidate bh_lrus for only cold path
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
tools/vm/page-types: remove dependency on opt_file for idle page tracking
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
ocfs2: drop acl cache for directories too
mm/shmem.c: fix judgment error in shmem_is_huge()
xtensa: increase size of gcc stack frame check
mm/damon: don't use strnlen() with known-bogus source length
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.15-rc3.
Nothing huge in here, just fixes for a number of small issues that
have been reported. These include:
- habanalabs race conditions and other bugs fixed
- binder driver fixes
- fpga driver fixes
- coresight build warning fix
- nvmem driver fix
- comedi memory leak fix
- bcm-vk tty race fix
- other tiny driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
comedi: Fix memory leak in compat_insnlist()
nvmem: NVMEM_NINTENDO_OTP should depend on WII
misc: bcm-vk: fix tty registration race
fpga: dfl: Avoid reads to AFU CSRs during enumeration
fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
fpga: machxo2-spi: Return an error on failure
habanalabs: expose a single cs seq in staged submissions
habanalabs: fix wait offset handling
habanalabs: rate limit multi CS completion errors
habanalabs/gaudi: fix LBW RR configuration
habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK"
habanalabs: fail collective wait when not supported
habanalabs/gaudi: use direct MSI in single mode
habanalabs: fix kernel OOPs related to staged cs
habanalabs: fix potential race in interrupt wait ioctl
mcb: fix error handling in mcb_alloc_bus()
misc: genwqe: Fixes DMA mask setting
coresight: syscfg: Fix compiler warning
nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM
binder: make sure fd closes complete
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some USB driver fixes and new device ids for 5.15-rc3.
They include:
- usb-storage quirk additions
- usb-serial new device ids
- usb-serial driver fixes
- USB roothub registration bugfix to resolve a long-reported issue
- usb gadget driver fixes for a large number of small things
- dwc2 driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
MAINTAINERS: usb, update Peter Korsgaard's entries
usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property
usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval
usb: gadget: f_uac2: Add missing companion descriptor for feedback EP
usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()
xhci: Set HCD flag to defer primary roothub registration
usb: core: hcd: Add support for deferring roothub registration
usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
usb: dwc3: core: balance phy init and exit
Revert "USB: bcma: Add a check for devm_gpiod_get"
...
|
|
Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt.
Link: https://lkml.kernel.org/r/20210921064553.293905-3-o451686892@gmail.com
Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kernel test robot reported the regression of fio.write_iops[1] with
commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration").
Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.
This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).
Zhengjun Xing confirmed
"I test the patch, the regression reduced to -2.9%"
[1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/
[2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration
Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Work around a bad GIC integration on a Renesas platform, where the
interconnect cannot deal with byte-sized MMIO accesses
- Cleanup another Renesas driver abusing the comma operator
- Fix a potential GICv4 memory leak on an error path
- Make the type of 'size' consistent with the rest of the code in
__irq_domain_add()
- Fix a regression in the Armada 370-XP IPI path
- Fix the build for the obviously unloved goldfish-pic
- Some documentation fixes
Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
|
|
Pull rseq fixes from Paolo Bonzini:
"A fix for a bug with restartable sequences and KVM.
KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
the flag without informing rseq and leads to stale data in userspace's
rseq struct"
* tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Remove __NR_userfaultfd syscall fallback
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
tools: Move x86 syscall number fallbacks to .../uapi/
entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
Previous releases - regressions:
- introduce a shutdown method to mdio device drivers, and make DSA
switch drivers compatible with masters disappearing on shutdown;
preventing infinite reference wait
- fix issues in mdiobus users related to ->shutdown vs ->remove
- virtio-net: fix pages leaking when building skb in big mode
- xen-netback: correct success/error reporting for the
SKB-with-fraglist
- dsa: tear down devlink port regions when tearing down the devlink
port on error
- nexthop: fix division by zero while replacing a resilient group
- hns3: check queue, vf, vlan ids range before using
Previous releases - always broken:
- napi: fix race against netpoll causing NAPI getting stuck
- mlx4_en: ensure link operstate is updated even if link comes up
before netdev registration
- bnxt_en: fix TX timeout when TX ring size is set to the smallest
- enetc: fix illegal access when reading affinity_hint; prevent oops
on sysfs access
- mtk_eth_soc: avoid creating duplicate offload entries
Misc:
- core: correct the sock::sk_lock.owned lockdep annotations"
* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
atlantic: Fix issue in the pm resume flow.
net/mlx4_en: Don't allow aRFS for encapsulated packets
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
nfc: st-nci: Add SPI ID matching DT compatible
MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
nexthop: Fix memory leaks in nexthop notification chain listeners
mptcp: ensure tx skbs always have the MPTCP ext
qed: rdma - don't wait for resources under hw error recovery flow
s390/qeth: fix deadlock during failing recovery
s390/qeth: Fix deadlock in remove_discipline
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
net: dsa: realtek: register the MDIO bus under devres
net: dsa: don't allocate the slave_mii_bus using devres
Doc: networking: Fox a typo in ice.rst
net: dsa: fix dsa_tree_setup error path
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net/smc: add missing error check in smc_clc_prfx_set()
net: hns3: fix a return value error in hclge_get_reset_status()
net: hns3: check vlan id before using it
...
|
|
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq. The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.
Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing. But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012. Punt cleaning that
mess up to future patches.
No functional change intended.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901203030.1292304-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The 'size' is used in struct_size(domain, revmap, size) and its input
parameter type is 'size_t'(unsigned int).
Changing the size to 'unsigned int' to make the type consistent.
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210916025203.44841-1-cuibixuan@huawei.com
|
|
The function __bad_area_nosemaphore() calls kernelmode_fixup_or_oops()
with the parameter @signal being actually @pkey, which will send a
signal numbered with the argument in @pkey.
This bug can be triggered when the kernel fails to access user-given
memory pages that are protected by a pkey, so it can go down the
do_user_addr_fault() path and pass the !user_mode() check in
__bad_area_nosemaphore().
Most cases will simply run the kernel fixup code to make an -EFAULT. But
when another condition current->thread.sig_on_uaccess_err is met, which
is only used to emulate vsyscall, the kernel will generate the wrong
signal.
Add a new parameter @pkey to kernelmode_fixup_or_oops() to fix this.
[ bp: Massage commit message, fix build error as reported by the 0day
bot: https://lkml.kernel.org/r/202109202245.APvuT8BX-lkp@intel.com ]
Fixes: 5042d40a264c ("x86/fault: Bypass no_context() for implicit kernel faults from usermode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jiashuo Liang <liangjs@pku.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20210730030152.249106-1-liangjs@pku.edu.cn
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent a infinite loop in the MCE recovery on return to user space,
which was caused by a second MCE queueing work for the same page and
thereby creating a circular work list.
- Make kern_addr_valid() handle existing PMD entries, which are marked
not present in the higher level page table, correctly instead of
blindly dereferencing them.
- Pass a valid address to sanitize_phys(). This was caused by the
mixture of inclusive and exclusive ranges. memtype_reserve() expect
'end' being exclusive, but sanitize_phys() wants it inclusive. This
worked so far, but with end being the end of the physical address
space the fail is exposed.
- Increase the maximum supported GPIO numbers for 64bit. Newer SoCs
exceed the previous maximum.
* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid infinite loop for copy from user recovery
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
x86/platform: Increase maximum GPIO number for X86_64
x86/pat: Pass valid address to sanitize_phys()
|
|
MDIO-attached devices might have interrupts and other things that might
need quiesced when we kexec into a new kernel. Things are even more
creepy when those interrupt lines are shared, and in that case it is
absolutely mandatory to disable all interrupt sources.
Moreover, MDIO devices might be DSA switches, and DSA needs its own
shutdown method to unlink from the DSA master, which is a new
requirement that appeared after commit 2f1e8ea726e9 ("net: dsa: link
interfaces with the DSA master to get rid of lockdep warnings").
So introduce a ->shutdown method in the MDIO device driver structure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull io_uring iov_iter retry fixes from Jens Axboe:
"This adds a helper to save/restore iov_iter state, and modifies
io_uring to use it.
After that is done, we can now kill the iter->truncated addition that
we added for this release. The io_uring change is being overly
cautious with the save/restore/advance, but better safe than sorry and
we can always improve that and reduce the overhead if it proves to be
of concern. The only case to be worried about in this regard is huge
IO, where iteration can take a while to iterate segments.
I spent some time writing test cases, and expanded the coverage quite
a bit from the last posting of this. liburing carries this regression
test case now:
https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c
which exercises all of this. It now also supports provided buffers,
and explicitly tests for end-of-file/device truncation as well.
On top of that, Pavel sanitized the IOPOLL retry path to follow the
exact same pattern as normal IO"
* tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block:
io_uring: move iopoll reissue into regular IO path
Revert "iov_iter: track truncated size"
io_uring: use iov_iter state save/restore helpers
iov_iter: add helper to save iov_iter state
|
|
NXP Legal insists that the following are not fine:
- Saying "NXP Semiconductors" instead of "NXP", since the company's
registered name is "NXP"
- Putting a "(c)" sign in the copyright string
- Putting a comma in the copyright string
The only accepted copyright string format is "Copyright <year-range> NXP".
This patch changes the copyright headers in the networking files that
were sent by me, or derived from code sent by me.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf.
Current release - regressions:
- vhost_net: fix OoB on sendmsg() failure
- mlx5: bridge, fix uninitialized variable usage
- bnxt_en: fix error recovery regression
Current release - new code bugs:
- bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset()
Previous releases - regressions:
- r6040: restore MDIO clock frequency after MAC reset
- tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
- dsa: flush switchdev workqueue before tearing down CPU/DSA ports
Previous releases - always broken:
- ptp: dp83640: don't define PAGE0, avoid compiler warning
- igc: fix tunnel segmentation offloads
- phylink: update SFP selected interface on advertising changes
- stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume
- mlx5e: fix mutual exclusion between CQE compression and HW TS
Misc:
- bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode
- sfc: fallback for lack of xdp tx queues
- hns3: add option to turn off page pool feature"
* tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
mlxbf_gige: clear valid_polarity upon open
igc: fix tunnel offloading
net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
net: wan: wanxl: define CROSS_COMPILE_M68K
selftests: nci: replace unsigned int with int
net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
Revert "net: phy: Uniform PHY driver access"
net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
ptp: dp83640: don't define PAGE0
bnx2x: Fix enabling network interfaces without VFs
Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers""
tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
net-caif: avoid user-triggerable WARN_ON(1)
bpf, selftests: Add test case for mixed cgroup v1/v2
bpf, selftests: Add cgroup v1 net_cls classid helpers
bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
bpf: Add oversize check before call kvcalloc()
net: hns3: fix the timing issue of VF clearing interrupt sources
net: hns3: fix the exception when query imp info
net: hns3: disable mac in flr process
...
|
|
Merge absolute_pointer macro series from Guenter Roeck:
"Kernel test builds currently fail for several architectures with error
messages such as the following.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0
[-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses if gcc's builtin functions are used for
those operations.
This series introduces absolute_pointer() to fix the problem.
absolute_pointer() disassociates a pointer from its originating symbol
type and context, and thus prevents gcc from making assumptions about
pointers passed to memory operations"
* emailed patches from Guenter Roeck <linux@roeck-us.net>:
alpha: Use absolute_pointer to define COMMAND_LINE
alpha: Move setup.h out of uapi
net: i825xx: Use absolute_pointer for memcpy from fixed memory location
compiler.h: Introduce absolute_pointer macro
|
|
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa.
We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In an ideal world, when someone is passed an iov_iter and returns X bytes,
then X bytes would have been consumed/advanced from the iov_iter. But we
have use cases that always consume the entire iterator, a few examples
of that are iomap and bdev O_DIRECT. This means we cannot rely on the
state of the iov_iter once we've called ->read_iter() or ->write_iter().
This would be easier if we didn't always have to deal with truncate of
the iov_iter, as rewinding would be trivial without that. We recently
added a commit to track the truncate state, but that grew the iov_iter
by 8 bytes and wasn't the best solution.
Implement a helper to save enough of the iov_iter state to sanely restore
it after we've called the read/write iterator helpers. This currently
only works for IOVEC/BVEC/KVEC as that's all we need, support for other
iterator types are left as an exercise for the reader.
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|