Age | Commit message (Collapse) | Author |
|
When running the following:
# cd /sys/kernel/tracing/
# echo 1 > events/sched/sched_waking/enable
# echo 1 > events/sched/sched_switch/enable
# echo 0 > tracing_on
# dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
The dd task would get stuck in an infinite loop in the kernel. What would
happen is the following:
When ring_buffer_read_page() returns -1 (no data) then a check is made to
see if the buffer is empty (as happens when the page is not full), it will
call wait_on_pipe() to wait until the ring buffer has data. When it is it
will try again to read data (unless O_NONBLOCK is set).
The issue happens when there's a reader and the file descriptor is closed.
The wait_on_pipe() will return when that is the case. But this loop will
continue to try again and wait_on_pipe() will again return immediately and
the loop will continue and never stop.
Simply check if the file was closed before looping and exit out if it is.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home
Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.
For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:
[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G N 6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS: 00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>] ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>] ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>] 0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>] <TASK>
[<...>] [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>] [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>] [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>] [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>] [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>] [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>] [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>] [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>] [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>] [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>] [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>] [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>] [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>] [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>] [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>] [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>] [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>] [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>] [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>] [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>] [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>] [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>] [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>] [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>] [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>] [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>] [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>] [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>] [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>] [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>] [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>] </TASK>
[<...>] <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---
The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:
struct tracefs_inode {
union {
struct inode {
struct list_head {
struct list_head * next; /* 0 8 */
struct list_head * prev; /* 8 8 */
} i_lru;
[...]
} vfs_inode;
struct callback_head {
void (*func)(struct callback_head *); /* 0 8 */
struct callback_head * next; /* 8 8 */
} rcu;
};
[...]
};
Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.
For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -629,6 +629,7 @@ struct inode {
umode_t i_mode;
unsigned short i_opflags;
kuid_t i_uid;
+ struct list_head i_lru; /* inode LRU list */
kgid_t i_gid;
unsigned int i_flags;
@@ -690,7 +691,6 @@ struct inode {
u16 i_wb_frn_avg_time;
u16 i_wb_frn_history;
#endif
- struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
union {
The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".
The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.
Link: https://lore.kernel.org/all/20240807115143.45927-3-minipli@grsecurity.net/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <digirigawa@gmail.com>
Link: https://lore.kernel.org/20240807185402.61410544@gandalf.local.home
Fixes: baa23a8d4360 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: Brad Spengler <spender@grsecurity.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Because ring_buffer_nr_pages() is not an inline function and user accesses
buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages
is removed.
Signed-off-by: Jianhui Zhou <912460177@qq.com>
Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.
Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When ftrace_graph_ret_addr() is invoked to convert a found stack return
address to its original value, the function can end up producing the
following crash:
[ 95.442712] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 95.442720] #PF: supervisor read access in kernel mode
[ 95.442724] #PF: error_code(0x0000) - not-present page
[ 95.442727] PGD 0 P4D 0-
[ 95.442731] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G OE K 6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c
[ 95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
[ 95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0
[ 95.442766] Code: [...]
[ 95.442773] RSP: 0018:ffff979b80ff7718 EFLAGS: 00010006
[ 95.442776] RAX: ffffffff8ca99b10 RBX: ffff979b80ff7760 RCX: ffff979b80167dc0
[ 95.442780] RDX: ffffffff8ca99b10 RSI: ffff979b80ff7790 RDI: 0000000000000005
[ 95.442783] RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
[ 95.442786] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff8e9491e0
[ 95.442790] R13: ffffffff8d6f70f0 R14: ffff979b80167da8 R15: ffff979b80167dc8
[ 95.442793] FS: 00007fbf83895740(0000) GS:ffff8a0afdd00000(0000) knlGS:0000000000000000
[ 95.442797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 95.442800] CR2: 0000000000000028 CR3: 0000000005070002 CR4: 0000000000370ef0
[ 95.442806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 95.442809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 95.442816] Call Trace:
[ 95.442823] <TASK>
[ 95.442896] unwind_next_frame+0x20d/0x830
[ 95.442905] arch_stack_walk_reliable+0x94/0xe0
[ 95.442917] stack_trace_save_tsk_reliable+0x7d/0xe0
[ 95.442922] klp_check_and_switch_task+0x55/0x1a0
[ 95.442931] task_call_func+0xd3/0xe0
[ 95.442938] klp_try_switch_task.part.5+0x37/0x150
[ 95.442942] klp_try_complete_transition+0x79/0x2d0
[ 95.442947] klp_enable_patch+0x4db/0x890
[ 95.442960] do_one_initcall+0x41/0x2e0
[ 95.442968] do_init_module+0x60/0x220
[ 95.442975] load_module+0x1ebf/0x1fb0
[ 95.443004] init_module_from_file+0x88/0xc0
[ 95.443010] idempotent_init_module+0x190/0x240
[ 95.443015] __x64_sys_finit_module+0x5b/0xc0
[ 95.443019] do_syscall_64+0x74/0x160
[ 95.443232] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 95.443236] RIP: 0033:0x7fbf82f2c709
[ 95.443241] Code: [...]
[ 95.443247] RSP: 002b:00007fffd5ea3b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 95.443253] RAX: ffffffffffffffda RBX: 000056359c48e750 RCX: 00007fbf82f2c709
[ 95.443257] RDX: 0000000000000000 RSI: 000056356ed4efc5 RDI: 0000000000000003
[ 95.443260] RBP: 000056356ed4efc5 R08: 0000000000000000 R09: 00007fffd5ea3c10
[ 95.443263] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[ 95.443267] R13: 000056359c48e6f0 R14: 0000000000000000 R15: 0000000000000000
[ 95.443272] </TASK>
[ 95.443274] Modules linked in: [...]
[ 95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 isst_if_common(E):1 skx_edac(E):1
[ 95.443414] CR2: 0000000000000028
The bug can be reproduced with kselftests:
cd linux/tools/testing/selftests
make TARGETS='ftrace livepatch'
(cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc)
(cd livepatch; ./test-livepatch.sh)
The problem is that ftrace_graph_ret_addr() is supposed to operate on the
ret_stack of a selected task but wrongly accesses the ret_stack of the
current task. Specifically, the above NULL dereference occurs when
task->curr_ret_stack is non-zero, but current->ret_stack is NULL.
Correct ftrace_graph_ret_addr() to work with the right ret_stack.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/20240803131211.17255-1-petr.pavlu@suse.com
Fixes: 7aa1eaef9f42 ("function_graph: Allow multiple users to attach to function graph")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
To mirror the SRCU lock held in eventfs_iterate() when iterating over
eventfs inodes, use call_srcu() to free them too.
This was accidentally(?) degraded to RCU in commit 43aa6f97c2d0
("eventfs: Get rid of dentry pointers without refcounts").
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240723210755.8970-1-minipli@grsecurity.net
Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing
ei->dentry") added another check, testing if the parent was freed after
we released the mutex. If so, the function returns NULL. However, all
callers expect it to either return a valid pointer or an error pointer,
at least since commit 5264a2f4bb3b ("tracing: Fix a NULL vs IS_ERR() bug
in event_subsystem_dir()"). Returning NULL will therefore fail the error
condition check in the caller.
Fix this by substituting the NULL return value with a fitting error
pointer.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Fixes: 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing ei->dentry")
Link: https://lore.kernel.org/20240723122522.2724-1-minipli@grsecurity.net
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The leading comment above alloc_inode_sb() is pretty explicit about it:
/*
* This must be used for allocating filesystems specific inodes to set
* up the inode reclaim context correctly.
*/
Switch tracefs over to alloc_inode_sb() to make sure inodes are properly
linked.
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240807115143.45927-2-minipli@grsecurity.net
Fixes: ba37ff75e04b ("eventfs: Implement tracefs_inode_cache")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Instead of using an atomic counter for the trace_event_file reference
counter, use the refcount interface. It has various checks to make sure
the reference counting is correct, and will warn if it detects an error
(like refcount_inc() on '0').
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240726144208.687cce24@rorschach.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When eventfs was introduced, special care had to be done to coordinate the
freeing of the file meta data with the files that are exposed to user
space. The file meta data would have a ref count that is set when the file
is created and would be decremented and freed after the last user that
opened the file closed it. When the file meta data was to be freed, it
would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
and any new references made (like new opens or reads) would fail as it is
marked freed. This allowed other meta data to be freed after this flag was
set (under the event_mutex).
All the files that were dynamically created in the events directory had a
pointer to the file meta data and would call event_release() when the last
reference to the user space file was closed. This would be the time that it
is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to
the "call" entry directly and not point to the file's meta data. This is
because all format files are the same for the same "call", so it was
thought there was no reason to differentiate them. The other files
maintain state (like the "enable", "trigger", etc). But this meant if the
file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that
would create dynamic events and free them), and running a loop that would
read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events
# while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/
# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report
(which was a real bug). This was because the format file was not checking
the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check
the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
new helper function: event_file_file() that will make sure that the
event_mutex is held, and will return NULL if the trace_event_file has the
EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
pointer use event_file_file() and check for NULL. Later uses can still use
the event_file_data() helper function if the event_mutex is still held and
was not released since the event_file_file() call.
Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@grsecurity.net/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Ilkka Naulapää <digirigawa@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Alexey Makhalov <alexey.makhalov@broadcom.com>
Cc: Vasavi Sirnapalli <vasavi.sirnapalli@broadcom.com>
Link: https://lore.kernel.org/20240730110657.3b69d3c1@gandalf.local.home
Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan()
to keep task blocked")
Booting with the 'profile=sleep' kernel command line option added or
executing
# echo -n sleep > /sys/kernel/profiling
after boot causes the system to lock up.
Lockdep reports
kthreadd/3 is trying to acquire lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
but task is already holding lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
with the call trace being
lock_acquire+0xc8/0x2f0
get_wchan+0x32/0x70
__update_stats_enqueue_sleeper+0x151/0x430
enqueue_entity+0x4b0/0x520
enqueue_task_fair+0x92/0x6b0
ttwu_do_activate+0x73/0x140
try_to_wake_up+0x213/0x370
swake_up_locked+0x20/0x50
complete+0x2f/0x40
kthread+0xfb/0x180
However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.
Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
A recent change in the ACPI code which consolidated code pathes moved
the invocation of init_freq_invariance_cppc() to be moved to a CPU
hotplug handler. The first invocation on AMD CPUs ends up enabling a
static branch which dead locks because the static branch enable tries
to acquire cpu_hotplug_lock but that lock is already held write by
the hotplug machinery.
Use static_branch_enable_cpuslocked() instead and take the hotplug
lock read for the Intel code path which is invoked from the
architecture code outside of the CPU hotplug operations.
- Fix the number of reserved bits in the sev_config structure bit field
so that the bitfield does not exceed 64 bit.
- Add missing Zen5 model numbers
- Fix the alignment assumptions of pti_clone_pgtable() and
clone_entry_text() on 32-bit:
The code assumes PMD aligned code sections, but on 32-bit the kernel
entry text is not PMD aligned. So depending on the code size and
location, which is configuration and compiler dependent, entry text
can cross a PMD boundary. As the start is not PMD aligned adding PMD
size to the start address is larger than the end address which
results in partially mapped entry code for user space. That causes
endless recursion on the first entry from userspace (usually #PF).
Cure this by aligning the start address in the addition so it ends up
at the next PMD start address.
clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
eventually be PTE mapped, which causes a map fail because the PMD for
the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
the clone() invocation which resolves to PTE on 32-bit and PMD on
64-bit.
- Zero the 8-byte case for get_user() on range check failure on 32-bit
The recend consolidation of the 8-byte get_user() case broke the
zeroing in the failure case again. Establish it by clearing ECX
before the range check and not afterwards as that obvioulsy can't be
reached when the range check fails
* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
x86/mm: Fix pti_clone_entry_text() for i386
x86/mm: Fix pti_clone_pgtable() alignment assumption
x86/setup: Parse the builtin command line before merging
x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
x86/sev: Fix __reserved field in sev_config
x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the timer/clocksource code:
- The recent fix to make the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed in testing as some compilers hoist the access
into the non-preemotible section where the pointer is actually
used, but obviously compilers can rightfully invoke it where the
code put it.
Move it into the non-preemptible section right to the actual usage
side to cure it.
- The clocksource watchdog is supposed to emit a warning when the
retry count is greater than one and the number of retries reaches
the limit.
The condition is backwards and warns always when the count is
greater than one. Fixup the condition to prevent spamming dmesg"
* tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
tick/broadcast: Move per CPU pointer access into the atomic section
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- When stime is larger than rtime due to accounting imprecision, then
utime = rtime - stime becomes negative. As this is unsigned math, the
result becomes a huge positive number.
Cure it by resetting stime to rtime in that case, so utime becomes 0.
- Restore consistent state when sched_cpu_deactivate() fails.
When offlining a CPU fails in sched_cpu_deactivate() after the SMT
present counter has been decremented, then the function aborts but
fails to increment the SMT present counter and leaves it imbalanced.
Consecutive operations cause it to underflow. Add the missing fixup
for the error path.
For SMT accounting the runqueue needs to marked online again in the
error exit path to restore consistent state.
* tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
sched/core: Introduce sched_set_rq_on/offline() helper
sched/smt: Fix unbalance sched_smt_present dec/inc
sched/smt: Introduce sched_smt_present_inc/dec() helper
sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Thomas Gleixner:
- Move the smp_processor_id() invocation back into the non-preemtible
region, so that the result is valid to use
- Add the missing package C2 residency counters for Sierra Forest CPUs
to make the newly added support actually useful
* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix smp_processor_id()-in-preemptible warnings
perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A couple of fixes for interrupt chip drivers:
- Make sure to skip the clear register space in the MBIGEN driver
when calculating the node register index. Otherwise the clear
register is clobbered and the wrong node registers are accessed.
- Fix a signed/unsigned confusion in the loongarch CPU driver which
converts an error code to a huge "valid" interrupt number.
- Convert the mesion GPIO interrupt controller lock to a raw spinlock
so it works on RT.
- Add a missing static to a internal function in the pic32 EVIC
driver"
* tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mbigen: Fix mbigen node address layout
irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
irqchip/irq-pic32-evic: Add missing 'static' to internal function
irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking and jump labels:
- Ensure that the atomic_cmpxchg() conditions are correct and
evaluating to true on any non-zero value except 1. The missing
check of the return value leads to inconsisted state of the jump
label counter.
- Add a missing type conversion in the paravirt spinlock code which
makes loongson build again"
* tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Fix the fix, brown paper bags galore
locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()
|
|
Commit 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and
regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes
(at different paths).
The kernel can't handle these duplicate names even though they are at
different paths. Fix this by renaming one of the nodes to "clock-pclk".
This name is aligned with other Arm boards (those didn't have a known
frequency to use in the node name).
Fixes: 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull smb client fixes from Steve French:
- two reparse point fixes
- minor cleanup
- additional trace point (to help debug a recent problem)
* tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp
smb3: add dynamic tracepoints for shutdown ioctl
cifs: Remove cifs_aio_ctx
smb: client: handle lack of FSCTL_GET_REPARSE_POINT support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- two Kconfig fixes
- one fix for the UVC driver addressing probing time detection of a UVC
custom controls
- one fix related to PDF generation
* tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: v4l: Fix missing tabular column hint for Y14P format
media: intel/ipu6: select AUXILIARY_BUS in Kconfig
media: ipu-bridge: fix ipu6 Kconfig dependencies
media: uvcvideo: Fix custom control mapping probing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"One core change that reverts the double message print patch in sd.c
(it was causing regressions on embedded systems).
The rest are driver fixes in ufs, mpt3sas and mpi3mr"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: exynos: Don't resume FMP when crypto support is disabled
scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
scsi: mpi3mr: Avoid IOMMU page faults on REPORT ZONES
scsi: ufs: core: Do not set link to OFF state while waking up from hibernation
scsi: Revert "scsi: sd: Do not repeat the starting disk message"
scsi: ufs: core: Fix deadlock during RTC update
scsi: ufs: core: Bypass quick recovery if force reset is needed
scsi: ufs: core: Check LSDBS cap when !mcq
|
|
Pull xfs fixes from Chandan Babu:
- Fix memory leak when corruption is detected during scrubbing parent
pointers
- Allow SECURE namespace xattrs to use reserved block pool to in order
to prevent ENOSPC
- Save stack space by passing tracepoint's char array to file_path()
instead of another stack variable
- Remove unused parameter in macro XFS_DQUOT_LOGRES
- Replace comma with semicolon in a couple of places
* tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: convert comma to semicolon
xfs: convert comma to semicolon
xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
xfs: fix file_path handling in tracepoints
xfs: allow SECURE namespace xattrs to use reserved block pool
xfs: fix a memory leak
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- fix unaligned memory accesses when calling BPF functions
- adjust memory size constants to fix possible DMA corruptions
* tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix a possible DMA corruption
parisc: fix unaligned accesses in BPF
|
|
The runtime constants linker script depended on documented linker
behavior [1]:
"If an output section’s name is the same as the input section’s name
and is representable as a C identifier, then the linker will
automatically PROVIDE two symbols: __start_SECNAME and __stop_SECNAME,
where SECNAME is the name of the section. These indicate the start
address and end address of the output section respectively"
to just automatically define the symbol names for the bounds of the
runtime constant arrays.
It turns out that this isn't actually something we can rely on, with old
linkers not generating these automatic symbols. It looks to have been
introduced in binutils-2.29 back in 2017, and we still support building
with versions all the way back to binutils-2.25 (from 2015).
And yes, Oleg actually seems to be using such ancient versions of
binutils.
So instead of depending on the implicit symbols from "section names
match and are representable C identifiers", just do this all manually.
It's not like it causes us any extra pain, we already have to do that
for all the other sections that we use that often have special
characters in them.
Reported-and-tested-by: Oleg Nesterov <oleg@redhat.com>
Link: https://sourceware.org/binutils/docs/ld/Input-Section-Example.html [1]
Link: https://lore.kernel.org/all/20240802114518.GA20924@redhat.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pinchartl/linux.git
uvcvideo v6.11 regression fix: fix custom control mapping probing
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
|
|
Pull io_uring fixes from Jens Axboe:
"Two minor tweaks for the NAPI handling, both from Olivier:
- Kill two unused list definitions
- Ensure that multishot NAPI doesn't age away"
* tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux:
io_uring: remove unused local list heads in NAPI functions
io_uring: keep multishot request NAPI timeout current
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix a few issues related to the MSI IRQs management in the
int340x thermal driver, fix a thermal core issue that may lead to
missing trip point crossing events and update the thermal core
documentation.
Specifics:
- Fix MSI error path cleanup in int340x, allow it to work with a
subset of thermal MSI IRQs if some of them are not working and make
it free all MSI IRQs on module exit (Srinivas Pandruvada)
- Fix a thermal core issue that may lead to missing trip point
crossing events in some cases when thermal_zone_set_trips() is used
and update the thermal core documentation (Rafael Wysocki)"
* tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Update thermal zone registration documentation
thermal: trip: Avoid skipping trips in thermal_zone_set_trips()
thermal: intel: int340x: Free MSI IRQ vectors on module exit
thermal: intel: int340x: Allow limited thermal MSI support
thermal: intel: int340x: Fix kernel warning during MSI cleanup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Expand the speculative SSBS errata workaround to more CPUs
- Ensure jump label changes are visible to all CPUs with a
kick_all_cpus_sync() (and also enable jump label batching as part of
the fix)
- The shadow call stack sanitiser is currently incompatible with Rust,
make CONFIG_RUST conditional on !CONFIG_SHADOW_CALL_STACK
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
rust: SHADOW_CALL_STACK is incompatible with Rust
arm64: errata: Expand speculative SSBS workaround (again)
arm64: cputype: Add Cortex-A725 definitions
arm64: cputype: Add Cortex-X1C definitions
|
|
Pull ceph fix from Ilya Dryomov:
"A fix for a potential hang in the MDS when cap revocation races with
the client releasing the caps in question, marked for stable"
* tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client:
ceph: force sending a cap update msg back to MDS for revoke op
|
|
Pull kvm updates from Paolo Bonzini:
"The bulk of the changes here is a largish change to guest_memfd,
delaying the clearing and encryption of guest-private pages until they
are actually added to guest page tables. This started as "let's make
it impossible to misuse the API" for SEV-SNP; but then it ballooned a
bit.
The new logic is generally simpler and more ready for hugepage support
in guest_memfd.
Summary:
- fix latent bug in how usage of large pages is determined for
confidential VMs
- fix "underline too short" in docs
- eliminate log spam from limited APIC timer periods
- disallow pre-faulting of memory before SEV-SNP VMs are initialized
- delay clearing and encrypting private memory until it is added to
guest page tables
- this change also enables another small cleanup: the checks in
SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can
now be moved in the common kvm_gmem_populate() function
- fix compilation error that the RISC-V merge introduced in selftests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: fix determination of max NPT mapping level for private pages
KVM: riscv: selftests: Fix compile error
KVM: guest_memfd: abstract how prepared folios are recorded
KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
KVM: guest_memfd: move check for already-populated page to common code
KVM: remove kvm_arch_gmem_prepare_needed()
KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
KVM: guest_memfd: do not go through struct page
KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
KVM: x86: disallow pre-fault for SNP VMs before initialization
KVM: Documentation: Fix title underline too short warning
KVM: x86: Eliminate log spam from limited APIC timer periods
|
|
* fix latent bug in how usage of large pages is determined for
confidential VMs
* fix "underline too short" in docs
* eliminate log spam from limited APIC timer periods
* disallow pre-faulting of memory before SEV-SNP VMs are initialized
* delay clearing and encrypting private memory until it is added to
guest page tables
* this change also enables another small cleanup: the checks in
SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
can now be moved in the common kvm_gmem_populate() function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid dropping some of the internal pseudo-extensions, which
breaks *envcfg dependency parsing
- The kernel entry address is now aligned in purgatory, which avoids a
misaligned load that can lead to crash on systems that don't support
misaligned accesses early in boot
- The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
perf JSON configurations, one of them been updated to
FW_SFENCE_VMA_ASID_SENT
- The starfive cache driver is now restricted to 64-bit systems, as it
isn't 32-bit clean
- A fix for to avoid aliasing legacy-mode perf counters with software
perf counters
- VM_FAULT_SIGSEGV is now handled in the page fault code
- A fix for stalls during CPU hotplug due to IPIs being disabled
- A fix for memblock bounds checking. This manifests as a crash on
systems with discontinuous memory maps that have regions that don't
fit in the linear map
* tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix linear mapping checks for non-contiguous memory regions
RISC-V: Enable the IPI before workqueue_online_cpu()
riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
perf: riscv: Fix selecting counters in legacy mode
cache: StarFive: Require a 64-bit system
perf arch events: Fix duplicate RISC-V SBI firmware event name
riscv/purgatory: align riscv_kernel_entry
riscv: cpufeature: Do not drop Linux-internal extensions
|
|
into HEAD
KVM/riscv fixes for 6.11, take #1
- Fix compile error in get-reg-list selftests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- remove unused empty CPU alternatives header file
- fix recently and erroneously removed exception handling when loading
an invalid floating point register
- ptdump fixes to reflect the recent changes due to the uncoupling of
physical vs virtual kernel address spaces
- changes to avoid the unnecessary splitting of large pages in kernel
mappings
- add the missing MODULE_DESCRIPTION for the CIO modules
* tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Keep inittext section writable
s390/vmlinux.lds.S: Move ro_after_init section behind rodata section
s390/mm: Get rid of RELOC_HIDE()
s390/mm/ptdump: Improve sorting of markers
s390/mm/ptdump: Add support for relocated lowcore mapping
s390/mm/ptdump: Fix handling of identity mapping area
s390/cio: Add missing MODULE_DESCRIPTION() macros
s390/alternatives: Remove unused empty header file
s390/fpu: Re-add exception handling in load_fpu_state()
|
|
The current "nretries > 1 || nretries >= max_retries" check in
cs_watchdog_read() will always evaluate to true, and thus pr_warn(), if
nretries is greater than 1. The intent is instead to never warn on the
first try, but otherwise warn if the successful retry was the last retry.
Therefore, change that "||" to "&&".
Fixes: db3a34e17433 ("clocksource: Retry clock read if long delays detected")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240802154618.4149953-2-paulmck@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are three important bug fixes for the cross-architecture tree,
fixing a regression with the new syscall.tbl file, the inconsistent
numbering for the new uretprobe syscall and a bug with iowrite64be on
alpha"
* tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: fix syscall macros for newfstat/newfstatat
uretprobe: change syscall number, again
alpha: fix ioread64be()/iowrite64be() helpers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A small collection of fixes:
- Revert of FireWire changes that caused a long-time regression
- Another long-time regression fix for AMD HDMI
- MIDI2 UMP fixes
- HD-audio Conexant codec fixes and a quirk"
* tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Conditionally use snooping for AMD HDMI
ALSA: usb-audio: Correct surround channels in UAC1 channel map
ALSA: seq: ump: Explicitly reset RPN with Null RPN
ALSA: seq: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
ALSA: seq: ump: Use the common RPN/bank conversion context
ALSA: ump: Explicitly reset RPN with Null RPN
ALSA: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
Revert "ALSA: firewire-lib: operate for period elapse event in process context"
Revert "ALSA: firewire-lib: obsolete workqueue for period update"
ALSA: hda/realtek: Add quirk for Acer Aspire E5-574G
ALSA: seq: ump: Optimize conversions from SysEx to UMP
ALSA: hda/conexant: Mute speakers at suspend / shutdown
ALSA: hda/generic: Add a helper to mute speakers at suspend/shutdown
ALSA: hda: conexant: Fix headset auto detect fail in the polling mode
|
|
Pull drm fixes from Dave Airlie:
"Regular weekly fixes. This is a bit larger than usual but doesn't seem
too crazy.
Most of it is vmwgfx changes that fix a bunch of issues with wayland
userspaces with dma-buf/external buffers and modesetting fixes.
Otherwise it's kinda spread out, v3d fixes some new ioctls, nouveau
has regression revert and fixes, amdgpu, i915 and ast have some small
fixes, and some core fixes spread about.
client:
- fix error code
atomic:
- allow damage clips with async flips
- allow explicit sync with async flips
kselftests:
- fix dmabuf-heaps test
panic:
- fix schedule_work in panic paths
panel:
- fix OrangePi Neo orientation
gpuvm:
- fix missing dependency
amdgpu:
- SMU 14.x update
- Fix contiguous VRAM handling for IB parsing
- GFX 12 fix
- Regression fix for old APUs
i915:
- Static analysis fix for int overflow
- Fix for HDCP2_STREAM_STATUS macro and removal of PWR_CLK_STATE for gen12
nouveau:
- revert busy wait change that caused a resume regression
- fix buffer placement fault on dynamic pm s/r
- fix refcount underflow
ast:
- fix black screen on resume
- wake during connector status detect
v3d:
- fix issues with perf/timestamp ioctls
vmwgfx:
- fix deadlock in dma-buf fence polling
- fix screen surface refcounting
- fix dumb buffer handling
- fix support for external buffers
- fix overlay with screen targets
- trigger modeset on screen moves"
* tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
Revert "nouveau: rip out busy fence waits"
nouveau: set placement to original placement on uvmm validate.
drm/atomic: Allow userspace to use damage clips with async flips
drm/atomic: Allow userspace to use explicit sync with atomic async flips
drm/i915: Fix possible int overflow in skl_ddi_calculate_wrpll()
drm/i915/hdcp: Fix HDCP2_STREAM_STATUS macro
drm/ast: astdp: Wake up during connector status detection
i915/perf: Remove code to update PWR_CLK_STATE for gen12
kselftests: dmabuf-heaps: Ensure the driver name is null-terminated
drm/client: Fix error code in drm_client_buffer_vmap_local()
drm/amdgpu: Fix APU handling in amdgpu_pm_load_smu_firmware()
drm/amdgpu: increase mes log buffer size for gfx12
drm/amdgpu: fix contiguous handling for IB parsing v2
drm/amdgpu/pm: support gpu_metrics sysfs interface for smu v14.0.2/3
drm/vmwgfx: Trigger a modeset when the screen moves
drm/vmwgfx: Fix overlay when using Screen Targets
drm/vmwgfx: Add basic support for external buffers
drm/vmwgfx: Fix handling of dumb buffers
drm/vmwgfx: Make sure the screen surface is ref counted
drm/vmwgfx: Fix a deadlock in dma buf fence polling
...
|
|
To 2.50
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
NetApp server requires the file to be open with FILE_READ_EA access in
order to support FSCTL_GET_REPARSE_POINT, otherwise it will return
STATUS_INVALID_DEVICE_REQUEST. It doesn't make any sense because
there's no requirement for FILE_READ_EA bit to be set nor
STATUS_INVALID_DEVICE_REQUEST being used for something other than
"unsupported reparse points" in MS-FSA.
To fix it and improve compatibility, set FILE_READ_EA & SYNCHRONIZE
bits to match what Windows client currently does.
Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
For debugging an umount failure in xfstests generic/043 generic/044 in some
configurations, we needed more information on the shutdown ioctl which
was suspected of being related to the cause, so tracepoints are added
in this patch e.g.
"trace-cmd record -e smb3_shutdown_enter -e smb3_shutdown_done -e smb3_shutdown_err"
Sample output:
godown-47084 [011] ..... 3313.756965: smb3_shutdown_enter: flags=0x1 tid=0x733b3e75
godown-47084 [011] ..... 3313.756968: smb3_shutdown_done: flags=0x1 tid=0x733b3e75
Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Remove struct cifs_aio_ctx and its associated alloc/release functions as it
is no longer used, the functions being taken over by netfslib.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
As per MS-FSA 2.1.5.10.14, support for FSCTL_GET_REPARSE_POINT is
optional and if the server doesn't support it,
STATUS_INVALID_DEVICE_REQUEST must be returned for the operation.
If we find files with reparse points and we can't read them due to
lack of client or server support, just ignore it and then treat them
as regular files or junctions.
Fixes: 5f71ebc41294 ("smb: client: parse reparse point flag in create response")
Reported-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:
- Add missing power-domains property to the device tree bindings for
the Rockchip Designware AHCI adapter (from Heiko)
* tag 'ata-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
dt-bindings: ata: rockchip-dwc-ahci: add missing power-domains
|
|
Pull vfs fix from Al Viro:
"do_dup2() out-of-bounds array speculation fix"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
protect the fetch of ->fd[fd] in do_dup2() from mispredictions
|
|
Although the Arm architecture permits concurrent modification and
execution of NOP and branch instructions, it still requires some
synchronisation to ensure that other CPUs consistently execute the newly
written instruction:
> When the modified instructions are observable, each PE that is
> executing the modified instructions must execute an ISB or perform a
> context synchronizing event to ensure execution of the modified
> instructions
Prior to commit f6cc0c501649 ("arm64: Avoid calling stop_machine() when
patching jump labels"), the arm64 jump_label patching machinery
performed synchronisation using stop_machine() after each modification,
however this was problematic when flipping static keys from atomic
contexts (namely, the arm_arch_timer CPU hotplug startup notifier) and
so we switched to the _nosync() patching routines to avoid "scheduling
while atomic" BUG()s during boot.
In hindsight, the analysis of the issue in f6cc0c501649 isn't quite
right: it cites the use of IPIs in the default patching routines as the
cause of the lockup, whereas stop_machine() does not rely on IPIs and
the I-cache invalidation is performed using __flush_icache_range(),
which elides the call to kick_all_cpus_sync(). In fact, the blocking
wait for other CPUs is what triggers the BUG() and the problem remains
even after f6cc0c501649, for example because we could block on the
jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
avoid the static key entirely in commit a862fc2254bd
("clocksource/arm_arch_timer: Remove use of workaround static key").
This all leaves the jump_label patching code in a funny situation on
arm64 as we do not synchronise with other CPUs to reduce the likelihood
of a bug which no longer exists. Consequently, toggling a static key on
one CPU cannot be assumed to take effect on other CPUs, leading to
potential issues, for example with missing preempt notifiers.
Rather than revert f6cc0c501649 and go back to stop_machine() for each
patch site, implement arch_jump_label_transform_apply() and kick all
the other CPUs with an IPI at the end of patching.
Cc: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Fixes: f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The __NR_newfstat and __NR_newfstatat macros accidentally got renamed
in the conversion to the syscall.tbl format, dropping the 'new' portion
of the name.
In an unrelated change, the two syscalls are no longer architecture
specific but are once more defined on all 64-bit architectures, so the
'newstat' ABI keyword can be dropped from the table as a simplification.
Fixes: Fixes: 4fe53bf2ba0a ("syscalls: add generic scripts/syscall.tbl")
Closes: https://lore.kernel.org/lkml/838053e0-b186-4e9f-9668-9a3384a71f23@app.fastmail.com/T/#t
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Despite multiple attempts to get the syscall number assignment right
for the newly added uretprobe syscall, we ended up with a bit of a mess:
- The number is defined as 467 based on the assumption that the
xattrat family of syscalls would use 463 through 466, but those
did not make it into 6.11.
- The include/uapi/asm-generic/unistd.h file still lists the number
463, but the new scripts/syscall.tbl that was supposed to have the
same data lists 467 instead as the number for arc, arm64, csky,
hexagon, loongarch, nios2, openrisc and riscv. None of these
architectures actually provide a uretprobe syscall.
- All the other architectures (powerpc, arm, mips, ...) don't list
this syscall at all.
There are two ways to make it consistent again: either list it with
the same syscall number on all architectures, or only list it on x86
but not in scripts/syscall.tbl and asm-generic/unistd.h.
Based on the most recent discussion, it seems like we won't need it
anywhere else, so just remove the inconsistent assignment and instead
move the x86 number to the next available one in the architecture
specific range, which is 335.
Fixes: 5c28424e9a34 ("syscalls: Fix to add sys_uretprobe to syscall.tbl")
Fixes: 190fec72df4a ("uprobe: Wire up uretprobe system call")
Fixes: 63ded110979b ("uprobe: Change uretprobe syscall scope and number")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The thermal sysfs API document is outdated. One of the problems with
it is that is still documents thermal_zone_device_register() which
does not exit any more and it does not reflect the current thermal
zone operations definition.
Replace the thermal_zone_device_register() description in it with
a thermal_zone_device_register_with_trips() description, including
an update of the thermal zone operations list.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2767845.mvXUDI8C0e@rjwysocki.net
|