Age | Commit message (Collapse) | Author |
|
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O
if necessary") in drm-misc, so start the backmerge cascade.
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
|
|
Pull smb client fixes from Steve French:
- fix potential mount hang
- fix retry problem in two types of compound operations
- important netfs integration fix in SMB1 read paths
- fix potential uninitialized zero point of inode
- minor patch to improve debugging for potential crediting problems
* tag 'v6.11-rc6-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
netfs, cifs: Improve some debugging bits
cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3
cifs: Fix zero_point init on inode initialisation
smb: client: fix double put of @cfile in smb2_set_path_size()
smb: client: fix double put of @cfile in smb2_rename_path()
smb: client: fix hang in wait_for_response() for negproto
|
|
get_stashed_dentry() tries to optimistically retrieve a stashed dentry
from a provided location. It needs to ensure to hold rcu lock before it
dereference the stashed location to prevent UAF issues. Use
rcu_dereference() instead of READ_ONCE() it's effectively equivalent
with some lockdep bells and whistles and it communicates clearly that
this expects rcu protection.
Link: https://lore.kernel.org/r/20240906-vfs-hotfix-5959800ffa68@brauner
Fixes: 07fd7c329839 ("libfs: add path_from_stashed()")
Reported-by: syzbot+f82b36bffae7ef78b6a7@syzkaller.appspotmail.com
Fixes: syzbot+f82b36bffae7ef78b6a7@syzkaller.appspotmail.com
Reported-by: syzbot+cbe4b96e1194b0e34db6@syzkaller.appspotmail.com
Fixes: syzbot+cbe4b96e1194b0e34db6@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix adding a new fgraph callback after function graph tracing has
already started.
If the new caller does not initialize its hash before registering the
fgraph_ops, it can cause a NULL pointer dereference. Fix this by
adding a new parameter to ftrace_graph_enable_direct() passing in the
newly added gops directly and not rely on using the fgraph_array[],
as entries in the fgraph_array[] must be initialized.
Assign the new gops to the fgraph_array[] after it goes through
ftrace_startup_subops() as that will properly initialize the
gops->ops and initialize its hashes.
- Fix a memory leak in fgraph storage memory test.
If the "multiple fgraph storage on a function" boot up selftest fails
in the registering of the function graph tracer, it will not free the
memory it allocated for the filter. Break the loop up into two where
it allocates the filters first and then registers the functions where
any errors will do the appropriate clean ups.
- Only clear the timerlat timers if it has an associated kthread.
In the rtla tool that uses timerlat, if it was killed just as it was
shutting down, the signals can free the kthread and the timer. But
the closing of the timerlat files could cause the hrtimer_cancel() to
be called on the already freed timer. As the kthread variable is is
set to NULL when the kthreads are stopped and the timers are freed it
can be used to know not to call hrtimer_cancel() on the timer if the
kthread variable is NULL.
- Use a cpumask to keep track of osnoise/timerlat kthreads
The timerlat tracer can use user space threads for its analysis. With
the killing of the rtla tool, the kernel can get confused between if
it is using a user space thread to analyze or one of its own kernel
threads. When this confusion happens, kthread_stop() can be called on
a user space thread and bad things happen. As the kernel threads are
per-cpu, a bitmask can be used to know when a kernel thread is used
or when a user space thread is used.
- Add missing interface_lock to osnoise/timerlat stop_kthread()
The stop_kthread() function in osnoise/timerlat clears the osnoise
kthread variable, and if it was a user space thread does a put_task
on it. But this can race with the closing of the timerlat files that
also does a put_task on the kthread, and if the race happens the task
will have put_task called on it twice and oops.
- Add cond_resched() to the tracing_iter_reset() loop.
The latency tracers keep writing to the ring buffer without resetting
when it issues a new "start" event (like interrupts being disabled).
When reading the buffer with an iterator, the tracing_iter_reset()
sets its pointer to that start event by walking through all the
events in the buffer until it gets to the time stamp of the start
event. In the case of a very large buffer, the loop that looks for
the start event has been reported taking a very long time with a non
preempt kernel that it can trigger a soft lock up warning. Add a
cond_resched() into that loop to make sure that doesn't happen.
- Use list_del_rcu() for eventfs ei->list variable
It was reported that running loops of creating and deleting kprobe
events could cause a crash due to the eventfs list iteration hitting
a LIST_POISON variable. This is because the list is protected by SRCU
but when an item is deleted from the list, it was using list_del()
which poisons the "next" pointer. This is what list_del_rcu() was to
prevent.
* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
tracing/timerlat: Only clear timer if a kthread exists
tracing/osnoise: Use a cpumask to know what threads are kthreads
eventfs: Use list_del_rcu() for SRCU protected list variable
tracing: Avoid possible softlockup in tracing_iter_reset()
tracing: Fix memory leak in fgraph storage selftest
tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
|
|
Chi Zhiling reported:
We found a null pointer accessing in tracefs[1], the reason is that the
variable 'ei_child' is set to LIST_POISON1, that means the list was
removed in eventfs_remove_rec. so when access the ei_child->is_freed, the
panic triggered.
by the way, the following script can reproduce this panic
loop1 (){
while true
do
echo "p:kp submit_bio" > /sys/kernel/debug/tracing/kprobe_events
echo "" > /sys/kernel/debug/tracing/kprobe_events
done
}
loop2 (){
while true
do
tree /sys/kernel/debug/tracing/events/kprobes/
done
}
loop1 &
loop2
[1]:
[ 1147.959632][T17331] Unable to handle kernel paging request at virtual address dead000000000150
[ 1147.968239][T17331] Mem abort info:
[ 1147.971739][T17331] ESR = 0x0000000096000004
[ 1147.976172][T17331] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1147.982171][T17331] SET = 0, FnV = 0
[ 1147.985906][T17331] EA = 0, S1PTW = 0
[ 1147.989734][T17331] FSC = 0x04: level 0 translation fault
[ 1147.995292][T17331] Data abort info:
[ 1147.998858][T17331] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 1148.005023][T17331] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 1148.010759][T17331] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 1148.016752][T17331] [dead000000000150] address between user and kernel address ranges
[ 1148.024571][T17331] Internal error: Oops: 0000000096000004 [#1] SMP
[ 1148.030825][T17331] Modules linked in: team_mode_loadbalance team nlmon act_gact cls_flower sch_ingress bonding tls macvlan dummy ib_core bridge stp llc veth amdgpu amdxcp mfd_core gpu_sched drm_exec drm_buddy radeon crct10dif_ce video drm_suballoc_helper ghash_ce drm_ttm_helper sha2_ce ttm sha256_arm64 i2c_algo_bit sha1_ce sbsa_gwdt cp210x drm_display_helper cec sr_mod cdrom drm_kms_helper binfmt_misc sg loop fuse drm dm_mod nfnetlink ip_tables autofs4 [last unloaded: tls]
[ 1148.072808][T17331] CPU: 3 PID: 17331 Comm: ls Tainted: G W ------- ---- 6.6.43 #2
[ 1148.081751][T17331] Source Version: 21b3b386e948bedd29369af66f3e98ab01b1c650
[ 1148.088783][T17331] Hardware name: Greatwall GW-001M1A-FTF/GW-001M1A-FTF, BIOS KunLun BIOS V4.0 07/16/2020
[ 1148.098419][T17331] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1148.106060][T17331] pc : eventfs_iterate+0x2c0/0x398
[ 1148.111017][T17331] lr : eventfs_iterate+0x2fc/0x398
[ 1148.115969][T17331] sp : ffff80008d56bbd0
[ 1148.119964][T17331] x29: ffff80008d56bbf0 x28: ffff001ff5be2600 x27: 0000000000000000
[ 1148.127781][T17331] x26: ffff001ff52ca4e0 x25: 0000000000009977 x24: dead000000000100
[ 1148.135598][T17331] x23: 0000000000000000 x22: 000000000000000b x21: ffff800082645f10
[ 1148.143415][T17331] x20: ffff001fddf87c70 x19: ffff80008d56bc90 x18: 0000000000000000
[ 1148.151231][T17331] x17: 0000000000000000 x16: 0000000000000000 x15: ffff001ff52ca4e0
[ 1148.159048][T17331] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 1148.166864][T17331] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000804391d0
[ 1148.174680][T17331] x8 : 0000000180000000 x7 : 0000000000000018 x6 : 0000aaab04b92862
[ 1148.182498][T17331] x5 : 0000aaab04b92862 x4 : 0000000080000000 x3 : 0000000000000068
[ 1148.190314][T17331] x2 : 000000000000000f x1 : 0000000000007ea8 x0 : 0000000000000001
[ 1148.198131][T17331] Call trace:
[ 1148.201259][T17331] eventfs_iterate+0x2c0/0x398
[ 1148.205864][T17331] iterate_dir+0x98/0x188
[ 1148.210036][T17331] __arm64_sys_getdents64+0x78/0x160
[ 1148.215161][T17331] invoke_syscall+0x78/0x108
[ 1148.219593][T17331] el0_svc_common.constprop.0+0x48/0xf0
[ 1148.224977][T17331] do_el0_svc+0x24/0x38
[ 1148.228974][T17331] el0_svc+0x40/0x168
[ 1148.232798][T17331] el0t_64_sync_handler+0x120/0x130
[ 1148.237836][T17331] el0t_64_sync+0x1a4/0x1a8
[ 1148.242182][T17331] Code: 54ffff6c f9400676 910006d6 f9000676 (b9405300)
[ 1148.248955][T17331] ---[ end trace 0000000000000000 ]---
The issue is that list_del() is used on an SRCU protected list variable
before the synchronization occurs. This can poison the list pointers while
there is a reader iterating the list.
This is simply fixed by using list_del_rcu() that is specifically made for
this purpose.
Link: https://lore.kernel.org/linux-trace-kernel/20240829085025.3600021-1-chizhiling@163.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240904131605.640d42b1@gandalf.local.home
Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts")
Reported-by: Chi Zhiling <chizhiling@kylinos.cn>
Tested-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Fix a typo in the rebalance accounting changes
- BCH_SB_MEMBER_INVALID: small on disk format feature which will be
needed for full erasure coding support; this is only the minimum so
that 6.11 can handle future versions without barfing.
* tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs:
bcachefs: BCH_SB_MEMBER_INVALID
bcachefs: fix rebalance accounting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- followup fix for direct io and fsync under some conditions, reported
by QEMU users
- fix a potential leak when disabling quotas while some extent tracking
work can still happen
- in zoned mode handle unexpected change of zone write pointer in
RAID1-like block groups, turn the zones to read-only
* tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between direct IO write and fsync when using same fd
btrfs: zoned: handle broken write pointer on zones
btrfs: qgroup: don't use extent changeset when not needed
|
|
Pull smb server fixes from Steve French:
- Fix crash in session setup
- Fix locking bug
- Improve access bounds checking
* tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Unlock on in ksmbd_tcp_set_interfaces()
ksmbd: unset the binding mark of a reused connection
smb: Annotate struct xattr_smb_acl with __counted_by()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Two netfs fixes for this merge window:
- Ensure that fscache_cookie_lru_time is deleted when the fscache
module is removed to prevent UAF
- Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
Before it used truncate_inode_pages_partial() which causes
copy_file_range() to fail on cifs"
* tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF
mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes, 15 of which are cc:stable.
Mostly MM, no identifiable theme. And a few nilfs2 fixups"
* tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
mailmap: update entry for Jan Kuliga
codetag: debug: mark codetags for poisoned page as empty
mm/memcontrol: respect zswap.writeback setting from parent cg too
scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
Revert "mm: skip CMA pages when they are not available"
maple_tree: remove rcu_read_lock() from mt_validate()
kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
nilfs2: fix state management in error path of log writing function
nilfs2: fix missing cleanup on rollforward recovery error
nilfs2: protect references to superblock parameters exposed in sysfs
userfaultfd: don't BUG_ON() if khugepaged yanks our page table
userfaultfd: fix checks for huge PMDs
mm: vmalloc: ensure vmap_block is initialised before adding to queue
selftests: mm: fix build errors on armhf
|
|
Create a sentinal value for "invalid device".
This is needed for removing devices that have stripes on them (force
removing, without evacuating); we need a sentinal value for the stripe
pointers to the device being removed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix EIO if splice and page stealing are enabled on the fuse device
- Disable problematic combination of passthrough and writeback-cache
- Other bug fixes found by code review
* tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable the combination of passthrough and writeback cache
fuse: update stats for pages in dropped aux writeback list
fuse: clear PG_uptodate when using a stolen page
fuse: fix memory leak in fuse_create_open
fuse: check aborted connection before adding requests to pending list for resending
fuse: use unsigned type for getxattr/listxattr size truncation
|
|
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failures when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's VFS lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's VFS lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's VFS lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:983!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Code: 50 d6 86 c0 e8 (...)
RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x24
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x6a/0x90
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? exc_invalid_op+0x50/0x70
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? asm_exc_invalid_op+0x1a/0x20
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? __seccomp_filter+0x31d/0x4f0
__x64_sys_fdatasync+0x4f/0x90
do_syscall_64+0x82/0x160
? do_futex+0xcb/0x190
? __x64_sys_futex+0x10e/0x1d0
? switch_fpu_return+0x4f/0xd0
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi@web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Improve some debugging bits:
(1) The netfslib _debug() macro doesn't need a newline in its format
string.
(2) Display the request debug ID and subrequest index in messages emitted
in smb2_adjust_credits() to make it easier to reference in traces.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Port a number of SMB2/3 async readv/writev fixes to the SMB1 transport:
commit a88d60903696c01de577558080ec4fc738a70475
cifs: Don't advance the I/O iterator before terminating subrequest
commit ce5291e56081730ec7d87bc9aa41f3de73ff3256
cifs: Defer read completion
commit 1da29f2c39b67b846b74205c81bf0ccd96d34727
netfs, cifs: Fix handling of short DIO read
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix cifs_fattr_to_inode() such that the ->zero_point tracking variable
is initialised when the inode is initialised.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If smb2_compound_op() is called with a valid @cfile and returned
-EINVAL, we need to call cifs_get_writable_path() before retrying it
as the reference of @cfile was already dropped by previous call.
This fixes the following KASAN splat when running fstests generic/013
against Windows Server 2022:
CIFS: Attempting to mount //w22-fs0/scratch
run fstests generic/013 at 2024-09-02 19:48:59
==================================================================
BUG: KASAN: slab-use-after-free in detach_if_pending+0xab/0x200
Write of size 8 at addr ffff88811f1a3730 by task kworker/3:2/176
CPU: 3 UID: 0 PID: 176 Comm: kworker/3:2 Not tainted 6.11.0-rc6 #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40
04/01/2014
Workqueue: cifsoplockd cifs_oplock_break [cifs]
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
? detach_if_pending+0xab/0x200
print_report+0x156/0x4d9
? detach_if_pending+0xab/0x200
? __virt_addr_valid+0x145/0x300
? __phys_addr+0x46/0x90
? detach_if_pending+0xab/0x200
kasan_report+0xda/0x110
? detach_if_pending+0xab/0x200
detach_if_pending+0xab/0x200
timer_delete+0x96/0xe0
? __pfx_timer_delete+0x10/0x10
? rcu_is_watching+0x20/0x50
try_to_grab_pending+0x46/0x3b0
__cancel_work+0x89/0x1b0
? __pfx___cancel_work+0x10/0x10
? kasan_save_track+0x14/0x30
cifs_close_deferred_file+0x110/0x2c0 [cifs]
? __pfx_cifs_close_deferred_file+0x10/0x10 [cifs]
? __pfx_down_read+0x10/0x10
cifs_oplock_break+0x4c1/0xa50 [cifs]
? __pfx_cifs_oplock_break+0x10/0x10 [cifs]
? lock_is_held_type+0x85/0xf0
? mark_held_locks+0x1a/0x90
process_one_work+0x4c6/0x9f0
? find_held_lock+0x8a/0xa0
? __pfx_process_one_work+0x10/0x10
? lock_acquired+0x220/0x550
? __list_add_valid_or_report+0x37/0x100
worker_thread+0x2e4/0x570
? __kthread_parkme+0xd1/0xf0
? __pfx_worker_thread+0x10/0x10
kthread+0x17f/0x1c0
? kthread+0xda/0x1c0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 1118:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
cifs_new_fileinfo+0xc8/0x9d0 [cifs]
cifs_atomic_open+0x467/0x770 [cifs]
lookup_open.isra.0+0x665/0x8b0
path_openat+0x4c3/0x1380
do_filp_open+0x167/0x270
do_sys_openat2+0x129/0x160
__x64_sys_creat+0xad/0xe0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 83:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x70
poison_slab_object+0xe9/0x160
__kasan_slab_free+0x32/0x50
kfree+0xf2/0x300
process_one_work+0x4c6/0x9f0
worker_thread+0x2e4/0x570
kthread+0x17f/0x1c0
ret_from_fork+0x31/0x60
ret_from_fork_asm+0x1a/0x30
Last potentially related work creation:
kasan_save_stack+0x30/0x50
__kasan_record_aux_stack+0xad/0xc0
insert_work+0x29/0xe0
__queue_work+0x5ea/0x760
queue_work_on+0x6d/0x90
_cifsFileInfo_put+0x3f6/0x770 [cifs]
smb2_compound_op+0x911/0x3940 [cifs]
smb2_set_path_size+0x228/0x270 [cifs]
cifs_set_file_size+0x197/0x460 [cifs]
cifs_setattr+0xd9c/0x14b0 [cifs]
notify_change+0x4e3/0x740
do_truncate+0xfa/0x180
vfs_truncate+0x195/0x200
__x64_sys_truncate+0x109/0x150
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 71f15c90e785 ("smb: client: retry compound request without reusing lease")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If smb2_set_path_attr() is called with a valid @cfile and returned
-EINVAL, we need to call cifs_get_writable_path() again as the
reference of @cfile was already dropped by previous smb2_compound_op()
call.
Fixes: 71f15c90e785 ("smb: client: retry compound request without reusing lease")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Call cifs_reconnect() to wake up processes waiting on negotiate
protocol to handle the case where server abruptly shut down and had no
chance to properly close the socket.
Simple reproducer:
ssh 192.168.2.100 pkill -STOP smbd
mount.cifs //192.168.2.100/test /mnt -o ... [never returns]
Cc: Rickard Andersson <rickaran@axis.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Btrfs rejects to mount a FS if it finds a block group with a broken write
pointer (e.g, unequal write pointers on two zones of RAID1 block group).
Since such case can happen easily with a power-loss or crash of a system,
we need to handle the case more gently.
Handle such block group by making it unallocatable, so that there will be
no writes into it. That can be done by setting the allocation pointer at
the end of allocating region (= block_group->zone_capacity). Then, existing
code handle zone_unusable properly.
Having proper zone_capacity is necessary for the change. So, set it as fast
as possible.
We cannot handle RAID0 and RAID10 case like this. But, they are anyway
unable to read because of a missing stripe.
Fixes: 265f7237dd25 ("btrfs: zoned: allow DUP on meta-data block groups")
Fixes: 568220fa9657 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # 6.1+
Reported-by: HAN Yuwei <hrx@bupt.moe>
Cc: Xuefer <xuefer@gmail.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The local extent changeset is passed to clear_record_extent_bits() where
it may have some additional memory dynamically allocated for ulist. When
qgroup is disabled, the memory is leaked because in this case the
changeset is not released upon __btrfs_qgroup_release_data() return.
Since the recorded contents of the changeset are not used thereafter, just
don't pass it.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+81670362c283f3dd889c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com
Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable")
CC: stable@vger.kernel.org # 6.10+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.
First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.
Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.
Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.
Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.
It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.
Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.
Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues. So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".
Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.
Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fixes: 49aa7830396b ("bcachefs: Fix rebalance_work accounting")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The fscache_cookie_lru_timer is initialized when the fscache module
is inserted, but is not deleted when the fscache module is removed.
If timer_reduce() is called before removing the fscache module,
the fscache_cookie_lru_timer will be added to the timer list of
the current cpu. Afterwards, a use-after-free will be triggered
in the softIRQ after removing the fscache module, as follows:
==================================================================
BUG: unable to handle page fault for address: fffffbfff803c9e9
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855
Tainted: [W]=WARN
RIP: 0010:__run_timer_base.part.0+0x254/0x8a0
Call Trace:
<IRQ>
tmigr_handle_remote_up+0x627/0x810
__walk_groups.isra.0+0x47/0x140
tmigr_handle_remote+0x1fa/0x2f0
handle_softirqs+0x180/0x590
irq_exit_rcu+0x84/0xb0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
default_idle_call+0x38/0x60
do_idle+0x2b5/0x300
cpu_startup_entry+0x54/0x60
start_secondary+0x20d/0x280
common_startup_64+0x13e/0x148
</TASK>
Modules linked in: [last unloaded: netfs]
==================================================================
Therefore delete fscache_cookie_lru_timer when removing the fscahe module.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
|
|
Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
lock should not have been able to cause that...
- Fix a rare data corruption in the rebalance path, caught as a nonce
inconsistency on encrypted filesystems
- Revert lockless buffered write path
- Mark more errors as autofix"
* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
bcachefs: Mark more errors as autofix
bcachefs: Revert lockless buffered IO path
bcachefs: Fix bch2_extents_match() false positive
bcachefs: Fix failure to return error in data_update_index_update()
|
|
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.
note that errors are still logged in the superblock, so we'll still know
that they happened.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a report of data corruption on nixos when building installer
images.
https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.
Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.
It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.
Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.
My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.
Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
|
|
Pull xfs fixes from Chandan Babu:
- Do not call out v1 inodes with non-zero di_nlink field as being
corrupt
- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
rather than "inode btree" blocks
- Don't report the number of trimmed bytes via FITRIM because the
underlying storage isn't required to do anything and failed discard
IOs aren't reported to the caller anyway
- Fix incorrect setting of rm_owner field in an rmap query
- Report missing disk offset range in an fsmap query
- Obtain m_growlock when extending realtime section of the filesystem
- Reset rootdir extent size hint after extending realtime section of
the filesystem
* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reset rootdir extent size hint after growfsrt
xfs: take m_growlock when running growfsrt
xfs: Fix missing interval for missing_owner in xfs fsmap
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: Fix the owner setting issue for rmap query in xfs fsmap
xfs: don't bother reporting blocks trimmed via FITRIM
xfs: xfs_finobt_count_blocks() walks the wrong btree
xfs: fix folio dirtying for XFILE_ALLOC callers
xfs: fix di_onlink checking for V1/V2 inodes
|
|
It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway. This is wrong.
With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below. This is the same as the
current code, but without any reference to a possible delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Unlock before returning an error code if this allocation fails.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Steve French reported null pointer dereference error from sha256 lib.
cifs.ko can send session setup requests on reused connection.
If reused connection is used for binding session, conn->binding can
still remain true and generate_preauth_hash() will not set
sess->Preauth_HashValue and it will be NULL.
It is used as a material to create an encryption key in
ksmbd_gen_smb311_encryptionkey. ->Preauth_HashValue cause null pointer
dereference error from crypto_shash_update().
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 8 PID: 429254 Comm: kworker/8:39
Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET69W (1.52 )
Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
RIP: 0010:lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3]
<TASK>
? show_regs+0x6d/0x80
? __die+0x24/0x80
? page_fault_oops+0x99/0x1b0
? do_user_addr_fault+0x2ee/0x6b0
? exc_page_fault+0x83/0x1b0
? asm_exc_page_fault+0x27/0x30
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
? lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3]
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
_sha256_update+0x77/0xa0 [sha256_ssse3]
sha256_avx2_update+0x15/0x30 [sha256_ssse3]
crypto_shash_update+0x1e/0x40
hmac_update+0x12/0x20
crypto_shash_update+0x1e/0x40
generate_key+0x234/0x380 [ksmbd]
generate_smb3encryptionkey+0x40/0x1c0 [ksmbd]
ksmbd_gen_smb311_encryptionkey+0x72/0xa0 [ksmbd]
ntlm_authenticate.isra.0+0x423/0x5d0 [ksmbd]
smb2_sess_setup+0x952/0xaa0 [ksmbd]
__process_request+0xa3/0x1d0 [ksmbd]
__handle_ksmbd_work+0x1c4/0x2f0 [ksmbd]
handle_ksmbd_work+0x2d/0xa0 [ksmbd]
process_one_work+0x16c/0x350
worker_thread+0x306/0x440
? __pfx_worker_thread+0x10/0x10
kthread+0xef/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x44/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add the __counted_by compiler attribute to the flexible array member
entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
- binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov)
* tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined
|
|
The runtime constant feature removes all the users of these variables,
allowing the compiler to optimize them away. It's quite difficult to
extract their values from the kernel text, and the memory saved by
removing them is tiny, and it was never the point of this optimization.
Since the dentry_hashtable is a core data structure, it's valuable for
debugging tools to be able to read it easily. For instance, scripts
built on drgn, like the dentrycache script[1], rely on it to be able to
perform diagnostics on the contents of the dcache. Annotate it as used,
so the compiler doesn't discard it.
Link: https://github.com/oracle-samples/drgn-tools/blob/3afc56146f54d09dfd1f6d3c1b7436eda7e638be/drgn_tools/dentry.py#L325-L355 [1]
Fixes: e3c92e81711d ("runtime constants: add x86 architecture support")
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Current design and handling of passthrough is without fuse
caching and with that FUSE_WRITEBACK_CACHE is conflicting.
Fixes: 7dc4e97a4f9a ("fuse: introduce FUSE_PASSTHROUGH capability")
Cc: stable@kernel.org # v6.9
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Under certain conditions, the range to be cleared by FALLOC_FL_ZERO_RANGE
may only be buffered locally and not yet have been flushed to the server.
For example:
xfs_io -f -t -c "pwrite -S 0x41 0 4k" \
-c "pwrite -S 0x42 4k 4k" \
-c "fzero 0 4k" \
-c "pread -v 0 8k" /xfstest.test/foo
will write two 4KiB blocks of data, which get buffered in the pagecache,
and then fallocate() is used to clear the first 4KiB block on the server -
but we don't flush the data first, which means the EOF position on the
server is wrong, and so the FSCTL_SET_ZERO_DATA RPC fails (and xfs_io
ignores the error), but then when we try to read it, we see the old data.
Fix this by preflushing any part of the target region that above the
server's idea of the EOF position to force the server to update its EOF
position.
Note, however, that we don't want to simply expand the file by moving the
EOF before doing the FSCTL_SET_ZERO_DATA[*] because someone else might see
the zeroed region or if the RPC fails we then have to try to clean it up or
risk getting corruption.
[*] And we have to move the EOF first otherwise FSCTL_SET_ZERO_DATA won't
do what we want.
This fixes the generic/008 xfstest.
[!] Note: A better way to do this might be to split the operation into two
parts: we only do FSCTL_SET_ZERO_DATA for the part of the range below the
server's EOF and then, if that worked, invalidate the buffered pages for the
part above the range.
Fixes: 6b69040247e1 ("cifs/smb3: Fix data inconsistent when zero file range")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
cc: Pavel Shilovsky <pshilov@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a number of crashers
- Update email address for an NFSD reviewer
* tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
fs/nfsd: fix update of inode attrs in CB_GETATTR
nfsd: fix potential UAF in nfsd4_cb_getattr_release
nfsd: hold reference to delegation when updating it for cb_getattr
MAINTAINERS: Update Olga Kornievskaia's email address
nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open
nfsd: ensure that nfsd4_fattr_args.context is zeroed out
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix use-after-free when submitting bios for read, after an error and
partially submitted bio the original one is freed while it can be
still be accessed again
- fix fstests case btrfs/301, with enabled quotas wait for delayed
iputs when flushing delalloc
- fix periodic block group reclaim, an unitialized value can be
returned if there are no block groups to reclaim
- fix build warning (-Wmaybe-uninitialized)
* tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix uninitialized return value from btrfs_reclaim_sweep()
btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk()
btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap()
btrfs: run delayed iputs when flushing delalloc
|
|
In the case where the aux writeback list is dropped (e.g. the pages
have been truncated or the connection is broken), the stats for
its pages and backing device info need to be updated as well.
Fixes: e2653bd53a98 ("fuse: fix leaked aux requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Originally when a stolen page was inserted into fuse's page cache by
fuse_try_move_page(), it would be marked uptodate. Then
fuse_readpages_end() would call SetPageUptodate() again on the already
uptodate page.
Commit 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use
folio_end_read()") changed that by replacing the SetPageUptodate() +
unlock_page() combination with folio_end_read(), which does mostly the
same, except it sets the uptodate flag with an xor operation, which in the
above scenario resulted in the uptodate flag being cleared, which in turn
resulted in EIO being returned on the read.
Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(),
conforming to the expectation of folio_end_read().
Reported-by: Jürg Billeter <j@bitron.ch>
Debugged-by: Matthew Wilcox <willy@infradead.org>
Fixes: 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()")
Cc: <stable@vger.kernel.org> # v6.10
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The memory of struct fuse_file is allocated but not freed
when get_create_ext return error.
Fixes: 3e2b6fdbdc9a ("fuse: send security context of inode on file")
Cc: stable@vger.kernel.org # v5.17
Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
resending
There is a race condition where inflight requests will not be aborted if
they are in the middle of being re-sent when the connection is aborted.
If fuse_resend has already moved all the requests in the fpq->processing
lists to its private queue ("to_queue") and then the connection starts
and finishes aborting, these requests will be added to the pending queue
and remain on it indefinitely.
Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v6.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when
parsing the FUSE daemon's response to a zero-length getxattr/listxattr
request.
On 32-bit kernels, where ssize_t and outarg.size are the same size, this is
wrong: The min_t() will pass through any size values that are negative when
interpreted as signed.
fuse_listxattr() will then return this userspace-supplied negative value,
which callers will treat as an error value.
This kind of bug pattern can lead to fairly bad security bugs because of
how error codes are used in the Linux kernel. If a caller were to convert
the numeric error into an error pointer, like so:
struct foo *func(...) {
int len = fuse_getxattr(..., NULL, 0);
if (len < 0)
return ERR_PTR(len);
...
}
then it would end up returning this userspace-supplied negative value cast
to a pointer - but the caller of this function wouldn't recognize it as an
error pointer (IS_ERR_VALUE() only detects values in the narrow range in
which legitimate errno values are), and so it would just be treated as a
kernel pointer.
I think there is at least one theoretical codepath where this could happen,
but that path would involve virtio-fs with submounts plus some weird
SELinux configuration, so I think it's probably not a concern in practice.
Cc: stable@vger.kernel.org # v4.9
Fixes: 63401ccdb2ca ("fuse: limit xattr returned size")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Fix cifs_file_copychunk_range() to flush the destination region before
invalidating it to avoid potential loss of data should the copy fail, in
whole or in part, in some way.
Fixes: 7b2404a886f8 ("cifs: Fix flushing, invalidation and file size with copy_file_range()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Short DIO reads, particularly in relation to cifs, are not being handled
correctly by cifs and netfslib. This can be tested by doing a DIO read of
a file where the size of read is larger than the size of the file. When it
crosses the EOF, it gets a short read and this gets retried, and in the
case of cifs, the retry read fails, with the failure being translated to
ENODATA.
Fix this by the following means:
(1) Add a flag, NETFS_SREQ_HIT_EOF, for the filesystem to set when it
detects that the read did hit the EOF.
(2) Make the netfslib read assessment stop processing subrequests when it
encounters one with that flag set.
(3) Return rreq->transferred, the accumulated contiguous amount read to
that point, to userspace for a DIO read.
(4) Make cifs set the flag and clear the error if the read RPC returned
ENODATA.
(5) Make cifs set the flag and clear the error if a short read occurred
without error and the read-to file position is now at the remote inode
size.
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When netfslib asks cifs to issue a read operation, it prefaces this with a
call to ->clamp_length() which cifs uses to negotiate credits, providing
receive capacity on the server; however, in the event that a read op needs
reissuing, netfslib doesn't call ->clamp_length() again as that could
shorten the subrequest, leaving a gap.
This causes the retried read to be done with zero credits which causes the
server to reject it with STATUS_INVALID_PARAMETER. This is a problem for a
DIO read that is requested that would go over the EOF. The short read will
be retried, causing EINVAL to be returned to the user when it fails.
Fix this by making cifs_req_issue_read() negotiate new credits if retrying
(NETFS_SREQ_RETRYING now gets set in the read side as well as the write
side in this instance).
This isn't sufficient, however: the new credits might not be sufficient to
complete the remainder of the read, so also add an additional field,
rreq->actual_len, that holds the actual size of the op we want to perform
without having to alter subreq->len.
We then rely on repeated short reads being retried until we finish the read
or reach the end of file and make a zero-length read.
Also fix a couple of places where the subrequest start and length need to
be altered by the amount so far transferred when being used.
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|