Age | Commit message (Collapse) | Author |
|
The (!fiq->connected) check was moved into the queuing method resulting in
the following:
Fixes: 5de8acb41c86 ("fuse: cleanup request queuing towards virtiofs")
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We should convert fs/fuse code to use a newly introduced
invalid_mnt_idmap instead of passing a NULL as idmap pointer.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Link: https://lore.kernel.org/linux-fsdevel/20240904-baugrube-erhoben-b3c1c49a2645@brauner/
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Let's convert all existing callers properly.
No functional changes intended.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It was reported [1] that on linux-next/fs-next the following crash
is reproducible:
[ 42.659136] Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 42.660501] fbcon: Taking over console
[ 42.660930] KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f]
[ 42.661752] CPU: 1 UID: 0 PID: 1589 Comm: dtprobed Not tainted 6.11.0-rc6+ #1
[ 42.662565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023
[ 42.663472] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.664046] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.666945] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.668837] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.670122] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.672154] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.672160] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.672179] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.672186] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.672191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.[ CR02: ;32m00007f3237366208 CR3: 0 OK 79e001 CR4: 0000000000770ef0
[ 42.672214] PKRU: 55555554
[ 42.672218] Call Trace:
[ 42.672223] <TASK>
[ 42.672226] ? die_addr+0x41/0xa0
[ 42.672238] ? exc_general_protection+0x14c/0x230
[ 42.672250] ? asm_exc_general_protection+0x26/0x30
[ 42.672260] ? fuse_get_req+0x77a/0x990 [fuse]
[ 42.672281] ? fuse_get_req+0x36b/0x990 [fuse]
[ 42.672300] ? kasan_unpoison+0x27/0x60
[ 42.672310] ? __pfx_fuse_get_req+0x10/0x10 [fuse]
[ 42.672327] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672333] ? alloc_pages_mpol_noprof+0x195/0x440
[ 42.672340] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672345] ? kasan_unpoison+0x27/0x60
[ 42.672350] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672355] ? __kasan_slab_alloc+0x4d/0x90
[ 42.672362] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672367] ? __kmalloc_cache_noprof+0x134/0x350
[ 42.672376] fuse_simple_background+0xe7/0x180 [fuse]
[ 42.672406] cuse_channel_open+0x540/0x710 [cuse]
[ 42.672415] misc_open+0x2a7/0x3a0
[ 42.672424] chrdev_open+0x1ef/0x5f0
[ 42.672432] ? __pfx_chrdev_open+0x10/0x10
[ 42.672439] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672443] ? security_file_open+0x3bb/0x720
[ 42.672451] do_dentry_open+0x43d/0x1200
[ 42.672459] ? __pfx_chrdev_open+0x10/0x10
[ 42.672468] vfs_open+0x79/0x340
[ 42.672475] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672482] do_open+0x68c/0x11e0
[ 42.672489] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672495] ? __pfx_do_open+0x10/0x10
[ 42.672501] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672506] ? open_last_lookups+0x2a2/0x1370
[ 42.672515] path_openat+0x24f/0x640
[ 42.672522] ? __pfx_path_openat+0x10/0x10
[ 42.723972] ? stack_depot_save_flags+0x45/0x4b0
[ 42.724787] ? __fput+0x43c/0xa70
[ 42.725100] do_filp_open+0x1b3/0x3e0
[ 42.725710] ? poison_slab_object+0x10d/0x190
[ 42.726145] ? __kasan_slab_free+0x33/0x50
[ 42.726570] ? __pfx_do_filp_open+0x10/0x10
[ 42.726981] ? do_syscall_64+0x64/0x170
[ 42.727418] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.728018] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.728505] ? do_raw_spin_lock+0x131/0x270
[ 42.728922] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 42.729494] ? do_raw_spin_unlock+0x14c/0x1f0
[ 42.729992] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.730889] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.732178] ? alloc_fd+0x176/0x5e0
[ 42.732585] do_sys_openat2+0x122/0x160
[ 42.732929] ? __pfx_do_sys_openat2+0x10/0x10
[ 42.733448] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.734013] ? __pfx_map_id_up+0x10/0x10
[ 42.734482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.735529] ? __memcg_slab_free_hook+0x292/0x500
[ 42.736131] __x64_sys_openat+0x123/0x1e0
[ 42.736526] ? __pfx___x64_sys_openat+0x10/0x10
[ 42.737369] ? __x64_sys_close+0x7c/0xd0
[ 42.737717] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.738192] ? syscall_trace_enter+0x11e/0x1b0
[ 42.738739] do_syscall_64+0x64/0x170
[ 42.739113] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.739638] RIP: 0033:0x7f28cd13e87b
[ 42.740038] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 42.741943] RSP: 002b:00007ffc992546c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 42.742951] RAX: ffffffffffffffda RBX: 00007f28cd44f1ee RCX: 00007f28cd13e87b
[ 42.743660] RDX: 0000000000000002 RSI: 00007f28cd44f2fa RDI: 00000000ffffff9c
[ 42.744518] RBP: 00007f28cd44f2fa R08: 0000000000000000 R09: 0000000000000001
[ 42.745211] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 42.745920] R13: 00007f28cd44f2fa R14: 0000000000000000 R15: 0000000000000003
[ 42.746708] </TASK>
[ 42.746937] Modules linked in: cuse vfat fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common kvm_amd ccp bochs drm_vram_helper kvm drm_ttm_helper ttm pcspkr i2c_piix4 drm_kms_helper i2c_smbus pvpanic_mmio pvpanic joydev sch_fq_codel drm fuse xfs nvme_tcp nvme_fabrics nvme_core sd_mod sg virtio_net net_failover virtio_scsi failover crct10dif_pclmul crc32_pclmul ata_generic pata_acpi ata_piix ghash_clmulni_intel virtio_pci sha512_ssse3 virtio_pci_legacy_dev sha256_ssse3 virtio_pci_modern_dev sha1_ssse3 libata serio_raw dm_multipath btrfs blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qemu_fw_cfg aesni_intel crypto_simd cryptd
[ 42.754333] ---[ end trace 0000000000000000 ]---
[ 42.756899] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.757851] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.760334] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.760940] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.761697] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.763009] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.763920] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.764839] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.765716] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.766890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.767828] CR2: 00007f3237366208 CR3: 000000012c79e001 CR4: 0000000000770ef0
[ 42.768730] PKRU: 55555554
[ 42.769022] Kernel panic - not syncing: Fatal exception
[ 42.770758] Kernel Offset: 0x7200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 42.771947] ---[ end Kernel panic - not syncing: Fatal exception ]---
It's obviously CUSE related callstack. For CUSE case, we don't have superblock and
our checks for SB_I_NOIDMAP flag does not make any sense. Let's handle this case gracefully.
Fixes: aa16880d9f13 ("fuse: add basic infrastructure to support idmappings")
Link: https://lore.kernel.org/linux-next/87v7z586py.fsf@debian-BULLSEYE-live-builder-AMD64/ [1]
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Reported-by: syzbot+20c7e20cc8f5296dca12@syzkaller.appspotmail.com
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Only f_path is used from backing files registered with
FUSE_DEV_IOC_BACKING_OPEN, so it makes sense to allow O_PATH descriptors.
O_PATH files have an empty f_op, so don't check read_iter/write_iter.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Allow idmapped mounts for virtiofs.
It's absolutely safe as for virtiofs we have the same
feature negotiation mechanism as for classical fuse
filesystems. This does not affect any existing
setups anyhow.
virtiofsd support:
https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/245
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Now we have everything in place and we can allow idmapped mounts
by setting the FS_ALLOW_IDMAP flag. Notice that real availability
of idmapped mounts will depend on the fuse daemon. Fuse daemon
have to set FUSE_ALLOW_IDMAP flag in the FUSE_INIT reply.
To discuss:
- we enable idmapped mounts support only if "default_permissions" mode is
enabled, because otherwise we would need to deal with UID/GID mappings in
the userspace side OR provide the userspace with idmapped
req->in.h.uid/req->in.h.gid values which is not something that we probably
want to. Idmapped mounts philosophy is not about faking caller uid/gid.
Some extra links and examples:
- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support
- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support
- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap
- glusterfs conversion example
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It is not possible with the current fuse code, but let's protect ourselves
from regressions in the future.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This is needed to properly clear suid/sgid.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
RENAME_WHITEOUT is a special case of ->rename
and we need to take idmappings into account there.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It's just a matter of adjusting a permission check condition
for S_ISGID flag. All the rest is already handled in the generic
VFS code.
Notice that this permission check is the analog of what
we have in posix_acl_update_mode() generic helper, but
fuse doesn't use this helper as on the kernel side we don't
care about ensuring that POSIX ACL and CHMOD permissions are in sync
as it is a responsibility of a userspace daemon to handle that.
For the same reason we don't have a calls to posix_acl_chmod(),
while most of other filesystem do.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We don't need to have idmap in the __fuse_get_acl as we don't
have any use for it.
In the current POSIX ACL implementation, idmapped mounts are
taken into account on the userspace/kernel border
(see vfs_set_acl_idmapped_mnt() and vfs_posix_acl_to_xattr()).
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Need to translate uid and gid in case of chown(2).
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We only cover the case when "default_permissions" flag
is used. A reason for that is that otherwise all the permission
checks are done in the userspace and we have to deal with
VFS idmapping in the userspace (which is bad), alternatively
we have to provide the userspace with idmapped req->in.h.uid/req->in.h.gid
which is also not align with VFS idmaps philosophy.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We have to:
- pass an idmapping to the generic_fillattr()
to properly handle UIG/GID mapping for the userspace.
- pass -/- to fuse_fillattr() (analog of generic_fillattr() in fuse).
Difference between these two is that generic_fillattr() takes all the
stat() data from the inode directly, while fuse_fillattr() codepath takes a
fresh data just from the userspace reply on the FUSE_GETATTR request.
In some cases we can just pass &nop_mnt_idmap, because idmapping won't be
used in these codepaths. For example, when 3rd argument of
fuse_do_getattr() is NULL then idmap argument is not used.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We have all the infrastructure in place, we just need
to pass an idmapping here.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We don't need to remap parent_gid, but have to adjust
group membership checks and take idmapping into account.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If idmap == NULL *and* filesystem daemon declared idmapped mounts
support, then uid/gid values in a fuse header will be -1.
No functional changes intended.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add some preparational changes in fuse_get_req/fuse_force_creds
to handle idmappings.
Miklos suggested [1], [2] to change the meaning of in.h.uid/in.h.gid
fields when daemon declares support for idmapped mounts. In a new semantic,
we fill uid/gid values in fuse header with a id-mapped caller uid/gid (for
requests which create new inodes), for all the rest cases we just send -1
to userspace.
No functional changes intended.
Link: https://lore.kernel.org/all/CAJfpegsVY97_5mHSc06mSw79FehFWtoXT=hhTUK_E-Yhr7OAuQ@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAJfpegtHQsEUuFq1k4ZbTD3E1h-GsrN3PWyv7X8cg6sfU_W2Yw@mail.gmail.com/ [2]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Right now we determine if filesystem support vfs idmappings or not basing
on the FS_ALLOW_IDMAP flag presence. This "static" way works perfecly well
for local filesystems like ext4, xfs, btrfs, etc. But for network-like
filesystems like fuse, cephfs this approach is not ideal, because sometimes
proper support of vfs idmaps requires some extensions for the on-wire
protocol, which implies that changes have to be made not only in the Linux
kernel code but also in the 3rd party components like libfuse, cephfs MDS
server and so on.
We have seen that issue during our work on cephfs idmapped mounts [1] with
Christian, but right now I'm working on the idmapped mounts support for
fuse/virtiofs and I think that it is a right time for this extension.
[1] 5ccd8530dd7 ("ceph: handle idmapped mounts in create_request_message()")
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_mount_list doesn't exist, use fuse_conn_list.
Signed-off-by: Aurelien Aptel <aaptel@nvidia.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
I've been timing various fuse operations and it's quite annoying to do
with kprobes. Add two tracepoints for sending and ending fuse requests
to make it easier to debug and time various operations.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_writepage_locked()
This change refactors the shared logic in fuse_writepages_fill() and
fuse_writepages_locked() into two separate helper functions,
fuse_writepage_args_page_fill() and fuse_writepage_args_setup().
No functional changes added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Before this change, wpa->ia.ff is initialized with an acquired reference
on the fuse file right before it submits the writeback request. If there
are auxiliary writebacks, then the initialization and reference
acquisition needs to also be set before we submit the auxiliary writeback
request.
To make the logic simpler and to pave the way for a subsequent
refactoring of fuse_writepages_fill() and fuse_writepage_locked(), this
change initializes and acquires wpa->ia.ff when the wpa is allocated.
No functional changes added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
To pave the way for refactoring out the shared logic in
fuse_writepages_fill() and fuse_writepage_locked(), this change converts
the temporary page in fuse_writepages_fill() to use the folio API.
This is similar to the change in commit e0887e095a80 ("fuse: Convert
fuse_writepage_locked to take a folio"), which converted the tmp page in
fuse_writepage_locked() to use the folio API.
inc_node_page_state() is intentionally preserved here instead of
converting to node_stat_add_folio() since it is updating the stat of the
underlying page and to better maintain API symmetry with
dec_node_page_stat() in fuse_writepage_finish_stat().
No functional changes added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
callback
Prior to this change, data->ff is checked and if not initialized then
initialized in the fuse_writepages_fill() callback, which gets called
for every dirty page in the address space mapping.
This logic is better placed in the main fuse_writepages() caller where
data.ff is initialized before walking the dirty pages.
No functional changes added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Move the logic for updating the bdi and page stats for a finished
writeback into a separate helper function, where it can be called from
both fuse_writepage_finish() and fuse_writepage_add() (in the case
where there is already an auxiliary write request for the page).
No functional changes added.
Suggested by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Drop the unused "struct fuse_mount *fm" arg in
fuse_writepage_finish().
No functional changes added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In some cases, the fi->writepages may be empty. And there is no need
to check fi->writepages with spin_lock, which may have an impact on
performance due to lock contention. For example, in scenarios where
multiple readers read the same file without any writers, or where
the page cache is not enabled.
Also remove the outdated comment since commit 6b2fb79963fb ("fuse:
optimize writepages search") has optimize the situation by replacing
list with rb-tree.
Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Virtiofs has its own queuing mechanism, but still requests are first queued
on fiq->pending to be immediately dequeued and queued onto the virtio
queue.
The queuing on fiq->pending is unnecessary and might even have some
performance impact due to being a contention point.
Forget requests are handled similarly.
Move the queuing of requests and forgets into the fiq->ops->*.
fuse_iqueue_ops are renamed to reflect the new semantics.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Tested-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Reviewed-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Current design and handling of passthrough is without fuse
caching and with that FUSE_WRITEBACK_CACHE is conflicting.
Fixes: 7dc4e97a4f9a ("fuse: introduce FUSE_PASSTHROUGH capability")
Cc: stable@kernel.org # v6.9
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In the case where the aux writeback list is dropped (e.g. the pages
have been truncated or the connection is broken), the stats for
its pages and backing device info need to be updated as well.
Fixes: e2653bd53a98 ("fuse: fix leaked aux requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Originally when a stolen page was inserted into fuse's page cache by
fuse_try_move_page(), it would be marked uptodate. Then
fuse_readpages_end() would call SetPageUptodate() again on the already
uptodate page.
Commit 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use
folio_end_read()") changed that by replacing the SetPageUptodate() +
unlock_page() combination with folio_end_read(), which does mostly the
same, except it sets the uptodate flag with an xor operation, which in the
above scenario resulted in the uptodate flag being cleared, which in turn
resulted in EIO being returned on the read.
Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(),
conforming to the expectation of folio_end_read().
Reported-by: Jürg Billeter <j@bitron.ch>
Debugged-by: Matthew Wilcox <willy@infradead.org>
Fixes: 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()")
Cc: <stable@vger.kernel.org> # v6.10
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The memory of struct fuse_file is allocated but not freed
when get_create_ext return error.
Fixes: 3e2b6fdbdc9a ("fuse: send security context of inode on file")
Cc: stable@vger.kernel.org # v5.17
Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
resending
There is a race condition where inflight requests will not be aborted if
they are in the middle of being re-sent when the connection is aborted.
If fuse_resend has already moved all the requests in the fpq->processing
lists to its private queue ("to_queue") and then the connection starts
and finishes aborting, these requests will be added to the pending queue
and remain on it indefinitely.
Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v6.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when
parsing the FUSE daemon's response to a zero-length getxattr/listxattr
request.
On 32-bit kernels, where ssize_t and outarg.size are the same size, this is
wrong: The min_t() will pass through any size values that are negative when
interpreted as signed.
fuse_listxattr() will then return this userspace-supplied negative value,
which callers will treat as an error value.
This kind of bug pattern can lead to fairly bad security bugs because of
how error codes are used in the Linux kernel. If a caller were to convert
the numeric error into an error pointer, like so:
struct foo *func(...) {
int len = fuse_getxattr(..., NULL, 0);
if (len < 0)
return ERR_PTR(len);
...
}
then it would end up returning this userspace-supplied negative value cast
to a pointer - but the caller of this function wouldn't recognize it as an
error pointer (IS_ERR_VALUE() only detects values in the narrow range in
which legitimate errno values are), and so it would just be treated as a
kernel pointer.
I think there is at least one theoretical codepath where this could happen,
but that path would involve virtio-fs with submounts plus some weird
SELinux configuration, so I think it's probably not a concern in practice.
Cc: stable@vger.kernel.org # v4.9
Fixes: 63401ccdb2ca ("fuse: limit xattr returned size")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix RPM package build error caused by an incorrect locale setup
- Mark modules.weakdep as ghost in RPM package
- Fix the odd combination of -S and -c in stack protector scripts,
which is an error with the latest Clang
* tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Fix '-S -c' in x86 stack protector scripts
kbuild: rpm-pkg: ghost modules.weakdep file
kbuild: rpm-pkg: Fix C locale setup
|
|
This simplifies the min_t() and max_t() macros by no longer making them
work in the context of a C constant expression.
That means that you can no longer use them for static initializers or
for array sizes in type definitions, but there were only a couple of
such uses, and all of them were converted (famous last words) to use
MIN_T/MAX_T instead.
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.
The complexity of those macros stems from two issues:
(a) trying to use them in situations that require a C constant
expression (in static initializers and for array sizes)
(b) the type sanity checking
and MIN_T/MAX_T avoids both of these issues.
Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.
But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.
However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.
This does exactly that.
Which in turn will then allow for much simpler implementations of
min_t()/max_t(). All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.
We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.
Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- Many fixes for power-cut issues by Zhihao Cheng
- Another ubiblock error path fix
- ubiblock section mismatch fix
- Misc fixes all over the place
* tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Fix ubi_init() ubiblock_exit() section mismatch
ubifs: add check for crypto_shash_tfm_digest
ubifs: Fix inconsistent inode size when powercut happens during appendant writing
ubi: block: fix null-pointer-dereference in ubiblock_create()
ubifs: fix kernel-doc warnings
ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity
mtd: ubi: Restore missing cleanup on ubi_init() failure path
ubifs: dbg_orphan_check: Fix missed key type checking
ubifs: Fix unattached inode when powercut happens in creating
ubifs: Fix space leak when powercut happens in linking tmpfile
ubifs: Move ui->data initialization after initializing security
ubifs: Fix adding orphan entry twice for the same inode
ubifs: Remove insert_dead_orphan from replaying orphan process
Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"
ubifs: Don't add xattr inode into orphan area
ubifs: Fix unattached xattr inode if powercut happens after deleting
mtd: ubi: avoid expensive do_div() on 32-bit machines
mtd: ubi: make ubi_class constant
ubi: eba: properly rollback inside self_check_eba
|
|
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
Cc: stable@vger.kernel.org
Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since ubiblock_exit() is now called from an init function,
the __exit section no longer makes sense.
Cc: Ben Hutchings <bwh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407131403.wZJpd8n2-lkp@intel.com/
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Enable turbostat extensions to add both perf and PMT (Intel
Platform Monitoring Technology) counters via the cmdline
- Demonstrate PMT access with built-in support for Meteor Lake's
Die C6 counter
* tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2024.07.26
tools/power turbostat: Include umask=%x in perf counter's config
tools/power turbostat: Document PMT in turbostat.8
tools/power turbostat: Add MTL's PMT DC6 builtin counter
tools/power turbostat: Add early support for PMT counters
tools/power turbostat: Add selftests for added perf counters
tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
tools/power turbostat: Move verbose counter messages to level 2
tools/power turbostat: Move debug prints from stdout to stderr
tools/power turbostat: Fix typo in turbostat.8
tools/power turbostat: Add perf added counter example to turbostat.8
tools/power turbostat: Fix formatting in turbostat.8
tools/power turbostat: Extend --add option with perf counters
tools/power turbostat: Group SMI counter with APERF and MPERF
tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
tools/power turbostat: Remove anonymous union from rapl_counter_info_t
tools/power/turbostat: Switch to new Intel CPU model defines
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
"Core:
- A CXL maturity map has been added to the documentation to detail
the current state of CXL enabling.
It provides the status of the current state of various CXL features
to inform current and future contributors of where things are and
which areas need contribution.
- A notifier handler has been added in order for a newly created CXL
memory region to trigger the abstract distance metrics calculation.
This should bring parity for CXL memory to the same level vs
hotplugged DRAM for NUMA abstract distance calculation. The
abstract distance reflects relative performance used for memory
tiering handling.
- An addition for XOR math has been added to address the CXL DPA to
SPA translation.
CXL address translation did not support address interleave math
with XOR prior to this change.
Fixes:
- Fix to address race condition in the CXL memory hotplug notifier
- Add missing MODULE_DESCRIPTION() for CXL modules
- Fix incorrect vendor debug UUID define
Misc:
- A warning has been added to inform users of an unsupported
configuration when mixing CXL VH and RCH/RCD hierarchies
- The ENXIO error code has been replaced with EBUSY for inject poison
limit reached via debugfs and cxl-test support
- Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
unnecessary PCI config reads
- A refactor to a common struct for DRAM and general media CXL
events"
* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core/pci: Move reading of control register to immediately before usage
cxl: Remove defunct code calculating host bridge target positions
cxl/region: Verify target positions using the ordered target list
cxl: Restore XOR'd position bits during address translation
cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
cxl/core: Fix incorrect vendor debug UUID define
Documentation: CXL Maturity Map
cxl/region: Simplify cxl_region_nid()
cxl/region: Support to calculate memory tier abstract distance
cxl/region: Fix a race condition in memory hotplug notifier
cxl: add missing MODULE_DESCRIPTION() macros
cxl/events: Use a common struct for DRAM and General Media events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode update from Gabriel Krisman Bertazi:
"Two small fixes to silence the compiler and static analyzers tools
from Ben Dooks and Jeff Johnson"
* tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: add MODULE_DESCRIPTION() macros
unicode: make utf8 test count static
|
|
In the same way as for other similar files, mark as ghost the new file
generated by depmod for configured weak dependencies for modules,
modules.weakdep, so that although it is not included in the package,
claim the ownership on it.
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix for potential null pointer use in init cifs
- additional dynamic trace points to improve debugging of some common
scenarios
- two SMB1 fixes (one addressing reconnect with POSIX extensions, one a
mount parsing error)
* tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: add dynamic trace point for session setup key expired failures
smb3: add four dynamic tracepoints for copy_file_range and reflink
smb3: add dynamic tracepoint for reflink errors
cifs: mount with "unix" mount option for SMB1 incorrectly handled
cifs: fix reconnect with SMB1 UNIX Extensions
cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fix request without payloads cleanup (Leon)
- Use new protection information format (Francis)
- Improved debug message for lost pci link (Bart)
- Another apst quirk (Wang)
- Use appropriate sysfs api for printing chars (Markus)
- ublk async device deletion fix (Ming)
- drbd kerneldoc fixups (Simon)
- Fix deadlock between sd removal and release (Yang)
* tag 'block-6.11-20240726' of git://git.kernel.dk/linux:
nvme-pci: add missing condition check for existence of mapped data
ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling
block: fix deadlock between sd_remove & sd_release
drbd: Add peer_device to Kernel doc
nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
nvme-pci: Fix the instructions for disabling power management
nvme: remove redundant bdev local variable
nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
nvme/pci: Add APST quirk for Lenovo N60z laptop
|