Age | Commit message (Collapse) | Author |
|
Let's convert all existing callers properly.
No functional changes intended.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It was reported [1] that on linux-next/fs-next the following crash
is reproducible:
[ 42.659136] Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 42.660501] fbcon: Taking over console
[ 42.660930] KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f]
[ 42.661752] CPU: 1 UID: 0 PID: 1589 Comm: dtprobed Not tainted 6.11.0-rc6+ #1
[ 42.662565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023
[ 42.663472] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.664046] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.666945] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.668837] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.670122] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.672154] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.672160] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.672179] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.672186] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.672191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.[ CR02: ;32m00007f3237366208 CR3: 0 OK 79e001 CR4: 0000000000770ef0
[ 42.672214] PKRU: 55555554
[ 42.672218] Call Trace:
[ 42.672223] <TASK>
[ 42.672226] ? die_addr+0x41/0xa0
[ 42.672238] ? exc_general_protection+0x14c/0x230
[ 42.672250] ? asm_exc_general_protection+0x26/0x30
[ 42.672260] ? fuse_get_req+0x77a/0x990 [fuse]
[ 42.672281] ? fuse_get_req+0x36b/0x990 [fuse]
[ 42.672300] ? kasan_unpoison+0x27/0x60
[ 42.672310] ? __pfx_fuse_get_req+0x10/0x10 [fuse]
[ 42.672327] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672333] ? alloc_pages_mpol_noprof+0x195/0x440
[ 42.672340] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672345] ? kasan_unpoison+0x27/0x60
[ 42.672350] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672355] ? __kasan_slab_alloc+0x4d/0x90
[ 42.672362] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672367] ? __kmalloc_cache_noprof+0x134/0x350
[ 42.672376] fuse_simple_background+0xe7/0x180 [fuse]
[ 42.672406] cuse_channel_open+0x540/0x710 [cuse]
[ 42.672415] misc_open+0x2a7/0x3a0
[ 42.672424] chrdev_open+0x1ef/0x5f0
[ 42.672432] ? __pfx_chrdev_open+0x10/0x10
[ 42.672439] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672443] ? security_file_open+0x3bb/0x720
[ 42.672451] do_dentry_open+0x43d/0x1200
[ 42.672459] ? __pfx_chrdev_open+0x10/0x10
[ 42.672468] vfs_open+0x79/0x340
[ 42.672475] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672482] do_open+0x68c/0x11e0
[ 42.672489] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672495] ? __pfx_do_open+0x10/0x10
[ 42.672501] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672506] ? open_last_lookups+0x2a2/0x1370
[ 42.672515] path_openat+0x24f/0x640
[ 42.672522] ? __pfx_path_openat+0x10/0x10
[ 42.723972] ? stack_depot_save_flags+0x45/0x4b0
[ 42.724787] ? __fput+0x43c/0xa70
[ 42.725100] do_filp_open+0x1b3/0x3e0
[ 42.725710] ? poison_slab_object+0x10d/0x190
[ 42.726145] ? __kasan_slab_free+0x33/0x50
[ 42.726570] ? __pfx_do_filp_open+0x10/0x10
[ 42.726981] ? do_syscall_64+0x64/0x170
[ 42.727418] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.728018] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.728505] ? do_raw_spin_lock+0x131/0x270
[ 42.728922] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 42.729494] ? do_raw_spin_unlock+0x14c/0x1f0
[ 42.729992] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.730889] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.732178] ? alloc_fd+0x176/0x5e0
[ 42.732585] do_sys_openat2+0x122/0x160
[ 42.732929] ? __pfx_do_sys_openat2+0x10/0x10
[ 42.733448] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.734013] ? __pfx_map_id_up+0x10/0x10
[ 42.734482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.735529] ? __memcg_slab_free_hook+0x292/0x500
[ 42.736131] __x64_sys_openat+0x123/0x1e0
[ 42.736526] ? __pfx___x64_sys_openat+0x10/0x10
[ 42.737369] ? __x64_sys_close+0x7c/0xd0
[ 42.737717] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.738192] ? syscall_trace_enter+0x11e/0x1b0
[ 42.738739] do_syscall_64+0x64/0x170
[ 42.739113] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.739638] RIP: 0033:0x7f28cd13e87b
[ 42.740038] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 42.741943] RSP: 002b:00007ffc992546c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 42.742951] RAX: ffffffffffffffda RBX: 00007f28cd44f1ee RCX: 00007f28cd13e87b
[ 42.743660] RDX: 0000000000000002 RSI: 00007f28cd44f2fa RDI: 00000000ffffff9c
[ 42.744518] RBP: 00007f28cd44f2fa R08: 0000000000000000 R09: 0000000000000001
[ 42.745211] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 42.745920] R13: 00007f28cd44f2fa R14: 0000000000000000 R15: 0000000000000003
[ 42.746708] </TASK>
[ 42.746937] Modules linked in: cuse vfat fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common kvm_amd ccp bochs drm_vram_helper kvm drm_ttm_helper ttm pcspkr i2c_piix4 drm_kms_helper i2c_smbus pvpanic_mmio pvpanic joydev sch_fq_codel drm fuse xfs nvme_tcp nvme_fabrics nvme_core sd_mod sg virtio_net net_failover virtio_scsi failover crct10dif_pclmul crc32_pclmul ata_generic pata_acpi ata_piix ghash_clmulni_intel virtio_pci sha512_ssse3 virtio_pci_legacy_dev sha256_ssse3 virtio_pci_modern_dev sha1_ssse3 libata serio_raw dm_multipath btrfs blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qemu_fw_cfg aesni_intel crypto_simd cryptd
[ 42.754333] ---[ end trace 0000000000000000 ]---
[ 42.756899] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.757851] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.760334] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.760940] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.761697] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.763009] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.763920] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.764839] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.765716] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.766890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.767828] CR2: 00007f3237366208 CR3: 000000012c79e001 CR4: 0000000000770ef0
[ 42.768730] PKRU: 55555554
[ 42.769022] Kernel panic - not syncing: Fatal exception
[ 42.770758] Kernel Offset: 0x7200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 42.771947] ---[ end Kernel panic - not syncing: Fatal exception ]---
It's obviously CUSE related callstack. For CUSE case, we don't have superblock and
our checks for SB_I_NOIDMAP flag does not make any sense. Let's handle this case gracefully.
Fixes: aa16880d9f13 ("fuse: add basic infrastructure to support idmappings")
Link: https://lore.kernel.org/linux-next/87v7z586py.fsf@debian-BULLSEYE-live-builder-AMD64/ [1]
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Reported-by: syzbot+20c7e20cc8f5296dca12@syzkaller.appspotmail.com
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Merge user access fast validation using address masking.
This allows architectures to optionally use a data dependent address
masking model instead of a conditional branch for validating user
accesses. That avoids the Spectre-v1 speculation barriers.
Right now only x86-64 takes advantage of this, and not all architectures
will be able to do it. It requires a guard region between the user and
kernel address spaces (so that you can't overflow from one to the
other), and an easy way to generate a guaranteed-to-fault address for
invalid user pointers.
Also note that this currently assumes that there is no difference
between user read and write accesses. If extended to architectures like
powerpc, we'll also need to separate out the user read-vs-write cases.
* address-masking:
x86: make the masked_user_access_begin() macro use its argument only once
x86: do the user address masking outside the user access area
x86: support user address masking instead of non-speculative conditional
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
|
|
syzbot reported a null-ptr-deref in bch2_fs_start. [0]
When a sb is marked clear but doesn't have a clean section
bch2_read_superblock_clean returns NULL which PTR_ERR_OR_ZERO
lets through, eventually leading to a null ptr dereference down
the line. Adjust read sb clean to return an ERR_PTR indicating the
invalid clean section.
[0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The header files bbpos.h is included twice in backpointers.c,
so one inclusion of each can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This factors out ec_strie_head_devs_update(), which initializes the
bitmap of devices we're allocating from, and runs it every time
c->rw_devs_change_count changes.
We also cancel pending, not allocated stripes, since they may refer to
devices that are no longer available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a counter that's incremented whenever rw devices change; this will
be used for erasure coding so that it can keep ec_stripe_head in sync
and not deadlock on a new stripe when a device it wants goes away.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can now correctly force-remove a device that has stripes on it; this
uses the new BCH_SB_MEMBER_INVALID sentinal value.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is necessary for erasure coded pointers to devices that have been
removed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
also print out the new stripe key
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
additional debug stat
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When reshaping existing stripes, we should keep them on the same target
that they were allocated on; to do this, we need to add a field to the
btree stripe type.
This is a tad awkward, because we only have 8 bits left, and targets are
16 bits - but we only need to store a label, not a full target.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
factor out a common helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want to be using private errcodes whenever possible, for better error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In backpointers fsck, we do a seqential scan of one btree, and check
references to another: extents <-> backpointers
Checking references generates random lookups, so we want to pin that
btree in memory (or only a range, if it doesn't fit in ram).
Previously, this was done with a simple check in the shrinker - "if
btree node is in range being pinned, don't free it" - but this generated
OOMs, as our shrinker wasn't well behaved if there was less memory
available than expected.
Instead, we now have two different shrinkers and lru lists; the second
shrinker being for pinned nodes, with seeks set much higher than normal
- so they can still be freed if necessary, but we'll prefer not to.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this is prep for introducing a second live list and shrinker for pinned
nodes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
32 bits won't overflow any time soon, but size_t is the correct type for
counting objects in memory.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_rcu() now properly errors if the device is invalid
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out bch2_show_options() into a generic helper, for debugging
option passing issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix the following compilation error:
```
fs/bcachefs/sb-members.c: In function ‘bch2_sb_member_alloc’:
fs/bcachefs/sb-members.c:508:2: error: a label can only be part of a statement and a declaration is not a statement
508 | unsigned nr_devices = max_t(unsigned, dev_idx + 1, c->sb.nr_devices);
```
Fixes: a7d364a133c7 ("bcachefs: bch2_sb_member_alloc()")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
No reason for it not to be where it's needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds mount options for specifying recovery passes to run, or
exclude; the immediate need for this is that backpointers fsck is having
trouble completing, so we need a way to skip it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When freeing in a shrinker callback, we need to notify memory reclaim,
so it knows forward progress has been made.
Normally this is done in e.g. slab code, but we're not freeing through
slab - or rather we are, but these allocations are big, and use the
kmalloc_large() path.
This is really a bug in the slub code, but we're working around it here
for now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is needed for overlayfs, which is used by container managers.
Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this was an oversight: rebalance is moving data to a specific device, so
we don't want it falling back to the full filesystem
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't
allocate from the full filesystem - but we don't want spurious
allocation failures due to open buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
like the previous patch, kill use of bare arrays; the encryption code
likes to work in big batches, so this is a small performance
improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
convert to darray, and add a fallback when allocation fails
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Many singleton patches - please see the various changelogs for
details.
Quite a lot of nilfs2 work this time around.
Notable patch series in this pull request are:
- "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with
assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64()
to provide (much) more accurate results. The current implementation
was causing Uwe some issues in the PWM drivers.
- "xz: Updates to license, filters, and compression options" from
Lasse Collin. Miscellaneous maintenance and kinor feature work to
the xz decompressor.
- "Fix some GDB command error and add some GDB commands" from
Kuan-Ying Lee. Fixes and enhancements to the gdb scripts.
- "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff
Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of
warnings about this.
- "nilfs2: add support for some common ioctls" from Ryusuke Konishi.
Adds various commonly-available ioctls to nilfs2.
- "This series fixes a number of formatting issues in kernel doc
comments" from Ryusuke Konishi does that.
- "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke
Konishi. Fix issues where -ENOENT was being unintentionally and
inappropriately returned to userspace.
- "nilfs2: assorted cleanups" from Huang Xiaojia.
- "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke
Konishi fixes some issues which can occur on corrupted nilfs2
filesystems.
- "scripts/decode_stacktrace.sh: improve error reporting and
usability" from Luca Ceresoli does those things"
* tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits)
list: test: increase coverage of list_test_list_replace*()
list: test: fix tests for list_cut_position()
proc: use __auto_type more
treewide: correct the typo 'retun'
ocfs2: cleanup return value and mlog in ocfs2_global_read_info()
nilfs2: remove duplicate 'unlikely()' usage
nilfs2: fix potential oob read in nilfs_btree_check_delete()
nilfs2: determine empty node blocks as corrupted
nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation
tools/mm: rm thp_swap_allocator_test when make clean
squashfs: fix percpu address space issues in decompressor_multi_percpu.c
lib: glob.c: added null check for character class
nilfs2: refactor nilfs_segctor_thread()
nilfs2: use kthread_create and kthread_stop for the log writer thread
nilfs2: remove sc_timer_task
nilfs2: do not repair reserved inode bitmap in nilfs_new_inode()
nilfs2: eliminate the shared counter and spinlock for i_generation
nilfs2: separate inode type information from i_state field
nilfs2: use the BITS_PER_LONG macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
"Constify struct kobj_type"
* tag 'for-linux-6.12-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Constify struct kobj_type
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Lots of cleanups and bug fixes this cycle, primarily in the block
allocation, extent management, fast commit, and journalling"
* tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (93 commits)
ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C
ext4: check stripe size compatibility on remount as well
ext4: fix i_data_sem unlock order in ext4_ind_migrate()
ext4: remove the special buffer dirty handling in do_journal_get_write_access
ext4: fix a potential assertion failure due to improperly dirtied buffer
ext4: hoist ext4_block_write_begin and replace the __block_write_begin
ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers
ext4: dax: keep orphan list before truncate overflow allocated blocks
ext4: fix error message when rejecting the default hash
ext4: save unnecessary indentation in ext4_ext_create_new_leaf()
ext4: make some fast commit functions reuse extents path
ext4: refactor ext4_swap_extents() to reuse extents path
ext4: get rid of ppath in convert_initialized_extent()
ext4: get rid of ppath in ext4_ext_handle_unwritten_extents()
ext4: get rid of ppath in ext4_ext_convert_to_initialized()
ext4: get rid of ppath in ext4_convert_unwritten_extents_endio()
ext4: get rid of ppath in ext4_split_convert_extents()
ext4: get rid of ppath in ext4_split_extent()
ext4: get rid of ppath in ext4_force_split_extent_at()
ext4: get rid of ppath in ext4_split_extent_at()
...
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs blocksize updates from Christian Brauner:
"This contains the vfs infrastructure as well as the xfs bits to enable
support for block sizes (bs) larger than page sizes (ps) plus a few
fixes to related infrastructure.
There has been efforts over the last 16 years to enable enable Large
Block Sizes (LBS), that is block sizes in filesystems where bs > page
size. Through these efforts we have learned that one of the main
blockers to supporting bs > ps in filesystems has been a way to
allocate pages that are at least the filesystem block size on the page
cache where bs > ps.
Thanks to various previous efforts it is possible to support bs > ps
in XFS with only a few changes in XFS itself. Most changes are to the
page cache to support minimum order folio support for the target block
size on the filesystem.
A motivation for Large Block Sizes today is to support high-capacity
(large amount of Terabytes) QLC SSDs where the internal Indirection
Unit (IU) are typically greater than 4k to help reduce DRAM and so in
turn cost and space. In practice this then allows different
architectures to use a base page size of 4k while still enabling
support for block sizes aligned to the larger IUs by relying on high
order folios on the page cache when needed.
It also allows to take advantage of the drive's support for atomics
larger than 4k with buffered IO support in Linux. As described this
year at LSFMM, supporting large atomics greater than 4k enables
databases to remove the need to rely on their own journaling, so they
can disable double buffered writes, which is a feature different cloud
providers are already enabling through custom storage solutions"
* tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
Documentation: iomap: fix a typo
iomap: remove the iomap_file_buffered_write_punch_delalloc return value
iomap: pass the iomap to the punch callback
iomap: pass flags to iomap_file_buffered_write_punch_delalloc
iomap: improve shared block detection in iomap_unshare_iter
iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
docs:filesystems: fix spelling and grammar mistakes in iomap design page
filemap: fix htmldoc warning for mapping_align_index()
iomap: make zero range flush conditional on unwritten mappings
iomap: fix handling of dirty folios over unwritten extents
iomap: add a private argument for iomap_file_buffered_write
iomap: remove set_memor_ro() on zero page
xfs: enable block size larger than page size support
xfs: make the calculation generic in xfs_sb_validate_fsb_count()
xfs: expose block size in stat
xfs: use kvmalloc for xattr buffers
iomap: fix iomap_dio_zero() for fs bs > system page size
filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
mm: split a folio in minimum folio order chunks
readahead: allocate folios with mapping_min_order in readahead
...
|
|
The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds. A
new filehandle would be recorded in the "new" bit set. That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.
Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set. This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.
This patch updates bd->new before clearing the set with that index,
instead of afterwards.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
At this point in compound processing, currentfh refers to the parent of
the file, not the file itself. Get the correct dentry from the delegation
stateid instead.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.
With this patch we:
- properly handle non-nfsd leases. We must not assume flc_owner is a
delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid()
(and other exits which don't need to call that)
[ jlayton: refactored on top of Neil's other patch: nfsd: fix
nfsd4_deleg_getattr_conflict in presence of third party lease ]
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|