linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-03-01	crypto: rk3288 - Fix use after free in unprepare	Herbert Xu
	The unprepare call must be carried out before the finalize call as the latter can free the request. Fixes: c66c17a0f69b ("crypto: rk3288 - Remove prepare/unprepare request") Reported-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-28	crypto: sun8i-ce - Fix use after free in unprepare	Andrey Skvortsov
	sun8i_ce_cipher_unprepare should be called before crypto_finalize_skcipher_request, because client callbacks may immediately free memory, that isn't needed anymore. But it will be used by unprepare after free. Before removing prepare/unprepare callbacks it was handled by crypto engine in crypto_finalize_request. Usually that results in a pointer dereference problem during a in crypto selftest. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000004716d000 [0000000000000030] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP This problem is detected by KASAN as well. ================================================================== BUG: KASAN: slab-use-after-free in sun8i_ce_cipher_do_one+0x6e8/0xf80 [sun8i_ce] Read of size 8 at addr ffff00000dcdc040 by task 1c15000.crypto-/373 Hardware name: Pine64 PinePhone (1.2) (DT) Call trace: dump_backtrace+0x9c/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x48/0x60 print_report+0xf8/0x5d8 kasan_report+0x90/0xd0 __asan_load8+0x9c/0xc0 sun8i_ce_cipher_do_one+0x6e8/0xf80 [sun8i_ce] crypto_pump_work+0x354/0x620 [crypto_engine] kthread_worker_fn+0x244/0x498 kthread+0x168/0x178 ret_from_fork+0x10/0x20 Allocated by task 379: kasan_save_stack+0x3c/0x68 kasan_set_track+0x2c/0x40 kasan_save_alloc_info+0x24/0x38 __kasan_kmalloc+0xd4/0xd8 __kmalloc+0x74/0x1d0 alg_test_skcipher+0x90/0x1f0 alg_test+0x24c/0x830 cryptomgr_test+0x38/0x60 kthread+0x168/0x178 ret_from_fork+0x10/0x20 Freed by task 379: kasan_save_stack+0x3c/0x68 kasan_set_track+0x2c/0x40 kasan_save_free_info+0x38/0x60 __kasan_slab_free+0x100/0x170 slab_free_freelist_hook+0xd4/0x1e8 __kmem_cache_free+0x15c/0x290 kfree+0x74/0x100 kfree_sensitive+0x80/0xb0 alg_test_skcipher+0x12c/0x1f0 alg_test+0x24c/0x830 cryptomgr_test+0x38/0x60 kthread+0x168/0x178 ret_from_fork+0x10/0x20 The buggy address belongs to the object at ffff00000dcdc000 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 64 bytes inside of freed 256-byte region [ffff00000dcdc000, ffff00000dcdc100) Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Fixes: 4136212ab18e ("crypto: sun8i-ce - Remove prepare/unprepare request") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24	crypto: arm64/neonbs - fix out-of-bounds access on short input	Ard Biesheuvel
	The bit-sliced implementation of AES-CTR operates on blocks of 128 bytes, and will fall back to the plain NEON version for tail blocks or inputs that are shorter than 128 bytes to begin with. It will call straight into the plain NEON asm helper, which performs all memory accesses in granules of 16 bytes (the size of a NEON register). For this reason, the associated plain NEON glue code will copy inputs shorter than 16 bytes into a temporary buffer, given that this is a rare occurrence and it is not worth the effort to work around this in the asm code. The fallback from the bit-sliced NEON version fails to take this into account, potentially resulting in out-of-bounds accesses. So clone the same workaround, and use a temp buffer for short in/outputs. Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk") Cc: <stable@vger.kernel.org> Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24	crypto: lskcipher - Copy IV in lskcipher glue code always	Herbert Xu
	The lskcipher glue code for skcipher needs to copy the IV every time rather than only on the first and last request. Otherwise those algorithms that use IV to perform chaining may break, e.g., CBC. This is because crypto_skcipher_import/export do not include the IV as part of the saved state. Reported-by: syzbot+b90b904ef6bdfdafec1d@syzkaller.appspotmail.com Fixes: 662ea18d089b ("crypto: skcipher - Make use of internal state") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09	crypto: virtio/akcipher - Fix stack overflow on memcpy	zhenwei pi
	sizeof(struct virtio_crypto_akcipher_session_para) is less than sizeof(struct virtio_crypto_op_ctrl_req::u), copying more bytes from stack variable leads stack overflow. Clang reports this issue by commands: make -j CC=clang-14 mrproper >/dev/null 2>&1 make -j O=/tmp/crypto-build CC=clang-14 allmodconfig >/dev/null 2>&1 make -j O=/tmp/crypto-build W=1 CC=clang-14 drivers/crypto/virtio/ virtio_crypto_akcipher_algs.o Fixes: 59ca6c93387d ("virtio-crypto: implement RSA algorithm") Link: https://lore.kernel.org/all/0a194a79-e3a3-45e7-be98-83abd3e1cb7e@roeck-us.net/ Cc: <stable@vger.kernel.org> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-02	crypto: algif_hash - Remove bogus SGL free on zero-length error path	Herbert Xu
	When a zero-length message is hashed by algif_hash, and an error is triggered, it tries to free an SG list that was never allocated in the first place. Fix this by not freeing the SG list on the zero-length error path. Reported-by: Shigeru Yoshida <syoshida@redhat.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Fixes: b6d972f68983 ("crypto: af_alg/hash: Fix recvmsg() after sendmsg(MSG_MORE)") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot+3266db0c26d1fbbe3abb@syzkaller.appspotmail.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-02	crypto: cbc - Ensure statesize is zero	Herbert Xu
	The cbc template should not be applied on stream ciphers, especially ones that have internal state. Enforce this by checking the state size when the instance is created. Reported-by: syzbot+050eeedd6c285d8c42f2@syzkaller.appspotmail.com Fixes: 47309ea13591 ("crypto: arc4 - Add internal state") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-02	crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked	Kim Phillips
	The SEV platform device can be shutdown with a null psp_master, e.g., using DEBUG_TEST_DRIVER_REMOVE. Found using KASAN: [ 137.148210] ccp 0000:23:00.1: enabling device (0000 -> 0002) [ 137.162647] ccp 0000:23:00.1: no command queues available [ 137.170598] ccp 0000:23:00.1: sev enabled [ 137.174645] ccp 0000:23:00.1: psp enabled [ 137.178890] general protection fault, probably for non-canonical address 0xdffffc000000001e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI [ 137.182693] KASAN: null-ptr-deref in range [0x00000000000000f0-0x00000000000000f7] [ 137.182693] CPU: 93 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1+ #311 [ 137.182693] RIP: 0010:__sev_platform_shutdown_locked+0x51/0x180 [ 137.182693] Code: 08 80 3c 08 00 0f 85 0e 01 00 00 48 8b 1d 67 b6 01 08 48 b8 00 00 00 00 00 fc ff df 48 8d bb f0 00 00 00 48 89 f9 48 c1 e9 03 <80> 3c 01 00 0f 85 fe 00 00 00 48 8b 9b f0 00 00 00 48 85 db 74 2c [ 137.182693] RSP: 0018:ffffc900000cf9b0 EFLAGS: 00010216 [ 137.182693] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000001e [ 137.182693] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00000000000000f0 [ 137.182693] RBP: ffffc900000cf9c8 R08: 0000000000000000 R09: fffffbfff58f5a66 [ 137.182693] R10: ffffc900000cf9c8 R11: ffffffffac7ad32f R12: ffff8881e5052c28 [ 137.182693] R13: ffff8881e5052c28 R14: ffff8881758e43e8 R15: ffffffffac64abf8 [ 137.182693] FS: 0000000000000000(0000) GS:ffff889de7000000(0000) knlGS:0000000000000000 [ 137.182693] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.182693] CR2: 0000000000000000 CR3: 0000001cf7c7e000 CR4: 0000000000350ef0 [ 137.182693] Call Trace: [ 137.182693] <TASK> [ 137.182693] ? show_regs+0x6c/0x80 [ 137.182693] ? __die_body+0x24/0x70 [ 137.182693] ? die_addr+0x4b/0x80 [ 137.182693] ? exc_general_protection+0x126/0x230 [ 137.182693] ? asm_exc_general_protection+0x2b/0x30 [ 137.182693] ? __sev_platform_shutdown_locked+0x51/0x180 [ 137.182693] sev_firmware_shutdown.isra.0+0x1e/0x80 [ 137.182693] sev_dev_destroy+0x49/0x100 [ 137.182693] psp_dev_destroy+0x47/0xb0 [ 137.182693] sp_destroy+0xbb/0x240 [ 137.182693] sp_pci_remove+0x45/0x60 [ 137.182693] pci_device_remove+0xaa/0x1d0 [ 137.182693] device_remove+0xc7/0x170 [ 137.182693] really_probe+0x374/0xbe0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] __driver_probe_device+0x199/0x460 [ 137.182693] driver_probe_device+0x4e/0xd0 [ 137.182693] __driver_attach+0x191/0x3d0 [ 137.182693] ? __pfx___driver_attach+0x10/0x10 [ 137.182693] bus_for_each_dev+0x100/0x190 [ 137.182693] ? __pfx_bus_for_each_dev+0x10/0x10 [ 137.182693] ? __kasan_check_read+0x15/0x20 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? _raw_spin_unlock+0x27/0x50 [ 137.182693] driver_attach+0x41/0x60 [ 137.182693] bus_add_driver+0x2a8/0x580 [ 137.182693] driver_register+0x141/0x480 [ 137.182693] __pci_register_driver+0x1d6/0x2a0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? esrt_sysfs_init+0x1cd/0x5d0 [ 137.182693] ? __pfx_sp_mod_init+0x10/0x10 [ 137.182693] sp_pci_init+0x22/0x30 [ 137.182693] sp_mod_init+0x14/0x30 [ 137.182693] ? __pfx_sp_mod_init+0x10/0x10 [ 137.182693] do_one_initcall+0xd1/0x470 [ 137.182693] ? __pfx_do_one_initcall+0x10/0x10 [ 137.182693] ? parameq+0x80/0xf0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? __kmalloc+0x3b0/0x4e0 [ 137.182693] ? kernel_init_freeable+0x92d/0x1050 [ 137.182693] ? kasan_populate_vmalloc_pte+0x171/0x190 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] kernel_init_freeable+0xa64/0x1050 [ 137.182693] ? __pfx_kernel_init+0x10/0x10 [ 137.182693] kernel_init+0x24/0x160 [ 137.182693] ? __switch_to_asm+0x3e/0x70 [ 137.182693] ret_from_fork+0x40/0x80 [ 137.182693] ? __pfx_kernel_init+0x10/0x10 [ 137.182693] ret_from_fork_asm+0x1b/0x30 [ 137.182693] </TASK> [ 137.182693] Modules linked in: [ 137.538483] ---[ end trace 0000000000000000 ]--- Fixes: 1b05ece0c931 ("crypto: ccp - During shutdown, check SEV data pointer before using") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Acked-by: John Allen <john.allen@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26	crypto: caam - fix asynchronous hash	Gaurav Jain
	ahash_alg->setkey is updated to ahash_nosetkey in ahash.c so checking setkey() function to determine hmac algorithm is not valid. to fix this added is_hmac variable in structure caam_hash_alg to determine whether the algorithm is hmac or not. Fixes: 2f1f34c1bf7b ("crypto: ahash - optimize performance when wrapping shash") Signed-off-by: Gaurav Jain <gaurav.jain@nxp.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26	crypto: qat - fix arbiter mapping generation algorithm for QAT 402xx	Damian Muszynski
	The commit "crypto: qat - generate dynamically arbiter mappings" introduced a regression on qat_402xx devices. This is reported when the driver probes the device, as indicated by the following error messages: 4xxx 0000:0b:00.0: enabling device (0140 -> 0142) 4xxx 0000:0b:00.0: Generate of the thread to arbiter map failed 4xxx 0000:0b:00.0: Direct firmware load for qat_402xx_mmp.bin failed with error -2 The root cause of this issue was the omission of a necessary function pointer required by the mapping algorithm during the implementation. Fix it by adding the missing function pointer. Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings") Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-21	Linux 6.8-rc1v6.8-rc1	Linus Torvalds

2024-01-21	Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds
	Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
2024-01-21	Merge tag 'timers-core-2024-01-21' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for time and clocksources: - A fix for the idle and iowait time accounting vs CPU hotplug. The time is reset on CPU hotplug which makes the accumulated systemwide time jump backwards. - Assorted fixes and improvements for clocksource/event drivers" * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug clocksource/drivers/ep93xx: Fix error handling during probe clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings clocksource/timer-riscv: Add riscv_clock_shutdown callback dt-bindings: timer: Add StarFive JH8100 clint dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21	Merge tag 'powerpc-6.8-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Aneesh Kumar: - Increase default stack size to 32KB for Book3S Thanks to Michael Ellerman. * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Increase default stack size to 32KB
2024-01-21	bcachefs: Improve inode_to_text()	Kent Overstreet
	Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: logged_ops_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: reflink_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs; extents_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: ec_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: subvolume_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: snapshot_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: alloc_background_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: xattr_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: dirent_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: inode_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs; quota_format.h	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: sb-counters_format.h	Kent Overstreet
	bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: counters.c -> sb-counters.c	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: comment bch_subvolume	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: bch_snapshot::btime	Kent Overstreet
	Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: add missing __GFP_NOWARN	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: opts->compression can now also be applied in the background	Kent Overstreet
	The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Prep work for variable size btree node buffers	Kent Overstreet
	bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: grab s_umount only if snapshotting	Su Yue
	When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] * DEADLOCK * [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit	Su Yue
	bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will triger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: bios must be 512 byte algined	Kent Overstreet
	Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: remove redundant variable tmp	Colin Ian King
	The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Improve trace_trans_restart_relock	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Fix excess transaction restarts in __bchfs_fallocate()	Kent Overstreet
	drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: extents_to_bp_state	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: bkey_and_val_eq()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Better journal tracepoints	Kent Overstreet
	Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Print size of superblock with space allocated	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Avoid flushing the journal in the discard path	Kent Overstreet
	When issuing discards, we may need to flush the journal if there's too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need to flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Improve move_extent tracepoint	Kent Overstreet
	Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Add missing bch2_moving_ctxt_flush_all()	Kent Overstreet
	This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Re-add move_extent_write tracepoint	Kent Overstreet
	It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount	Kent Overstreet
	Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code that uses it to be woken up for other reasons, and fixes a bug where rebalance wouldn't wake up when a scan was requested. This raises the possibility of spurious wakeups, but callers should always be able to handle that reasonably well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Add .val_to_text() for KEY_TYPE_cookie	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21	bcachefs: Don't pass memcmp() as a pointer	Kent Overstreet
	Some (buggy!) compilers have issues with this. Fixes: https://github.com/koverstreet/bcachefs/issues/625 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>