summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-13mtd: spi-nor: core: correct type of iMuhammad Usama Anjum
The i should be signed to find out the end of the loop. Otherwise, i >= 0 is always true and loop becomes infinite. Make its type to be int. Fixes: 6a9eda34418f ("mtd: spi-nor: core: set mtd->eraseregions for non-uniform erase map") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240304090103.818092-1-usama.anjum@collabora.com
2024-03-13Merge tag 'spi-nor/for-6.9' into mtd/nextMiquel Raynal
SPI NOR gets the non uniform erase code cleaned. We stopped using bitmasks for erase types and flags, and instead introduced dedicated members. We then passed the SPI NOR erase map to MTD. Users can now determine the erase regions and make informed decisions on partitions size.
2024-02-26mtd: spi-nor: core: set mtd->eraseregions for non-uniform erase mapTakahiro Kuwano
Some of Infineon SPI NOR flash devices support hybrid sector layout that overlays 4KB sectors on a 256KB sector and SPI NOR framework recognizes that by parsing SMPT and construct params->erase_map. The hybrid sector layout is similar to CFI flash devices that have small sectors on top and/or bottom address. In case of CFI flash devices, the erase map information is parsed through CFI table and populated into mtd->eraseregions so that users can create MTD partitions that aligned with small sector boundaries. This patch provides the same capability to SPI NOR flash devices that have non-uniform erase map. Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/35d0962986e493b06c13bdf7ada8130a9966dc02.1708404584.git.Takahiro.Kuwano@infineon.com Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2024-02-26mtd: spi-nor: core: get rid of SNOR_OVERLAID_REGION flagTakahiro Kuwano
Only SNOR_OVERLAID_REGION is defined for flags in the spi_nor_erase_region structure. It can be replaced by a boolean parameter. Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/076416d5dc8328dec72d31db12b9bec96bf0ac66.1708404584.git.Takahiro.Kuwano@infineon.com Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2024-02-26mtd: spi-nor: core: get rid of SNOR_LAST_REGION flagTakahiro Kuwano
Introduce n_regions in spi_nor_erase_map structure and remove SNOR_LAST_REGION flag. Loop logics that depend on the flag are also reworked to use n_regions as loop condition. Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> Suggested-by: Tudor Ambarus <tudor.ambarus@linaro.org> Suggested-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/eded84294bd81e966d6f423e578fc2cfb9a4a5b6.1708404584.git.Takahiro.Kuwano@infineon.com [ta: update spi_nor_init_erase_cmd_list() and break the for loop sooner.] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2024-02-26mtd: spi-nor: core: rework struct spi_nor_erase_regionTakahiro Kuwano
Encoding bitmask flags into offset worsen the code readability. The erase type mask and flags should be stored in dedicated members. Also, erase_map.uniform_erase_type can be removed as it is redundant. Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> Suggested-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/8e5e9e4081ed9f16ea9dce30693304a4b54d19b1.1708404584.git.Takahiro.Kuwano@infineon.com [ta: remove spi_nor_region_end()] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2024-02-26mtd: Remove support for Carillo Ranch driverMatthew Wilcox (Oracle)
As far as anybody can tell, this product never shipped. If it did, it shipped in 2007 and nobody has access to one any more. Remove the mtd NOR driver. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231208224703.1603264-2-willy@infradead.org
2024-02-26mtd: fix minor comment typo for struct mtd_masterMarcel Hamer
Minor typo in the suspend description. Signed-off-by: Marcel Hamer <marcel.hamer@windriver.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240216155022.79371-1-marcel.hamer@windriver.com
2024-02-26mtd: chips: remove redundant assignment to variable timeoColin Ian King
In two functions the variable timeo is being initialized with a value that is never read, it is being re-assigned later on. The initializations are redundant and can be removed. Cleans up clang scan build warning: warning: Value stored to 'timeo' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240215140106.2062858-1-colin.i.king@gmail.com
2024-02-26mtd: lpc32xx: use typedef for dma_filter_fnKrzysztof Kozlowski
Use existing typedef for dma_filter_fn to avoid duplicating type definition. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240208202113.630190-1-krzysztof.kozlowski@linaro.org
2024-02-26dt-bindings: mtd: update references from partition.txt to mtd.yamlNayab Sayed
Commit f902baa917b6 ("dt-bindings: mtd: Remove useless file about partitions") removed the file partition.txt. Hence, in this commit, the lines mentioning this file are updated to reference mtd.yaml, which now includes partition{,s}.yaml. Signed-off-by: Nayab Sayed <nayabbasha.sayed@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240208-partition-txt-v1-1-4398af3b7bb2@microchip.com
2024-02-26mtd: maps: physmap-core: fix flash size larger than 32-bitBaruch Siach
mtd-ram can potentially be larger than 4GB. get_bitmask_order() uses fls() that is not guaranteed to work with values larger than 32-bit. Specifically on aarch64 fls() returns 0 when all 32 LSB bits are clear. Use fls64() instead. Fixes: ba32ce95cbd987 ("mtd: maps: Merge gpio-addr-flash.c into physmap-core.c") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/9fbf3664ce00f8b07867f1011834015f21d162a5.1707388458.git.baruch@tkos.co.il
2024-02-26dt-bindings: mtd: spi-nor: add optional interrupts propertyJosua Mayer
Some spi flash memories have an interrupt signal which can be used for signalling on-chip events such as busy status or ecc errors to the host. Add binding for "interrupts" property so that boards wiring this signal may describe the connection. Signed-off-by: Josua Mayer <josua@solid-run.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Acked-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/20240219-mtd-flash-interrupt-binding-v7-1-206e30a656fa@solid-run.com Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2024-02-05mtd: flashchip: explicitly include <linux/wait.h>Tudor Ambarus
While reviewing the hyperbus sfdp proposal the following problem was noticed: In file included from ./include/linux/mtd/gen_probe.h:10, from drivers/mtd/hyperbus/hyperbus-sfdp.c:6: ./include/linux/mtd/flashchip.h:77:9: error: unknown type name ‘wait_queue_head_t’ 77 | wait_queue_head_t wq; /* Wait on here when we're waiting for the chip | ^~~~~~~~~~~~~~~~~ It is good practice to directly include all headers used, it avoids implicit dependencies and spurious breakage if someone rearranges headers and causes the implicit include to vanish. Explicitly include <linux/wait.h> in include/linux/mtd/flashchip. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20240205100955.149755-1-tudor.ambarus@linaro.org
2024-02-05mtd: ssfdc: Improve a size determination in ssfdcr_add_mtd()Markus Elfring
Replace the specification of a data structure by a pointer dereference as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/b7ee7d1b-49a2-41d8-9c8f-3674f1aecc43@web.de
2024-02-05mtd: ssfdc: Fix indentation in ssfdcr_add_mtd()Markus Elfring
Replace eight space characters by one tab. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/4045abdd-fb88-4c1f-a947-cdb3d4e59aa6@web.de
2024-02-05mtd: ssfdc: One function call less in ssfdcr_add_mtd() after error detectionMarkus Elfring
The kfree() function was called in one case by the ssfdcr_add_mtd() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software. Thus use another label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/56d92e42-db9e-4767-bcb1-9686bdf34a03@web.de
2024-02-05dt-bindings: mtd: avoid automatically select from mtd.yamlZhenhua Huang
The mtd binding is always $ref'ed by other bindings, default selector should be from other binding files which ref'ed it. Now, "$nodename" in mtd.yaml turns into a "select" automatically such that a few binding check issues reported because it conflicts with sram devices(eg, qcom,imem.yaml, rules in folder sram/*) To avoid the automatically created "select" in mtd.yaml, adding: select: false Suggested-by: Bjorn Andersson <quic_bjorande@quicinc.com> Fixes: 7bdc671822e9 ("dt-bindings: mtd: physmap: Reuse the generic definitions") Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/1704885705-7486-1-git-send-email-quic_zhenhuah@quicinc.com
2024-01-21Linux 6.8-rc1v6.8-rc1Linus Torvalds
2024-01-21Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
2024-01-21Merge tag 'timers-core-2024-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for time and clocksources: - A fix for the idle and iowait time accounting vs CPU hotplug. The time is reset on CPU hotplug which makes the accumulated systemwide time jump backwards. - Assorted fixes and improvements for clocksource/event drivers" * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug clocksource/drivers/ep93xx: Fix error handling during probe clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings clocksource/timer-riscv: Add riscv_clock_shutdown callback dt-bindings: timer: Add StarFive JH8100 clint dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21Merge tag 'powerpc-6.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Aneesh Kumar: - Increase default stack size to 32KB for Book3S Thanks to Michael Ellerman. * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Increase default stack size to 32KB
2024-01-21bcachefs: Improve inode_to_text()Kent Overstreet
Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: logged_ops_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: reflink_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; extents_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: ec_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: subvolume_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: snapshot_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: alloc_background_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: xattr_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: dirent_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: inode_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; quota_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: sb-counters_format.hKent Overstreet
bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: counters.c -> sb-counters.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: comment bch_subvolumeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: add missing __GFP_NOWARNKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet
The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Prep work for variable size btree node buffersKent Overstreet
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: grab s_umount only if snapshottingSu Yue
When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exitSu Yue
bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will triger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bios must be 512 byte alginedKent Overstreet
Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: remove redundant variable tmpColin Ian King
The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Improve trace_trans_restart_relockKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Fix excess transaction restarts in __bchfs_fallocate()Kent Overstreet
drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: extents_to_bp_stateKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bkey_and_val_eq()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Better journal tracepointsKent Overstreet
Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>