Age | Commit message | Author
2023-08-27 | doc/netlink: Fix typo in genetlink-* schemas | Donald Hunter
Fix typo verion -> version in genetlink-c and genetlink-legacy. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230825122756.7603-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | Merge branch 'devlink-mlx5-add-port-function-attributes-for-ipsec' | Jakub Kicinski
Saeed Mahameed says: ==================== {devlink,mlx5}: Add port function attributes for ipsec From Dima: Introduce hypervisor-level control knobs to set the functionality of PCI VF devices passed through to guests. The administrator of a hypervisor host may choose to change the settings of a port function from the defaults configured by the device firmware. The software stack has two types of IPsec offload - crypto and packet. Specifically, the ip xfrm command has sub-commands for "state" and "policy" that have an "offload" parameter. With ip xfrm state, both crypto and packet offload types are supported, while ip xfrm policy can only be offloaded in packet mode. The series introduces two new boolean attributes of a port function: ipsec_crypto and ipsec_packet. The goal is to provide a similar level of granularity for controlling VF IPsec offload capabilities, which would be aligned with the software model. This will allow users to decide if they want both types of offload enabled for a VF, just one of them, or none at all (which is the default). At a high level, the difference between the two knobs is that with ipsec_crypto, only XFRM state can be offloaded. Specifically, only the crypto operation (Encrypt/Decrypt) is offloaded. With ipsec_packet, both XFRM state and policy can be offloaded. Furthermore, in addition to crypto operation offload, IPsec encapsulation is also offloaded. For XFRM state, choosing between crypto and packet offload types is possible. From the HW perspective, different resources may be required for each offload type. 
Example of a user enabling IPsec packet offload for a VF in switchdev mode:

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet enable

This enables the corresponding IPsec capability of the function before it's enumerated, so when the driver reads the capability from the device firmware, it is enabled. The driver is then able to configure corresponding features and ops of the VF net device to support IPsec state and policy offloading.

v2: https://lore.kernel.org/netdev/20230421104901.897946-1-dchumak@nvidia.com/
====================
Link: https://lore.kernel.org/r/20230825062836.103744-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | net/mlx5: Implement devlink port function cmds to control ipsec_packet | Dima Chumak
Implement devlink port function commands to enable / disable IPsec packet offloads. This is used to control the IPsec capability of the device. When ipsec_packet is enabled for a VF, it prevents adding IPsec packet offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec packet offloads on the PF, enabling ipsec_packet on a VF is not allowed until the PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | net/mlx5: Implement devlink port function cmds to control ipsec_crypto | Dima Chumak
Implement devlink port function commands to enable / disable IPsec crypto offloads. This is used to control the IPsec capability of the device. When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec crypto offloads on the PF, it's not allowed to enable ipsec_crypto on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | net/mlx5: Provide an interface to block change of IPsec capabilities | Leon Romanovsky
mlx5 HW can't perform IPsec offload operations on the PF and VFs at the same time. While the previous patches added devlink knobs to change IPsec capabilities dynamically, logic is needed to block such capability changes when IPsec is already configured. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
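The PF/VF exclusion described in these mlx5 commit messages can be pictured with a small userspace sketch. Everything here is hypothetical: the struct, the counters and the function names are invented for illustration and are not the driver's real data structures, and the locking a real driver would need is omitted.

```c
#include <errno.h>

/* Hypothetical per-device bookkeeping; not the mlx5 driver's layout. */
struct ipsec_guard {
	unsigned int pf_offloads; /* active IPsec offloads on the PF */
	unsigned int vf_enabled;  /* VFs with the IPsec capability enabled */
};

/* PF adds an offload: refused while any VF has the capability enabled. */
static int guard_pf_add_offload(struct ipsec_guard *g)
{
	if (g->vf_enabled)
		return -EBUSY;
	g->pf_offloads++;
	return 0;
}

/* PF removes one of its offloads. */
static void guard_pf_del_offload(struct ipsec_guard *g)
{
	if (g->pf_offloads)
		g->pf_offloads--;
}

/* Admin enables the capability on a VF: refused while PF offloads exist. */
static int guard_vf_enable(struct ipsec_guard *g)
{
	if (g->pf_offloads)
		return -EBUSY;
	g->vf_enabled++;
	return 0;
}
```

In this model either side can claim the hardware first, and the other side is turned away with an error until the first one lets go, which matches the "until PF IPsec offloads are cleared" wording.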
2023-08-27 | net/mlx5: Add IFC bits to support IPsec enable/disable | Leon Romanovsky
Add hardware definitions to allow control of IPsec capabilities. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | net/mlx5e: Rewrite IPsec vs. TC block interface | Leon Romanovsky
In commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together with eswitch and TC"), a new API to block IPsec vs. TC creation was introduced. Internally, that API used the devlink lock to avoid races with userspace, but it is not really needed, as dev->priv.eswitch is stable and can't be changed. So remove the dependency on the devlink lock and move the block encap code back to its original place. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | net/mlx5: Drop extra layer of locks in IPsec | Leon Romanovsky
There is no need to hold the devlink lock, as it adds nothing beyond the write mode_lock that is already taken. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | devlink: Expose port function commands to control IPsec packet offloads | Dima Chumak
Expose port function commands to enable / disable IPsec packet offloads. These are used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), the function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload.

Example of a PCI VF port which supports IPsec packet offloads:

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | devlink: Expose port function commands to control IPsec crypto offloads | Dima Chumak
Expose port function commands to enable / disable IPsec crypto offloads. These are used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), the function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port.

Example of a PCI VF port which supports IPsec crypto offloads:

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable

$ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 | Linux 6.5 (tag: v6.5) | Linus Torvalds
2023-08-27 | dt-bindings: PCI: qcom: Fix SDX65 compatible | Krzysztof Kozlowski
Commit c0aba9f32801 ("dt-bindings: PCI: qcom: Add SDX65 SoC") adding SDX65 was never tested and is clearly bogus. The qcom,sdx65-pcie-ep compatible is followed by a fallback in DTS, and there is no driver matched by this compatible. Driver matches by its fallback qcom,sdx55-pcie-ep. This also fixes dtbs_check warnings like: qcom-sdx65-mtp.dtb: pcie-ep@1c00000: compatible: ['qcom,sdx65-pcie-ep', 'qcom,sdx55-pcie-ep'] is too long [kwilczynski: commit log] Fixes: c0aba9f32801 ("dt-bindings: PCI: qcom: Add SDX65 SoC") Link: https://lore.kernel.org/linux-pci/20230827085351.21932-1-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org
2023-08-27 | ext4: fix slab-use-after-free in ext4_es_insert_extent() | Baokun Li
Yikebaer reported an issue:

==================================================================
BUG: KASAN: slab-use-after-free in ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894
Read of size 4 at addr ffff888112ecc1a4 by task syz-executor/8438

CPU: 1 PID: 8438 Comm: syz-executor Not tainted 6.5.0-rc5 #1
Call Trace:
 [...]
 kasan_report+0xba/0xf0 mm/kasan/report.c:588
 ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894
 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680
 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462
 ext4_zero_range fs/ext4/extents.c:4622 [inline]
 ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721
 [...]

Allocated by task 8438:
 [...]
 kmem_cache_zalloc include/linux/slab.h:693 [inline]
 __es_alloc_extent fs/ext4/extents_status.c:469 [inline]
 ext4_es_insert_extent+0x672/0xcb0 fs/ext4/extents_status.c:873
 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680
 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462
 ext4_zero_range fs/ext4/extents.c:4622 [inline]
 ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721
 [...]

Freed by task 8438:
 [...]
 kmem_cache_free+0xec/0x490 mm/slub.c:3823
 ext4_es_try_to_merge_right fs/ext4/extents_status.c:593 [inline]
 __es_insert_extent+0x9f4/0x1440 fs/ext4/extents_status.c:802
 ext4_es_insert_extent+0x2ca/0xcb0 fs/ext4/extents_status.c:882
 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680
 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462
 ext4_zero_range fs/ext4/extents.c:4622 [inline]
 ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721
 [...]
==================================================================

The flow of issue triggering is as follows:

1. remove es
       raw es               es  removed  es1
 |-------------------| -> |----|.......|------|

2. insert es
   es   insert   es1        merge with es  es1     merge with es and free es1
 |----|.......|------| -> |------------|------| -> |-------------------|

es merges with newes, then merges with es1, frees es1, then determines if es1->es_len is 0 and triggers a UAF.

The code flow is as follows:

ext4_es_insert_extent
  es1 = __es_alloc_extent(true);
  es2 = __es_alloc_extent(true);
  __es_remove_extent(inode, lblk, end, NULL, es1)
    __es_insert_extent(inode, &newes, es1) ---> insert es1 to es tree
  __es_insert_extent(inode, &newes, es2)
    ext4_es_try_to_merge_right
      ext4_es_free_extent(inode, es1) ---> es1 is freed
  if (es1 && !es1->es_len)
    // Trigger UAF by determining if es1 is used.

We determine whether es1 or es2 is used immediately after calling __es_remove_extent() or __es_insert_extent() to avoid triggering a UAF if es1 or es2 is freed.

Reported-by: Yikebaer Aizezi <yikebaer61@gmail.com>
Closes: https://lore.kernel.org/lkml/CALcu4raD4h9coiyEBL4Bm0zjDwxC2CyPiTwsP3zFuhot6y9Beg@mail.gmail.com
Fixes: 2a69c450083d ("ext4: using nofail preallocation in ext4_es_insert_extent()")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230815070808.3377171-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
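The idea behind the fix, deciding whether a preallocated object was consumed right after the call that may consume it and never reading it again once a later call could have freed it, can be sketched in plain C. This is only an illustration under assumptions: struct extent, step_remove(), step_insert_and_merge() and insert_extent() are hypothetical stand-ins, not ext4 functions.

```c
#include <stdlib.h>

struct extent {
	unsigned int len;
};

/* Hypothetical step 1: may place the preallocated extent into the tree. */
static void step_remove(struct extent **tree_slot, struct extent *prealloc,
			int use_it)
{
	if (use_it) {
		prealloc->len = 8;
		*tree_slot = prealloc; /* the tree owns it from now on */
	}
}

/* Hypothetical step 2: a merge frees whatever extent sits in the tree. */
static void step_insert_and_merge(struct extent **tree_slot)
{
	free(*tree_slot);
	*tree_slot = NULL;
}

/* Returns 1 if the preallocation was used, 0 if not, -1 on allocation failure. */
static int insert_extent(int use_it)
{
	struct extent *tree_slot = NULL;
	struct extent *es1 = calloc(1, sizeof(*es1));
	int used;

	if (!es1)
		return -1;

	step_remove(&tree_slot, es1, use_it);

	/* Decide ownership now, while es1 is guaranteed to be alive. */
	used = es1->len != 0;
	if (!used)
		free(es1);
	es1 = NULL; /* never dereferenced after this point */

	step_insert_and_merge(&tree_slot); /* may free the former es1 */
	return used;
}
```

Checking es1->len after step_insert_and_merge() instead would read freed memory whenever the merge released the extent, which is the pattern the KASAN report points at.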
2023-08-27 | libfs: remove redundant checks of s_encoding | Eric Biggers
Now that neither ext4 nor f2fs allows inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: remove redundant checks of s_encoding | Eric Biggers
Now that ext4 does not allow inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: reject casefold inode flag without casefold feature | Eric Biggers
It is invalid for the casefold inode flag to be set without the casefold superblock feature flag also being set. e2fsck already considers this case to be invalid and handles it by offering to clear the casefold flag on the inode. __ext4_iget() also already considered this to be invalid, sort of, but it only got so far as logging an error message; it didn't actually reject the inode. Make it reject the inode so that other code doesn't have to handle this case. This matches what f2fs does. Note: we could check 's_encoding != NULL' instead of ext4_has_feature_casefold(). This would make the check robust against the casefold feature being enabled by userspace writing to the page cache of the mounted block device. However, it's unsolvable in general for filesystems to be robust against concurrent writes to the page cache of the mounted block device. Though this very particular scenario involving the casefold feature is solvable, we should not pretend that we can support this model, so let's just check the casefold feature. tune2fs already forbids enabling casefold on a mounted filesystem. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
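The rule this commit enforces fits in a few lines. The sketch below is an assumption-laden illustration: the flag value, the function name and the choice of -EINVAL are made up here and do not reflect the constants or error code ext4 actually uses.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit, for illustration only. */
#define INODE_FLAG_CASEFOLD 0x1u

/*
 * Reject an inode whose casefold flag is set while the superblock
 * feature is absent, so later code never has to handle that state.
 */
static int validate_inode_flags(unsigned int inode_flags, bool sb_has_casefold)
{
	if ((inode_flags & INODE_FLAG_CASEFOLD) && !sb_has_casefold)
		return -EINVAL; /* placeholder error code */
	return 0;
}
```

Failing once at inode load time is what lets the later "remove redundant checks of s_encoding" commits drop their per-operation tests.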
2023-08-27 | ext4: use LIST_HEAD() to initialize the list_head in mballoc.c | Ruan Jinjie
Use LIST_HEAD() to initialize the list_head instead of open-coding it. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230812071839.3481909-1-ruanjinjie@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: do not mark inode dirty every time when appending using delalloc | Liu Song
In the delalloc append write scenario, if the inode's i_size is extended by a buffered write, there are delalloc writes pending in the range up to i_size, so there is no need to touch i_disksize: writeback will eventually push i_disksize up to i_size. This offers a significant performance improvement in high-frequency append write scenarios. I conducted tests in my 32-core environment by launching 32 concurrent threads to append write to the same file. Each write operation had a length of 1024 bytes and was repeated 100000 times. Without this patch, the test completed in 7705 ms. With this patch, it completed in 5066 ms, a performance improvement of 34%. Moreover, in test scenarios with Kafka version 2.6.2 using a packet size of 2K, this patch resulted in a 10% performance improvement. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230810154333.84921-1-liusong@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: rename s_error_work to s_sb_upd_work | Theodore Ts'o
The most common reason for s_error_work to be scheduled is now the periodic update of the superblock. So rename it to s_sb_upd_work. Also rename the function flush_stashed_error_work() to update_super_work(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: add periodic superblock update check | Vitaliy Kuznetsov
This patch introduces a mechanism to periodically check and update the superblock within the ext4 file system. The main purpose of this patch is to keep the on-disk superblock up to date. The update is performed if more than one hour has passed since the last update and more than 16MB of data have been written to disk. This check and update is performed within the ext4_journal_commit_callback function, ensuring that the superblock is written while the disk is active, rather than based on a timer that may trigger during disk idle periods. Discussion: https://www.spinics.net/lists/linux-ext4/msg85865.html Signed-off-by: Vitaliy Kuznetsov <vk.en.mail@gmail.com> Link: https://lore.kernel.org/r/20230810143852.40228-1-vk.en.mail@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
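The two-part condition from this commit message (more than one hour elapsed and more than 16MB written) can be written as a small predicate. This is a sketch only: the macro names, the function name and the choice of kilobytes as the unit are assumptions made for the example, and timestamps are assumed to be monotonic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thresholds from the commit text; the names are invented for this sketch. */
#define SB_REFRESH_INTERVAL_SEC 3600u         /* one hour */
#define SB_REFRESH_INTERVAL_KB  (16u * 1024u) /* 16 MB */

/*
 * Both conditions must hold. Callers are assumed to pass values with
 * now_sec >= last_update_sec and kb_written_now >= kb_written_at_update.
 */
static bool sb_needs_update(uint64_t now_sec, uint64_t last_update_sec,
			    uint64_t kb_written_now,
			    uint64_t kb_written_at_update)
{
	if (now_sec - last_update_sec <= SB_REFRESH_INTERVAL_SEC)
		return false;
	if (kb_written_now - kb_written_at_update <= SB_REFRESH_INTERVAL_KB)
		return false;
	return true;
}
```

Because the predicate would be evaluated from a commit callback rather than a timer, an idle disk never satisfies the write-volume half of the test and is left alone.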
2023-08-27 | ext4: drop dio overwrite only flag and associated warning | Brian Foster
The commit referenced below opened up concurrent unaligned dio under shared locking for pure overwrites. In doing so, it enabled use of the IOMAP_DIO_OVERWRITE_ONLY flag and added a warning on unexpected -EAGAIN returns as an extra precaution, since ext4 does not retry writes in such cases. The flag itself is advisory in this case since ext4 checks for unaligned I/Os and uses appropriate locking up front, rather than on a retry in response to -EAGAIN. As it turns out, the warning check is susceptible to false positives because there are scenarios where -EAGAIN can be expected from lower layers without necessarily having IOCB_NOWAIT set on the iocb. For example, one instance of the warning has been seen where io_uring sets IOCB_HIPRI, which in turn results in REQ_POLLED|REQ_NOWAIT on the bio. This results in -EAGAIN if the block layer is unable to allocate a request, etc. [Note that there is an outstanding patch to untangle REQ_POLLED and REQ_NOWAIT such that the latter relies on IOCB_NOWAIT, which would also address this instance of the warning.] Another instance of the warning has been reproduced by syzbot. A dio write is interrupted down in __get_user_pages_locked() waiting on the mm lock and returns -EAGAIN up the stack. If the iomap dio iteration layer has made no progress on the write to this point, -EAGAIN returns up to the filesystem and triggers the warning. This use of the overwrite flag in ext4 is precautionary and half-baked. I.e., ext4 doesn't actually implement overwrite checking in the iomap callbacks when the flag is set, so the only extra verification it provides are i_size checks in the generic iomap dio layer. Combined with the tendency for false positives, the added verification is not worth the extra trouble. Remove the flag, associated warning, and update the comments to document when concurrent unaligned dio writes are allowed and why said flag is not used. 
Cc: stable@kernel.org Reported-by: syzbot+5050ad0fb47527b1808a@syzkaller.appspotmail.com Reported-by: Pengfei Xu <pengfei.xu@intel.com> Fixes: 310ee0902b8d ("ext4: allow concurrent unaligned dio overwrites") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230810165559.946222-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: add correct group descriptors and reserved GDT blocks to system zone | Wang Jianjian
When setup_system_zone() runs, flex_bg is not yet initialized, so it is always 1. Use a new helper function, ext4_num_base_meta_blocks(), which does not depend on sbi->s_log_groups_per_flex being initialized. [ Squashed two patches in the Link URLs below together into a single commit, which is simpler to review/understand. Also fix checkpatch warnings. --TYT ] Cc: stable@kernel.org Signed-off-by: Wang Jianjian <wangjianjian0@foxmail.com> Link: https://lore.kernel.org/r/tencent_21AF0D446A9916ED5C51492CC6C9A0A77B05@qq.com Link: https://lore.kernel.org/r/tencent_D744D1450CC169AEA77FCF0A64719909ED05@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: remove unused function declaration | Cai Xinchen
These functions have no implementation, so their declarations are useless. Remove them. Signed-off-by: Cai Xinchen <caixinchen1@huawei.com> Link: https://lore.kernel.org/r/20230802030025.173148-1-caixinchen1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: mballoc: avoid garbage value from err | Su Hui
clang's static analysis warning: fs/ext4/mballoc.c line 4178, column 6, Branch condition evaluates to a garbage value. err is uninitialized and will be tested when 'len <= 0', or when the loop is entered for the first time while the condition "!ext4_sb_block_valid()" is true. Although this can't cause problems now, it's better to correct it. Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230725043310.1227621-1-suhui@nfschina.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
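The class of warning described here, a status variable that is read on a path where no assignment has happened yet, looks roughly like the following. The function and helper are hypothetical and only mimic the shape of the problem; they are not the mballoc.c code.

```c
#include <stdbool.h>

/* Hypothetical validity check standing in for ext4_sb_block_valid(). */
static bool block_valid(long block)
{
	return block >= 0;
}

/* Returns a negative error, or the number of blocks processed. */
static int mark_blocks(long first, int len)
{
	int err = 0; /* initialised so every exit path reads a defined value */
	int i;

	for (i = 0; i < len; i++) {
		if (!block_valid(first + i))
			break;  /* leaves the loop without assigning err */
		err = 0;        /* per-block work would set err here */
	}

	if (err) /* without the initialiser this could read garbage */
		return err;
	return i;
}
```

With `int err;` and no initialiser, both the `len <= 0` case and an immediate `break` reach the final test without ever writing err, which is exactly the pair of paths the analyzer names.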
2023-08-27 | ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() | Lu Hongfei
Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Link: https://lore.kernel.org/r/20230707115907.26637-1-luhongfei@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: change the type of blocksize in ext4_mb_init_cache() | Lu Hongfei
The return value type of i_blocksize() is 'unsigned int', so the type of blocksize has been modified from 'int' to 'unsigned int' to ensure data type consistency. Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Link: https://lore.kernel.org/r/20230707105516.9156-1-luhongfei@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | ext4: fix unattached inode after power cut with orphan file feature enabled | Zhihao Cheng
Running generic/475 (filesystem consistency tests after power cut) could easily trigger an unattached inode error during fsck:

  Unattached zero-length inode 39405. Clear? no
  Unattached inode 39405
  Connect to /lost+found? no

The above inconsistency is caused by the following process:

        P1                                  P2
 ext4_create
  inode = ext4_new_inode_start_handle  // itable records nlink=1
  ext4_add_nondir
   err = ext4_add_entry  // ENOSPC
    ext4_append
     ext4_bread
      ext4_getblk
       ext4_map_blocks  // returns ENOSPC
   drop_nlink(inode)  // won't be updated into disk inode
   ext4_orphan_add(handle, inode)
    ext4_orphan_file_add
  ext4_journal_stop(handle)
                                     jbd2_journal_commit_transaction  // commit success
                  >> power cut <<
 ext4_fill_super
  ext4_load_and_init_journal  // itable records nlink=1
  ext4_orphan_cleanup
   ext4_process_orphan
    if (inode->i_nlink)  // true, inode won't be deleted

Then the allocated inode stays reserved on disk and corresponds to no dentries, so e2fsck reports the 'unattached inode' problem. The problem won't happen if the orphan file feature is disabled, because ext4_orphan_add() updates the disk inode in orphan list mode. There are several places that do not update the disk inode while putting the inode into the orphan area, such as ext4_add_nondir(), ext4_symlink() and whiteout in ext4_rename(). Fix it by updating the inode on disk in all error branches of these places.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217605
Fixes: 02f310fcf47f ("ext4: Speedup ext4 orphan inode handling")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230628132011.650383-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27 | jbd2: correct the end of the journal recovery scan range | Zhang Yi
We got the filesystem inconsistency issue below while running the generic/475 I/O failure pressure test with the fast_commit feature enabled.

 Symlink /p3/d3/d1c/d6c/dd6/dce/l101 (inode #132605) is invalid.

If the fast_commit feature is enabled, a special fast_commit journal area is appended to the end of the normal journal area. journal->j_last points to the first unused block behind the normal journal area instead of the whole log area, and journal->j_fc_last points to the first unused block behind the fast_commit journal area. While doing journal recovery, do_one_pass(PASS_SCAN) should first scan the normal journal area and wrap around to the first block once it meets journal->j_last, but the wrap() macro misuses journal->j_fc_last, so the recovery could not read the next magic block (a commit block perhaps) and would mistakenly end early, missing tN and every transaction after it in the following example. Finally, it could lead to filesystem inconsistency.

 |             normal journal area                 | fast commit area |
 +-------------------------------------------------+------------------+
 | tN(rear) | tN+1 |~| tN-x |...| tN-1 | tN(front) |       ....       |
 +-------------------------------------------------+------------------+
 /                                                 /                  /
 start                                 journal->j_last   journal->j_fc_last

This patch fixes it by using the correct end, journal->j_last.

Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path")
Cc: stable@kernel.org
Reported-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/linux-ext4/20230613043120.GB1584772@mit.edu/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230626073322.3956567-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
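The wrap-around rule this message argues for, wrapping the scan position at the end of the normal journal area and not at the end of the fast-commit area, can be sketched as a plain function. The struct and function names are hypothetical; only the three boundaries (start, j_last, j_fc_last) come from the commit text.

```c
/* Hypothetical view of the log layout described in the commit message. */
struct journal_layout {
	unsigned long first;   /* first block of the normal journal area */
	unsigned long last;    /* first block past the normal area (j_last) */
	unsigned long fc_last; /* first block past the fast-commit area */
};

/*
 * Wrap a scan position back into the normal journal area. The boundary
 * is 'last'; using 'fc_last' here would let the scan run into the
 * fast-commit area instead of returning to 'first'.
 */
static unsigned long wrap_block(const struct journal_layout *j,
				unsigned long blk)
{
	if (blk >= j->last)
		blk -= j->last - j->first;
	return blk;
}
```

With a normal area of blocks 1 to 99 and a fast-commit area of blocks 100 to 119, position 100 maps back to block 1; wrapping at fc_last instead would leave it at 100, inside the fast-commit area, where no commit block is found.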
2023-08-27 | ext4: ext4_get_{dev}_journal return proper error value | Zhang Yi
ext4_get_journal() and ext4_get_dev_journal() return NULL if they fail to initialize the journal. Make them return a proper error value instead, and rename them to ext4_open_{inode,dev}_journal(). [ Folded fix to ext4_calculate_overhead() to check for an ERR_PTR instead of NULL. ] Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-13-yi.zhang@huaweicloud.com Reported-by: syzbot+b3123e6d9842e526de39@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20230826011029.2023140-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
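Why callers have to switch from a NULL test to an error-pointer test is easier to see with a userspace imitation of the kernel's ERR_PTR convention. The helpers below are simplified re-implementations written for this sketch, and open_journal() with its -ENODEV result is a made-up example, not the ext4 function.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct journal_handle {
	int dev;
};

static struct journal_handle the_journal;

/* Hypothetical opener: encodes the failure reason in the pointer itself. */
static struct journal_handle *open_journal(int dev)
{
	if (dev < 0)
		return err_ptr(-ENODEV);
	the_journal.dev = dev;
	return &the_journal;
}
```

An error pointer is not NULL, so a leftover `if (!journal)` check would treat a failure as success; that is the kind of caller the folded ext4_calculate_overhead() fix addresses.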
2023-08-27 | Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds
Pull SCSI fixes from James Bottomley: "Three small driver fixes and one larger unused function set removal in the raid class (so no external impact)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: snic: Fix double free in snic_tgt_create()
  scsi: core: raid_class: Remove raid_component_add()
  scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
  scsi: ufs: mcq: Fix the search/wrap around logic
2023-08-27 | parisc: led: Fix LAN receive and transmit LEDs | Helge Deller
Fix the LAN receive and LAN transmit LEDs, which were swapped up to now. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org>
2023-08-27 | Merge tag 'usb-serial-6.6-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next | Greg Kroah-Hartman
Johan writes:

USB-serial updates for 6.6-rc1

Here are the USB-serial updates for 6.6-rc1, including:

 - support for the RS485 mode of XR devices
 - new modem device ids

All have been in linux-next with no reported issues.

* tag 'usb-serial-6.6-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add FOXCONN T99W368/T99W373 product
  USB: serial: option: add Quectel EM05G variant (0x030e)
  USB: serial: xr: add TIOCGRS485 and TIOCSRS485 ioctls
2023-08-27 | tty: shrink the size of struct tty_struct by 40 bytes | Greg Kroah-Hartman
It's been a long time since anyone has looked at what struct tty_struct looks like in memory; it turns out there were a ton of holes. So move things around a bit, change one variable (closing) from being an int to a bool (it is only being tested for 0/1), and we end up saving 40 bytes per structure overall on x86-64 systems.

Before this patch:
	/* size: 696, cachelines: 11, members: 37 */
	/* sum members: 665, holes: 8, sum holes: 31 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
	/* last cacheline: 56 bytes */

After this change:
	/* size: 656, cachelines: 11, members: 37 */
	/* sum members: 654, holes: 1, sum holes: 2 */
	/* forced alignments: 2 */
	/* last cacheline: 16 bytes */

Cc: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/2023082519-cobbler-unholy-8d1f@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
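How member order and an int-to-bool change remove padding holes can be seen on a toy structure. The two structs below are invented for this sketch and have nothing to do with the real tty_struct layout; exact sizes depend on the ABI, so the code only compares them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Small members interleaved with pointers: each one drags padding along. */
struct padded {
	char  a;
	void *p;
	int   closing; /* only ever holds 0 or 1 */
	void *q;
	char  b;
};

/* Same information: pointers first, small members grouped, int -> bool. */
struct reordered {
	void *p;
	void *q;
	char  a;
	char  b;
	bool  closing;
};

/* Bytes saved per object by the reordering on the current ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct padded) - sizeof(struct reordered);
}
```

Tools such as pahole print the per-struct summary lines quoted in the commit message, which is how holes like these are usually found.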
2023-08-27 | tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw() | Jiri Slaby (SUSE)
The code is duplicated to perform the copy twice -- to handle buffer wrap-around. Instead of the duplication, roll this into the loop. (And add some blank lines around to have the code a bit more readable.) Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-15-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
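Folding the two wrap-around copies into one loop looks roughly like this in a standalone ring buffer. It is a sketch under assumptions: BUF_SIZE, the function name and the while-loop shape are chosen for the example and are not taken from n_tty.c, and the caller is assumed to have checked that count bytes of space are free.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16u /* hypothetical power-of-two ring size */

/*
 * Copy 'count' bytes into the ring starting at free-running index 'head'.
 * Each pass copies up to the end of the array, so a copy that crosses the
 * end takes two passes and any other copy takes one.
 */
static void ring_copy_in(unsigned char *ring, size_t head,
			 const unsigned char *src, size_t count)
{
	while (count) {
		size_t off = head & (BUF_SIZE - 1);
		size_t n = BUF_SIZE - off; /* room before the array ends */

		if (n > count)
			n = count;
		memcpy(ring + off, src, n);
		src += n;
		head += n;
		count -= n;
	}
}
```

The loop body is the single copy; the wrap case is simply a second iteration rather than a second, hand-written memcpy().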
2023-08-27 | tty: n_tty: extract ECHO_OP processing to a separate function | Jiri Slaby (SUSE)
__process_echoes() contains ECHO_OPs processing. It is stuffed in a while loop and the whole function is barely readable. Separate it to a new function: n_tty_process_echo_ops(). Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-14-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27 | tty: n_tty: unify counts to size_t | Jiri Slaby (SUSE)
Some count types are already 'size_t' for a long time. Some were switched to 'size_t' recently. Unify the rest with those now. This allows for some min_t()s to become min()s. And make one min() an explicit min_t() as we are comparing signed 'room' to unsigned 'count'. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-13-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
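The remark about comparing signed 'room' to unsigned 'count' refers to a general C pitfall, shown here in userspace. The two helpers are illustrative only; the commit message does not say which type the explicit min_t() casts to, so neither function should be read as the patch's choice.

```c
#include <stddef.h>

/*
 * Comparison done in the unsigned type: a negative 'room' turns into a
 * huge value, so 'count' wins even though no room is left.
 */
static size_t min_as_unsigned(long room, size_t count)
{
	return (size_t)room < count ? (size_t)room : count;
}

/* Comparison done in the signed type: a negative 'room' stays the minimum. */
static long min_as_signed(long room, size_t count)
{
	long c = (long)count;

	return room < c ? room : c;
}
```

Because the two conversions give different answers for negative input, the kernel's type-checked min() refuses mixed signedness and the type has to be spelled out with min_t().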
2023-08-27 | tty: n_tty: use u8 for chars and flags | Jiri Slaby (SUSE)
Unify with the tty layer and use u8 for both chars and flags. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-12-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27 | tty: n_tty: simplify chars_in_buffer() | Jiri Slaby (SUSE)
The 'if' in chars_in_buffer() is misleadingly inverted. And since the only difference is the head used for computation, cache the head using ternary operator. And use that in return directly. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-11-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
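The shape described here, picking the head with a ternary and returning the difference directly, can be sketched as follows. The struct and its field names are assumptions made for this example; the commit message itself does not list them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of line-discipline state, for illustration. */
struct ldata_sketch {
	bool   icanon;
	size_t canon_head;
	size_t commit_head;
	size_t read_tail;
};

/* Only the head differs between the two modes, so select it once. */
static size_t chars_in_buffer_sketch(const struct ldata_sketch *l)
{
	size_t head = l->icanon ? l->canon_head : l->commit_head;

	return head - l->read_tail;
}
```

Compared with an if/else that repeats the subtraction in both branches, the ternary makes it visible that the two cases share everything except the head.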
2023-08-27tty: n_tty: remove unsigned char casts from character constantsJiri Slaby (SUSE)
We compile with -funsigned-char, so all character constants are already unsigned chars. Therefore, remove superfluous casts. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-10-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: move newline handling to a separate functionJiri Slaby (SUSE)
Currently, n_tty handles the newline in a label in n_tty_receive_char_canon(). That is invoked from two more places. Split this code to a separate function and avoid the label in this case. This makes the code flow more understandable. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-9-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: move canon handling to a separate functionJiri Slaby (SUSE)
n_tty_receive_char_special() is already complicated enough. Split the canon handling to a separate function: n_tty_receive_char_canon(). Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-8-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: use MASK() for masking out size bitsJiri Slaby (SUSE)
In n_tty, there is already a macro to mask out top bits from ring buffer counters. It is MASK() added some time ago. So use it more in the code to make it more readable. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-7-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
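As a reminder of what such a mask macro does, here is a minimal userspace sketch. The buffer size of 4096 and the exact macro body are assumptions made for this example, and `ring_peek()` is a hypothetical helper.

```c
#include <stddef.h>

#define N_TTY_BUF_SIZE 4096	/* assumed here; must be a power of two */
#define MASK(x) ((x) & (N_TTY_BUF_SIZE - 1))

/*
 * Counters are free-running and are only reduced to a buffer index
 * at the point of use.
 */
static unsigned char ring_peek(const unsigned char *buf, size_t counter)
{
	return buf[MASK(counter)];
}
```

Writing `MASK(counter)` instead of an open-coded `counter & (N_TTY_BUF_SIZE - 1)` keeps the power-of-two assumption in one place.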
2023-08-27tty: n_tty: make n_tty_data::num_overrun unsignedJiri Slaby (SUSE)
n_tty_data::num_overrun is unlikely to overflow in a second. But make it explicitly unsigned to avoid printing negative values. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-6-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()Jiri Slaby (SUSE)
The jiffies tests in n_tty_receive_overrun() are simplified ratelimiting (without locking). We could use struct ratelimit_state and the helpers, but to me, that seems too complex for this use case. But the code currently tests both if the time passed (the first time_after()) and if jiffies wrapped around (the second time_after()). time_is_before_jiffies() takes care of both, provided overrun_time is initialized at the allocation time. So switch to time_is_before_jiffies(), the same as ratelimiting does. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-5-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
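The reason a single wrap-safe comparison covers both cases can be illustrated in userspace. In the sketch below, `time_after_ul()` mimics the signed-difference trick behind the kernel's time_after() family, with the current time passed in as `now` instead of reading jiffies. `struct overrun_state` and `overrun_should_report()` are hypothetical names for this example.

```c
#include <stdbool.h>

/*
 * True if a is after b, even when the counter has wrapped in between.
 * Relies on the usual two's complement conversion of the unsigned
 * difference to a signed long.
 */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the overrun bookkeeping. */
struct overrun_state {
	unsigned long overrun_time;	/* time of the last report */
	unsigned int num_overrun;	/* overruns since the last report */
};

/* Count an overrun; return true when a report is due. */
static bool overrun_should_report(struct overrun_state *s, unsigned long now,
				  unsigned long interval)
{
	s->num_overrun++;

	if (time_after_ul(now, s->overrun_time + interval)) {
		s->overrun_time = now;
		s->num_overrun = 0;
		return true;
	}

	return false;
}
```

Because the deadline `overrun_time + interval` is allowed to wrap and the comparison uses the signed difference, no separate check for a wrapped counter is needed, as long as `overrun_time` starts from a sane value.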
2023-08-27tty: n_tty: use 'num' for writes' countsJiri Slaby (SUSE)
We have a separate misnomer 'c' to hold the returned value from tty->ops->write(). Instead, use 'num' already defined elsewhere (and already properly typed). Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-4-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: use output character directlyJiri Slaby (SUSE)
There is no point in using a local variable to store the character when we can pass it directly. This assignment comes from the era when we used to do get_user(c, b). We no longer need this, so fix this. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-3-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27tty: n_tty: make flow of n_tty_receive_buf_common() a boolJiri Slaby (SUSE)
The 'flow' parameter of n_tty_receive_buf_common() is meant to be a boolean value. So use bool and alter call sites accordingly. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230827074147.2287-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "tty: serial: meson: Add a earlycon for the T7 SoC"Lucas Tanure
This reverts commit 6a4197f9763325043abf7690a21124a9facbf52e. The new SoC will use ttyS0 instead of ttyAML, so the T7 SoC doesn't need an OF_EARLYCON_DECLARE. Fixes: 6a4197f97633 ("tty: serial: meson: Add a earlycon for the T7 SoC") Signed-off-by: Lucas Tanure <tanure@linux.com> Link: https://lore.kernel.org/r/20230827082944.5100-1-tanure@linux.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27parisc: lasi: Initialize LASI driver via arch_initcall()Helge Deller
Move initialization code for LASI out of the GSC driver. Since ASP and WAX have been moved in previous commits, the GSC driver is now just a driver which provides library functions for LASI, ASP and WAX and as such no longer needs its own initialization function. Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-27parisc: asp: Initialize asp driver via arch_initcall()Helge Deller
Signed-off-by: Helge Deller <deller@gmx.de>