summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-14ext4: dio take shared inode lock when overwriting preallocated blocksZhang Yi
In the dio write path, we only take shared inode lock for the case of aligned overwriting initialized blocks inside EOF. But for overwriting preallocated blocks, it may only need to split unwritten extents, this procedure has been protected under i_data_sem lock, it's safe to release the exclusive inode lock and take shared inode lock. This could give a significant speed up for multi-threaded writes. Test on Intel Xeon Gold 6140 and nvme SSD with below fio parameters. direct=1 ioengine=libaio iodepth=10 numjobs=10 runtime=60 rw=randwrite size=100G And the test result are: Before: bs=4k IOPS=11.1k, BW=43.2MiB/s bs=16k IOPS=11.1k, BW=173MiB/s bs=64k IOPS=11.2k, BW=697MiB/s After: bs=4k IOPS=41.4k, BW=162MiB/s bs=16k IOPS=41.3k, BW=646MiB/s bs=64k IOPS=13.5k, BW=843MiB/s Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221226062015.3479416-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-14xfs: fix uninitialized variable accessDarrick J. Wong
If the end position of a GETFSMAP query overlaps an allocated space and we're using the free space info to generate fsmap info, the akeys information gets fed into the fsmap formatter with bad results. Zero-init the space. Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-14Merge tag 'xfs-alloc-perag-conversion' of ↵Darrick J. Wong
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-6.3-merge-A xfs: per-ag centric allocation alogrithms This series continues the work towards making shrinking a filesystem possible. We need to be able to stop operations from taking place on AGs that need to be removed by a shrink, so before shrink can be implemented we need to have the infrastructure in place to prevent incursion into AGs that are going to be, or are in the process, of being removed from active duty. The focus of this is making operations that depend on access to AGs use the perag to access and pin the AG in active use, thereby creating a barrier we can use to delay shrink until all active uses of an AG have been drained and new uses are prevented. This series starts by fixing some existing issues that are exposed by changes later in the series. They stand alone, so can be picked up independently of the rest of this patchset. The most complex of these fixes is cleaning up the mess that is the AGF deadlock avoidance algorithm. This algorithm stores the first block that is allocated in a transaction in tp->t_firstblock, then uses this to try to limit future allocations within the transaction to AGs at or higher than the filesystem block stored in tp->t_firstblock. This depends on one of the initial bug fixes in the series to move the deadlock avoidance checks to xfs_alloc_vextent(), and then builds on it to relax the constraints of the avoidance algorithm to only be active when a deadlock is possible. We also update the algorithm to record allocations from higher AGs that are allocated from, because we when we need to lock more than two AGs we still have to ensure lock order is correct. Therefore we can't lock AGs in the order 1, 3, 2, even though tp->t_firstblock indicates that we've allocated from AG 1 and so AG is valid to lock. It's not valid, because we already hold AG 3 locked, and so tp->t-first_block should actually point at AG 3, not AG 1 in this situation. It should now be obvious that the deadlock avoidance algorithm should record AGs, not filesystem blocks. So the series then changes the transaction to store the highest AG we've allocated in rather than a filesystem block we allocated. This makes it obvious what the constraints are, and trivial to update as we lock and allocate from various AGs. With all the bug fixes out of the way, the series then starts converting the code to use active references. Active reference counts are used by high level code that needs to prevent the AG from being taken out from under it by a shrink operation. The high level code needs to be able to handle not getting an active reference gracefully, and the shrink code will need to wait for active references to drain before continuing. Active references are implemented just as reference counts right now - an active reference is taken at perag init during mount, and all other active references are dependent on the active reference count being greater than zero. This gives us an initial method of stopping new active references without needing other infrastructure; just drop the reference taken at filesystem mount time and when the refcount then falls to zero no new references can be taken. In future, this will need to take into account AG control state (e.g. offline, no alloc, etc) as well as the reference count, but right now we can implement a basic barrier for shrink with just reference count manipulations. As such, patches to convert the perag state to atomic opstate fields similar to the xfs_mount and xlog opstate fields follow the initial active perag reference counting patches. The first target for active reference conversion is the for_each_perag*() iterators. This captures a lot of high level code that should skip offline AGs, and introduces the ability to differentiate between a lookup that didn't have an online AG and the end of the AG iteration range. From there, the inode allocation AG selection is converted to active references, and the perag is driven deeper into the inode allocation and btree code to replace the xfs_mount. Most of the inode allocation code operates on a single AG once it is selected, hence it should pass the perag as the primary referenced object around for allocation, not the xfs_mount. There is a bit of churn here, but it emphasises that inode allocation is inherently an allocation group based operation. Next the bmap/alloc interface undergoes a major untangling, reworking xfs_bmap_btalloc() into separate allocation operations for different contexts and failure handling behaviours. This then allows us to completely remove the xfs_alloc_vextent() layer via restructuring the xfs_alloc_vextent/xfs_alloc_ag_vextent() into a set of realtively simple helper function that describe the allocation that they are doing. e.g. xfs_alloc_vextent_exact_bno(). This allows the requirements for accessing AGs to be allocation context dependent. The allocations that require operation on a single AG generally can't tolerate failure after the allocation method and AG has been decided on, and hence the caller needs to manage the active references to ensure the allocation does not race with shrink removing the selected AG for the duration of the operation that requires access to that allocation group. Other allocations iterate AGs and so the first AG is just a hint - these do not need to pin a perag first as they can tolerate not being able to access an AG by simply skipping over it. These require new perag iteration functions that can start at arbitrary AGs and wrap around at arbitrary AGs, hence a new set for for_each_perag_wrap*() helpers to do this. Next is the rework of the filestreams allocator. This doesn't change any functionality, but gets rid of the unnecessary multi-pass selection algorithm when the selected AG is not available. It currently does a lookup pass which might iterate all AGs to select an AG, then checks if the AG is acceptible and if not does a "new AG" pass that is essentially identical to the lookup pass. Both of these scans also do the same "longest extent in AG" check before selecting an AG as is done after the AG is selected. IOWs, the filestreams algorithm can be greatly simplified into a single new AG selection pass if the there is no current association or the currently associated AG doesn't have enough contiguous free space for the allocation to proceed. With this simplification of the filestreams allocator, it's then trivial to convert it to use for_each_perag_wrap() for the AG scan algorithm. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'xfs-alloc-perag-conversion' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (42 commits) xfs: refactor the filestreams allocator pick functions xfs: return a referenced perag from filestreams allocator xfs: pass perag to filestreams tracing xfs: use for_each_perag_wrap in xfs_filestream_pick_ag xfs: track an active perag reference in filestreams xfs: factor out MRU hit case in xfs_filestream_select_ag xfs: remove xfs_filestream_select_ag() longest extent check xfs: merge new filestream AG selection into xfs_filestream_select_ag() xfs: merge filestream AG lookup into xfs_filestream_select_ag() xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c xfs: use xfs_bmap_longest_free_extent() in filestreams xfs: get rid of notinit from xfs_bmap_longest_free_extent xfs: factor out filestreams from xfs_bmap_btalloc_nullfb xfs: convert trim to use for_each_perag_range xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker xfs: move the minimum agno checks into xfs_alloc_vextent_check_args xfs: fold xfs_alloc_ag_vextent() into callers xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno() xfs: introduce xfs_alloc_vextent_prepare() xfs: introduce xfs_alloc_vextent_exact_bno() ...
2023-02-15erofs: unify anonymous inodes for blobJingbo Xu
Currently there're two anonymous inodes (inode and anon_inode in struct erofs_fscache) for each blob. The former was introduced as the address_space of page cache for bootstrap. The latter was initially introduced as both the address_space of page cache and also a sentinel in the shared domain. Since now the management of cookies in share domain has been decoupled with the anonymous inode, there's no need to maintain an extra anonymous inode. Let's unify these two anonymous inodes. Besides, in non-share-domain mode only bootstrap will allocate anonymous inode. To simplify the implementation, always allocate anonymous inode for both bootstrap and data blobs. Similarly release anonymous inodes for data blobs when .put_super() is called, or we'll get "VFS: Busy inodes after unmount." warning. Also remove the redundant set_nlink() when initializing the anonymous inode, since i_nlink has already been initialized to 1 when the inode gets allocated. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Link: https://lore.kernel.org/r/20230209063913.46341-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: relinquish volume with mutex heldJingbo Xu
Relinquish fscache volume with mutex held. Otherwise if a new domain is registered when the old domain with the same name gets removed from the list but not relinquished yet, fscache may complain the collision. Fixes: 8b7adf1dff3d ("erofs: introduce fscache-based domain") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Link: https://lore.kernel.org/r/20230209063913.46341-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: maintain cookies of share domain in self-contained listJingbo Xu
We'd better not touch sb->s_inodes list and inode->i_count directly. Let's maintain cookies of share domain in a self-contained list in erofs. Besides, relinquish cookie with the mutex held. Otherwise if a cookie is registered when the old cookie with the same name in the same domain has been removed from the list but not relinquished yet, fscache may complain "Duplicate cookie detected". Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Link: https://lore.kernel.org/r/20230209063913.46341-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: remove unused device mapping in meta routineJingbo Xu
Currently metadata is always on bootstrap, and thus device mapping is not needed so far. Remove the redundant device mapping in the meta routine. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209063913.46341-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15MAINTAINERS: erofs: Add Documentation/ABI/testing/sysfs-fs-erofsYangtao Li
Add this doc to the erofs maintainers entry. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209052013.34952-1-frank.li@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15Documentation/ABI: sysfs-fs-erofs: update supported featuresYue Hu
Add missing feaures for sysfs-fs-erofs feature doc. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209051128.10571-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: remove unused EROFS_GET_BLOCKS_RAW flagJingbo Xu
For erofs_map_blocks() and erofs_map_blocks_flatmode(), the flags argument is always EROFS_GET_BLOCKS_RAW. Thus remove the unused flags parameter for these two functions. Besides EROFS_GET_BLOCKS_RAW is originally introduced for reading compressed (raw) data for compressed files. However it's never used actually and let's remove it now. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209024825.17335-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: update print symbols for various flags in traceJingbo Xu
As new flags introduced, the corresponding print symbols for trace are not added accordingly. Add these missing print symbols for these flags. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209024825.17335-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230209-kobj_type-erofs-v1-1-078c945e2c4b@weissschuh.net Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: add per-cpu threads for decompression as an optionSandeep Dhavale
Using per-cpu thread pool we can reduce the scheduling latency compared to workqueue implementation. With this patch scheduling latency and variation is reduced as per-cpu threads are high priority kthread_workers. The results were evaluated on arm64 Android devices running 5.10 kernel. The table below shows resulting improvements of total scheduling latency for the same app launch benchmark runs with 50 iterations. Scheduling latency is the latency between when the task (workqueue kworker vs kthread_worker) became eligible to run to when it actually started running. +-------------------------+-----------+----------------+---------+ | | workqueue | kthread_worker | diff | +-------------------------+-----------+----------------+---------+ | Average (us) | 15253 | 2914 | -80.89% | | Median (us) | 14001 | 2912 | -79.20% | | Minimum (us) | 3117 | 1027 | -67.05% | | Maximum (us) | 30170 | 3805 | -87.39% | | Standard deviation (us) | 7166 | 359 | | +-------------------------+-----------+----------------+---------+ Background: Boot times and cold app launch benchmarks are very important to the Android ecosystem as they directly translate to responsiveness from user point of view. While EROFS provides a lot of important features like space savings, we saw some performance penalty in cold app launch benchmarks in few scenarios. Analysis showed that the significant variance was coming from the scheduling cost while decompression cost was more or less the same. Having per-cpu thread pool we can see from the above table that this variation is reduced by ~80% on average. This problem was discussed at LPC 2022. Link to LPC 2022 slides and talk at [1] [1] https://lpc.events/event/16/contributions/1338/ [ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue issue [2] on many arm64 devices is resolved. ] [2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com Signed-off-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
2023-02-15erofs: tidy up internal.hGao Xiang
Reorder internal.h code so that removing unneeded macros and more. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-6-hsiangkao@linux.alibaba.com
2023-02-15erofs: get rid of z_erofs_do_map_blocks() forward declarationGao Xiang
The code can be neater without forward declarations. Let's get rid of z_erofs_do_map_blocks() forward declaration. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-5-hsiangkao@linux.alibaba.com
2023-02-15erofs: move zdata.h into zdata.cGao Xiang
Definitions in zdata.h are only used in zdata.c and for internal use only. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-4-hsiangkao@linux.alibaba.com
2023-02-15erofs: remove tagged pointer helpersGao Xiang
Just open-code the remaining one to simplify the code. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-3-hsiangkao@linux.alibaba.com
2023-02-15erofs: avoid tagged pointers to mark sync decompressionGao Xiang
We could just use a boolean in z_erofs_decompressqueue for sync decompression to simplify the code. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-2-hsiangkao@linux.alibaba.com
2023-02-15erofs: get rid of erofs_inode_datablocks()Gao Xiang
erofs_inode_datablocks() has the only one caller, let's just get rid of it entirely. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-1-hsiangkao@linux.alibaba.com
2023-02-15erofs: simplify iloc()Gao Xiang
Actually we could pass in inodes directly to clean up all callers. Also rename iloc() as erofs_iloc(). Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: get rid of debug_one_dentry()Gao Xiang
Since erofsdump is available, no need to keep this debugging functionality at all. Also drop a useless comment since it's the VFS behavior. Link: https://lore.kernel.org/r/20230114125746.399253-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: remove linux/buffer_head.h dependencyGao Xiang
EROFS actually never uses buffer heads, therefore just get rid of BH_xxx definitions and linux/buffer_head.h inclusive. Link: https://lore.kernel.org/r/20230113065226.68801-2-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15erofs: clean up erofs_iget()Gao Xiang
Move inode hash function into inode.c and simplify erofs_iget(). Link: https://lore.kernel.org/r/20230113065226.68801-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-14riscv: Fix Zbb alternative IDsSamuel Holland
Commit 4bf8860760d9 ("riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions") switched ISA extension alternatives to use the RISCV_ISA_EXT_* macros instead of CPUFEATURE_*. This was mismerged when applied on top of the Zbb series, so the Zbb alternatives referenced the wrong errata ID values. Fixes: 9daca9a5b9ac ("Merge patch series "riscv: improve boot time isa extensions handling"") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230212021534.59121-3-samuel@sholland.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14riscv: Fix early alternative patchingSamuel Holland
Now that the text to patch is located using a relative offset from the alternative entry, the text address should be computed without applying the kernel mapping offset, both before and after VM setup. Fixes: 8d23e94a4433 ("riscv: switch to relative alternative entries") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230212021534.59121-2-samuel@sholland.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14Merge branch 'for-6.3/cxl-rr-emu' into cxl/nextDan Williams
Pick up the CXL DVSEC range register emulation for v6.3, and resolve conflicts with the cxl_port_probe() split (from for-6.3/cxl-ram-region) and event handling (from for-6.3/cxl-events).
2023-02-14RISC-V: re-order Kconfig selects alphanumericallyConor Dooley
Selects should be sorted alphanumerically, and were tidied up originally by Palmer in commit e8c7ef7d5819 ("RISC-V: Sort select statements alphanumerically") since then, things have gotten out of order again. Fish RMK's original script out of commit b1b3f49ce460 ("ARM: config: sort select statements alphanumerically") and do some spring cleaning. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20221219172836.134709-1-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14Documentation: riscv: fix insufficient list item indentConor Dooley
When adding the ISA string ordering rules, I didn't sufficiently indent one of the list items. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/linux-doc/202301300743.bp7Dpazv-lkp@intel.com/ Fixes: f07b2b3f9d47 ("Documentation: riscv: add a section about ISA string ordering in /proc/cpuinfo") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20230129235701.2393241-1-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14cxl/pci: Remove locked check for dvsec_range_allowed()Dave Jiang
Remove the CXL_DECODER_F_LOCK check to be permissive of platform BIOSes that allow CXL.mem to be remapped. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640370085.935665.13128321011001358077.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/hdm: Add emulation when HDM decoders are not committedDave Jiang
For the case where DVSEC range register(s) are active and HDM decoders are not committed, use RR to provide emulation. A first pass is done to note whether any decoders are committed. If there are no committed endpoint decoders, then DVSEC ranges will be used for emulation. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640369536.935665.611974113442400127.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decodersDave Jiang
CXL rev3 spec 8.1.3 RCDs may not have HDM register blocks. Create a fake HDM with information from the CXL PCIe DVSEC registers. The decoder count will be set to the HDM count retrieved from the DVSEC cap register. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640368994.935665.15831225724059704620.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/hdm: Emulate HDM decoder from DVSEC range registersDave Jiang
In the case where HDM decoder register block exists but is not programmed and at the same time the DVSEC range register range is active, populate the CXL decoder object 'cxl_decoder' with info from DVSEC range registers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640368454.935665.13806415120298330717.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/pci: Refactor cxl_hdm_decode_init()Dave Jiang
With the previous refactoring of DVSEC range registers out of cxl_hdm_decode_init(), it basically becomes a skeleton function. Squash __cxl_hdm_decode_init() with cxl_hdm_decode_init() to simplify the code. cxl_hdm_decode_init() now returns more error codes than just -EBUSY. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640367916.935665.12898404758336059003.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/port: Export cxl_dvsec_rr_decode() to cxl_portDave Jiang
Call cxl_dvsec_rr_decode() in the beginning of cxl_port_probe() and preserve the decoded information in a local 'struct cxl_endpoint_dvsec_info'. This info can be passed to various functions later on in order to support the HDM decoder emulation. The invocation of cxl_dvsec_rr_decode() in cxl_hdm_decode_init() is removed and a pointer to the 'struct cxl_endpoint_dvsec_info' is passed in. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640367377.935665.2848747799651019676.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14cxl/pci: Break out range register decoding from cxl_hdm_decode_init()Dave Jiang
There are 2 scenarios that requires additional handling. 1. A device that has active ranges in DVSEC range registers (RR) but no HDM decoder register block. 2. A device that has both RR active and HDM, but the HDM decoders are not programmed. The goal is to create emulated decoder software structs based on the RR. Move the CXL DVSEC range register decoding code block from cxl_hdm_decode_init() to its own function. Refactor code in preparation for the HDM decoder emulation. There is no functionality change to the code. Name the new function to cxl_dvsec_rr_decode(). The only change is to set range->start and range->end to CXL_RESOURCE_NONE and skipping the reading of base registers if the range size is 0, which equates to range not active. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167640366839.935665.11816388524993234329.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14Merge branch 'for-6.3/cxl' into cxl/nextDan Williams
Pick up the AER unmasking patches for v6.3.
2023-02-14Merge branch 'for-6.3/cxl-ram-region' into cxl/nextDan Williams
Pick up some fixes from exposure of for-6.3/cxl-ram-region in linux-next.
2023-02-14ASoC: soc-ac97: Convert to agnostic GPIO APIAndy Shevchenko
The of_gpio.h is going to be removed. In preparation of that convert the driver to the agnostic API. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230213161713.1450-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-14x86/hotplug: Remove incorrect comment about mwait_play_dead()Srivatsa S. Bhat (VMware)
The comment that says mwait_play_dead() returns only on failure is a bit misleading because mwait_play_dead() could actually return for valid reasons (such as mwait not being supported by the platform) that do not indicate a failure of the CPU offline operation. So, remove the comment. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230128003751.141317-1-srivatsa@csail.mit.edu
2023-02-14cxl: add RAS status unmasking for CXLDave Jiang
By default the CXL RAS mask registers bits are defaulted to 1's and suppress all error reporting. If the kernel has negotiated ownership of error handling for CXL then unmask the mask registers by writing 0s. PCI_EXP_DEVCTL capability is checked to see uncorrectable or correctable errors bits are set before unmasking the respective errors. Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_regs.h Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167639402301.778884.12556849214955646539.stgit@djiang5-mobl3.local Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14net/mlx5: Suspend auxiliary devices only in case of PCI device suspendJiri Pirko
The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add support for devlink-port in non-representors mode") correctly re-instantiated uplink devlink port and related netdevice during devlink reload. However with migration to auxiliary devices, this behaviour changed. Restore the original behaviour and tear down auxiliary devices completely during devlink reload. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5: Remove "recovery" arg from mlx5_load_one() functionJiri Pirko
mlx5_load_one() is always called with recovery==false, so remove the unneeded function arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Create auxdev devlink instance in the same ns as parent devlinkJiri Pirko
Commit cited in "fixes" tag moved the devlink port under separate devlink entity created for auxiliary device. Respect the network namespace of parent devlink entity and allocate the devlink there. Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Move devlink port registration to be done before netdev allocJiri Pirko
Move the devlink port registration to be done right after devlink instance registration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Move dl_port to struct mlx5e_devJiri Pirko
No need to have dl_port which is tightly coupled with mlx5e code in mlx5 core code. Move it to struct mlx5e_dev and loose mlx5e_devlink_get_dl_port() helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Replace usage of mlx5e_devlink_get_dl_port() by netdev->devlink_portJiri Pirko
On places where netdev pointer is available, access related devlink_port pointer by netdev->devlink_port instead of using mlx5e_devlink_get_dl_port() which is going to be removed. Move SET_NETDEV_DEVLINK_PORT() call right after devlink port registration to make sure netdev->devlink_port is valid. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Pass mdev to mlx5e_devlink_port_register()Jiri Pirko
Instead of accessing priv->mdev, pass mdev pointer to mlx5e_devlink_port_register() and access it directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5: Remove outdated commentJiri Pirko
The comment is no longer applicable, as the devlink reload and instance cleanup are both protected with devlink instance lock, therefore no race can happen. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: TC, Remove redundant parse_attr argumentRoi Dayan
The parse_attr argument is not being used in actions_match_supported_fdb(). remove it. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-14net/mlx5e: Use a simpler comparison for uplink repRoi Dayan
get_route_and_out_devs() is uses the following condition mlx5e_eswitch_rep() && mlx5e_is_uplink_rep() to check if a given netdev is the uplink rep. Alternatively we can just use the straight forward version mlx5e_eswitch_uplink_rep() that only checks if a given netdev is uplink rep. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>