Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-renesas
Pull one more Renesas clk driver update from Geert Uytterhoeven:
- Disable R-Car H3 ES1.*, as it was only available to an internal
development group and needed a lot of quirks and workarounds.
* tag 'renesas-clk-for-v6.3-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
clk: renesas: rcar-gen3: Disable R-Car H3 ES1.*
|
|
In the dio write path, we only take shared inode lock for the case of
aligned overwriting initialized blocks inside EOF. But for overwriting
preallocated blocks, it may only need to split unwritten extents, this
procedure has been protected under i_data_sem lock, it's safe to
release the exclusive inode lock and take shared inode lock.
This could give a significant speed up for multi-threaded writes. Test
on Intel Xeon Gold 6140 and nvme SSD with below fio parameters.
direct=1
ioengine=libaio
iodepth=10
numjobs=10
runtime=60
rw=randwrite
size=100G
And the test result are:
Before:
bs=4k IOPS=11.1k, BW=43.2MiB/s
bs=16k IOPS=11.1k, BW=173MiB/s
bs=64k IOPS=11.2k, BW=697MiB/s
After:
bs=4k IOPS=41.4k, BW=162MiB/s
bs=16k IOPS=41.3k, BW=646MiB/s
bs=64k IOPS=13.5k, BW=843MiB/s
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221226062015.3479416-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the end position of a GETFSMAP query overlaps an allocated space and
we're using the free space info to generate fsmap info, the akeys
information gets fed into the fsmap formatter with bad results.
Zero-init the space.
Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-6.3-merge-A
xfs: per-ag centric allocation alogrithms
This series continues the work towards making shrinking a filesystem
possible. We need to be able to stop operations from taking place
on AGs that need to be removed by a shrink, so before shrink can be
implemented we need to have the infrastructure in place to prevent
incursion into AGs that are going to be, or are in the process, of
being removed from active duty.
The focus of this is making operations that depend on access to AGs
use the perag to access and pin the AG in active use, thereby
creating a barrier we can use to delay shrink until all active uses
of an AG have been drained and new uses are prevented.
This series starts by fixing some existing issues that are exposed
by changes later in the series. They stand alone, so can be picked
up independently of the rest of this patchset.
The most complex of these fixes is cleaning up the mess that is the
AGF deadlock avoidance algorithm. This algorithm stores the first
block that is allocated in a transaction in tp->t_firstblock, then
uses this to try to limit future allocations within the transaction
to AGs at or higher than the filesystem block stored in
tp->t_firstblock. This depends on one of the initial bug fixes in
the series to move the deadlock avoidance checks to
xfs_alloc_vextent(), and then builds on it to relax the constraints
of the avoidance algorithm to only be active when a deadlock is
possible.
We also update the algorithm to record allocations from higher AGs
that are allocated from, because we when we need to lock more than
two AGs we still have to ensure lock order is correct. Therefore we
can't lock AGs in the order 1, 3, 2, even though tp->t_firstblock
indicates that we've allocated from AG 1 and so AG is valid to lock.
It's not valid, because we already hold AG 3 locked, and so
tp->t-first_block should actually point at AG 3, not AG 1 in this
situation.
It should now be obvious that the deadlock avoidance algorithm
should record AGs, not filesystem blocks. So the series then changes
the transaction to store the highest AG we've allocated in rather
than a filesystem block we allocated. This makes it obvious what
the constraints are, and trivial to update as we lock and allocate
from various AGs.
With all the bug fixes out of the way, the series then starts
converting the code to use active references. Active reference
counts are used by high level code that needs to prevent the AG from
being taken out from under it by a shrink operation. The high level
code needs to be able to handle not getting an active reference
gracefully, and the shrink code will need to wait for active
references to drain before continuing.
Active references are implemented just as reference counts right now
- an active reference is taken at perag init during mount, and all
other active references are dependent on the active reference count
being greater than zero. This gives us an initial method of stopping
new active references without needing other infrastructure; just
drop the reference taken at filesystem mount time and when the
refcount then falls to zero no new references can be taken.
In future, this will need to take into account AG control state
(e.g. offline, no alloc, etc) as well as the reference count, but
right now we can implement a basic barrier for shrink with just
reference count manipulations. As such, patches to convert the perag
state to atomic opstate fields similar to the xfs_mount and xlog
opstate fields follow the initial active perag reference counting
patches.
The first target for active reference conversion is the
for_each_perag*() iterators. This captures a lot of high level code
that should skip offline AGs, and introduces the ability to
differentiate between a lookup that didn't have an online AG and the
end of the AG iteration range.
From there, the inode allocation AG selection is converted to active
references, and the perag is driven deeper into the inode allocation
and btree code to replace the xfs_mount. Most of the inode
allocation code operates on a single AG once it is selected, hence
it should pass the perag as the primary referenced object around for
allocation, not the xfs_mount. There is a bit of churn here, but it
emphasises that inode allocation is inherently an allocation group
based operation.
Next the bmap/alloc interface undergoes a major untangling,
reworking xfs_bmap_btalloc() into separate allocation operations for
different contexts and failure handling behaviours. This then allows
us to completely remove the xfs_alloc_vextent() layer via
restructuring the xfs_alloc_vextent/xfs_alloc_ag_vextent() into a
set of realtively simple helper function that describe the
allocation that they are doing. e.g. xfs_alloc_vextent_exact_bno().
This allows the requirements for accessing AGs to be allocation
context dependent. The allocations that require operation on a
single AG generally can't tolerate failure after the allocation
method and AG has been decided on, and hence the caller needs to
manage the active references to ensure the allocation does not race
with shrink removing the selected AG for the duration of the
operation that requires access to that allocation group.
Other allocations iterate AGs and so the first AG is just a hint -
these do not need to pin a perag first as they can tolerate not
being able to access an AG by simply skipping over it. These require
new perag iteration functions that can start at arbitrary AGs and
wrap around at arbitrary AGs, hence a new set for
for_each_perag_wrap*() helpers to do this.
Next is the rework of the filestreams allocator. This doesn't change
any functionality, but gets rid of the unnecessary multi-pass
selection algorithm when the selected AG is not available. It
currently does a lookup pass which might iterate all AGs to select
an AG, then checks if the AG is acceptible and if not does a "new
AG" pass that is essentially identical to the lookup pass. Both of
these scans also do the same "longest extent in AG" check before
selecting an AG as is done after the AG is selected.
IOWs, the filestreams algorithm can be greatly simplified into a
single new AG selection pass if the there is no current association
or the currently associated AG doesn't have enough contiguous free
space for the allocation to proceed. With this simplification of
the filestreams allocator, it's then trivial to convert it to use
for_each_perag_wrap() for the AG scan algorithm.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'xfs-alloc-perag-conversion' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (42 commits)
xfs: refactor the filestreams allocator pick functions
xfs: return a referenced perag from filestreams allocator
xfs: pass perag to filestreams tracing
xfs: use for_each_perag_wrap in xfs_filestream_pick_ag
xfs: track an active perag reference in filestreams
xfs: factor out MRU hit case in xfs_filestream_select_ag
xfs: remove xfs_filestream_select_ag() longest extent check
xfs: merge new filestream AG selection into xfs_filestream_select_ag()
xfs: merge filestream AG lookup into xfs_filestream_select_ag()
xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c
xfs: use xfs_bmap_longest_free_extent() in filestreams
xfs: get rid of notinit from xfs_bmap_longest_free_extent
xfs: factor out filestreams from xfs_bmap_btalloc_nullfb
xfs: convert trim to use for_each_perag_range
xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker
xfs: move the minimum agno checks into xfs_alloc_vextent_check_args
xfs: fold xfs_alloc_ag_vextent() into callers
xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno()
xfs: introduce xfs_alloc_vextent_prepare()
xfs: introduce xfs_alloc_vextent_exact_bno()
...
|
|
Currently there're two anonymous inodes (inode and anon_inode in struct
erofs_fscache) for each blob. The former was introduced as the
address_space of page cache for bootstrap.
The latter was initially introduced as both the address_space of page
cache and also a sentinel in the shared domain. Since now the management
of cookies in share domain has been decoupled with the anonymous inode,
there's no need to maintain an extra anonymous inode. Let's unify these
two anonymous inodes.
Besides, in non-share-domain mode only bootstrap will allocate anonymous
inode. To simplify the implementation, always allocate anonymous inode
for both bootstrap and data blobs. Similarly release anonymous inodes
for data blobs when .put_super() is called, or we'll get "VFS: Busy
inodes after unmount." warning.
Also remove the redundant set_nlink() when initializing the anonymous
inode, since i_nlink has already been initialized to 1 when the inode
gets allocated.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20230209063913.46341-5-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Relinquish fscache volume with mutex held. Otherwise if a new domain is
registered when the old domain with the same name gets removed from the
list but not relinquished yet, fscache may complain the collision.
Fixes: 8b7adf1dff3d ("erofs: introduce fscache-based domain")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20230209063913.46341-4-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
We'd better not touch sb->s_inodes list and inode->i_count directly.
Let's maintain cookies of share domain in a self-contained list in erofs.
Besides, relinquish cookie with the mutex held. Otherwise if a cookie
is registered when the old cookie with the same name in the same domain
has been removed from the list but not relinquished yet, fscache may
complain "Duplicate cookie detected".
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20230209063913.46341-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Currently metadata is always on bootstrap, and thus device mapping is
not needed so far. Remove the redundant device mapping in the meta
routine.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209063913.46341-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Add this doc to the erofs maintainers entry.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209052013.34952-1-frank.li@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Add missing feaures for sysfs-fs-erofs feature doc.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209051128.10571-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
For erofs_map_blocks() and erofs_map_blocks_flatmode(), the flags
argument is always EROFS_GET_BLOCKS_RAW. Thus remove the unused flags
parameter for these two functions.
Besides EROFS_GET_BLOCKS_RAW is originally introduced for reading
compressed (raw) data for compressed files. However it's never used
actually and let's remove it now.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209024825.17335-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
As new flags introduced, the corresponding print symbols for trace are
not added accordingly. Add these missing print symbols for these flags.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209024825.17335-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions to prevent
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209-kobj_type-erofs-v1-1-078c945e2c4b@weissschuh.net
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Using per-cpu thread pool we can reduce the scheduling latency compared
to workqueue implementation. With this patch scheduling latency and
variation is reduced as per-cpu threads are high priority kthread_workers.
The results were evaluated on arm64 Android devices running 5.10 kernel.
The table below shows resulting improvements of total scheduling latency
for the same app launch benchmark runs with 50 iterations. Scheduling
latency is the latency between when the task (workqueue kworker vs
kthread_worker) became eligible to run to when it actually started
running.
+-------------------------+-----------+----------------+---------+
| | workqueue | kthread_worker | diff |
+-------------------------+-----------+----------------+---------+
| Average (us) | 15253 | 2914 | -80.89% |
| Median (us) | 14001 | 2912 | -79.20% |
| Minimum (us) | 3117 | 1027 | -67.05% |
| Maximum (us) | 30170 | 3805 | -87.39% |
| Standard deviation (us) | 7166 | 359 | |
+-------------------------+-----------+----------------+---------+
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from user point of view. While EROFS provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in few scenarios.
Analysis showed that the significant variance was coming from the
scheduling cost while decompression cost was more or less the same.
Having per-cpu thread pool we can see from the above table that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022. Link to LPC 2022 slides and talk at [1]
[1] https://lpc.events/event/16/contributions/1338/
[ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue
issue [2] on many arm64 devices is resolved. ]
[2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
|
|
Reorder internal.h code so that removing unneeded macros and more.
No logic changes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-6-hsiangkao@linux.alibaba.com
|
|
The code can be neater without forward declarations. Let's
get rid of z_erofs_do_map_blocks() forward declaration.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-5-hsiangkao@linux.alibaba.com
|
|
Definitions in zdata.h are only used in zdata.c and for internal
use only. No logic changes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-4-hsiangkao@linux.alibaba.com
|
|
Just open-code the remaining one to simplify the code.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-3-hsiangkao@linux.alibaba.com
|
|
We could just use a boolean in z_erofs_decompressqueue for sync
decompression to simplify the code.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-2-hsiangkao@linux.alibaba.com
|
|
erofs_inode_datablocks() has the only one caller, let's just get
rid of it entirely. No logic changes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-1-hsiangkao@linux.alibaba.com
|
|
Actually we could pass in inodes directly to clean up all callers.
Also rename iloc() as erofs_iloc().
Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Since erofsdump is available, no need to keep this debugging
functionality at all.
Also drop a useless comment since it's the VFS behavior.
Link: https://lore.kernel.org/r/20230114125746.399253-1-xiang@kernel.org
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
EROFS actually never uses buffer heads, therefore just get rid of
BH_xxx definitions and linux/buffer_head.h inclusive.
Link: https://lore.kernel.org/r/20230113065226.68801-2-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Move inode hash function into inode.c and simplify erofs_iget().
Link: https://lore.kernel.org/r/20230113065226.68801-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Commit 4bf8860760d9 ("riscv: cpufeature: extend
riscv_cpufeature_patch_func to all ISA extensions") switched ISA
extension alternatives to use the RISCV_ISA_EXT_* macros instead of
CPUFEATURE_*. This was mismerged when applied on top of the Zbb series,
so the Zbb alternatives referenced the wrong errata ID values.
Fixes: 9daca9a5b9ac ("Merge patch series "riscv: improve boot time isa extensions handling"")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230212021534.59121-3-samuel@sholland.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Now that the text to patch is located using a relative offset from the
alternative entry, the text address should be computed without applying
the kernel mapping offset, both before and after VM setup.
Fixes: 8d23e94a4433 ("riscv: switch to relative alternative entries")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230212021534.59121-2-samuel@sholland.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Pick up the CXL DVSEC range register emulation for v6.3, and resolve
conflicts with the cxl_port_probe() split (from for-6.3/cxl-ram-region)
and event handling (from for-6.3/cxl-events).
|
|
Selects should be sorted alphanumerically, and were tidied up originally
by Palmer in commit e8c7ef7d5819 ("RISC-V: Sort select statements
alphanumerically") since then, things have gotten out of order again.
Fish RMK's original script out of commit b1b3f49ce460 ("ARM: config:
sort select statements alphanumerically") and do some spring cleaning.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221219172836.134709-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When adding the ISA string ordering rules, I didn't sufficiently indent
one of the list items.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/linux-doc/202301300743.bp7Dpazv-lkp@intel.com/
Fixes: f07b2b3f9d47 ("Documentation: riscv: add a section about ISA string ordering in /proc/cpuinfo")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230129235701.2393241-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Remove the CXL_DECODER_F_LOCK check to be permissive of platform BIOSes
that allow CXL.mem to be remapped.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640370085.935665.13128321011001358077.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
For the case where DVSEC range register(s) are active and HDM decoders are
not committed, use RR to provide emulation. A first pass is done to note
whether any decoders are committed. If there are no committed endpoint
decoders, then DVSEC ranges will be used for emulation.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640369536.935665.611974113442400127.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
CXL rev3 spec 8.1.3
RCDs may not have HDM register blocks. Create a fake HDM with information
from the CXL PCIe DVSEC registers. The decoder count will be set to the
HDM count retrieved from the DVSEC cap register.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640368994.935665.15831225724059704620.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In the case where HDM decoder register block exists but is not programmed
and at the same time the DVSEC range register range is active, populate the
CXL decoder object 'cxl_decoder' with info from DVSEC range registers.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640368454.935665.13806415120298330717.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
With the previous refactoring of DVSEC range registers out of
cxl_hdm_decode_init(), it basically becomes a skeleton function. Squash
__cxl_hdm_decode_init() with cxl_hdm_decode_init() to simplify the code.
cxl_hdm_decode_init() now returns more error codes than just -EBUSY.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640367916.935665.12898404758336059003.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Call cxl_dvsec_rr_decode() in the beginning of cxl_port_probe() and
preserve the decoded information in a local
'struct cxl_endpoint_dvsec_info'. This info can be passed to various
functions later on in order to support the HDM decoder emulation.
The invocation of cxl_dvsec_rr_decode() in cxl_hdm_decode_init() is
removed and a pointer to the 'struct cxl_endpoint_dvsec_info' is passed
in.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640367377.935665.2848747799651019676.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
There are 2 scenarios that requires additional handling. 1. A device that
has active ranges in DVSEC range registers (RR) but no HDM decoder register
block. 2. A device that has both RR active and HDM, but the HDM decoders
are not programmed. The goal is to create emulated decoder software structs
based on the RR.
Move the CXL DVSEC range register decoding code block from
cxl_hdm_decode_init() to its own function. Refactor code in preparation for
the HDM decoder emulation. There is no functionality change to the code.
Name the new function to cxl_dvsec_rr_decode().
The only change is to set range->start and range->end to CXL_RESOURCE_NONE
and skipping the reading of base registers if the range size is 0, which
equates to range not active.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640366839.935665.11816388524993234329.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pick up the AER unmasking patches for v6.3.
|
|
Pick up some fixes from exposure of for-6.3/cxl-ram-region in
linux-next.
|
|
The of_gpio.h is going to be removed. In preparation of that convert
the driver to the agnostic API.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230213161713.1450-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The comment that says mwait_play_dead() returns only on failure is a bit
misleading because mwait_play_dead() could actually return for valid
reasons (such as mwait not being supported by the platform) that do not
indicate a failure of the CPU offline operation. So, remove the comment.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230128003751.141317-1-srivatsa@csail.mit.edu
|
|
By default the CXL RAS mask registers bits are defaulted to 1's and
suppress all error reporting. If the kernel has negotiated ownership
of error handling for CXL then unmask the mask registers by writing 0s.
PCI_EXP_DEVCTL capability is checked to see uncorrectable or correctable
errors bits are set before unmasking the respective errors.
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_regs.h
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167639402301.778884.12556849214955646539.stgit@djiang5-mobl3.local
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add
support for devlink-port in non-representors mode") correctly
re-instantiated uplink devlink port and related netdevice during devlink
reload. However with migration to auxiliary devices, this behaviour
changed.
Restore the original behaviour and tear down auxiliary devices
completely during devlink reload.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_load_one() is always called with recovery==false, so remove the
unneeded function arg.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Commit cited in "fixes" tag moved the devlink port under separate
devlink entity created for auxiliary device. Respect the network
namespace of parent devlink entity and allocate the devlink there.
Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Move the devlink port registration to be done right after devlink
instance registration.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
No need to have dl_port which is tightly coupled with mlx5e code
in mlx5 core code. Move it to struct mlx5e_dev and loose
mlx5e_devlink_get_dl_port() helper.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
On places where netdev pointer is available, access related devlink_port
pointer by netdev->devlink_port instead of using
mlx5e_devlink_get_dl_port() which is going to be removed.
Move SET_NETDEV_DEVLINK_PORT() call right after devlink port
registration to make sure netdev->devlink_port is valid.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Instead of accessing priv->mdev, pass mdev pointer to
mlx5e_devlink_port_register() and access it directly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The comment is no longer applicable, as the devlink reload and instance
cleanup are both protected with devlink instance lock, therefore no race
can happen.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The parse_attr argument is not being used in
actions_match_supported_fdb(). remove it.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|