Age | Commit message (Collapse) | Author |
|
While modifying wait_task_inactive() for PREEMPT_RT; the build robot
noted that UP got broken. This led to audit and consideration of the
UP implementation of wait_task_inactive().
It looks like the UP implementation is also broken for PREEMPT;
consider task_current_syscall() getting preempted between the two
calls to wait_task_inactive().
Therefore move the wait_task_inactive() implementation out of
CONFIG_SMP and unconditionally use it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230602103731.GA630648%40hirez.programming.kicks-ass.net
|
|
early_lookup_bdev is now only used during the early boot code as it
should, so mark it __init to not waste run time memory on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Open code dm_get_dev_t in the only remaining caller, and propagate the
exact error code from lookup_bdev and early_lookup_bdev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk_lookup_devt is only used by code in early-lookup.c, so move it
there.
printk_all_partitions and it's helper bdevt_str are only used by the
early init code in init/do_mounts.c, so they should go there as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
name_to_dev_t has a very misleading name, that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors. Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means it the string was valid, but there was
no device found for it.
Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Assign a Root_Generic magic value for UBI/MTD root and handle the root
mounting in mount_root like all other root types. Besides making the
code more clear this also means that UBI/MTD root can be used together
with an initrd (not that anyone should care).
Also factor parsing of the root name into a helper now that it can
be easily done and will get more complicated with subsequent patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove all unused defines, and just use the expanded versions for
the SCSI disk majors.
I've decided to keep Root_RAM0 even if it could be expanded as there
is a lot of special casing for it in the init code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bool is the most sensible return value for a yes/no return. Also
add __init as this funtion is only called from the early boot code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230531125535.676098-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new ->shutdown super operation that can be used to tell the file
system to shut down, and call it from newly created holder ops when the
block device under a file system shuts down.
This only covers the main block device for "simple" file systems using
get_tree_bdev / mount_bdev. File systems their own get_tree method
or opening additional devices will need to set up their own
blk_holder_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
to notify the holder that the block device it is using has been marked dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims. It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm driver fixes for 6.4
Error paths is corrected across icc-bwmon, rpmh-rsc, ramp_controller and
rmtfs. The ice module is renamed qcom_ice, to avoid clashing with
existing "ice" driver.
SA8155P-specific RPMh power-domains are introduced to avoid the code
trying to access resources that exists on SM8150, but not on SA8155P.
Lastly, changes to the EDAC driver to fix an issue where the driver
performs mmio based on the wrong register map.
* tag 'qcom-driver-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
EDAC/qcom: Get rid of hardcoded register offsets
EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
dt-bindings: cache: qcom,llcc: Fix SM8550 description
soc: qcom: rpmhpd: Add SA8155P power domains
dt-bindings: power: qcom,rpmpd: Add SA8155P
soc: qcom: Rename ice to qcom_ice to avoid module name conflict
soc: qcom: rmtfs: Fix error code in probe()
soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
soc: qcom: rpmh-rsc: drop redundant unsigned >=0 comparision
soc: qcom: icc-bwmon: fix incorrect error code passed to dev_err_probe()
Link: https://lore.kernel.org/r/20230601141058.2246039-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Use proper notion for 16bit values for fixing the sparse warnings.
Fixes: f8ddb0fb3289 ("ALSA: usb-audio: Define USB MIDI 2.0 specs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305260528.wcqjXso8-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202305270534.odwHL9F0-lkp@intel.com/
Link: https://lore.kernel.org/r/20230605144758.6677-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The CTI module has some hard coded refcounting code that has a leak.
For example running perf and then trying to unload it fails:
perf record -e cs_etm// -a -- ls
rmmod coresight_cti
rmmod: ERROR: Module coresight_cti is in use
The coresight core already handles references of devices in use, so by
making CTI a normal helper device, we get working refcounting for free.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-14-james.clark@arm.com
|
|
Currently CATU is the only helper device, and its enable and disable
calls are hard coded. To allow more helper devices to be added in a
generic way, remove these hard coded calls and just enable and disable
all helper devices.
This has to apply to helpers adjacent to the path, because they will
never be in the path. CATU was already discovered in this way, so
there is no change there.
One change that is needed is for CATU to call back into ETR to allocate
the buffer. Because the enable call was previously hard coded, it was
done at a point where the buffer was already allocated, but this is no
longer the case.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-13-james.clark@arm.com
|
|
This removes the need to do an additional lookup for the total number
of ports used and also removes the need to allocate an array of
refcounts which is just another representation of a connection array.
This was only used for link type devices, for regular devices a single
refcount on the coresight device is used.
There is a both an input and output refcount in case two link type
devices are connected together so that they don't overwrite each other's
counts.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-11-james.clark@arm.com
|
|
This will allow CATU to get its associated ETR in a generic way where
currently the enable path has some hard coded searches which avoid
the need to store input connections.
This also means that the full search for connected devices on removal
can be replaced with a loop through only the input and output devices.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-10-james.clark@arm.com
|
|
This will allow the same connection object to be referenced via the
input connection list in a later commit rather than duplicating them.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-8-james.clark@arm.com
|
|
Add a function for adding connections dynamically. This also removes
the 1:1 mapping between port number and the index into the connections
array. The only place this mapping was used was in the warning for
duplicate output ports, which has been replaced by a search. Other
uses of the port number already use the port member variable.
Being able to dynamically add connections will allow other devices like
CTI to re-use the connection mechanism despite not having explicit
connections described in the DT.
The connections array is now no longer sparse, so child_fwnode doesn't
need to be checked as all connections have a target node. Because the
array is no longer sparse, the high in and out port numbers are required
for the refcount arrays. But these will also be removed in a later
commit when the refcount is made a property of the connection.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-7-james.clark@arm.com
|
|
When input connections are added they will use the same connection
object as the output so parent and child could be misinterpreted. Making
the direction unambiguous in the names should improve readability.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-6-james.clark@arm.com
|
|
Rename to avoid confusion between port number and the index in the
connection array. The port number is already stored in the connection,
and in a later commit the connection array will be appended to, so
the length of it will no longer reflect the number of ports.
No functional changes.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-5-james.clark@arm.com
|
|
conns is actually for output connections. Change the name to make it
clearer and so that we can add input connections later.
No functional changes.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-4-james.clark@arm.com
|
|
mode is stored as a local_t, but it is also passed around a lot as a
plain u32, so use the correct type wherever local_t isn't currently
used. This helps a little bit with readability.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230425143542.2305069-3-james.clark@arm.com
|
|
Sidharth reports that on M2, the PMU never generates any interrupt
when using 'perf record', which is a annoying as you get no sample.
I'm temped to say "no sample, no problem", but others may have
a different opinion.
Upon investigation, it appears that the counters on M2 are
significantly different from the ones on M1, as they count on
64 bits instead of 48. Which of course, in the fine M1 tradition,
means that we can only use 63 bits, as the top bit is used to signal
the interrupt...
This results in having to introduce yet another flag to indicate yet
another odd counter width. Who knows what the next crazy implementation
will do...
With this, perf can work out the correct offset, and 'perf record'
works as intended.
Tested on M2 and M2-Pro CPUs.
Cc: Janne Grunau <j@jannau.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support")
Reported-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
Tested-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230528080205.288446-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
read_poll_timeout_atomic() uses ktime_get() to implement the timeout
feature, just like its non-atomic counterpart. However, there are
several issues with this, due to its use in atomic contexts:
1. When called in the s2ram path (as typically done by clock or PM
domain drivers), timekeeping may be suspended, triggering the
WARN_ON(timekeeping_suspended) in ktime_get():
WARNING: CPU: 0 PID: 654 at kernel/time/timekeeping.c:843 ktime_get+0x28/0x78
Calling ktime_get_mono_fast_ns() instead of ktime_get() would get
rid of that warning. However, that would break timeout handling,
as (at least on systems with an ARM architectured timer), the time
returned by ktime_get_mono_fast_ns() does not advance while
timekeeping is suspended.
Interestingly, (on the same ARM systems) the time returned by
ktime_get() does advance while timekeeping is suspended, despite
the warning.
2. Depending on the actual clock source, and especially before a
high-resolution clocksource (e.g. the ARM architectured timer)
becomes available, time may not advance in atomic contexts, thus
breaking timeout handling.
Fix this by abandoning the idea that one can rely on timekeeping to
implement timeout handling in all atomic contexts, and switch from a
global time-based to a locally-estimated timeout handling. In most
(all?) cases the timeout condition is exceptional and an error
condition, hence any additional delays due to underestimating wall clock
time are irrelevant.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/3d2a2f4e553489392d871108797c3be08f88300b.1685692810.git.geert+renesas@glider.be
|
|
It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can not
only lower CPU power consumption or yield to a hyperthreaded twin
processor, but also allows an architecture to mitigate hardware issues
(e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
architecture-specific cpu_relax() implementation.
In addition, cpu_relax() is also a compiler barrier. It is not
immediately obvious that the @op argument "function" will result in an
actual function call (e.g. in case of inlining).
Where a function call is a C sequence point, this is lost on inlining.
Therefore, with agressive enough optimization it might be possible for
the compiler to hoist the:
(val) = op(args);
"load" out of the loop because it doesn't see the value changing. The
addition of cpu_relax() would inhibit this.
As the iopoll helpers lack calls to cpu_relax(), people are sometimes
reluctant to use them, and may fall back to open-coded polling loops
(including cpu_relax() calls) instead.
Fix this by adding calls to cpu_relax() to the iopoll helpers:
- For the non-atomic case, it is sufficient to call cpu_relax() in
case of a zero sleep-between-reads value, as a call to
usleep_range() is a safe barrier otherwise. However, it doesn't
hurt to add the call regardless, for simplicity, and for similarity
with the atomic case below.
- For the atomic case, cpu_relax() must be called regardless of the
sleep-between-reads value, as there is no guarantee all
architecture-specific implementations of udelay() handle this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/45c87bec3397fdd704376807f0eec5cc71be440f.1685692810.git.geert+renesas@glider.be
|
|
All other NFSv[23] procedures manage to keep page_ptr and
rq_next_page in lock step.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In the syscall test of UnixBench, performance regression occurred due
to false sharing.
The lock and atomic members, including file::f_lock, file::f_count and
file::f_pos_lock are highly contended and frequently updated in the
high-concurrency test scenarios. perf c2c indentified one affected
read access, file::f_op.
To prevent false sharing, the layout of file struct is changed as
following
(A) f_lock, f_count and f_pos_lock are put together to share the same
cache line.
(B) The read mostly members, including f_path, f_inode, f_op are put
into a separate cache line.
(C) f_mode is put together with f_count, since they are used frequently
at the same time.
Due to '__randomize_layout' attribute of file struct, the updated layout
only can be effective when CONFIG_RANDSTRUCT_NONE is 'y'.
The optimization has been validated in the syscall test of UnixBench.
performance gain is 30~50%. Furthermore, to confirm the optimization
effectiveness on the other codes path, the results of fsdisk, fsbuffer
and fstime are also shown.
Here are the detailed test results of unixbench.
Command: numactl -C 3-18 ./Run -c 16 syscall fsbuffer fstime fsdisk
Without Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks 875052.1 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 235484.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2815153.5 KBps (30.0 s, 2 samples)
System Call Overhead 5772268.3 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
File Copy 1024 bufsize 2000 maxblocks 3960.0 875052.1 2209.7
File Copy 256 bufsize 500 maxblocks 1655.0 235484.0 1422.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 2815153.5 4853.7
System Call Overhead 15000.0 5772268.3 3848.2
========
System Benchmarks Index Score (Partial Only) 2768.3
With Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks 1009977.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 264765.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 3052236.0 KBps (30.0 s, 2 samples)
System Call Overhead 8237404.4 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
File Copy 1024 bufsize 2000 maxblocks 3960.0 1009977.2 2550.4
File Copy 256 bufsize 500 maxblocks 1655.0 264765.9 1599.8
File Copy 4096 bufsize 8000 maxblocks 5800.0 3052236.0 5262.5
System Call Overhead 15000.0 8237404.4 5491.6
========
System Benchmarks Index Score (Partial Only) 3295.3
Signed-off-by: chenzhiyin <zhiyin.chen@intel.com>
Message-Id: <20230601092400.27162-1-zhiyin.chen@intel.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
With commit 849ad04cf562a ("new helper: put_and_unmap_page()"), Al Viro
introduced the put_and_unmap_page() to use in those many places where we
have a common pattern consisting of calls to kunmap_local() +
put_page().
Obviously, first we unmap and then we put pages. Instead, the original
name of this helper seems to imply that we first put and then unmap.
Therefore, rename the helper and change the only known upstreamed user
(i.e., fs/sysv) before this helper enters common use and might become
difficult to find all call sites and instead easy to break the builds.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Message-Id: <20230602103307.5637-1-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The AMD MI200 series accelerators are data center GPUs. They include
unified memory controllers and a data fabric similar to those used in
AMD x86 CPU products. The memory controllers report errors using MCA,
though these errors are generally handled through GPU drivers that
directly manage the accelerator device.
In some configurations, memory errors from these devices will be
reported through MCA and managed by x86 CPUs. The OS is expected to
handle these errors in similar fashion to MCA errors originating from
memory controllers on the CPUs. In Linux, this flow includes passing MCA
errors to a notifier chain with handlers in the EDAC subsystem.
The AMD64 EDAC module requires information from the memory controllers
and data fabric in order to provide detailed decoding of memory errors.
The information is read from hardware registers accessed through
interfaces in the data fabric.
The accelerator data fabrics are visible to the host x86 CPUs as PCI
devices just like x86 CPU data fabrics are already. However, the
accelerator fabrics have new and unique PCI IDs.
Add PCI IDs for the MI200 series of accelerator devices in order to
enable EDAC support. The data fabrics of the accelerator devices will be
enumerated as any other fabric already supported. System-specific
implementation details will be handled within the AMD64 EDAC module.
[ bp: Scrub off marketing speak. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-2-muralimk@amd.com
|
|
There are now no callers of xpcs_create(), so let's remove it from
public view to discourage future direct usage.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we can easily create a mdio-device that represents a
memory-mapped device that exposes an MDIO-like register layout, we don't
need the Altera TSE PCS anymore, since we can use the Lynx PCS instead.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There exists several examples today of devices that embed an ethernet
PHY or PCS directly inside an SoC. In this situation, either the device
is controlled through a vendor-specific register set, or sometimes
exposes the standard 802.3 registers that are typically accessed over
MDIO.
As phylib and phylink are designed to use mdiodevices, this driver
allows creating a virtual MDIO bus, that translates mdiodev register
accesses to regmap accesses.
The reason we use regmap is because there are at least 3 such devices
known today, 2 of them are Altera TSE PCS's, memory-mapped, exposed
with a 4-byte stride in stmmac's dwmac-socfpga variant, and a 2-byte
stride in altera-tse. The other one (nxp,sja1110-base-tx-mdio) is
exposed over SPI.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the atomics are documented in Documentation/atomic_t.txt, and
have no kerneldoc comments. There are a sufficient number of gotchas
(e.g. semantics, noinstr-safety) that it would be nice to have comments
to call these out, and it would be nice to have kerneldoc comments such
that these can be collated.
While it's possible to derive the semantics from the code, this can be
painful given the amount of indirection we currently have (e.g. fallback
paths), and it's easy to be mislead by naming, e.g.
* The unconditional void-returning ops *only* have relaxed variants
without a _relaxed suffix, and can easily be mistaken for being fully
ordered.
It would be nice to give these a _relaxed() suffix, but this would
result in significant churn throughout the kernel.
* Our naming of conditional and unconditional+test ops is rather
inconsistent, and it can be difficult to derive the name of an
operation, or to identify where an op is conditional or
unconditional+test.
Some ops are clearly conditional:
- dec_if_positive
- add_unless
- dec_unless_positive
- inc_unless_negative
Some ops are clearly unconditional+test:
- sub_and_test
- dec_and_test
- inc_and_test
However, what exactly those test is not obvious. A _test_zero suffix
might be clearer.
Others could be read ambiguously:
- inc_not_zero // conditional
- add_negative // unconditional+test
It would probably be worth renaming these, e.g. to inc_unless_zero and
add_test_negative.
As a step towards making this more consistent and easier to understand,
this patch adds kerneldoc comments for all generated *atomic*_*()
functions. These are generated from templates, with some common text
shared, making it easy to extend these in future if necessary.
I've tried to make these as consistent and clear as possible, and I've
deliberately ensured:
* All ops have their ordering explicitly mentioned in the short and long
description.
* All test ops have "test" in their short description.
* All ops are described as an expression using their usual C operator.
For example:
andnot: "Atomically updates @v to (@v & ~@i)"
inc: "Atomically updates @v to (@v + 1)"
Which may be clearer to non-naative English speakers, and allows all
the operations to be described in the same style.
* All conditional ops have their condition described as an expression
using the usual C operators. For example:
add_unless: "If (@v != @u), atomically updates @v to (@v + @i)"
cmpxchg: "If (@v == @old), atomically updates @v to @new"
Which may be clearer to non-naative English speakers, and allows all
the operations to be described in the same style.
* All bitwise ops (and,andnot,or,xor) explicitly mention that they are
bitwise in their short description, so that they are not mistaken for
performing their logical equivalents.
* The noinstr safety of each op is explicitly described, with a
description of whether or not to use the raw_ form of the op.
There should be no functional change as a result of this patch.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-26-mark.rutland@arm.com
|
|
Currently each ordering variant has several potential definitions,
with a mixture of preprocessor and C definitions, including several
copies of its C prototype, e.g.
| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| __atomic_acquire_fence();
| return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif
Make this a bit simpler by defining the C prototype once, and writing
the various potential definitions as plain C code guarded by ifdeffery.
For example, the above becomes:
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| #if defined(arch_atomic_fetch_andnot_acquire)
| return arch_atomic_fetch_andnot_acquire(i, v);
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| __atomic_acquire_fence();
| return ret;
| #elif defined(arch_atomic_fetch_andnot)
| return arch_atomic_fetch_andnot(i, v);
| #else
| return raw_atomic_fetch_and_acquire(~i, v);
| #endif
| }
Which is far easier to read. As we now always have a single copy of the
C prototype wrapping all the potential definitions, we now have an
obvious single location for kerneldoc comments.
At the same time, the fallbacks for raw_atomic*_xhcg() are made to use
'new' rather than 'i' as the name of the new value. This is what the
existing fallback template used, and is more consistent with the
raw_atomic{_try,}cmpxchg() fallbacks.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-24-mark.rutland@arm.com
|
|
Currently, atomic-long is split into two sections, one defining the
raw_atomic_long_*() ops for CONFIG_64BIT, and one defining the raw
atomic_long_*() ops for !CONFIG_64BIT.
With many lines elided, this looks like:
| #ifdef CONFIG_64BIT
| ...
| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
| return raw_atomic64_try_cmpxchg(v, (s64 *)old, new);
| }
| ...
| #else /* CONFIG_64BIT */
| ...
| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
| return raw_atomic_try_cmpxchg(v, (int *)old, new);
| }
| ...
| #endif
The two definitions are spread far apart in the file, and duplicate the
prototype, making it hard to have a legible set of kerneldoc comments.
Make this simpler by defining the C prototype once, and writing the two
definitions inline. For example, the above becomes:
| static __always_inline bool
| raw_atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
| {
| #ifdef CONFIG_64BIT
| return raw_atomic64_try_cmpxchg(v, (s64 *)old, new);
| #else
| return raw_atomic_try_cmpxchg(v, (int *)old, new);
| #endif
| }
As we now always have a single copy of the C prototype wrapping all the
potential definitions, we now have an obvious single location for kerneldoc
comments. As a bonus, both the script and the generated file are
somewhat shorter.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-23-mark.rutland@arm.com
|
|
Currently the various ordering variants of an atomic operation are
defined in groups of full/acquire/release/relaxed ordering variants with
some shared ifdeffery and several potential definitions of each ordering
variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down
different branches of the shared ifdeffery, it can be painful for a
human to find a relevant definition, and we don't have a good location
to place anything common to all definitions of an ordering variant (e.g.
kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering
variants was necessary as we filled in the missing atomics in the same
namespace as the architecture used. It would be easy to accidentally
define one ordering fallback in terms of another ordering fallback with
redundant barriers, and avoiding that would otherwise require a lot of
baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in
the arch_atomic*_<op>() namespace, and only need to fill in the
raw_atomic*_<op>() namespace. Due to this, there's no risk of a
namespace collision, and we can define each raw_atomic*_<op> ordering
variant with its own ifdeffery checking for the arch_atomic*_<op>
ordering variants.
Restructure the fallbacks in this way, with each ordering variant having
its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| __atomic_acquire_fence();
| return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif
Note that where there's no relevant arch_atomic*_<op>() ordering
variant, we'll define the operation in terms of a distinct
raw_atomic*_<otherop>(), as this itself might have been filled in with a
fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we
no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further
improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
|
Now that arch_atomic*() usage is limited to the atomic headers, we no
longer have any users of arch_atomic_long_*(), and can generate
raw_atomic_long_*() directly.
Generate the raw_atomic_long_*() ops directly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-20-mark.rutland@arm.com
|
|
Now that we have raw_atomic*_<op>() definitions, there's no need to use
arch_atomic*_<op>() definitions outside of the low-level atomic
definitions.
Move treewide users of arch_atomic*_<op>() over to the equivalent
raw_atomic*_<op>().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-19-mark.rutland@arm.com
|
|
Currently a number of arch_atomic*_<op>() functions are optional, and
where an arch does not provide a given arch_atomic*_<op>() we will
define an implementation of arch_atomic*_<op>() in
atomic-arch-fallback.h.
Filling in the missing ops requires special care as we want to select
the optimal definition of each op (e.g. preferentially defining ops in
terms of their relaxed form rather than their fully-ordered form). The
ifdeffery necessary for this requires us to group ordering variants
together, which can be a bit painful to read, and is painful for
kerneldoc generation.
It would be easier to handle this if we generated ops into a separate
namespace, as this would remove the need to take special care with the
ifdeffery, and allow each ordering variant to be generated separately.
This patch adds a new set of raw_atomic_<op>() definitions, which are
currently trivial wrappers of their arch_atomic_<op>() equivalent. This
will allow us to move treewide users of arch_atomic_<op>() over to raw
atomic op before we rework the fallback generation to generate
raw_atomic_<op> directly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-18-mark.rutland@arm.com
|
|
Most architectures define the atomic/atomic64 xchg and cmpxchg
operations in terms of arch_xchg and arch_cmpxchg respectfully.
Add fallbacks for these cases and remove the trivial cases from arch
code. On some architectures the existing definitions are kept as these
are used to build other arch_atomic*() operations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-5-mark.rutland@arm.com
|
|
Currently a subset of the fallback templates have kerneldoc comments,
resulting in a haphazard set of generated kerneldoc comments as only
some operations have fallback templates to begin with.
We'd like to generate more consistent kerneldoc comments, and to do so
we'll need to restructure the way the fallback code is generated.
To minimize churn and to make it easier to restructure the fallback
code, this patch removes the existing kerneldoc comments from the
fallback templates. We can add new kerneldoc comments in subsequent
patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-3-mark.rutland@arm.com
|
|
No moar users, remove the monster.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
|
|
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.924677086@infradead.org
|
|
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.855976804@infradead.org
|
|
Add the try_cmpxchg() form to the per-cpu ops.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.587480729@infradead.org
|
|
Wire up the cmpxchg128 family in the atomic wrapper scripts.
These provide the generic cmpxchg128 family of functions from the
arch_ prefixed version, adding explicit instrumentation where needed.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.519237070@infradead.org
|
|
Introduce [us]128 (when available). Unlike [us]64, ensure they are
always naturally aligned.
This also enables 128bit wide atomics (which require natural
alignment) such as cmpxchg128().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.385005581@infradead.org
|
|
We need the tty fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the USB fixes in here are well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|