Age | Commit message (Collapse) | Author |
|
After the discovery of a case where an implementation misbehaves with
register reads larger than the definition of the register the other
usages of readq() were audited and found to be correct, but some cases
where the io-64-nonatomic-lo-hi.h include is not needed were discovered,
delete them.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149844596.792294.8273108394688012953.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The CXL specification mandates that 4-byte registers must be accessed
with 4-byte access cycles. CXL 3.0 8.2.3 "Component Register Layout and
Definition" states that the behavior is undefined if (2) 32-bit
registers are accessed as an 8-byte quantity. It turns out that at least
one hardware implementation is sensitive to this in practice. The @size
variable results in zero with:
size = readq(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
...and the correct size with:
lo = readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
hi = readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(which));
size = (hi << 32) + lo;
Fixes: d17d0540a0db ("cxl/core/hdm: Add CXL standard decoder enumeration to the core")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149844056.792294.8224490474529733736.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Decoders committed with 0-size lead to later crashes on shutdown as
__cxl_dpa_release() assumes a 'struct resource' has been established in
the in 'cxlds->dpa_res'. Just fail the driver load in this instance
since there are deeper problems with the enumeration or the setup when
this happens.
Fixes: 9c57cde0dcbd ("cxl/hdm: Enumerate allocated DPA")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149843516.792294.11872242648319572632.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Virtual Trust Levels (VTL) helps enable Hyper-V Virtual Secure Mode (VSM)
feature. VSM is a set of hypervisor capabilities and enlightenments
offered to host and guest partitions which enable the creation and
management of new security boundaries within operating system software.
VSM achieves and maintains isolation through VTLs.
Add early initialization for Virtual Trust Levels (VTL). This includes
initializing the x86 platform for VTL and enabling boot support for
secondary CPUs to start in targeted VTL context. For now, only enable
the code for targeted VTL level as 2.
When starting an AP at a VTL other than VTL0, the AP must start directly
in 64-bit mode, bypassing the usual 16-bit -> 32-bit -> 64-bit mode
transition sequence that occurs after waking up an AP with SIPI whose
vector points to the 16-bit AP startup trampoline code.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Link: https://lore.kernel.org/r/1681192532-15460-6-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Add HYPERV_VTL_MODE Kconfig flag for VTL mode.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1681192532-15460-5-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Move hv_get_nmi_reason to .h file so it can be used in other
modules as well.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1681192532-15460-4-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Add structs and hypercalls required to enable VTL support on x86.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Link: https://lore.kernel.org/r/1681192532-15460-3-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Make get/set_rtc_noop() to be public so that they can be used
in other modules as well.
Co-developed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1681192532-15460-2-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci_am654: Fix support for UHS-I SDR12 and SDR25 speed modes
MEMSTICK:
- Fix memory leak if card device never gets registered"
* tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
memstick: fix memory leak if card device is never registered
mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
|
|
Per-vcpu flags are updated using a non-atomic RMW operation.
Which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is specially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fix it by disabling preemption during the RMW operation, which
ensures that the state stays consistent. Also upgrade vcpu_get_flag
path to use READ_ONCE() to make sure the field is always atomically
accessed.
Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set")
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230418125737.2327972-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to to of_property_read_bool().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Use generic mbox_bind_client() to bind omap mailbox channel to a client.
mbox_bind_client is identical to the replaced lines, except that it:
- Does the operation under con_mutex which prevents possible races in
removal path
- Sets TXDONE_BY_ACK if pcc uses TXDONE_BY_POLL and the client knows
when tx is done. TXDONE_BY_ACK is already set if there's no interrupt,
so this is not applicable.
- Calls chan->mbox->ops->startup. This is usecase for requesting irq:
move the devm_request_irq into the startup callback and unregister it
in the shutdown path.
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Use generic mbox_bind_client() to bind omap mailbox channel to a client.
mbox_bind_client is identical to the replaced lines, except that it:
- Does the operation under con_mutex which prevents possible races in
removal path
- Sets TXDONE_BY_ACK if omap uses TXDONE_BY_POLL. omap uses
TXDONE_BY_IRQ, so this check is not applicable.
- Calls chan->mbox->ops->startup, if available. omap doesn't have, so
this is not applicable.
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Support virtual mailbox controllers and clients which are not platform
devices or come from the devicetree by allowing them to match client to
channel via some other mechanism.
Tested-by: Sudeep Holla <sudeep.holla@arm.com> (pcc)
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
The xiic_xfer() function gets a runtime PM reference when the function is
entered. This reference is released when the function is exited. There is
currently one error path where the function exits directly, which leads to
a leak of the runtime PM reference.
Make sure that this error path also releases the runtime PM reference.
Fixes: fdacc3c7405d ("i2c: xiic: Switch from waitqueue to completion")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The cdns_i2c_master_xfer() function gets a runtime PM reference when the
function is entered. This reference is released when the function is
exited. There is currently one error path where the function exits
directly, which leads to a leak of the runtime PM reference.
Make sure that this error path also releases the runtime PM reference.
Fixes: 1a351b10b967 ("i2c: cadence: Added slave support")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Rockchip RK3588/RK3588s GIC600 integration does not support the
sharability feature. Rockchip assigned Erratum ID #3588001 for this
issue.
Note, that the 0x0201743b ID is not Rockchip specific and thus
there is an extra of_machine_is_compatible() check.
The flags are named FORCE_NON_SHAREABLE to be vendor agnostic,
since apparently similar integration design errors exist in other
platforms and they can reuse the same flag.
Co-developed-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Signed-off-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Co-developed-by: Kever Yang <kever.yang@rock-chips.com>
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
Co-developed-by: Lucas Tanure <lucas.tanure@collabora.com>
Signed-off-by: Lucas Tanure <lucas.tanure@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230418142109.49762-2-sebastian.reichel@collabora.com
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to to of_property_read_bool().
Link: https://lore.kernel.org/r/20230310144700.1541345-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.
Link: https://lore.kernel.org/r/20230310144659.1541247-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Merge series from Srinivas Goud <srinivas.goud@amd.com>:
Currently SPI Cadence controller works in Master mode only.
Update driver to support Slave mode and also Full duplex transfer
support in Slave mode
|
|
"ranges" is a standard property, and we have common helper functions for
parsing it, so let's use them.
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Gregory Clement <gregory.clement@bootlin.com>
Link: https://lore.kernel.org/r/20230216181204.2895676-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Rename the mixer source defines from CS35L56_INPUT_SRC_SWIRE_RXn
to CS35L56_INPUT_SRC_SWIRE_DP1_CHANNELn to match the latest
datasheet.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230418144309.1100721-5-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The mixer source index value for SDW2RX1 is different between
A1 and B0 silicon. As the driver doesn't provide a DAI for SDW2
just remove it as a mixer source option.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230418144309.1100721-4-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Reduce SDW1 to 4 channels and remove the controls for SDW1
TX5 and TX6.
The TX5 and TX6 channels have been removed from B0 silicon.
There is no need to support them on A1 silicon.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230418144309.1100721-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
EINT20 contains wake-source interrupts and also interface-blocked
interrupts, which all default to unmasked after reset or wake.
The comment in cs35l56_init() only mentioned the wake interrupts.
Update the comment so it's clear that it's intentional to also
mask the *_BLOCKED interrupts.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230418144309.1100721-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Replace !has_not_enough_free_secs w/ has_enough_free_secs.
BTW avoid nested 'if' statements in f2fs_balance_fs().
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
1. extent_cache
- let's drop the largest extent_cache
2. invalidate_block
- don't show the warnings
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The major change is to call checkpoint, if there's not enough space while having
some prefree segments in FG_GC case.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
cpufreq_verify_current_freq checks() if the frequency returned by
the hardware has a slight delta with the valid frequency value
last set and returns "policy->cur" if the delta is within "1 MHz".
In the comparison, "policy->cur" is in "kHz" but it's compared
against HZ_PER_MHZ. So, the comparison range becomes "1 GHz".
Fix this by comparing against KHZ_PER_MHZ instead of HZ_PER_MHZ.
Fixes: f55ae08c8987 ("cpufreq: Avoid unnecessary frequency updates due to mismatch")
Signed-off-by: Sanjay Chandrashekara <sanjayc@nvidia.com>
[ sumit gupta: Commit message update ]
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On some Cherry Trail devices the second PWM controller uses
80862289 as ACPI _HID, rather then using 80862288 as is done
for both controllers on most models.
Add the missing 80862289 ACPI _HID, note this uses its own
lpss_device_desc, without ".setup = bsw_pwm_setup" so that
the pwm_lookup is not added for it.
On devices where both controllers use the 80862288 _HID bsw_pwm_setup()
does a UID check to avoid registering the lookup for the second
controller but that will not work here.
Adding the missing id fixes the second PWM controller no longer
working after the entire LPSS1 island has been in D3 at least
once, which causes the contents of the LPSS private registers
to get lost. Adding the _HID makes acpi_lpss restore these
when the controller moves from D3 to D0.
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Currently, acpi_device_remove_notify_handler() may return while the
notify handler being removed is still running which may allow the
module holding that handler to be torn down prematurely.
Address this issue by making acpi_device_remove_notify_handler() wait
for the handling of all the ACPI events in progress to complete before
returning.
Fixes: 5894b0c46e49 ("ACPI / scan: Move bus operations and notification routines to bus.c")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
As per the kernel coding style.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Cleanup bindings dropping unneeded quotes. Once all these are fixed,
checking for this can be enabled in yamllint.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230327170114.4102315-1-robh@kernel.org
|
|
map__dso() is called before thread__find_map() which always results in a
null pointer dereference. Fix it by finding first, then checking if it
exists.
Fixes: 63df0e4bc368adbd ("perf map: Add accessor for dso")
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230418141203.673465-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of updates for devicetree files for Qualcomm,
Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
contents:
- Wrong GPIO polarity on some boards
- Lower SD card interface speed for better stability
- Incorrect power supply, clock, pmic, cache properties
- Disable broken hbr3 on sc7280-herobrine
- Devicetree warning fixes
The only other changes are:
- A regression fix for the Amlogic performance monitoring unit
driver, along with two related DT changes.
- imx_v6_v7_defconfig enables PCI support again.
- Trivial fixes for tee, optee and psci firmware drivers, addressing
compiler warning and error output"
* tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
firmware/psci: demote suspend-mode warning to info level
arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI
arm64: dts: rockchip: correct panel supplies on some rk3326 boards
arm64: dts: rockchip: use just "port" in panel on RockPro64
arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells
ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells
arm64: dts: imx8mp-verdin: correct off-on-delay
arm64: dts: imx8mm-verdin: correct off-on-delay
arm64: dts: imx8mm-evk: correct pmic clock source
arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
arm64: dts: rockchip: Remove non-existing pwm-delay-us property
arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
tee: Pass a pointer to virt_to_page()
perf/amlogic: adjust register offsets
arm64: dts: meson-g12-common: resolve conflict between canvas & pmu
arm64: dts: meson-g12-common: specify full DMC range
arm64: dts: imx8mp: fix address length for LCDIF2
riscv: dts: canaan: drop invalid spi-max-frequency
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/arm
mvebu arm64 for 6.4 (part 1)
turris-mox-rwtm firmware:
- prevent modification at runtime of the kobj_type struct
* tag 'mvebu-arm64-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
firmware: turris-mox-rwtm: make kobj_type structure constant
Link: https://lore.kernel.org/r/878repzfbp.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Original code was largely copy-pasted from the reference board code, correct values to reflect the hardware actually present in the TS-WXL.
Signed-off-by: Jeremy J. Peper <jeremy@jeremypeper.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Adding missing code/values required to enable the XOR and CESA engines for this SoC
Signed-off-by: Jeremy J. Peper <jeremy@jeremypeper.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Original code was largely copy-pasted from the reference board code, adjust to use the actual RTC chip present on the TS-WXL.
Signed-off-by: Jeremy J. Peper <jeremy@jeremypeper.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Original code was largely copy-pasted from the reference board code, adjust pcie initialiazation to reflect the TS-WXL using the single-core variant of this SoC.
Correct pcie_port_size to be a power of 2 as required.
Signed-off-by: Jeremy J. Peper <jeremy@jeremypeper.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
monotonicity
The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
can also be fetched from clock_gettime() API.
Improve the test coverage while verifying the monotonicity of
CLOCK_BOOTTIME accross both interfaces.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-9-frederic@kernel.org
|
|
Due to broken iowait task counting design (cf: comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide
the guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.
Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-8-frederic@kernel.org
|
|
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-7-frederic@kernel.org
|
|
There is no need for the __tick_nohz_idle_stop_tick() function between
tick_nohz_idle_stop_tick() and its implementation. Remove that
unnecessary step.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-6-frederic@kernel.org
|
|
The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken to (and by) another CPU, the counter may
then be decremented remotely. This is the source of a race involving
readers VS writer of idle/iowait sleeptime.
The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
later eventually gets accounted as non-IO.
CPU 0 CPU 1 CPU 2
----- ----- ------
//io_schedule() TASK A
current->in_iowait = 1
rq(0)->nr_iowait++
//switch to idle
// READ /proc/stat
// See nr_iowait_cpu(0) == 1
return ts->iowait_sleeptime +
ktime_sub(ktime_get(), ts->idle_entrytime)
//try_to_wake_up(TASK A)
rq(0)->nr_iowait--
//idle exit
// See nr_iowait_cpu(0) == 0
ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
As a result subsequent reads on /proc/stat may expose backward progress.
This is unfortunately hardly fixable. Just add a comment about that
condition.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-5-frederic@kernel.org
|
|
Reading idle/IO sleep time (eg: from /proc/stat) can race with idle exit
updates because the state machine handling the stats is not atomic and
requires a coherent read batch.
As a result reading the sleep time may report irrelevant or backward
values.
Fix this with protecting the simple state machine within a seqcount.
This is expected to be cheap enough not to add measurable performance
impact on the idle path.
Note this only fixes reader VS writer condition partitially. A race
remains that involves remote updates of the CPU iowait task counter. It
can hardly be fixed.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-4-frederic@kernel.org
|
|
The idle and IO sleeptime statistics appearing in /proc/stat can be
currently updated from two sites: locally on idle exit and remotely
by cpufreq. However there is no synchronization mechanism protecting
concurrent updates. It is therefore possible to account the sleeptime
twice, among all the other possible broken scenarios.
To prevent from breaking the sleeptime accounting source, restrict the
sleeptime updates to the local idle exit site. If there is a delta to
add since the last update, IO/Idle sleep time readers will now only
compute the delta without actually writing it back to the internal idle
statistic fields.
This fixes a writer VS writer race. Note there are still two known
reader VS writer races to handle. A subsequent patch will fix one.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-3-frederic@kernel.org
|
|
Restructure and group fields by access in order to optimize cache
layout. While at it, also add missing kernel doc for two fields:
@last_jiffies and @idle_expires.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-2-frederic@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/dt
mvebu dt64 for 6.4 (part 1)
Enlarge PCI memory window on Machiatobin (Armada 7040 based)
Add supoport for the GL.iNet GL-MV1000 (Armada 3700 based)
Add missing phy-mode on the cn9310
Align thermal node names with bindings
* tag 'mvebu-dt64-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
ARM64: dts: marvell: cn9310: Add missing phy-mode
arm64: dts: marvell: add DTS for GL.iNet GL-MV1000
arm64: dts: marvell: align thermal node names with bindings
arm64: dts: marvell: mochabin: enlarge PCI memory window
Link: https://lore.kernel.org/r/87bkjlzfcw.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/dt
mvebu dt for 6.4 (part 1)
Add missing phy-mode and fixed links for kirkwood, orion5 and Armada
(370, XP, 38x) SoCs
* tag 'mvebu-dt-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
ARM: dts: armada: Add missing phy-mode and fixed links
ARM: dts: orion5: Add missing phy-mode and fixed links
ARM: dts: kirkwood: Add missing phy-mode and fixed links
Link: https://lore.kernel.org/r/87edohzfeg.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|