|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Keep bridges in D0 if we need to poll downstream devices for PME to
resolve a v6.6 regression where we failed to enumerate devices below
bridges put in D3hot by runtime PM, e.g., NVMe drives connected via
Thunderbolt or USB4 docks (Alex Williamson)
- Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
* tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
PCI: Fix active state requirement in PME polling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- tracing/probes: Fix the BTF structure member finder so that it
correctly finds members placed after an anonymous union member.
* tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix to search structure fields correctly
|
|
Pull smb client fixes from Steve French:
"Five smb3 client fixes, most also for stable:
- Two multichannel fixes (one to fix potential handle leak on retry)
- Work around possible serious data corruption (due to a change in
folio handling in 6.3, in cases where a non-standard maximum write
size is negotiated)
- Symlink creation fix
- Multiuser automount fix"
* tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: Fix regression in writes when non-standard maximum write size negotiated
smb: client: handle path separator of created SMB symlinks
smb: client: set correct id, uid and cruid for multiuser automounts
cifs: update the same create_guid on replay
cifs: fix underflow in parse_server_interfaces()
|
|
The Linux kernel project now has the ability to assign CVEs to fixed
issues, so document the process and how individual developers can get a
CVE if one is not automatically assigned for their fixes.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/2024021731-essence-sadness-28fd@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the field search so that it correctly finds fields in a structure
that contains an anonymous union.
Since the reference `type` pointer was updated inside the loop, the
search loop aborted as soon as it hit an anonymous union, and thus could
not find fields placed after the anonymous union. Avoid updating the
cursor `type` pointer in the loop.
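A minimal C sketch of the fixed pattern (names simplified from the
actual BTF member-finder code; the anon_stack handling is illustrative):

/* walk the members; queue anonymous unions for a later pass instead of
 * overwriting the loop's own cursor */
for_each_member(i, type, member) {
	if (!member->name_off) {
		/* anonymous union: record it, but do NOT update 'type',
		 * which for_each_member() still uses as its cursor */
		anon_type = btf_type_by_id(btf, member->type);
		if (anon_type && top < BTF_ANON_STACK_MAX)
			anon_stack[top++] = anon_type;
		continue;
	}
	if (!strcmp(btf_name_by_offset(btf, member->name_off), field_name))
		return member;
}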
Link: https://lore.kernel.org/all/170791694361.389532.10047514554799419688.stgit@devnote2/
Fixes: 302db0f5b3d8 ("tracing/probes: Add a function to search a member of a struct/union")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Three fixes are included here. Two are strictly hardware-related
for the i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile testing on
powerpc.
|
|
The Linux CXL subsystem is built on the assumption that HPA == SPA.
That is, the host physical address (HPA) the HDM decoder registers are
programmed with are system physical addresses (SPA).
During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1,
8.1.3.8) are checked to confirm that the memory is enabled and that the
CXL range lies within an HPA window described in a CFMWS structure of
the CXL host bridge (cxl-3.1, 9.18.1.3).
Now, if the HPA is not an SPA, the CXL range does not match any CFMWS
window and the CXL memory range is disabled. The HDM decoder then stops
working, which causes system memory to be disabled and, further, a
system hang during HDM decoder initialization, typically when a
CXL-enabled kernel boots.
Prevent the system hang and do not disable the HDM decoder if the
decoder's CXL range is not found in a CFMWS window.
Note the change only fixes the hardware hang, but does not implement
HPA/SPA translation. Support for this can be added in a follow-on
patch series.
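A hedged sketch of the guard (names like 'range_matches_cfmws' are
hypothetical; the point is to warn and carry on rather than disable the
decoder):

if (info->mem_enabled && !range_matches_cfmws) {
	/* disabling the HDM decoder here would take system memory
	 * offline and hang the boot; warn and keep it enabled */
	dev_warn(dev, "CXL range not found in a CFMWS window\n");
	return 0;
}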
Signed-off-by: Robert Richter <rrichter@amd.com>
Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240216160113.407141-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Set a fake qos_class to a unique value to allow simple testing of
qos_class for root decoders and memory devices via cxl_test. A mock
function is added to set the fake qos_class values for memory devices;
it overrides cxl_endpoint_parse_cdat() in the cxl driver code.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-5-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The current implementation exports qos_class only at
/sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed,
the second registered sysfs attribute is rejected as a duplicate. It's
not possible to create qos_class under the dev_groups via the driver
because the ram and pmem sysfs sub-directories are already created by
the device sysfs groups. Move the ram and pmem qos_class to the device
sysfs groups and add a call to sysfs_update() after the perf data are
validated so the qos_class can become visible. The end result is
/sys/bus/cxl/devices/.../memN/ram/qos_class and
/sys/bus/cxl/devices/.../memN/pmem/qos_class.
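A hedged sketch of the shape of this change, assuming the update call is
sysfs_update_group() and using an illustrative 'ram_perf_valid' flag:

static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
{
	struct cxl_memdev *cxlmd = to_cxl_memdev(kobj_to_dev(kobj));

	/* hide qos_class until the CDAT perf data has been validated */
	return cxlmd->ram_perf_valid ? a->mode : 0;
}

static struct attribute_group cxl_memdev_ram_attribute_group = {
	.name		= "ram",
	.attrs		= cxl_memdev_ram_attributes,
	.is_visible	= cxl_ram_visible,
};

/* after cxl_endpoint_parse_cdat() validates the perf data: */
sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_ram_attribute_group);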
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The passed-in host bridge parameter for device_for_each_child() has an
unnecessary void * type cast. Remove the type cast.
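For illustration (variable and callback names here are hypothetical),
the change is of this shape:

/* before: the data parameter is already void *, so the cast is noise */
device_for_each_child(&root->dev, (void *)hb, match_cxl_root_child);
/* after */
device_for_each_child(&root->dev, hb, match_cxl_root_child);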
Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-3-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
cxl_dpa_perf'
In order to address the issue with being able to expose qos_class sysfs
attributes under 'ram' and 'pmem' sub-directories, the attributes must
be defined as static attributes rather than under driver->dev_groups.
To avoid implementing locking for accessing the 'struct cxl_dpa_perf'
lists, convert the list to a single 'struct cxl_dpa_perf' entry in
preparation for moving the attributes to static definitions.
While a partition may theoretically have multiple qos_class values via
CDAT, this has not been encountered in testing on available hardware.
The code is simplified for now and does not support the complex case;
that can change once a use case needs it.
Link: https://lore.kernel.org/linux-cxl/65b200ba228f_2d43c29468@dwillia2-mobl3.amr.corp.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-2-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Autodiscovered regions can fail to assemble if they are not discovered
in HPA decode order. The user will see failure messages like:
[] cxl region0: endpoint5: HPA order violation region1
[] cxl region0: endpoint5: failed to allocate region reference
The check that is causing the failure helps the CXL driver enforce
a CXL spec mandate that decoders be committed in HPA order. The
check is needless for autodiscovered regions since their decoders
are already programmed. Trying to enforce order in the assembly of
these regions is useless because they are assembled once all their
member endpoints arrive, and there is no guarantee on the order in
which endpoints are discovered during probe.
Keep the existing check, but for autodiscovered regions, allow
out-of-order assembly after a sanity check that the lesser-numbered
decoder has the lower HPA starting address.
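A hedged sketch of the relaxed check (helper names are hypothetical;
CXL_REGION_F_AUTO marks autodiscovered regions):

if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
	/* decoders are already programmed; only sanity-check that the
	 * lesser-numbered decoder has the lower HPA start address */
	if (prev_cxled &&
	    prev_cxled->cxld.hpa_range.start > cxled->cxld.hpa_range.start)
		return -EINVAL;
} else if (!decoders_committed_in_hpa_order(cxlr, cxled)) {
	return -EBUSY;	/* enforce HPA order for user-created regions */
}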
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In preparation for adding a new caller of cxl_region_find_decoders()
teach it to find a decoder from a cxl_endpoint_decoder structure.
Combining switch and endpoint decoder lookup in one function prevents
code duplication in call sites.
Update the existing caller.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The compare function used to sort memblks into starting address
order fails when the result of its u64 address subtraction gets
truncated to an int upon return.
The impact of the bad sort is that memblks will be filled out
incorrectly. Depending on the set of memblks, a user may see no
errors at all but still have a bad fill, or see messages reporting
a node overlap that leads to numa init failure:
[] node 0 [mem: ] overlaps with node 1 [mem: ]
[] No NUMA configuration found
Replace it with a comparison that can only return 1, 0, or -1.
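The broken and fixed patterns, sketched (struct layout simplified):

/* broken: the u64 difference is truncated to int on return, which can
 * flip the sign for addresses more than 2^31 apart */
return ma->start - mb->start;

/* fixed: evaluates strictly to 1, 0, or -1 */
return (ma->start > mb->start) - (ma->start < mb->start);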
Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/99dcb3ae87e04995e9f293f6158dc8fa0749a487.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a
physical address range. To do so, it first creates a list of existing
memblks that overlap that address range. The issue is that it is off
by one when comparing to the end of the address range, so memblks
that do not overlap are selected.
The impact of selecting a memblk that does not actually overlap is
that an existing memblk may be filled when the expected action is to
do nothing and return NUMA_NO_MEMBLK to the caller. The caller can
then add a new NUMA node and memblk.
Replace the broken open-coded search for address overlap with the
memblock helper memblock_addrs_overlap(). Update the kernel-doc and
in-code comments.
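The helper treats ranges as half-open intervals, which is exactly what
the open-coded check got wrong; its logic is:

/* true if [base1, base1+size1) and [base2, base2+size2) intersect */
static inline bool memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
					  phys_addr_t base2, phys_addr_t size2)
{
	return base1 < (base2 + size2) && base2 < (base1 + size1);
}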
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When emulating an atomic access on behalf of the guest, mark the target
gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
fixes a bug where KVM effectively corrupts guest memory during live
migration by writing to guest memory without informing userspace that the
page is dirty.
Marking the page dirty got unintentionally dropped when KVM's emulated
CMPXCHG was converted to do a user access. Before that, KVM explicitly
mapped the guest page into kernel memory, and marked the page dirty during
the unmap phase.
Mark the page dirty even if the CMPXCHG fails, as the old data is written
back on failure, i.e. the page is still written. The value written is
guaranteed to be the same because the operation is atomic, but KVM's ABI
is that all writes are dirty logged regardless of the value written. And
more importantly, that's what KVM did before the buggy commit.
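A hedged sketch of the resulting ordering (arguments elided):

r = __try_cmpxchg_user(...);
if (r < 0)
	return X86EMUL_UNHANDLEABLE;	/* access faulted; nothing written */

/* even a failed CMPXCHG wrote the old value back, so log the page */
kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));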
Huge kudos to the folks on the Cc list (and many others), who did all the
actual work of triaging and debugging.
Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <tatashin@google.com>
Cc: Michael Krebs <mkrebs@google.com>
base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Commit e4bb020f3dbb ("riscv: detect assembler support for .option arch")
added two tests: one for a valid value to '.option arch' that should
succeed, and one for an invalid value that is expected to fail, to make
sure that support for '.option arch' is properly detected, because Clang
does not error when '.option arch' is not supported:
$ clang --target=riscv64-linux-gnu -Werror -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
0
Unfortunately, the invalid test started being accepted by Clang after
the linked llvm-project change, which causes CONFIG_AS_HAS_OPTION_ARCH
and configurations that depend on it to be silently disabled, even
though those versions do support '.option arch'.
The invalid test can be avoided altogether by using
'-Wa,--fatal-warnings', which will turn all assembler warnings into
errors, like '-Werror' does for the compiler:
$ clang --target=riscv64-linux-gnu -Werror -Wa,--fatal-warnings -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: error: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
1
The as-instr macros have been updated to make use of this flag, so
remove the invalid test, which allows CONFIG_AS_HAS_OPTION_ARCH to work
for all compiler versions.
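The resulting Kconfig test is roughly:

config AS_HAS_OPTION_ARCH
	def_bool $(as-instr, .option arch$(comma) +m)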
Cc: stable@vger.kernel.org
Fixes: e4bb020f3dbb ("riscv: detect assembler support for .option arch")
Link: https://github.com/llvm/llvm-project/commit/3ac9fe69f70a2b3541266daedbaaa7dc9c007a2a
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/r/20240121011341.GA97368@sol.localdomain/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-2-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Certain assembler instruction tests may only induce warnings from the
assembler on an unsupported instruction or option, which causes as-instr
to succeed when it was expected to fail. Some tests workaround this
limitation by additionally testing that invalid input fails as expected.
However, this is fragile: if the assembler is changed to accept the
invalid input, the instruction/option becomes unavailable as though it
were unsupported, even when it is supported.
Use '-Wa,--fatal-warnings' in the as-instr macro to turn these warnings
into hard errors, which avoids this fragility and makes the tests more
robust and well-formed.
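Sketch of the macro change in scripts/Kconfig.include (the exact flag
list may differ slightly):

as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o /dev/null -)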
Cc: stable@vger.kernel.org
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-1-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Aligning to the L1 cache does not guarantee the same alignment as
kmallocing an object [1]. Furthermore, on some platforms, that
alignment is not sufficient for DMA safety (in case someone wants
to have a DMA-safe buffer in privdata) [2].
Some time ago, we had the same fixes in IIO.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/devres.c#n35
[2]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
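A hedged sketch of the idea, mirroring the IIO fix (the exact alignment
macro used may differ):

struct counter_device_allochelper {
	struct counter_device counter;
	/* over-align so privdata behaves as if kmalloc()ed separately,
	 * which also makes it safe to use for DMA */
	unsigned long privdata[] __aligned(ARCH_DMA_MINALIGN);
};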
Fixes: c18e2760308e ("counter: Provide alternative counter registration functions")
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20240209-counter-align-fix-v2-1-5777ea0a2722@analog.com
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
|
|
The SED Opal response parsing function response_parse() does not
handle the case of an empty atom in the response. This causes
the entry count to be too high and the response fails to be
parsed. Recognizing, but ignoring, empty atoms allows response
handling to succeed.
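A hedged sketch of the parsing change (constant name hypothetical; the
TCG core spec encodes an empty atom as the single byte 0xFF):

if (pos[0] == EMPTY_ATOM_BYTE) {	/* 0xFF */
	pos++;		/* consume the byte ... */
	total--;
	continue;	/* ... without counting a response entry */
}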
Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com>
Link: https://lore.kernel.org/r/20240216210417.3526064-2-gjoyce@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block
Pull MD changes from Song:
"1. Cleanup redundant checks, by Yu Kuai.
2. Remove deprecated headers, by Marc Zyngier and Song Liu.
3. Concurrency fixes, by Li Lingfeng.
4. Memory leak fix, by Li Nan."
* tag 'md-6.9-20240216' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: fix kmemleak of rdev->serial
md/multipath: Remove md-multipath.h
md/linear: Get rid of md-linear.h
md: use RCU lock to protect traversal in md_spares_need_change()
md: get rdev->mddev with READ_ONCE()
md: remove redundant md_wakeup_thread()
md: remove redundant check of 'mddev->sync_thread'
|
|
Since I have been contributing to the driver for a while and wish to help
with the review process, add myself as a reviewer.
Link: https://lore.kernel.org/r/20240216065926.473805-1-s-vadapalli@ti.com
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The bq27xxx i2c-client may not have an IRQ, in which case
client->irq will be 0. bq27xxx_battery_i2c_probe() already has
an if (client->irq) check wrapping the request_threaded_irq().
But bq27xxx_battery_i2c_remove() unconditionally calls
free_irq(client->irq) leading to:
[ 190.310742] ------------[ cut here ]------------
[ 190.310843] Trying to free already-free IRQ 0
[ 190.310861] WARNING: CPU: 2 PID: 1304 at kernel/irq/manage.c:1893 free_irq+0x1b8/0x310
This is followed by a backtrace when unbinding the driver. Add
an if (client->irq) check to bq27xxx_battery_i2c_remove(), mirroring
probe(), to fix this.
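A sketch of the fix:

static void bq27xxx_battery_i2c_remove(struct i2c_client *client)
{
	struct bq27xxx_device_info *di = i2c_get_clientdata(client);

	/* mirror probe(): the IRQ was only requested if one was assigned */
	if (client->irq)
		free_irq(client->irq, di);

	bq27xxx_battery_teardown(di);
}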
Fixes: 444ff00734f3 ("power: supply: bq27xxx: Fix I2C IRQ race on remove")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240215155133.70537-1-hdegoede@redhat.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.8
Pull MD fixes from Song:
"1. Fix issues reported for dm-raid [1], by Yu Kuai. Please note that
this PR only contains the first half of the set [2]. We still need
more fixes in dm and md code (the rest of the set, or alternative
fixes).
2. Fix active_io leak, by Yu Kuai. The fix was posted in the same set
[2]. But it actually fixes a separate issue [3].
[1] https://lore.kernel.org/linux-raid/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
[2] https://lore.kernel.org/linux-raid/20240201092559.910982-1-yukuai1@huaweicloud.com/
[3] https://lore.kernel.org/linux-raid/20240130172524.0000417b@linux.intel.com/"
* tag 'md-6.8-20240216' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: Don't suspend the array for interrupted reshape
md: Don't register sync_thread for reshape directly
md: Make sure md_do_sync() will set MD_RECOVERY_DONE
md: Don't ignore read-only array in md_check_recovery()
md: Don't ignore suspended array in md_check_recovery()
md: Fix missing release of 'active_io' for flush
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes: the two fnic ones are a revert and a refix, which is why
the diffstat is a bit big. The target one also extracts a function to
add a check for configuration and so looks bigger than it is"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Move fnic_fnic_flush_tx() to a work queue
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
scsi: target: Fix unmap setup during configuration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Just one patch to revert commit ca10d851b9ad ("workqueue: Override
implicit ordered attribute in workqueue_apply_unbound_cpumask()").
This commit could break ordering guarantees for ordered workqueues.
The problem that the commit tried to resolve partially - making
ordered workqueues follow unbound cpumask - is fully solved in
wq/for-6.9 branch"
* tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
|
|
Pull block fixes from Jens Axboe:
"Just an nvme pull request via Keith:
- Fabrics connection error handling (Chaitanya)
- Use relaxed effects to reduce unnecessary queue freezes (Keith)"
* tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linux:
nvmet: remove superfluous initialization
nvme: implement support for relaxed effects
nvme-fabrics: fix I/O connect error handling
|
|
Pull io_uring fix from Jens Axboe:
"Just a single fix for a regression in how overflow is handled for
multishot accept requests"
* tag 'io_uring-6.8-2024-02-16' of git://git.kernel.dk/linux:
io_uring/net: fix multishot accept overflow handling
|
|
Pull ceph fixes from Ilya Dryomov:
"Additional cap handling fixes from Xiubo to avoid "client isn't
responding to mclientcaps(revoke)" stalls on the MDS side"
* tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client:
ceph: add ceph_cap_unlink_work to fire check_caps() immediately
ceph: always queue a writeback when revoking the Fb caps
|
|
The NOKPROBE_SYMBOL macro (and others) were moved to
asm-generic/kprobes.h in 2017 by commit 7d134b2ce639 ("kprobes: move
kprobe declarations to asm-generic/kprobes.h"), and this new header
was included by asm/kprobes.h unconditionally on all architectures.
When kprobe support was added to parisc in 2019 by commit
8858ac8e9e9b1 ("parisc: Implement kprobes"), that header was only
included when CONFIG_KPROBES was enabled.
This can lead to build failures when NOKPROBE_SYMBOL is used but
CONFIG_KPROBES is disabled. This mistake, however, was never actually
noticed, because linux/kprobes.h also includes asm-generic/kprobes.h
(though I do not understand why that is, because it also includes
asm/kprobes.h).
To prevent eventual build failures, I suggest always including
asm-generic/kprobes.h on parisc, just like all the other architectures
do. This way, including asm/kprobes.h suffices, and nobody (outside
of arch/) ever needs to explicitly include asm-generic/kprobes.h.
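The suggested shape of arch/parisc/include/asm/kprobes.h:

/* include unconditionally, so NOKPROBE_SYMBOL() and friends are
 * visible even when CONFIG_KPROBES is disabled */
#include <asm-generic/kprobes.h>

#ifdef CONFIG_KPROBES
/* ... kprobes definitions ... */
#endif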
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Fixes a bug revealed by -Wmissing-prototypes when
CONFIG_FUNCTION_GRAPH_TRACER is enabled but not CONFIG_DYNAMIC_FTRACE:
arch/parisc/kernel/ftrace.c:82:5: error: no previous prototype for 'ftrace_enable_ftrace_graph_caller' [-Werror=missing-prototypes]
82 | int ftrace_enable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/parisc/kernel/ftrace.c:88:5: error: no previous prototype for 'ftrace_disable_ftrace_graph_caller' [-Werror=missing-prototypes]
88 | int ftrace_disable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Avoid dropping the page refcount twice when freeing an unlinked
page-table subtree.
- Don't source the VFIO Kconfig twice
- Fix protected-mode locking order between kvm and vcpus
RISC-V:
- Fix steal-time related sparse warnings
x86:
- Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"
- Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if
and only if the incoming events->nmi.pending is non-zero. If the
target vCPU is in the UNINITIALIZED state, the spurious request will
result in KVM exiting to userspace, which in turn causes QEMU to
constantly acquire and release QEMU's global mutex, to the point
where the BSP is unable to make forward progress.
- Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl
being incorrectly truncated, and ultimately causes KVM to think a
fixed counter has already been disabled (KVM thinks the old value
is '0').
- Fix a stack leak in KVM_GET_MSRS where a failed MSR read from
userspace that is ultimately ignored due to ignore_msrs=true
doesn't zero the output as intended.
Selftests cleanups and fixes:
- Remove redundant newlines from error messages.
- Delete an unused variable in the AMX test (which causes build
failures when compiling with -Werror).
- Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails
with an error code other than ENOENT (a Hyper-V selftest bug
resulted in an EMFILE, and the test eventually got skipped).
- Fix TSC related bugs in several Hyper-V selftests.
- Fix a bug in the dirty ring logging test where a sem_post() could
be left pending across multiple runs, resulting in incorrect
synchronization between the main thread and the vCPU worker thread.
- Relax the dirty log split test's assertions on 4KiB mappings to fix
false positives due to the number of mappings for memslot 0 (used
for code and data that is NOT being dirty logged) changing, e.g.
due to NUMA balancing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
RISC-V: KVM: Use correct restricted types
RISC-V: paravirt: Use correct restricted types
RISC-V: paravirt: steal_time should be static
KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test
KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
KVM: x86: Fix KVM_GET_MSRS stack info leak
KVM: arm64: Do not source virt/lib/Kconfig twice
KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
KVM: x86: Make gtod_is_based_on_tsc() return 'bool'
KVM: selftests: Make hyperv_clock require TSC based system clocksource
KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too
KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test
KVM: selftests: Generalize check_clocksource() from kvm_clock_test
KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
KVM: arm64: Fix circular locking dependency
KVM: selftests: Fail tests when open() fails with !ENOENT
KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
KVM: selftests: x86_64: Remove redundant newlines
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the #ifndef that didn't have the 'CONFIG_' prefix on
HAVE_DYNAMIC_FTRACE_WITH_REGS
The fix to make dynamic trampolines work with x86 broke arm64, as the
config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS, which effectively removed the
fix that the previous fix was meant to apply.
- Fix tracing_on state
The code that tests whether "tracing_on" is set incorrectly used
ring_buffer_record_is_on(), which returns false whenever the ring
buffer cannot be written to.
But the ring buffer has several flags that can disable it. One is
internal disabling, used for resizing and other modifications of the
ring buffer. The user-space-visible "tracing_on" flag should only
report whether tracing is actually on, not whether it is internally
disabled, as that causes confusion: writing "1" while it is disabled
will not enable it.
Instead, use ring_buffer_record_is_set_on(), which reports the
user-space-visible settings.
- Fix a false positive kmemleak on saved cmdlines
Now that the saved_cmdlines structure is allocated via alloc_page()
and not via kmalloc() it has become invisible to kmemleak. The
allocation done to one of its pointers was flagged as a dangling
allocation leak. Make kmemleak aware of this allocation and free.
- Fix synthetic event dynamic strings
An update that cleaned up the synthetic event code changed
trace_string() to return zero instead of the length, causing dynamic
strings in synthetic events to always have zero size.
- Clean up documentation and header files for seq_buf
* tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
seq_buf: Fix kernel documentation
seq_buf: Don't use "proxy" headers
tracing/synthetic: Fix trace_string() return value
tracing: Inform kmemleak of saved_cmdlines allocation
tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()
tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"It's a little busier than normal, but it's still not a lot of code and
things seem fairly quiet in general:
- Fix allocation failure during SVE coredumps
- Fix handling of SVE context on signal delivery
- Enable Neoverse N2 CPU errata workarounds for Microsoft's "Azure
Cobalt 100" clone
- Work around CMN PMU erratum in AmpereOneX implementation
- Fix typo in CXL PMU event definition
- Fix jump label asm constraints"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sve: Lower the maximum allocation for the SVE ptrace regset
arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count)
arm64: jump_label: use constraints "Si" instead of "i"
arm64: fix typo in comments
perf: CXL: fix mismatched cpmu event opcode
arm64/signal: Don't assume that TIF_SVE means we saved SVE state
|
|
When a CPU is taken offline the resctrl filesystem code needs to check if it
was the CPU nominated to perform the periodic overflow and limbo work. If so,
another CPU needs to be chosen to do this work.
This is currently done in core.c, mixed in with the code that removes the CPU
from the domain's mask, and potentially free()s the domain.
Move the migration of the overflow and limbo helpers into the filesystem code,
into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before the
architecture code has removed the CPU from the domain mask, the callers need to
be told which CPU is being removed, to avoid picking it as the new CPU. This
uses the exclude_cpu feature previously added.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-24-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The resctrl architecture-specific code may need to free a domain when a CPU
goes offline; it also needs to reset the CPU's PQR_ASSOC register. Amongst
other things, the resctrl filesystem code needs to clear this CPU from the
cpu_mask of any control and monitor groups.
Currently, this is all done in core.c and called from resctrl_offline_cpu(),
making the split between architecture and filesystem code unclear.
Move the filesystem work to remove the CPU from the control and monitor groups
into a filesystem helper called resctrl_offline_cpu(), and rename the one in
core.c to resctrl_arch_offline_cpu().
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-23-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
When a CPU is taken offline resctrl may need to move the overflow or limbo
handlers to run on a different CPU.
Once the offline callbacks have been split, cqm_setup_limbo_handler() will be
called while the CPU that is going offline is still present in the CPU mask.
Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need
excluding.
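A sketch of the selection (the RESCTRL_PICK_ANY_CPU name for the -1
sentinel is assumed from this series):

/* the outgoing CPU is passed down so it cannot be picked */
if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
	cpu = cpumask_any(&dom->cpu_mask);
else
	cpu = cpumask_any_but(&dom->cpu_mask, exclude_cpu);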
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The resctrl architecture-specific code may need to create a domain when a CPU
comes online; it also needs to reset the CPU's PQR_ASSOC register. The resctrl
filesystem code needs to update the rdtgroup_default CPU mask when CPUs are
brought online.
Currently, this is all done in one function, resctrl_online_cpu(). It will
need to be split into architecture and filesystem parts before resctrl can be
moved to /fs/.
Pull the rdtgroup_default update work out as a filesystem-specific cpu_online
helper. resctrl_online_cpu() is the obvious name for this, which means the
version in core.c needs renaming.
resctrl_online_cpu() is called by the arch code once it has done the work to
add the new CPU to any domains.
In future patches, resctrl_online_cpu() will take the rdtgroup_mutex itself.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-21-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of
the resources support the corresponding features. resctrl also uses the
static keys that affect the architecture's context-switch code to determine the
same thing.
This forces another architecture to have the same static keys.
As the static key is enabled based on the capable flag, and none of the
filesystem uses of these are in the scheduler path, move the capable flags
behind helpers, and use these in the filesystem code instead of the static key.
After this change, only the architecture code manages and uses the static keys
to ensure __resctrl_sched_in() does not need runtime checks.
This avoids multiple architectures having to define the same static keys.
Cases where the static key implicitly tested if the resctrl filesystem was
mounted all have an explicit check now.
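A sketch of the helpers (this matches the x86 side; another architecture
would provide its own):

static inline bool resctrl_arch_alloc_capable(void)
{
	return rdt_alloc_capable;
}

static inline bool resctrl_arch_mon_capable(void)
{
	return rdt_mon_capable;
}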
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-20-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
rdt_enable_key is switched when resctrl is mounted. It was also previously used
to prevent a second mount of the filesystem.
Any other architecture that wants to support resctrl has to provide identical
static keys.
Now that there are helpers for enabling and disabling the alloc/mon keys,
resctrl doesn't need to switch this extra key; it can be done by the arch code.
Use the static-key increment and decrement helpers, and change resctrl to
ensure the calls are balanced.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-19-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
resctrl enables three static keys depending on the features it has enabled.
Another architecture's context switch code may look different; any static
keys that control it should be buried behind helpers.
Move the alloc/mon logic into arch-specific helpers as a preparatory step for
making the rdt_enable_key's status something the arch code decides.
This means other architectures don't have to mirror the static keys.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-18-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The rdt_enable_key is switched when resctrl is mounted, and used to prevent
a second mount of the filesystem. It also enables the architecture's context
switch code.
This requires another architecture to have the same set of static keys, as
resctrl depends on them too. The existing users of these static keys are
implicitly also checking if the filesystem is mounted.
Make the resctrl_mounted checks explicit: resctrl can keep track of whether it
has been mounted once. This doesn't need to be combined with whether the arch
code is context switching the CLOSID.
rdt_mon_enable_key is never used just to test that resctrl is mounted, but it
does have this implication. Add a resctrl_mounted check to all uses of
rdt_mon_enable_key.
This will allow the static key changing to be moved behind resctrl_arch_ calls.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-17-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Depending on the number of monitors available, Arm's MPAM may need to
allocate a monitor prior to reading the counter value. Allocating a
contended resource may involve sleeping.
__check_limbo() and mon_event_count() each make multiple calls to
resctrl_arch_rmid_read(); to avoid extra work on contended systems,
the allocation should remain valid across multiple invocations of
resctrl_arch_rmid_read().
The memory or hardware allocated is not specific to a domain.
Add arch hooks for this allocation, which need to be called before
resctrl_arch_rmid_read(). The allocated monitor is passed to
resctrl_arch_rmid_read(), then freed again afterwards. The helper
can be called on any CPU, and can sleep.
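A sketch of the call pattern (x86 can stub the hooks out to return
NULL):

void *arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);	/* may sleep */

ret = resctrl_arch_rmid_read(r, d, closid, rmid, evtid, &val, arch_mon_ctx);
/* ... further reads may reuse arch_mon_ctx ... */

resctrl_arch_mon_ctx_free(r, evtid, arch_mon_ctx);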
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-16-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
MPAM's cache occupancy counters can take a little while to settle once the
monitor has been configured. The maximum settling time is described to the
driver via a firmware table. The value could be large enough that it makes
sense to sleep. To avoid exposing this to resctrl, it should be hidden behind
MPAM's resctrl_arch_rmid_read().
resctrl_arch_rmid_read() may be called via IPI meaning it is unable to sleep.
In this case, it should return an error if it needs to sleep. This will only
affect MPAM platforms where the cache occupancy counter isn't available
immediately, nohz_full is in use, and there are no housekeeping CPUs in the
necessary domain.
There are three callers of resctrl_arch_rmid_read(): __mon_event_count() and
__check_limbo() are both called from a non-migratable context.
mon_event_read() invokes __mon_event_count() using smp_call_on_cpu(), which
adds work to the target CPU's workqueue. rdtgroup_mutex is held, meaning this
cannot race with the resctrl cpuhp callback. __check_limbo(), invoked via
schedule_delayed_work_on(), also adds work to a per-cpu workqueue.
The remaining call is add_rmid_to_limbo() which is called in response to
a user-space syscall that frees an RMID. This opportunistically reads the LLC
occupancy counter on the current domain to see if the RMID is over the dirty
threshold. This has to disable preemption to avoid reading the wrong domain's
value. Disabling preemption here prevents resctrl_arch_rmid_read() from
sleeping.
add_rmid_to_limbo() walks each domain, but only reads the counter on one
domain. If the system has more than one domain, the RMID will always be added
to the limbo list. If the RMID's usage was not over the threshold, it will be
removed from the list when __check_limbo() runs. Make this the default
behaviour: free RMIDs are always added to the limbo list for each domain.
The user-visible effect of this is that a clean RMID is not available for
re-allocation immediately after 'rmdir()' completes. This behaviour was never
portable, as it never happened on a machine with multiple domains.
Removing this path allows resctrl_arch_rmid_read() to sleep if it is called
with interrupts unmasked. Document that this is the expected behaviour, and
add a might_sleep() annotation to catch changes that won't work on arm64.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-15-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Intel is blessed with an abundance of monitors, one per RMID, that can be
read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC;
the number implemented is up to the manufacturer. This means that when there
are fewer monitors than needed, they need to be allocated and freed.
MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
CSU counter is allowed to return 'not ready' for a small number of
micro-seconds after programming. To allow one CSU hardware monitor to be
used for multiple control or monitor groups, the CPU accessing the
monitor needs to be able to block when configuring and reading the
counter.
Worse, the domain may be broken up into slices, and the MMIO accesses
for each slice may need performing from different CPUs.
These two details mean MPAM's monitor code needs to be able to sleep, and
to IPI another CPU in the domain to read from a resource that has been sliced.
mon_event_read() already invokes mon_event_count() via IPI, which means
this isn't possible. On systems using nohz-full, some CPUs need to be
interrupted to run kernel work as they otherwise stay in user-space
running realtime workloads. Interrupting these CPUs should be avoided,
and scheduling work on them may never complete.
Change mon_event_read() to pick a housekeeping CPU (one that is not using
nohz_full), schedule mon_event_count() there, and wait. If all the CPUs
in a domain are using nohz_full, then an IPI is used as the fallback.
This function is only used in response to a user-space filesystem request
(not the timing sensitive overflow code).
This allows MPAM to hide the slice behaviour from resctrl, and to keep
the monitor-allocation in monitor.c. When the IPI fallback is used on
machines where MPAM needs to make an access on multiple CPUs, the counter
read will always fail.
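A sketch of the resulting selection logic in mon_event_read() (the
smp_mon_event_count wrapper name is assumed from this series):

cpu = cpumask_any_housekeeping(&d->cpu_mask);

if (tick_nohz_full_cpu(cpu))
	/* every CPU in the domain is nohz_full: fall back to an IPI */
	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
else
	smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);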
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The limbo and overflow code picks a CPU to use from the domain's list of online
CPUs. Work is then scheduled on these CPUs to maintain the limbo list and any
counters that may overflow.
cpumask_any() may pick a CPU that is marked nohz_full, which will either
penalise the work that CPU was dedicated to, or delay the processing of the
limbo list or of counters that may overflow, perhaps indefinitely. Delaying
the overflow handling will skew the bandwidth values calculated by mba_sc,
which expects to be called once a second.
Add cpumask_any_housekeeping() as a replacement for cpumask_any() that prefers
housekeeping CPUs. This helper will still return a nohz_full CPU if that is the
only option. The CPU to use is re-evaluated each time the limbo/overflow work
runs. This ensures the work will move off a nohz_full CPU once a housekeeping
CPU is available.
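A sketch of the helper:

static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
{
	unsigned int cpu, hk_cpu;

	/* any CPU will do, provided it is not nohz_full */
	cpu = cpumask_any(mask);
	if (!tick_nohz_full_cpu(cpu))
		return cpu;

	/* prefer the first CPU in the mask that is not nohz_full */
	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
	if (hk_cpu < nr_cpu_ids)
		cpu = hk_cpu;

	return cpu;	/* may still be nohz_full if that's the only option */
}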
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-13-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
When switching tasks, the CLOSID and RMID that the new task should use
are stored in struct task_struct. For x86 the CLOSID known by resctrl,
the value in task_struct, and the value written to the CPU register are
all the same thing.
MPAM's CPU interface has two different PARTIDs - one for data accesses,
the other for instruction fetch. Storing resctrl's CLOSID value in
struct task_struct implies the arch code knows whether resctrl is using
CDP.
Move the matching and setting of the struct task_struct properties to
use helpers. This allows arm64 to store the hardware format of the
register, instead of having to convert it each time.
__rdtgroup_move_task()'s use of READ_ONCE()/WRITE_ONCE() ensures torn
values aren't seen, as another CPU may schedule the task being moved
while the value is being changed. MPAM has an additional corner-case
here as the PMG bits extend the PARTID space.
If the scheduler sees a new-CLOSID but old-RMID, the task will dirty an
RMID that the limbo code is not watching causing an inaccurate count.
x86's RMID are independent values, so the limbo code will still be
watching the old-RMID in this circumstance.
To avoid this, arm64 needs the CLOSID and RMID to be WRITE_ONCE()d
together; both values must be provided together.
Because MPAM's RMID values are not unique, the CLOSID must be provided
when matching the RMID.
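A sketch of the helpers on x86 (MPAM would store the hardware register
format instead):

static inline void resctrl_arch_set_closid_rmid(struct task_struct *tsk,
						u32 closid, u32 rmid)
{
	WRITE_ONCE(tsk->closid, closid);
	WRITE_ONCE(tsk->rmid, rmid);
}

static inline bool resctrl_arch_match_rmid(struct task_struct *tsk,
					   u32 closid, u32 rmid)
{
	/* closid is unused on x86, but MPAM's RMIDs are only unique
	 * within a CLOSID, so the interface requires both */
	return READ_ONCE(tsk->rmid) == rmid;
}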
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-12-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used
for different control groups.
This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.
Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.
The CLOSID found is returned to closid_alloc() for the free list
to be updated.
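A sketch of the search (assuming closid_free_map is an unsigned long
bitmap of free CLOSIDs):

int resctrl_find_cleanest_closid(void)
{
	int i, cleanest = -ENOSPC;
	u32 min_dirty = U32_MAX;

	lockdep_assert_held(&rdtgroup_mutex);

	for_each_set_bit(i, &closid_free_map, closids_supported()) {
		u32 num_dirty = closid_num_dirty_rmid[i];

		if (num_dirty == 0)
			return i;	/* already clean: done */

		if (num_dirty < min_dirty) {
			min_dirty = num_dirty;
			cleanest = i;
		}
	}

	return cleanest;
}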
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-11-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The resctrl CLOSID allocator uses a single 32-bit word to track which
CLOSIDs are free. The setting and clearing of bits is open coded.
Convert the existing open coded bit manipulations of closid_free_map
to use __set_bit() and friends. These don't need to be atomic as this
list is protected by the mutex.
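For example (assuming closid_free_map is an unsigned long):

/* before: open-coded bit twiddling */
closid_free_map &= ~(1 << closid);
/* after: non-atomic is fine, rdtgroup_mutex protects the map */
__clear_bit(closid, &closid_free_map);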
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-10-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
used for different control groups.
This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.
Keep track of the number of RMIDs each CLOSID holds in limbo. This will
allow a future helper to find the 'cleanest' CLOSID when allocating.
The array is only needed when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is
defined. This will never be the case on x86.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-9-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|