|
ACPICA commit 1035a3d453f7dd49a235a59ee84ebda9d2d2f41b
Add ACPI_NONSTRING for destination char arrays without a terminating NUL
character. This is a follow-up to commit 35ad99236f3a ("ACPICA: Apply
ACPI_NONSTRING") where not all instances received the same treatment, in
preparation for replacing strncpy() calls with memcpy().
Link: https://github.com/acpica/acpica/commit/1035a3d4
Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3833065.MHq7AAxBmi@rjwysocki.net
|
|
ACPICA commit 8b83a8d88dfec59ea147fad35fc6deea8859c58c
ap_get_table_length() checks if tables are valid by
calling ap_is_valid_header(). The latter then calls
ACPI_VALIDATE_RSDP_SIG(Table->Signature).
ap_is_valid_header() accepts struct acpi_table_header as an argument, so
the signature size is always fixed to 4 bytes.
The problem is when the string comparison is between ACPI-defined table
signature and ACPI_SIG_RSDP. Common ACPI table header specifies the
Signature field to be 4 bytes long[1], with the exception of the RSDP
structure whose signature is 8 bytes long "RSD PTR " (including the
trailing blank character)[2]. Calling strncmp(sig, rsdp_sig, 8) would
then result in a sequence overread[3] as sig would be smaller (4 bytes)
than the specified bound (8 bytes).
As a workaround, pass the bound conditionally based on the size of the
signature being passed.
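For illustration only, a minimal standalone sketch of the conditional
bound (the names below are not the actual ACPICA code):

    #include <stddef.h>
    #include <string.h>

    #define RSDP_SIG "RSD PTR "   /* 8 bytes, including the trailing blank */

    /* Compare only as many bytes as the signature buffer actually holds:
     * 4 for a common table header, 8 for the RSDP structure. */
    static int sig_matches_rsdp(const char *sig, size_t sig_size)
    {
            size_t bound = (sig_size < sizeof(RSDP_SIG) - 1) ? 4 : 8;

            return strncmp(sig, RSDP_SIG, bound) == 0;
    }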
Link: https://uefi.org/specs/ACPI/6.5_A/05_ACPI_Software_Programming_Model.html#system-description-table-header [1]
Link: https://uefi.org/specs/ACPI/6.5_A/05_ACPI_Software_Programming_Model.html#root-system-description-pointer-rsdp-structure [2]
Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstringop-overread [3]
Link: https://github.com/acpica/acpica/commit/8b83a8d8
Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2248233.Mh6RI2rZIc@rjwysocki.net
|
|
RAS2 table
ACPICA commit 2c8a38f747de9d977491a494faf0dfaf799b777b
Rename the structure and field names of the RAS2 table to shorten them and
avoid long lines in the ACPI RAS2 drivers.
1. struct acpi_ras2_shared_memory to struct acpi_ras2_shmem
2. In struct acpi_ras2_shared_memory, fields:
   - set_capabilities[16] to set_caps[16]
   - num_parameter_blocks to num_param_blks
   - set_capabilities_status to set_caps_status
3. struct acpi_ras2_patrol_scrub_parameter to
   struct acpi_ras2_patrol_scrub_param
4. In struct acpi_ras2_patrol_scrub_parameter, fields:
   - patrol_scrub_command to command
   - requested_address_range to req_addr_range
   - actual_address_range to actl_addr_range
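For reference, a sketch of the shortened layouts (only the renamed fields
from the list above are shown; the member types and the omitted fields are
assumptions, not taken from this log):

    #include <linux/types.h>

    struct acpi_ras2_shmem {
            /* ... preceding members unchanged ... */
            u8  set_caps[16];        /* was set_capabilities[16] */
            u16 num_param_blks;      /* was num_parameter_blocks */
            u32 set_caps_status;     /* was set_capabilities_status */
    };

    struct acpi_ras2_patrol_scrub_param {
            /* ... preceding members unchanged ... */
            u16 command;             /* was patrol_scrub_command */
            u64 req_addr_range[2];   /* was requested_address_range */
            u64 actl_addr_range[2];  /* was actual_address_range */
            /* ... remaining members unchanged ... */
    };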
Link: https://github.com/acpica/acpica/commit/2c8a38f7
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1942053.CQOukoFCf9@rjwysocki.net
|
|
ACPICA commit 878823ca20f1987cba0c9d4c1056be0d117ea4fe
In order to distinguish character arrays from C Strings (i.e. strings with
a terminating NUL character), add support for the "nonstring" attribute
provided by GCC. (A better name might be "ACPI_NONCSTRING", but that's
the attribute name, so stick to the existing naming convention.)
GCC 15's -Wunterminated-string-initialization will warn about truncation
of the NUL byte for string initializers unless the destination is marked
with "nonstring". Prepare for applying this attribute to the project.
Link: https://github.com/acpica/acpica/commit/878823ca
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1841930.VLH7GnMWUR@rjwysocki.net
Signed-off-by: Kees Cook <kees@kernel.org>
[ rjw: Pick up the tag from Kees ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ACPICA commit dddd9270531d74af523afa68515d8aae6a18bbe0
The ERDT table (and its many subtables) enumerate capabilities
and methods for Intel Resource Director Technology to monitor
and control L3 cache allocation and memory bandwidth by CPU
cores and IO devices.
Structure defined in the Intel Resource Director Technology (RDT)
Architecture specification downloadable from www.intel.com/sdm
Link: https://github.com/acpica/acpica/commit/dddd9270
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3296755.5fSG56mABF@rjwysocki.net
|
|
ACPICA commit b8713f71b4023a0396fe61503bbbf5226e5eed1b
Some ERDT subtables have 11 and 24 byte reserved fields.
Add the ACPI_DMT_BUF11 and ACPI_DMT_BUF24 types to describe these reserved
fields in struct acpi_dmtable_info structures.
Shorten the ACPI_SUBTABLE_HEADER_16 name to ACPI_SUBTBL_HDR
Link: https://github.com/acpica/acpica/commit/b8713f71
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3643286.iIbC2pHGDl@rjwysocki.net
|
|
ACPICA commit 022e2e4169841f429dbda677a4780830bf4c2177
1) Added source specification to MRRM table comment in actbl2.h
2) Shorten typedef from ACPI_TABLE_MRRM_MEM_RANGE_ENTRY to
struct acpi_mrrm_mem_range_entry
3) Add new typedefs to source/tools/acpisrc/astable.c
4) Fix cut and paste errors in acpi_dm_table_info_mrrm0[] definition
5) Fix indent and source code style errors in actbl2.h
6) The base/length fields in the memory range structure are system
memory addresses, not "MMIO". Update the acpi_dm_table_info_mrrm0[]
strings.
7) Add main/sub table comments to acpi_dm_dump_mrrm() and dt_compile_mrrm()
Link: https://github.com/acpica/acpica/commit/022e2e41
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2018955.PYKUYFuaPT@rjwysocki.net
|
|
ACPICA commit 73c32bc89cad64ab19c1231a202361e917e6823c
RISC-V IO Mapping Table (RIMT) is a new static table defined for RISC-V
to communicate IOMMU information to the OS. The specification for RIMT
is available at [1]. Add structure definitions for RIMT.
Link: https://github.com/riscv-non-isa/riscv-acpi-rimt [1]
Link: https://github.com/acpica/acpica/commit/73c32bc8
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/10665648.nUPlyArG6x@rjwysocki.net
|
|
ACPICA commit 04fd53b2647b9f6f98cfca551383689cb3b59362
The MRRM table describes association between physical address ranges
and "region numbers".
Structure defined in the Intel Resource Director Technology (RDT)
Architecture specification downloadable from www.intel.com/sdm
Link: https://github.com/acpica/acpica/commit/04fd53b2
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3372188.44csPzL39Z@rjwysocki.net
|
|
ACPICA commit 52840d3826bd7e183fcb555e044e190aea0b5021
New MRRM tables can have subtables that are larger than 255 bytes.
Add a new header typedef that uses u16 for Length. Could be
backported to acpi_aspt_header, struct acpi_dmar_header, struct acpi_nfit_header,
struct acpi_prmt_module_header, struct acpi_prmt_module_info. Will be used for
upcoming ERDT table.
MRRM table has a 26-byte reserved section in header. Add ACPI_DMT_BUF26
to describe this in struct acpi_dmtable_info.
Link: https://github.com/acpica/acpica/commit/52840d38
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3005638.e9J7NaK4W3@rjwysocki.net
|
|
ACPICA commit af51f730e0bccf789686cea68e116d5f0b27aacb
Added in revision 3.4 of the VT-d spec. To support SIDP, part of the
previously reserved field in the device scope structure was used to
create a 1-byte "Flags" field.
Link: https://github.com/acpica/acpica/commit/af51f730
Signed-off-by: Alexey Neyman <aneyman@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2239745.irdbgypaU6@rjwysocki.net
|
|
We need the driver core fix in here as well for testing
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When restarting a CPU powered by the PCA9450 power management IC, it
is beneficial to use the PCA9450 to power cycle the CPU and all its
connected peripherals to start up in a known state. The PCA9450 features
a cold start procedure initiated by an I2C command.
Add a restart handler so that the PCA9450 is used to restart the CPU.
The restart handler sends command 0x14 to the SW_RST register,
initiating a cold reset (Power recycle all regulators except LDO1, LDO2
and CLK_32K_OUT).
As the PCA9450 is a PMIC specific for the i.MX8M family CPU, the restart
handler priority is set just slightly higher than imx2_wdt and the PSCI
restart handler. This makes sure this restart handler takes precedence.
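A rough sketch of such a handler (the register offset and the private
struct below are placeholders, not the actual driver code):

    #include <linux/notifier.h>
    #include <linux/reboot.h>
    #include <linux/regmap.h>

    /* Driver-private state; the real driver's struct has more members. */
    struct pca9450 {
            struct regmap *regmap;
    };

    #define PCA9450_REG_SWRST  0x0b  /* placeholder offset */

    static int pca9450_restart_handler(struct sys_off_data *data)
    {
            struct pca9450 *pca9450 = data->cb_data;

            /* 0x14 requests a cold reset: power recycle all regulators
             * except LDO1, LDO2 and CLK_32K_OUT. */
            regmap_write(pca9450->regmap, PCA9450_REG_SWRST, 0x14);

            return NOTIFY_DONE;
    }

The handler would then be registered at probe time, e.g. via
devm_register_sys_off_handler(), at a priority just above the imx2_wdt and
PSCI handlers.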
Signed-off-by: Paul Geurts <paul.geurts@prodrive-technologies.com>
Link: https://patch.msgid.link/20250505115936.1946891-1-paul.geurts@prodrive-technologies.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
snd_soc_unregister_component_by_driver()
We have below 2 functions, but these are very similar
(A) snd_soc_unregister_component_by_driver()
(B) snd_soc_unregister_component()
(A)	void snd_soc_unregister_component_by_driver(...)
	{
		...
(a)		mutex_lock(&client_mutex);
(X)		component = snd_soc_lookup_component_nolocked(dev, component_driver->name);
		if (!component)
			goto out;
(b)		snd_soc_del_component_unlocked(component);
	out:
(c)		mutex_unlock(&client_mutex);
	}

(B)	void snd_soc_unregister_component(...)
	{
(a)		mutex_lock(&client_mutex);
		while (1) {
(X)			struct snd_soc_component *component = snd_soc_lookup_component_nolocked(dev, NULL);
(b)			if (!component)
				break;

			snd_soc_del_component_unlocked(component);
		}
(c)		mutex_unlock(&client_mutex);
	}
Both are calling lock (a), find component and remove it (b), and
unlock (c). The big diff is whether use driver name for lookup() or
not (X).
Merge these into a single loop-based implementation in
snd_soc_unregister_component_by_driver() (A), so that
snd_soc_unregister_component() (B) can become a macro wrapping it.
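A sketch of what the merged result in soc-core.c could look like (the
exact upstream shape may differ):

    void snd_soc_unregister_component_by_driver(struct device *dev,
                    const struct snd_soc_component_driver *component_driver)
    {
            const char *name = component_driver ? component_driver->name : NULL;

            mutex_lock(&client_mutex);
            while (1) {
                    struct snd_soc_component *component =
                            snd_soc_lookup_component_nolocked(dev, name);

                    if (!component)
                            break;

                    snd_soc_del_component_unlocked(component);
            }
            mutex_unlock(&client_mutex);
    }

    /* in the header */
    #define snd_soc_unregister_component(dev) \
            snd_soc_unregister_component_by_driver(dev, NULL)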
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/87h61qy2vn.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc into soc/drivers
These are updates from Marek Behún for the cznic platform drivers:
This series adds support for generating ECDSA signatures with hardware
stored private key on Turris Omnia and Turris MOX.
This ability is exposed via the keyctl() syscall.
* 'cznic/platform' of https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
platform: cznic: use ffs() instead of __bf_shf()
firmware: turris-mox-rwtm: fix building without CONFIG_KEYS
platform: cznic: fix function parameter names
firmware: turris-mox-rwtm: Add support for ECDSA signatures with HW private key
firmware: turris-mox-rwtm: Drop ECDSA signatures via debugfs
platform: cznic: turris-omnia-mcu: Add support for digital message signing with HW private key
platform: cznic: Add keyctl helpers for Turris platform
platform: cznic: turris-omnia-mcu: Refactor requesting MCU interrupt
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Handle soc servicing events which require the rdma auxiliary device resources to
be cleaned up during a suspend, and re-initialized during a resume.
Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1746633545-17653-5-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Initialize gdma device for rdma inside mana module.
For each gdma device, initialize an auxiliary ib device.
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1746633545-17653-2-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Some batteries can signal when an internal fuse was blown. In such a
case POWER_SUPPLY_HEALTH_DEAD is too vague for userspace applications
to perform meaningful diagnostics.
Additionally some batteries can also signal when some of their
internal cells are imbalanced. In such a case returning
POWER_SUPPLY_HEALTH_UNSPEC_FAILURE is again too vague for userspace
applications to perform meaningful diagnostics.
Add new health status values for both cases.
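For example, the additions to the power_supply_health enum could look like
this (the value names are assumptions based on the description above):

    enum power_supply_health {
            /* ... existing values such as POWER_SUPPLY_HEALTH_DEAD ... */
            POWER_SUPPLY_HEALTH_BLOWN_FUSE,
            POWER_SUPPLY_HEALTH_CELL_IMBALANCE,
    };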
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20250429003606.303870-1-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Reuse newly added DMA API to cache IOVA and only link/unlink pages
in fast path for UMEM ODP flow.
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
As a preparation to remove dma_list, store access mask in PFN pointer
and not in dma_addr_t.
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
HMM callers use PFN list to populate range while calling
to hmm_range_fault(), the conversion from PFN to DMA address
is done by the callers with help of another DMA list. However,
it is wasteful on any modern platform and by doing the right
logic, that DMA list can be avoided.
Provide generic logic to manage these lists and give an interface to
map/unmap PFNs to DMA addresses, without requiring the callers to be
experts in the DMA core API.
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Introduce a new sticky flag (HMM_PFN_DMA_MAPPED) which isn't overwritten
by HMM range fault. This flag allows users to tag specific PFNs as having
already been DMA mapped.
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Currently the only efficient way to map a complex memory description through
the DMA API is by using the scatter list APIs. The SG APIs are unique in that
they efficiently combine the two fundamental operations of sizing and allocating
a large IOVA window from the IOMMU and processing all the per-address
swiotlb/flushing/p2p/map details.
This uniqueness has been a long standing pain point as the scatter list API
is mandatory, but expensive to use. It prevents any kind of optimization or
feature improvement (such as avoiding struct page for P2P) due to the
impossibility of improving the scatter list.
Several approaches have been explored to expand the DMA API with additional
scatterlist-like structures (BIO, rlist); instead, split up the DMA API
to allow callers to bring their own data structure.
The API is split up into parts:
- Allocate IOVA space:
To do any pre-allocation required. This is done based on the caller
supplying some details about how much IOMMU address space it would need
in worst case.
- Map and unmap relevant structures to pre-allocated IOVA space:
Perform the actual mapping into the pre-allocated IOVA. This is very
similar to dma_map_page().
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently the full set of crypto self-tests requires
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y. This is problematic in two ways.
First, developers regularly overlook this option. Second, the
description of the tests as "extra" sometimes gives the impression that
it is not required that all algorithms pass these tests.
Given that the main use case for the crypto self-tests is for
developers, make enabling CONFIG_CRYPTO_SELFTESTS=y just enable the full
set of crypto self-tests by default.
The slow tests can still be disabled by adding the command-line
parameter cryptomgr.noextratests=1, soon to be renamed to
cryptomgr.noslowtests=1. The only known use case for doing this is for
people trying to use the crypto self-tests to satisfy the FIPS 140-3
pre-operational self-testing requirements when the kernel is being
validated as a FIPS 140-3 cryptographic module.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
crypto_get_default_null_skcipher() and
crypto_put_default_null_skcipher() are no longer used, so remove them.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
For copying data between two scatterlists, just use memcpy_sglist()
instead of the so-called "null skcipher". This is much simpler.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add explicit array bounds to the function prototypes for the parameters
that didn't already get handled by the conversion to use chacha_state:
- chacha_block_*():
Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE].
- hchacha_block_*():
Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS].
- chacha_init():
Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'.
Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'.
No functional changes. This just makes it clear when fixed-size arrays
are expected.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Now that the ChaCha state matrix is strongly-typed, add a helper
function chacha_zeroize_state() which zeroizes it. Then convert all
applicable callers to use it instead of direct memzero_explicit. No
functional changes.
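The helper itself is essentially a typed wrapper (sketch):

    static inline void chacha_zeroize_state(struct chacha_state *state)
    {
            memzero_explicit(state, sizeof(*state));
    }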
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The ChaCha state matrix is 16 32-bit words. Currently it is represented
in the code as a raw u32 array, or even just a pointer to u32. This
weak typing is error-prone. Instead, introduce struct chacha_state:
	struct chacha_state {
		u32 x[16];
	};
Convert all ChaCha and HChaCha functions to use struct chacha_state.
No functional changes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Merge series from Bence Csókás <csokas.bence@prolan.hu>:
The probe() function of the atmel-quadspi driver got quite convoluted,
especially since the addition of SAMA7G5 support, that was forward-ported
from an older vendor kernel. During the port, a bug was introduced, where
the PM get() and put() calls were imbalanced. To alleviate this - and
similar problems in the future - an effort was made to migrate as many
functions as possible, to their devm_ managed counterparts. The few
functions, which did not yet have a devm_ variant, are added in patch 1 of
this series. Patch 2 then uses these APIs to fix the probe() function.
|
|
The 'extern' keyword is redundant for function prototypes. list.h never
used them and new code in general is better without them.
Link: https://lkml.kernel.org/r/20250502121742.3997529-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sprint_OID() was added as part of 2012's commit 4f73175d0375 ("X.509: Add
utility functions to render OIDs as strings") but it hasn't been used.
Remove it.
Note that there's also 'sprint_oid' (lower case) which is used in a lot of
places; that's left as is except for fixing its case in a comment.
Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In PTR_IF() description the text refers to the parameter as (ptr) while
the kernel-doc format asks for @ptr. Fix this accordingly.
Link: https://lkml.kernel.org/r/20250428072737.3265239-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Deduplicate the same functionality implemented in several places by
moving the cmp_int() helper macro into linux/sort.h.
The macro performs a three-way comparison of the arguments mostly useful
in different sorting strategies and algorithms.
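The helper boils down to a branchless three-way comparison (sketch of the
definition being consolidated):

    /* Evaluates to 1, 0 or -1 for l > r, l == r and l < r respectively. */
    #define cmp_int(l, r)   ((l > r) - (l < r))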
Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cem@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Coly Li <colyli@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This refines commit c03567a8e8d5 ("include/linux/compiler.h: don't perform
compiletime_assert with -O0") and restores #ifdef __OPTIMIZE__ symmetry by
evaluating the 'condition' variable in both compile-time variants of
__compiletime_assert().
As __OPTIMIZE__ is always true by default, this commit does not change
anything by default. But it fixes warnings with _non-default_ CFLAGS like
for instance this:
make CFLAGS_tcp.o='-Og -U__OPTIMIZE__'
from net/ipv4/tcp.c:273:
include/net/sch_generic.h: In function `qdisc_cb_private_validate':
include/net/sch_generic.h:511:30:
error: unused variable `qcb' [-Werror=unused-variable]
{
struct qdisc_skb_cb *qcb;
BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*qcb));
...
}
[akpm@linux-foundation.org: regularize comment layout, reflow comment]
Link: https://lkml.kernel.org/r/20250424194048.652571-1-marc.herbert@linux.intel.com
Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Marco Elver <elver@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tony Ambardar <tony.ambardar@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are some spelling mistakes of 'previlege' in comments which
should be 'privilege'.
Fix them and add it to scripts/spelling.txt.
The typo in arch/loongarch/kvm/main.c was corrected by a different
patch [1] and is therefore not included in this submission.
[1]. https://lore.kernel.org/all/20250420142208.2252280-1-wheatfox17@icloud.com/
Link: https://lkml.kernel.org/r/46AD404E411A4BAC+20250421074910.66988-1-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The last use of relay_late_setup_files() was removed in 2018 by commit
2b47733045aa ("drm/i915/guc: Merge log relay file and channel creation")
Remove it and the helper it used.
relay_late_setup_files() was used for eventually registering 'buffer only'
channels. With it gone, delete the docs that explain how to do that.
Which suggests it should be possible to lose the 'has_base_filename'
flags.
(Are there any other uses??)
Link: https://lkml.kernel.org/r/20250418234932.490863-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
rio_request_dma() and rio_dma_prep_slave_sg() were added in 2012 by commit
e42d98ebe7d7 ("rapidio: add DMA engine support for RIO data transfers")
but never used.
rio_find_mport() last use was removed in 2013 by commit 9edbc30b434f
("rapidio: update enumerator registration mechanism")
rio_unregister_scan() was added in 2013 by commit a11650e11093 ("rapidio:
make enumeration/discovery configurable") but never used.
Remove them.
Link: https://lkml.kernel.org/r/20250419203012.429787-3-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sg_next() is a short function called frequently in I/O paths. Define it
in the header file so it can be inlined into its callers.
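For reference, the function is small enough to inline; its logic is roughly:

    static inline struct scatterlist *sg_next(struct scatterlist *sg)
    {
            if (sg_is_last(sg))
                    return NULL;

            sg++;
            if (unlikely(sg_is_chain(sg)))
                    sg = sg_chain_ptr(sg);

            return sg;
    }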
Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.
Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs. To address this, we
introduce a last_holder field to the semaphore structure, which is updated
when a task successfully calls down() and cleared during up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
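A sketch of the structure change and the update points described above (the
config guard name is an assumption):

    struct semaphore {
            raw_spinlock_t          lock;
            unsigned int            count;
            struct list_head        wait_list;
    #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
            unsigned long           last_holder;  /* set in down(), cleared in up() */
    #endif
    };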
With this change, the hung task detector can now show blocker task's info
like below:
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1
[Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0
[Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80
[Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90
[Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280
[Tue Apr 8 12:19:07 2025] down+0x53/0x70
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0
[Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10
[Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "hung_task: extend blocking task stacktrace dump to
semaphore", v5.
Inspired by mutex blocker tracking[1], this patch series extend the
feature to not only dump the blocker task holding a mutex but also to
support semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
With this change, the hung task detector can now show blocker task's info
like below:
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1
[Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0
[Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80
[Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90
[Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280
[Tue Apr 8 12:19:07 2025] down+0x53/0x70
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0
[Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10
[Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
This patch (of 3):
This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.
The blocker field can store both the lock address and the lock type, with
the LSB used to encode the type as Masami suggested, making it easier to
extend the feature to cover other types of locks.
Also, once the lock type is determined, we can directly extract the
address and cast it to a lock pointer ;)
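A sketch of that encoding (the constants and helpers below are illustrative
assumptions, not necessarily the names used in the patch):

    #define BLOCKER_TYPE_MUTEX      0x0UL
    #define BLOCKER_TYPE_SEM        0x1UL
    #define BLOCKER_TYPE_MASK       0x1UL

    /* store the lock address with the type tag in the LSB */
    static inline unsigned long blocker_encode(void *lock, unsigned long type)
    {
            return (unsigned long)lock | type;
    }

    /* once the type is known, mask the tag off and cast back to the lock */
    static inline void *blocker_to_lock(unsigned long blocker)
    {
            return (void *)(blocker & ~BLOCKER_TYPE_MASK);
    }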
Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com
Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com [1]
Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While the natural choice of PTR_IF() is kconfig.h, the latter is too broad
to include C code and actually the macro was moved out from there in the
past. But kernel.h is not a good choice for that either. Move it to
util_macros.h. Do the same for u64_to_user_ptr().
While moving, add necessary documentation.
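For context, PTR_IF() itself boils down to (sketch):

    #define PTR_IF(cond, ptr)       ((cond) ? (ptr) : NULL)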
Link: https://lkml.kernel.org/r/20250324105228.775784-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "kernel.h: Move out a couple of macros and constants".
kernel.h hosts a couple of macros and constants that may be better placed.
Do that. Also add missing documentation. No functional changes
intended.
This patch (of 2):
Headers shouldn't be forced to include <linux/kernel.h> just to
gain these simple constants.
Link: https://lkml.kernel.org/r/20250324105228.775784-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20250324105228.775784-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove __HAVE_ARCH_KSTACK_END as it has been obsolete since the removal of
the metag architecture in v4.17.
Link: https://lkml.kernel.org/r/20250403-kstack-end-v1-1-7798e71f34d1@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-2-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is useful to be able to access current->mm at task exit to, say, record
a bunch of VMA information right before the task exits (e.g., for stack
symbolization reasons when dealing with short-lived processes that exit in
the middle of profiling session). Currently, trace_sched_process_exit()
is triggered after exit_mm() which resets current->mm to NULL making this
tracepoint unsuitable for inspecting and recording task's
mm_struct-related data when tracing process lifetimes.
There is a particularly suitable place, though, right after
taskstats_exit() is called, but before we do exit_mm() and other exit_*()
resource teardowns. taskstats performs a similar kind of accounting that
some applications do with BPF, and so co-locating them seems like a good
fit. So that's where trace_sched_process_exit() is moved with this patch.
Also, existing trace_sched_process_exit() tracepoint is notoriously
missing `group_dead` flag that is certainly useful in practice and some of
our production applications have to work around this. So plumb
`group_dead` through while at it, to have a richer and more complete
tracepoint.
Note that we can't use sched_process_template anymore, and so we use
TRACE_EVENT()-based tracepoint definition. But all the field names and
order, as well as assign and output logic remain intact. We just add one
extra field at the end in backwards-compatible way.
[andrii@kernel.org: document sched_process_exit and sched_process_template relation]
Link: https://lkml.kernel.org/r/20250403174120.4087794-1-andrii@kernel.org
Link: https://lkml.kernel.org/r/20250402180925.90914-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently the VMA and mmap locking logic is entangled in two of the most
overwrought files in mm - include/linux/mm.h and mm/memory.c. Separate
this logic out so we can more easily make changes and create an
appropriate MAINTAINERS entry that spans only the logic relating to
locking.
This should have no functional change. Care is taken to avoid dependency
loops, we must regrettably keep release_fault_lock() and
assert_fault_locked() in mm.h as a result due to the dependence on the
vm_fault type.
Additionally we must declare rcuwait_wake_up() manually to avoid a
dependency cycle on linux/rcuwait.h.
Additionally move the nommu implementation of lock_mm_and_find_vma() to
mmap_lock.c so everything lock-related is in one place.
Link: https://lkml.kernel.org/r/bec6c8e29fa8de9267a811a10b1bdae355d67ed4.1744799282.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
free_page_and_swap_cache() takes a struct page pointer as input parameter,
but it will immediately convert it to folio and all operations following
within use folio instead of page. It makes more sense to pass in folio
directly.
Convert free_page_and_swap_cache() to free_folio_and_swap_cache() to
consume folio directly.
Link: https://lkml.kernel.org/r/20250416201720.41678-1-nifan.cxl@gmail.com
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to support rebalancing and spanning stores using less than the
worst case number of nodes, we need to track more than just the vacant
height. Using only vacant height to reduce the worst case maple node
allocation count can lead to a shortcoming of nodes in the following
scenarios.
For rebalancing writes, when a leaf node becomes insufficient, it may be
combined with a sibling into a single node. This means that the parent
node which has entries for this children will lose one entry. If this
parent node was just meeting the minimum entries, losing one entry will
now cause this parent node to be insufficient. This leads to a cascading
operation of rebalancing at different levels and can lead to more node
allocations than simply using vacant height can return.
For spanning writes, a similar situation occurs. At the location at which
a spanning write is detected, the number of ancestor nodes may similarly
need to be rebalanced into a smaller number of nodes and the same cascading
situation could occur.
To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient. This means even if a rebalance occurs to a
child of this node, it currently has enough entries that it can lose one
without any further action. This field is stored in the maple write state
as sufficient height. In mas_prealloc_calc() when figuring out how many
nodes to allocate, we check if the vacant node is lower in the tree than a
sufficient node (has a larger value). If it is, we cannot use the vacant
height and must use the difference between the height and the sufficient height as
the basis for the number of nodes needed.
An off by one bug was also discovered in mast_overflow() where it is using
>= rather than >. This caused extra iterations of the
mas_spanning_rebalance() loop and led to unneeded allocations. A test is
also added to check the number of allocations is correct.
Link: https://lkml.kernel.org/r/20250410191446.2474640-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to determine the store type for a maple tree operation, a walk of
the tree is done through mas_wr_walk(). This function descends the tree
until a spanning write is detected or we reach a leaf node. While
descending, keep track of the height at which we encounter a node with
available space. This is done by checking if mas->end is less than the
number of slots a given node type can fit.
Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new nodes.
Update mas_prealloc_calc() to consider the vacant height and reduce the
number of worst-case allocations.
Rebalancing and spanning stores are not supported and fall back to using
the full height of the tree for allocations.
Update preallocation testing assertions to take into account vacant
height.
Link: https://lkml.kernel.org/r/20250410191446.2474640-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Split page table locks are not used for pgtables associated to init_mm, at
any level. pte_alloc_kernel() does not call ptlock_init() as a result.
There are however no separate alloc/free functions for kernel PMDs, and
pmd_ptlock_init() is called unconditionally. When ALLOC_SPLIT_PTLOCKS is
true (e.g. 32-bit architectures or if CONFIG_PREEMPT_RT is selected),
this results in unnecessary dynamic memory allocation every time a kernel
PMD is allocated.
Now that pagetable_pmd_ctor() is passed the associated mm, we can easily
remove this overhead by skipping pmd_ptlock_init() if the pgtable is
associated to init_mm. No special-casing is needed on the dtor path, as
ptlock_free() is already called unconditionally for all levels.
(ptlock_free() is a no-op unless a ptlock was allocated for the given
PTP.)
Link: https://lkml.kernel.org/r/20250408095222.860601-8-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <x86@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|