Age | Commit message (Collapse) | Author |
|
During the CPU offlining process, the various timer tick features are
shut down from scattered places, sometimes from teardown callbacks on
stop machine, sometimes through explicit calls, sometimes from the
control CPU after the CPU died. The reason why these shutdown operations
are spread around is not always clear and it makes the tick lifecycle
hard to follow.
The tick should be shut down in order from highest to lowest level:
On stop machine from the dying CPU (high-level):
1) Hand-over the timekeeping duty (tick_handover_do_timer())
2) Cancel the tick implementation called by the clockevent callback
(tick_cancel_sched_timer())
3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())
On stop machine from the dying CPU (low-level):
4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)
From the control CPU after the CPU died (low-level):
5) Shutdown/unregister/cleanup clockevents for the dead CPU
(tick_cleanup_dead_cpu())
Instead the current order is 2, 4 (both from CPU hotplug states), then
1 and 3 through direct calls. This layout and order don't make much
sense. The operations 1, 2, 3 should be gathered together and in order.
Sort this situation with creating a new TICK shut-down CPU hotplug state
and start with introducing the timekeeping duty hand-over there. The
state must precede hrtimers migration because the tick hrtimer will be
stopped from it in a further patch.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-8-frederic@kernel.org
|
|
Minor typo in the suspend description.
Signed-off-by: Marcel Hamer <marcel.hamer@windriver.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240216155022.79371-1-marcel.hamer@windriver.com
|
|
Use existing typedef for dma_filter_fn to avoid duplicating type
definition.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240208202113.630190-1-krzysztof.kozlowski@linaro.org
|
|
The SLAB_KASAN flag prevents merging of caches in some configurations,
which is handled in a rather complicated way via kasan_never_merge().
Since we now have a generic SLAB_NO_MERGE flag, we can instead use it
for KASAN caches in addition to SLAB_KASAN in those configurations,
and simplify the SLAB_NEVER_MERGE handling.
Tested-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The values of SLAB_ cache creation flags are defined by hand, which is
tedious and error-prone. Use an enum to assign the bit number and a
__SLAB_FLAG_BIT() macro to #define the final flags.
This renumbers the flag values, which is OK as they are only used
internally.
Also define a __SLAB_FLAG_UNUSED macro to assign value to flags disabled
by their respective config options in a unified and sparse-friendly way.
Reviewed-and-tested-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed. SLUB instead relies on the page allocator's NUMA policies.
Change the flag's value to 0 to free up the value it had, and mark it
for full removal once all users are gone.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20240131172027.10f64405@gandalf.local.home/
Reviewed-and-tested-by: Xiongwei Song <xiongwei.song@windriver.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the
tb_domain_type, tb_retimer_type, tb_switch_type, usb4_port_device_type,
tb_service_type and tb_xdomain_type variables to be constant structures as
well, placing it into read-only memory which can not be modified at
runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
We need it here for the USB fixes, and it resolves a merge conflict as
reported in linux-next in drivers/usb/roles/class.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The xlate callbacks are supposed to translate of_phandle_args to proper
provider without modifying the of_phandle_args. Make the argument
pointer to const for code safety and readability.
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Thierry Reding <treding@nvidia.com> # Tegra
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Alim Akhtar <alim.akhtar@samsung.com> # Samsung
Link: https://lore.kernel.org/r/20240220072213.35779-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
|
|
Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock. This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch. However, such deadlocks are becoming
more frequent. In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.
In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
just as well take advantage of that fact.
This commit therefore adds the data structures that will be needed
to rely on these quiescent states and to eliminate these deadlocks.
Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
Introduce a new notification type UBI_VOLUME_SHUTDOWN to inform users
that a volume is just about to be removed.
This is needed because users (such as the NVMEM subsystem) expect that
at the time their removal function is called, the parenting device is
still available (for removal of sysfs nodes, for example, in case of
NVMEM which otherwise WARNs on volume removal).
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Add min_alarm and max_alarm attributes for humidityX to support devices
that can generate these alarms.
Such attributes already exist for other magnitudes such as tempX.
Tested with a ChipCap 2 temperature-humidity sensor.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20240130-topic-chipcap2-v6-2-260bea05cf9b@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU pathwalk fixes from Al Viro:
"We still have some races in filesystem methods when exposed to RCU
pathwalk. This series is a result of code audit (the second round of
it) and it should deal with most of that stuff.
Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
Up to maintainers (a note for NTFS folks - when documentation says
that a method may not block, it *does* imply that blocking allocations
are to be avoided. Really)"
[ More explanations for people who aren't familiar with the vagaries of
RCU path walking: most of it is hidden from filesystems, but if a
filesystem actively participates in the low-level path walking it
needs to make sure the fields involved in that walk are RCU-safe.
That "actively participate in low-level path walking" includes things
like having its own ->d_hash()/->d_compare() routines, or by having
its own directory permission function that doesn't just use the common
helpers. Having a ->d_revalidate() function will also have this issue.
Note that instead of making everything RCU safe you can also choose to
abort the RCU pathwalk if your operation cannot be done safely under
RCU, but that obviously comes with a performance penalty. One common
pattern is to allow the simple cases under RCU, and abort only if you
need to do something more complicated.
So not everything needs to be RCU-safe, and things like the inode etc
that the VFS itself maintains obviously already are. But these fixes
tend to be about properly RCU-delaying things like ->s_fs_info that
are maintained by the filesystem and that got potentially released too
early. - Linus ]
* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4_get_link(): fix breakage in RCU mode
cifs_get_link(): bail out in unsafe case
fuse: fix UAF in rcu pathwalks
procfs: make freeing proc_fs_info rcu-delayed
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
nfs: fix UAF on pathwalk running into umount
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
affs: free affs_sb_info with kfree_rcu()
rcu pathwalk: prevent bogus hard errors from may_lookup()
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
|
|
Pull vfs fixes from Al Viro:
"A couple of fixes - revert of regression from this cycle and a fix for
erofs failure exit breakage (had been there since way back)"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
erofs: fix handling kern_mount() failure
Revert "get rid of DCACHE_GENOCIDE"
|
|
http://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: 1st set of new device support, features and cleanup for 6.9
IIO Backend support
===================
New approach from Nuno Sa to the problem of reuse of drivers with
IIO devices that are actually the combination of a highspeed chip
and an FPGA core handling the data capture and flows. It will hopefully
also apply to some other split designs. The ad9467 and axi-adi drivers
are converted over to this framework.
New device support
==================
adi,admfm2000
- New driver for this dual microwave down converter.
ams,as73211
- Add support for as7331 UV sensor.
richtek,rtq6056
- Add support for related parts RTQ6053 and RTQ6059
st,lsm6dsx
- Add ASM330LHHXG1 accelerometer and gyro support (mainly IDs)
ti,ads1298
- New driver for this medical ADC.
Features
========
tests
- Unit tests for the gain-time-scale helper library.
bosch,bmi088
- I2C support.
bosh,bmi160
- Add 10EC5280 ACPI ID. Used in a number of devices that won't get fixed.
The ID is actually a PCI ID belonging to realtech. No response was received
to earlier attempts to notify them of this.
The manufacturers of some devices have replied to say they will not fix
this incorrect ID. Add the ID and hope it isn't a problem.
bosch,bmi323
- Add BOSC0200 ACPI ID. Note this is a duplicate of one in the bmc150
driver (it appears these parts share a windows driver).
Both drivers perform an ID check that is safe on the other part before
successfully probing.
hid-sensors-als
- Add color temperature and chromaticity support. Note this is a replacement
for the series reverted in 6.8 that correctly handles all the potential
channel combinations.
honeywell,hsc030pa
- Triggered buffer support (after driver cleanup).
honeywell,mprls00025pa
- Improved error handling.
- New DT binding to allow use of part number triplet as provided in data sheet
to specify equivalent of most of the binding more efficiently.
- SPI support.
memsic,mxc4005
- ACPI ID MDA6655 as seen in the Chuwi Minibook X 2023
ti,hdc3020
- Add threshold event support (after some driver cleanup)
veml,vcnl4000
- Switch to high resolution proximity measurement.
Cleanup
=======
Various minor typo fixes and better use of defines etc.
Treewide
- Stop using ACPI_PTR(). The savings in space are small and not worth
the complexity of __maybe_unused of ifdef guards. To avoid use in
new IIO drivers based on copy and paste, clean it out.
- cleanup.h based handling of iio_device_claim_direct_mode()/
iio_device_release_direct_mode() using scope_cond_guard().
In many drivers this is combined with other automated cleanup
to give maximum simplifications.
An initial set of drivers are converted over to this infrastructure.
Tools
- Use rewinddir() instead of seekdir() to return to start of file.
core
- Make iio_bus_type constant.
adi,ad16475
- Use irq_get_trigger_type() instead of opencoding.
adi,ad16480
- Use irq_get_trigger_type() instead of opencoding.
adi,ad-sigma-delta
- Avoid overwriting IRQ flags if provided by firmware.
ams,as73211
- Use IIO_VAL_FRACTIONAL for scales to simplify the code and potentially
improve accuracy.
gts-library
- Use a div64_u64() instead of a loop to do a division.
honeywell,mprls00025pa
- Clean up dt-binding doc.
- Drop defaults when DT binding not providing values. Very unlikely
these were useful given they were wrong for vast majority of supported
devices.
- Whitespace cleanup
miramems,da280
- Use i2c_get_match_data() to replace hand rolled ACPI matching code.
semtech,sx9324
- Avoid unnecessary copying of property strings.
st,lsm6dsx
- Improve docs, particularly wrt to making addition of new device
support less noisy.
st,lsm9ds0
- Use dev_err_probe() in all probe() error handling.
- Improved header includes.
- Tidy up termination of ID tables.
ti,ads1014
- Correct upper bound on PGA (wrong value had no actual impact)
ti,afe4403/4404
- devm_ useage to simplify error handling in probe() and allow() remove to
be dropped.
voltage-divider
- Add dt-binding for io-channel-cells to allow such a device to be both
an IIO consumer and IIO producer at the same time.
* tag 'iio-for-6.9a' of http://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (106 commits)
iio: imu: bmi323: Add ACPI Match Table
iio: accel: bmc150: Document duplicate ACPI entries with bmi323 driver
iio: adc: ti-ads1298: Add driver
dt-bindings: iio: adc: ti-ads1298: Add bindings
iio: pressure: hsc030pa add triggered buffer
iio: pressure: hsc030pa add mandatory delay
iio: pressure: hsc030pa: update datasheet URLs
iio: pressure: hsc030pa: include cleanup
iio: pressure: hsc030pa: use signed type to hold div_64() result
dt-bindings: iio: pressure: honeywell,hsc030pa.yaml add spi props
iio: st_sensors: lsm9ds0: Use common style for terminator in ID tables
iio: st_sensors: lsm9ds0: Don't use "proxy" headers
iio: st_sensors: lsm9ds0: Use dev_err_probe() everywhere
iio: adc: adi-axi-adc: move to backend framework
iio: adc: ad9467: convert to backend framework
iio: add the IIO backend framework
iio: buffer-dmaengine: export buffer alloc and free functions
of: property: add device link support for io-backends
dt-bindings: adc: axi-adc: update bindings for backend framework
dt-bindings: adc: ad9467: add new io-backend property
...
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-29-adbd023e19cc@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Move both of them to the private block header. There's no caller in the
tree anymore that uses them directly.
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-28-adbd023e19cc@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-27-adbd023e19cc@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-9-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-5-adbd023e19cc@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-4-adbd023e19cc@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add two new helpers to allow opening block devices as files.
This is not the final infrastructure. This still opens the block device
before opening a struct a file. Until we have removed all references to
struct bdev_handle we can't switch the order:
* Introduce blk_to_file_flags() to translate from block specific to
flags usable to pen a new file.
* Introduce bdev_file_open_by_{dev,path}().
* Introduce temporary sb_bdev_handle() helper to retrieve a struct
bdev_handle from a block device file and update places that directly
reference struct bdev_handle to rely on it.
* Don't count block device openes against the number of open files. A
bdev_file_open_by_{dev,path}() file is never installed into any
file descriptor table.
One idea that came to mind was to use kernel_tmpfile_open() which
would require us to pass a path and it would then call do_dentry_open()
going through the regular fops->open::blkdev_open() path. But then we're
back to the problem of routing block specific flags such as
BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste
FMODE_* flags every time we add a new one. With this we can avoid using
a flag bit and we have more leeway in how we open block devices from
bdev_open_by_{dev,path}().
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When we open block devices as files we want to make sure to not charge
them against the open file limit of the caller as that can cause
spurious failures.
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Some Exynos based SoCs like Tensor gs101 protect the PMU registers for
security hardening reasons so that they are only write accessible in el3
via an SMC call.
As most Exynos drivers that need to write PMU registers currently obtain a
regmap via syscon (phys, pinctrl, watchdog). Support for the above usecase
is implemented in this driver using a custom regmap similar to syscon to
handle the SMC call. Platforms that don't secure PMU registers, get a mmio
regmap like before. As regmaps abstract out the underlying register access
changes to the leaf drivers are minimal.
A new API exynos_get_pmu_regmap_by_phandle() is provided for leaf drivers
that currently use syscon_regmap_lookup_by_phandle(). This also handles
deferred probing.
Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
Tested-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20240220220613.797068-2-peter.griffin@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
|
|
GCC recently added option -fmin-function-alignment, which should appear
in GCC 14. Unlike -falign-functions, this option causes all functions to
be aligned at the specified value, including the cold ones.
In particular, when an arm64 kernel is built with
DYNAMIC_FTRACE_WITH_CALL_OPS=y, the 8-byte function alignment is
required for correct functionality. This was done by -falign-functions=8
and having workarounds in the kernel to force the compiler to follow
this alignment. The new -fmin-function-alignment option directly
guarantees it.
Detect availability of -fmin-function-alignment and use it instead of
-falign-functions when present. Introduce CC_HAS_SANE_FUNCTION_ALIGNMENT
and enable __cold to work as expected when it is set.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
server->flags
server->caps
*(server->io_stats)
and, worst of all, call
server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()). We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Intel VT-d fixes for nested domain handling:
- Cache invalidation for changes in a parent domain
- Dirty tracking setting for parent and nested domains
- Fix a constant-out-of-range warning
- ARM SMMU fixes:
- Fix CD allocation from atomic context when using SVA with SMMUv3
- Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
breaks the boot for Qualcomm MSM8996 devices
- Restore SVA handle sharing in core code as it turned out there are
still drivers relying on it
* tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/sva: Restore SVA handle sharing
iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
iommu/vt-d: Fix constant-out-of-range warning
iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
iommu/vt-d: Add missing dirty tracking set for parent domain
iommu/vt-d: Wrap the dirty tracking loop to be a helper
iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
iommu/vt-d: Add missing device iotlb flush for parent domain
iommu/vt-d: Update iotlb in nested domain attach
iommu/vt-d: Add missing iotlb flush for parent domain
iommu/vt-d: Add __iommu_flush_iotlb_psi()
iommu/vt-d: Track nested domains in parent
Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"A collection of significant fixes for the CXL subsystem.
The largest change in this set, that bordered on "new development", is
the fix for the fact that the location of the new qos_class attribute
did not match the Documentation. The fix ends up deleting more code
than it added, and it has a new unit test to backstop basic errors in
this interface going forward. So the "red-diff" and unit test saved
the "rip it out and try again" response.
In contrast, the new notification path for firmware reported CXL
errors (CXL CPER notifications) has a locking context bug that can not
be fixed with a red-diff. Given where the release cycle stands, it is
not comfortable to squeeze in that fix in these waning days. So, that
receives the "back it out and try again later" treatment.
There is a regression fix in the code that establishes memory NUMA
nodes for platform CXL regions. That has an ack from x86 folks. There
are a couple more fixups for Linux to understand (reassemble) CXL
regions instantiated by platform firmware. The policy around platforms
that do not match host-physical-address with system-physical-address
(i.e. systems that have an address translation mechanism between the
address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
has been softened to abort driver load rather than teardown the memory
range (can cause system hangs). Lastly, there is a robustness /
regression fix for cases where the driver would previously continue in
the face of error, and a fixup for PCI error notification handling.
Summary:
- Fix NUMA initialization from ACPI CEDT.CFMWS
- Fix region assembly failures due to async init order
- Fix / simplify export of qos_class information
- Fix cxl_acpi initialization vs single-window-init failures
- Fix handling of repeated 'pci_channel_io_frozen' notifications
- Workaround platforms that violate host-physical-address ==
system-physical address assumptions
- Defer CXL CPER notification handling to v6.9"
* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/acpi: Fix load failures due to single window creation failure
acpi/ghes: Remove CXL CPER notifications
cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
cxl/test: Add support for qos_class checking
cxl: Fix sysfs export of qos_class for memdev
cxl: Remove unnecessary type cast in cxl_qos_class_verify()
cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
cxl/region: Allow out of order assembly of autodiscovered regions
cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
x86/numa: Fix the sort compare func used in numa_fill_memblks()
x86/numa: Fix the address overlap check in numa_fill_memblks()
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
|
|
The new helper function is needed to help blk-mq check if it needs to
dispatch the softirq on another CPU to match the performance level the
IO requester is running at. This is important on HMP systems where not
all CPUs have the same compute capacity.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240223155749.2958009-2-qyousef@layalina.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The VM_ALLOW_ANY_UNCACHED flag is implemented for ARM64, allowing KVM
stage 2 device mapping attributes to use NormalNC rather than
DEVICE_nGnRE, which allows guest mappings supporting write-combining
attributes (WC). ARM does not architecturally guarantee this is safe,
and indeed some MMIO regions like the GICv2 VCPU interface can trigger
uncontained faults if NormalNC is used.
Even worse, the expectation is that there are platforms where even
DEVICE_nGnRE can allow uncontained faults in corner cases. Unfortunately
existing ARM IP requires platform integration to take responsibility to
prevent this.
To safely use VFIO in KVM the platform must guarantee full safety in the
guest where no action taken against a MMIO mapping can trigger an
uncontained failure. The assumption is that most VFIO PCI platforms
support this for both mapping types, at least in common flows, based
on some expectations of how PCI IP is integrated. This can be enabled
more broadly, for instance into vfio-platform drivers, but only after
the platform vendor completes auditing for safety.
The VMA flag VM_ALLOW_ANY_UNCACHED was found to be the simplest and
cleanest way to communicate the information from VFIO to KVM that
mapping the region in S2 as NormalNC is safe. KVM consumes it to
activate the code that does the S2 mapping as NormalNC.
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240224150546.368-3-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
A minimum sampling rate value of 10ms was introduced in:
commit cef9615a853e ("[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case")
The use of this value was removed in:
commit ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")
Remove:
- a comment referencing this value
- an unused macro associated to this value
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Refactor the code left in write_cache_pages into an iterator that the file
system can call to get the next folio for a writeback operation:
struct folio *folio = NULL;
while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
error = <do per-folio writeback>;
}
The twist here is that the error value is passed by reference, so that the
iterator can restore it when breaking out of the loop.
Handling of the magic AOP_WRITEPAGE_ACTIVATE value stays outside the
iterator and needs is just kept in the write_cache_pages legacy wrapper.
in preparation for eventually killing it off.
Heavily based on a for_each* based iterator from Matthew Wilcox.
Link: https://lkml.kernel.org/r/20240215063649.2164017-14-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a loop counter inside the folio_batch to let us iterate from 0-nr
instead of decrementing nr and treating the batch as a stack. It would
generate some very weird and suboptimal I/O patterns for page writeback to
iterate over the batch as a stack.
Link: https://lkml.kernel.org/r/20240215063649.2164017-11-hch@lst.de
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This simple helper will be the basis of the writeback iterator. To make
this work, we need to remember the current index and end positions in
writeback_control.
[hch@lst.de: heavily rebased, add helpers to get the tag and end index, don't keep the end index in struct writeback_control]
Link: https://lkml.kernel.org/r/20240215063649.2164017-9-hch@lst.de
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
[hch@lst.de: split from a larger patch]
Link: https://lkml.kernel.org/r/20240215063649.2164017-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extend DAMOS quota goal metric with system wide memory pressure stall
time. Specifically, the system level 'some' PSI for memory is used. The
target value can be set in microseconds. DAMOS measures the increased
amount of the PSI metric in last quota_reset_interval and use the ratio of
it versus the user-specified target PSI value as the score for the
auto-tuning feedback loop.
Link: https://lkml.kernel.org/r/20240219194431.159606-14-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
DAMOS quota auto-tuning asks users to assess the current tuned quota and
provide the feedback in a manual and repeated way. It allows users
generate the feedback from a source that the kernel cannot access, and
writing a script or a function for doing the manual and repeated feeding
is not a big deal. However, additional works are additional works, and it
could be more efficient if DAMOS could do the fetch itself, especially in
case of DAMON sysfs interface use case, since it can avoid the context
switches between the user-space and the kernel-space, though the overhead
would be only trivial in most cases. Also in many cases, feedbacks could
be made from kernel-accessible sources, such as PSI, CPU usage, etc. Make
the quota goal to support multiple types of metrics including such ones.
Link: https://lkml.kernel.org/r/20240219194431.159606-13-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
DAMOS quota auto-tuning feature let users to set the goal by providing a
function for getting the current score of the tuned quota. It allows
flexible goal setup, but only simple user-set quota is currently being
used. As a result, the only user of the DAMOS quota auto-tuning is using
a silly void pointer casting based score value passing function. Simplify
the interface and the user code by letting user directly set the target
and the current value.
Link: https://lkml.kernel.org/r/20240219194431.159606-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
DAMOS quota auto-tuning feature supports static signle goal and dynamic
multiple goals via DAMON kernel API, specifically via ->goal and ->goals
fields of damos_quota struct, respectively. All in-tree DAMOS kernel API
users are using only the dynamic multiple goals now. Remove the unsued
static single goal interface.
Link: https://lkml.kernel.org/r/20240219194431.159606-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The feedback-driven DAMOS quota auto-tuning feature allows only single
goal to the DAMON kernel API users. The API users could implement
multiple goals for the end-users on their level, and that's what DAMON
sysfs interface is doing. More DAMON kernel API users such as
DAMON_RECLAIM would need to do similar work. To reduce unnecessary future
duplciated efforts, support multiple goals from DAMOS core layer. To make
the support in minimum non-destructive change, keep the old single goal
setup interface, and add multiple goals setup. The single goal will
treated as one of the multiple goals, so old API users are not required to
make any change.
Link: https://lkml.kernel.org/r/20240219194431.159606-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'struct damos_quota' is not small now. Split out fields for quota goal to
a separate struct for easier reading.
Link: https://lkml.kernel.org/r/20240219194431.159606-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The comments and definition of 'struct damos_quota' lists a few fields for
effective quota generation first, fields for regions prioritization under
the quota, and then remaining fields for effective quota generation.
Readers' should unnecesssarily switch their context in the middle. List
all the fields for the effective quota first, and then fields for the
prioritization for making it easier to read.
Link: https://lkml.kernel.org/r/20240219194431.159606-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon: let DAMOS feeds and tame/auto-tune itself".
The Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning
patchset[1] which has merged since commit 9294a037c015 ("mm/damon/core:
implement goal-oriented feedback-driven quota auto-tuning") made the
mechanism and the policy separated. That is, users can set a part of
DAMOS control policies without a deep understanding of the mechanism but
just their demands such as SLA.
However, users are still required to do some additional work of manually
collecting their target metric and feeding it to DAMOS. In the case of
end-users who use DAMON sysfs interface, the context switches between
user-space and kernel-space could also make it inefficient. The overhead
is supposed to be only trivial in common cases, though. Meanwhile, in
simple use cases, the target metric could be common system metrics that
the kernel can efficiently self-retrieve, such as memory pressure stall
time (PSI).
Extend DAMOS quota auto-tuning to support multiple types of metrics
including the DAMOS self-retrievable ones, and add support for memory
pressure stall time metric. Different types of metrics can be supported
in future. The auto-tuning capability is currently supported for only
users of DAMOS kernel API and DAMON sysfs interface. Extend the support
to DAMON_RECLAIM.
Patches Sequence
================
First five patches are for helping debugging and fine-tuning existing
quota control features. The first one (patch 1) exposes the effective
quota that is made with given user inputs to DAMOS kernel API users and
kernel-doc documents. Following four patches implement (patches 1, 2 and
3) and document (patches 4 and 5) a new DAMON sysfs file that exposes the
value.
Following six patches cleanup and simplify the existing DAMOS quota
auto-tuning code by improving layout of comments and data structures
(patches 6 and 7), supporting common use cases, namely multiple goals
(patches 8, 9 and 10), and simplifying the interface (patch 11).
Then six patches for the main purpose of this patchset follow. The first
three changes extend the core logic for various target metrics (patch 12),
implement memory pressure stall time-based target metric support (patch
13), and update DAMON sysfs interface to support the new target metric
(patch 14). Then, documentation updates for the features on design (patch
15), ABI (patch 16), and usage (patch 17) follow.
Last three patches add auto-tuning support on DAMON_RECLAIM. The patches
implement DAMON_RECLAIM parameters for user-feedback driven quota
auto-tuning (patch 18), memory pressure stall time-driven quota
self-tuning (patch 19), and finally update the DAMON_RECLAIM usage
document for the new parameters (patch 20).
[1] https://lore.kernel.org/all/20231130023652.50284-1-sj@kernel.org/
This patch (of 20):
DAMOS allow users to specify the quota as they want in multiple ways
including time quota, size quota, and feedback-based auto-tuning. DAMOS
makes one effective quota out of the inputs and use it at the end.
Knowing the current effective quota helps understanding DAMOS' internal
mechanism and fine-tuning quotas. DAMON kernel API users can get the
information from ->esz field of damos_quota struct, but the field is
marked as private purpose, and not kernel-doc documented. Make it public
and document.
Link: https://lkml.kernel.org/r/20240219194431.159606-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240219194431.159606-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes
need be built in to avoid compiling error when building kexec code even
though the crash dumping functionality is not enabled. E.g
--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
---------------------
After splitting out crashkernel reservation code and vmcoreinfo exporting
code, there's only crash related code left in kernel/crash_core.c. Now
move crash related codes from kexec_core.c to crash_core.c and only build it
in when CONFIG_CRASH_DUMP=y.
And also wrap up crash codes inside CONFIG_CRASH_DUMP ifdeffery scope,
or replace inappropriate CONFIG_KEXEC_CORE ifdef with CONFIG_CRASH_DUMP
ifdef in generic kernel files.
With these changes, crash_core codes are abstracted from kexec codes and
can be disabled at all if only kexec reboot feature is wanted.
Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now move the relevant codes into separate files:
kernel/crash_reserve.c, include/linux/crash_reserve.h.
And add config item CRASH_RESERVE to control its enabling.
And also update the old ifdeffery of CONFIG_CRASH_CORE, including of
<linux/crash_core.h> and config item dependency on CRASH_CORE
accordingly.
And also do renaming as follows:
- arch/xxx/kernel/{crash_core.c => vmcore_info.c}
because they are only related to vmcoreinfo exporting on x86, arm64,
riscv.
And also Remove config item CRASH_CORE, and rely on CONFIG_KEXEC_CORE to
decide if build in crash_core.c.
[yang.lee@linux.alibaba.com: remove duplicated include in vmcore_info.c]
Link: https://lkml.kernel.org/r/20240126005744.16561-1-yang.lee@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240124051254.67105-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Split crash out from kexec and clean up related config
items", v3.
Motivation:
=============
Previously, LKP reported a building error. When investigating, it can't
be resolved reasonablly with the present messy kdump config items.
https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/
The kdump (crash dumping) related config items could causes confusions:
Firstly,
CRASH_CORE enables codes including
- crashkernel reservation;
- elfcorehdr updating;
- vmcoreinfo exporting;
- crash hotplug handling;
Now fadump of powerpc, kcore dynamic debugging and kdump all selects
CRASH_CORE, while fadump
- fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
global variable 'elfcorehdr_addr';
- kcore only needs vmcoreinfo exporting;
- kdump needs all of the current kernel/crash_core.c.
So only enabling PROC_CORE or FA_DUMP will enable CRASH_CORE, this
mislead people that we enable crash dumping, actual it's not.
Secondly,
It's not reasonable to allow KEXEC_CORE select CRASH_CORE.
Because KEXEC_CORE enables codes which allocate control pages, copy
kexec/kdump segments, and prepare for switching. These codes are
shared by both kexec reboot and kdump. We could want kexec reboot,
but disable kdump. In that case, CRASH_CORE should not be selected.
--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
---------------------
Thirdly,
It's not reasonable to allow CRASH_DUMP select KEXEC_CORE.
That could make KEXEC_CORE, CRASH_DUMP are enabled independently from
KEXEC or KEXEC_FILE. However, w/o KEXEC or KEXEC_FILE, the KEXEC_CORE
code built in doesn't make any sense because no kernel loading or
switching will happen to utilize the KEXEC_CORE code.
---------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
---------------------
In this case, what is worse, on arch sh and arm, KEXEC relies on MMU,
while CRASH_DUMP can still be enabled when !MMU, then compiling error is
seen as the lkp test robot reported in above link.
------arch/sh/Kconfig------
config ARCH_SUPPORTS_KEXEC
def_bool MMU
config ARCH_SUPPORTS_CRASH_DUMP
def_bool BROKEN_ON_SMP
---------------------------
Changes:
===========
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_infoc. from crash_core.c;
3, move crash related codes in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash codes in crash related ifdefs on all 8 arch-es
which support crash dumping, except of ppc;
Achievement:
===========
With above changes, I can rearrange the config item logic as below (the right
item depends on or is selected by the left item):
PROC_KCORE -----------> VMCORE_INFO
|----------> VMCORE_INFO
FA_DUMP----|
|----------> CRASH_RESERVE
---->VMCORE_INFO
/
|---->CRASH_RESERVE
KEXEC --| /|
|--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
KEXEC_FILE --| \ |
\---->CRASH_HOTPLUG
KEXEC --|
|--> KEXEC_CORE (for kexec reboot only)
KEXEC_FILE --|
Test
========
On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I did below three cases of config item setting and
building all passed. Take configs on x86_64 as exampmle here:
(1) Both CONFIG_KEXEC and KEXEC_FILE is unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features
(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---------------
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---------------
(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
------------------------
# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features
------------------------
Note:
For ppc, it needs investigation to make clear how to split out crash
code in arch folder. Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.
This patch (of 14):
Both kdump and fa_dump of ppc rely on crashkernel reservation. Move the
relevant codes into separate files: crash_reserve.c,
include/linux/crash_reserve.h.
And also add config item CRASH_RESERVE to control its enabling of the
codes. And update config items which has relationship with crashkernel
reservation.
And also change ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE
when those scopes are only crashkernel reservation related.
And also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on
arm64, x86 and risc-v because those architectures' crash_core.h is only
related to crashkernel reservation.
[akpm@linux-foundation.org: s/CRASH_RESEERVE/CRASH_RESERVE/, per Klara Modin]
Link: https://lkml.kernel.org/r/20240124051254.67105-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20240124051254.67105-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Earlier, vmap_area_list is exported to vmcoreinfo so that makedumpfile get
the base address of vmalloc area. Now, vmap_area_list is empty, so export
VMALLOC_START to vmcoreinfo instead, and remove vmap_area_list.
[urezki@gmail.com: fix a warning in the crash_save_vmcoreinfo_init()]
Link: https://lkml.kernel.org/r/20240111192329.449189-1-urezki@gmail.com
Link: https://lkml.kernel.org/r/20240102184633.748113-6-urezki@gmail.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kazuhito Hagio <k-hagio-ab@nec.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
page_owner needs to increment a stack_record refcount when a new
allocation occurs, and decrement it on a free operation. In order to do
that, we need to have a way to get a stack_record from a handle.
Implement __stack_depot_get_stack_record() which just does that, and make
it public so page_owner can use it.
Also, traversing all stackdepot buckets comes with its own complexity,
plus we would have to implement a way to mark only those stack_records
that were originated from page_owner, as those are the ones we are
interested in. For that reason, page_owner maintains its own list of
stack_records, because traversing that list is faster than traversing all
buckets while keeping at the same time a low complexity.
For now, add to stack_list only the stack_records of dummy_handle and
failure_handle, and set their refcount of 1.
Further patches will add code to increment or decrement stack_records
count on allocation and free operation.
Link: https://lkml.kernel.org/r/20240215215907.20121-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to move the heavy lifting into page_owner code, this one needs to
have access to the stack_record structure, which right now sits in
lib/stackdepot.c. Move it to the stackdepot.h header so page_owner can
access stack_record's struct fields.
Link: https://lkml.kernel.org/r/20240215215907.20121-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|