Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Simplify the logic in the timer migration code
- Simplify the clocksource code by utilizing the more modern
cpumask+*() interfaces
* tag 'timers-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Use cpumask_next_wrap() in clocksource_watchdog()
clocksource: Use cpumask_any_but() in clocksource_verify_choose_cpus()
timers/migration: Clean up the loop in tmigr_quick_check()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide cleanup of struct cycle_counter const annotations.
The initial idea of making them const was correct as they were
seperate instances. When they got embedded into larger data
structures, which are even modified by the callback this got moot. The
only reason why this went unnoticed is that the required
container_of() casts the const attribute forcefully away.
Stop pretending that it is const"
* tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/timecounter: Fix the lie that struct cyclecounter is const
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so
it supports arbitrary hart indices
- Add a command line parameter and related handling to disable the
generic RISCV IMSIC mechanism on platforms which use a trap-emulated
IMSIC. Unfortunatly this is required because there is no mechanism
available to discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend
variants.
- The usual cleanups and improvements all over the place
* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/riscv-imsic: Add kernel parameter to disable IPIs
irqchip/gic-v3: Fix GICD_CTLR register naming
irqchip/ls-scfg-msi: Fix NULL dereference in error handling
irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Convert to __free
irqchip/alpine-msi: Convert to lock guards
irqchip/alpine-msi: Clean up whitespace style
irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
irqdomain: Add device pointer to irq_domain_info and msi_domain_info
irqchip/renesas-rzv2h: Remove unneeded includes
irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/aslint-sswi: Resolve hart index
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp updates from Thomas Gleixner:
"A set of updates for SMP function calls:
- Improve locality of smp_call_function_any() by utilizing
sched_numa_find_nth_cpu() instead of picking a random CPU
- Wait for work completion in smp_call_function_many_cond() only when
there was actually work enqueued
- Simplify functions by unutlizing the appropriate cpumask_*()
interfaces
- Trivial cleanups"
* tag 'smp-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Wait only if work was enqueued
smp: Defer check for local execution in smp_call_function_many_cond()
smp: Use cpumask_any_but() in smp_call_function_many_cond()
smp: Improve locality in smp_call_function_any()
smp: Fix typo in comment for raw_smp_processor_id()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
- Prevent a interrupt migration related live lock in handle_edge_irq()
If the interrupt affinity is moved to a new target CPU and the
interrupt is currently handled on the previous target CPU for edge
type interrupts the handler might get stuck on the previous target
for a long time, which causes both involved CPUs to waste cycles and
eventually run into a soft-lockup situation.
Solve this by checking whether the interrupt is redirected to a new
target CPU and if the interrupt is handled on that new target CPU,
busy wait for completion instead of masking it and sending the
pending but which would cause the old CPU to re-run the handler and
in the worst case repeating this excercise for a long time.
This only works on architectures which use single CPU interrupt
targets, but that's so far the only ones where this behaviour has
been observed.
- Add a kunit test for interrupt disable depth counts
The nested interrupt disable depth has been an issue in the past
especially vs. free_irq(), interrupt shutdown and CPU hotplug and
their interactions. The test exercises the combinations of these
scenarios and checks for correctness.
* tag 'irq-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent migration live lock in handle_edge_irq()
genirq: Split up irq_pm_check_wakeup()
genirq: Move irq_wait_for_poll() to call site
genirq: Remove pointless local variable
genirq: Add kunit tests for depth counts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
sysfs:
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
Support cache-ids for device-tree systems:
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
Rust:
- Device:
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres:
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID:
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA:
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O:
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc:
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
Misc:
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()"
* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
rust: io: fix broken intra-doc links to `platform::Device`
rust: io: fix broken intra-doc link to missing `flags` module
rust: io: mem: enable IoRequest doc-tests
rust: platform: add resource accessors
rust: io: mem: add a generic iomem abstraction
rust: io: add resource abstraction
rust: samples: dma: set DMA mask
rust: platform: implement the `dma::Device` trait
rust: pci: implement the `dma::Device` trait
rust: dma: add DMA addressing capabilities
rust: dma: implement `dma::Device` trait
rust: net::phy Change module_phy_driver macro to use module_device_table macro
rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
rust: device_id: split out index support into a separate trait
device: rust: rename Device::as_ref() to Device::from_raw()
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
cacheinfo: Set cache 'id' based on DT data
container_of: Document container_of() is not to be used in new code
driver core: auxiliary bus: fix OF node leak
...
|
|
Don't populate the read-only 'type' on the stack at run time,
instead make it static.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250714160858.1234719-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a new API to retrieve a user space callstack called
unwind_user_faultable(). The difference between this user space stack
tracer from the current user space stack tracer is that this must be
called from faultable context as it may use routines to access user space
data that needs to be faulted in.
It can be safely called from entering or exiting a system call as the code
can still be faulted in there.
This code is based on work by Josh Poimboeuf's deferred unwinding code:
Link: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e739092d9925.1737511963.git.jpoimboe@kernel.org/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Link: https://lore.kernel.org/20250729182405.147896868@kernel.org
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Introduce a generic API for unwinding user stacks.
In order to expand user space unwinding to be able to handle more complex
scenarios, such as deferred unwinding and reading user space information,
create a generic interface that all architectures can use that support the
various unwinding methods.
This is an alternative method for handling user space stack traces from
the simple stack_trace_save_user() API. This does not replace that
interface, but this interface will be used to expand the functionality of
user space stack walking.
None of the structures introduced will be exposed to user space tooling.
Support for frame pointer unwinding is added. For an architecture to
support frame pointer unwinding it needs to enable
CONFIG_HAVE_UNWIND_USER_FP and define ARCH_INIT_USER_FP_FRAME.
By encoding the frame offsets in struct unwind_user_frame, much of this
code can also be reused for future unwinder implementations like sframe.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Link: https://lore.kernel.org/20250729182404.975790139@kernel.org
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/all/20250710164301.3094-2-mathieu.desnoyers@efficios.com/
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
With CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, `__user` is
converted to `__attribute__((btf_type_tag("user")))`. In this case,
some syscall events have it for __user data, like below;
/sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 720
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char __attribute__((btf_type_tag("user"))) * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;
Then the trace event filter fails to set the string acceptable flag
(FILTER_PTR_STRING) to the field and rejects setting string filter;
# echo 'filename.ustring ~ "*ftracetest-dir.wbx24v*"' \
>> events/syscalls/sys_enter_openat/filter
sh: write error: Invalid argument
# cat error_log
[ 723.743637] event filter parse error: error: Expecting numeric field
Command: filename.ustring ~ "*ftracetest-dir.wbx24v*"
Since this __attribute__ makes format parsing complicated and not
needed, remove the __attribute__(.*) from the type string.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/175376583493.1688759.12333973498014733551.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
KVM IRQ changes for 6.17
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.
- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.
- Harden the device posted IRQ code against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.
|
|
Since preempt_count_add/del() are tracable functions, it is not allowed
to use preempt_disable/enable() in ftrace handlers. Without this fix,
probing on `preempt_count_add%return` will cause an infinite recursion
of fprobes.
To fix this problem, use preempt_disable/enable_notrace() in
fprobe_return().
Link: https://lore.kernel.org/all/175374642359.1471729.1054175011228386560.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"As is tradition, cpufreq is the part with the largest number of
updates that include core fixes and cleanups as well as updates of
several assorted drivers, but there are also quite a few updates
related to system sleep, mostly focused on asynchronous suspend and
resume of devices and on making the integration of system suspend
and resume with runtime PM easier.
Runtime PM is also updated to allow some code duplication in drivers
to be eliminated going forward and to work more consistently overall
in some cases.
Apart from that, there are some driver core updates related to PM
domains that should help to address ordering issues with devm_ cleanup
routines relying on PM domains, some assorted devfreq updates
including core fixes and cleanups, tooling updates, and documentation
and MAINTAINERS updates.
Specifics:
- Fix two initialization ordering issues in the cpufreq core and a
governor initialization error path in it, and clean it up (Lifeng
Zheng)
- Add Granite Rapids support in no-HWP mode to the intel_pstate
cpufreq driver (Li RongQing)
- Make intel_pstate always use HWP_DESIRED_PERF when operating in the
passive mode (Rafael Wysocki)
- Allow building the tegra124 cpufreq driver as a module (Aaron
Kling)
- Do minor cleanups for Rust cpufreq and cpumask APIs and fix
MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
Bulwahn)
- Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
- Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
(Prashant Malani)
- Fix minimum performance state label error in the amd-pstate driver
documentation (Shouye Liu)
- Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
governor and explain HW coordination influence on it in the
documentation (Shashank Balaji)
- Fix opencoded for_each_cpu() in idle_state_valid() in the DT
cpuidle driver (Yury Norov)
- Remove info about non-existing QoS interfaces from the PM QoS
documentation (Ulf Hansson)
- Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
- Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
- Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)
- Simplify the sun8i-a33-mbus devfreq driver by using more devm
functions (Uwe Kleine-König)
- Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
- Check devfreq governor before using governor->name (Lifeng Zheng)
- Remove a redundant devfreq_get_freq_range() call from
devfreq_add_device() (Lifeng Zheng)
- Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
- Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the
device suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
- Document return values of suspend-related API functions in the
runtime PM framework (Sakari Ailus)
- Mark last busy stamp in multiple autosuspend-related functions in
the runtime PM framework and update its documentation (Sakari
Ailus)
- Take active children into account in pm_runtime_get_if_in_use() for
consistency (Rafael Wysocki)
- Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
power capping driver (Sivan Zohar-Kotzer)
- Add support for the Bartlett Lake platform to the Intel RAPL power
capping driver (Qiao Wei)
- Add PL4 support for Panther Lake to the intel_rapl_msr power
capping driver (Zhang Rui)
- Update contact information in the PM ABI docs and maintainer
information in the power domains DT binding (Rafael Wysocki)
- Update PM header inclusions to follow the IWYU (Include What You
Use) principle (Andy Shevchenko)
- Add flags to specify power on attach/detach for PM domains, make
the driver core detach PM domains in device_unbind_cleanup(), and
drop the dev_pm_domain_detach() call from the platform bus type
(Claudiu Beznea)
- Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
- Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
Shenoy)"
* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
PM: docs: Use my kernel.org address in ABI docs and DT bindings
PM: hibernate: Fix up white space that does not follow coding style
PM: sleep: Rearrange suspend/resume error handling in the core
Documentation: amd-pstate:fix minimum performance state label error
PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
kexec_core: Drop redundant pm_restore_gfp_mask() call
kexec_core: Fix error code path in the KEXEC_JUMP flow
PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
drivers: cpufreq: add Tegra114 support
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
cpufreq: Exit governor when failed to start old governor
cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
cpufreq: Init policy->rwsem before it may be possibly used
cpufreq: Initialize cpufreq-based frequency-invariance later
cpufreq: Remove duplicate check in __cpufreq_offline()
cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
PM / devfreq: Add HiSilicon uncore frequency scaling driver
...
|
|
Show the rejected function name when attaching tracing programs to
functions in deny list.
With this change, we know why tracing programs can't attach to functions
like __rcu_read_lock() from log.
$ ./fentry
libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL
libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG --
Attaching tracing programs to function '__rcu_read_lock' is rejected.
Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
__noreturn functions
With this change, we know the precise rejected function name when
attaching fexit/fmod_ret to __noreturn functions from log.
$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.
Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that restores logging of an audit event in the
module load failure case"
* tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit,module: restore audit logging in load failure case
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants.
These APIs will allow further simplifications across the kernel as
users start using them instead of the old-school crypto API. (I've
already written a lot of such conversion patches, removing over 1000
more lines of code. But most of those will target 6.18 or later)"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
lib/crypto: x86/sha1-ni: Convert to use rounds macros
lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
crypto: sha1 - Remove sha1_base.h
lib/crypto: x86/sha1: Migrate optimized code into library
lib/crypto: sparc/sha1: Migrate optimized code into library
lib/crypto: s390/sha1: Migrate optimized code into library
lib/crypto: powerpc/sha1: Migrate optimized code into library
lib/crypto: mips/sha1: Migrate optimized code into library
lib/crypto: arm64/sha1: Migrate optimized code into library
lib/crypto: arm/sha1: Migrate optimized code into library
crypto: sha1 - Use same state format as legacy drivers
crypto: sha1 - Wrap library and add HMAC support
lib/crypto: sha1: Add HMAC support
lib/crypto: sha1: Add SHA-1 library functions
lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
crypto: x86/sha1 - Rename conflicting symbol
lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
lib/crypto: arm/poly1305: Remove unneeded empty weak function
lib/crypto: x86/poly1305: Fix performance regression on short messages
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Introduce and start using TRAILING_OVERLAP() helper for fixing
embedded flex array instances (Gustavo A. R. Silva)
- mux: Convert mux_control_ops to a flex array member in mux_chip
(Thorsten Blum)
- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
Kees Cook)
- Refactor and rename stackleak feature to support Clang
- Add KUnit test for seq_buf API
- Fix KUnit fortify test under LTO
* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
sched/task_stack: Add missing const qualifier to end_of_stack()
kstack_erase: Support Clang stack depth tracking
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
init.h: Disable sanitizer coverage for __init and __head
kstack_erase: Disable kstack_erase for all of arm compressed boot code
x86: Handle KCOV __init vs inline mismatches
arm64: Handle KCOV __init vs inline mismatches
s390: Handle KCOV __init vs inline mismatches
arm: Handle KCOV __init vs inline mismatches
mips: Handle KCOV __init vs inline mismatch
powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
configs/hardening: Enable CONFIG_KSTACK_ERASE
stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
stackleak: Rename STACKLEAK to KSTACK_ERASE
seq_buf: Introduce KUnit tests
string: Group str_has_prefix() and strstarts()
kunit/fortify: Add back "volatile" for sizeof() constants
acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Introduce regular REGSET note macros arch-wide (Dave Martin)
- Remove arbitrary 4K limitation of program header size (Yin Fengwei)
- Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)
* tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits)
fork: reorder function qualifiers for copy_clone_args_from_user
binfmt_elf: remove the 4k limitation of program header size
binfmt_elf: Warn on missing or suspicious regset note names
xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
...
|
|
Pull block updates from Jens Axboe:
- MD pull request via Yu:
- call del_gendisk synchronously (Xiao)
- cleanup unused variable (John)
- cleanup workqueue flags (Ryo)
- fix faulty rdev can't be removed during resync (Qixing)
- NVMe pull request via Christoph:
- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init()
fails (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
This has been more than a decade coming at this point, and some
recently revealed breakages that had it causing issues even for cases
where it isn't required made me re-pull the trigger on this one. It's
known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
currently under exclusively ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
|
|
Fix a typo that uses ',' instead of ';' for line delimiter.
Link: https://lore.kernel.org/linux-trace-kernel/175366879192.487099.5714468217360139639.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs bpf updates from Christian Brauner:
"These changes allow bpf to read extended attributes from cgroupfs.
This is useful in redirecting AF_UNIX socket connections based on
cgroup membership of the socket. One use-case is the ability to
implement log namespaces in systemd so services and containers are
redirected to different journals"
* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/kernfs: test xattr retrieval
selftests/bpf: Add tests for bpf_cgroup_read_xattr
bpf: Mark cgroup_subsys_state->cgroup RCU safe
bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
kernfs: remove iattr_mutex
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- persistent info
Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.
The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.
This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.
If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid
is closed the pidfs dentry is released and removed from pid->stashed.
So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.
Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or
it might find a dead dentry in there and fail to reuse it and so
stashes a new pidfs dentry. Multiple tasks may race to stash a new
pidfs dentry but only one will succeed, the other ones will put their
dentry.
The current scheme aims to ensure that a pidfs dentry for a struct
pid can only be created if the task is still alive or if a pidfs
dentry already existed before the task was reaped and so exit
information has been was stashed in the pidfs inode.
That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is
called we will return a pidfd for a reaped task without exit
information being available.
The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might
be successful simply because we're before __unhash_process() but
after pidfs_exit().
Introduce a new scheme where the lifetime of information associated
with a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.
The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.
If all pidfs for the pidfs dentry are closed the dentry and inode can
be cleaned up but the struct pidfs_attr will stick until the struct
pid itself is freed. This will ensure minimal memory usage while
persisting relevant information.
The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which
no exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries
when registering a struct pid with pidfs. There's no need to get or
put a reference. Instead, the lifetime of exit and coredump
information associated with a struct pid is bound to the lifetime of
struct pid itself.
- extended attributes
Now that we have a way to persist information for pidfs dentries we
can start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.
One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().
The first simple scheme will allow privileged userspace to set
trusted extended attributes on pidfs inodes.
- Allow autonomous pidfs file handles
Various filesystems such as pidfs and drm support opening file
handles without having to require a file descriptor to identify the
filesystem. The filesystem are global single instances and can be
trivially identified solely on the information encoded in the file
handle.
This makes it possible to not have to keep or acquire a sentinal file
descriptor just to pass it to open_by_handle_at() to identify the
filesystem. That's especially useful when such sentinel file
descriptor cannot or should not be acquired.
For pidfs this means a file handle can function as full replacement
for storing a pid in a file. Instead a file handle can be stored and
reopened purely based on the file handle.
Such autonomous file handles can be opened with or without specifying
a a file descriptor. If no proper file descriptor is used the
FD_PIDFS_ROOT sentinel must be passed. This allows us to define
further special negative fd sentinels in the future.
Userspace can trivially test for support by trying to open the file
handle with an invalid file descriptor.
- Allow pidfds for reaped tasks with SCM_PIDFD messages
This is a logical continuation of the earlier work to create pidfds
for reaped tasks through the SO_PEERPIDFD socket option merged in
923ea4d4482b ("Merge patch series "net, pidfs: enable handing out
pidfds for reaped sk->sk_peer_pid"").
- Two minor fixes:
* Fold fs_struct->{lock,seq} into a seqlock
* Don't bother with path_{get,put}() in unix_open_file()
* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
don't bother with path_get()/path_put() in unix_open_file()
fold fs_struct->{lock,seq} into a seqlock
selftests: net: extend SCM_PIDFD test to cover stale pidfds
af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
af_unix: stash pidfs dentry when needed
af_unix/scm: fix whitespace errors
af_unix: introduce and use scm_replace_pid() helper
af_unix: introduce unix_skb_to_scm helper
af_unix: rework unix_maybe_add_creds() to allow sleep
selftests/pidfd: decode pidfd file handles withou having to specify an fd
fhandle, pidfs: support open_by_handle_at() purely based on file handle
uapi/fcntl: add FD_PIDFS_ROOT
uapi/fcntl: add FD_INVALID
fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
uapi/fcntl: mark range as reserved
fhandle: reflow get_path_anchor()
pidfs: add pidfs_root_path() helper
fhandle: rename to get_path_anchor()
fhandle: hoist copy_from_user() above get_path_from_fd()
fhandle: raise FILEID_IS_DIR in handle_type
...
|
|
Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
Monitor to ensure wakeup and need_resched occur with irq and
preemption disabled or in irq handlers.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add 2 per-task monitors as part of the sched model:
* nrp: need-resched preempts
Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
Monitor to ensure sched_set_state to sleepable leads to sleeping and
sleeping tasks require wakeup.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tss monitor currently guarantees task switches can happen only while
scheduling, whereas the sncid monitor enforces scheduling occurs with
interrupt disabled.
Replace the monitors with a more comprehensive specification which
implies both but also ensures that:
* each scheduler call disable interrupts to switch
* each task switch happens with interrupts disabled
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the following tracepoint:
* sched_set_need_resched(tsk, cpu, tif)
Called when a task is set the need resched [lazy] flag
Remove the unused ip parameter from sched_entry and sched_exit and alter
sched_entry to have a value of preempt consistent with the one used in
sched_switch.
Also adapt all monitors using sched_{entry,exit} to avoid breaking build.
These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
DA monitor can be accessed from multiple cores simultaneously, this is
likely, for instance when dealing with per-task monitors reacting on
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:
[62] event_srs: 27: sleepable x sched_wakeup -> running (final)
[63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
[63] error_srs: 27: event sched_switch_suspend not expected in the state running
In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task get suspended.
Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react by calling a special
tracepoint, print on the console and reset the monitor.
Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.
Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
RV monitors relying on the preemptirqs tracepoints are set as dependent
on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
enable the tracepoints but are not the minimal configurations enabling
them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
manually).
Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
monitors.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-5-gmonaco@redhat.com
Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors")
Acked-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Using DA monitors tracepoints with KASAN enabled triggers the following
warning:
BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
Read of size 32 at addr ffffffffaada8980 by task ...
Call Trace:
<TASK>
[...]
do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
? trace_event_sncid+0x83/0x200
trace_event_sncid+0x163/0x200
[...]
The buggy address belongs to the variable:
automaton_snep+0x4e0/0x5e0
This is caused by the tracepoints reading 32 bytes __array instead of
__string from the automata definition. Such strings are literals and
reading 32 bytes ends up in out of bound memory accesses (e.g. the next
automaton's data in this case).
The error is harmless as, while printing the string, we stop at the null
terminator, but it should still be fixed.
Use the __string facilities while defining the tracepoints to avoid
reading out of bound memory.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-4-gmonaco@redhat.com
Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
RV event tracepoints print a line with the format:
"event_xyz: S0 x event -> S1 "
"event_xyz: S1 x event -> S0 (final)"
While printing an event leading to a non-final state, the line
has a trailing white space (visible above before the closing ").
Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-3-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
"This contains an extension to the coredump socket and a proper rework
of the coredump code.
- This extends the coredump socket to allow the coredump server to
tell the kernel how to process individual coredumps. This allows
for fine-grained coredump management. Userspace can decide to just
let the kernel write out the coredump, or generate the coredump
itself, or just reject it.
* COREDUMP_KERNEL
The kernel will write the coredump data to the socket.
* COREDUMP_USERSPACE
The kernel will not write coredump data but will indicate to the
parent that a coredump has been generated. This is used when
userspace generates its own coredumps.
* COREDUMP_REJECT
The kernel will skip generating a coredump for this task.
* COREDUMP_WAIT
The kernel will prevent the task from exiting until the coredump
server has shutdown the socket connection.
The flexible coredump socket can be enabled by using the "@@"
prefix instead of the single "@" prefix for the regular coredump
socket:
@@/run/systemd/coredump.socket
- Cleanup the coredump code properly while we have to touch it
anyway.
Split out each coredump mode in a separate helper so it's easy to
grasp what is going on and make the code easier to follow. The core
coredump function should now be very trivial to follow"
* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cleanup: add a scoped version of CLASS()
coredump: add coredump_skip() helper
coredump: avoid pointless variable
coredump: order auto cleanup variables at the top
coredump: add coredump_cleanup()
coredump: auto cleanup prepare_creds()
cred: add auto cleanup method
coredump: directly return
coredump: auto cleanup argv
coredump: add coredump_write()
coredump: use a single helper for the socket
coredump: move pipe specific file check into coredump_pipe()
coredump: split pipe coredumping into coredump_pipe()
coredump: move core_pipe_count to global variable
coredump: prepare to simplify exit paths
coredump: split file coredumping into coredump_file()
coredump: rename do_coredump() to vfs_coredump()
selftests/coredump: make sure invalid paths are rejected
coredump: validate socket path in coredump_parse()
coredump: don't allow ".." in coredump socket path
...
|
|
This patch fixes several minor typos in comments within the BPF verifier.
No changes in functionality.
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit d7f008738171 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.
With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice isn't
enough anymore to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:
reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
__update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
improves the smax and umin bounds thanks to patch "bpf: Improve
bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
tnums). Yet, a better smin32 bound could be learned from the smin
bound.
__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.
As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with less calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
__reg64_deduce_bounds currently improves the s64 range using the u64
range and vice versa, but only if it doesn't cross the sign boundary.
This patch improves __reg64_deduce_bounds to cover the case where the
s64 range crosses the sign boundary but overlaps with the u64 range on
only one end. In that case, we can improve both ranges. Consider the
following example, with the s64 range crossing the sign boundary:
0 U64_MAX
| [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxx s64 range xxxxxxxxx] [xxxxxxx|
0 S64_MAX S64_MIN -1
The u64 range overlaps only with positive portion of the s64 range. We
can thus derive the following new s64 and u64 ranges.
0 U64_MAX
| [xxxxxx u64 range xxxxx] |
|----------------------------|----------------------------|
| [xxxxxx s64 range xxxxx] |
0 S64_MAX S64_MIN -1
The same logic can probably apply to the s32/u32 ranges, but this patch
doesn't implement that change.
In addition to the selftests, the __reg64_deduce_bounds change was
also tested with Agni, the formal verification tool for the range
analysis [1].
Link: https://github.com/bpfverif/agni [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Argument 'p' of reactors_show() and monitor_reactor_show() is not a pointer
to struct rv_reactor, it is actually a pointer to the list_head inside
struct rv_reactor. Therefore it's wrong to cast 'p' to struct rv_reactor *.
This wrong type cast has been there since the beginning. But it still
worked because the list_head was the first field in struct rv_reactor_def.
This is no longer true since commit 3d3c376118b5 ("rv: Merge struct
rv_reactor_def into struct rv_reactor") moved the list_head, and this wrong
type cast became a functional problem.
Properly use container_of() instead.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/b4febbd6844311209e4c8768b65d508b81bd8c9b.1753625621.git.namcao@linutronix.de
Fixes: 3d3c376118b5 ("rv: Merge struct rv_reactor_def into struct rv_reactor")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Argument 'p' of monitors_show() is not a pointer to struct rv_monitor, it
is actually a pointer to the list_head inside struct rv_monitor. Therefore
it is wrong to cast 'p' to struct rv_monitor *.
This wrong type cast has been there since the beginning. But it still
worked because the list_head was the first field in struct rv_monitor_def.
This is no longer true since commit 24cbfe18d55a ("rv: Merge struct
rv_monitor_def into struct rv_monitor") moved the list_head, and this wrong
type cast became a functional problem.
Properly use container_of() instead.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/35e49e97696007919ceacf73796487a2e15a3d02.1753625621.git.namcao@linutronix.de
Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
During the bounds refinement, we improve the precision of various ranges
by looking at other ranges. Among others, we improve the following in
this order (other things happen between 1 and 2):
1. Improve u32 from s32 in __reg32_deduce_bounds.
2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds.
3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds.
In particular, if the s32 range forms a valid u32 range, we will use it
to improve the u32 range in __reg32_deduce_bounds. In
__reg_deduce_mixed_bounds, under the same condition, we will use the s32
range to improve the s/u64 ranges.
If at (1) we were able to learn from s32 to improve u32, we'll then be
able to use that in (2) to improve s/u64. Hence, as (3) happens under
the same precondition as (1), it won't improve s/u64 ranges further than
(1)+(2) did. Thus, we can get rid of (3).
In addition to the extensive suite of selftests for bounds refinement,
this patch was also tested with the Agni formal verification tool [1].
Additionally, Eduard mentioned:
The argument appears to be as follows:
Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max`
__reg32_deduce_bounds produces:
reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min);
reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max);
And then first part of __reg_deduce_mixed_bounds assigns:
a. reg->umin umax= (reg->umin & ~0xffffffffULL) | max_t(u32, reg->s32_min, reg->u32_min);
b. reg->umax umin= (reg->umax & ~0xffffffffULL) | min_t(u32, reg->s32_max, reg->u32_max);
And then second part of __reg_deduce_mixed_bounds assigns:
c. reg->umin umax= (reg->umin & ~0xffffffffULL) | (u32)reg->s32_min;
d. reg->umax umin= (reg->umax & ~0xffffffffULL) | (u32)reg->s32_max;
But assignment (c) is a noop because:
max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min
Hence RHS(a) >= RHS(c) and umin= does nothing.
Also assignment (d) is a noop because:
min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max
Hence RHS(b) <= RHS(d) and umin= does nothing.
Plus the same reasoning for the part dealing with reg->s{min,max}_value:
e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | max_t(u32, reg->s32_min_value, reg->u32_min_value);
f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | min_t(u32, reg->s32_max_value, reg->u32_max_value);
vs
g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | (u32)reg->s32_min_value;
h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | (u32)reg->s32_max_value;
RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing.
This appears to be correct.
Also, Shung-Hsi:
Beside going through the reasoning, I also played with CBMC a bit to
double check that as far as a single run of __reg_deduce_bounds() is
concerned (and that the register state matches certain handwavy
expectations), the change indeed still preserve the original behavior.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://github.com/bpfverif/agni [1]
Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the PTP systemcounter mechanism:
The rework of this mechanism added a 'use_nsec' member to struct
system_counterval. get_device_system_crosststamp() instantiates that
struct on the stack and hands a pointer to the driver callback.
Only the drivers which set use_nsec to true, initialize that field,
but all others ignore it. As get_device_system_crosststamp() does not
initialize the struct, the use_nsec field contains random stack
content in those cases. That causes a miscalulation usually resulting
in a failing range check in the best case.
Initialize the structure before handing it to the drivers to cure
that"
* tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Zero initialize system_counterval when querying time from phc drivers
|
|
Once CONFIG_KSTACK_ERASE is enabled with Clang on i386, the build warns:
kernel/kstack_erase.c:168:2: warning: function with attribute 'no_caller_saved_registers' should only call a function with attribute 'no_caller_saved_registers' or be compiled with '-mgeneral-regs-only' [-Wexcessive-regsave]
Add -mgeneral-regs-only for the kstack_erase handler, to make Clang feel
better (it is effectively a no-op flag for the kernel). No binary
changes encountered.
Build & boot tested with Clang 21 on x86_64, and i386.
Build tested with GCC 14.2.0 on x86_64, i386, arm64, and arm.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20250726004313.GA3650901@ax162
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
bpf_jit_get_prog_name() will be used by all JITs when enabling support
for private stack. This function is currently implemented in the x86
JIT.
Move the function to core.c so that other JITs can easily use it in
their implementation of private stack.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org
|
|
The code is unused since 98e20e5e13d2 ("bpfilter: remove bpfilter"),
therefore remove it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de
|
|
The usermode driver framework is not used anymore by the BPF
preload code.
Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-1-0d0083334382@linutronix.de
|
|
The field 'reacting' in struct rv_monitor is set but never used. Delete it.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/a6c16f845d2f1a09c4d0934ab83f3cb14478a71d.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
rv_reactor has a reference counter to ensure it is not removed while
monitors are still using it.
However, this is futile, as __exit functions are not expected to fail and
will proceed normally despite rv_unregister_reactor() returning an error.
At the moment, reactors do not support being built as modules, therefore
they are never removed and the reference counters are not necessary.
If we support building RV reactors as modules in the future, kernel
module's centralized facilities such as try_module_get(), module_put() or
MODULE_SOFTDEP should be used instead of this custom implementation.
Remove this reference counter.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/bb946398436a5e17fb0f5b842ef3313c02291852.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Each struct rv_reactor has a unique struct rv_reactor_def associated with
it. struct rv_reactor is statically allocated, while struct rv_reactor_def
is dynamically allocated.
This makes the code more complicated than it should be:
- Lookup is required to get the associated rv_reactor_def from rv_reactor
- Dynamic memory allocation is required for rv_reactor_def. This is
harder to get right compared to static memory. For instance, there is
an existing mistake: rv_unregister_reactor() does not free the memory
allocated by rv_register_reactor(). This is fortunately not a real
memory leak problem as rv_unregister_reactor() is never called.
Simplify and merge rv_reactor_def into rv_reactor.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/71cb91c86cd40df5b8c492b788787f2a73c3eaa3.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Each struct rv_monitor has a unique struct rv_monitor_def associated with
it. struct rv_monitor is statically allocated, while struct rv_monitor_def
is dynamically allocated.
This makes the code more complicated than it should be:
- Lookup is required to get the associated rv_monitor_def from rv_monitor
- Dynamic memory allocation is required for rv_monitor_def. This is
harder to get right compared to static memory. For instance, there is
an existing mistake: rv_unregister_monitor() does not free the memory
allocated by rv_register_monitor(). This is fortunately not a real
memory leak problem, as rv_unregister_monitor() is never called.
Simplify and merge rv_monitor_def into rv_monitor.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/194449c00f87945c207aab4c96920c75796a4f53.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
rv_monitor_def::task_monitor is not used. Delete it.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/502d94f2696435690a2b1fdbe80a9e56c96fcabf.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 9 are cc:stable and the remainder address post-6.15
issues or aren't considered necessary for -stable kernels.
7 are for MM"
* tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
sprintf.h requires stdarg.h
resource: fix false warning in __request_region()
mm/damon/core: commit damos_quota_goal->nid
kasan: use vmalloc_dump_obj() for vmalloc error reports
mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
mm: update MAINTAINERS entry for HMM
nilfs2: reject invalid file types when reading inodes
selftests/mm: fix split_huge_page_test for folio_split() tests
mailmap: add entry for Senozhatsky
mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
|
|
Move multiple copies of same code snippet doing `gro_flush` and
`gro_normal_list` into separate helper function.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|