Age | Commit message | Author |
|
Just use sha256() instead of the clunky crypto API. This is much
simpler.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since bcachefs does not access crc32c and crc64 through the crypto API,
there is no need to use module softdeps to ensure they are loaded.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These weren't hooked up, but they probably should be - add some comments
for context.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes another "rebalance spinning and doing no work" issue;
rebalance was reading extents it wanted to move, but then failing in
bch2_write() -> bch2_alloc_sectors_start() due to being unable to
allocate sufficient replicas.
This was triggered by a user experimenting with the durability settings:
the foreground device was an NVMe device with durability=2, and he had
originally set the background device to durability=2 as well, but changed
it back to 1 (the default) after seeing IO errors.
That meant that with replicas=2, we want to move data off the NVME
device which satisfies that constraint, but with a single durability=1
device on the background target there's no way to move the extent to
that target while satisfying the "required replicas" constraint.
The solution for now is for bch2_data_update_init() to check for this,
and return an error - before kicking off the read.
bch2_data_update_init() already had two different checks for "will we be
able to write this extent", with partially duplicated code, so this
patch combines and improves that logic.
Additionally, we now always bail out and return an error if there's
insufficient space on the destination target. Previously, we only did
this for BCH_WRITE_alloc_nowait moves, because it might be the case that
copygc just needs to free up space on the destination target.
But we really shouldn't kick off a move if the destination is full: we
can't currently distinguish between "really full" and "just need to wait
for copygc", and if we are going to wait on copygc it'd be better to do
that before kicking off the move.
This will additionally fix "rebalance spinning" issues caused by a
filesystem that has more data than can fit in background_target - which
is a valid scenario, since we don't exclude foreground/cache devices
when calculating filesystem capacity.
Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
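The early check described above can be pictured with a small userspace sketch. This is not the bcachefs code: the `dev_info` structure, its fields and `target_can_satisfy_replicas()` are hypothetical names chosen for illustration, and the model assumes that the durability values of the writable devices in the destination target simply add up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device descriptor; not the bcachefs structure. */
struct dev_info {
	unsigned int durability;	/* replicas one copy on this device counts for */
	bool in_target;			/* device belongs to the destination target */
	bool has_space;			/* device can currently accept a write */
};

/*
 * Sum the durability a write to the destination target could reach and
 * compare it against the required replica count, before any read is issued.
 */
static bool target_can_satisfy_replicas(const struct dev_info *devs, size_t nr,
					unsigned int replicas_required)
{
	unsigned int reachable = 0;

	assert(devs != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		if (devs[i].in_target && devs[i].has_space)
			reachable += devs[i].durability;

	return reachable >= replicas_required;
}
```

With one durability=2 foreground device outside the target and a single durability=1 device inside it, the check fails for replicas=2, which corresponds to the case where the move should be refused before the read is started.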
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There are now 16 journal buffers, so 8 is too small.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Next patch will be checking if the extent we're reading from matches the
IO failure we saw before marking the failure.
For this to work, __bch2_read() needs to take the same transaction
context that bch2_rbio_retry() uses to do that check.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Read flags are codepath dependent and change as they're passed around,
while the fields in rbio._state are mostly fixed properties of that
particular object.
Losing track of BCH_READ_data_update would be bad, and previously it was
not obvious if it was always correctly set in the rbio, so this is a
safety cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Merge cpuidle updates for 6.15-rc5, including a menu governor update
that is reported to improve some benchmark results quite significantly:
- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki).
- Make it possible to tell the intel_idle driver to ignore its built-in
table of idle states for the given processor, clean up the handling
of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
Rafael Wysocki).
- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai).
* pm-cpuidle:
cpuidle: Init cpuidle only for present CPUs
cpuidle: intel_idle: Update MAINTAINERS
intel_idle: introduce 'no_native' module parameter
cpuidle: menu: Update documentation after get_typical_interval() changes
cpuidle: menu: Avoid discarding useful information
cpuidle: menu: Eliminate outliers on both ends of the sample set
cpuidle: menu: Tweak threshold use in get_typical_interval()
cpuidle: menu: Use one loop for average and variance computations
cpuidle: menu: Drop a redundant local variable
intel_idle: clean up BYT/CHT auto demotion disable
|
|
Merge cpufreq updates for 6.15-rc1:
- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code from drivers (Viresh Kumar).
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, and zuoqian).
- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai).
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).
- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar).
- Make it possible to avoid enabling capacity-aware scheduling (CAS) in
the intel_pstate driver and relocate a check for out-of-band (OOB)
platform handling in it to make it detect OOB before checking HWP
availability (Rafael Wysocki).
- Fix dbs_update() to avoid inadvertent conversions of negative integer
values to unsigned int which causes CPU frequency selection to be
inaccurate in some cases when the "conservative" cpufreq governor is
in use (Jie Zhan).
* pm-cpufreq: (91 commits)
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
cpufreq: tegra186: Share policy per cluster
cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline()
cpufreq/amd-pstate: Stop caching EPP
cpufreq/amd-pstate: Rework CPPC enabling
cpufreq/amd-pstate: Drop debug statements for policy setting
cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes
cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions
cpufreq/amd-pstate: Cache CPPC request in shared mem case too
cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
cpufreq/amd-pstate-ut: Adjust variable scope
cpufreq/amd-pstate-ut: Run on all of the correct CPUs
cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums
cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same
cpufreq/amd-pstate-ut: Use _free macro to free put policy
cpufreq/amd-pstate: Drop `cppc_cap1_cached`
...
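The dbs_update() item in the list above concerns negative integers being converted to unsigned int. The following is a generic illustration of that class of bug, not the governor code: the function names, the load formula and the choice to clamp to zero are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Buggy pattern: a negative delta stored in an unsigned int wraps to a
 * huge value, is capped at the window size, and the load collapses to 0.
 */
static unsigned int load_buggy(int idle_delta, unsigned int window)
{
	unsigned int idle = idle_delta;	/* -5 becomes UINT_MAX - 4 */

	if (idle > window)
		idle = window;
	return 100 * (window - idle) / window;
}

/* One possible remedy: clamp in the signed domain before converting. */
static unsigned int load_fixed(int idle_delta, unsigned int window)
{
	unsigned int idle;

	assert(window > 0 && window <= UINT_MAX / 100);

	idle = idle_delta < 0 ? 0 : (unsigned int)idle_delta;
	if (idle > window)
		idle = window;
	return 100 * (window - idle) / window;
}
```

For an idle delta of -5 over a window of 1000, the first variant reports a load of 0 while the second reports 100, which shows how such a conversion can skew a frequency decision.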
|
|
Merge thermal core updates and miscellaneous updates of the thermal
control subsystem for 6.15-rc1:
- Delay exposing thermal zone sysfs interface to prevent user space
from accessing thermal zones that have not been completely
initialized yet (Lucas De Marchi).
- Fix a spelling mistake in a comment in the thermal core (Colin Ian
King).
- Use kcalloc() instead of kzalloc() in some places in the thermal
control subsystem (Lukasz Luba, Ethan Carter Edwards).
- Clean up variable initialization in int340x_thermal_zone_add()
(Christophe JAILLET).
* thermal-core:
thermal: core: Delay exposing sysfs interface
thermal: core: Fix spelling mistake "Occurences" -> "Occurrences"
* thermal-misc:
thermal: intel: Clean up zone_trips[] initialization in int340x_thermal_zone_add()
thermal: hisi: Use kcalloc() instead of kzalloc() with multiplication
thermal: int340x: Use kcalloc() instead of kzalloc() with multiplication
thermal: k3_j72xx_bandgap: Use kcalloc() instead of kzalloc()
thermal/of: Use kcalloc() instead of kzalloc() with multiplication
thermal/debugfs: replace kzalloc() with kcalloc() in thermal_debug_tz_add()
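The kcalloc() conversions listed above replace an open-coded `n * size` multiplication with an array allocator that checks the multiplication for overflow. A userspace sketch of that idea follows; `checked_zalloc_array()` is a hypothetical helper written for this example and is not a kernel function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for an overflow-checked, zeroing array
 * allocator. Returns NULL instead of allocating a too-small buffer when
 * n * size does not fit in size_t.
 */
static void *checked_zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

The design point is that the overflow check lives in one place, so callers cannot forget it when the element count comes from an untrusted or computed value.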
|
|
Merge an ACPI CPPC update, ACPI platform-profile driver updates, an ACPI
APEI update and a MAINTAINERS update related to ACPI for 6.15-rc1:
- Add a missing header file include to the x86 arch CPPC code (Mario
Limonciello).
- Rework the sysfs attributes implementation in the ACPI platform-profile
driver and improve the unregistration code in it (Nathan Chancellor,
Kurt Borja).
- Prevent the ACPI HED driver from being built as a module and change
its initcall level to subsys_initcall to avoid initialization ordering
issues related to it (Xiaofei Tan).
- Update a maintainer email address in the ACPI PMIC entry in
MAINTAINERS (Mika Westerberg).
* acpi-x86:
x86/ACPI: CPPC: Add missing include
* acpi-platform-profile:
ACPI: platform_profile: Improve platform_profile_unregister()
ACPI: platform-profile: Fix CFI violation when accessing sysfs files
* acpi-apei:
ACPI: HED: Always initialize before evged
* acpi-misc:
MAINTAINERS: Use my kernel.org address for ACPI PMIC work
|
|
'acpi-video'
Merge five ACPI driver updates for 6.15-rc1:
- Use the str_on_off() helper function instead of hard-coded strings in
the ACPI power resources handling code (Thorsten Blum).
- Add fan speed reporting for ACPI fans that have _FST, but otherwise
do not support the entire ACPI 4 fan interface (Joshua Grisham).
- Fix a stale comment regarding trip points in acpi_thermal_add() that
diverged from the commented code after removing _CRT evaluation from
acpi_thermal_get_trip_points() (xueqin Luo).
- Make ACPI button driver also subscribe to system events (Mario
Limonciello).
- Use the str_yes_no() helper function instead of hard-coded strings in
the ACPI backlight (video) driver (Thorsten Blum).
* acpi-power:
ACPI: power: Use str_on_off() helper function
* acpi-fan:
ACPI: fan: Add fan speed reporting for fans with only _FST
* acpi-thermal:
ACPI: thermal: Fix stale comment regarding trip points
* acpi-button:
ACPI: button: Install notifier for system events as well
* acpi-video:
ACPI: video: Use str_yes_no() helper in acpi_video_bus_add()
|
|
Binding AX25 socket by using the autobind feature leads to memory leaks
in ax25_connect() and also refcount leaks in ax25_release(). Memory
leak was detected with kmemleak:
================================================================
unreferenced object 0xffff8880253cd680 (size 96):
backtrace:
__kmalloc_node_track_caller_noprof (./include/linux/kmemleak.h:43)
kmemdup_noprof (mm/util.c:136)
ax25_rt_autobind (net/ax25/ax25_route.c:428)
ax25_connect (net/ax25/af_ax25.c:1282)
__sys_connect_file (net/socket.c:2045)
__sys_connect (net/socket.c:2064)
__x64_sys_connect (net/socket.c:2067)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
================================================================
When socket is bound, refcounts must be incremented the way it is done
in ax25_bind() and ax25_setsockopt() (SO_BINDTODEVICE). In case of
autobind, the refcounts are not incremented.
This bug leads to the following issue reported by Syzkaller:
================================================================
ax25_connect(): syz-executor318 uses autobind, please contact jreuter@yaina.de
------------[ cut here ]------------
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 5317 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
Modules linked in:
CPU: 0 UID: 0 PID: 5317 Comm: syz-executor318 Not tainted 6.14.0-rc4-syzkaller-00278-gece144f151ac #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
...
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:336 [inline]
refcount_dec include/linux/refcount.h:351 [inline]
ref_tracker_free+0x6af/0x7e0 lib/ref_tracker.c:236
netdev_tracker_free include/linux/netdevice.h:4302 [inline]
netdev_put include/linux/netdevice.h:4319 [inline]
ax25_release+0x368/0x960 net/ax25/af_ax25.c:1080
__sock_release net/socket.c:647 [inline]
sock_close+0xbc/0x240 net/socket.c:1398
__fput+0x3e9/0x9f0 fs/file_table.c:464
__do_sys_close fs/open.c:1580 [inline]
__se_sys_close fs/open.c:1565 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1565
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
================================================================
Considering the issues above and the comments left in the code that say:
"check if we can remove this feature. It is broken."; "autobinding in this
may or may not work"; - it is better to completely remove this feature than
to fix it because it is broken and leads to various kinds of memory bugs.
Now calling connect() without first binding socket will result in an
error (-EINVAL). Userspace software that relies on the autobind feature
might get broken. However, this feature does not seem widely used with
this specific driver as it was not reliable at any point of time, and it
is already broken anyway. E.g. ax25-tools and ax25-apps packages for
popular distributions do not use the autobind feature for AF_AX25.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+33841dc6aa3e1d86b78a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33841dc6aa3e1d86b78a
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It seems there really exists the need for a simple sysfs interface that
can be easily used from minimal initramfs images that don't contain much
more than busybox. However the current interface poses a challenge to
the removal of global GPIO numberspace. Add an item that tracks
extending the existing ABI with a per-chip export/unexport attribute
pair.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-6-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Add an item tracking the treewide conversion of GPIO drivers to using
the new line value setter callbacks in struct gpio_chip instead of the
old ones that don't allow drivers to signal failures to callers.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-5-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
For better readability of the TODO, let's add some graphical delimiters
between tasks.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-4-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
While there are surely some arguments in favor of integrating the GPIO
and pinctrl subsystems into one, I believe this is not the right
approach.
The GPIO subsystem uses intricate locking with SRCU to handle the fact
that both consumers and providers may run in different contexts.
Pin-controller drivers are always meant to run in process context. This
alone is a huge obstacle to any attempt at integration, as evidenced by
many problems we already encountered during the hotplug rework.
The current glue code is pretty minimal and for the most part already allows
GPIO controllers to query pinctrl about the information they need.
I suggest dropping this task and keeping the subsystems separate even if
many pin-controllers implement GPIO functionality in addition to pin
functions.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-3-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
The removal of linux/gpio.h is already tracked by the item about
converting drivers to using the descriptor-based API. Remove the
duplicate.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-2-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
The consensus among core GPIO stakeholders seems to be that a new
debugfs interface will only increase maintenance burden and will fail
to attract users that care about long-term stability of the ABI[1].
Let's not go this way and not add a fourth user-facing interface to the
GPIO subsystem.
[1] https://lore.kernel.org/all/9d3f1ca4-d865-45af-9032-c38cacc7fe93@pengutronix.de/
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20250321-gpio-todo-updates-v1-1-7b38f07110ee@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Convert the event_hash array in trace_output.c to use the generic
hashtable implementation from hashtable.h instead of the manually
implemented hash table.
This simplifies the code and makes it more maintainable by using the
standard hashtable API defined in hashtable.h.
Rename EVENT_HASHSIZE to EVENT_HASH_BITS to properly reflect its new
meaning as the number of bits for the hashtable size.
Link: https://lore.kernel.org/20250323132800.3010783-1-sashal@kernel.org
Link: https://lore.kernel.org/20250319190545.3058319-1-sashal@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
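The rename from a size to a bit count can be illustrated with a small userspace hash table in which the bucket count is derived from the number of bits. This is only a sketch: the value 7, the `struct event` layout and the helper names are assumptions for illustration, and it does not use the kernel's hashtable.h macros.

```c
#include <stddef.h>
#include <stdint.h>

#define EVENT_HASH_BITS 7				/* assumed value, for illustration */
#define EVENT_HASH_SIZE (1u << EVENT_HASH_BITS)		/* bucket count derived from bits */

struct event {
	int type;
	struct event *next;	/* chain within one bucket */
};

static struct event *event_hash[EVENT_HASH_SIZE];

/* Fold the key down to EVENT_HASH_BITS bits with a multiplicative hash. */
static unsigned int event_bucket(int type)
{
	return ((uint32_t)type * 0x61C88647u) >> (32 - EVENT_HASH_BITS);
}

/* Insert at the head of the bucket's chain. */
static void event_add(struct event *ev)
{
	unsigned int b = event_bucket(ev->type);

	ev->next = event_hash[b];
	event_hash[b] = ev;
}

/* Walk one bucket's chain looking for a matching type. */
static struct event *event_find(int type)
{
	for (struct event *ev = event_hash[event_bucket(type)]; ev; ev = ev->next)
		if (ev->type == type)
			return ev;
	return NULL;
}
```

Expressing the table size as a bit count keeps the bucket count a power of two by construction and lets the hash function take its width from the same constant.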
|
|
Currently, using synth_event_delete() will fail if the event is being
used (tracing in progress), but that is normally done in the module exit
function. At that stage, failing is problematic as returning a non-zero
status means the module will become locked (impossible to unload or
reload again).
Instead, ensure the module exit function does not get called in the
first place by increasing the module refcnt when the event is enabled.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 35ca5207c2d11 ("tracing: Add synthetic event command generation functions")
Link: https://lore.kernel.org/20250318180906.226841-1-douglas.raillard@arm.com
Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
TRACE_REG_UNREGISTER
When __ftrace_event_enable_disable invokes the class callback to
unregister the event, the return value is not reported up to the
caller, hence leading to event unregister failures being silently
ignored.
This patch assigns the return value of the event unregister
callback to the ret variable, so that it is stored and reported
to the caller, and it raises a warning in case of error.
Link: https://lore.kernel.org/20250321170821.101403-1-gpaoloni@redhat.com
Signed-off-by: Gabriele Paoloni <gpaoloni@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Lockdep reports this deadlock log:
osnoise: could not start sampling thread
============================================
WARNING: possible recursive locking detected
--------------------------------------------
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
Call Trace:
<TASK>
print_deadlock_bug+0x282/0x3c0
__lock_acquire+0x1610/0x29a0
lock_acquire+0xcb/0x2d0
cpus_read_lock+0x49/0x120
stop_per_cpu_kthreads+0x7/0x60
start_kthread+0x103/0x120
osnoise_hotplug_workfn+0x5e/0x90
process_one_work+0x44f/0xb30
worker_thread+0x33e/0x5e0
kthread+0x206/0x3b0
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x11/0x20
</TASK>
This is the deadlock scenario:
osnoise_hotplug_workfn()
guard(cpus_read_lock)(); // first lock call
start_kthread(cpu)
if (IS_ERR(kthread)) {
stop_per_cpu_kthreads(); {
cpus_read_lock(); // second lock call. Cause the AA deadlock
}
}
It is not necessary to call stop_per_cpu_kthreads(), which stops the osnoise
kthreads for every other CPU in the system, if a failure occurs during
the hotplug of a certain CPU.
For start_per_cpu_kthreads(), if the start_kthread() call fails,
this function calls stop_per_cpu_kthreads() to handle the error.
Therefore, similarly, there is no need to call stop_per_cpu_kthreads()
again within start_kthread().
So just remove stop_per_cpu_kthreads() from start_kthread to solve this issue.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250321095249.2739397-1-ranxiaokai627@163.com
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The vast majority of ftrace event print formats consist of space-separated
field=value pairs. Synthetic events currently use comma-separated
field=value pairs, which stick out from events created via more
classical means.
Align the format of synth events so they look just like any other event,
for better consistency and less headache when doing crude text-based
data processing.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250319215028.1680278-1-douglas.raillard@arm.com
Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add compatible string and property for the SiFive CLINT v2. The SiFive
CLINT v2 is incompatible with the SiFive CLINT v0 due to differences
in their control methods.
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250321083507.25298-1-nick.hu@sifive.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
MITIGATION_RETPOLINE
1. MITIGATION_RETPOLINE is x86-only (defined in arch/x86/Kconfig),
so no need to AND with CONFIG_X86 when checking if enabled.
2. Remove unused declaration of nf_skip_indirect_calls() when
MITIGATION_RETPOLINE is disabled to avoid warnings.
3. Declare nf_skip_indirect_calls() and nf_skip_indirect_calls_enable()
as inline when MITIGATION_RETPOLINE is enabled, as they are called
only once and have simple logic.
Fix the following error with clang-21 when W=1e:
net/netfilter/nf_tables_core.c:39:20: error: unused function 'nf_skip_indirect_calls' [-Werror,-Wunused-function]
39 | static inline bool nf_skip_indirect_calls(void) { return false; }
| ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make[4]: *** [scripts/Makefile.build:207: net/netfilter/nf_tables_core.o] Error 1
make[3]: *** [scripts/Makefile.build:465: net/netfilter] Error 2
make[3]: *** Waiting for unfinished jobs....
Fixes: d8d760627855 ("netfilter: nf_tables: add static key to skip retpoline workarounds")
Co-developed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_sk_lookup_slow_v4 does the conntrack lookup for IPv4 packets to
restore the original 5-tuple in case of SNAT, to be able to find the
right socket (if any). Then socket_match() can correctly check whether
the socket was transparent.
However, the IPv6 counterpart (nf_sk_lookup_slow_v6) lacks this
conntrack lookup, making xt_socket fail to match on the socket when the
packet was SNATed. Add the same logic to nf_sk_lookup_slow_v6.
IPv6 SNAT is used in Kubernetes clusters for pod-to-world packets, as
pods' addresses are in the fd00::/8 ULA subnet and need to be replaced
with the node's external address. Cilium leverages Envoy to enforce L7
policies, and Envoy uses transparent sockets. Cilium inserts an iptables
prerouting rule that matches on `-m socket --transparent` and redirects
the packets to localhost, but it fails to match SNATed IPv6 packets due
to that missing conntrack lookup.
Closes: https://github.com/cilium/cilium/issues/37932
Fixes: eb31628e37a0 ("netfilter: nf_tables: Add support for IPv6 NAT")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
kzalloc() already zero-initializes the destination buffer, making
strscpy() sufficient for safely copying the name. The additional NUL-
padding performed by strscpy_pad() is unnecessary.
The size parameter is optional, and strscpy() automatically determines
the size of the destination buffer using sizeof() if the argument is
omitted. This makes the explicit sizeof() call unnecessary; remove it.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
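The reasoning above (a zeroed allocation makes extra NUL-padding redundant) can be checked with a userspace sketch. `bounded_copy()` is a hypothetical stand-in for a truncating, always-terminating copy and `struct named_obj` is an invented type; neither is the kernel API, and calloc() plays the role of the zeroing allocator.

```c
#include <stdlib.h>
#include <string.h>

struct named_obj {
	char name[16];
};

/*
 * Hypothetical userspace stand-in for a truncating, always-terminating
 * string copy. It writes only the copied bytes plus one NUL and does not
 * pad the rest of dst.
 */
static void bounded_copy(char *dst, size_t dst_size, const char *src)
{
	size_t len = strlen(src);

	if (dst_size == 0)
		return;
	if (len >= dst_size)
		len = dst_size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* calloc() zeroes the object, so the tail of name[] is already NUL. */
static struct named_obj *named_obj_new(const char *name)
{
	struct named_obj *obj = calloc(1, sizeof(*obj));

	if (obj)
		bounded_copy(obj->name, sizeof(obj->name), name);
	return obj;
}
```

Because the buffer starts out zeroed, every byte after the copied name is already NUL, so a padding variant of the copy would only rewrite zeros.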
|
|
It is possible that ctx in nfqnl_build_packet_message() could be used
before it is properly initialized, as it is only initialized
by nfqnl_get_sk_secctx().
This patch corrects this problem by initializing the lsmctx to a safe
value when it is declared.
This is similar to the commit 35fcac7a7c25
("audit: Initialize lsmctx to avoid memory allocation error").
Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context")
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Fix double free of irq in amd-mp2 driver"
* tag 'i2c-for-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: amd-mp2: drop free_irq() of devm_request_irq() allocated irq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf events fix from Ingo Molnar:
"Fix an information leak regression in the AMD IBS PMU code"
* tag 'perf-urgent-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/amd/ibs: Prevent leaking sensitive data to userspace
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys fix from Jarkko Sakkinen:
"Fix potential use-after-free in key_put()"
* tag 'keys-next-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: Fix UAF in key_put()
|
|
Pull io_uring fix from Jens Axboe:
"Just a single fix for the commit that went into your tree yesterday,
which exposed an issue with not always clearing notifications. That
could cause them to be used more than once"
* tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux:
io_uring/net: fix sendzc double notif flush
|
|
Add a ublk stripe target which can take 1~4 underlying backing files
or block devices, with a stripe size of 4K ~ 512K.
Add two basic tests (write verify & mkfs/mount/umount) over ublk/stripe.
This target is helpful for covering multiple IOs aimed at the same
fixed/registered IO kernel buffer.
It will also be able to verify vectored registered (kernel) buffers
for zero copy in the future, but that isn't supported yet.
Todo: support vectored registered kernel buffer for ublk/zc.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
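As a rough picture of what striping over several backing files means, the sketch below maps a logical byte offset to a backing file index and an offset inside that file, using a plain round-robin (RAID0-style) layout. The layout, `struct stripe_loc` and `stripe_map()` are assumptions for illustration and are not taken from the selftest code; only the 1~4 file and 4K ~ 512K limits come from the message above.

```c
#include <assert.h>
#include <stdint.h>

struct stripe_loc {
	unsigned int file;	/* index of the backing file */
	uint64_t offset;	/* byte offset inside that file */
};

/*
 * Round-robin striping: consecutive chunks of chunk_size bytes go to
 * consecutive backing files, wrapping around after nr_files chunks.
 */
static struct stripe_loc stripe_map(uint64_t logical, unsigned int nr_files,
				    uint32_t chunk_size)
{
	uint64_t chunk, in_chunk;
	struct stripe_loc loc;

	assert(nr_files >= 1 && nr_files <= 4);
	assert(chunk_size >= 4096 && chunk_size <= 512 * 1024);

	chunk = logical / chunk_size;		/* which chunk overall */
	in_chunk = logical % chunk_size;	/* position inside the chunk */
	loc.file = (unsigned int)(chunk % nr_files);
	loc.offset = (chunk / nr_files) * chunk_size + in_chunk;
	return loc;
}
```

A single large request that spans several chunks is therefore split into one IO per backing file, which is why such a target exercises multiple IOs against the same registered buffer.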
|
|
Use the added target io handling helpers for simplifying loop io
completion.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Enable zero copy for the null target so that we can compare performance
with and without zero copy.
This should also be the simplest ublk zero copy implementation, so it
can serve as a zc example.
Add a test covering 'add -t null -z'.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
- pass 'struct dev_ctx *ctx' to target init function
- add 'private_data' to 'struct ublk_dev' for storing target specific data
- add 'private_data' to 'struct ublk_io' for storing per-IO data
- add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios
for handling the current io command
- add helper ublk_get_io() for supporting stripe target
- add two helpers for simplifying target io handling
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move two functions for initializing & de-initializing backing file
into common.c.
Also move one common helper into kublk.h.
Prepare for supporting ublk-stripe.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Increase the max buffer size to 1MB, since 64KB is too small to evaluate
performance with the builtin ublk server implementation.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Unify the sqe allocator helper, and we will use it for supporting
more cases, such as ublk stripe, in which variable sqe allocation
is required.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The block layer, ublk and io_uring might have re-ordered IO in the past:
- plug
- queue ublk io command via task work
Add one test for verifying if sequential WRITE IO is dispatched in order.
- null target is taken, so we can just observe io order from
`tracepoint:block:block_rq_complete` which represents the dispatch order
- WRITE IO is taken because READ may come from system-wide utility
Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 5823 at lib/refcount.c:28 refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
Call Trace:
<TASK>
io_notif_flush io_uring/notif.h:40 [inline]
io_send_zc_cleanup+0x121/0x170 io_uring/net.c:1222
io_clean_op+0x58c/0x9a0 io_uring/io_uring.c:406
io_free_batch_list io_uring/io_uring.c:1429 [inline]
__io_submit_flush_completions+0xc16/0xd20 io_uring/io_uring.c:1470
io_submit_flush_completions io_uring/io_uring.h:159 [inline]
Before the blamed commit, sendzc relied on io_req_msg_cleanup() to clear
REQ_F_NEED_CLEANUP, so after the following snippet the request will
never hit the core io_uring cleanup path.
io_notif_flush();
io_req_msg_cleanup();
The easiest fix is to null the notification. io_send_zc_cleanup() can
still be called after, but it's tolerated.
Reported-by: syzbot+cf285a028ffba71b2ef5@syzkaller.appspotmail.com
Tested-by: syzbot+cf285a028ffba71b2ef5@syzkaller.appspotmail.com
Fixes: cc34d8330e036 ("io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1306007458b8891c88c4f20c966a17595f766b0.1742643795.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
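The fix described above, clearing the notification pointer after flushing so that a later cleanup call becomes harmless, follows a common idempotent-cleanup pattern. The sketch below models it in userspace with a plain integer reference count; all type and function names are hypothetical and this is not the io_uring code.

```c
#include <stddef.h>

struct notif {
	int refs;
};

struct zc_req {
	struct notif *notif;
};

/* Drop one reference on the notification. */
static void notif_flush(struct notif *n)
{
	n->refs--;
}

/* Completion path: flush, then clear the pointer so later cleanup skips it. */
static void zc_complete(struct zc_req *req)
{
	notif_flush(req->notif);
	req->notif = NULL;
}

/* Generic cleanup: may still run after completion; a NULL notif is a no-op. */
static void zc_cleanup(struct zc_req *req)
{
	if (req->notif) {
		notif_flush(req->notif);
		req->notif = NULL;
	}
}
```

Calling the cleanup function after the completion path leaves the count at zero rather than dropping it a second time, which is the underflow the refcount warning reported.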
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.15 from Viresh Kumar:
"- manage sysfs attributes and boost frequencies efficiently from cpufreq
core to reduce boilerplate code from drivers (Viresh Kumar).
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, and zuoqian).
- Migrate to using for_each_present_cpu (Jacky Bai).
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
- Use str_enable_disable() helper (Lifeng Zheng)."
* tag 'cpufreq-arm-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (59 commits)
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
cpufreq: tegra186: Share policy per cluster
cpufreq: tegra194: Allow building for Tegra234
cpufreq: enable 1200Mhz clock speed for armada-37xx
cpufreq: Remove cpufreq_enable_boost_support()
cpufreq: staticize policy_has_boost_freq()
cpufreq: qcom: Set .set_boost directly
cpufreq: dt: Set .set_boost directly
cpufreq: scmi: Set .set_boost directly
cpufreq: powernv: Set .set_boost directly
cpufreq: loongson: Set .set_boost directly
cpufreq: apple: Set .set_boost directly
cpufreq: Restrict enabling boost on policies with no boost frequencies
cpufreq: cppc: Set policy->boost_supported
cpufreq: amd: Set policy->boost_supported
cpufreq: acpi: Set policy->boost_supported
...
|
|
The mask operation link->flags | DL_FLAG_PM_RUNTIME is always true which
is incorrect. The mask operation should be using the bit-wise &
operator. Fix this.
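
The difference between the two operators is easy to see in isolation. A minimal user-space sketch, assuming a made-up flag value and hypothetical helper names (neither is the kernel's definition):

```c
#include <stdbool.h>

/* Hypothetical flag bit for illustration; not the kernel's actual value. */
#define EXAMPLE_FLAG_PM_RUNTIME (1u << 2)

/* Buggy form: OR-ing with a non-zero constant can never yield zero. */
static bool has_pm_runtime_buggy(unsigned int flags)
{
	return flags | EXAMPLE_FLAG_PM_RUNTIME;
}

/* Fixed form: AND isolates the bit, so the result reflects the flag. */
static bool has_pm_runtime_fixed(unsigned int flags)
{
	return flags & EXAMPLE_FLAG_PM_RUNTIME;
}
```

The buggy helper returns true even for flags == 0, while the fixed helper returns true only when the bit is actually set.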
Fixes: bca84a7b93fd ("PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250319114324.791829-1-colin.i.king@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
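
The ordering can be sketched as follows. This is a single-threaded illustration only: all names are hypothetical, and real code would need atomic operations and memory barriers that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a key; fields are illustrative only. */
struct key_sketch {
	int usage;		/* reference count */
	bool can_gc;		/* set once the final put has finished */
	long quota_used;	/* quota charged to this key */
};

/* Final put: update quota first, then hand the key over to the GC. */
static void key_put_sketch(struct key_sketch *key, long *quota_total)
{
	if (--key->usage == 0) {
		*quota_total -= key->quota_used;	/* last touch of key data */
		key->can_gc = true;	/* from here on the key belongs to the GC */
	}
}

/* GC side: decide from the flag, not from the reference count. */
static bool gc_may_destroy_sketch(const struct key_sketch *key)
{
	return key->can_gc;
}
```

Because the flag is set only after the quota update, the collector cannot treat the key as dead while the putting side is still working on it.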
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
switching
Kairui reported a UAF issue in print_graph_function_flags() during
ftrace stress testing [1]. This issue can be reproduced by putting a
'mdelay(10)' after 'mutex_unlock(&trace_types_lock)' in s_start(),
and executing the following script:
$ echo function_graph > current_tracer
$ cat trace > /dev/null &
$ sleep 5 # Ensure the 'cat' reaches the 'mdelay(10)' point
$ echo timerlat > current_tracer
The root cause lies in the two calls to print_graph_function_flags
within print_trace_line during each s_show():
* One through 'iter->trace->print_line()';
* Another through 'event->funcs->trace()', which is hidden in
print_trace_fmt() before print_trace_line returns.
Tracer switching only updates the former, while the latter continues
to use the print_line function of the old tracer, which in the script
above is print_graph_function_flags.
Moreover, when switching from the 'function_graph' tracer to the
'timerlat' tracer, s_start only calls graph_trace_close of the
'function_graph' tracer to free 'iter->private', but does not set
it to NULL. This provides an opportunity for 'event->funcs->trace()'
to use an invalid 'iter->private'.
To fix this issue, set 'iter->private' to NULL immediately after
freeing it in graph_trace_close(), ensuring that an invalid pointer
is not passed to other tracers. Additionally, clean up the unnecessary
'iter->private = NULL' during each 'cat trace' when using wakeup and
irqsoff tracers.
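
The free-then-NULL pattern can be sketched in user space. The struct and function names are hypothetical, and the consumer is assumed to check for NULL before using the pointer:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the iterator; only the private pointer matters. */
struct iter_sketch {
	void *private;
};

/* Close path: free the private data and clear the pointer right away. */
static void close_sketch(struct iter_sketch *iter)
{
	free(iter->private);
	iter->private = NULL;	/* later users see NULL, not a dangling pointer */
}

/* Hypothetical consumer: returns 1 if private data is usable, 0 otherwise. */
static int print_sketch(const struct iter_sketch *iter)
{
	if (!iter->private)
		return 0;	/* nothing to use; no stale access */
	return 1;
}
```

Calling close_sketch() twice is also harmless in this sketch, since free(NULL) is a no-op.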
[1] https://lore.kernel.org/all/20231112150030.84609-1-ryncsn@gmail.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Link: https://lore.kernel.org/20250320122137.23635-1-wutengda@huaweicloud.com
Fixes: eecb91b9f98d ("tracing: Fix memleak due to race between current_tracer and trace")
Closes: https://lore.kernel.org/all/CAMgjq7BW79KDSCyp+tZHjShSzHsScSiJxn5ffskp-QzVM06fxw@mail.gmail.com/
Reported-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The commit ca29a0bf122145 ("tracing: gfp: Remove duplication of recording
GFP flags") caused the following regression in printf_test selftest:
[ 46.208199] test_printf: kvasprintf(..., "%pGg", ...) returned 'none|0xfc000000', expected '0xfc000000'
[ 46.208209] test_printf: kvasprintf(..., "%pGg", ...) returned '__GFP_HIGH|none|0xfc000000', expected '__GFP_HIGH|0xfc000000'
The problem is the new '{ 0, "none" }' entry in __def_gfpflag_names macro
and the following code:
char *format_flags(char *buf, char *end, unsigned long flags,
const struct trace_print_flags *names)
{
[...]
if ((flags & mask) != mask)
continue;
[...]
}
The purpose of the code is to print the name of a mask instead of bits,
for example, print "GFP_ZONEMASK", instead of
"__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE".
Unfortunately, the mask "0" passes this check and "none" is always
printed.
A solution would be to move TRACE_GFP_FLAGS up so that it is not
the last entry. But it breaks the rule that named masks must
be defined before names of single bits. Otherwise, it would
print the names of the bits instead of the mask.
Instead, replace '{ 0, "none" }' with '{ 0, NULL }'. It works because
__def_gfpflag_names defines a standalone array and this is the standard
trailing entry. The code processing these arrays always ends the cycle
when flag->name == NULL.
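
The behaviour can be reproduced with a simplified user-space sketch of the loop. The struct, function, and table names are hypothetical, and the sketch omits the hex output for leftover bits:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct trace_print_flags. */
struct flag_name {
	unsigned long mask;
	const char *name;
};

/* Append the name of every mask fully set in flags; stop at name == NULL. */
static void format_flags_sketch(char *buf, size_t len, unsigned long flags,
				const struct flag_name *names)
{
	buf[0] = '\0';
	for (; names->name; names++) {
		unsigned long mask = names->mask;

		if ((flags & mask) != mask)
			continue;	/* a mask of 0 never takes this branch */
		if (buf[0])
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, names->name, len - strlen(buf) - 1);
		flags &= ~mask;
	}
}

/* Table with a named zero entry before the terminator. */
static const struct flag_name names_with_none[] = {
	{ 0x1, "A" }, { 0, "none" }, { 0, NULL },
};

/* Table where the zero entry is the plain terminator. */
static const struct flag_name names_fixed[] = {
	{ 0x1, "A" }, { 0, NULL },
};
```

With names_with_none, "none" is appended for any input because (flags & 0) == 0 always holds. With names_fixed, the loop stops at the NULL name and nothing extra is printed.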
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Tamir Duberstein <tamird@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/Z9Q5d11ZbA3CNMZm@pathway.suse.cz
Fixes: ca29a0bf122145 ("tracing: gfp: Remove duplication of recording GFP flags")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
CONFIG_TRACE_BRANCH_PROFILING inserts a call to ftrace_likely_update()
for each use of likely() or unlikely(). That breaks noinstr rules if
the affected function is annotated as noinstr.
Disable branch profiling for files with noinstr functions. In addition
to some individual files, this also includes the entire arch/x86
subtree, as well as the kernel/entry, drivers/cpuidle, and drivers/idle
directories, all of which are noinstr-heavy.
Due to the nature of how sched binaries are built by combining multiple
.c files into one, branch profiling is disabled more broadly across the
sched code than would otherwise be needed.
This fixes many warnings like the following:
vmlinux.o: warning: objtool: do_syscall_64+0x40: call to ftrace_likely_update() leaves .noinstr.text section
vmlinux.o: warning: objtool: __rdgsbase_inactive+0x33: call to ftrace_likely_update() leaves .noinstr.text section
vmlinux.o: warning: objtool: handle_bug.isra.0+0x198: call to ftrace_likely_update() leaves .noinstr.text section
...
Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/fb94fc9303d48a5ed370498f54500cc4c338eb6d.1742586676.git.jpoimboe@kernel.org
|
|
Add support for HP Cadet, Clipper OmniBook, Turbine OmniBook, Trekker,
Enstrom OmniBook, Piston OmniBook.
Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C.
Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
Link: https://patch.msgid.link/20250321231717.1232792-1-sbinding@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|