linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-03-13	posix-timers: Remove a few paranoid warnings	Thomas Gleixner
	Warnings about a non-initialized timer or non-existing callbacks are just useful for implementing new posix clocks, but there a NULL pointer dereference is expected anyway. :) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.765462334@linutronix.de
2025-03-13	posix-timers: Cleanup includes	Thomas Gleixner
	Remove pointless includes and sort the remaining ones alphabetically. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.701301552@linutronix.de
2025-03-13	posix-timers: Add cond_resched() to posix_timer_add() search loop	Eric Dumazet
	With a large number of POSIX timers the search for a valid ID might cause a soft lockup on PREEMPT_NONE/VOLUNTARY kernels. Add cond_resched() to the loop to prevent that. [ tglx: Split out from Eric's series ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250214135911.2037402-2-edumazet@google.com Link: https://lore.kernel.org/all/20250308155623.635612865@linutronix.de
2025-03-13	posix-timers: Initialise timer before adding it to the hash table	Eric Dumazet
	A timer is only valid in the hashtable when both timer::it_signal and timer::it_id are set to their final values, but timers are added without those values being set. The timer ID is allocated when the timer is added to the hash in invalid state. The ID is taken from a monotonically increasing per process counter which wraps around after reaching INT_MAX. The hash insertion validates that there is no timer with the allocated ID in the hash table which belongs to the same process. That opens a mostly theoretical race condition: If other threads of the same process manage to create/delete timers in rapid succession before the newly created timer is fully initialized and wrap around to the timer ID which was handed out, then a duplicate timer ID will be inserted into the hash table. Prevent this by: 1) Setting timer::it_id before inserting the timer into the hashtable. 2) Storing the signal pointer in timer::it_signal with bit 0 set before inserting it into the hashtable. Bit 0 acts as a invalid bit, which means that the regular lookup for sys_timer_*() will fail the comparison with the signal pointer. But the lookup on insertion masks out bit 0 and can therefore detect a timer which is not yet valid, but allocated in the hash table. Bit 0 in the pointer is cleared once the initialization of the timer completed. [ tglx: Fold ID and signal iniitializaion into one patch and massage change log and comments. ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250219125522.2535263-3-edumazet@google.com Link: https://lore.kernel.org/all/20250308155623.572035178@linutronix.de
2025-03-13	posix-timers: Ensure that timer initialization is fully visible	Thomas Gleixner
	Frederic pointed out that the memory operations to initialize the timer are not guaranteed to be visible, when __lock_timer() observes timer::it_signal valid under timer::it_lock: T0 T1 --------- ----------- do_timer_create() // A new_timer->.... = .... spin_lock(current->sighand) // B WRITE_ONCE(new_timer->it_signal, current->signal) spin_unlock(current->sighand) sys_timer_*() t = __lock_timer() spin_lock(&timr->it_lock) // observes B if (timr->it_signal == current->signal) return timr; if (!t) return; // Is not guaranteed to observe A Protect the write of timer::it_signal, which makes the timer valid, with timer::it_lock as well. This guarantees that T1 must observe the initialization A completely, when it observes the valid signal pointer under timer::it_lock. sighand::siglock must still be taken to protect the signal::posix_timers list. Reported-by: Frederic Weisbecker <frederic@kernel.org> Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.507944489@linutronix.de
2025-03-13	clocksource: Remove unnecessary strscpy() size argument	Thorsten Blum
	The size argument of strscpy() is only required when the destination pointer is not a fixed sized array or when the copy needs to be smaller than the size of the fixed sized destination array. For fixed sized destination arrays and full copies, strscpy() automatically determines the length of the destination buffer if the size argument is omitted. This makes the explicit sizeof() unnecessary. Remove it. [ tglx: Massaged change log ] Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250311110624.495718-2-thorsten.blum@linux.dev
2025-03-13	Revert "openvswitch: switch to per-action label counting in conntrack"	Xin Long
	Currently, ovs_ct_set_labels() is only called for confirmed conntrack entries (ct) within ovs_ct_commit(). However, if the conntrack entry does not have the labels_ext extension, attempting to allocate it in ovs_ct_get_conn_labels() for a confirmed entry triggers a warning in nf_ct_ext_add(): WARN_ON(nf_ct_is_confirmed(ct)); This happens when the conntrack entry is created externally before OVS increments net->ct.labels_used. The issue has become more likely since commit fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack"), which changed to use per-action label counting and increment net->ct.labels_used when a flow with ct action is added. Since there’s no straightforward way to fully resolve this issue at the moment, this reverts the commit to avoid breaking existing use cases. Fixes: fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack") Reported-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/1bdeb2f3a812bca016a225d3de714427b2cd4772.1741457143.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: openvswitch: remove misbehaving actions length check	Ilya Maximets
	The actions length check is unreliable and produces different results depending on the initial length of the provided netlink attribute and the composition of the actual actions inside of it. For example, a user can add 4088 empty clone() actions without triggering -EMSGSIZE, on attempt to add 4089 such actions the operation will fail with the -EMSGSIZE verdict. However, if another 16 KB of other actions will be appended to the previous 4089 clone() actions, the check passes and the flow is successfully installed into the openvswitch datapath. The reason for a such a weird behavior is the way memory is allocated. When ovs_flow_cmd_new() is invoked, it calls ovs_nla_copy_actions(), that in turn calls nla_alloc_flow_actions() with either the actual length of the user-provided actions or the MAX_ACTIONS_BUFSIZE. The function adds the size of the sw_flow_actions structure and then the actually allocated memory is rounded up to the closest power of two. So, if the user-provided actions are larger than MAX_ACTIONS_BUFSIZE, then MAX_ACTIONS_BUFSIZE + sizeof(*sfa) rounded up is 32K + 24 -> 64K. Later, while copying individual actions, we look at ksize(), which is 64K, so this way the MAX_ACTIONS_BUFSIZE check is not actually triggered and the user can easily allocate almost 64 KB of actions. However, when the initial size is less than MAX_ACTIONS_BUFSIZE, but the actions contain ones that require size increase while copying (such as clone() or sample()), then the limit check will be performed during the reserve_sfa_size() and the user will not be allowed to create actions that yield more than 32 KB internally. This is one part of the problem. The other part is that it's not actually possible for the userspace application to know beforehand if the particular set of actions will be rejected or not. Certain actions require more space in the internal representation, e.g. an empty clone() takes 4 bytes in the action list passed in by the user, but it takes 12 bytes in the internal representation due to an extra nested attribute, and some actions require less space in the internal representations, e.g. set(tunnel(..)) normally takes 64+ bytes in the action list provided by the user, but only needs to store a single pointer in the internal implementation, since all the data is stored in the tunnel_info structure instead. And the action size limit is applied to the internal representation, not to the action list passed by the user. So, it's not possible for the userpsace application to predict if the certain combination of actions will be rejected or not, because it is not possible for it to calculate how much space these actions will take in the internal representation without knowing kernel internals. All that is causing random failures in ovs-vswitchd in userspace and inability to handle certain traffic patterns as a result. For example, it is reported that adding a bit more than a 1100 VMs in an OpenStack setup breaks the network due to OVS not being able to handle ARP traffic anymore in some cases (it tries to install a proper datapath flow, but the kernel rejects it with -EMSGSIZE, even though the action list isn't actually that large.) Kernel behavior must be consistent and predictable in order for the userspace application to use it in a reasonable way. ovs-vswitchd has a mechanism to re-direct parts of the traffic and partially handle it in userspace if the required action list is oversized, but that doesn't work properly if we can't actually tell if the action list is oversized or not. Solution for this is to check the size of the user-provided actions instead of the internal representation. This commit just removes the check from the internal part because there is already an implicit size check imposed by the netlink protocol. The attribute can't be larger than 64 KB. Realistically, we could reduce the limit to 32 KB, but we'll be risking to break some existing setups that rely on the fact that it's possible to create nearly 64 KB action lists today. Vast majority of flows in real setups are below 100-ish bytes. So removal of the limit will not change real memory consumption on the system. The absolutely worst case scenario is if someone adds a flow with 64 KB of empty clone() actions. That will yield a 192 KB in the internal representation consuming 256 KB block of memory. However, that list of actions is not meaningful and also a no-op. Real world very large action lists (that can occur for a rare cases of BUM traffic handling) are unlikely to contain a large number of clones and will likely have a lot of tunnel attributes making the internal representation comparable in size to the original action list. So, it should be fine to just remove the limit. Commit in the 'Fixes' tag is the first one that introduced the difference between internal representation and the user-provided action lists, but there were many more afterwards that lead to the situation we have today. Fixes: 7d5437c709de ("openvswitch: Add tunneling interface.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250308004609.2881861-1-i.maximets@ovn.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge branch 'gre-fix-regressions-in-ipv6-link-local-address-generation'	Paolo Abeni
	Guillaume Nault says: ==================== gre: Fix regressions in IPv6 link-local address generation. IPv6 link-local address generation has some special cases for GRE devices. This has led to several regressions in the past, and some of them are still not fixed. This series fixes the remaining problems, like the ipv6.conf.<dev>.addr_gen_mode sysctl being ignored and the router discovery process not being started (see details in patch 1). To avoid any further regressions, patch 2 adds selftests covering IPv4 and IPv6 gre/gretap devices with all combinations of currently supported addr_gen_mode values. ==================== Link: https://patch.msgid.link/cover.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	selftests: Add IPv6 link-local address generation tests for GRE devices.	Guillaume Nault
	GRE devices have their special code for IPv6 link-local address generation that has been the source of several regressions in the past. Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an IPv6 link-link local address in accordance with the net.ipv6.conf.<dev>.addr_gen_mode sysctl. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/2d6772af8e1da9016b2180ec3f8d9ee99f470c77.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	gre: Fix IPv6 link-local address generation.	Guillaume Nault
	Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE devices in most cases and fall back to using add_v4_addrs() only in case the GRE configuration is incompatible with addrconf_addr_gen(). GRE used to use addrconf_addr_gen() until commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") restricted this use to gretap and ip6gretap devices, and created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones. The original problem came when commit 9af28511be10 ("addrconf: refuse isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its addr parameter was 0. The commit says that this would create an invalid address, however, I couldn't find any RFC saying that the generated interface identifier would be wrong. Anyway, since gre over IPv4 devices pass their local tunnel address to __ipv6_isatap_ifid(), that commit broke their IPv6 link-local address generation when the local address was unspecified. Then commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") tried to fix that case by defining add_v4_addrs() and calling it to generate the IPv6 link-local address instead of using addrconf_addr_gen() (apart for gretap and ip6gretap devices, which would still use the regular addrconf_addr_gen(), since they have a MAC address). That broke several use cases because add_v4_addrs() isn't properly integrated into the rest of IPv6 Neighbor Discovery code. Several of these shortcomings have been fixed over time, but add_v4_addrs() remains broken on several aspects. In particular, it doesn't send any Router Sollicitations, so the SLAAC process doesn't start until the interface receives a Router Advertisement. Also, add_v4_addrs() mostly ignores the address generation mode of the interface (/proc/sys/net/ipv6/conf//addr_gen_mode), thus breaking the IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases. Fix the situation by using add_v4_addrs() only in the specific scenario where the normal method would fail. That is, for interfaces that have all of the following characteristics: run over IPv4, * transport IP packets directly, not Ethernet (that is, not gretap interfaces), * tunnel endpoint is INADDR_ANY (that is, 0), * device address generation mode is EUI64. In all other cases, revert back to the regular addrconf_addr_gen(). Also, remove the special case for ip6gre interfaces in add_v4_addrs(), since ip6gre devices now always use addrconf_addr_gen() instead. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/559c32ce5c9976b269e6337ac9abb6a96abe5096.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	netfilter: nft_exthdr: fix offset with ipv4_find_option()	Alexey Kashavkin
	There is an incorrect calculation in the offset variable which causes the nft_skb_copy_to_reg() function to always return -EFAULT. Adding the start variable is redundant. In the __ip_options_compile() function the correct offset is specified when finding the function. There is no need to add the size of the iphdr structure to the offset. Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Alexey Kashavkin <akashavkin@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-13	fs: use debug-only asserts around fd allocation and install	Mateusz Guzik
	This also restores the check which got removed in 52732bb9abc9ee5b ("fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()") for performance reasons -- they no longer apply with a debug-only variant. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250312161941.1261615-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-13	Merge tag 'intel-gpio-v6.15-1' of ↵	Bartosz Golaszewski
	git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-next intel-gpio for v6.15-1 * A cleanup to remove unneeded ERR_CAST() in GPIO ACPI library The following is an automated git shortlog grouped by driver: gpiolib-acpi: - Drop unneeded ERR_CAST() in __acpi_find_gpio()
2025-03-13	reset: mchp: sparx5: Fix for lan966x	Horatiu Vultur
	With the blamed commit it seems that lan966x doesn't seem to boot anymore when the internal CPU is used. The reason seems to be the usage of the devm_of_iomap, if we replace this with devm_ioremap, this seems to fix the issue as we use the same region also for other devices. Fixes: 0426a920d6269c ("reset: mchp: sparx5: Map cpu-syscon locally in case of LAN966x") Reviewed-by: Herve Codina <herve.codina@bootlin.com> Tested-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20250227105502.25125-1-horatiu.vultur@microchip.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-03-13	drm/sched: Fix fence reference count leak	qianyi liu
	The last_scheduled fence leaks when an entity is being killed and adding the cleanup callback fails. Decrement the reference count of prev when dma_fence_add_callback() fails, ensuring proper balance. Cc: stable@vger.kernel.org # v6.2+ [phasta: add git tag info for stable kernel] Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: qianyi liu <liuqianyi125@gmail.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250311060251.4041101-1-liuqianyi125@gmail.com
2025-03-13	gpio: cdev: use raw notifier for line state events	Bartosz Golaszewski
	We use a notifier to implement the mechanism of informing the user-space about changes in GPIO line status. We register with the notifier when the GPIO character device file is opened and unregister when the last reference to the associated file descriptor is dropped. Since commit fcc8b637c542 ("gpiolib: switch the line state notifier to atomic") we use the atomic notifier variant. Atomic notifiers call rcu_synchronize in atomic_notifier_chain_unregister() which caused a significant performance regression in some circumstances, observed by user-space when calling close() on the GPIO device file descriptor. Replace the atomic notifier with the raw variant and provide synchronization with a read-write spinlock. Fixes: fcc8b637c542 ("gpiolib: switch the line state notifier to atomic") Reported-by: David Jander <david@protonic.nl> Closes: https://lore.kernel.org/all/20250311110034.53959031@erd003.prtnl/ Tested-by: David Jander <david@protonic.nl> Tested-by: Kent Gibson <warthog618@gmail.com> Link: https://lore.kernel.org/r/20250311-gpiolib-line-state-raw-notifier-v2-1-138374581e1e@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-13	gpiolib: don't check the retval of get_direction() when registering a chip	Bartosz Golaszewski
	During chip registration we should neither check the return value of gc->get_direction() nor hold the SRCU lock when calling it. The former is because pin controllers may have pins set to alternate functions and return errors from their get_direction() callbacks. That's alright - we should default to the safe INPUT state and not bail-out. The latter is not needed because we haven't registered the chip yet so there's nothing to protect against dynamic removal. In fact: we currently hit a lockdep splat. Revert to calling the gc->get_direction() callback directly and not checking its value. Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/81f890fc-6688-42f0-9756-567efc8bb97a@samsung.com/ Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250226-retval-fixes-v2-1-c8dc57182441@linaro.org Tested-by: Gene C <arch@sapience.com> Link: https://lore.kernel.org/r/20250311175631.83779-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-13	timer_list: Don't use %pK through printk()	Thomas Weißschuh
	This reverts commit f590308536db ("timer debug: Hide kernel addresses via %pK in /proc/timer_list") The timer list helper SEQ_printf() uses either the real seq_printf() for procfs output or vprintk() to print to the kernel log, when invoked from SysRq-q. It uses %pK for printing pointers. In the past %pK was prefered over %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping looks in atomic contexts. Switch to the regular pointer formatting which is safer, easier to reason about and sufficient here. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Link: https://lore.kernel.org/all/20250311-restricted-pointers-timer-v1-1-6626b91e54ab@linutronix.de
2025-03-13	Merge tag 'asoc-fix-v6.14-rc6' of ↵	Takashi Iwai
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.14 The bulk of this is driver specific fixes, mostly unremarkable. There's also one core fix from Charles, fixing up confusion around the limiting of maximum control values.
2025-03-13	bcachefs: fix tiny leak in bch2_dev_add()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-12	riscv: fix test_and_{set,clear}_bit ordering documentation	Ignacio Encinas
	test_and_{set,clear}_bit are fully ordered as specified in Documentation/atomic_bitops.txt. Fix incorrect comment stating otherwise. Note that the implementation is correct since commit 9347ce54cd69 ("RISC-V: __test_and_op_bit_ord should be strongly ordered") was introduced. Signed-off-by: Ignacio Encinas <ignacio@iencinas.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-03-12	Documentation: dma-buf: heaps: Add heap name definitions	Maxime Ripard
	Following a recent discussion at last Plumbers, John Stultz, Sumit Sewal, TJ Mercier and I came to an agreement that we should document what the dma-buf heaps names are expected to be, and what the buffers attributes you'll get should be documented. Let's create that doc to make sure those attributes and names are guaranteed going forward. Reviewed-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250306135114.1943738-1-mripard@kernel.org
2025-03-12	LoongArch: Enable jump table for objtool	Tiezhu Yang
	For now, it is time to remove -fno-jump-tables to enable jump table for objtool if the compiler has -mannotate-tablejump, otherwise it is better to remain -fno-jump-tables to keep compatibility with older compilers. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20241217010905.13054-8-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool/LoongArch: Add support for goto table	Tiezhu Yang
	The objtool program need to analysis the control flow of each object file generated by compiler toolchain, it needs to know all the locations that a branch instruction may jump into, if a jump table is used, objtool has to correlate the jump instruction with the table. On x86 (which is the only port supported by objtool before LoongArch), there is a relocation type on the jump instruction and directly points to the table. But on LoongArch, the relocation is on another kind of instruction prior to the jump instruction, and also with scheduling it is not very easy to tell the offset of that instruction from the jump instruction. Furthermore, because LoongArch has -fsection-anchors (often enabled at -O1 or above) the relocation may actually points to a section anchor instead of the table itself. For the jump table of switch cases, a GCC patch "LoongArch: Add support to annotate tablejump" and a Clang patch "[LoongArch] Add options for annotate tablejump" have been merged into the upstream mainline, it can parse the additional section ".discard.tablejump_annotate" which stores the jump info as pairs of addresses, each pair contains the address of jump instruction and the address of jump table. For the jump table of computed gotos, it is indeed not easy to implement in the compiler, especially if there is more than one computed goto in a function such as ___bpf_prog_run(). objdump kernel/bpf/core.o shows that there are many table jump instructions in ___bpf_prog_run(), but there are no relocations on the table jump instructions and to the table directly on LoongArch. Without the help of compiler, in order to figure out the address of goto table for the special case of ___bpf_prog_run(), since the instruction sequence is relatively single and stable, it makes sense to add a helper find_reloc_of_rodata_c_jump_table() to find the relocation which points to the section ".rodata..c_jump_table". If find_reloc_by_table_annotate() failed, it means there is no relocation info of switch table address in ".rela.discard.tablejump_annotate", then objtool may find the relocation info of goto table ".rodata..c_jump_table" with find_reloc_of_rodata_c_jump_table(). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250211115016.26913-6-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool/LoongArch: Add support for switch table	Tiezhu Yang
	The objtool program need to analysis the control flow of each object file generated by compiler toolchain, it needs to know all the locations that a branch instruction may jump into, if a jump table is used, objtool has to correlate the jump instruction with the table. On x86 (which is the only port supported by objtool before LoongArch), there is a relocation type on the jump instruction and directly points to the table. But on LoongArch, the relocation is on another kind of instruction prior to the jump instruction, and also with scheduling it is not very easy to tell the offset of that instruction from the jump instruction. Furthermore, because LoongArch has -fsection-anchors (often enabled at -O1 or above) the relocation may actually points to a section anchor instead of the table itself. The good news is that after continuous analysis and discussion, at last a GCC patch "LoongArch: Add support to annotate tablejump" and a Clang patch "[LoongArch] Add options for annotate tablejump" have been merged into the upstream mainline, the compiler changes make life much easier for switch table support of objtool on LoongArch. By now, there is an additional section ".discard.tablejump_annotate" to store the jump info as pairs of addresses, each pair contains the address of jump instruction and the address of jump table. In order to find switch table, it is easy to parse the relocation section ".rela.discard.tablejump_annotate" to get table_sec and table_offset, the rest process is somehow like x86. Additionally, it needs to get each table size. When compiling on LoongArch, there are unsorted table offsets of rodata if there exist many jump tables, it will get the wrong table end and find the wrong table jump destination instructions in add_jump_table(). Sort the rodata table offset by parsing ".rela.discard.tablejump_annotate" and then get each table size of rodata corresponded with each table jump instruction, it is used to check the table end and will break the process when parsing ".rela.rodata" to avoid getting the wrong jump destination instructions. Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0ee028f55640 Link: https://github.com/llvm/llvm-project/commit/4c2c17756739 Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250211115016.26913-5-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool: Handle PC relative relocation type	Tiezhu Yang
	For the most part, an absolute relocation type is used for rodata. In the case of STT_SECTION, reloc->sym->offset is always zero, for the other symbol types, reloc_addend(reloc) is always zero, thus it can use a simple statement "reloc->sym->offset + reloc_addend(reloc)" to obtain the symbol offset for various symbol types. When compiling on LoongArch, there exist PC relative relocation types for rodata, it needs to calculate the symbol offset with "S + A - PC" according to the spec of "ELF for the LoongArch Architecture". If there is only one jump table in the rodata, the "PC" is the entry address which is equal with the value of reloc_offset(reloc), at this time, reloc_offset(table) is 0. If there are many jump tables in the rodata, the "PC" is the offset of the jump table's base address which is equal with the value of reloc_offset(reloc) - reloc_offset(table). So for LoongArch, if the relocation type is PC relative, it can use a statement "reloc_offset(reloc) - reloc_offset(table)" to get the "PC" value when calculating the symbol offset with "S + A - PC" for one or many jump tables in the rodata. Add an arch-specific function arch_jump_table_sym_offset() to assign the symbol offset, for the most part that is an absolute relocation, the default value is "reloc->sym->offset + reloc_addend(reloc)" in the weak definition, it can be overridden by each architecture that has different requirements. Link: https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250211115016.26913-4-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool: Handle different entry size of rodata	Tiezhu Yang
	In the most cases, the entry size of rodata is 8 bytes because the relocation type is 64 bit. There are also 32 bit relocation types, the entry size of rodata should be 4 bytes in this case. Add an arch-specific function arch_reloc_size() to assign the entry size of rodata for x86, powerpc and LoongArch. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250211115016.26913-3-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool: Handle various symbol types of rodata	Tiezhu Yang
	In the relocation section ".rela.rodata" of each .o file compiled with LoongArch toolchain, there are various symbol types such as STT_NOTYPE, STT_OBJECT, STT_FUNC in addition to the usual STT_SECTION, it needs to use reloc symbol offset instead of reloc addend to find the destination instruction in find_jump_table() and add_jump_table(). For the most part, an absolute relocation type is used for rodata. In the case of STT_SECTION, reloc->sym->offset is always zero, and for the other symbol types, reloc_addend(reloc) is always zero, thus it can use a simple statement "reloc->sym->offset + reloc_addend(reloc)" to obtain the symbol offset for various symbol types. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250211115016.26913-2-yangtiezhu@loongson.cn Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	objtool: Hide unnecessary compiler error message	David Engraf
	The check for using old libelf prints an error message when libelf.h is not available but does not abort. This may confuse so hide the compiler error message. Signed-off-by: David Engraf <david.engraf@sysgo.com> Link: https://lore.kernel.org/r/20250203073610.206000-1-david.engraf@sysgo.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-03-12	docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for ↵	Akira Yokosawa
	cross-ref of README Commit fb12098d8ee4 ("docs: submit-checklist: Allow creating cross-references for ABI README") assumes that the path of "Documentation/ABI/README" would be converted to a cross-ref to the README. However, as the README is included by the "kernel-abi" directive at Documentation/admin-guide/abi.rst, the expected conversion does not happen. Instead, use the path where the "kernel-abi" directive exists for the conversion to work. Restore the original path of README in inline-literal form as an additional note for readers of the .rst file. Apply the same changes for translations. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Fixes: fb12098d8ee4 ("docs: submit-checklist: Allow creating cross-references for ABI README") Fixes: eb0c714120ba ("docs: translations: Allow creating cross-references for ABI README") Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Dongliang Mu <dzm91@hust.edu.cn> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Federico Vaga <federico.vaga@vaga.pv.it> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250304075734.56660-1-akiyks@gmail.com
2025-03-12	docs: Correct installation instruction	I Hsin Cheng
	Ammend missing "install" operation keyword after "apt-get", and fix "build-essentials" to "build-essential". Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250306030708.8133-1-richard120310@gmail.com
2025-03-12	Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst	Ignacio Encinas
	Make the URL point to the "Plain Accesses and Data Races" section again and prevent it from becoming stale by adding a commit id to it. Signed-off-by: Ignacio Encinas <ignacio@iencinas.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250306-fix-plain-access-url-v1-1-9c653800f9e0@iencinas.com
2025-03-12	Documentation/CoC: Spell out the TAB role in enforcement decisions	Shuah Khan
	Updates the document to clearly describe the scope and role the TAB plays in making decisions on unresolved violations. If and when the CoC has to make a call on instituting a ban, it doesn't act without the TAB's approval and only when the TAB approves it with 2/3 vote in favor of the measure. These changes ensure that the TAB role and its oversight on CoC measures is consistently described throughout the document. Fixes: c818d5c64c9a8cc1 ("Documentation/CoC: spell out enforcement for unacceptable behaviors") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Shuah Khan <shuah@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250306211231.13154-1-shuah@kernel.org
2025-03-12	Documentation: ocxl.rst: Update consortium site	Fritz Koenig
	Point to post-merger site. Signed-off-by: Fritz Koenig <frkoenig@chromium.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250312-dead_site-v2-1-920a313743ee@chromium.org
2025-03-12	scripts: get_feat.pl: substitute s390x with s390	Heiko Carstens
	Both get_feat.pl and list-arch.sh use uname -m to get the machine hardware name to figure out the current architecture if no architecture is specified with a command line option. This doesn't work for s390, since for 64 bit kernels the hardware name is s390x, while the architecture name within the kernel, as well as in all feature files is s390. Therefore substitute s390x with s390 similar to what is already done for x86_64 and i386. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250312155219.3597768-1-hca@linux.ibm.com
2025-03-12	Merge tag 'sched_ext-for-6.14-rc6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fix from Tejun Heo: "BPF schedulers could trigger a crash by passing in an invalid CPU to the scx_bpf_select_cpu_dfl() helper. Fix it by verifying input validity" * tag 'sched_ext-for-6.14-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
2025-03-12	Merge tag 'spi-fix-v6.14-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of driver specific fixes, an error handling fix for the Atmel QuadSPI driver and a fix for a nasty synchronisation issue in the data path for the Microchip driver which affects larger transfers. There's also a MAINTAINERS update for the Samsung driver" * tag 'spi-fix-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: microchip-core: prevent RX overflows when transmit size > FIFO size MAINTAINERS: add tambarus as R for Samsung SPI spi: atmel-quadspi: remove references to runtime PM on error path
2025-03-12	KVM: arm64: selftests: Test that TGRAN*_2 fields are writable	Sebastian Ott
	Userspace can write to these fields for non-NV guests; add test that do just that. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-12	KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2	Sebastian Ott
	Allow userspace to write the safe (NI) value for ID_AA64MMFR0_EL1.TGRAN*_2. Disallow to change these fields for NV since kvm provides a sanitized view for them based on the PAGE_SIZE. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-12	cpuidle: Init cpuidle only for present CPUs	Jacky Bai
	for_each_possible_cpu() is currently used to initialize cpuidle in below cpuidle drivers: drivers/cpuidle/cpuidle-arm.c drivers/cpuidle/cpuidle-big_little.c drivers/cpuidle/cpuidle-psci.c drivers/cpuidle/cpuidle-qcom-spm.c drivers/cpuidle/cpuidle-riscv-sbi.c However, in cpu_dev_register_generic(), for_each_present_cpu() is used to register CPU devices which means the CPU devices are only registered for present CPUs and not all possible CPUs. With nosmp or maxcpus=0, only the boot CPU is present, lead to the failure: \| Failed to register cpuidle device for cpu1 Then rollback to cancel all CPUs' cpuidle registration. Change for_each_possible_cpu() to for_each_present_cpu() in the above cpuidle drivers to ensure it only registers cpuidle devices for CPUs that are actually present. Fixes: b0c69e1214bc ("drivers: base: Use present CPUs in GENERIC_CPU_DEVICES") Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Yuanjie Yang <quic_yuanjiey@quicinc.com> Signed-off-by: Jacky Bai <ping.bai@nxp.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20250307145547.2784821-1-ping.bai@nxp.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	PM: clk: Remove unused pm_clk_remove()	Dr. David Alan Gilbert
	pm_clk_remove() is currently unused. It hasn't been used since at least 2011 when it was renamed from pm_runtime_clk_remove() by commit 3d5c30367cbc ("PM: Rename clock management functions") Remove it. Note that the __pm_clk_remove() is still used and is left in. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250307212347.68785-1-linux@treblig.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	thermal: core: Delay exposing sysfs interface	Lucas De Marchi
	There's a race between initializing the governor and userspace accessing the sysfs interface. From time to time the Intel graphics CI shows this signature: <1>[] #PF: error_code(0x0000) - not-present page <6>[] PGD 0 P4D 0 <4>[] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI <4>[] CPU: 3 UID: 0 PID: 562 Comm: thermald Not tainted 6.14.0-rc4-CI_DRM_16208-g7e37396f86d8+ #1 <4>[] Hardware name: Intel Corporation Twin Lake Client Platform/AlderLake-N LP5 RVP, BIOS TWLNFWI1.R00.5222.A01.2405290634 05/29/2024 <4>[] RIP: 0010:policy_show+0x1a/0x40 thermald tries to read the policy file between the sysfs files being created and the governor set by thermal_set_governor(), which causes the NULL pointer dereference. Similarly to the hwmon interface, delay exposing the sysfs files to when the governor is already set. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13655 Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20250307-thermal-sysfs-race-v1-1-8a3d4d4ac9c4@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR()	Kees Cook
	It does seem that using "volatile" isn't going to be sane compared to using OPTIMIZER_HIDE_VAR() going forward. Some strange interactions[1] with the sanitizers have been observed in the self-test code, so replace the logic. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2075 [1] Link: https://lore.kernel.org/r/20250312000439.work.112-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-12	kunit/fortify: Expand testing of __compiletime_strlen()	Kees Cook
	It seems that Clang thinks __builtin_constant_p() of undefined variables should return true[1]. This is being fixed separately[2], but in the meantime, expand the fortify tests to help track this kind of thing down faster in the future. Link: https://github.com/ClangBuiltLinux/linux/issues/2073 [1] Link: https://github.com/llvm/llvm-project/pull/130713 [2] Link: https://lore.kernel.org/r/20250312000349.work.786-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-12	compiler_types: Introduce __nonstring_array	Kees Cook
	GCC has expanded support of the "nonstring" attribute so that it can be applied to arrays of character arrays[1], which is needed to identify correct static initialization of those kinds of objects. Since this was not supported prior to GCC 15, we need to distinguish the usage of Linux's existing __nonstring macro for the attribute for non-multi-dimensional char arrays. Until GCC 15 is the minimum version, use __nonstring_array to mark arrays of non-string character arrays. (Regular non-string character arrays can continue to use __nonstring.) Once GCC 15 is the minimum compiler version we can replace all uses of __nonstring_array with just __nonstring and remove this macro. This allows for changes like this: -static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst = { +static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst = { ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, Which will silence the coming -Wunterminated-string-initialization warnings in GCC 15: In file included from ../include/acpi/actbl.h:371, from ../include/acpi/acpi.h:26, from ../include/linux/acpi.h:26, from ../drivers/acpi/tables.c:19: ../include/acpi/actbl1.h:30:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization] 30 \| #define ACPI_SIG_BERT "BERT" /* Boot Error Record Table / \| ^~~~~~ ../drivers/acpi/tables.c:400:9: note: in expansion of macro 'ACPI_SIG_BERT' 400 \| ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, \| ^~~~~~~~~~~~~ ../include/acpi/actbl1.h:31:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization] 31 \| #define ACPI_SIG_BGRT "BGRT" / Boot Graphics Resource Table */ \| ^~~~~~ Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20250310214244.work.194-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-12	PM: sleep: core: Fix indentation in dpm_wait_for_children()	Geert Uytterhoeven
	The body of dpm_wait_for_children() is indented by 7 spaces instead of a single TAB. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/9c8ff2b103c3ba7b0d27bdc8248b05e3b1dc9551.1741776430.git.geert+renesas@glider.be Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	PM: s2idle: Extend comment in s2idle_enter()	Ulf Hansson
	The s2idle_lock must be held while checking for a pending wakeup and while moving into S2IDLE_STATE_ENTER, to make sure a wakeup doesn't get lost. Let's extend the comment in the code to make this clear. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20250311160827.1129643-3-ulf.hansson@linaro.org [ rjw: Rewrote the new comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	PM: s2idle: Drop redundant locks when entering s2idle	Ulf Hansson
	The calls to cpus_read_lock\|unlock() protects us from getting CPUS hotplugged, while entering suspend-to-idle. However, when s2idle_enter() is called we should be far beyond the point when CPUs may be hotplugged. Let's therefore simplify the code and drop the use of the lock. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20250311160827.1129643-2-ulf.hansson@linaro.org [ rjw: Rewrote the new comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12	block: fix adding folio to bio	Ming Lei
	>4GB folio is possible on some ARCHs, such as aarch64, 16GB hugepage is supported, then 'offset' of folio can't be held in 'unsigned int', cause warning in bio_add_folio_nofail() and IO failure. Fix it by adjusting 'page' & trimming 'offset' so that `->bi_offset` won't be overflow, and folio can be added to bio successfully. Fixes: ed9832bc08db ("block: introduce folio awareness and add a bigger size from folio") Cc: Kundan Kumar <kundan.kumar@samsung.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20250312145136.2891229-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>