linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-09-11	io_uring/cmd: expose iowq to cmds	Pavel Begunkov
	When an io_uring request needs blocking context we offload it to the io_uring's thread pool called io-wq. We can get there off ->uring_cmd by returning -EAGAIN, but there is no straightforward way of doing that from an asynchronous callback. Add a helper that would transfer a command to a blocking context. Note, we do an extra hop via task_work before io_queue_iowq(), that's a limitation of io_uring infra we have that can likely be lifted later if that would ever become a problem. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f735f807d7c8ba50c9452c69dfe5d3e9e535037b.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11	Merge branch 'for-6.12/io_uring' into for-6.12/io_uring-discard	Jens Axboe
	* for-6.12/io_uring: (31 commits) io_uring/io-wq: inherit cpuset of cgroup in io worker io_uring/io-wq: do not allow pinning outside of cpuset io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common() io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN io_uring/sqpoll: do not allow pinning outside of cpuset io_uring/eventfd: move refs to refcount_t io_uring: remove unused rsrc_put_fn io_uring: add new line after variable declaration io_uring: add GCOV_PROFILE_URING Kconfig option io_uring/kbuf: add support for incremental buffer consumption io_uring/kbuf: pass in 'len' argument for buffer commit Revert "io_uring: Require zeroed sqe->len on provided-buffers send" io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h io_uring/kbuf: add io_kbuf_commit() helper io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg io_uring: wire up min batch wake timeout io_uring: add support for batch wait timeout io_uring: implement our own schedule timeout handling io_uring: move schedule wait logic into helper io_uring: encapsulate extraneous wait flags into a separate struct ...
2024-09-11	Merge branch 'for-6.12/block' into for-6.12/io_uring-discard	Jens Axboe
	* for-6.12/block: (115 commits) block: unpin user pages belonging to a folio at once mm: release number of pages of a folio block: introduce folio awareness and add a bigger size from folio block: Added folio-ized version of bio_add_hw_page() block, bfq: factor out a helper to split bfqq in bfq_init_rq() block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq() block, bfq: remove local variable 'split' in bfq_init_rq() block, bfq: remove bfq_log_bfqg() block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator() block, bfq: fix procress reference leakage for bfqq in merge chain block, bfq: fix uaf for accessing waker_bfqq after splitting blk-throttle: support prioritized processing of metadata blk-throttle: remove last_low_overflow_time drbd: Add NULL check for net_conf to prevent dereference in state validation blk-mq: add missing unplug trace event mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init() md: Add new_level sysfs interface zram: Shrink zram_table_entry::flags. zram: Remove ZRAM_LOCK zram: Replace bit spinlocks with a spinlock_t. ...
2024-09-11	Merge branch 'pm-cpufreq'	Rafael J. Wysocki
	Merge cpufreq updates for 6.12-rc1: - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef). - Add support for Granite Rapids and Sierra Forest in OOB mode to the intel_pstate cpufreq driver (Srinivas Pandruvada). - Add basic support for CPU capacity scaling on x86 and make the intel_pstate driver set asymmetric CPU capacity on hybrid systems without SMT (Rafael Wysocki). - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq driver (Jeff Johnson). - Several OF related cleanups in cpufreq drivers (Rob Herring). - Enable COMPILE_TEST for ARM drivers (Rob Herrring). - Introduce quirks for syscon failures and use socinfo to get revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon). - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay Ugwekar). - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers (Danila Tikhonov, Huacai Chen, and Liu Jing). - Make amd-pstate validate return of any attempt to update EPP limits, which fixes the masking hardware problems (Mario Limonciello). - Move the calculation of the AMD boost numerator outside of amd-pstate, correcting acpi-cpufreq on systems with preferred cores (Mario Limonciello). - Harden preferred core detection in amd-pstate to avoid potential false positives (Mario Limonciello). - Add extra unit test coverage for mode state machine (Mario Limonciello). - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu). * pm-cpufreq: (35 commits) cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue cpufreq/amd-pstate-ut: Add test case for mode switches cpufreq/amd-pstate: Export symbols for changing modes amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking` cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore` cpufreq: amd-pstate: Optimize amd_pstate_update_limits() cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator() x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator() x86/amd: Move amd_get_highest_perf() out of amd-pstate ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn ACPI: CPPC: Drop check for non zero perf ratio x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator() ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit() cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family cpufreq: Fix the cacography in powernv-cpufreq.c cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request() cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value ...
2024-09-11	Merge tag 'amd-pstate-v6.12-2024-09-11' of ↵	Rafael J. Wysocki
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge the second round of amd-pstate changes for 6.12 from Mario Limonciello: "* Move the calculation of the AMD boost numerator outside of amd-pstate, correcting acpi-cpufreq on systems with preferred cores * Harden preferred core detection to avoid potential false positives * Add extra unit test coverage for mode state machine" * tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue cpufreq/amd-pstate-ut: Add test case for mode switches cpufreq/amd-pstate: Export symbols for changing modes amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking` cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore` cpufreq: amd-pstate: Optimize amd_pstate_update_limits() cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator() x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator() x86/amd: Move amd_get_highest_perf() out of amd-pstate ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn ACPI: CPPC: Drop check for non zero perf ratio x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator() ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
2024-09-11	tcp: Use skb__nullable in trace_tcp_send_reset	Philo Lu
	Replace skb with skb__nullable as the argument name. The suffix tells bpf verifier through btf that the arg could be NULL and should be checked in tp_btf prog. For now, this is the only nullable argument in tcp tracepoints. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-4-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11	x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()	Mario Limonciello
	AMD systems that support preferred cores will use "166" as their numerator for max frequency calculations instead of "255". Add a function for detecting preferred cores by looking at the highest perf value on all cores. If preferred cores are enabled return 166 and if disabled the value in the highest perf register. As the function will be called multiple times, cache the values for the boost numerator and if preferred cores will be enabled in global variables. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11	x86/amd: Move amd_get_highest_perf() out of amd-pstate	Mario Limonciello
	amd_pstate_get_highest_perf() is a helper used to get the highest perf value on AMD systems. It's used in amd-pstate as part of preferred core handling, but applicable for acpi-cpufreq as well. Move it out to cppc handling code as amd_get_highest_perf(). Reviewed-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11	x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()	Mario Limonciello
	The function name is ambiguous because it returns an intermediate value for calculating maximum frequency rather than the CPPC 'Highest Perf' register. Rename the function to clarify its use and allow the function to return errors. Adjust the consumer in acpi-cpufreq to catch errors. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11	ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB	Mario Limonciello
	Checkpath emits the following warning: ``` WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP ``` Adjust the code accordingly. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11	ALSA: pcm: Fix breakage of PCM rates used for topology	Takashi Iwai
	It turned out that the topology ABI takes the standard PCM rate bits as is, and it means that the recent change of the PCM rate bits would lead to the inconsistent rate values used for topology. This patch reverts the original PCM rate bit definitions while adding the new rates to the extended bits instead. This needed the change of snd_pcm_known_rates, too. And this also required to fix the handling in snd_pcm_hw_limit_rates() that blindly assumed that the list is sorted while it became unsorted now. Fixes: 090624b7dc83 ("ALSA: pcm: add more sample rate definitions") Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Closes: https://lore.kernel.org/1ab3efaa-863c-4dd0-8f81-b50fd9775fad@linux.intel.com Reviewed-by: Jaroslav Kysela <perex@perex.cz> Tested-by: Jerome Brunet <jbrunet@baylibre.com> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240911135756.24434-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-09-11	tty: serial: samsung: Fix serial rx on Apple A7-A9	Nick Chan
	Apple's older A7-A9 SoCs seems to use bit 3 in UTRSTAT as RXTO, which is enabled by bit 11 in UCON. Access these bits in addition to the original RXTO and RXTO enable bits, to allow serial rx to function on A7-A9 SoCs. This change does not appear to affect the A10 SoC and up. Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Nick Chan <towinchenmi@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20240911050741.14477-4-towinchenmi@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-11	tty: serial: samsung: Use bit manipulation macros for APPLE_S5L_*	Nick Chan
	New entries using BIT() will be added soon, so change the existing ones to use bit manipulation macros including BIT() and GENMASK() for consistency. Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Nick Chan <towinchenmi@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20240911050741.14477-2-towinchenmi@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-11	soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers	Douglas Anderson
	For UART devices the M_GP_LENGTH is the TX word count. For other devices this is the transaction word count. For UART devices the S_GP_LENGTH is the RX word count. The IRQ_EN set/clear registers allow you to set or clear bits in the IRQ_EN register without needing a read-modify-write. Acked-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20240610152420.v4.1.Ife7ced506aef1be3158712aa3ff34a006b973559@changeid Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240906131336.23625-4-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-11	mm: release number of pages of a folio	Kundan Kumar
	Add a new function unpin_user_folio() to put the refs of a folio by npages count. The check for BIO_PAGE_PINNED flag is removed as it is already checked in bio_release_pages(). Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240911064935.5630-4-kundan.kumar@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11	Merge tag 'usb-serial-6.12-rc1' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next Johan writes: USB-serial updates for 6.12-rc1 Here are the USB-serial updates for 6.12-rc1, including: - fix kobil_sct initial terminal settings - set driver owner when registering drivers All have been in linux-next with no reported issues. * tag 'usb-serial-6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: kobil_sct: restore initial terminal settings USB: serial: drop driver owner initialization USB: serial: set driver owner when registering drivers
2024-09-11	platform/x86: intel_scu_wdt: Move intel_scu_wdt.h to x86 subfolder	Andy Shevchenko
	This is a platform/x86 library that can only be used on x86 devices. so it makes sense that it lives under the platform_data/x86/ directory instead. No functional changes intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240909124952.1152017-4-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-09-11	platform/x86: intel_scu_ipc: Move intel_scu_ipc.h out of arch/x86/include/asm	Mika Westerberg
	This is a platform/x86 library that is mostly being used by other drivers not directly under arch/x86 anyway (with the exception of the Intel MID setup code) so it makes sense that it lives under the platform_data/x86/ directory instead. No functional changes intended. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240909124952.1152017-3-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-09-11	RDMA/mlx5: Add new ODP memory scheme eqe format	Michael Guralnik
	Add new fields to support the new memory scheme page fault and extend the token field to u64 as in the new scheme the token is 48 bit. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-4-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11	net/mlx5: Expose HW bits for Memory scheme ODP	Michael Guralnik
	Expose IFC bits to support the new memory scheme on demand paging. Change the macro reading odp capabilities to be able to read from the new IFC layout and align the code in upper layers to be compiled. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-3-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11	net/mlx5: Expand mkey page size to support 6 bits	Michael Guralnik
	Protect the usage of the 6th bit with the relevant capability to ensure we are using the new page sizes with FW that supports the bit extension. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11	net: phylink: Add phylink_set_fixed_link() to configure fixed link state in ↵	Russell King
	phylink The function allows for the configuration of a fixed link state for a given phylink instance. This addition is particularly useful for network devices that operate with a fixed link configuration, where the link parameters do not change dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up the fixed link state during initialization or configuration changes. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11	sched/deadline: Clarify nanoseconds in uapi	Christian Loehle
	Specify the time values of the deadline parameters of deadline, runtime, and period as being in nanoseconds explicitly as they always have been. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20240813144348.1180344-3-christian.loehle@arm.com
2024-09-11	firmware: imx: remove duplicate scmi_imx_misc_ctrl_get()	Arnd Bergmann
	These two functions have a stub definition when CONFIG_IMX_SCMI_MISC_EXT is not set, which conflict with the global definition: In file included from drivers/firmware/imx/sm-misc.c:6: include/linux/firmware/imx/sm.h:30:1: error: expected identifier or '(' before '{' token 30 \| { \| ^ drivers/firmware/imx/sm-misc.c:26:5: error: redefinition of 'scmi_imx_misc_ctrl_get' 26 \| int scmi_imx_misc_ctrl_get(u32 id, u32 num, u32 val) \| ^~~~~~~~~~~~~~~~~~~~~~ include/linux/firmware/imx/sm.h:24:19: note: previous definition of 'scmi_imx_misc_ctrl_get' with type 'int(u32, u32 , u32 )' {aka 'int(unsigned int, unsigned int , unsigned int )'} 24 \| static inline int scmi_imx_misc_ctrl_get(u32 id, u32 num, u32 val) \| ^~~~~~~~~~~~~~~~~~~~~~ There is no real need for the #ifdef, and removing this avoids the build failure. Fixes: 0b4f8a68b292 ("firmware: imx: Add i.MX95 MISC driver") Link: https://lore.kernel.org/r/20240909203023.1275232-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-11	Merge v6.11-rc7 into drm-next	Simona Vetter
	Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") in drm-misc, so start the backmerge cascade. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-11	f2fs: get rid of online repaire on corrupted directory	Chao Yu
	syzbot reports a f2fs bug as below: kernel BUG at fs/f2fs/inode.c:896! RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896 Call Trace: evict+0x532/0x950 fs/inode.c:704 dispose_list fs/inode.c:747 [inline] evict_inodes+0x5f9/0x690 fs/inode.c:797 generic_shutdown_super+0x9d/0x2d0 fs/super.c:627 kill_block_super+0x44/0x90 fs/super.c:1696 kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898 deactivate_locked_super+0xc4/0x130 fs/super.c:473 cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373 task_work_run+0x24f/0x310 kernel/task_work.c:228 ptrace_notify+0x2d2/0x380 kernel/signal.c:2402 ptrace_report_syscall include/linux/ptrace.h:415 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline] syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline] syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896 Online repaire on corrupted directory in f2fs_lookup() can generate dirty data/meta while racing w/ readonly remount, it may leave dirty inode after filesystem becomes readonly, however, checkpoint() will skips flushing dirty inode in a state of readonly mode, result in above panic. Let's get rid of online repaire in f2fs_lookup(), and leave the work to fsck.f2fs. Fixes: 510022a85839 ("f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries") Reported-by: syzbot+ebea2790904673d7c618@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000a7b20f061ff2d56a@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-10	Merge tag 'mlx5-updates-2024-09-02' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2024-08-29 HW-Managed Flow Steering in mlx5 driver Yevgeny Kliteynik says: ======================= 1. Overview ----------- ConnectX devices support packet matching, modification, and redirection. This functionality is referred as Flow Steering. To configure a steering rule, the rule is written to the device-owned memory. This memory is accessed and cached by the device when processing a packet. The first implementation of Flow Steering was done in FW, and it is referred in the mlx5 driver as Device-Managed Flow Steering (DMFS). Later we introduced SW-managed Flow Steering (SWS or SMFS), where the driver is writing directly to the device's configuration memory (ICM) through RC QP using RDMA operations (RDMA-read and RDAM-write), thus achieving higher rates of rule insertion/deletion. Now we introduce a new flow steering implementation: HW-Managed Flow Steering (HWS or HMFS). In this new approach, the driver is configuring steering rules directly to the HW using the WQs with a special new type of WQE. This way we can reach higher rule insertion/deletion rate with much lower CPU utilization compared to SWS. The key benefits of HWS as opposed to SWS: + HW manages the steering decision tree - HW calculates CRC for each entry - HW handles tree hash collisions - HW & FW manage objects refcount + HW keeps cache coherency: - HW provides tree access locking and synchronization - HW provides notification on completion + Insertion rate isn’t affected by background traffic - Dedicated HW components that handle insertion 2. Performance -------------- Measuring Connection Tracking with simple IPv4 flows w/o NAT, we are able to get ~5 times more flows offloaded per second using HWS. 3. Configuration ---------------- The enablement of HWS mode in eswitch manager is done using the same devlink param that is already used for switching between FW-managed steering and SW-managed steering modes: # devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs 4. Upstream Submission ---------------------- HWS support consists of 3 main components: + Steering: - The lower layer that exposes HWS API to upper layers and implements all the management of flow steering building blocks + FS-Core - Implementation of fs_hws layer to enable fs_core to use HWS instead of FW or SW steering - Create HW steering action pools to utilize the ability of HWS to share steering actions among different rules - Add support for configuring HWS mode through devlink command, similar to configuring SWS mode + Connection Tracking - Implementation of CT support for HW steering - Hooks up the CT ops for the new steering mode and uses the HWS API to implement connection tracking. Because of the large number of patches, we need to perform the submission in several separate patch series. This series is the first submission that lays the ground work for the next submissions, where an actual user of HWS will be added. 5. Patches in this series ------------------------- This patch series contains implementation of the first bullet from above. ======================= * tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: HWS, added API and enabled HWS support net/mlx5: HWS, added send engine and context handling net/mlx5: HWS, added debug dump and internal headers net/mlx5: HWS, added backward-compatible API handling net/mlx5: HWS, added memory management handling net/mlx5: HWS, added vport handling net/mlx5: HWS, added modify header pattern and args handling net/mlx5: HWS, added FW commands handling net/mlx5: HWS, added matchers functionality net/mlx5: HWS, added definers handling net/mlx5: HWS, added rules handling net/mlx5: HWS, added tables handling net/mlx5: HWS, added actions handling net/mlx5: Added missing definitions in preparation for HW Steering net/mlx5: Added missing mlx5_ifc definition for HW Steering ==================== Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10	Merge tag 'ipsec-next-2024-09-10' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10	PCI: Rename CRS Completion Status to RRS	Bjorn Helgaas
	PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status" Completion Status from "CRS" to "RRS" and uses the terminology of "Configuration RRS Software Visibility" instead of "CRS Software Visibility". Align the Linux usage with the r6.0 spec language. No functional change intended. It's confusing to make this change, but I think "RRS" is a better abbreviation because it was easy to interpret "CRS" as "Completion Retry Status", which really didn't make any sense. Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-09-10	PCI: Wait for device readiness with Configuration RRS	Bjorn Helgaas
	After a device reset, delays are required before the device can successfully complete config accesses. PCIe r6.0, sec 6.6, specifies some delays required before software can perform config accesses. Devices that require more time after those delays may respond to config accesses with Configuration Request Retry Status (RRS) completions. Callers of pci_dev_wait() are responsible for delays until the device can respond to config accesses. pci_dev_wait() waits any additional time until the device can successfully complete config accesses. Reading config space of devices that are not present or not ready typically returns ~0 (PCI_ERROR_RESPONSE). Previously we polled the Command register until we got a value other than ~0. This is sometimes a problem because Root Complex handling of RRS completions may include several retries and implementation-specific behavior that is invisible to software (see sec 2.3.2), so the exponential backoff in pci_dev_wait() may not work as intended. Linux enables Configuration RRS Software Visibility on all Root Ports that support it. If it is enabled, read the Vendor ID instead of the Command register. RRS completions cause immediate return of the 0x0001 reserved Vendor ID value, so the pci_dev_wait() backoff works correctly. When a read of Vendor ID eventually completes successfully by returning a non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be initialized and ready to respond to config requests. For conventional PCI devices or devices below Root Ports that don't support Configuration RRS Software Visibility, poll the Command register as before. This was developed independently, but is very similar to Stanislav Spassov's previous work at https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Duc Dang <ducdang@google.com>
2024-09-10	net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag	Jason Xing
	introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter out rx software timestamp report, especially after a process turns on netstamp_needed_key which can time stamp every incoming skb. Previously, we found out if an application starts first which turns on netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE could also get rx timestamp. Now we handle this case by introducing this new flag without breaking users. Quoting Willem to explain why we need the flag: "why a process would want to request software timestamp reporting, but not receive software timestamp generation. The only use I see is when the application does request SOF_TIMESTAMPING_SOFTWARE \| SOF_TIMESTAMPING_TX_SOFTWARE." Similarly, this new flag could also be used for hardware case where we can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive hardware receive timestamp. Another thing about errqueue in this patch I have a few words to say: In this case, we need to handle the egress path carefully, or else reporting the tx timestamp will fail. Egress path and ingress path will finally call sock_recv_timestamp(). We have to distinguish them. Errqueue is a good indicator to reflect the flow direction. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10	net: stmmac: move stmmac_fpe_cfg to stmmac_priv data	Furong Xu
	By moving the fpe_cfg field to the stmmac_priv data, stmmac_fpe_cfg becomes platform-data eventually, instead of a run-time config. Suggested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://patch.msgid.link/d9b3d7ecb308c5e39778a4c8ae9df288a2754379.1725631883.git.0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10	drm: new helper: drm_gem_prime_handle_to_dmabuf()	Al Viro
	Once something had been put into descriptor table, the only thing you can do with it is returning descriptor to userland - you can't withdraw it on subsequent failure exit, etc. You certainly can't count upon it staying in the same slot of descriptor table - another thread could've played with close(2)/dup2(2)/whatnot. drm_gem_prime_handle_to_fd() creates a dmabuf, allocates a descriptor and attaches dmabuf's file to it (the last two steps are done in dma_buf_fd()). That's nice when all you are going to do is passing a descriptor to userland. If you just need to work with the resulting object or have something else to be done that might fail, drm_gem_prime_handle_to_fd() is racy. The problem is analogous to one with anon_inode_getfd(), and solution is similar to what anon_inode_getfile() provides. Add drm_gem_prime_handle_to_dmabuf() - the "set dmabuf up" parts of drm_gem_prime_handle_to_fd() without the descriptor-related ones. Instead of inserting into descriptor table and returning the file descriptor it just returns the struct file. drm_gem_prime_handle_to_fd() becomes a wrapper for it. Other users will be introduced in the next commit. Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-10	Bluetooth: hci_core: Fix sending MGMT_EV_CONNECT_FAILED	Luiz Augusto von Dentz
	If HCI_CONN_MGMT_CONNECTED has been set then the event shall be HCI_CONN_MGMT_DISCONNECTED. Fixes: b644ba336997 ("Bluetooth: Update device_connected and device_found events to latest API") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10	Bluetooth: L2CAP: Remove unused declarations	Yue Haibing
	Commit e7b02296fb40 ("Bluetooth: Remove BT_HS") removed the implementations but leave declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10	Bluetooth: Add a helper function to extract iso header	Kiran K
	Add a helper function hci_iso_hdr() to extract iso header from skb. Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10	btrfs: constify more pointer parameters	David Sterba
	Continue adding const to parameters. This is for clarity and minor addition to safety. There are some minor effects, in the assembly code and .ko measured on release config. Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: rename __extent_writepage() and drop double underscores	David Sterba
	The function does not follow the pattern where the underscores would be justified, so rename it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	btrfs: update the writepage tracepoint to take a folio	Josef Bacik
	Willy is wanting to get rid of page->index, convert the writepage tracepoint to take a folio so we can do folio->index instead of page->index. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10	platform/x86: asus-wmi: Disable OOBE experience on Zenbook S 16	Bas Nieuwenhuizen
	The OOBE experience fades the keyboard backlight in & out continuously, and make the backlight uncontrollable using its device. Workaround taken from https://wiki.archlinux.org/index.php?title=ASUS_Zenbook_UM5606&diff=next&oldid=815547 Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20240909223503.1445779-1-bas@basnieuwenhuizen.nl Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-09-10	Merge branch 'linus' into timers/core	Thomas Gleixner
	To update with the latest fixes.
2024-09-10	spi: remove spi_controller_is_slave() and spi_slave_abort()	Yang Yingliang
	spi_controller_is_slave() and spi_slave_abort() are all replaced, so they can be removed. No functional changed. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://patch.msgid.link/20240910022618.1397-8-yangyingliang@huaweicloud.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-10	perf/x86/intel/cstate: Clean up cpumask and hotplug	Kan Liang
	There are three cstate PMUs with different scopes, core, die and module. The scopes are supported by the generic perf_event subsystem now. Set the scope for each PMU and remove all the cpumask and hotplug codes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240802151643.1691631-4-kan.liang@linux.intel.com
2024-09-10	perf: Add PERF_EV_CAP_READ_SCOPE	Kan Liang
	Usually, an event can be read from any CPU of the scope. It doesn't need to be read from the advertised CPU. Add a new event cap, PERF_EV_CAP_READ_SCOPE. An event of a PMU with scope can be read from any active CPU in the scope. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240802151643.1691631-3-kan.liang@linux.intel.com
2024-09-10	perf: Generic hotplug support for a PMU with a scope	Kan Liang
	The perf subsystem assumes that the counters of a PMU are per-CPU. So the user space tool reads a counter from each CPU in the system wide mode. However, many PMUs don't have a per-CPU counter. The counter is effective for a scope, e.g., a die or a socket. To address this, a cpumask is exposed by the kernel driver to restrict to one CPU to stand for a specific scope. In case the given CPU is removed, the hotplug support has to be implemented for each such driver. The codes to support the cpumask and hotplug are very similar. - Expose a cpumask into sysfs - Pickup another CPU in the same scope if the given CPU is removed. - Invoke the perf_pmu_migrate_context() to migrate to a new CPU. - In event init, always set the CPU in the cpumask to event->cpu Similar duplicated codes are implemented for each such PMU driver. It would be good to introduce a generic infrastructure to avoid such duplication. 5 popular scopes are implemented here, core, die, cluster, pkg, and the system-wide. The scope can be set when a PMU is registered. If so, a "cpumask" is automatically exposed for the PMU. The "cpumask" is from the perf_online_<scope>_mask, which is to track the active CPU for each scope. They are set when the first CPU of the scope is online via the generic perf hotplug support. When a corresponding CPU is removed, the perf_online_<scope>_mask is updated accordingly and the PMU will be moved to a new CPU from the same scope if possible. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240802151643.1691631-2-kan.liang@linux.intel.com
2024-09-10	slab: make __kmem_cache_create() static inline	Christian Brauner
	Make __kmem_cache_create() a static inline function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-09-10	slab: make kmem_cache_create_usercopy() static inline	Christian Brauner
	Make kmem_cache_create_usercopy() a static inline function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-09-10	slab: remove kmem_cache_create_rcu()	Christian Brauner
	Now that we have ported all users of kmem_cache_create_rcu() to struct kmem_cache_args the function is unused and can be removed. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-09-10	slab: create kmem_cache_create() compatibility layer	Christian Brauner
	Use _Generic() to create a compatibility layer that type switches on the third argument to either call __kmem_cache_create() or __kmem_cache_create_args(). If NULL is passed for the struct kmem_cache_args argument use default args making porting for callers that don't care about additional arguments easy. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-09-10	slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args	Christian Brauner
	Make KMEM_CACHE_USERCOPY() use struct kmem_cache_args. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>