linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-02-21	media: tuners: Constify struct tunertype, tuner_range and tuner_params	Christophe JAILLET
	'struct tunertype', 'struct tuner_range' and 'struct tuner_params' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig: Before: ====== 2877 8554 0 11431 2ca7 drivers/media/tuners/tuner-types.o After: ===== text data bss dec hex filename 11421 48 0 11469 2ccd drivers/media/tuners/tuner-types.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21	media: uapi: rkisp1-config: Fix typo in extensible params example	Niklas Söderlund
	The define used for the version in the example diagram does not match what is defined in enum rksip1_ext_param_buffer_version, nor the description above it. Correct the typo to make it clear which define to use. Fixes: e9d05e9d5db1 ("media: uapi: rkisp1-config: Add extensible params format") Cc: stable@vger.kernel.org Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21	sysv: Remove the filesystem	Jan Kara
	Since 2002 (change "Replace BKL for chain locking with sysvfs-private rwlock") the sysv filesystem was doing IO under a rwlock in its get_block() function (yes, a non-sleepable lock hold over a function used to read inode metadata for all reads and writes). Nobody noticed until syzbot in 2023 [1]. This shows nobody is using the filesystem. Just drop it. [1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	vfs: inline new_inode_pseudo() and de-staticize alloc_inode()	Mateusz Guzik
	The former is a no-op wrapper with the same argument. I left it in place to not lose the information who needs it -- one day "pseudo" inodes may start differing from what alloc_inode() returns. In the meantime no point taking a detour. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250212180459.1022983-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	vfs: inline getname()	Mateusz Guzik
	It is merely a trivial wrapper around getname_flags which adds a zeroed argument, no point paying for an extra call. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250206000105.432528-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	ioctl: Fix return type of several functions from long to int	Yuichiro Tsuji
	Fix the return type of several functions from long to int to match its actu al behavior. These functions only return int values. This change improves type consistency across the filesystem code and aligns the function signatu re with its existing implementation and usage. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com> Link: https://lore.kernel.org/r/20250121070844.4413-3-yuichtsu@amazon.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	open: Fix return type of several functions from long to int	Yuichiro Tsuji
	Fix the return type of several functions from long to int to match its actu al behavior. These functions only return int values. This change improves type consistency across the filesystem code and aligns the function signatu re with its existing implementation and usage. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com> Link: https://lore.kernel.org/r/20250121070844.4413-2-yuichtsu@amazon.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	fs: avoid mmap sem relocks when coredumping with many missing pages	Mateusz Guzik
	Dumping processes with large allocated and mostly not-faulted areas is very slow. Borrowing a test case from Tavian Barnes: int main(void) { char *mem = mmap(NULL, 1ULL << 40, PROT_READ \| PROT_WRITE, MAP_ANONYMOUS \| MAP_NORESERVE \| MAP_PRIVATE, -1, 0); printf("%p %m\n", mem); if (mem != MAP_FAILED) { mem[0] = 1; } abort(); } That's 1TB of almost completely not-populated area. On my test box it takes 13-14 seconds to dump. The profile shows: - 99.89% 0.00% a.out entry_SYSCALL_64_after_hwframe do_syscall_64 syscall_exit_to_user_mode arch_do_signal_or_restart - get_signal - 99.89% do_coredump - 99.88% elf_core_dump - dump_user_range - 98.12% get_dump_page - 64.19% __get_user_pages - 40.92% gup_vma_lookup - find_vma - mt_find 4.21% __rcu_read_lock 1.33% __rcu_read_unlock - 3.14% check_vma_flags 0.68% vma_is_secretmem 0.61% __cond_resched 0.60% vma_pgtable_walk_end 0.59% vma_pgtable_walk_begin 0.58% no_page_table - 15.13% down_read_killable 0.69% __cond_resched 13.84% up_read 0.58% __cond_resched Almost 29% of the time is spent relocking the mmap semaphore between calls to get_dump_page() which find nothing. Whacking that results in times of 10 seconds (down from 13-14). While here make the thing killable. The real problem is the page-sized iteration and the real fix would patch it up instead. It is left as an exercise for the mm-familiar reader. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250119103205.2172432-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	vfs: use the new debug macros in inode_set_cached_link()	Mateusz Guzik
	Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250209185523.745956-4-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	vfs: add initial support for CONFIG_DEBUG_VFS	Mateusz Guzik
	Small collection of macros taken from mmdebug.h Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250209185523.745956-2-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21	vdso: Remove remnants of architecture-specific time storage	Thomas Weißschuh
	All users of the time releated parts of the vDSO are now using the generic storage implementation. Remove the therefore unnecessary compatibility accessor functions and symbols. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-18-13a4669dfc8c@linutronix.de
2025-02-21	vdso: Remove remnants of architecture-specific random state storage	Thomas Weißschuh
	All users of the random vDSO are using the generic storage implementation. Remove the now unnecessary compatibility accessor functions and symbols. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-17-13a4669dfc8c@linutronix.de
2025-02-21	vdso: Add generic architecture-specific data storage	Thomas Weißschuh
	Some architectures need to expose architecture-specific data to the vDSO. Enable the generic vDSO storage mechanism to both store and map this data. Some architectures require more than a single page, like LoongArch, so prepare for that usecase, too. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-7-13a4669dfc8c@linutronix.de
2025-02-21	vdso: Add generic random data storage	Thomas Weißschuh
	Extend the generic vDSO data storage with a page for the random state data. The random state data is stored in a dedicated page, as the existing storage page is only meant for time-related, time-namespace-aware data. This simplifies to access logic to not need to handle time namespaces anymore and also frees up more space in the time-related page. In case further generic vDSO data store is required it can be added to the random state page. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-6-13a4669dfc8c@linutronix.de
2025-02-21	vdso: Add generic time data storage	Thomas Weißschuh
	Historically each architecture defined their own way to store the vDSO data page. Add a generic mechanism to provide storage for that page. Furthermore this generic storage will be extended to also provide uniform storage for non-time-related data, like the random state or architecture-specific data. These will have their own pages and data structures, so rename 'vdso_data' into 'vdso_time_data' to make that split clear from the name. Also introduce a new consistent naming scheme for the symbols related to the vDSO, which makes it clear if the symbol is accessible from userspace or kernel space and the type of data behind the symbol. The generic fault handler contains an optimization to prefault the vvar page when the timens page is accessed. This was lifted from s390 and x86. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-5-13a4669dfc8c@linutronix.de
2025-02-21	vdso: Introduce vdso/align.h	Thomas Weißschuh
	The vDSO implementation can only include headers from the vdso/ namespace. To enable the usage of the ALIGN() macro from the vDSO, move linux/align.h to vdso/align.h wholly. As the only dependency linux/const.h is only a wrapper around vdso/const.h anyways adapt that dependency. Also provide a compatibility wrapper linux/align.h. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-3-13a4669dfc8c@linutronix.de
2025-02-21	gpio: regmap: Allow ngpio to be read from the property	Andy Shevchenko
	GPIOLIB supports the case when number of supported GPIOs can be read from the device property. Enable this for drivers that are using GPIO regmap layer. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Reviewed-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Link: https://lore.kernel.org/r/20250213195621.3133406-6-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-21	gpio: regmap: Group optional assignments together for better understanding	Andy Shevchenko
	Group ngpio_per_reg, reg_stride, and reg_mask_xlate assignments together with the respective conditional for better understanding what's going on in the code. While at it, mark ngpio_per_reg as (Optional) in the kernel-doc in accordance with what code actually does. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Reviewed-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Link: https://lore.kernel.org/r/20250213195621.3133406-4-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-21	xfrm: check for PMTU in tunnel mode for packet offload	Leon Romanovsky
	In tunnel mode, for the packet offload, there were no PMTU signaling to the upper level about need to fragment the packet. As a solution, call to already existing xfrm[4\|6]_tunnel_check_size() to perform that. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21	xfrm: simplify SA initialization routine	Leon Romanovsky
	SA replay mode is initialized differently for user-space and kernel-space users, but the call to xfrm_init_replay() existed in common path with boolean protection. That caused to situation where we have two different function orders. So let's rewrite the SA initialization flow to have same order for both in-kernel and user-space callers. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21	xfrm: delay initialization of offload path till its actually requested	Leon Romanovsky
	XFRM offload path is probed even if offload isn't needed at all. Let's make sure that x->type_offload pointer stays NULL for such path to reduce ambiguity. Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21	dt-bindings: xilinx: Deprecate header with firmware constants	Michal Simek
	Firmware contants do not fit the purpose of bindings because they are not independent IDs for abstractions. They are more or less just contants which better to wire via header with DT which is using it. That's why add deprecated message to dt binding header and also update existing dt bindings not to use macros from the header and replace them by it's value. Actually value is not relevant because it is only example. The similar changes have been done by commit 9d9292576810 ("dt-bindings: pinctrl: samsung: deprecate header with register constants"). Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/2a6f0229522327939e6893565e540b75f854a37b.1738600745.git.michal.simek@amd.com
2025-02-20	Merge patch series "Support Multi-frequency scale for UFS"	Martin K. Petersen
	Ziqi Chen <quic_ziqichen@quicinc.com> says: With OPP V2 enabled, devfreq can scale clocks amongst multiple frequency plans. However, the gear speed is only toggled between min and max during clock scaling. Enable multi-level gear scaling by mapping clock frequencies to gear speeds, so that when devfreq scales clock frequencies we can put the UFS link at the appropraite gear speeds accordingly. This series has been tested on below platforms - sm8550 mtp + UFS3.1 SM8650 MTP + UFS3.1 SM8750 MTP + UFS4.0 Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK Link: https://lore.kernel.org/r/20250213080008.2984807-1-quic_ziqichen@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20	scsi: ufs: core: Toggle Write Booster during clock scaling base on gear speed	Can Guo
	During clock scaling, Write Booster is toggled on or off based on whether the clock is scaled up or down. However, with OPP V2 powered multi-level gear scaling, the gear can be scaled amongst multiple gear speeds, e.g., it may scale down from G5 to G4, or from G4 to G2. To provide flexibilities, add a new field for clock scaling such that during clock scaling Write Booster can be enabled or disabled based on gear speeds but not based on scaling up or down. Signed-off-by: Can Guo <quic_cang@quicinc.com> Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-8-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20	scsi: ufs: core: Add a vop to map clock frequency to gear speed	Can Guo
	Add a vop to map UFS host controller clock frequencies to the corresponding maximum supported UFS high speed gear speeds. During clock scaling, we can map the target clock frequency, demanded by devfreq, to the maximum supported gear speed, so that devfreq can scale the gear to the highest gear speed supported at the target clock frequency, instead of just scaling up/down the gear between the min and max gear speeds. Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Can Guo <quic_cang@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-4-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20	scsi: ufs: core: Pass target_freq to clk_scale_notify() vop	Can Guo
	Instead of only two frequencies, if OPP V2 is used, the UFS devfreq clock scaling may scale the clock among multiple frequencies, so just passing up/down to vop clk_scale_notify() is not enough to cover the intermediate clock freqs between the min and max freqs. Hence pass the target_freq, which will be used in successive commits, to clk_scale_notify() to allow the vop to perform corresponding configurations with regard to the clock freqs. Signed-off-by: Can Guo <quic_cang@quicinc.com> Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-2-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20	net: phy: remove unused feature array declarations	Heiner Kallweit
	After 12d5151be010 ("net: phy: remove leftovers from switch to linkmode bitmaps") the following declarations are unused and can be removed too. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf bpf-6.14-rc4	Alexei Starovoitov
	Cross-merge bpf fixes after downstream PR (bpf-6.14-rc4). Minor conflict: kernel/bpf/btf.c Adjacent changes: kernel/bpf/arena.c kernel/bpf/btf.c kernel/bpf/syscall.c kernel/bpf/verifier.c mm/memory.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-21	Merge tag 'tags/ib-mfd-power-v6.15' into psy-next	Sebastian Reichel
	Immutable branch between MFD and Power due for the v6.15 merge window Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-02-20	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds
	Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20	xsk: Add launch time hardware offload support to XDP Tx metadata	Song Yoong Siang
	Extend the XDP Tx metadata framework so that user can requests launch time hardware offload, where the Ethernet device will schedule the packet for transmission at a pre-determined time called launch time. The value of launch time is communicated from user space to Ethernet driver via launch_time field of struct xsk_tx_metadata. Suggested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-20	bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback	Jason Xing
	This patch introduces a new callback in tcp_tx_timestamp() to correlate tcp_sendmsg timestamp with timestamps from other tx timestamping callbacks (e.g., SND/SW/ACK). Without this patch, BPF program wouldn't know which timestamps belong to which flow because of no socket lock protection. This new callback is inserted in tcp_tx_timestamp() to address this issue because tcp_tx_timestamp() still owns the same socket lock with tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes the timestamping related fields for the skb, especially tskey. The tskey is the bridge to do the correlation. For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and then stores the sendmsg timestamp at the bpf_sk_storage, correlating this timestamp with its tskey that are later used in other sending timestamping callbacks. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-11-kerneljasonxing@gmail.com
2025-02-20	bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback	Jason Xing
	Support the ACK case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_ACK. The BPF program can use it to get the same SCM_TSTAMP_ACK timestamp without modifying the user-space application. This patch extends txstamp_ack to two bits: 1 stands for SO_TIMESTAMPING mode, 2 bpf extension. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com
2025-02-20	bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback	Jason Xing
	Support hw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping. Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com
2025-02-20	bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback	Jason Xing
	Support sw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This callback will occur at the same timestamping point as the user space's software SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. Based on this patch, BPF program will get the software timestamp when the driver is ready to send the skb. In the sebsequent patch, the hardware timestamp will be supported. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com
2025-02-20	bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback	Jason Xing
	Support SCM_TSTAMP_SCHED case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_SCHED. The BPF program can use it to get the same SCM_TSTAMP_SCHED timestamp without modifying the user-space application. A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags, ensuring that the new BPF timestamping and the current user space's SO_TIMESTAMPING do not interfere with each other. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com
2025-02-20	bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback	Jason Xing
	The subsequent patch will implement BPF TX timestamping. It will call the sockops BPF program without holding the sock lock. This breaks the current assumption that all sock ops programs will hold the sock lock. The sock's fields of the uapi's bpf_sock_ops requires this assumption. To address this, a new "u8 is_locked_tcp_sock;" field is added. This patch sets it in the current sock_ops callbacks. The "is_fullsock" test is then replaced by the "is_locked_tcp_sock" test during sock_ops_convert_ctx_access(). The new TX timestamping callbacks added in the subsequent patch will not have this set. This will prevent unsafe access from the new timestamping callbacks. Potentially, we could allow read-only access. However, this would require identifying which callback is read-safe-only and also requires additional BPF instruction rewrites in the covert_ctx. Since the BPF program can always read everything from a socket (e.g., by using bpf_core_cast), this patch keeps it simple and disables all read and write access to any socket fields through the bpf_sock_ops UAPI from the new TX timestamping callback. Moreover, note that some of the fields in bpf_sock_ops are specific to tcp_sock, and sock_ops currently only supports tcp_sock. In the future, UDP timestamping will be added, which will also break this assumption. The same idea used in this patch will be reused. Considering that the current sock_ops only supports tcp_sock, the variable is named is_locked_"tcp"_sock. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
2025-02-20	bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping	Jason Xing
	This patch introduces a new bpf_skops_tx_timestamping() function that prepares the "struct bpf_sock_ops" ctx and then executes the sockops BPF program. The subsequent patch will utilize bpf_skops_tx_timestamping() at the existing TX timestamping kernel callbacks (__sk_tstamp_tx specifically) to call the sockops BPF program. Later, four callback points to report information to user space based on this patch will be introduced. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250220072940.99994-3-kerneljasonxing@gmail.com
2025-02-20	bpf: Add networking timestamping support to bpf_get/setsockopt()	Jason Xing
	The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are added to bpf_get/setsockopt. The later patches will implement the BPF networking timestamping. The BPF program will use bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to enable the BPF networking timestamping on a socket. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
2025-02-20	i3c: Remove the const qualifier from i2c_msg pointer in i2c_xfers API	Billy Tsai
	The change is necessary to enable the use of the `i2c_get_dma_safe_msg_buf()` API, which requires a non-const `struct i2c_msg *` to operate. The `i2c_get_dma_safe_msg_buf()` function ensures safe handling of I2C messages when using DMA, making it essential for scenarios where DMA transfers are involved. By removing the `const` qualifier, this patch allows drivers to prepare and manage DMA-safe buffers directly. Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Link: https://lore.kernel.org/r/20250204091702.4014466-1-billy_tsai@aspeedtech.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-02-20	net: Add options as a flexible array to struct ip_tunnel_info	Gal Pressman
	Remove the hidden assumption that options are allocated at the end of the struct, and teach the compiler about them using a flexible array. With this, we can revert the unsafe_memcpy() call we have in tun_dst_unclone() [1], and resolve the false field-spanning write warning caused by the memcpy() in ip_tunnel_info_opts_set(). The layout of struct ip_tunnel_info remains the same with this patch. Before this patch, there was an implicit padding at the end of the struct, options would be written at 'info + 1' which is after the padding. This will remain the same as this patch explicitly aligns 'options'. The alignment is needed as the options are later casted to different structs, and might result in unaligned memory access. Pahole output before this patch: struct ip_tunnel_info { struct ip_tunnel_key key; /* 0 64 / / XXX last struct has 1 byte of padding / / --- cacheline 1 boundary (64 bytes) --- / struct ip_tunnel_encap encap; / 64 8 / struct dst_cache dst_cache; / 72 16 / u8 options_len; / 88 1 / u8 mode; / 89 1 / / size: 96, cachelines: 2, members: 5 / / padding: 6 / / paddings: 1, sum paddings: 1 / / last cacheline: 32 bytes / }; Pahole output after this patch: struct ip_tunnel_info { struct ip_tunnel_key key; / 0 64 / / XXX last struct has 1 byte of padding / / --- cacheline 1 boundary (64 bytes) --- / struct ip_tunnel_encap encap; / 64 8 / struct dst_cache dst_cache; / 72 16 / u8 options_len; / 88 1 / u8 mode; / 89 1 / / XXX 6 bytes hole, try to pack / u8 options[] __attribute__((__aligned__(16))); / 96 0 / / size: 96, cachelines: 2, members: 6 / / sum members: 90, holes: 1, sum holes: 6 / / paddings: 1, sum paddings: 1 / / forced alignments: 1, forced holes: 1, sum forced holes: 6 / / last cacheline: 32 bytes */ } __attribute__((__aligned__(16))); [1] Commit 13cfd6a6d7ac ("net: Silence false field-spanning write warning in metadata_dst memcpy") Link: https://lore.kernel.org/all/53D1D353-B8F6-4ADC-8F29-8C48A7C9C6F1@kernel.org/ Suggested-by: Kees Cook <kees@kernel.org> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250219143256.370277-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	ip_tunnel: Use ip_tunnel_info() helper instead of 'info + 1'	Gal Pressman
	Tunnel options should not be accessed directly, use the ip_tunnel_info() accessor instead. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250219143256.370277-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.14-rc4). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20	Merge tag 'net-6.14-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Smaller than usual with no fixes from any subtree. Current release - regressions: - core: fix race of rtnl_net_lock(dev_net(dev)) Previous releases - regressions: - core: remove the single page frag cache for good - flow_dissector: fix handling of mixed port and port-range keys - sched: cls_api: fix error handling causing NULL dereference - tcp: - adjust rcvq_space after updating scaling ratio - drop secpath at the same time as we currently drop dst - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl(). Previous releases - always broken: - vsock: - fix variables initialization during resuming - for connectible sockets allow only connected - eth: - geneve: fix use-after-free in geneve_find_dev() - ibmvnic: don't reference skb after sending to VIOS" * tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: skb: introduce and use a single page frag cache" net: allow small head cache usage with large MAX_SKB_FRAGS values nfp: bpf: Add check for nfp_app_ctrl_msg_alloc() tcp: drop secpath at the same time as we currently drop dst net: axienet: Set mac_managed_pm arp: switch to dev_getbyhwaddr() in arp_req_set_public() net: Add non-RCU dev_getbyhwaddr() helper sctp: Fix undefined behavior in left shift operation selftests/bpf: Add a specific dst port matching flow_dissector: Fix port range key handling in BPF conversion selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range flow_dissector: Fix handling of mixed port and port-range keys geneve: Suppress list corruption splat in geneve_destroy_tunnels(). gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl(). dev: Use rtnl_net_dev_lock() in unregister_netdev(). net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net(). net: Add net_passive_inc() and net_passive_dec(). net: pse-pd: pd692x0: Fix power limit retrieval MAINTAINERS: trim the GVE entry gve: set xdp redirect target only when it is available ...
2025-02-20	leds: max77705: Add LEDs support	Dzmitry Sankouski
	This adds basic support for LEDs for the max77705 PMIC. Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-7-8b06685b6612@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-02-20	mfd: Add new driver for MAX77705 PMIC	Dzmitry Sankouski
	Add the core MFD driver for max77705 PMIC. Drivers for sub-devices will be added in subsequent patches. Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-5-8b06685b6612@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-02-20	power: supply: max77705: Add charger driver for Maxim 77705	Dzmitry Sankouski
	Add driver for Maxim 77705 switch-mode charger. It providing power supply class information to userspace. The driver is configured through DTS (battery and system related settings). Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-3-8b06685b6612@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-02-20	mfd: axp20x: AXP717: Add AXP717_TS_PIN_CFG to writeable regs	Chris Morgan
	Add AXP717_TS_PIN_CFG (register 0x50) to the table of writeable registers so that the temperature sensor can be configured by the battery driver. Signed-off-by: Chris Morgan <macromorgan@hotmail.com> Link: https://lore.kernel.org/r/20250204155835.161973-3-macroalpha82@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2025-02-20	io_uring/epoll: add support for IORING_OP_EPOLL_WAIT	Jens Axboe
	For existing epoll event loops that can't fully convert to io_uring, the used approach is usually to add the io_uring fd to the epoll instance and use epoll_wait() to wait on both "legacy" and io_uring events. While this work, it isn't optimal as: 1) epoll_wait() is pretty limited in what it can do. It does not support partial reaping of events, or waiting on a batch of events. 2) When an io_uring ring is added to an epoll instance, it activates the io_uring "I'm being polled" logic which slows things down. Rather than use this approach, with EPOLL_WAIT support added to io_uring, event loops can use the normal io_uring wait logic for everything, as long as an epoll wait request has been armed with io_uring. Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this is an async request. Waiting on io_uring events in general has various timeout parameters, and those are the ones that should be used when waiting on any kind of request. If events are immediately available for reaping, then This opcode will return those immediately. If none are available, then it will post an async completion when they become available. cqe->res will contain either an error code (< 0 value) for a malformed request, invalid epoll instance, etc. It will return a positive result indicating how many events were reaped. IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring cancelation infrastructure. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-20	Merge branch 'vfs-6.15.eventpoll' of ↵	Jens Axboe
	https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into for-6.15/io_uring-epoll-wait Merge epoll changes from the VFS tree, which the io_uring changes depend on. * 'vfs-6.15.eventpoll' of https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: eventpoll: add epoll_sendevents() helper eventpoll: abstract out ep_try_send_events() helper eventpoll: abstract out parameter sanity checking