summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-02-21rtnetlink: Pack newlink() params into structXiao Liang
There are 4 net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device. Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request. +------------+-------------------+---------+---------+ | peer netns | IFLA_LINK_NETNSID | src_net | dev_net | +------------+-------------------+---------+---------+ | | absent | source | target | | absent +-------------------+---------+---------+ | | present | link | link | +------------+-------------------+---------+---------+ | | absent | peer | target | | present +-------------------+---------+---------+ | | present | peer | link | +------------+-------------------+---------+---------+ When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning. On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead. To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21afs: Give an afs_server object a ref on the afs_cell object it points toDavid Howells
Give an afs_server object a ref on the afs_cell object it points to so that the cell doesn't get deleted before the server record. Whilst this is circular (cell -> vol -> server_list -> server -> cell), the ref only pins the memory, not the lifetime as that's controlled by the activity counter. When the volume's activity counter reaches 0, it detaches from the cell and discards its server list; when a cell's activity counter reaches 0, it discards its root volume. At that point, the circularity is cut. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21Merge tag 'block-6.14-20250221' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - FC controller state check fixes (Daniel) - PCI Endpoint fixes (Damien) - TCP connection failure fixe (Caleb) - TCP handling C2HTermReq PDU (Maurizio) - RDMA queue state check (Ruozhu) - Apple controller fixes (Hector) - Target crash on disbaled namespace (Hannes) - MD pull request via Yu: - Fix queue limits error handling for raid0, raid1 and raid10 - Fix for a NULL pointer deref in request data mapping - Code cleanup for request merging * tag 'block-6.14-20250221' of git://git.kernel.dk/linux: nvme: only allow entering LIVE from CONNECTING state nvme-fc: rely on state transitions to handle connectivity loss apple-nvme: Support coprocessors left idle apple-nvme: Release power domains when probe fails nvmet: Use enum definitions instead of hardcoded values nvme: Cleanup the definition of the controller config register fields nvme/ioctl: add missing space in err message nvme-tcp: fix connect failure on receiving partial ICResp PDU nvme: tcp: Fix compilation warning with W=1 nvmet: pci-epf: Avoid RCU stalls under heavy workload nvmet: pci-epf: Do not uselessly write the CSTS register nvmet: pci-epf: Correctly initialize CSTS when enabling the controller nvmet-rdma: recheck queue state is LIVE in state lock in recv done nvmet: Fix crash when a namespace is disabled nvme-tcp: add basic support for the C2HTermReq PDU nvme-pci: quirk Acer FA100 for non-uniqueue identifiers block: fix NULL pointer dereferenced within __blk_rq_map_sg block/merge: remove unnecessary min() with UINT_MAX md/raid*: Fix the set_queue_limits implementations
2025-02-21Merge tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Series fixing an issue with multishot read on pollable files that may return -EIOCBQUEUED from ->read_iter(). Four small patches for that, the first one deliberately done in such a way that it'd be easy to backport - Remove some dead constant definitions - Use array_index_nospec() for opcode indexing - Work-around for worker creation retries in the presence of signals * tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux: io_uring/rw: clean up mshot forced sync mode io_uring/rw: move ki_complete init into prep io_uring/rw: don't directly use ki_complete io_uring/rw: forbid multishot async reads io_uring/rsrc: remove unused constants io_uring: fix spelling error in uapi io_uring.h io_uring: prevent opcode speculation io-wq: backoff when retrying worker creation
2025-02-21dt-bindings: clock: add clock definitions for Ralink SoCsSergio Paracuellos
Add clock missing definitions for RT2880, RT305X, RT3352, RT3383, RT5350, MT7620 and MT76X8 Ralink SoCs. Update bindings to clarify clock depending on these new introduced constants so consumer nodes can easily use the correct one in DTS files matching properly what is being used in driver code (clock IDs are implicitly used there). Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-02-21coresight: core: Add provision for panic callbacksLinu Cherian
Panic callback handlers allows coresight device drivers to sync relevant trace data and trace metadata to reserved memory regions so that they can be retrieved later in the subsequent boot or in the crashdump kernel. Signed-off-by: Linu Cherian <lcherian@marvell.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20250212114918.548431-4-lcherian@marvell.com
2025-02-21iommu: Turn fault_data to iommufd private pointerNicolin Chen
A "fault_data" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd version of the sw_msi function will reuse the attach_handle and fault_data for a non-fault case. Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault" case. Move it into a union to be the iommufd private pointer. A following patch will move the iova_cookie to the union for dma-iommu too after the iommufd_sw_msi implementation is added. Since we have two unions now, add some simple comments for readability. Link: https://patch.msgid.link/r/ee5039503f28a16590916e9eef28b917e2d1607a.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21iommu: Make iommu_dma_prepare_msi() into a generic operationJason Gunthorpe
SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21genirq/msi: Refactor iommu_dma_compose_msi_msg()Jason Gunthorpe
The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookieJason Gunthorpe
The IOMMU translation for MSI message addresses has been a 2-step process, separated in time: 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is stored in the MSI descriptor when an MSI interrupt is allocated. 2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a translated message address. This has an inherent lifetime problem for the pointer stored in the cookie that must remain valid between the two steps. However, there is no locking at the irq layer that helps protect the lifetime. Today, this works under the assumption that the iommu domain is not changed while MSI interrupts being programmed. This is true for normal DMA API users within the kernel, as the iommu domain is attached before the driver is probed and cannot be changed while a driver is attached. Classic VFIO type1 also prevented changing the iommu domain while VFIO was running as it does not support changing the "container" after starting up. However, iommufd has improved this so that the iommu domain can be changed during VFIO operation. This potentially allows userspace to directly race VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This potentially causes both the cookie pointer and the unlocked call to iommu_get_domain_for_dev() on the MSI translation path to become UAFs. Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA address is already known during iommu_dma_prepare_msi() and cannot change. Thus, it can simply be stored as an integer in the MSI descriptor. The other UAF related to iommu_get_domain_for_dev() will be addressed in patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by using the IOMMU group mutex. Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21Merge tag 'v6.14-rc3' into x86/mm, to pick up fixes before merging new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21perf/core: Move perf_event sysctls into kernel/eventsJoel Granados
Move ctl tables to two files: - perf_event_{paranoid,mlock_kb,max_sample_rate} and perf_cpu_time_max_percent into kernel/events/core.c - perf_event_max_{stack,context_per_stack} into kernel/events/callchain.c Make static variables and functions that are fully contained in core.c and callchain.cand remove them from include/linux/perf_event.h. Additionally six_hundred_forty_kb is moved to callchain.c. Two new sysctl tables are added ({callchain,events_core}_sysctl_table) with their respective sysctl registration functions. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kerenel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250218-jag-mv_ctltables-v1-5-cd3698ab8d29@kernel.org
2025-02-21Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging ↵Ingo Molnar
new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21mm/filemap: fix miscalculated file range for filemap_fdatawrite_range_kick()Jingbo Xu
iocb->ki_pos has been updated with the number of written bytes since generic_perform_write(). Besides __filemap_fdatawrite_range() accepts the inclusive end of the data range. Fixes: 1d4457576570 ("mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20250218120209.88093-2-jefflexu@linux.alibaba.com Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21pps: generators: replace copy of pps-gen info struct with const pointerSubramanian Mohan
Some PPS generator drivers may need to retrieve a pointer to their internal data while executing the PPS generator enable() method. During the driver registration the pps_gen_device pointer is returned from the framework, and for that reason, there is difficulty in getting generator driver data back in the enable function. We won't be able to use container_of macro as it results in static assert, and we might end up in using static pointer. To solve the issue and to get back the generator driver data back, we should not copy the struct pps_gen_source_info within the struct pps_gen_device during the registration stage, but simply save the pointer of the driver one. In this manner, driver may get a pointer to their internal data as shown below: struct pps_gen_foo_data_s { ... struct pps_gen_source_info gen_info; struct pps_gen_device *pps_gen; ... }; static int __init pps_gen_foo_init(void) { struct pps_gen_foo_data_s *foo; ... foo->pps_gen = pps_gen_register_source(&foo->gen_info); ... } Then, in the enable() method, we can retrieve the pointer to the main struct by using the code below: static int pps_gen_foo_enable(struct pps_gen_device *pps_gen, bool enable) { struct pps_gen_foo_data_s *foo = container_of(pps_gen->info, struct pps_gen_foo_data_s, gen_info); ... } Signed-off-by: Rodolfo Giometti <giometti@enneenne.com> Tested-by: Subramanian Mohan <subramanian.mohan@intel.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Subramanian Mohan <subramanian.mohan@intel.com> Link: https://lore.kernel.org/r/20250219040618.70962-2-subramanian.mohan@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21USB: core: Add eUSB2 descriptor and parsing in USB coreKannappan R
Add support for the 'eUSB2 Isochronous Endpoint Companion Descriptor' introduced in the recent USB 2.0 specification 'USB 2.0 Double Isochronous IN Bandwidth' ECN. It allows embedded USB2 (eUSB2) devices to report and use higher bandwidths for isochronous IN transfers in order to support higher camera resolutions on the lid of laptops and tablets with minimal change to the USB2 protocol. The motivation for expanding USB 2.0 is further clarified in an additional Embedded USB2 version 2.0 (eUSB2v2) supplement to the USB 2.0 specification. It points out this is optimized for performance, power and cost by using the USB 2.0 low-voltage, power efficient PHY and half-duplex link for the asymmetric camera bandwidth needs, avoiding the costly and complex full-duplex USB 3.x symmetric link and gigabit receivers. eUSB2 devices that support the higher isochronous IN bandwidth and the new descriptor can be identified by their device descriptor bcdUSB value of 0x0220 Co-developed-by: Amardeep Rai <amardeep.rai@intel.com> Signed-off-by: Amardeep Rai <amardeep.rai@intel.com> Signed-off-by: Kannappan R <r.kannappan@intel.com> Co-developed-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20250220141339.1939448-1-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21media: tuners: Constify struct tunertype, tuner_range and tuner_paramsChristophe JAILLET
'struct tunertype', 'struct tuner_range' and 'struct tuner_params' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig: Before: ====== 2877 8554 0 11431 2ca7 drivers/media/tuners/tuner-types.o After: ===== text data bss dec hex filename 11421 48 0 11469 2ccd drivers/media/tuners/tuner-types.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21media: uapi: rkisp1-config: Fix typo in extensible params exampleNiklas Söderlund
The define used for the version in the example diagram does not match what is defined in enum rksip1_ext_param_buffer_version, nor the description above it. Correct the typo to make it clear which define to use. Fixes: e9d05e9d5db1 ("media: uapi: rkisp1-config: Add extensible params format") Cc: stable@vger.kernel.org Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21sysv: Remove the filesystemJan Kara
Since 2002 (change "Replace BKL for chain locking with sysvfs-private rwlock") the sysv filesystem was doing IO under a rwlock in its get_block() function (yes, a non-sleepable lock hold over a function used to read inode metadata for all reads and writes). Nobody noticed until syzbot in 2023 [1]. This shows nobody is using the filesystem. Just drop it. [1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21vfs: inline new_inode_pseudo() and de-staticize alloc_inode()Mateusz Guzik
The former is a no-op wrapper with the same argument. I left it in place to not lose the information who needs it -- one day "pseudo" inodes may start differing from what alloc_inode() returns. In the meantime no point taking a detour. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250212180459.1022983-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21vfs: inline getname()Mateusz Guzik
It is merely a trivial wrapper around getname_flags which adds a zeroed argument, no point paying for an extra call. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250206000105.432528-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21ioctl: Fix return type of several functions from long to intYuichiro Tsuji
Fix the return type of several functions from long to int to match its actu al behavior. These functions only return int values. This change improves type consistency across the filesystem code and aligns the function signatu re with its existing implementation and usage. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com> Link: https://lore.kernel.org/r/20250121070844.4413-3-yuichtsu@amazon.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21open: Fix return type of several functions from long to intYuichiro Tsuji
Fix the return type of several functions from long to int to match its actu al behavior. These functions only return int values. This change improves type consistency across the filesystem code and aligns the function signatu re with its existing implementation and usage. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com> Link: https://lore.kernel.org/r/20250121070844.4413-2-yuichtsu@amazon.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21fs: avoid mmap sem relocks when coredumping with many missing pagesMateusz Guzik
Dumping processes with large allocated and mostly not-faulted areas is very slow. Borrowing a test case from Tavian Barnes: int main(void) { char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0); printf("%p %m\n", mem); if (mem != MAP_FAILED) { mem[0] = 1; } abort(); } That's 1TB of almost completely not-populated area. On my test box it takes 13-14 seconds to dump. The profile shows: - 99.89% 0.00% a.out entry_SYSCALL_64_after_hwframe do_syscall_64 syscall_exit_to_user_mode arch_do_signal_or_restart - get_signal - 99.89% do_coredump - 99.88% elf_core_dump - dump_user_range - 98.12% get_dump_page - 64.19% __get_user_pages - 40.92% gup_vma_lookup - find_vma - mt_find 4.21% __rcu_read_lock 1.33% __rcu_read_unlock - 3.14% check_vma_flags 0.68% vma_is_secretmem 0.61% __cond_resched 0.60% vma_pgtable_walk_end 0.59% vma_pgtable_walk_begin 0.58% no_page_table - 15.13% down_read_killable 0.69% __cond_resched 13.84% up_read 0.58% __cond_resched Almost 29% of the time is spent relocking the mmap semaphore between calls to get_dump_page() which find nothing. Whacking that results in times of 10 seconds (down from 13-14). While here make the thing killable. The real problem is the page-sized iteration and the real fix would patch it up instead. It is left as an exercise for the mm-familiar reader. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250119103205.2172432-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21vfs: use the new debug macros in inode_set_cached_link()Mateusz Guzik
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250209185523.745956-4-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21vfs: add initial support for CONFIG_DEBUG_VFSMateusz Guzik
Small collection of macros taken from mmdebug.h Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250209185523.745956-2-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21vdso: Remove remnants of architecture-specific time storageThomas Weißschuh
All users of the time releated parts of the vDSO are now using the generic storage implementation. Remove the therefore unnecessary compatibility accessor functions and symbols. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-18-13a4669dfc8c@linutronix.de
2025-02-21vdso: Remove remnants of architecture-specific random state storageThomas Weißschuh
All users of the random vDSO are using the generic storage implementation. Remove the now unnecessary compatibility accessor functions and symbols. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-17-13a4669dfc8c@linutronix.de
2025-02-21vdso: Add generic architecture-specific data storageThomas Weißschuh
Some architectures need to expose architecture-specific data to the vDSO. Enable the generic vDSO storage mechanism to both store and map this data. Some architectures require more than a single page, like LoongArch, so prepare for that usecase, too. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-7-13a4669dfc8c@linutronix.de
2025-02-21vdso: Add generic random data storageThomas Weißschuh
Extend the generic vDSO data storage with a page for the random state data. The random state data is stored in a dedicated page, as the existing storage page is only meant for time-related, time-namespace-aware data. This simplifies to access logic to not need to handle time namespaces anymore and also frees up more space in the time-related page. In case further generic vDSO data store is required it can be added to the random state page. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-6-13a4669dfc8c@linutronix.de
2025-02-21vdso: Add generic time data storageThomas Weißschuh
Historically each architecture defined their own way to store the vDSO data page. Add a generic mechanism to provide storage for that page. Furthermore this generic storage will be extended to also provide uniform storage for *non*-time-related data, like the random state or architecture-specific data. These will have their own pages and data structures, so rename 'vdso_data' into 'vdso_time_data' to make that split clear from the name. Also introduce a new consistent naming scheme for the symbols related to the vDSO, which makes it clear if the symbol is accessible from userspace or kernel space and the type of data behind the symbol. The generic fault handler contains an optimization to prefault the vvar page when the timens page is accessed. This was lifted from s390 and x86. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-5-13a4669dfc8c@linutronix.de
2025-02-21vdso: Introduce vdso/align.hThomas Weißschuh
The vDSO implementation can only include headers from the vdso/ namespace. To enable the usage of the ALIGN() macro from the vDSO, move linux/align.h to vdso/align.h wholly. As the only dependency linux/const.h is only a wrapper around vdso/const.h anyways adapt that dependency. Also provide a compatibility wrapper linux/align.h. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-3-13a4669dfc8c@linutronix.de
2025-02-21gpio: regmap: Allow ngpio to be read from the propertyAndy Shevchenko
GPIOLIB supports the case when number of supported GPIOs can be read from the device property. Enable this for drivers that are using GPIO regmap layer. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Reviewed-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Link: https://lore.kernel.org/r/20250213195621.3133406-6-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-21gpio: regmap: Group optional assignments together for better understandingAndy Shevchenko
Group ngpio_per_reg, reg_stride, and reg_mask_xlate assignments together with the respective conditional for better understanding what's going on in the code. While at it, mark ngpio_per_reg as (Optional) in the kernel-doc in accordance with what code actually does. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Reviewed-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com> Link: https://lore.kernel.org/r/20250213195621.3133406-4-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-21xfrm: check for PMTU in tunnel mode for packet offloadLeon Romanovsky
In tunnel mode, for the packet offload, there were no PMTU signaling to the upper level about need to fragment the packet. As a solution, call to already existing xfrm[4|6]_tunnel_check_size() to perform that. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21xfrm: simplify SA initialization routineLeon Romanovsky
SA replay mode is initialized differently for user-space and kernel-space users, but the call to xfrm_init_replay() existed in common path with boolean protection. That caused to situation where we have two different function orders. So let's rewrite the SA initialization flow to have same order for both in-kernel and user-space callers. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21xfrm: delay initialization of offload path till its actually requestedLeon Romanovsky
XFRM offload path is probed even if offload isn't needed at all. Let's make sure that x->type_offload pointer stays NULL for such path to reduce ambiguity. Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-21dt-bindings: xilinx: Deprecate header with firmware constantsMichal Simek
Firmware contants do not fit the purpose of bindings because they are not independent IDs for abstractions. They are more or less just contants which better to wire via header with DT which is using it. That's why add deprecated message to dt binding header and also update existing dt bindings not to use macros from the header and replace them by it's value. Actually value is not relevant because it is only example. The similar changes have been done by commit 9d9292576810 ("dt-bindings: pinctrl: samsung: deprecate header with register constants"). Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/2a6f0229522327939e6893565e540b75f854a37b.1738600745.git.michal.simek@amd.com
2025-02-20Merge patch series "Support Multi-frequency scale for UFS"Martin K. Petersen
Ziqi Chen <quic_ziqichen@quicinc.com> says: With OPP V2 enabled, devfreq can scale clocks amongst multiple frequency plans. However, the gear speed is only toggled between min and max during clock scaling. Enable multi-level gear scaling by mapping clock frequencies to gear speeds, so that when devfreq scales clock frequencies we can put the UFS link at the appropraite gear speeds accordingly. This series has been tested on below platforms - sm8550 mtp + UFS3.1 SM8650 MTP + UFS3.1 SM8750 MTP + UFS4.0 Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK Link: https://lore.kernel.org/r/20250213080008.2984807-1-quic_ziqichen@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20scsi: ufs: core: Toggle Write Booster during clock scaling base on gear speedCan Guo
During clock scaling, Write Booster is toggled on or off based on whether the clock is scaled up or down. However, with OPP V2 powered multi-level gear scaling, the gear can be scaled amongst multiple gear speeds, e.g., it may scale down from G5 to G4, or from G4 to G2. To provide flexibilities, add a new field for clock scaling such that during clock scaling Write Booster can be enabled or disabled based on gear speeds but not based on scaling up or down. Signed-off-by: Can Guo <quic_cang@quicinc.com> Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-8-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20scsi: ufs: core: Add a vop to map clock frequency to gear speedCan Guo
Add a vop to map UFS host controller clock frequencies to the corresponding maximum supported UFS high speed gear speeds. During clock scaling, we can map the target clock frequency, demanded by devfreq, to the maximum supported gear speed, so that devfreq can scale the gear to the highest gear speed supported at the target clock frequency, instead of just scaling up/down the gear between the min and max gear speeds. Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Can Guo <quic_cang@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-4-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20scsi: ufs: core: Pass target_freq to clk_scale_notify() vopCan Guo
Instead of only two frequencies, if OPP V2 is used, the UFS devfreq clock scaling may scale the clock among multiple frequencies, so just passing up/down to vop clk_scale_notify() is not enough to cover the intermediate clock freqs between the min and max freqs. Hence pass the target_freq, which will be used in successive commits, to clk_scale_notify() to allow the vop to perform corresponding configurations with regard to the clock freqs. Signed-off-by: Can Guo <quic_cang@quicinc.com> Co-developed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Link: https://lore.kernel.org/r/20250213080008.2984807-2-quic_ziqichen@quicinc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20net: phy: remove unused feature array declarationsHeiner Kallweit
After 12d5151be010 ("net: phy: remove leftovers from switch to linkmode bitmaps") the following declarations are unused and can be removed too. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf bpf-6.14-rc4Alexei Starovoitov
Cross-merge bpf fixes after downstream PR (bpf-6.14-rc4). Minor conflict: kernel/bpf/btf.c Adjacent changes: kernel/bpf/arena.c kernel/bpf/btf.c kernel/bpf/syscall.c kernel/bpf/verifier.c mm/memory.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-21Merge tag 'tags/ib-mfd-power-v6.15' into psy-nextSebastian Reichel
Immutable branch between MFD and Power due for the v6.15 merge window Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-02-20Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20xsk: Add launch time hardware offload support to XDP Tx metadataSong Yoong Siang
Extend the XDP Tx metadata framework so that user can requests launch time hardware offload, where the Ethernet device will schedule the packet for transmission at a pre-determined time called launch time. The value of launch time is communicated from user space to Ethernet driver via launch_time field of struct xsk_tx_metadata. Suggested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-20bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callbackJason Xing
This patch introduces a new callback in tcp_tx_timestamp() to correlate tcp_sendmsg timestamp with timestamps from other tx timestamping callbacks (e.g., SND/SW/ACK). Without this patch, BPF program wouldn't know which timestamps belong to which flow because of no socket lock protection. This new callback is inserted in tcp_tx_timestamp() to address this issue because tcp_tx_timestamp() still owns the same socket lock with tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes the timestamping related fields for the skb, especially tskey. The tskey is the bridge to do the correlation. For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and then stores the sendmsg timestamp at the bpf_sk_storage, correlating this timestamp with its tskey that are later used in other sending timestamping callbacks. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-11-kerneljasonxing@gmail.com
2025-02-20bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callbackJason Xing
Support the ACK case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_ACK. The BPF program can use it to get the same SCM_TSTAMP_ACK timestamp without modifying the user-space application. This patch extends txstamp_ack to two bits: 1 stands for SO_TIMESTAMPING mode, 2 bpf extension. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com
2025-02-20bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callbackJason Xing
Support hw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping. Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com