summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-03-07netpoll: Optimize skb refilling on critical pathBreno Leitao
netpoll tries to refill the skb queue on every packet send, independently if packets are being consumed from the pool or not. This was particularly problematic while being called from printk(), where the operation would be done while holding the console lock. Introduce a more intelligent approach to skb queue management. Instead of constantly attempting to refill the queue, the system now defers refilling to a work queue and only triggers the workqueue when a buffer is actually dequeued. This change significantly reduces operations with the lock held. Add a work_struct to the netpoll structure for asynchronous refilling, updating find_skb() to schedule refill work only when necessary (skb is dequeued). These changes have demonstrated a 15% reduction in time spent during netpoll_send_msg operations, especially when no SKBs are not consumed from consumed from pool. When SKBs are being dequeued, the improvement is even better, around 70%, mainly because refilling the SKB pool is now happening outside of the critical patch (with console_owner lock held). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-netpoll_refill_v2-v1-1-06e2916a4642@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07tcp: ulp: diag: more info without CAP_NET_ADMINMatthieu Baerts (NGI0)
When introduced in commit 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information"), the whole ULP diag info has been exported only if the requester had CAP_NET_ADMIN. It looks like not everything is sensitive, and some info can be exported to all users in order to ease the debugging from the userspace side without requiring additional capabilities. Each layer should then decide what can be exposed to everybody. The 'net_admin' boolean is then passed to the different layers. On kTLS side, it looks like there is nothing sensitive there: version, cipher type, tx/rx user config type, plus some flags. So, only some metadata about the configuration, no cryptographic info like keys, etc. Then, everything can be exported to all users. On MPTCP side, that's different. The MPTCP-related sequence numbers per subflow should certainly not be exposed to everybody. For example, the DSS mapping and ssn_offset would give all users on the system access to narrow ranges of values for the subflow TCP sequence numbers and MPTCP-level DSNs, and then ease packet injection. The TCP diag interface doesn't expose the TCP sequence numbers for TCP sockets, so best to do the same here. The rest -- token, IDs, flags -- can be exported to everybody. Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250306-net-next-tcp-ulp-diag-net-admin-v1-2-06afdd860fc9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07net: phylink: Remove unused phylink_init_eeeDr. David Alan Gilbert
phylink_init_eee() is currently unused. It was last added in 2019 by commit 86e58135bc4a ("net: phylink: add phylink_init_eee() helper") but it didn't actually wire a use up. It had previous been removed in 2017 by commit 939eae25d9a5 ("phylink: remove phylink_init_eee()"). Remove it again. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250306184534.246152-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08power: supply: core: get rid of of_nodeSebastian Reichel
This removes .of_node from 'struct power_supply', since there is already a copy in .dev.of_node and there is no need to have two copies. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20250225-psy-core-convert-to-fwnode-v1-1-d5e4369936bb@collabora.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-08power: supply: Remove unused set_charged methodDr. David Alan Gilbert
The previous patches in this series removed the only caller and only setter of this method. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250307230225.128775-4-linux@treblig.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-08power: supply: core: Remove unused power_supply_set_battery_chargedDr. David Alan Gilbert
power_supply_set_battery_charged() has been unused since 2019's commit 0f884f8a090e ("ARM: pxa: remove raumfeld board files and defconfig") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250307230225.128775-2-linux@treblig.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-08counter: microchip-tcb-capture: Add capture extensions for registers RA/RBBence Csókás
TCB hardware is capable of capturing the timer value to registers RA and RB. Add these registers as capture extensions. Signed-off-by: Bence Csókás <csokas.bence@prolan.hu> Link: https://lore.kernel.org/r/20250306134441.582819-3-csokas.bence@prolan.hu Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-03-08counter: microchip-tcb-capture: Add IRQ handlingBence Csókás
Add interrupt servicing to allow userspace to wait for the following: * Change-of-state caused by external trigger * Capture of timer value into RA/RB * Compare to RC register * Overflow Signed-off-by: Bence Csókás <csokas.bence@prolan.hu> Link: https://lore.kernel.org/r/20250306134441.582819-2-csokas.bence@prolan.hu Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-03-07Merge tag 'acpi-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Restore the previous behavior of the ACPI platform_profile sysfs interface that has been changed recently in a way incompatible with the existing user space (Mario Limonciello)" * tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86/amd: pmf: Add balanced-performance to hidden choices platform/x86/amd: pmf: Add 'quiet' to hidden choices ACPI: platform_profile: Add support for hidden choices
2025-03-07Merge tag 'block-6.14-20250306' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - TCP use after free fix on polling (Sagi) - Controller memory buffer cleanup fixes (Icenowy) - Free leaking requests on bad user passthrough commands (Keith) - TCP error message fix (Maurizio) - TCP corruption fix on partial PDU (Maurizio) - TCP memory ordering fix for weakly ordered archs (Meir) - Type coercion fix on message error for TCP (Dan) - Name the RQF flags enum, fixing issues with anon enums and BPF import of it - ublk parameter setting fix - GPT partition 7-bit conversion fix * tag 'block-6.14-20250306' of git://git.kernel.dk/linux: block: Name the RQF flags enum nvme-tcp: fix signedness bug in nvme_tcp_init_connection() block: fix conversion of GPT partition name to 7-bit ublk: set_params: properly check if parameters can be applied nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() nvme-tcp: Fix a C2HTermReq error message nvmet: remove old function prototype nvme-ioctl: fix leaked requests on mapping error nvme-pci: skip CMB blocks incompatible with PCI P2P DMA nvme-pci: clean up CMBMSC when registering CMB fails nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-03-07drm/amdkfd: Add support for more per-process flagHarish Kasiviswanathan
Add support for more per-process flags starting with option to configure MFMA precision for gfx 9.5 v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION Remove unused else condition v3: Bump the KFD API version v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07elf: add remaining SHF_ flag macrosTimur Tabi
Add the remaining SHF_ flags, as listed in the "Executable and Linkable Format" Wikipedia page and the System V Application Binary Interface[1]. This allows drivers to load and parse ELF images that use some of those flags. In particular, an upcoming change to the Nouveau GPU driver will use some of the flags. Link: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.sheader.html#sh_flags [1] Signed-off-by: Timur Tabi <ttabi@nvidia.com> Link: https://lore.kernel.org/r/20250307171417.267488-1-ttabi@nvidia.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07Revert "Bluetooth: hci_core: Fix sleeping function called from invalid context"Luiz Augusto von Dentz
This reverts commit 4d94f05558271654670d18c26c912da0c1c15549 which has problems (see [1]) and is no longer needed since 581dd2dc168f ("Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating") has reworked the code where the original bug has been found. [1] Link: https://lore.kernel.org/linux-bluetooth/877c55ci1r.wl-tiwai@suse.de/T/#t Fixes: 4d94f0555827 ("Bluetooth: hci_core: Fix sleeping function called from invalid context") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-03-07PCI: Increase Resizable BAR support from 512 GB to 128 TBZhiyuan Dai
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to 128 TB in the Resizable BAR Capability register. Larger sizes can be advertised via the Capability register, but that requires an API change. Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the sizes we currently support from 512 GB to 128 TB. Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-07io_uring/rw: defer reg buf vec importPavel Begunkov
Import registered buffers for vectored reads and writes later at issue time as we now do for other fixed ops. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e8491c976e4ab83a4e3dc428e9fe7555e59583b8.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring/rw: implement vectored registered rwPavel Begunkov
Implement registered buffer vectored reads with new opcodes IORING_OP_WRITEV_FIXED and IORING_OP_READV_FIXED. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7c89eb481e870f598edc91cc66ff4d1e4ae3788.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring: add infra for importing vectored reg buffersPavel Begunkov
Add io_import_reg_vec(), which will be responsible for importing vectored registered buffers. The function might reallocate the vector, but it'd try to do the conversion in place first, which is why it's required of the user to pad the iovec to the right border of the cache. Overlapping also depends on struct iovec being larger than bvec, which is not the case on e.g. 32 bit architectures. Don't try to complicate this case and make sure vectors never overlap, it'll be improved later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60bd246b1249476a6996407c1dbc38ef6febad14.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring: introduce struct iou_vecPavel Begunkov
I need a convenient way to pass around and work with iovec+size pair, put them into a structure and makes use of it in rw.c Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d39fadafc9e9047b0a292e5be6db3cf2f48bb1f7.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07Merge branch 'for-6.15/io_uring-epoll-wait' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring-epoll-wait: io_uring/epoll: add support for IORING_OP_EPOLL_WAIT io_uring/epoll: remove CONFIG_EPOLL guards eventpoll: add epoll_sendevents() helper eventpoll: abstract out ep_try_send_events() helper eventpoll: abstract out parameter sanity checking
2025-03-07Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring-rx-zc: (80 commits) io_uring/zcrx: add selftest case for recvzc with read limit io_uring/zcrx: add a read limit to recvzc requests io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap io_uring: Rename KConfig to Kconfig io_uring/zcrx: fix leaks on failed registration io_uring/zcrx: recheck ifq on shutdown io_uring/zcrx: add selftest net: add documentation for io_uring zcrx io_uring/zcrx: add copy fallback io_uring/zcrx: throttle receive requests io_uring/zcrx: set pp memory provider for an rx queue io_uring/zcrx: add io_recvzc request io_uring/zcrx: dma-map area for the device io_uring/zcrx: implement zerocopy receive pp memory provider io_uring/zcrx: grab a net device io_uring/zcrx: add io_zcrx_area io_uring/zcrx: add interface queue and refill queue net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers ...
2025-03-07Merge branch 'for-6.15/io_uring' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring: (80 commits) io_uring: introduce io_cache_free() helper io_uring/rsrc: skip NULL file/buffer checks in io_free_rsrc_node() io_uring/rsrc: avoid NULL node check on io_sqe_buffer_register() failure io_uring/rsrc: call io_free_node() on io_sqe_buffer_register() failure io_uring/rsrc: free io_rsrc_node using kfree() io_uring/rsrc: split out io_free_node() helper io_uring/rsrc: include io_uring_types.h in rsrc.h ublk: don't cast registered buffer index to int io_uring/nop: use io_find_buf_node() io_uring/rsrc: declare io_find_buf_node() in header file io_uring/ublk: report error when unregister operation fails io_uring: convert cmd_to_io_kiocb() macro to function io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer type io_uring/rsrc: use rq_data_dir() to compute bvec dir selftests: ublk: add ublk zero copy test selftests: ublk: add file backed ublk selftests: ublk: add kernel selftests for ublk io_uring: cache nodes and mapped buffers ublk: zc register/unregister bvec io_uring: add support for kernel registered bvecs ...
2025-03-07PM: EM: Address RCU-related sparse warningsRafael J. Wysocki
The usage of __rcu in the Energy Model code is quite inconsistent which causes the following sparse warnings to trigger: kernel/power/energy_model.c:169:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:169:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:169:15: got struct em_perf_table * kernel/power/energy_model.c:171:9: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:171:9: expected struct callback_head *head kernel/power/energy_model.c:171:9: got struct callback_head [noderef] __rcu * kernel/power/energy_model.c:171:9: warning: cast removes address space '__rcu' of expression kernel/power/energy_model.c:182:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:182:19: expected struct kref *kref kernel/power/energy_model.c:182:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:200:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:200:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:200:15: got void *[assigned] _res kernel/power/energy_model.c:204:20: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:204:20: expected struct kref *kref kernel/power/energy_model.c:204:20: got struct kref [noderef] __rcu * kernel/power/energy_model.c:320:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:320:19: expected struct kref *kref kernel/power/energy_model.c:320:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:325:45: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:325:45: expected struct em_perf_state *table kernel/power/energy_model.c:325:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:425:45: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:425:45: expected struct em_perf_state *table kernel/power/energy_model.c:425:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:442:15: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:442:15: expected void const *objp kernel/power/energy_model.c:442:15: got struct em_perf_table [noderef] __rcu *[assigned] em_table kernel/power/energy_model.c:626:55: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:626:55: expected struct em_perf_state *table kernel/power/energy_model.c:626:55: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:681:16: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:681:16: expected struct em_perf_state *new_ps kernel/power/energy_model.c:681:16: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:699:37: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:699:37: expected struct em_perf_state *table kernel/power/energy_model.c:699:37: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:733:38: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:733:38: expected struct em_perf_state *table kernel/power/energy_model.c:733:38: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:855:53: warning: dereference of noderef expression kernel/power/energy_model.c:864:32: warning: dereference of noderef expression This is because the __rcu annotation for sparse is only applicable to pointers that need rcu_dereference() or equivalent for protection, which basically means pointers assigned with rcu_assign_pointer(). Make all of the above sparse warnings go away by cleaning up the usage of __rcu and using rcu_dereference_protected() where applicable. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/5885405.DvuYhMxLoT@rjwysocki.net
2025-03-07bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()Luis Chamberlain
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k inside the helper blk_validate_block_size() now that we support large folios. However in lifting the block size we also removed the silly use cases many filesystems have to use sb_set_blocksize() to *verify* that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was used to check the block size <= PAGE_SIZE since historically we've always supported userspace to create for example 64k block size filesystems even on 4k page size systems, but what we didn't allow was mounting them. Older filesystems have been using the check with sb_set_blocksize() for years. While, we could argue that such checks should be filesystem specific, there are much more users of sb_set_blocksize() than LBS enabled filesystem on upstream, so just do the easier thing and bring back the PAGE_SIZE check for sb_set_blocksize() users and only skip it for LBS enabled filesystems. This will ensure that tests such as generic/466 when run in a loop against say, ext4, won't try to try to actually mount a filesystem with a block size larger than your filesystem supports given your PAGE_SIZE and in the worst case crash. Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06fs/pipe: add simpler helpers for common casesLinus Torvalds
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06Merge tag 'nf-25-03-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix racy non-atomic read-then-increment operation with PREEMPT_RT in nft_ct, from Sebastian Andrzej Siewior. 2) GC is not skipped when jiffies wrap around in nf_conncount, from Nicklas Bo Jensen. 3) flush_work() on nf_tables_destroy_work waits for the last queued instance, this could be an instance that is different from the one that we must wait for, then make destruction work queue. * tag 'nf-25-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: make destruction work queue pernet netfilter: nf_conncount: garbage collection is not skipped when jiffies wrap around netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template. ==================== Link: https://patch.msgid.link/20250306153446.46712-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net/mlx5: Relocate function declarations from port.h to mlx5_core.hShahar Shitrit
The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06block: Name the RQF flags enumBreno Leitao
Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit indexes") converted the RQF flags to an anonymous enum, which was a beneficial change. This patch goes one step further by naming the enum as "rqf_flags". This naming enables exporting these flags to BPF clients, eliminating the need to duplicate these flags in BPF code. Instead, BPF clients can now access the same kernel-side values through CO:RE (Compile Once, Run Everywhere), as shown in this example: rqf_stats = bpf_core_enum_value(enum rqf_flags, __RQF_STATS) Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20250306-rqf_flags-v1-1-bbd64918b406@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07Merge tag 'drm-misc-next-2025-03-06' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: base: - component: Provide helper to query bound status fbdev: - fbtft: Remove access to page->index Core Changes: - Fix usage of logging macros in several places gem: - Add test function for imported dma-bufs and use it in core and helpers - Avoid struct drm_gem_object.import_attach tests: - Fix lockdep warnings ttm: - Add helpers for TTM shrinker Driver Changes: adp: - Add support for Apple Touch Bar displays on M1/M2 amdxdna: - Fix interrupt handling appletbdrm: - Add support for Apple Touch Bar displays on x86 bridge: - synopsys: Add HDMI audio support - ti-sn65dsi83: Support negative DE polarity ipu-v3: - Remove unused code nouveau: - Avoid multiple -Wflex-array-member-not-at-end warnings panthor: - Fix CS_STATUS_ defines - Improve locking rockchip: - analogix_dp: Add eDP support - lvds: Improve logging - vop2: Improve HDMI mode handling; Add support for RK3576 - Fix shutdown - Support rk3562-mali xe: - Use TTM shrinker Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250306130700.GA485504@linux.fritz.box
2025-03-06inet: call inet6_ehashfn() once from inet6_hash_connect()Eric Dumazet
inet6_ehashfn() being called from __inet6_check_established() has a big impact on performance, as shown in the Tested section. After prior patch, we can compute the hash for port 0 from inet6_hash_connect(), and derive each hash in __inet_hash_connect() from this initial hash: hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport Apply the same principle for __inet_check_established(), although inet_ehashfn() has a smaller cost. Tested: Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server Before this patch: utime_start=0.286131 utime_end=4.378886 stime_start=11.952556 stime_end=1991.655533 num_transactions=1446830 latency_min=0.001061085 latency_max=12.075275028 latency_mean=0.376375302 latency_stddev=1.361969596 num_samples=306383 throughput=151866.56 perf top: 50.01% [kernel] [k] __inet6_check_established 20.65% [kernel] [k] __inet_hash_connect 15.81% [kernel] [k] inet6_ehashfn 2.92% [kernel] [k] rcu_all_qs 2.34% [kernel] [k] __cond_resched 0.50% [kernel] [k] _raw_spin_lock 0.34% [kernel] [k] sched_balance_trigger 0.24% [kernel] [k] queued_spin_lock_slowpath After this patch: utime_start=0.315047 utime_end=9.257617 stime_start=7.041489 stime_end=1923.688387 num_transactions=3057968 latency_min=0.003041375 latency_max=7.056589232 latency_mean=0.141075048 # Better latency metrics latency_stddev=0.526900516 num_samples=312996 throughput=320677.21 # 111 % increase, and 229 % for the series perf top: inet6_ehashfn is no longer seen. 39.67% [kernel] [k] __inet_hash_connect 37.06% [kernel] [k] __inet6_check_established 4.79% [kernel] [k] rcu_all_qs 3.82% [kernel] [k] __cond_resched 1.76% [kernel] [k] sched_balance_domains 0.82% [kernel] [k] _raw_spin_lock 0.81% [kernel] [k] sched_balance_rq 0.81% [kernel] [k] sched_balance_trigger 0.76% [kernel] [k] queued_spin_lock_slowpath Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250305034550.879255-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge branch 'sched/urgent' into sched/core, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: net/ethtool/cabletest.c 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") 637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device") No Adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06eth: bnxt: remove most dependencies on RTNLStanislav Fomichev
Only devlink and sriov paths are grabbing rtnl explicitly. The rest is covered by netdev instance lock which the core now grabs, so there is no need to manage rtnl in most places anymore. On the core side we can now try to drop rtnl in some places (do_setlink for example) for the drivers that signal non-rtnl mode (TBD). Boot-tested and with `ethtool -L eth1 combined 24` to trigger reset. Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-15-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06docs: net: document new locking realityStanislav Fomichev
Also clarify ndo_get_stats (that read and write paths can run concurrently) and mention only RCU. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: add option to request netdev instance lockStanislav Fomichev
Currently only the drivers that implement shaper or queue APIs are grabbing instance lock. Add an explicit opt-in for the drivers that want to grab the lock without implementing the above APIs. There is a 3-byte hole after @up, use it: /* --- cacheline 47 boundary (3008 bytes) --- */ u32 napi_defer_hard_irqs; /* 3008 4 */ bool up; /* 3012 1 */ /* XXX 3 bytes hole, try to pack */ struct mutex lock; /* 3016 144 */ /* XXX last struct has 1 hole */ Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-13-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: replace dev_addr_sem with netdev instance lockStanislav Fomichev
Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_bpfStanislav Fomichev
Cover the paths that come via bpf system call and XSK bind. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-10-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during sysfs operationsStanislav Fomichev
Most of them are already covered by the converted dev_xxx APIs. Add the locking wrappers for the remaining ones. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ioctl operationsStanislav Fomichev
Convert all ndo_eth_ioctl invocations to dev_eth_ioctl which does the locking. Reflow some of the dev_siocxxx to drop else clause. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during rtnetlink operationsStanislav Fomichev
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during queue operationsStanislav Fomichev
For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during nft ndo_setup_tcStanislav Fomichev
Introduce new dev_setup_tc for nft ndo_setup_tc paths. Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_open/ndo_stopStanislav Fomichev
For the drivers that use shaper API, switch to the mode where core stack holds the netdev lock. This affects two drivers: * iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly remove these * netdevsim - switch to _locked APIs to avoid deadlock iavf_close diff is a bit confusing, the existing call looks like this: iavf_close() { netdev_lock() .. netdev_unlock() wait_event_timeout(down_waitqueue) } I change it to the following: netdev_lock() iavf_close() { .. netdev_unlock() wait_event_timeout(down_waitqueue) netdev_lock() // reusing this lock call } netdev_unlock() Since I'm reusing existing netdev_lock call, so it looks like I only add netdev_unlock. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge tag 'amd-pstate-v6.15-2025-03-06' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge amd-pstate updates for 6.15 (3/6/25) from Mario Limonciello: "A lot of code optimization to avoid cases where call paths will end up calling the same writes multiple times and needlessly caching variables. To accomplish this some of the writes are now made into an atomically written "perf" variable. Locking has been overhauled to ensure it only applies to the necessary functions. Tracing has been adjusted to ensure trace events only are used right before writing out to the hardware." NOTE: This is a redo of amd-pstate-v6.15-2025-03-03 with a fixed Fixes tag. * tag 'amd-pstate-v6.15-2025-03-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: (29 commits) cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline() cpufreq/amd-pstate: Stop caching EPP cpufreq/amd-pstate: Rework CPPC enabling cpufreq/amd-pstate: Drop debug statements for policy setting cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions cpufreq/amd-pstate: Cache CPPC request in shared mem case too cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks cpufreq/amd-pstate-ut: Adjust variable scope cpufreq/amd-pstate-ut: Run on all of the correct CPUs cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same cpufreq/amd-pstate-ut: Use _free macro to free put policy cpufreq/amd-pstate: Drop `cppc_cap1_cached` cpufreq/amd-pstate: Overhaul locking cpufreq/amd-pstate: Move perf values into a union cpufreq/amd-pstate: Drop min and max cached frequencies cpufreq/amd-pstate: Show a warning when a CPU fails to setup cpufreq/amd-pstate: Invalidate cppc_req_cached during suspend cpufreq/amd-pstate: Fix the clamping of perf values ...
2025-03-06PM: EM: Consify two parameters of em_dev_register_perf_domain()Rafael J. Wysocki
Notice that em_dev_register_perf_domain() and the functions called by it do not update objects pointed to by its cb and cpus parameters, so the const modifier can be added to them. This allows the return value of cpumask_of() or a pointer to a struct em_data_callback declared as const to be passed to em_dev_register_perf_domain() directly without explicit type casting which is rather handy. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/4648962.LvFx2qVVIh@rjwysocki.net
2025-03-06drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRRORMatthew Brost
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag, which indicates whether the device supports CPU address mirroring. The intent is for UMDs to use this query to determine if a VM can be set up with CPU address mirroring. This flag is implemented by checking if the device supports GPU faults. v7: - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com
2025-03-06drm/gpuvm: Add DRM_GPUVA_OP_DRIVERMatthew Brost
Add DRM_GPUVA_OP_DRIVER which allows driver to define their own gpuvm ops. Useful for driver created ops which can be passed into the bind software pipeline. v3: - s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas) - Better commit message (Thomas) Cc: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-14-matthew.brost@intel.com
2025-03-06drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRRORMatthew Brost
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU page tables. These VMAs are referred to as CPU address mirror VMAs. The idea is that upon a page fault or prefetch, the memory backing and GPU page tables will be populated. CPU address mirror VMAs only update GPUVM state; they do not have an internal page table (PT) state, nor do they have GPU mappings. It is expected that CPU address mirror VMAs will be mixed with buffer object (BO) VMAs within a single VM. In other words, system allocations and runtime allocations can be mixed within a single user-mode driver (UMD) program. Expected usage: - Bind the entire virtual address (VA) space upon program load using the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - If a buffer object (BO) requires GPU mapping (runtime allocation), allocate a CPU address using mmap(PROT_NONE), bind the BO to the mmapped address using existing bind IOCTLs. If a CPU map of the BO is needed, mmap it again to the same CPU address using mmap(MAP_FIXED) - If a BO no longer requires GPU mapping, munmap it from the CPU address space and them bind the mapping address with the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - Any malloc'd or mmapped CPU address accessed by the GPU will be faulted in via the SVM implementation (system allocation). - Upon freeing any mmapped or malloc'd data, the SVM implementation will remove GPU mappings. Only supporting 1 to 1 mapping between user address space and GPU address space at the moment as that is the expected use case. uAPI defines interface for non 1 to 1 but enforces 1 to 1, this restriction can be lifted if use cases arrise for non 1 to 1 mappings. This patch essentially short-circuits the code in the existing VM bind paths to avoid populating page tables when the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set. v3: - Call vm_bind_ioctl_ops_fini on -ENODATA - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas) - Rework commit message for expected usage (Thomas) - Describe state of code after patch in commit message (Thomas) v4: - Fix alignment (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
2025-03-06drm/gpusvm: Add support for GPU Shared Virtual MemoryMatthew Brost
This patch introduces support for GPU Shared Virtual Memory (SVM) in the Direct Rendering Manager (DRM) subsystem. SVM allows for seamless sharing of memory between the CPU and GPU, enhancing performance and flexibility in GPU computing tasks. The patch adds the necessary infrastructure for SVM, including data structures and functions for managing SVM ranges and notifiers. It also provides mechanisms for allocating, deallocating, and migrating memory regions between system RAM and GPU VRAM. This is largely inspired by GPUVM. v2: - Take order into account in check pages - Clear range->pages in get pages error - Drop setting dirty or accessed bit in get pages (Vetter) - Remove mmap assert for cpu faults - Drop mmap write lock abuse (Vetter, Christian) - Decouple zdd from range (Vetter, Oak) - Add drm_gpusvm_range_evict, make it work with coherent pages - Export drm_gpusvm_evict_to_sram, only use in BO evict path (Vetter) - mmget/put in drm_gpusvm_evict_to_sram - Drop range->vram_alloation variable - Don't return in drm_gpusvm_evict_to_sram until all pages detached - Don't warn on mixing sram and device pages - Update kernel doc - Add coherent page support to get pages - Use DMA_FROM_DEVICE rather than DMA_BIDIRECTIONAL - Add struct drm_gpusvm_vram and ops (Thomas) - Update the range's seqno if the range is valid (Thomas) - Remove the is_unmapped check before hmm_range_fault (Thomas) - Use drm_pagemap (Thomas) - Drop kfree_mapping (Thomas) - dma mapp pages under notifier lock (Thomas) - Remove ctx.prefault - Remove ctx.mmap_locked - Add ctx.check_pages - s/vram/devmem (Thomas) v3: - Fix memory leak drm_gpusvm_range_get_pages - Only migrate pages with same zdd on CPU fault - Loop over al VMAs in drm_gpusvm_range_evict - Make GPUSVM a drm level module - GPL or MIT license - Update main kernel doc (Thomas) - Prefer foo() vs foo for functions in kernel doc (Thomas) - Prefer functions over macros (Thomas) - Use unsigned long vs u64 for addresses (Thomas) - Use standard interval_tree (Thomas) - s/drm_gpusvm_migration_put_page/drm_gpusvm_migration_unlock_put_page (Thomas) - Drop err_out label in drm_gpusvm_range_find_or_insert (Thomas) - Fix kernel doc in drm_gpusvm_range_free_pages (Thomas) - Newlines between functions defs in header file (Thomas) - Drop shall language in driver vfunc kernel doc (Thomas) - Move some static inlines from head to C file (Thomas) - Don't allocate pages under page lock in drm_gpusvm_migrate_populate_ram_pfn (Thomas) - Change check_pages to a threshold v4: - Fix NULL ptr deref in drm_gpusvm_migrate_populate_ram_pfn (Thomas, Himal) - Fix check pages threshold - Check for range being unmapped under notifier lock in get pages (Testing) - Fix characters per line - Drop WRITE_ONCE for zdd->devmem_allocation assignment (Thomas) - Use completion for devmem_allocation->detached (Thomas) - Make GPU SVM depend on ZONE_DEVICE (CI) - Use hmm_range_fault for eviction (Thomas) - Drop zdd worker (Thomas) v5: - Select Kconfig deps (CI) - Set device to NULL in __drm_gpusvm_migrate_to_ram (Matt Auld, G.G.) - Drop Thomas's SoB (Thomas) - Add drm_gpusvm_range_start/end/size helpers (Thomas) - Add drm_gpusvm_notifier_start/end/size helpers (Thomas) - Absorb drm_pagemap name changes (Thomas) - Fix driver lockdep assert (Thomas) - Move driver lockdep assert to static function (Thomas) - Assert mmap lock held in drm_gpusvm_migrate_to_devmem (Thomas) - Do not retry forever on eviction (Thomas) v6: - Fix drm_gpusvm_get_devmem_page alignment (Checkpatch) - Modify Kconfig (CI) - Compile out lockdep asserts (CI) v7: - Add kernel doc for flags fields (CI, Auld) Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-7-matthew.brost@intel.com
2025-03-06drm/pagemap: Add DRM pagemapThomas Hellström
Introduce drm_pagemap ops to map and unmap dma to VRAM resources. In the local memory case it's a matter of merely providing an offset into the device's physical address. For future p2p the map and unmap functions may encode as needed. Similar to how dma-buf works, let the memory provider (drm_pagemap) provide the mapping functionality. v3: - Move to drm level include v4: - Fix kernel doc (G.G.) v5: - s/map_dma/device_map (Thomas) - s/unmap_dma/device_unmap (Thomas) v7: - Fix kernel doc (CI, Auld) - Drop P2P define (Thomas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-5-matthew.brost@intel.com
2025-03-06mm/migrate: Add migrate_device_pfnsMatthew Brost
Add migrate_device_pfns which prepares an array of pre-populated device pages for migration. This is needed for eviction of known set of non-contiguous devices pages to cpu pages which is a common case for SVM in DRM drivers using TTM. v2: - s/migrate_device_vma_range/migrate_device_prepopulated_range - Drop extra mmu invalidation (Vetter) v3: - s/migrate_device_prepopulated_range/migrate_device_pfns (Alistar) - Use helper to lock device pages (Alistar) - Update commit message with why this is required (Alistar) Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-3-matthew.brost@intel.com