summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-04-23bpf: add support for bpf_wq user typeBenjamin Tissoires
Mostly a copy/paste from the bpf_timer API, without the initialization and free, as they will be done in a separate patch. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-5-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24regulator: change devm_regulator_get_enable_optional() stub to return OkMatti Vaittinen
The devm_regulator_get_enable_optional() should be a 'call and forget' API, meaning, when it is used to enable the regulators, the API does not provide a handle to do any further control of the regulators. It gives no real benefit to return an error from the stub if CONFIG_REGULATOR is not set. On the contrary, returning an error is causing problems to drivers when hardware is such it works out just fine with no regulator control. Returning an error forces drivers to specifically handle the case where CONFIG_REGULATOR is not set, making the mere existence of the stub questionalble. Change the stub implementation for the devm_regulator_get_enable_optional() to return Ok so drivers do not separately handle the case where the CONFIG_REGULATOR is not set. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Fixes: da279e6965b3 ("regulator: Add devm helpers for get and enable") Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/ZiedtOE00Zozd3XO@fedora Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-24Merge tag 'drm-xe-next-2024-04-23' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Remove unused flags (Francois Dugast) - Extend uAPI to query HuC micro-controler firmware version (Francois Dugast) - drm/xe/uapi: Define topology types as indexes rather than masks (Francois Dugast) - drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE (Francois Dugast) - devcoredump updates. Some touching the output format. (José Roberto de Souza, Matthew Brost) - drm/xe/hwmon: Add infra to support card power and energy attributes - Improve LRC, HWSP and HWCTX error capture. (Maarten Lankhorst) - drm/xe/uapi: Add IP version and stepping to GT list query (Matt roper) - Invalidate userptr VMA on page pin fault (Matthew Brost) - Improve xe_bo_move tracepoint (Priyanka Danamudi) - Align fence output format in ftrace log Cross-driver Changes: - drm/i915/hwmon: Get rid of devm (Ashutosh Dixit) (Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>) - drm/i915/display: convert inner wakeref get towards get_if_in_use (SOB Rodrigo Vivi) - drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref (Committer, SOB Jani Nikula) Driver Changes: - Fix for unneeded CCS metadata allocation (Akshata Jahagirdar) - Fix for fix multicast support for Xe_LP platforms (Andrzej Hajda) - A couple of build fixes (Arnd Bergmann) - Fix register definition (Ashutosh Dixit) - Add BMG mocs table (Balasubramani Vivekanandan) - Replace sprintf() across driver (Bommu Krishnaiah) - Add an xe2 workaround (Bommu Krishnaiah) - Makefile fix (Dafna Hirschfeld) - force_wake_get error value check (Daniele Ceraolo Spurio) - Handle GSCCS ER interrupt (Daniele Ceraolo Spurio) - GSC Workaround (Daniele Ceraolo Spurio) - Build error fix (Dawei Li) - drm/xe/gt: Add L3 bank mask to GT topology (Francois Dugast) - Implement xe2- and GuC workarounds (Gustavo Sousa, Haridhar Kalvala, Himal rasad Ghimiray, John Harrison, Matt Roper, Radhakrishna Sripada, Vinay Belgaumkar, Badal Nilawar) - xe2hpg compression (Himal Ghimiray Prasad) - Error code cleanups and fixes (Himal Prasad Ghimiray) - struct xe_device cleanup (Jani Nikula) - Avoid validating bos when only requesting an exec dma-fence (José Roberto de Souza) - Remove debug message from migrate_clear (José Roberto de Souza) - Nuke EXEC_QUEUE_FLAG_PERSISTENT leftover internal flag (José Roberto de Souza) - Mark dpt and related vma as uncached (Juha-Pekka Heikkila) - Hwmon updates (Karthik Poosa) - KConfig fix when ACPI_WMI selcted (Lu Yao) - Update intel_uncore_read*() return types (Luca Coelho) - Mocs updates (Lucas De Marchi, Matt Roper) - Drop dynamic load-balancing workaround (Lucas De Marchi) - Fix a PVC workaround (Lucas De Marchi) - Group live kunit tests into a single module (Lucas De Marchi) - Various code cleanups (Lucas De Marchi) - Fix a ggtt init error patch and move ggtt invalidate out of ggtt lock (Maarten Lankhorst) - Fix a bo leak (Marten Lankhorst) - Add LRC parsing for more GPU instructions (Matt Roper) - Add various definitions for hardware and IP (Matt Roper) - Define all possible engines in media IP descriptors (Matt Roper) - Various cleanups, asserts and code fixes (Matthew Auld) - Various cleanups and code fixes (Matthew Brost) - Increase VM_BIND number of per-ioctl Ops (Matthew Brost, Paulo Zanoni) - Don't support execlists in xe_gt_tlb_invalidation layer (Matthew Brost) - Handle timing out of already signaled jobs gracefully (Matthew Brost) - Pipeline evict / restore of pinned BOs during suspend / resume (Matthew Brost) - Do not grab forcewakes when issuing GGTT TLB invalidation via GuC (Matthew Brost) - Drop ggtt invalidate from display code (Matthew Brost) - drm/xe: Add XE_BO_GGTT_INVALIDATE flag (Matthew Brost) - Add debug messages for MMU notifier and VMA invalidate (Matthew Brost) - Use ordered wq for preempt fence waiting (Matthew Brost) - Initial development for SR-IOV support including some refactoring (Michal Wajdeczko) - Various GuC- and GT- related cleanups and fixes (Michal Wajdeczko) - Move userptr over to start using hmm_range_fault (Oak Zeng) - Add new PCI IDs to DG2 platform (Ravi Kumar Vodapalli) - Pcode - and VRAM initialization check update (Riana Tauro) - Large PM update including i915 display patches, and a fix for one of those. (Rodrigo Vivi) - Introduce performance tuning changes for Xe2_HPG (Shekhar Chauhan) - GSC / HDCP updates (Suraj Kandpal) - Minor code cleanup (Tejas Upadhyay) - Rework / fix rebind TLB flushing and move rebind into the drm_exec locking loop (Thomas Hellström) - Backmerge (Thomas Hellström) - GuC updates and fixes (Vinay Belgaumkar, Zhanjun Dong) Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCZiestQAKCRC4FpNVCsYG # v8dLAQCDFUR7R5rwSdfqzNy+Djg+9ZgmtzVEfHZ+rI2lTReaCwEAhWeK7UooIMV0 # vGsSdsqGsJQm4VLRzE6H1yemCCQOBgM= # =HouD # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Apr 2024 22:42:29 AEST # gpg: using EDDSA key 6C91433BC35A06E6BC762193B81693550AC606BF # gpg: Can't check signature: No public key # Conflicts: # drivers/gpu/drm/xe/xe_device_types.h # drivers/gpu/drm/xe/xe_vm.c # drivers/gpu/drm/xe/xe_vm_types.h From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zievlb1wvqDg1ovi@fedora
2024-04-23genirq/msi: Add MSI allocation helper and export MSI functionsNipun Gupta
MSI functions for allocation and free can be directly used by the device drivers without any wrapper provided by bus drivers. So export these MSI functions. Also, add a wrapper API to allocate MSIs providing only the number of interrupts rather than range for simpler driver usage. Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240423111021.1686144-1-nipun.gupta@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-04-23block: use a per disk workqueue for zone write pluggingDamien Le Moal
A zone write plug BIO work function blk_zone_wplug_bio_work() calls submit_bio_noacct_nocheck() to execute the next unplugged BIO. This function may block. So executing zone plugs BIO works using the block layer global kblockd workqueue can potentially lead to preformance or latency issues as the number of concurrent work for a workqueue is limited to WQ_DFL_ACTIVE (256). 1) For a system with a large number of zoned disks, issuing write requests to otherwise unused zones may be delayed wiating for a work thread to become available. 2) Requeue operations which use kblockd but are independent of zone write plugging may alsoi end up being delayed. To avoid these potential performance issues, create a workqueue per zoned device to execute zone plugs BIO work. The workqueue max active parameter is set to the maximum number of zone write plugs allocated with the zone write plug mempool. This limit is equal to the maximum number of open zones of the disk and defaults to 128 for disks that do not have a limit on the number of open zones. Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240420075811.1276893-3-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-23Merge 6.9-rc5 into usb-nextGreg Kroah-Hartman
We need the usb/thunderbolt fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23Merge 6.9-rc5 into driver-core-nextGreg Kroah-Hartman
We want the kernfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23Merge 6.9-rc5 into tty-nextGreg Kroah-Hartman
We want the tty fixes in here as well, and it resolves a merge conflict in: drivers/tty/serial/serial_core.c as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23Merge 6.9-rc5 into char-misc-nextGreg Kroah-Hartman
We need the char/misc fixes in here as well to work off of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23soc: mediatek: mtk-cmdq: Add cmdq_pkt_acquire_event() functionJason-JH.Lin
Add cmdq_pkt_acquire_event() function to support CMDQ user making an instruction for acquiring event. CMDQ users can use cmdq_pkt_acquire_event() as `mutex_lock` and cmdq_pkt_clear_event() as `mutex_unlock` to protect the global resource modified instructions between them. cmdq_pkt_acquire_event() would wait for event to be cleared. After event is cleared by cmdq_pkt_clear_event() in other GCE threads, cmdq_pkt_acquire_event() would set event and keep executing next instruction. So the mutex would work like this: cmdq_pkt_acquire_event() /* mutex lock */ /* critical secton instructions that modified global resource */ cmdq_pkt_clear_event() /* mutex unlock */ Prevent the critical section instructions from being affected by other GCE threads. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240307013458.23550-5-jason-jh.lin@mediatek.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: mtk-cmdq: Add cmdq_pkt_poll_addr() functionJason-JH.Lin
Add cmdq_pkt_poll_addr function to support CMDQ user making an instruction for polling a specific address of hardware rigster to check the value with or without mask. POLL is a legacy operation in GCE, so it does not support SPR and CMDQ_CODE_LOGIC. To support polling the register address which doesn't have the subsys id, CMDQ users need to make an instruction with GPR and CMDQ_CODE_MASK operation to move the register address to be poll into GPR. Then users can make an POLL instruction with GPR to poll the register address assigned in previous instruction. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240307013458.23550-4-jason-jh.lin@mediatek.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: mtk-cmdq: Add cmdq_pkt_mem_move() functionJason-JH.Lin
Add cmdq_pkt_mem_move() function to support CMDQ user making an instruction for moving a value from a source address to a destination address. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240307013458.23550-3-jason-jh.lin@mediatek.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: mtk-cmdq: Add specific purpose register definitions for GCEJason-JH.Lin
Add specific purpose register definitions for GCE, so CMDQ users can use them as a buffer to store data. Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240307013458.23550-2-jason-jh.lin@mediatek.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Refine cmdq_pkt_create() and cmdq_pkt_destroy()Chun-Kuang Hu
cmdq_pkt_create() and cmdq_pkt_destroy() is not suitable for client drivers so each client driver has implement its own function. This refinement would pass struct cmdq_pkt pointer into cmdq_pkt_create(). In addition, client driver has the struct cmdq_client information, so it's not necessary to store this information in struct cmdq_pkt. After this refinement, client drivers could use these helper funciton instead of implementing its own version. Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://lore.kernel.org/r/20240222154120.16959-8-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Remove cmdq_pkt_flush_async() helper functionChun-Kuang Hu
cmdq_pkt_flush_async() is not used by all client drivers (MediaTek drm driver and MediaTek mdp3 driver), so remove it. Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://lore.kernel.org/r/20240222154120.16959-7-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Add cmdq_pkt_eoc() helper functionChun-Kuang Hu
cmdq_pkt_eoc() append eoc command to CMDQ packet. eoc command would ask GCE to generate IRQ. It's usually appended to the end of packet to notify all command in the packet is done. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://lore.kernel.org/r/20240222154120.16959-6-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Add cmdq_pkt_jump_rel() helper functionChun-Kuang Hu
cmdq_pkt_jump_rel() append relative jump command to the packet. Relative jump change PC to the target address with offset from current PC. Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240222154120.16959-5-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Rename cmdq_pkt_jump() to cmdq_pkt_jump_abs()Chun-Kuang Hu
In order to distinguish absolute jump and relative jump, cmdq_pkt_jump() append absolute jump command, so rename it to cmdq_pkt_jump_abs(). Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240222154120.16959-4-chunkuang.hu@kernel.org [Angelo: Added temporary wrapper to avoid build breakage] Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23soc: mediatek: cmdq: Add parameter shift_pa to cmdq_pkt_jump()Chun-Kuang Hu
In original design, cmdq_pkt_jump() call cmdq_get_shift_pa() every time to get shift_pa. But the shift_pa is constant value for each SoC, so client driver just need to call cmdq_get_shift_pa() once and pass shift_pa to cmdq_pkt_jump() to prevent frequent function call. Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://lore.kernel.org/r/20240222154120.16959-3-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-04-23fpga: region: add owner module and take its refcountMarco Pagani
The current implementation of the fpga region assumes that the low-level module registers a driver for the parent device and uses its owner pointer to take the module's refcount. This approach is problematic since it can lead to a null pointer dereference while attempting to get the region during programming if the parent device does not have a driver. To address this problem, add a module owner pointer to the fpga_region struct and use it to take the module's refcount. Modify the functions for registering a region to take an additional owner module parameter and rename them to avoid conflicts. Use the old function names for helper macros that automatically set the module that registers the region as the owner. This ensures compatibility with existing low-level control modules and reduces the chances of registering a region without setting the owner. Also, update the documentation to keep it consistent with the new interface for registering an fpga region. Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA") Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Xu Yilun <yilun.xu@intel.com> Reviewed-by: Russ Weight <russ.weight@linux.dev> Signed-off-by: Marco Pagani <marpagan@redhat.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20240419083601.77403-1-marpagan@redhat.com Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
2024-04-23crash: add a new kexec flag for hotplug supportSourabh Jain
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call. However, it is possible that architectures may need to update kexec segments other then elfcorehdr. For example, FDT (Flatten Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call. Now we have two kexec flags that enables crash hotplug support for kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures). To simplify the process of finding and reporting the crash hotplug support the following changes are introduced. 1. Define arch specific function to process the kexec flags and determine crash hotplug support 2. Rename the @update_elfcorehdr member of struct kimage to @hotplug_support and populate it for both kexec_load and kexec_file_load syscalls, because architecture can update more than one kexec segment 3. Let generic function crash_check_hotplug_support report hotplug support for loaded kdump image based on value of @hotplug_support To bring the x86 crash hotplug support in line with the above points, the following changes have been made: - Introduce the arch_crash_hotplug_support function to process kexec flags and determine crash hotplug support - Remove the arch_crash_hotplug_[cpu|memory]_support functions Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-3-sourabhjain@linux.ibm.com
2024-04-23crash: forward memory_notify arg to arch crash hotplug handlerSourabh Jain
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-2-sourabhjain@linux.ibm.com
2024-04-23regulator: change stubbed devm_regulator_get_enable to return OkMatti Vaittinen
The devm_regulator_get_enable() should be a 'call and forget' API, meaning, when it is used to enable the regulators, the API does not provide a handle to do any further control of the regulators. It gives no real benefit to return an error from the stub if CONFIG_REGULATOR is not set. On the contrary, returning and error is causing problems to drivers when hardware is such it works out just fine with no regulator control. Returning an error forces drivers to specifically handle the case where CONFIG_REGULATOR is not set, making the mere existence of the stub questionalble. Furthermore, the stub of the regulator_enable() seems to be returning Ok. Change the stub implementation for the devm_regulator_get_enable() to return Ok so drivers do not separately handle the case where the CONFIG_REGULATOR is not set. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Reported-by: Aleksander Mazur <deweloper@wp.pl> Suggested-by: Guenter Roeck <linux@roeck-us.net> Fixes: da279e6965b3 ("regulator: Add devm helpers for get and enable") Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/ZiYF6d1V1vSPcsJS@drtxq0yyyyyyyyyyyyyby-3.rev.dnainternet.fi Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-22Merge branch 'for-uring-ubufops' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into for-6.10/io_uring Merge net changes required for the upcoming send zerocopy improvements. * 'for-uring-ubufops' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux: net: add callback for setting a ubuf_info to skb net: extend ubuf_info callback to ops structure Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-22Merge branch 'for-uring-ubufops' into HEADJakub Kicinski
Pavel Begunkov says: ==================== implement io_uring notification (ubuf_info) stacking (net part) To have per request buffer notifications each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, it may force the stack to create many small skbs hurting performance in many ways. The patchset implements notification, i.e. an io_uring's ubuf_info extension, stacking. It attempts to link ubuf_info's into a list, allowing to have multiple of them per skb. liburing/examples/send-zerocopy shows up 6 times performance improvement for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without the patchset it requires much larger sends to utilise all potential. bytes | before | after (Kqps) 1200 | 195 | 1023 4000 | 193 | 1386 8000 | 154 | 1058 ==================== Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23counter: Don't use "proxy" headersAndy Shevchenko
Update header inclusions to follow IWYU (Include What You Use) principle. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240422144850.2031076-1-andriy.shevchenko@linux.intel.com Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2024-04-22net: add callback for setting a ubuf_info to skbPavel Begunkov
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning ubuf_info to skb, this way we will implement smarter assignment later like linking ubuf_info together. Note, it's an optional callback, which should be compatible with skb_zcopy_set(), that's because the net stack might potentially decide to clone an skb and take another reference to ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without prior ubuf_info, otherwise we could end up in a situation when the send would not be able to progress. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: extend ubuf_info callback to ops structurePavel Begunkov
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Support updating coalescing configuration without resetting channelsRahul Rameshbabu
When CQE mode or DIM state is changed, gracefully reconfigure channels to handle new configuration. Previously, would create new channels that would reflect the changes rather than update the original channels. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Use DIM constants for CQ period mode parameterRahul Rameshbabu
Use core DIM CQ period mode enum values for the CQ parameter for the period mode. Translate the value to the specific mlx5 device constant for the selected period mode when creating a CQ. Avoid needing to translate mlx5 device constants to DIM constants for core DIM functionality. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22s390: Stop using weak symbols for __iowrite64_copy()Jason Gunthorpe
Complete switching the __iowriteXX_copy() routines over to use #define and arch provided inline/macro functions instead of weak symbols. S390 has an implementation that simply calls another memcpy function. Inline this so the callers don't have to do two jumps. Link: https://lore.kernel.org/r/3-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22x86: Stop using weak symbols for __iowrite32_copy()Jason Gunthorpe
Start switching iomap_copy routines over to use #define and arch provided inline/macro functions instead of weak symbols. Inline functions allow more compiler optimization and this is often a driver hot path. x86 has the only weak implementation for __iowrite32_copy(), so replace it with a static inline containing the same single instruction inline assembly. The compiler will generate the "mov edx,ecx" in a more optimal way. Remove iomap_copy_64.S Link: https://lore.kernel.org/r/1-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22workqueue: Use "@..." in function comment to describe variable length argumentTejun Heo
Previously, it was using "remaining args" without leading "@" which isn't valid. Let's follow snprintf()'s example and use "@...". Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2024-04-22Merge tag 'nfsd-6.9-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix an NFS/RDMA performance regression in v6.9-rc * tag 'nfsd-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"
2024-04-22io_uring/kbuf: add helpers for getting/peeking multiple buffersJens Axboe
Our provided buffer interface only allows selection of a single buffer. Add an API that allows getting/peeking multiple buffers at the same time. This is only implemented for the ring provided buffers. It could be added for the legacy provided buffers as well, but since it's strongly encouraged to use the new interface, let's keep it simpler and just provide it for the new API. The legacy interface will always just select a single buffer. There are two new main functions: io_buffers_select(), which selects up as many buffers as it can. The caller supplies the iovec array, and io_buffers_select() may allocate a bigger array if the 'out_len' being passed in is non-zero and bigger than what fits in the provided iovec. Buffers grabbed with this helper are permanently assigned. io_buffers_peek(), which works like io_buffers_select(), except they can be recycled, if needed. Callers using either of these functions should call io_put_kbufs() rather than io_put_kbuf() at completion time. The peek interface must be called with the ctx locked from peek to completion. This add a bit state for the request: - REQ_F_BUFFERS_COMMIT, which means that the the buffers have been peeked and should be committed to the buffer ring head when they are put as part of completion. Prior to this, req->buf_list was cleared to NULL when committed. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-22timerqueue: Remove never used function timerqueue_node_expires()Anna-Maria Behnsen
This function was introduced with commit 60bda037f1dd ("posix-cpu-timers: Utilize timerqueue for storage") but never used. Remove it. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240417140229.19633-1-anna-maria@linutronix.de
2024-04-22sysctl: treewide: constify ctl_table_header::ctl_table_argThomas Weißschuh
To be able to constify instances of struct ctl_tables it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header. Constify this reference as a prerequisite for the full constification of struct ctl_table instances. No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22Backmerge tag 'v6.9-rc5' into drm-nextDave Airlie
Linux 6.9-rc5 I've had a persistent msm failure on clang, and the fix is in fixes so just pull it back to fix that. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-04-21Merge tag 'char-misc-6.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver fixes from Greg KH: "Here are some small char/misc and other driver fixes for 6.9-rc5. Included in here are the following: - binder driver fix for reported problem - speakup crash fix - mei driver fixes for reported problems - comdei driver fix - interconnect driver fixes - rtsx driver fix - peci.h kernel doc fix All of these have been in linux-next for over a week with no reported problems" * tag 'char-misc-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: peci: linux/peci.h: fix Excess kernel-doc description warning binder: check offset alignment in binder_get_object() comedi: vmk80xx: fix incomplete endpoint checking mei: vsc: Unregister interrupt handler for system suspend Revert "mei: vsc: Call wake_up() in the threaded IRQ handler" misc: rtsx: Fix rts5264 driver status incorrect when card removed mei: me: disable RPL-S on SPS and IGN firmwares speakup: Avoid crash on very long word interconnect: Don't access req_list while it's being manipulated interconnect: qcom: x1e80100: Remove inexistent ACV_PERF BCM
2024-04-20Merge tag 'block-6.9-20240420' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Just two minor fixes that should go into the 6.9 kernel release, one fixing a regression with partition scanning errors, and one fixing a WARN_ON() that can get triggered if we race with a timer" * tag 'block-6.9-20240420' of git://git.kernel.dk/linux: blk-iocost: do not WARN if iocg was already offlined block: propagate partition scanning errors to the BLKRRPART ioctl
2024-04-20Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A couple clk driver fixes, a build fix, and a deadlock fix: - Mediatek mt7988 has broken PCIe because the wrong parent is used - Mediatek clk drivers may deadlock when registering their clks because the clk provider device is repeatedly runtime PM resumed and suspended during probe and clk registration. Resuming the clk provider device deadlocks with an ABBA deadlock due to genpd_lock and the clk prepare_lock. The fix is to keep the device runtime resumed while registering clks. - Another runtime PM related deadlock, this time with disabling unused clks during late init. We get an ABBA deadlock where a device is runtime PM resuming (or suspending) while the disabling of unused clks is happening in parallel. That runtime PM action calls into the clk framework and tries to grab the clk prepare_lock while the disabling of unused clks holds the prepare_lock and is waiting for that runtime PM action to complete. The fix is to runtime resume all the clk provider devices before grabbing the clk prepare_lock during disable unused. - A build fix to provide an empty devm_clk_rate_exclusive_get() function when CONFIG_COMMON_CLK=n" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: mediatek: mt7988-infracfg: fix clocks for 2nd PCIe port clk: mediatek: Do a runtime PM get on controllers during probe clk: Get runtime PM before walking tree for clk_summary clk: Get runtime PM before walking tree during disable_unused clk: Initialize struct clk_core kref earlier clk: Don't hold prepare_lock when calling kref_put() clk: Remove prepare_lock hold assertion in __clk_release() clk: Provide !COMMON_CLK dummy for devm_clk_rate_exclusive_get()
2024-04-20Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"Chuck Lever
Performance regression reported with NFS/RDMA using Omnipath, bisected to commit e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain"). Tracing on the server reports: nfsd-7771 [060] 1758.891809: svcrdma_sq_post_err: cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12 sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is larger than rdma->sc_sq_depth (851). The number of available Send Queue entries is always supposed to be smaller than the Send Queue depth. That seems like a Send Queue accounting bug in svcrdma. As it's getting to be late in the 6.9-rc cycle, revert this commit. It can be revisited in a subsequent kernel release. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743 Fixes: e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-20iio: backend: add new functionalityNuno Sa
This adds the needed backend ops for supporting a backend inerfacing with an high speed dac. The new ops are: * data_source_set(); * set_sampling_freq(); * extend_chan_spec(); * ext_info_set(); * ext_info_get(). Also to note the new helpers that are meant to be used by the backends when extending an IIO channel (adding extended info): * iio_backend_ext_info_set(); * iio_backend_ext_info_get(). Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240419-iio-backend-axi-dac-v4-8-5ca45b4de294@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-04-20iio: buffer-dmaengine: Support specifying buffer directionPaul Cercueil
Update the devm_iio_dmaengine_buffer_setup() function to support specifying the buffer direction. Update the iio_dmaengine_buffer_submit() function to handle input buffers as well as output buffers. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com> Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240419-iio-backend-axi-dac-v4-4-5ca45b4de294@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-04-20iio: buffer-dma: Enable buffer write supportPaul Cercueil
Adding write support to the buffer-dma code is easy - the write() function basically needs to do the exact same thing as the read() function: dequeue a block, read or write the data, enqueue the block when entirely processed. Therefore, the iio_buffer_dma_read() and the new iio_buffer_dma_write() now both call a function iio_buffer_dma_io(), which will perform this task. Note that we preemptively reset block->bytes_used to the buffer's size in iio_dma_buffer_request_update(), as in the future the iio_dma_buffer_enqueue() function won't reset it. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com> Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240419-iio-backend-axi-dac-v4-3-5ca45b4de294@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-04-20iio: buffer-dma: Rename iio_dma_buffer_data_available()Paul Cercueil
Change its name to iio_dma_buffer_usage(), as this function can be used both for the .data_available and the .space_available callbacks. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240419-iio-backend-axi-dac-v4-2-5ca45b4de294@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-04-20iio: buffer-dma: add iio_dmaengine_buffer_setup()Nuno Sa
This brings the DMA buffer API more in line with what we have in the triggered buffer. There's no need of having both devm_iio_dmaengine_buffer_setup() and devm_iio_dmaengine_buffer_alloc(). Hence we introduce the new iio_dmaengine_buffer_setup() that together with devm_iio_dmaengine_buffer_setup() should be all we need. Note that as part of this change iio_dmaengine_buffer_alloc() is again static and the axi-adc was updated accordingly. Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240419-iio-backend-axi-dac-v4-1-5ca45b4de294@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-04-19Merge tag 'bootconfig-fixes-v6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - Fix potential static_command_line buffer overrun. Currently we allocate the memory for static_command_line based on "boot_command_line", but it will copy "command_line" into it. So we use the length of "command_line" instead of "boot_command_line" (as we previously did) - Use memblock_free_late() in xbc_exit() instead of memblock_free() after the buddy system is initialized - Fix a kerneldoc warning * tag 'bootconfig-fixes-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Fix the kerneldoc of _xbc_exit() bootconfig: use memblock_free_late to free xbc memory to buddy init/main.c: Fix potential static_command_line memory overflow
2024-04-19KVM: Allow page-sized MMU caches to be initialized with custom 64-bit valuesSean Christopherson
Add support to MMU caches for initializing a page with a custom 64-bit value, e.g. to pre-fill an entire page table with non-zero PTE values. The functionality will be used by x86 to support Intel's TDX, which needs to set bit 63 in all non-present PTEs in order to prevent !PRESENT page faults from getting reflected into the guest (Intel's EPT Violation #VE architecture made the less than brilliant decision of having the per-PTE behavior be opt-out instead of opt-in). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <5919f685f109a1b0ebc6bd8fc4536ee94bcc172d.1705965635.git.isaku.yamahata@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-19Merge tag 'mm-hotfixes-stable-2024-04-18-14-41' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues or aren't considered suitable for backporting. There are a significant number of fixups for this cycle's page_owner changes (series "page_owner: print stacks and their outstanding allocations"). Apart from that, singleton changes all over, mainly in MM" * tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix OOB in nilfs_set_de_type MAINTAINERS: update Naoya Horiguchi's email address fork: defer linking file vma until vma is fully initialized mm/shmem: inline shmem_is_huge() for disabled transparent hugepages mm,page_owner: defer enablement of static branch Squashfs: check the inode number is not the invalid value of zero mm,swapops: update check in is_pfn_swap_entry for hwpoison entries mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled mm/userfaultfd: allow hugetlb change protection upon poison entry mm,page_owner: fix printing of stack records mm,page_owner: fix accounting of pages when migrating mm,page_owner: fix refcount imbalance mm,page_owner: update metadata for tail pages userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly