summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-17SUNRPC: Move initialization of rq_stimeChuck Lever
Micro-optimization: Call ktime_get() only when ->xpo_recvfrom() has given us a full RPC message to process. rq_stime isn't used otherwise, so this avoids pointless work. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Optimize page release in svc_rdma_sendto()Chuck Lever
Now that we have bulk page allocation and release APIs, it's more efficient to use those than it is for nfsd threads to wait for send completions. Previous patches have eliminated the calls to wait_for_completion() and complete(), in order to avoid scheduler overhead. Now release pages-under-I/O in the send completion handler using the efficient bulk release API. I've measured a 7% reduction in cumulative CPU utilization in svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In particular, using release_pages() instead of complete() cuts the time per svc_rdma_wc_send() call by two-thirds. This helps improve scalability because svc_rdma_wc_send() is single-threaded per connection. Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Prevent page release when nothing was receivedChuck Lever
I noticed that svc_rqst_release_pages() was still unnecessarily releasing a page when svc_rdma_recvfrom() returns zero. Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17sfc: use budget for TX completionsÍñigo Huguet
When running workloads heavy unbalanced towards TX (high TX, low RX traffic), sfc driver can retain the CPU during too long times. Although in many cases this is not enough to be visible, it can affect performance and system responsiveness. A way to reproduce it is to use a debug kernel and run some parallel netperf TX tests. In some systems, this will lead to this message being logged: kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s! The reason is that sfc driver doesn't account any NAPI budget for the TX completion events work. With high-TX/low-RX traffic, this makes that the CPU is held for long time for NAPI poll. Documentations says "drivers can process completions for any number of Tx packets but should only process up to budget number of Rx packets". However, many drivers do limit the amount of TX completions that they process in a single NAPI poll. In the same way, this patch adds a limit for the TX work in sfc. With the patch applied, the watchdog warning never appears. Tested with netperf in different combinations: single process / parallel processes, TCP / UDP and different sizes of UDP messages. Repeated the tests before and after the patch, without any noticeable difference in network or CPU performance. Test hardware: Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core) Solarflare Communications XtremeScale X2522-25G Network Adapter Fixes: 5227ecccea2d ("sfc: remove tx and MCDI handling from NAPI budget consideration") Fixes: d19a53721863 ("sfc_ef100: TX path for EF100 NICs") Reported-by: Fei Liu <feliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-17irqchip/jcore-aic: Fix missing allocation of IRQ descriptorsJohn Paul Adrian Glaubitz
The initialization function for the J-Core AIC aic_irq_of_init() is currently missing the call to irq_alloc_descs() which allocates and initializes all the IRQ descriptors. Add missing function call and return the error code from irq_alloc_descs() in case the allocation fails. Fixes: 981b58f66cfc ("irqchip/jcore-aic: Add J-Core AIC driver") Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: Rob Landley <rob@landley.net> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230510163343.43090-1-glaubitz@physik.fu-berlin.de
2023-06-17irqchip/stm32-exti: Fix warning on initialized field overwrittenAntonio Borneo
While compiling with W=1, both gcc and clang complain about a tricky way to initialize an array by filling it with a non-zero value and then overrride some of the array elements. In this case the override is intentional, so just disable the specific warning for only this part of the code. Note: the flag "-Woverride-init" is recognized by both compilers, but the warning msg from clang reports "-Winitializer-overrides". The doc of clang clarifies that the two flags are synonyms, so use here only the flag name common on both compilers. Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com> Fixes: c297493336b7 ("irqchip/stm32-exti: Simplify irq description table") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230601155614.34490-1-antonio.borneo@foss.st.com
2023-06-17irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC mapMarek Vasut
The EXTI interrupt 46 is mapped to GIC interrupt 151. Add the missing mapping, which is used for IWDG2 pretimeout interrupt and wake up source. Reviewed-by: Antonio Borneo <antonio.borneo@foss.st.com> Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230517194349.105745-1-marex@denx.de
2023-06-17irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototypeArnd Bergmann
iort_pmsi_get_dev_id() has a __weak definition in the driver, and an override in arm64 specific code, but the declaration is conditional and not always seen when the copy in the driver gets built: drivers/irqchip/irq-gic-v3-its-platform-msi.c:41:12: error: no previous prototype for 'iort_pmsi_get_dev_id' [-Werror=missing-prototypes] Move the existing declaration out of the #ifdef block to ensure it can be seen in all configurations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200516.554663-5-arnd@kernel.org
2023-06-17irqchip/mxs: Include linux/irqchip/mxs.hArnd Bergmann
This header contains the definition for icoll_handle_irq(), which is used in arch/arm/mach-mxs/mach-mxs.c, without this we get a warning about a missing prototype when building with W=1. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200516.554663-4-arnd@kernel.org
2023-06-17irqchip/clps711x: Remove unused clps711x_intc_init() functionArnd Bergmann
This function has no caller or declaration any more: drivers/irqchip/irq-clps711x.c:215:13: error: no previous prototype for 'clps711x_intc_init' The #ifdef check around clps711x_intc_init_dt() is also not needed since the file is only built when that is enabled. Fixes: 4a56f46a7dc6 ("ARM: clps711x: Remove boards support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200516.554663-3-arnd@kernel.org
2023-06-17irqchip/mmp: Remove non-DT codepathArnd Bergmann
Building with "W=1" warns about missing declarations for two functions in the mmp irqchip driver: drivers/irqchip/irq-mmp.c:248:13: error: no previous prototype for 'icu_init_irq' drivers/irqchip/irq-mmp.c:271:13: error: no previous prototype for 'mmp2_init_icu' The declarations are present in an unused header, but since there is no caller, it's best to just remove the functions and the header completely, making the driver DT-only to match the state of the platform. Fixes: 77acc85ce797 ("ARM: mmp: remove device definitions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200516.554663-2-arnd@kernel.org
2023-06-17irqchip/ftintc010: Mark all function staticArnd Bergmann
Two functions were always global but never had any callers outside of this file: drivers/irqchip/irq-ftintc010.c:128:39: error: no previous prototype for 'ft010_irqchip_handle_irq' drivers/irqchip/irq-ftintc010.c:165:12: error: no previous prototype for 'ft010_of_init_irq' Fixes: b4d3053c8ce9 ("irqchip: Add a driver for Cortina Gemini") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200516.554663-1-arnd@kernel.org
2023-06-17irqdomain: Include internals.h for function prototypesArnd Bergmann
irq_domain_debugfs_init() is defined in irqdomain.c, but the declaration is in a header that is not included here: kernel/irq/irqdomain.c:1965:13: error: no previous prototype for 'irq_domain_debugfs_init' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230516200432.554240-1-arnd@kernel.org
2023-06-17Merge branch irq/loongarch-fixes-6.5 into irq/irqchip-nextMarc Zyngier
* irq/loongarch-fixes-6.5: : . : Yet another series of random fixes for the Loongson/Loongarch : string of interrupt controller, covering : : - affinity setting, : - trigger polarity, : - wake-up, : - DT support : . irqchip/loongson-eiointc: Add DT init support dt-bindings: interrupt-controller: Add Loongson EIOINTC irqchip/loongson-eiointc: Fix irq affinity setting during resume irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag irqchip/loongson-liointc: Fix IRQ trigger polarity irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment irqchip/loongson-pch-pic: Fix initialization of HT vector register Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-06-17irqchip/loongson-eiointc: Add DT init supportBinbin Zhou
Add EIOINTC irqchip DT support, which is needed for Loongson chips based on DT and supporting EIOINTC, such as the Loongson-2K0500 SOC. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/764e02d924094580ac0f1d15535f4b98308705c6.1683279769.git.zhoubinbin@loongson.cn
2023-06-17dt-bindings: interrupt-controller: Add Loongson EIOINTCBinbin Zhou
Add Loongson Extended I/O Interrupt controller binding with DT schema format using json-schema. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/4369959615eda101e612c450b8974d76ce7e8821.1683279769.git.zhoubinbin@loongson.cn
2023-06-17parisc: Delete redundant register definitions in <asm/assembly.h>Ben Hutchings
We define sp and ipsw in <asm/asmregs.h> using ".reg", and when using current binutils (snapshot 2.40.50.20230611) the definitions in <asm/assembly.h> using "=" conflict with those: arch/parisc/include/asm/assembly.h: Assembler messages: arch/parisc/include/asm/assembly.h:93: Error: symbol `sp' is already defined arch/parisc/include/asm/assembly.h:95: Error: symbol `ipsw' is already defined Delete the duplicate definitions in <asm/assembly.h>. Also delete the definition of gp, which isn't used anywhere. Signed-off-by: Ben Hutchings <benh@debian.org> Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2023-06-16Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes: - Fix an OOB issue in the Mediatek mt8365 driver where arrays of clks are mismatched in size - Use the proper clk_ops for a few clks in the Mediatek mt8365 driver - Stop using abs() in clk_composite_determine_rate() because 64-bit math goes wrong on large unsigned long numbers that are subtracted and passed into abs() - Zero initialize a struct clk_init_data in clk-loongson2 to avoid stack junk confusing clk_hw_register() - Actually use a pointer to __iomem for writel() in pxa3xx_clk_update_accr() so we don't oops" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: pxa: fix NULL pointer dereference in pxa3xx_clk_update_accr clk: clk-loongson2: Zero init clk_init_data clk: mediatek: mt8365: Fix inverted topclk operations clk: composite: Fix handling of high clock rates clk: mediatek: mt8365: Fix index issue
2023-06-16ksmbd: validate session id and tree id in the compound requestNamjae Jeon
This patch validate session id and tree id in compound request. If first operation in the compound is SMB2 ECHO request, ksmbd bypass session and tree validation. So work->sess and work->tcon could be NULL. If secound request in the compound access work->sess or tcon, It cause NULL pointer dereferecing error. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21165 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: fix out-of-bound read in smb2_writeNamjae Jeon
ksmbd_smb2_check_message doesn't validate hdr->NextCommand. If ->NextCommand is bigger than Offset + Length of smb2 write, It will allow oversized smb2 write length. It will cause OOB read in smb2_write. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21164 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: add mnt_want_write to ksmbd vfs functionsNamjae Jeon
ksmbd is doing write access using vfs helpers. There are the cases that mnt_want_write() is not called in vfs helper. This patch add missing mnt_want_write() to ksmbd vfs functions. Cc: stable@vger.kernel.org Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: validate command payload sizeNamjae Jeon
->StructureSize2 indicates command payload size. ksmbd should validate this size with rfc1002 length before accessing it. This patch remove unneeded check and add the validation for this. [ 8.912583] BUG: KASAN: slab-out-of-bounds in ksmbd_smb2_check_message+0x12a/0xc50 [ 8.913051] Read of size 2 at addr ffff88800ac7d92c by task kworker/0:0/7 ... [ 8.914967] Call Trace: [ 8.915126] <TASK> [ 8.915267] dump_stack_lvl+0x33/0x50 [ 8.915506] print_report+0xcc/0x620 [ 8.916558] kasan_report+0xae/0xe0 [ 8.917080] kasan_check_range+0x35/0x1b0 [ 8.917334] ksmbd_smb2_check_message+0x12a/0xc50 [ 8.917935] ksmbd_verify_smb_message+0xae/0xd0 [ 8.918223] handle_ksmbd_work+0x192/0x820 [ 8.918478] process_one_work+0x419/0x760 [ 8.918727] worker_thread+0x2a2/0x6f0 [ 8.919222] kthread+0x187/0x1d0 [ 8.919723] ret_from_fork+0x1f/0x30 [ 8.919954] </TASK> Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16Merge tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "A bunch of misc fixes across the board. amdgpu is the usual bulk with a revert and other fixes, nouveau has a race fix that was causing a UAF that was hard hanging systems, otherwise some qaic, bridge and radeon. amdgpu: - GFX9 preemption fixes - Add missing radeon secondary PCI ID - vblflash fixes - SMU 13 fix - VCN 4.0 fix - Re-enable TOPDOWN flag for large BAR systems to fix regression - eDP fix - PSR hang fix - DPIA fix radeon: - fbdev client warning fix qaic: - leak fix - null ptr deref fix nouveau: - use-after-free caused by fence race fix - runtime pm fix - NULL ptr checks bridge: - ti-sn65dsi86: Avoid possible buffer overflow" * tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drm: (21 commits) nouveau: fix client work fence deletion race drm/amd/display: limit DPIA link rate to HBR3 drm/amd/display: fix the system hang while disable PSR drm/amd/display: edp do not add non-edid timings Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system" drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1 drm/radeon: Disable outputs when releasing fbdev client drm/amd/pm: workaround for compute workload type on some skus drm/amd: Tighten permissions on VBIOS flashing attributes drm/amd: Make sure image is written to trigger VBIOS image update flow drm/amdgpu: add missing radeon secondary PCI ID drm/amdgpu: Implement gfx9 patch functions for resubmission drm/amdgpu: Modify indirect buffer packages for resubmission drm/amdgpu: Program gds backup address as zero if no gds allocated drm/nouveau: add nv_encoder pointer check for NULL drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled drm/nouveau/dp: check for NULL nv_connector->native_mode drm/bridge: ti-sn65dsi86: Avoid possible buffer overflow drm/nouveau: don't detect DSM for non-NVIDIA device accel/qaic: Fix NULL pointer deref in qaic_destroy_drm_device() ...
2023-06-16afs: Fix vlserver probe RTT handlingDavid Howells
In the same spirit as commit ca57f02295f1 ("afs: Fix fileserver probe RTT handling"), don't rule out using a vlserver just because there haven't been enough packets yet to calculate a real rtt. Always set the server's probe rtt from the estimate provided by rxrpc_kernel_get_srtt, which is capped at 1 second. This could lead to EDESTADDRREQ errors when accessing a cell for the first time, even though the vl servers are known and have responded to a probe. Fixes: 1d4adfaf6574 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2023-June/006746.html Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-16Merge tag 'v6.4-rockchip-dtsfixes1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Fixes for the reset pin on nanopi r5c, a reset line on SOQuartz, a duplicate usb regulator on rock64 and PCIe register mappings on rk356x. Also some missing cache properties. * tag 'v6.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix rk356x PCIe register and range mappings arm64: dts: rockchip: fix button reset pin for nanopi r5c arm64: dts: rockchip: fix nEXTRST on SOQuartz arm64: dts: rockchip: add missing cache properties arm64: dts: rockchip: fix USB regulator on ROCK64 Link: https://lore.kernel.org/r/2885657.e9J7NaK4W3@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-16x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n buildThomas Gleixner
Moving mem_encrypt_init() broke the AMD_MEM_ENCRYPT=n because the declaration of that function was under #ifdef CONFIG_AMD_MEM_ENCRYPT and the obvious placement for the inline stub was the #else path. This is a leftover of commit 20f07a044a76 ("x86/sev: Move common memory encryption code to mem_encrypt.c") which made mem_encrypt_init() depend on X86_MEM_ENCRYPT without moving the prototype. That did not fail back then because there was no stub inline as the core init code had a weak function. Move both the declaration and the stub out of the CONFIG_AMD_MEM_ENCRYPT section and guard it with CONFIG_X86_MEM_ENCRYPT. Fixes: 439e17576eb4 ("init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Closes: https://lore.kernel.org/oe-kbuild-all/202306170247.eQtCJPE8-lkp@intel.com/
2023-06-16ieee802154: Replace strlcpy with strscpyAzeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since the return values from the helper macros are ignored by the callers. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-06-17Merge tag 'drm-misc-fixes-2023-06-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes maybe in time for v6.4-rc7: - qaic leak and null deref fix. - Fix runtime pm in nouveau. - Fix array overflow in ti-sn65dsi86 pwm chip handling. - Assorted null check fixes in nouveau. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <dev@lankhorst.se> Link: https://patchwork.freedesktop.org/patch/msgid/641eb8a8-fbd7-90ad-0805-310b7fec9344@lankhorst.se
2023-06-16sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()Hao Jia
After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth"), we may update the rq clock multiple times in the loop of __cfsb_csd_unthrottle(). A prior (although less common) instance of this problem exists in unthrottle_offline_cfs_rqs(). Cure both by ensuring update_rq_clock() is called before the loop and setting RQCF_ACT_SKIP during the loop, to supress further updates. The alternative would be pulling update_rq_clock() out of unthrottle_cfs_rq(), but that gives an even bigger mess. Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth") Reviewed-By: Ben Segall <bsegall@google.com> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20230613082012.49615-4-jiahao.os@bytedance.com
2023-06-16sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()Hao Jia
There is a double update_rq_clock() invocation: __balance_push_cpu_stop() update_rq_clock() __migrate_task() update_rq_clock() Sadly select_fallback_rq() also needs update_rq_clock() for __do_set_cpus_allowed(), it is not possible to remove the update from __balance_push_cpu_stop(). So remove it from __migrate_task() and ensure all callers of this function call update_rq_clock() prior to calling it. Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20230613082012.49615-3-jiahao.os@bytedance.com
2023-06-16sched/core: Fixed missing rq clock update before calling set_rq_offline()Hao Jia
When using a cpufreq governor that uses cpufreq_add_update_util_hook(), it is possible to trigger a missing update_rq_clock() warning for the CPU hotplug path: rq_attach_root() set_rq_offline() rq_offline_rt() __disable_runtime() sched_rt_rq_enqueue() enqueue_top_rt_rq() cpufreq_update_util() data->func(data, rq_clock(rq), flags) Move update_rq_clock() from sched_cpu_deactivate() (one of it's callers) into set_rq_offline() such that it covers all set_rq_offline() usage. Additionally change rq_attach_root() to use rq_lock_irqsave() so that it will properly manage the runqueue clock flags. Suggested-by: Ben Segall <bsegall@google.com> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20230613082012.49615-2-jiahao.os@bytedance.com
2023-06-16sched/deadline: Update GRUB description in the documentationVineeth Pillai
Update the details of GRUB to reflect the updated logic. Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230530135526.2385378-2-vineeth@bitbyteword.org
2023-06-16sched/deadline: Fix bandwidth reclaim equation in GRUBVineeth Pillai
According to the GRUB[1] rule, the runtime is depreciated as: "dq = -max{u, (1 - Uinact - Uextra)} dt" (1) To guarantee that deadline tasks doesn't starve lower class tasks, we do not allocate the full bandwidth of the cpu to deadline tasks. Maximum bandwidth usable by deadline tasks is denoted by "Umax". Considering Umax, equation (1) becomes: "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2) Current implementation has a minor bug in equation (2), which this patch fixes. The reclamation logic is verified by a sample program which creates multiple deadline threads and observing their utilization. The tests were run on an isolated cpu(isolcpus=3) on a 4 cpu system. Tests on 6.3.0 ============== RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33 TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35 RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 RUN 3: 2 tasks Task 1: runtime=1ms, deadline=period=10ms Task 2: runtime=1ms, deadline=period=100ms TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67 TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37 TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38 TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23 As seen above, the reclamation doesn't reclaim the maximum allowed bandwidth and as the bandwidth of tasks gets smaller, the reclaimed bandwidth also comes down. Tests with this patch applied ============================= RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19 TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16 RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27 TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21 RUN 3: 2 tasks Task 1: runtime=1ms, deadline=period=10ms Task 2: runtime=1ms, deadline=period=100ms TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64 TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66 TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45 TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73 Running tasks on all cpus allowing for migration also showed that the utilization is reclaimed to the maximum. Running 10 tasks on 3 cpus SCHED_FLAG_RECLAIM - top shows: %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng. (2015). Parallel and sequential reclaiming in multicore real-time global scheduling. Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230530135526.2385378-1-vineeth@bitbyteword.org
2023-06-16trace,smp: Add tracepoints for scheduling remotelly called functionsLeonardo Bras
Add a tracepoint for when a CSD is queued to a remote CPU's call_single_queue. This allows finding exactly which CPU queued a given CSD when looking at a csd_function_{entry,exit} event, and also enables us to accurately measure IPI delivery time with e.g. a synthetic event: $ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\ /sys/kernel/tracing/events/smp/csd_queue_cpu/trigger $ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\ /sys/kernel/tracing/synthetic_events $ echo \ 'hist:keys=common_cpu,csd.hex:'\ 'time=common_timestamp.usecs-$ts:'\ 'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\ /sys/kernel/tracing/events/smp/csd_function_entry/trigger $ trace-cmd record -e 'synthetic:csd_latency' hackbench $ trace-cmd report <...>-467 [001] 21.824263: csd_queue_cpu: cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8 <...>-467 [001] 21.824280: ipi_send_cpu: cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0 <...>-489 [000] 21.824299: csd_function_entry: func=sched_ttwu_pending csd=0xffff8880076148b8 <...>-489 [000] 21.824320: csd_latency: dst_cpu=0, csd=18446612682193848504, time=36 Suggested-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com
2023-06-16trace,smp: Add tracepoints around remotelly called functionsLeonardo Bras
The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources of IPIs targeting CPUs running latency-sensitive applications. For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are sufficient to find them and work on getting rid of them. In some setups however, not *all* IPIs are to be suppressed, but long-running IPI callbacks can still be problematic. Add a pair of tracepoints to mark the start and end of processing a CSD IPI callback, similar to what exists for softirq, workqueue or timer callbacks. Signed-off-by: Leonardo Bras <leobras@redhat.com> Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com
2023-06-16net/mlx5e: Fix scheduling of IPsec ASO query while in atomicLeon Romanovsky
ASO query can be scheduled in atomic context as such it can't use usleep. Use udelay as recommended in Documentation/timers/timers-howto.rst. Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Drop XFRM state lock when modifying flow steeringLeon Romanovsky
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really need to hold lock while modifying flow steering rules to drop traffic. That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit() work will be canceled anyway and won't run in parallel. Fixes: b2f7b01d36a9 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Fix ESN update kernel panicPatrisious Haddad
Previously during mlx5e_ipsec_handle_event the driver tried to execute an operation that could sleep, while holding a spinlock, which caused the kernel panic mentioned below. Move the function call that can sleep outside of the spinlock context. Call Trace: <TASK> dump_stack_lvl+0x49/0x6c __schedule_bug.cold+0x42/0x4e schedule_debug.constprop.0+0xe0/0x118 __schedule+0x59/0x58a ? __mod_timer+0x2a1/0x3ef schedule+0x5e/0xd4 schedule_timeout+0x99/0x164 ? __pfx_process_timeout+0x10/0x10 __wait_for_common+0x90/0x1da ? __pfx_schedule_timeout+0x10/0x10 wait_func+0x34/0x142 [mlx5_core] mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core] cmd_exec+0x1fe/0x325 [mlx5_core] mlx5_cmd_do+0x22/0x50 [mlx5_core] mlx5_cmd_exec+0x1c/0x40 [mlx5_core] mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core] mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core] ? wake_affine+0x62/0x1f8 mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core] process_one_work+0x1e2/0x3e6 ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012 last function: mlx5e_ipsec_handle_event [mlx5_core] CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G W 6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022 Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x49/0x6c process_one_work.cold+0x2b/0x3c ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000 Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Don't delay release of hardware objectsLeon Romanovsky
XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete() and another is .xdo_dev_policy_free(). This separation allows delayed release so "ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context. BUG: scheduling while atomic: swapper/7/0/0x00000100 Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x33/0x50 __schedule_bug+0x4e/0x60 __schedule+0x5d5/0x780 ? __mod_timer+0x286/0x3d0 schedule+0x50/0x90 schedule_timeout+0x7c/0xf0 ? __bpf_trace_tick_stop+0x10/0x10 __wait_for_common+0x88/0x190 ? usleep_range_state+0x90/0x90 cmd_exec+0x42e/0xb40 [mlx5_core] mlx5_cmd_do+0x1e/0x40 [mlx5_core] mlx5_cmd_exec+0x18/0x30 [mlx5_core] mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core] del_hw_fte+0x60/0x120 [mlx5_core] mlx5_del_flow_rules+0xec/0x270 [mlx5_core] ? default_send_IPI_single_phys+0x26/0x30 mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core] mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core] xfrm_policy_destroy+0x5a/0xb0 xfrm4_dst_destroy+0x7b/0x100 dst_destroy+0x37/0x120 rcu_core+0x2d6/0x540 __do_softirq+0xcd/0x273 irq_exit_rcu+0x82/0xb0 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:default_idle+0x13/0x20 Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00 RSP: 0018:ffff888100843ee0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000 RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 default_idle_call+0x30/0xb0 do_idle+0x1c1/0x1d0 cpu_startup_entry+0x19/0x20 start_secondary+0xfe/0x120 secondary_startup_64_no_verify+0xf3/0xfb </TASK> bad: scheduling from the idle thread! Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Free IRQ rmap and notifier on kernel shutdownSaeed Mahameed
The kernel IRQ system needs the irq affinity notifier to be clear before attempting to free the irq, see WARN_ON log below. On a normal driver unload we don't have this issue since we do the complete cleanup of the irq resources. To fix this, put the important resources cleanup in a helper function and use it in both normal driver unload and shutdown flows. [ 4497.498434] ------------[ cut here ]------------ [ 4497.498726] WARNING: CPU: 0 PID: 9 at kernel/irq/manage.c:2034 free_irq+0x295/0x340 [ 4497.499193] Modules linked in: [ 4497.499386] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.4.0-rc4+ #10 [ 4497.499876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 [ 4497.500518] Workqueue: events do_poweroff [ 4497.500849] RIP: 0010:free_irq+0x295/0x340 [ 4497.501132] Code: 85 c0 0f 84 1d ff ff ff 48 89 ef ff d0 0f 1f 00 e9 10 ff ff ff 0f 0b e9 72 ff ff ff 49 8d 7f 28 ff d0 0f 1f 00 e9 df fd ff ff <0f> 0b 48 c7 80 c0 008 [ 4497.502269] RSP: 0018:ffffc90000053da0 EFLAGS: 00010282 [ 4497.502589] RAX: ffff888100949600 RBX: ffff88810330b948 RCX: 0000000000000000 [ 4497.503035] RDX: ffff888100949600 RSI: ffff888100400490 RDI: 0000000000000023 [ 4497.503472] RBP: ffff88810330c7e0 R08: ffff8881004005d0 R09: ffffffff8273a260 [ 4497.503923] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881009ae000 [ 4497.504359] R13: ffff8881009ae148 R14: 0000000000000000 R15: ffff888100949600 [ 4497.504804] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 4497.505302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4497.505671] CR2: 00007fce98806298 CR3: 000000000262e005 CR4: 0000000000370ef0 [ 4497.506104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4497.506540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4497.507002] Call Trace: [ 4497.507158] <TASK> [ 4497.507299] ? free_irq+0x295/0x340 [ 4497.507522] ? __warn+0x7c/0x130 [ 4497.507740] ? free_irq+0x295/0x340 [ 4497.507963] ? report_bug+0x171/0x1a0 [ 4497.508197] ? handle_bug+0x3c/0x70 [ 4497.508417] ? exc_invalid_op+0x17/0x70 [ 4497.508662] ? asm_exc_invalid_op+0x1a/0x20 [ 4497.508926] ? free_irq+0x295/0x340 [ 4497.509146] mlx5_irq_pool_free_irqs+0x48/0x90 [ 4497.509421] mlx5_irq_table_free_irqs+0x38/0x50 [ 4497.509714] mlx5_core_eq_free_irqs+0x27/0x40 [ 4497.509984] shutdown+0x7b/0x100 [ 4497.510184] pci_device_shutdown+0x30/0x60 [ 4497.510440] device_shutdown+0x14d/0x240 [ 4497.510698] kernel_power_off+0x30/0x70 [ 4497.510938] process_one_work+0x1e6/0x3e0 [ 4497.511183] worker_thread+0x49/0x3b0 [ 4497.511407] ? __pfx_worker_thread+0x10/0x10 [ 4497.511679] kthread+0xe0/0x110 [ 4497.511879] ? __pfx_kthread+0x10/0x10 [ 4497.512114] ret_from_fork+0x29/0x50 [ 4497.512342] </TASK> Fixes: 9c2d08010963 ("net/mlx5: Free irqs only on shutdown callback") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com>
2023-06-16net/mlx5: DR, Fix wrong action data allocation in decap actionYevgeny Kliteynik
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local variable was passed as its HW action data, resulting in attempt to free invalid address: BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core] Fixes: 4781df92f4da ("net/mlx5: DR, Move STEv0 modify header logic") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: DR, Support SW created encap actions for FW tableYevgeny Kliteynik
In some cases, steering might need to use SW-created action in FW table, which results in wrong packet reformat being used: mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154): SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed, status bad resource(0×5), syndrome (0xf2ff71) This patch adds support for usage of SW-created packet reformat (encap) actions in FW tables, and adds clear error flow for attempt to use SW-created modify header on FW tables. Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: TC, Cleanup ct resources for nic flowChris Mi
The cited commit removes special handling of CT action. But it removes too much. Pre ct/ct_nat tables and some other resources are not destroyed due to the cited commit. Fix it by adding it back. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: TC, Add null pointer check for hardware miss supportChris Mi
The cited commits add hardware miss support to tc action. But if the rules can't be offloaded, the pointers are null and system will panic when accessing them. Fix it by checking null pointer. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Fix driver load with single msix vectorEli Cohen
When a PCI device has just one msix vector available, we want to share this vector between async and completion events. Current code fails to do that assuming it will always have at least one dedicated vector for completion events. Fix this by detecting when the pool contains just a single vector. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: xsk: Set napi_id to support busy polling on XSK RQMaxim Mikityanskiy
The cited commit missed setting napi_id on XSK RQs, it only affected regular RQs. Add the missing part to support socket busy polling on XSK RQs. Fixes: a2740f529da2 ("net/mlx5e: xsk: Set napi_id to support busy polling") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: XDP, Allow growing tail for XDP multi bufferMaxim Mikityanskiy
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which is required by bpf_xdp_adjust_tail to support growing the tail pointer in fragmented packets. Pass the missing parameter when the current RQ mode allows XDP multi buffer. Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Cc: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16Merge tag 'for-6.4-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two fixes for NOCOW files, a regression fix in scrub and an assertion fix: - NOCOW fixes: - keep length of iomap direct io request in case of a failure - properly pass mode of extent reference checking, this can break some cases for swapfile - fix error value confusion when scrubbing a stripe - convert assertion to a proper error handling when loading global roots, reported by syzbot" * tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: fix a return value overwrite in scrub_stripe() btrfs: do not ASSERT() on duplicated global roots btrfs: can_nocow_file_extent should pass down args->strict from callers btrfs: fix iomap_begin length for nocow writes
2023-06-16Merge tag 'block-6.4-2023-06-15' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "Just a single fix for blk-cg stats flushing" * tag 'block-6.4-2023-06-15' of git://git.kernel.dk/linux: blk-cgroup: Flush stats before releasing blkcg_gq
2023-06-16Merge tag 'io_uring-6.4-2023-06-15' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A fix for sendmsg with CMSG, and the followup fix discussed for avoiding touching task->worker_private after the worker has started exiting" * tag 'io_uring-6.4-2023-06-15' of git://git.kernel.dk/linux: io_uring/io-wq: clear current->worker_private on exit io_uring/net: save msghdr->msg_control for retries