Age | Commit message | Author
2021-04-26 | samples/kprobes: Add riscv support | Jisheng Zhang
Add riscv specific info dump in both handler_pre() and handler_post(). Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: Select HAVE_DYNAMIC_FTRACE when -fpatchable-function-entry is available | Nathan Chancellor
clang prior to 13.0.0 does not support -fpatchable-function-entry for RISC-V:
    clang: error: unsupported option '-fpatchable-function-entry=8' for target 'riscv64-unknown-linux-gnu'
To avoid this error, only select HAVE_DYNAMIC_FTRACE when this option is available. Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") Link: https://github.com/ClangBuiltLinux/linux/issues/1268 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: Workaround mcount name prior to clang-13 | Nathan Chancellor
Prior to clang 13.0.0, the RISC-V name for the mcount symbol was "mcount", which differs from GCC's "_mcount" and results in the following errors:
    riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_level':
    main.c:(.text+0xe): undefined reference to `mcount'
    riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_start':
    main.c:(.text+0x4e): undefined reference to `mcount'
    riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_finish':
    main.c:(.text+0x92): undefined reference to `mcount'
    riscv64-linux-gnu-ld: init/main.o: in function `.LBB32_28':
    main.c:(.text+0x30c): undefined reference to `mcount'
    riscv64-linux-gnu-ld: init/main.o: in function `free_initmem':
    main.c:(.text+0x54c): undefined reference to `mcount'
This has been corrected in https://reviews.llvm.org/D98881, but the minimum supported clang version is 10.0.1. To avoid build errors and to gain a working function tracer, adjust the name of the mcount symbol for older versions of clang in mcount.S and recordmcount.pl. Link: https://github.com/ClangBuiltLinux/linux/issues/1331 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | scripts/recordmcount.pl: Fix RISC-V regex for clang | Nathan Chancellor
Clang can generate R_RISCV_CALL_PLT relocations to _mcount:
    $ llvm-objdump -dr build/riscv/init/main.o | rg mcount
    000000000000000e: R_RISCV_CALL_PLT _mcount
    000000000000004e: R_RISCV_CALL_PLT _mcount
Adjust the RISC-V regex so that it also matches these relocations. After this, the __start_mcount_loc section is properly generated and function tracing still works. Link: https://github.com/ClangBuiltLinux/linux/issues/1331 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: Use $(LD) instead of $(CC) to link vDSO | Nathan Chancellor
Currently, the vDSO is linked through $(CC). This does not match how the rest of the kernel links objects, which is through the $(LD) variable. When linking with clang, there are a couple of warnings about flags that will not be used during the link:
    clang-12: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
    clang-12: warning: argument unused during compilation: '-pg' [-Wunused-command-line-argument]
'-no-pie' was added in commit 85602bea297f ("RISC-V: build vdso-dummy.o with -no-pie") to override '-pie' getting added to the ld command by distribution versions of GCC that enable PIE by default. It is technically no longer needed after commit c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+"), which removed vdso-dummy.o in favor of generating vdso-syms.S from vdso.so with $(NM). Linking with $(LD) also prevents the issue from ever coming back, because we then have full control over the linker command. '-pg' is for function tracing; as clang states, it is not used during linking. These flags could be removed or filtered to fix the warnings, but it is easier to match the rest of the kernel and use $(LD) directly for linking. See commits
    fe00e50b2db8 ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
    691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
    2ff906994b6c ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
    2b2a25845d53 ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
for more information. The flags are converted to linker flags, and '--eh-frame-hdr' is added to match what GCC adds implicitly, which can be seen by adding '-v' to GCC's invocation. Additionally, since this area is being modified, use the $(OBJCOPY) variable instead of an open-coded $(CROSS_COMPILE)objcopy so that the user's choice of objcopy binary is respected. Link: https://github.com/ClangBuiltLinux/linux/issues/803 Link: https://github.com/ClangBuiltLinux/linux/issues/970 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: sifive: Apply errata "cip-1200" patch | Vincent Chen
On certain SiFive CPUs, "sfence.vma addr" fails to flush addr from the TLB in particular cases. The details can be found here: https://sifive.cdn.prismic.io/sifive/167a1a56-03f4-4615-a79e-b2a86153148f_FU740_errata_20210205.pdf To ensure correct behavior, this patch uses the Alternative scheme to replace every "sfence.vma addr" with a full "sfence.vma" at runtime. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: sifive: Apply errata "cip-453" patch | Vincent Chen
Add sign extension to $badaddr before handling instruction page faults and instruction access faults, to work around the issue "cip-453". To avoid affecting the existing code sequence, this patch creates two trampolines that add sign extension to $badaddr. Through the "alternative" mechanism, these two trampolines replace the original exception handlers for instruction page faults and instruction access faults in excp_vect_table. As a result, only the affected SiFive CPU cores jump to do_page_fault and do_trap_insn_fault through these two trampolines; other CPUs are not affected. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
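The trampolines perform the sign extension in assembly. As a hedged C model of the arithmetic only, the helper below sign-extends the low virtual-address bits of a faulting address. The name `sext_badaddr` and the `va_bits` parameter are assumptions; 39 would correspond to Sv39.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low `va_bits` bits of a faulting address, modelling a
 * software workaround for hardware that reports badaddr without sign
 * extension. The bit width is a parameter here; using 39 (Sv39) is an
 * assumption, not taken from the errata document.
 */
static uint64_t sext_badaddr(uint64_t badaddr, unsigned int va_bits)
{
    assert(va_bits > 0 && va_bits < 64);

    uint64_t low = badaddr & ((UINT64_C(1) << va_bits) - 1);
    uint64_t sign = UINT64_C(1) << (va_bits - 1);

    /* XOR-then-subtract propagates the sign bit without relying on
     * implementation-defined right shifts of negative values. */
    return (low ^ sign) - sign;
}
```

For example, with va_bits = 39 an address whose bit 38 is set, such as 0x4000000000, becomes 0xffffffc000000000, while 0x1000 is returned unchanged.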
2021-04-26 | riscv: sifive: Add SiFive alternative ports | Vincent Chen
Add required ports of the Alternative scheme for SiFive. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | riscv: Introduce alternative mechanism to apply errata solution | Vincent Chen
Introduce the "alternative" mechanism from ARM64 and x86 to apply CPU vendors' errata solutions at runtime. The main purpose of this patch is to provide a framework, so the implementation is quite basic for now and some scenarios cannot use this scheme yet, such as patching code in a module, relocating the patching code, and heterogeneous CPU topologies. Users can use the ALTERNATIVE macro to apply an erratum fix to the existing code flow. In the ALTERNATIVE macro, users need to specify the manufacturer information (vendorid, archid, and impid) for the erratum, so that the kernel knows which CPU cores it applies to. During boot, the kernel selects the errata required by the CPU core and patches them in. This means the kernel only applies an erratum fix to the specified CPU core, so different vendors' errata do not affect each other at runtime. The patching happens only during boot, so the overhead of the "alternative" mechanism is paid only once. The "alternative" mechanism is enabled by default to ensure that all required errata are applied. However, users can disable it through the Kconfig option "CONFIG_RISCV_ERRATA_ALTERNATIVE". Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
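A minimal sketch of the selection step only, assuming a simple table keyed on exact (mvendorid, marchid, mimpid) matches. The struct and function names are hypothetical, and real errata checks may accept several implementation IDs rather than one exact triple; the code patching itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical errata descriptor: the core it applies to and its bit. */
struct errata_desc {
    uint64_t vendorid;  /* value of mvendorid */
    uint64_t archid;    /* value of marchid */
    uint64_t impid;     /* value of mimpid */
    unsigned int bit;   /* position of this erratum in the result mask */
};

/*
 * Return a bitmask of the errata whose manufacturer IDs all match this
 * core. In the scheme described above, this selection would run once at
 * boot, and only the matching ALTERNATIVE sites would then be patched.
 */
static uint32_t select_errata(const struct errata_desc *table, size_t n,
                              uint64_t vendorid, uint64_t archid,
                              uint64_t impid)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        assert(table[i].bit < 32);
        if (table[i].vendorid == vendorid && table[i].archid == archid &&
            table[i].impid == impid)
            mask |= UINT32_C(1) << table[i].bit;
    }
    return mask;
}
```

Returning a mask keeps the per-core decision separate from the patching pass, which could then walk the patch sites once and act only on sites whose erratum bit is set.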
2021-04-26 | riscv: Add 3 SBI wrapper functions to get cpu manufacturer information | Vincent Chen
Add 3 wrapper functions to get the vendor ID, architecture ID and implementation ID from M-mode. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26 | Merge branch 'acpi-misc' | Rafael J. Wysocki
* acpi-misc:
  ACPI: dock: fix some coding style issues
  ACPI: sysfs: fix some coding style issues
  ACPI: PM: add a missed blank line after declarations
  ACPI: custom_method: fix a coding style issue
  ACPI: CPPC: fix some coding style issues
  ACPI: button: fix some coding style issues
  ACPI: battery: fix some coding style issues
  ACPI: acpi_pad: add a missed blank line after declarations
  ACPI: LPSS: add a missed blank line after declarations
  ACPI: ipmi: remove useless return statement for void function
  ACPI: processor: fix some coding style issues
  ACPI: APD: fix a block comment align issue
  ACPI: AC: fix some coding style issues
  ACPI: fix various typos in comments
2021-04-26 | drivers/block/null_blk/main: Fix a double free in null_init. | Lv Yunlong
In null_init, null_add_dev(dev) is called. In its out_cleanup_zone branch, null_add_dev calls null_free_zoned_dev(dev), which frees dev->zones via kvfree(dev->zones), and then returns err. null_init receives the error code and calls null_free_dev(dev), but null_free_dev(dev) frees dev->zones again through null_free_zoned_dev(). This patch sets dev->zones to NULL in null_free_zoned_dev() after kvfree(dev->zones) is called, to avoid the double free. Fixes: 2984c8684f962 ("nullb: factor disk parameters") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
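The fix is the common free-and-clear idiom. A hedged userspace sketch follows, with a hypothetical `struct nullb_dev_model` standing in for the real device and free() standing in for kvfree(); both treat a NULL pointer as a no-op, which is what makes the second call harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the null_blk device structure. */
struct nullb_dev_model {
    int *zones;
};

/*
 * Free the zone array and clear the pointer, so a second call from a
 * different cleanup path is harmless: free(NULL) does nothing.
 */
static void free_zoned_dev(struct nullb_dev_model *dev)
{
    assert(dev != NULL);
    free(dev->zones);
    dev->zones = NULL;
}

/* Generic teardown that also frees the zones, like null_free_dev(). */
static void free_dev(struct nullb_dev_model *dev)
{
    free_zoned_dev(dev);
    free(dev);
}
```

Clearing the pointer inside the freeing helper, rather than at each call site, makes the later cleanup path safe without changing its logic.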
2021-04-26 | Merge branches 'acpi-cppc', 'acpi-video' and 'acpi-utils' | Rafael J. Wysocki
* acpi-cppc:
  ACPI: CPPC: Replace cppc_attr with kobj_attribute
  ACPI: CPPC: Add emtpy stubs of functions for CONFIG_ACPI_CPPC_LIB unset
* acpi-video:
  ACPI: video: use native backlight for GA401/GA502/GA503
  ACPI: video: Check LCD flag on ACPI-reduced-hardware devices
  ACPI: utils: Add acpi_reduced_hardware() helper
* acpi-utils:
  ACPI: utils: Capitalize abbreviations in the comments
  ACPI: utils: Document for_each_acpi_dev_match() macro
2021-04-26 | io_uring: fix NULL reg-buffer | Pavel Begunkov
io_import_fixed() doesn't expect a registered buffer slot to be NULL and would fail when it encounters one. We don't allow that, but if rsrc removal succeeds during __io_sqe_buffers_update() and the following register fails, we end up in exactly that situation. Do the update atomically and don't remove buffers until we are sure that a new one can be set. Fixes: 634d00df5e1cf ("io_uring: add full-fledged dynamic buffers support") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/830020f9c387acddd51962a3123b5566571b8c6d.1619446608.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
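The ordering described above is a prepare-then-commit update. A minimal C sketch, assuming a hypothetical `struct buf_table` and using malloc()/memcpy() in place of real buffer registration:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a table of registered buffers. */
struct buf_table {
    void **slots;
    size_t nr;
};

/*
 * Replace slot `idx` with a freshly allocated copy of `src`.
 * The new buffer is fully prepared first; the old one is released only
 * after that succeeds, so a failure never leaves a NULL slot behind.
 */
static int update_slot(struct buf_table *t, size_t idx,
                       const void *src, size_t len)
{
    void *nbuf;

    assert(t != NULL && src != NULL);
    if (idx >= t->nr || len == 0)
        return -EINVAL;

    nbuf = malloc(len);
    if (!nbuf)
        return -ENOMEM;          /* old buffer is still in place */
    memcpy(nbuf, src, len);

    free(t->slots[idx]);         /* commit: only now drop the old one */
    t->slots[idx] = nbuf;
    return 0;
}
```

Every failure path returns before the old slot is touched, so code reading the table never observes a NULL entry.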
2021-04-26 | Merge branches 'acpi-scan', 'acpi-drivers', 'acpi-pm' and 'acpi-resources' | Rafael J. Wysocki
* acpi-scan:
  ACPI: bus: Introduce acpi_dev_get() and reuse it in ACPI code
  ACPI: scan: Utilize match_string() API
  ACPI: scan: Call acpi_get_object_info() from acpi_set_pnp_ids()
  ACPI: scan: Drop sta argument from acpi_init_device_object()
  ACPI: scan: Drop sta argument from acpi_add_single_object()
  ACPI: scan: Rearrange checks in acpi_bus_check_add()
  ACPI: scan: Fold acpi_bus_type_and_status() into its caller
* acpi-drivers:
  ACPI: HED: Drop unused ACPI_MODULE_NAME() definition
* acpi-pm:
  ACPI: power: Turn off unused power resources unconditionally
  ACPI: scan: Turn off unused power resources during initialization
* acpi-resources:
  resource: Prevent irqresource_disabled() from erasing flags
2021-04-26 | Merge branch 'acpi-messages' | Rafael J. Wysocki
* acpi-messages:
  hwmon: acpi_power_meter: Get rid of ACPICA message printing
  IIO: acpi-als: Get rid of ACPICA message printing
  ACPI: utils: Introduce acpi_evaluation_failure_warn()
  ACPI: Drop unused ACPI_*_COMPONENT definitions and update documentation
  ACPI: sysfs: Get rid of ACPICA message printing
2021-04-26 | Merge branches 'acpi-pci' and 'acpi-processor' | Rafael J. Wysocki
* acpi-pci:
  ACPI: PCI: Replace direct printk() invocations in pci_link.c
  ACPI: PCI: Drop ACPI_PCI_COMPONENT that is not used any more
  ACPI: PCI: Replace ACPI_DEBUG_PRINT() and ACPI_EXCEPTION()
  ACPI: PCI: IRQ: Consolidate printing diagnostic messages
* acpi-processor:
  ACPI: processor: perflib: Eliminate redundant status check
  ACPI: processor: Get rid of ACPICA message printing
  ACPI: processor: idle: Drop extra prefix from pr_notice()
  ACPI: processor: Remove initialization of static variable
2021-04-26 | Merge branch 'acpica' | Rafael J. Wysocki
* acpica: (22 commits)
  ACPICA: Update version to 20210331
  ACPICA: IORT: Updates for revision E.b
  ACPICA: acpisrc: Add missing conversion for VIOT support
  ACPICA: iASL: Decode subtable type field for VIOT
  ACPICA: iASL: Add support for CEDT table
  ACPICA: ACPI 6.4: add support for PHAT table
  ACPICA: ACPI 6.4: add CSI2Bus resource template
  ACPICA: ACPI 6.4: PMTT: add new fields/structures
  ACPICA: CXL 2.0: CEDT: Add new CEDT table
  ACPICA: iASL: Add definitions for the VIOT table
  ACPICA: ACPI 6.4: add SDEV secure access components
  ACPICA: ACPI 6.4: Add new flags in SRAT
  ACPICA: ACPI 6.4: HMAT: add new fields/flags
  ACPICA: ACPI 6.4: NFIT: add Location Cookie field
  ACPICA: Tree-wide: fix various typos and spelling mistakes
  ACPICA: ACPI 6.4: PPTT: add new version of subtable type 1
  ACPICA: ACPI 6.4: PCCT: add support for subtable type 5
  ACPICA: ACPI 6.4: MADT: add Multiprocessor Wakeup Structure
  ACPICA: ACPI 6.4: add CXL ACPI device ID and _CBR object
  ACPICA: ACPI 6.4: add USB4 capabilities UUID
  ...
2021-04-26 | Merge branches 'pm-docs' and 'pm-tools' | Rafael J. Wysocki
* pm-docs:
  PM: clk: remove kernel-doc warning
  PM: wakeup: fix kernel-doc warnings and fix typos
  PM: runtime: remove kernel-doc warnings
* pm-tools:
  pm-graph: Fix typo "accesible"
2021-04-26 | Merge branch 'pm-devfreq' | Rafael J. Wysocki
* pm-devfreq:
  PM / devfreq: imx8m-ddrc: Remove unneeded of_match_ptr()
  PM / devfreq: imx-bus: Remove unneeded of_match_ptr()
  PM / devfreq: imx8m-ddrc: Remove imx8m_ddrc_get_dev_status
  PM / devfreq: Remove the invalid description for get_target_freq
  PM / devfreq: Check get_dev_status in devfreq_update_stats
  PM / devfreq: Fix the wrong set_freq path for userspace governor in Kconfig
  dt-bindings: devfreq: rk3399_dmc: Remove references of unexistant defines
  dt-bindings: devfreq: rk3399_dmc: Add rockchip,pmu phandle.
  PM / devfreq: rk3399_dmc: Simplify with dev_err_probe()
  PM / devfreq: Use more accurate returned new_freq as resume_freq
  PM / devfreq: Unlock mutex and free devfreq struct in error path
  PM / devfreq: Register devfreq as a cooling device on demand
2021-04-26 | Merge branch 'pm-opp' | Rafael J. Wysocki
* pm-opp:
  memory: samsung: exynos5422-dmc: Convert to use resource-managed OPP API
  drm/panfrost: Convert to use resource-managed OPP API
  drm/lima: Convert to use resource-managed OPP API
  mmc: sdhci-msm: Convert to use resource-managed OPP API
  spi: spi-qcom-qspi: Convert to use resource-managed OPP API
  spi: spi-geni-qcom: Convert to use resource-managed OPP API
  serial: qcom_geni_serial: Convert to use resource-managed OPP API
  opp: Change return type of devm_pm_opp_attach_genpd()
  opp: Change return type of devm_pm_opp_register_set_opp_helper()
  opp: Add devres wrapper for dev_pm_opp_of_add_table
  opp: Add devres wrapper for dev_pm_opp_set_supported_hw
  opp: Add devres wrapper for dev_pm_opp_set_regulators
  opp: Add devres wrapper for dev_pm_opp_set_clkname
2021-04-26 | Merge tag 'asoc-v5.13' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus | Takashi Iwai
ASoC: Updates for v5.13
A lot of changes here for quite a quiet release in subsystem terms - there have been a lot of fixes and cleanups all over the subsystem, both from generic work and from people working on specific drivers.
 - More cleanup and consolidation work in the core and the generic card drivers from Morimoto-san.
 - Lots of cppcheck fixes from Pierre-Louis Bossart.
 - New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358 accessory detection, and Realtek RT1019, RT1316, RT711 and RT715.
2021-04-26 | Merge branches 'pm-core', 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap' | Rafael J. Wysocki
* pm-core:
  PM: runtime: Add documentation for pm_runtime_resume_and_get()
  PM: runtime: Replace inline function pm_runtime_callbacks_present()
  PM: core: Remove duplicate declaration from header file
* pm-pci:
  PCI: PM: Do not read power state in pci_enable_device_flags()
* pm-sleep:
  PM: wakeup: remove redundant assignment to variable retval
  PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
  PM: wakeup: use dev_set_name() directly
  PM: sleep: fix typos in comments
  freezer: Remove unused inline function try_to_freeze_nowarn()
* pm-domains:
  PM: domains: Don't runtime resume devices at genpd_prepare()
* powercap:
  powercap: RAPL: Fix struct declaration in header file
  MAINTAINERS: Add DTPM subsystem maintainer
  powercap: Add Hygon Fam18h RAPL support
2021-04-26 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki
* pm-cpufreq: (22 commits)
  cpufreq: Kconfig: fix documentation links
  cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
  cpufreq: armada-37xx: Fix module unloading
  cpufreq: armada-37xx: Remove cur_frequency variable
  cpufreq: armada-37xx: Fix determining base CPU frequency
  cpufreq: armada-37xx: Fix driver cleanup when registration failed
  clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
  clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
  cpufreq: armada-37xx: Fix the AVS value for load L1
  clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
  cpufreq: armada-37xx: Fix setting TBG parent for load levels
  cpufreq: Remove unused for_each_policy macro
  cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER
  cpufreq: intel_pstate: Clean up frequency computations
  cpufreq: cppc: simplify default delay_us setting
  cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c
  cpufreq: CPPC: Add support for frequency invariance
  ia64: fix format string for ia64-acpi-cpu-freq
  cpufreq: schedutil: Call sugov_update_next_freq() before check to fast_switch_enabled
  arch_topology: Export arch_freq_scale and helpers
  ...
2021-04-26 | dt-bindings: mailbox: qcom-ipcc: Add compatible for SC7280 | Sai Prakash Ranjan
Add IPCC compatible for SC7280 SoC. Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-04-26 | ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer | Lv Yunlong
Our code analyzer reported a use-after-free. In snd_emu8000_create_mixer, the callee snd_ctl_add(.., emu->controls[i]) calls snd_ctl_add_replace(.., kcontrol, ..). Inside snd_ctl_add_replace(), if an error occurs, kcontrol is freed by snd_ctl_free_one(kcontrol). emu->controls[i] then points to freed memory, and execution reaches the __error branch of snd_emu8000_create_mixer, where the freed emu->controls[i] is used in snd_ctl_remove(card, emu->controls[i]). This patch sets emu->controls[i] to NULL if snd_ctl_add() fails, to avoid the use-after-free. Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210426131129.4796-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
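A userspace model of the failure and the fix. The names `add_control` and `create_mixer` are hypothetical: `add_control` mimics only the relevant behavior of snd_ctl_add() (freeing the control on error), and free() stands in for snd_ctl_remove() in the error path.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_CONTROLS 4

/* Hypothetical control object standing in for struct snd_kcontrol. */
struct control { int id; };

/* Like snd_ctl_add() on error: the callee frees the control itself. */
static int add_control(struct control *ctl, int fail)
{
    if (fail) {
        free(ctl);
        return -1;
    }
    return 0;
}

/* Create all controls; on error, release whatever is still live. */
static int create_mixer(struct control *controls[NUM_CONTROLS], int fail_at)
{
    int i;

    assert(controls != NULL);
    for (i = 0; i < NUM_CONTROLS; i++)
        controls[i] = NULL;

    for (i = 0; i < NUM_CONTROLS; i++) {
        controls[i] = malloc(sizeof(*controls[i]));
        if (!controls[i])
            goto error;
        controls[i]->id = i;
        if (add_control(controls[i], i == fail_at) < 0) {
            controls[i] = NULL;   /* the fix: drop the dangling pointer */
            goto error;
        }
    }
    return 0;

error:
    /* Only live controls are removed; NULL slots are skipped. */
    for (i = 0; i < NUM_CONTROLS; i++) {
        if (controls[i]) {
            free(controls[i]);
            controls[i] = NULL;
        }
    }
    return -1;
}
```

Without the `controls[i] = NULL` line, the cleanup loop would free the failed control a second time.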
2021-04-26 | NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code. | Dai Ngo
The client SSC code should not depend on CONFIG_NFSD. This patch removes all CONFIG_NFSD dependencies from the NFSv4.2 client SSC code and simplifies the configuration of CONFIG_NFS_V4_2_SSC_HELPER and NFSD_V4_2_INTER_SSC. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Move fr_mr field to struct rpcrdma_mr | Chuck Lever
Clean up: The last remaining field in struct rpcrdma_frwr has been removed, so the struct can be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Move the Work Request union to struct rpcrdma_mr | Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Move fr_linv_done field to struct rpcrdma_mr | Chuck Lever
Clean up: Move more of struct rpcrdma_frwr into its parent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Move cqe to struct rpcrdma_mr | Chuck Lever
Clean up.
 - Simplify variable initialization in the completion handlers.
 - Move another field out of struct rpcrdma_frwr.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Move fr_cid to struct rpcrdma_mr | Chuck Lever
Clean up (for several purposes):
 - The MR's cid is initialized sooner so that tracepoints can show something reasonable even if the MR is never posted.
 - The MR's res.id doesn't change so the cid won't change either. Initializing the cid once is sufficient.
 - struct rpcrdma_frwr is going away soon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Remove the RPC/RDMA QP event handler | Chuck Lever
Clean up: The handler only recorded a trace event. If indeed no action is needed by the RPC/RDMA consumer, then the event can be ignored. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Don't display r_xprt memory addresses in tracepoints | Chuck Lever
The remote peer's IP address is sufficient, and does not expose details of the kernel's memory layout. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Add an rpcrdma_mr_completion_class | Chuck Lever
I found it confusing that the MR_EVENT class displays the mr.id but the associated COMPLETION_EVENT class displays a cid (that happens to contain the mr.id!). To make it a little easier on humans who have to read and interpret these events, create an MR_COMPLETION class that displays the mr.id in the same way as the MR_EVENT class. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation | Chuck Lever
The Send signaling logic is a little subtle, so add some observability around it. For every xprtrdma_mr_fastreg event, there should be an xprtrdma_mr_localinv or xprtrdma_mr_reminv event. When these tracepoints are enabled, we can see exactly when an MR is DMA-mapped, registered, invalidated (either locally or remotely) and then DMA-unmapped.
    kworker/u25:2-190 [000] 787.979512: xprtrdma_mr_map: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
    kworker/u25:2-190 [000] 787.979515: xprtrdma_chunk_read: task:351@5 pos=148 5608@0x8679e0c8f6f56000:0x00000503 (last)
    kworker/u25:2-190 [000] 787.979519: xprtrdma_marshal: task:351@5 xid=0x8679e0c8: hdr=52 xdr=148/5608/0 read list/inline
    kworker/u25:2-190 [000] 787.979525: xprtrdma_mr_fastreg: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
    kworker/u25:2-190 [000] 787.979526: xprtrdma_post_send: task:351@5 cq.id=0 cid=73 (2 SGEs)
    ...
    kworker/5:1H-219 [005] 787.980567: xprtrdma_wc_receive: cq.id=1 cid=161 status=SUCCESS (0/0x0) received=164
    kworker/5:1H-219 [005] 787.980571: xprtrdma_post_recvs: peer=[192.168.100.55]:20049 r_xprt=0xffff8884974d4000: 0 new recvs, 70 active (rc 0)
    kworker/5:1H-219 [005] 787.980573: xprtrdma_reply: task:351@5 xid=0x8679e0c8 credits=64
    kworker/5:1H-219 [005] 787.980576: xprtrdma_mr_reminv: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
    kworker/5:1H-219 [005] 787.980577: xprtrdma_mr_unmap: mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
Note that I've moved the xprtrdma_post_send tracepoint so that event always appears after the xprtrdma_mr_fastreg tracepoint. Otherwise the event log looks counterintuitive (FastReg is always supposed to happen before Send). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
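The invariant stated above (each FastReg eventually matched by a LocalInv or RemInv for the same MR) can be checked mechanically on a captured trace. A minimal sketch, assuming the line format of the sample log and that mr.id values stay below a hypothetical `MAX_MR_ID`; `fastreg_balanced` is an illustrative name, not an existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_MR_ID 1024   /* assumption: mr.id values stay small */

/*
 * Return true if every xprtrdma_mr_fastreg for an mr.id is followed by
 * exactly one xprtrdma_mr_localinv or xprtrdma_mr_reminv for that mr.id
 * before the MR is registered again. Other events are ignored.
 */
static bool fastreg_balanced(const char *const *lines, size_t n)
{
    bool outstanding[MAX_MR_ID] = { false };
    unsigned int id;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *p = strstr(lines[i], "mr.id=");

        if (!p || sscanf(p, "mr.id=%u", &id) != 1 || id >= MAX_MR_ID)
            continue;
        if (strstr(lines[i], "xprtrdma_mr_fastreg:")) {
            if (outstanding[id])
                return false;   /* registered twice, never invalidated */
            outstanding[id] = true;
        } else if (strstr(lines[i], "xprtrdma_mr_localinv:") ||
                   strstr(lines[i], "xprtrdma_mr_reminv:")) {
            if (!outstanding[id])
                return false;   /* invalidation without a FastReg */
            outstanding[id] = false;
        }
    }
    for (unsigned int i = 0; i < MAX_MR_ID; i++)
        if (outstanding[i])
            return false;       /* FastReg never invalidated */
    return true;
}
```

Feeding it the mr.id=4 events from the sample log (map, fastreg, reminv, unmap) yields a balanced trace.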
2021-04-26 | xprtrdma: Avoid Send Queue wrapping | Chuck Lever
Send WRs can be signaled or unsignaled. A signaled Send WR always has a matching Send completion, while an unsignaled Send has a completion only if the Send WR fails. xprtrdma has a Send accounting mechanism that is designed to reduce the number of signaled Send WRs. This in turn mitigates the interrupt rate of the underlying device. RDMA consumers can't leave all Sends unsignaled, however, because providers rely on Send completions to maintain their Send Queue head and tail pointers. xprtrdma counts the number of unsignaled Send WRs that have been posted to ensure that Sends are signaled often enough to prevent the Send Queue from wrapping. This mechanism neglected to account for FastReg WRs, which are posted on the Send Queue but never signaled. As a result, the Send Queue occasionally wrapped, resulting in duplicate completions of FastReg and LocalInv WRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
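A hedged model of the accounting idea, not xprtrdma's actual counter: every WR posted on the Send Queue, FastReg included, is charged against a budget, and the Send that exhausts the budget is signaled. The struct and function names and the `batch` value are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical Send Queue accounting state. */
struct sq_account {
    unsigned int batch;       /* max WRs allowed between signaled Sends */
    unsigned int unsignaled;  /* WRs posted since the last signaled Send */
};

/*
 * Charge one Send WR plus the `nr_fastreg` FastReg WRs posted ahead of
 * it, and return true if this Send must be signaled. A signaled
 * completion lets the provider advance its Send Queue tail past all of
 * the unsignaled WRs that preceded it.
 */
static bool sq_post_send(struct sq_account *sq, unsigned int nr_fastreg)
{
    assert(sq->batch > 0);
    sq->unsignaled += nr_fastreg + 1;
    if (sq->unsignaled >= sq->batch) {
        sq->unsignaled = 0;
        return true;
    }
    return false;
}
```

Dropping `nr_fastreg` from the sum reproduces the bug described above: FastReg WRs consume Send Queue slots without bringing a signaled completion any closer.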
2021-04-26 | xprtrdma: Do not wake RPC consumer on a failed LocalInv | Chuck Lever
Throw away any reply where the LocalInv flushes or could not be posted. The registered memory region is in an unknown state until the disconnect completes. rpcrdma_xprt_disconnect() will find and release the MR. No need to put it back on the MR free list in this case. The client retransmits pending RPC requests once it reestablishes a fresh connection, so a replacement reply should be forthcoming on the next connection instance. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Do not recycle MR after FastReg/LocalInv flushes | Chuck Lever
Better not to touch MRs involved in a flush or post error until the Send and Receive Queues are drained and the transport is fully quiescent. Simply don't insert such MRs back onto the free list. They remain on mr_all and will be released when the connection is torn down. I had thought that recycling would prevent hardware resources from being tied up for a long time. However, since v5.7, a transport disconnect destroys the QP and other hardware-owned resources. The MRs get cleaned up nicely at that point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Clarify use of barrier in frwr_wc_localinv_done() | Chuck Lever
Clean up: The comment and the placement of the memory barrier is confusing. Humans want to read the function statements from head to tail. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Rename frwr_release_mr() | Chuck Lever
Clean up: To be consistent with other functions in this source file, follow the naming convention of putting the object being acted upon before the action itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: rpcrdma_mr_pop() already does list_del_init() | Chuck Lever
The rpcrdma_mr_pop() earlier in the function has already cleared out mr_list, so it must not be done again in the error path. Fixes: 847568942f93 ("xprtrdma: Remove fr_state") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Delete rpcrdma_recv_buffer_put() | Chuck Lever
Clean up: The name recv_buffer_put() is a vestige of older code, and the function is just a wrapper for the newer rpcrdma_rep_put(). In most of the existing call sites, a pointer to the owning rpcrdma_buffer is already available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Fix cwnd update ordering | Chuck Lever
After a reconnect, the reply handler is opening the cwnd (and thus enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs() can post enough Receive WRs to receive their replies. This causes an RNR and the new connection is lost immediately. The race is most clearly exposed when KASAN and disconnect injection are enabled. This slows down rpcrdma_rep_create() enough to allow the send side to post a bunch of RPC Calls before the Receive completion handler can invoke ib_post_recv(). Fixes: 2ae50ad68cd7 ("xprtrdma: Close window between waking RPC senders and posting Receives") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Improve locking around rpcrdma_rep creation | Chuck Lever
Defensive clean up: Protect the rb_all_reps list during rep creation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Improve commentary around rpcrdma_reps_unmap() | Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Improve locking around rpcrdma_rep destruction | Chuck Lever
Currently rpcrdma_reps_destroy() assumes that, at transport tear-down, the content of the rb_free_reps list is the same as the content of the rb_all_reps list. Although that is usually true, using the rb_all_reps list should be more reliable because of the way it's managed. And, rpcrdma_reps_unmap() uses rb_all_reps; these two functions should both traverse the "all" list. Ensure that all rpcrdma_reps are always destroyed whether they are on the rep free list or not. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Put flushed Receives on free list instead of destroying them | Chuck Lever
Defer destruction of an rpcrdma_rep until transport tear-down to preserve the rb_all_reps list while Receives flush. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Do not refresh Receive Queue while it is draining | Chuck Lever
Currently the Receive completion handler refreshes the Receive Queue whenever a successful Receive completion occurs. On disconnect, xprtrdma drains the Receive Queue. The first few Receive completions after a disconnect are typically successful, until the first flushed Receive. This means the Receive completion handler continues to post more Receive WRs after the drain sentinel has been posted. The late-posted Receives flush after the drain sentinel has completed, leading to a crash later in rpcrdma_xprt_disconnect(). To prevent this crash, xprtrdma has to ensure that the Receive handler stops posting Receives before ib_drain_rq() posts its drain sentinel. Suggested-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 | xprtrdma: Avoid Receive Queue wrapping | Chuck Lever
Commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") increased the number of Receive WRs that are posted by the client, but did not increase the size of the Receive Queue allocated during transport set-up. This is usually not an issue because RPCRDMA_BACKWARD_WRS is defined as (32) when SUNRPC_BACKCHANNEL is defined. In cases where it isn't, there is a real risk of Receive Queue wrapping. Fixes: e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>