2022-12-06hwmon: (emc2305) fix pwm never being able to set lowerXingjiang Qiao
There are fields 'last_hwmon_state' and 'last_thermal_state' in the structure 'emc2305_cdev_data', which respectively store the cooling state set by the 'hwmon' and 'thermal' subsystems. The driver author's intent is that if the state set by 'hwmon' is lower than the value set by 'thermal', the driver just saves it without actually setting the pwm. Currently, 'last_thermal_state' is also updated by 'hwmon', which causes the cooling state to never be set to a lower value. This patch fixes that. Signed-off-by: Xingjiang Qiao <nanpuyue@gmail.com> Link: https://lore.kernel.org/r/20221206055331.170459-2-nanpuyue@gmail.com Fixes: 0d8400c5a2ce1 ("hwmon: (emc2305) add support for EMC2301/2/3/5 RPM-based PWM Fan Speed Controller.") [groeck: renamed emc2305_set_cur_state_shim -> __emc2305_set_cur_state] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
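The intended bookkeeping can be sketched in user-space C as follows. This is a simplified illustration, not the driver's code: struct cdev_state, hwmon_set_state(), thermal_set_state() and the "apply the larger of the two requests" rule on the thermal path are assumptions made for the example.

```c
/* Hypothetical, simplified stand-in for the per-cooling-device data. */
struct cdev_state {
    unsigned int last_hwmon_state;   /* last request from hwmon */
    unsigned int last_thermal_state; /* last request from thermal */
    unsigned int applied_state;      /* what was actually programmed */
};

/* hwmon path: always remember the request, but only program it when it
 * is not lower than what thermal asked for. Note that it never touches
 * last_thermal_state, which is the point of the fix. */
void hwmon_set_state(struct cdev_state *s, unsigned int state)
{
    s->last_hwmon_state = state;
    if (state < s->last_thermal_state)
        return;
    s->applied_state = state;
}

/* thermal path (assumed rule): remember the request and program the
 * larger of the thermal and hwmon requests. */
void thermal_set_state(struct cdev_state *s, unsigned int state)
{
    s->last_thermal_state = state;
    s->applied_state = state > s->last_hwmon_state ? state
                                                   : s->last_hwmon_state;
}
```

With this split, a later, lower thermal request can take effect again because hwmon writes no longer overwrite last_thermal_state.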
2022-12-06cxl/pci: Remove endian confusionDan Williams
readl() already handles endian conversion. That's the main difference between readl() and __raw_readl(). This is benign on little-endian systems, but big endian systems will end up byte-swabbing twice. Fixes: 2905cb5236cb ("cxl/pci: Add (hopeful) error handling support") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030092025.4045167.10651070153523351093.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
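The double conversion can be illustrated with a small user-space sketch. The helper names below are hypothetical: decode_le32() plays the role of an accessor that already returns a CPU-native value, and le32_to_cpu_on_big_endian() models what an extra le32_to_cpu() would do on a big-endian CPU, namely a byte swap.

```c
#include <stdint.h>

/* Decode a little-endian 32-bit register image into a native value.
 * This models an accessor that already performs the endian conversion. */
uint32_t decode_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0] |
           (uint32_t)bytes[1] << 8 |
           (uint32_t)bytes[2] << 16 |
           (uint32_t)bytes[3] << 24;
}

/* Unconditional byte swap of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* On a big-endian CPU, converting "little endian to CPU" is a swap.
 * Applying it to an already-converted value corrupts the value. */
uint32_t le32_to_cpu_on_big_endian(uint32_t v)
{
    return swap_bytes32(v);
}
```

On a little-endian CPU the extra conversion is a no-op, which is why the bug stays hidden there.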
2022-12-06cxl/pci: Add some type-safety to the AER trace pointsDan Williams
The first argument to the CXL AER trace points is the source device. Pass a 'const struct device *' rather than a 'const char *' for more type precision / safety. Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030091477.4045167.15174636482098463885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-06hwmon: (emc2305) fix unable to probe emc2301/2/3Xingjiang Qiao
The definitions of 'EMC2305_REG_PRODUCT_ID' and 'EMC2305_REG_DEVICE' are both '0xfd', so they actually return the same value. However, the values returned by emc2301/2/3/5 are different, so probing emc2301/2/3 will fail. This patch fixes that. Signed-off-by: Xingjiang Qiao <nanpuyue@gmail.com> Link: https://lore.kernel.org/r/20221206055331.170459-1-nanpuyue@gmail.com Fixes: 0d8400c5a2ce1 ("hwmon: (emc2305) add support for EMC2301/2/3/5 RPM-based PWM Fan Speed Controller.") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-12-06cxl/security: Drop security command ioctl uapiDan Williams
CXL PMEM security operations are routed through the NVDIMM sysfs interface. For this reason the corresponding commands are marked "exclusive" to preclude collisions between the ioctl ABI and the sysfs ABI. However, a better way to preclude that collision is to simply remove the ioctl ABI (command-id definitions) for those operations. Now that cxl_internal_send_cmd() (formerly cxl_mbox_send_cmd()) no longer needs to walk the cxl_mem_commands array, all of the uapi definitions for the security commands can be dropped. These never appeared in a released kernel, so there is no regression risk. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030056464.4044561.11486507095384253833.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-06cxl/mbox: Add variable output size validation for internal commandsDan Williams
cxl_internal_send_cmd() skips output size validation for variable output commands, which is not ideal. Most of the time internal usages want to fail if the output size does not match what was requested. For other commands where the caller cannot predict the size there is usually a header that conveys how much valid data is in the payload. For those cases add @min_out as a parameter to specify what the minimum response payload needs to be for the caller to parse the rest of the payload. In this patch only Get Supported Logs has that behavior, but going forward records retrieval commands like Get Poison List and Get Event Records can use @min_out to retrieve a variable amount of records. Critically, this validation scheme skips the need to interrogate the cxl_mem_commands array, which in turn frees up the implementation to support internal command enabling without also enabling external / user commands. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030055918.4044561.10339573829837910505.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
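A minimal sketch of the validation rule described above, assuming that a @min_out of zero means "the full requested size is required". The function name validate_output_size() and its parameters are hypothetical and do not reproduce the kernel's signature.

```c
#include <errno.h>
#include <stddef.h>

/*
 * requested_out: output size the caller asked for
 * actual_out:    output size the device actually returned
 * min_out:       smallest payload the caller can parse, or 0 when the
 *                caller needs exactly what it requested (assumption)
 */
int validate_output_size(size_t requested_out, size_t actual_out,
                         size_t min_out)
{
    /* Fixed-size commands: fall back to the requested size. */
    if (min_out == 0)
        min_out = requested_out;

    /* Variable-size commands only need to satisfy the minimum, for
     * example a header that says how much valid data follows. */
    if (actual_out < min_out)
        return -EIO;

    return 0;
}
```

A caller retrieving a variable number of records would pass the header size as min_out and then read the record count from that header.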
2022-12-06cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output sizeDan Williams
Internally cxl_mbox_send_cmd() converts all passed-in parameters to a 'struct cxl_mbox_cmd' instance and sends that to cxlds->mbox_send(). It then teases the possibility that the caller can validate the output size. However, they cannot since the resulting output size is not conveyed to the caller. Fix that by making the caller pass in a constructed 'struct cxl_mbox_cmd'. This prepares for a future patch to add output size validation on a per-command basis. Given the change in signature, also change the name to differentiate it from the user command submission path that performs more validation before generating the 'struct cxl_mbox_cmd' instance to execute. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030055370.4044561.17788093375112783036.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-06cxl/security: Fix Get Security State output payload endian handlingDan Williams
Multi-byte integer values in CXL mailbox payloads are little endian. Add a definition of the Get Security State output payload and convert the value before testing flags. Fixes: 328281155539 ("cxl/pmem: Introduce nvdimm_security_ops with ->get_flags() operation") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/167030054822.4044561.4917796262037689553.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-06lsm: Clarify documentation of vm_enough_memory hookRoberto Sassu
include/linux/lsm_hooks.h reports the result of the LSM infrastructure to the callers, not what LSMs should return to the LSM infrastructure. Clarify that and add that if all LSMs return a positive value __vm_enough_memory() will be called with cap_sys_admin set. If at least one LSM returns 0 or negative, it will be called with cap_sys_admin cleared. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
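The aggregation rule stated above can be sketched as a plain C function. The name cap_sys_admin_from_hooks() and the array-of-results interface are assumptions for illustration; the kernel iterates over registered hooks rather than an array.

```c
#include <stddef.h>

/*
 * results[i] is the value returned by the i-th LSM's vm_enough_memory
 * hook. The return value is the cap_sys_admin flag that would be handed
 * to __vm_enough_memory(): 1 only if every LSM returned a positive
 * value, 0 as soon as one returned 0 or a negative value.
 */
int cap_sys_admin_from_hooks(const int *results, size_t n)
{
    int cap_sys_admin = 1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (results[i] <= 0) {
            cap_sys_admin = 0;
            break;
        }
    }
    return cap_sys_admin;
}
```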
2022-12-06ARM: dts: socfpga: Fix pca9548 i2c-mux node nameGeert Uytterhoeven
"make dtbs_check": arch/arm/boot/dts/socfpga_cyclone5_vining_fpga.dtb: i2cswitch@70: $nodename:0: 'i2cswitch@70' does not match '^(i2c-?)?mux' From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml arch/arm/boot/dts/socfpga_cyclone5_vining_fpga.dtb: i2cswitch@70: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@1', 'i2c@2', 'i2c@3', 'i2c@4', 'i2c@5', 'i2c@6', 'i2c@7' were unexpected) From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml Fix this by renaming the PCA9548 node to "i2c-mux", to match the I2C bus multiplexer/switch DT bindings and the Generic Names Recommendation in the Devicetree Specification. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2022-12-06dt-bindings: Drop Jee Heng SiaKrzysztof Kozlowski
Emails to Jee Heng Sia bounce ("550 #5.1.0 Address rejected."). Add Keembay platform maintainers as Keembay I2S maintainers. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20221205164254.36418-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: thermal: cooling-devices: Add missing cache related propertiesRob Herring
The examples' cache nodes are incomplete as 'cache-unified' and 'cache-level' are required cache properties. Acked-by: Amit Kucheria <amitk@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221104162450.1982114-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: irled: ir-spi-led: convert to DT schemaKrzysztof Kozlowski
Convert the SPI IR LED bindings to DT schema. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221204104323.117974-3-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: irled: pwm-ir-tx: convert to DT schemaKrzysztof Kozlowski
Convert the PWM IR LED bindings to DT schema. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221204104323.117974-2-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: irled: gpio-ir-tx: convert to DT schemaKrzysztof Kozlowski
Convert the GPIO IR LED bindings to DT schema. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221204104323.117974-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: mt6360: rework to match multi-ledKrzysztof Kozlowski
The binding allows two types of LEDs: single and multi-color. They differ in their properties, so fix the bindings to accept both cases. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221127204058.57111-6-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: lp55xx: rework to match multi-ledKrzysztof Kozlowski
The binding allows two types of LEDs: single and multi-color. They differ in their properties, so fix the bindings to accept both cases. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221127204058.57111-5-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: lp55xx: switch to preferred 'gpios' suffixKrzysztof Kozlowski
The preferred name suffix for properties with single and multiple GPIOs is "gpios". Linux GPIO core code supports both. The DTS has mixed usage, so switch to preferred naming: omap3-n900.dtb: lp5523@32: 'enable-gpios' does not match any of the regexes: '^led@[0-8]$', '^multi-led@[0-8]$', 'pinctrl-[0-9]+' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221127204058.57111-4-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: lp55xx: allow labelKrzysztof Kozlowski
The Linux driver and at least one upstream board use 'label' property: qcom/msm8996-xiaomi-gemini.dtb: lp5562@30: 'label' does not match any of the regexes: '^led@[0-8]$', '^multi-led@[0-8]$', 'pinctrl-[0-9]+' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221127204058.57111-3-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: use unevaluatedProperties for common.yamlKrzysztof Kozlowski
The common.yaml schema allows further properties, so the bindings using it should restrict it with unevaluatedProperties:false. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221127204058.57111-2-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: thermal: tsens: Add SM6115 compatibleAdam Skladowski
Document the compatible for tsens on the Qualcomm SM6115 platform. According to the downstream dts, it ships v2.4 of the IP. Signed-off-by: Adam Skladowski <a39.skl@gmail.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221130200950.144618-3-a39.skl@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06of/kexec: Fix reading 32-bit "linux,initrd-{start,end}" valuesRob Herring
"linux,initrd-start" and "linux,initrd-end" can be 32-bit values even on a 64-bit platform. Ideally, the size should be based on '#address-cells', but that has never been enforced in the kernel's FDT boot parsing code (early_init_dt_check_for_initrd()). Bootloader behavior is known to vary. For example, kexec always writes these as 64-bit. The result of incorrectly reading 32-bit values is most likely the reserved memory for the original initrd will still be reserved for the new kernel. The original arm64 equivalent of this code failed to release the initrd reserved memory in *all* cases. Use of_read_number() to mirror the early_init_dt_check_for_initrd() code. Fixes: b30be4dc733e ("of: Add a common kexec FDT setup function") Cc: stable@vger.kernel.org Reported-by: Peter Maydell <peter.maydell@linaro.org> Link: https://lore.kernel.org/r/20221128202440.1411895-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: display: Convert fsl,imx-fb.txt to dt-schemaUwe Kleine-König
Compared to the txt description this adds clocks and clock-names to match reality. Note that fsl,imx-lcdc was picked as the new name as this is the actual hardware's name. There will be a new binding implementing the saner drm concept that is supposed to supersede this legacy fb binding Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221129180414.2729091-1-u.kleine-koenig@pengutronix.de Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: Add missing start and/or end of line regex anchorsRob Herring
json-schema patterns by default will match anywhere in a string, so typically we want at least the start or end anchored. Fix the obvious cases where the anchors were forgotten. Acked-by: Matti Vaittinen <mazziesaccount@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20221118223728.1721589-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
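The effect of a missing anchor can be reproduced with POSIX regular expressions in C. This is only an illustration: json-schema uses a different regex dialect, but for a simple pattern such as the one below the unanchored and anchored behavior is the same. The helper pattern_matches() is a hypothetical name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `pattern` matches anywhere in `name`, which is how an
 * unanchored pattern behaves. Anchors must be part of the pattern. */
bool pattern_matches(const char *pattern, const char *name)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* treat an invalid pattern as "no match" */

    matched = regexec(&re, name, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

For example, the unanchored pattern "led@[0-8]" accepts the node name "misled@3x", while "^led@[0-8]$" rejects it and still accepts "led@3".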
2022-12-06dt-bindings: qcom,pdc: Add missing compatiblesLuca Weiss
Document the compatibles that are already in use in the upstream Linux kernel to resolve dtbs_check warnings. Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221013091208.356739-1-luca.weiss@fairphone.com Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: leds: sgm3140: Document ocp8110 compatibleAndré Apitzsch
Add devicetree binding for Orient Chip OCP8110 charge pump used for camera flash LEDs. Signed-off-by: André Apitzsch <git@apitzsch.eu> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220505185344.10067-1-git@apitzsch.eu Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: Move fixed string node names under 'properties'Rob Herring
Fixed string node names should be under 'properties' rather than 'patternProperties'. Additionally, without beginning and end of line anchors, any prefix or suffix is allowed on the specified node name. These cases don't appear to want a prefix or suffix, so move them under 'properties'. In some cases, the diff turns out to look like we're moving some patterns rather than the fixed string properties. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20221118223708.1721134-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06of: unittest: Convert to i2c's .probe_new()Uwe Kleine-König
In struct i2c_driver, field probe_new replaces the soon to be deprecated field probe. Update unittest for this change. The probe function doesn't make use of the i2c_device_id * parameter so it can be trivially converted. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Frank Rowand <frowand.list@gmail.com> Link: https://lore.kernel.org/r/20221118224540.619276-510-uwe@kleine-koenig.org [robh: Add Frank's commit msg addition] Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: Drop type from 'cpus' propertyRob Herring
'cpus' is a common property, and it is now defined in dtschema schemas, so drop the type references in the tree. Acked-by: Suzuki K Poulose <suzuki.poulse@arm.com> Acked-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20221111212857.4104308-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06dt-bindings: thermal: thermal-idle: Fix example pathsRob Herring
The reference by path (&{/cpus/cpu@101/thermal-idle}) in the example causes an error with a new version of dtc: FATAL ERROR: Can't generate fixup for reference to path &{/cpus/cpu@100/thermal-idle} This is because the examples are built as an overlay and absolute paths are not valid, as references must be by label. The path was also not resolvable because, by default, examples are placed under 'example-N' nodes. As the example contains top-level nodes, the root node must be explicit for the example to be extracted as-is. This changes the indentation for the whole example, but the existing indentation is a mess of random amounts. Clean this up to be 4 spaces everywhere. Link: https://lore.kernel.org/r/20221111162729.3381835-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2022-12-06selftests/bpf: Allow building bpf tests with CONFIG_XFRM_INTERFACE=[m|n]Martin KaFai Lau
It is useful to use vmlinux.h in the xfrm_info test like other kfunc tests do. In particular, it is common for a kfunc bpf prog to require other core kernel structures from vmlinux.h. Although vmlinux.h is preferred, it needs a ___local flavor of struct bpf_xfrm_info in order to build the bpf selftests when CONFIG_XFRM_INTERFACE=[m|n]. Cc: Eyal Birger <eyal.birger@gmail.com> Fixes: 90a3a05eb33f ("selftests/bpf: add xfrm_info tests") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-06bpftool: Fix memory leak in do_build_table_cbMiaoqian Lin
strdup() allocates memory for path. We need to release the memory in the following error path. Add free() to avoid memory leak. Fixes: 8f184732b60b ("bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221206071906.806384-1-linmq006@gmail.com
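The pattern behind this fix, freeing a freshly duplicated string when a later step fails, can be sketched as follows. Everything here is hypothetical: struct path_table is a one-slot stand-in for the real hashmap, and dup_string() stands in for strdup() so the sketch builds under strict ISO C.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot "table"; it owns the stored string. */
struct path_table {
    char *slot;
};

/* Portable stand-in for strdup(). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Takes ownership of `path` only on success. */
static int table_insert(struct path_table *t, char *path)
{
    if (t->slot)
        return -EEXIST;
    t->slot = path;
    return 0;
}

int record_path(struct path_table *t, const char *fpath)
{
    char *path = dup_string(fpath);
    int err;

    if (!path)
        return -ENOMEM;

    err = table_insert(t, path);
    if (err) {
        free(path); /* the error path must release the copy */
        return err;
    }
    return 0;
}
```

Without the free() on the error path, every failed insertion would leak one copy of the path.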
2022-12-06riscv, bpf: Emit fixed-length instructions for BPF_PSEUDO_FUNCPu Lehui
For the BPF_PSEUDO_FUNC instruction, the verifier will refill imm with the correct addresses of bpf_calls and then run the last pass of the JIT. Since emit_imm of RV64 is variable-length and emits instructions of the appropriate length according to the imm, it may break ctx->offset and lead to unpredictable problems, such as inaccurate jumps. So let's fix it with fixed-length instructions. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Suggested-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20221206091410.1584784-1-pulehui@huaweicloud.com
2022-12-06hisi_acc_vfio_pci: Enable PRE_COPY flagShameer Kolothum
Now that we have everything to support the PRE_COPY state, enable it. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20221123113236.896-5-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06hisi_acc_vfio_pci: Move the dev compatibility tests for early checkShameer Kolothum
Instead of waiting till data transfer is complete to perform dev compatibility, do it as soon as we have enough data to perform the check. This will be useful when we enable the support for PRE_COPY. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20221123113236.896-4-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitionsShameer Kolothum
The saving_migf is open in PRE_COPY state if it is supported and reads initial device match data. hisi_acc_vf_stop_copy() is refactored to make use of common code. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20221123113236.896-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06hisi_acc_vfio_pci: Add support for precopy IOCTLShameer Kolothum
PRECOPY IOCTL in the case of the HiSilicon ACC driver can be used to perform the device compatibility check earlier during migration. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20221123113236.896-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Enable MIGRATION_PRE_COPY flagShay Drory
Now that everything has been set up for MIGRATION_PRE_COPY, enable it. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-15-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY errorShay Drory
Before a SAVE command is issued, a QUERY command is issued in order to know the device data size. In case PRE_COPY is used, the above commands are issued while the device is running. Thus, it is possible that between the QUERY and the SAVE commands the state of the device will be changed significantly and thus the SAVE will fail. Currently, if a SAVE command is failing, the driver will fail the migration. In the above case, don't fail the migration, but don't allow for new SAVEs to be executed while the device is in a RUNNING state. Once the device will be moved to STOP_COPY, SAVE can be executed again and the full device state will be read. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-14-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Introduce multiple loadsYishai Hadas
In order to support PRE_COPY, mlx5 driver transfers multiple states (images) of the device. e.g.: the source VF can save and transfer multiple states, and the target VF will load them by that order. This patch implements the changes for the target VF to decompose the header for each state and to write and load multiple states. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-13-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Consider temporary end of stream as part of PRE_COPYYishai Hadas
During PRE_COPY the migration data FD may have a temporary "end of stream" that is reached when the initial_bytes were read and no other dirty data exists yet. For instance, this may indicate that the device is idle and not currently dirtying any internal state. When read() is done on this temporary end of stream the kernel driver should return ENOMSG from read(). Userspace can wait for more data or consider moving to STOP_COPY. To not block the user upon read() and let it get ENOMSG we add a new state named MLX5_MIGF_STATE_PRE_COPY on the migration file. In addition, we add the MLX5_MIGF_STATE_SAVE_LAST state to block the read() once we call the last SAVE upon moving to STOP_COPY. Any further error will be marked with MLX5_MIGF_STATE_ERROR and the user won't be blocked. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-12-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
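The read() behavior described above can be summarized as a small decision function. This is a sketch under assumptions: the enum names are shortened, hypothetical versions of the MLX5_MIGF_STATE_* values, MIGF_STATE_COMPLETE is borrowed from the queue refactoring commit in the same series, and returning data whenever some is queued (except in the error state) is an assumed simplification.

```c
#include <stdbool.h>

enum migf_state {
    MIGF_STATE_ERROR,
    MIGF_STATE_PRE_COPY,
    MIGF_STATE_SAVE_LAST,
    MIGF_STATE_COMPLETE
};

enum read_action {
    READ_FAIL,    /* report an error, never block */
    READ_DATA,    /* copy queued data to the user */
    READ_ENOMSG,  /* temporary end of stream during PRE_COPY */
    READ_BLOCK,   /* wait for the last SAVE to produce data */
    READ_EOF      /* return 0: nothing more will arrive */
};

enum read_action next_read_action(enum migf_state state, bool data_queued)
{
    if (state == MIGF_STATE_ERROR)
        return READ_FAIL;
    if (data_queued)
        return READ_DATA;

    switch (state) {
    case MIGF_STATE_PRE_COPY:
        return READ_ENOMSG;
    case MIGF_STATE_SAVE_LAST:
        return READ_BLOCK;
    default:
        return READ_EOF;
    }
}
```

On READ_ENOMSG, userspace can either wait for more dirty data or decide to move the device to STOP_COPY.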
2022-12-06vfio/mlx5: Introduce vfio precopy ioctl implementationYishai Hadas
vfio precopy ioctl returns an estimation of data available for transferring from the device. Whenever a user is using VFIO_MIG_GET_PRECOPY_INFO, track the current state of the device, and if needed, append the dirty data to the transfer FD data. This is done by saving a middle state. As mlx5 runs the SAVE command asynchronously, make sure to query for incremental data only once there is no active save command. Running both in parallel might end up with a failure in the incremental query command on un-tracked vhca. Also, a middle state will be saved only after the previous state has finished its SAVE command and has been fully transferred; this prevents endless use of resources. Co-developed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Introduce SW headers for migration statesYishai Hadas
As mentioned in the previous patches, mlx5 is transferring multiple states when the PRE_COPY protocol is used. This states mechanism requires the target VM to know the states' size in order to execute multiple loads. Therefore, add SW header, with the needed information, for each saved state the source VM is transferring to the target VM. This patch implements the source VM handling of the headers, following patch will implement the target VM handling of the headers. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Introduce device transitions of PRE_COPYYishai Hadas
In order to support PRE_COPY, mlx5 driver is transferring multiple states (images) of the device. e.g.: the source VF can save and transfer multiple states, and the target VF will load them in that order. The device is saving three kinds of states: 1) Initial state - when the device moves to PRE_COPY state. 2) Middle state - during PRE_COPY phase via VFIO_MIG_GET_PRECOPY_INFO. There can be multiple states of this type. 3) Final state - when the device moves to STOP_COPY state. After moving to PRE_COPY state, the user is holding the saving migf FD and can use it. For example: the user can start transferring data via the read() callback. Also, the user can switch from PRE_COPY to STOP_COPY whenever they see fit. This will invoke saving of the final state. This means that the mlx5 VFIO device can be switched to STOP_COPY without transferring any data in PRE_COPY state. Therefore, when the device moves to STOP_COPY, mlx5 will store the final state on a dedicated queue entry on the list. Co-developed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Refactor to use queue based data chunksYishai Hadas
Refactor to use queue based data chunks on the migration file. The SAVE command adds a chunk to the tail of the queue while the read() API finds the required chunk and returns its data. In case the queue is empty but the state of the migration file is MLX5_MIGF_STATE_COMPLETE, read() may not be blocked but will return 0 to indicate end of file. This is a step towards maintaining multiple images and their meta data (i.e. headers) on the migration file as part of next patches from the series. Note: At this point, we still use a single chunk on the migration file, but it is now ready to support multiple. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
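A minimal sketch of the queue idea, assuming a fixed-size array instead of the driver's linked list. The names struct chunk_queue, queue_add_tail(), queue_find() and MAX_CHUNKS are hypothetical; only the behavior (SAVE appends at the tail, read() looks up the chunk covering a file position) comes from the description above.

```c
#include <stddef.h>

#define MAX_CHUNKS 8 /* arbitrary limit for the sketch */

struct chunk {
    size_t start_pos; /* offset of this chunk in the migration stream */
    size_t length;    /* number of bytes held by this chunk */
};

struct chunk_queue {
    struct chunk chunks[MAX_CHUNKS];
    size_t count;
    size_t total; /* stream length covered so far */
};

/* SAVE path: append a chunk of `length` bytes at the tail. */
int queue_add_tail(struct chunk_queue *q, size_t length)
{
    if (q->count == MAX_CHUNKS)
        return -1;
    q->chunks[q->count].start_pos = q->total;
    q->chunks[q->count].length = length;
    q->count++;
    q->total += length;
    return 0;
}

/* read() path: find the chunk that holds stream position `pos`,
 * or NULL when no queued chunk covers it. */
const struct chunk *queue_find(const struct chunk_queue *q, size_t pos)
{
    size_t i;

    for (i = 0; i < q->count; i++) {
        const struct chunk *c = &q->chunks[i];

        if (pos >= c->start_pos && pos - c->start_pos < c->length)
            return c;
    }
    return NULL;
}
```

A NULL result corresponds to the "queue is empty" case in the text, where the migration file state decides whether read() blocks or returns 0.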
2022-12-06vfio/mlx5: Refactor migration file stateYishai Hadas
Refactor migration file state to be an enum whose values are mutually exclusive. As part of that, drop the 'disabled' state, as 'error' is the same from a functional point of view. Next patches from the series will extend this enum for other relevant states. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Refactor MKEY usageYishai Hadas
This patch refactors MKEY usage so that its life cycle matches that of the migration file instead of allocating/destroying it upon each SAVE/LOAD command. This is a preparation step towards the PRE_COPY series where multiple images will be SAVED/LOADED. We achieve it by having a new struct named mlx5_vhca_data_buffer which holds the mkey and its related fields such as sg_append_table, allocated_length, etc. The above fields were taken out of the migration file main struct and moved into the dedicated mlx5_vhca_data_buffer struct with the proper helpers in place. For now we have a single mlx5_vhca_data_buffer per migration file. However, in coming patches we'll have multiple of them to support multiple images. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Refactor PD usageYishai Hadas
This patch refactors PD usage so that its life cycle matches that of the migration file instead of allocating/destroying it upon each SAVE/LOAD command. This is a preparation step towards the PRE_COPY series where multiple images will be SAVED/LOADED and a single PD can be simply reused. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio/mlx5: Enforce a single SAVE command at a timeYishai Hadas
Enforce a single SAVE command at a time. As the SAVE command is an asynchronous one, we must enforce running only a single command at a time. This will preserve ordering between multiple calls and protect from races on the migration file data structure. This is a must for the next patches from the series where as part of PRE_COPY we may have multiple images to be saved and multiple SAVE commands may be issued from different flows. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio: Extend the device migration protocol with PRE_COPYJason Gunthorpe
The optional PRE_COPY states open the saving data transfer FD before reaching STOP_COPY and allows the device to dirty track internal state changes with the general idea to reduce the volume of data transferred in the STOP_COPY stage. While in PRE_COPY the device remains RUNNING, but the saving FD is open. Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P, which halts P2P transfers while continuing the saving FD. PRE_COPY, with P2P support, requires the driver to implement 7 new arcs and exists as an optional FSM branch between RUNNING and STOP_COPY: RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to query the progress of the precopy operation in the driver with the idea it will judge to move to STOP_COPY at least once the initial data set is transferred, and possibly after the dirty size has shrunk appropriately. This ioctl is valid only in PRE_COPY states and kernel driver should return -EINVAL from any other migration state. Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked and to be defined in future. We also split the pending_bytes report into the initial and sustaining values, e.g.: initial_bytes and dirty_bytes. initial_bytes: Amount of initial precopy data. dirty_bytes: Device state changes relative to data previously retrieved. These fields are not required to have any bearing to STOP_COPY phase. It is recommended to leave PRE_COPY for STOP_COPY only after the initial_bytes field reaches zero. Leaving PRE_COPY earlier might make things slower. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
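The recommendation about when to leave PRE_COPY can be sketched as a userspace policy check. The struct below only mirrors the two reported fields and is not the uapi structure; should_move_to_stop_copy() and the dirty_threshold parameter are hypothetical choices for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two values reported by the precopy query. */
struct precopy_progress {
    uint64_t initial_bytes; /* initial precopy data still to be read */
    uint64_t dirty_bytes;   /* state changed since data was retrieved */
};

/*
 * Policy sketch: stay in PRE_COPY until the initial data set has been
 * fully transferred, then move on once the dirty size has shrunk to an
 * acceptable level chosen by the caller.
 */
bool should_move_to_stop_copy(const struct precopy_progress *p,
                              uint64_t dirty_threshold)
{
    if (p->initial_bytes != 0)
        return false;
    return p->dirty_bytes <= dirty_threshold;
}
```

The threshold is a tuning knob of the migration manager; the commit text only recommends waiting for initial_bytes to reach zero.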